Re: Batch Update Fields
You must reindex the complete document, even if you just want to update a single field.

On Friday 03 December 2010 04:52:04, Adam Estrada wrote:
> OK, part 2 of my previous question... Is there a way to batch update field values based on certain criteria? For example, if thousands of documents have a field value of 'US', can I update all of them to 'United States' programmatically?
> Adam

--
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
Re: Limit number of characters returned
--- On Fri, 12/3/10, Mark static.void@gmail.com wrote:
> From: Mark static.void@gmail.com
> Subject: Limit number of characters returned
> To: solr-user@lucene.apache.org
> Date: Friday, December 3, 2010, 5:39 AM
>
> Is there a way to limit the number of characters returned from a stored field? For example: say I have a document (~2K words) and I search for a word that's somewhere in the middle. I would like the document to match the search query, but the stored field should only return the first 200 characters of the document. Is there any way to accomplish this that doesn't involve two fields?

I don't think it is possible out-of-the-box. Maybe you can hack the highlighter to return the first 200 characters in the highlighting response. Or a custom response writer can do that.

But if you will always be returning the first 200 characters of documents, I think creating an additional field with indexed="false" stored="true" will be more efficient. And you can make your original field indexed="true" stored="false"; your index size will be diminished.

<copyField source="text" dest="textShort" maxChars="200"/>
Re: [Wildcard query] Weird behaviour
On Fri, Dec 3, 2010 at 6:28 AM, Tanguy Moal tanguy.m...@gmail.com wrote:
> However, suddenly CPU usage simply doubles, and sometimes eventually starts using all 16 cores of the server, whereas the number of handled requests is pretty stable, and even starts decreasing because of degraded user experience due to dramatic response times.

Hi Tanguy: This was fixed here: https://issues.apache.org/jira/browse/LUCENE-2620. You can apply the patch file there (https://issues.apache.org/jira/secure/attachment/12452947/LUCENE-2620_3x.patch) and recompile your own Lucene 2.9.x, or you can replace the Lucene jar file in your Solr war with the newly released lucene-2.9.4 core jar... which I think is due to be released later today!

Thanks for spending the time to report the problem... let us know if the patch/Lucene 2.9.4 doesn't fix it!
Re: [Wildcard query] Weird behaviour
Actually, I took a look at the code again. The queries you mentioned:

> I send queries to that field in the form (*term1*term2*)

I think the patch will not fix your problem... The only way I know you can fix this would be to upgrade to lucene/solr trunk, where wildcard comparison is linear in the length of the string. In all other versions, it has a much worse runtime, and that's what you are experiencing.

Separately, even better than this would be to see if you can index your content in a way that avoids these expensive queries. But this is just a suggestion; what you are doing should still work fine.

On Fri, Dec 3, 2010 at 6:56 AM, Robert Muir rcm...@gmail.com wrote:
> Hi Tanguy: This was fixed here: https://issues.apache.org/jira/browse/LUCENE-2620. You can apply the patch file there and recompile your own Lucene 2.9.x, or you can replace the Lucene jar file in your Solr war with the newly released lucene-2.9.4 core jar... [...]
Re: [Wildcard query] Weird behaviour
Thank you very much Robert for replying that fast and accurately. I effectively have another idea in mind to provide similar suggestions less expensively; I was balancing between the work-around and the report-issue options. I don't regret it since you came up with a possible fix. I'll give it a try as soon as possible and let the list know.

Regards,
Tanguy

2010/12/3 Robert Muir rcm...@gmail.com:
> Actually, I took a look at the code again... I think the patch will not fix your problem... The only way I know you can fix this would be to upgrade to lucene/solr trunk, where wildcard comparison is linear in the length of the string. In all other versions, it has a much worse runtime, and that's what you are experiencing. [...]
Re: Solr Multi-thread Update Transaction Control
From Solr's perspective, the fact that multiple threads are sending data to be indexed is invisible; Solr is just reading HTTP requests. So I don't think what you're asking for is possible. Could you outline the reason you want to do this? Perhaps there's another way to accomplish it.

Best,
Erick

2010/12/2 wangjb wang-ji...@kdc.benic.co.jp
> Hi, now we are using Solr 1.4.1 and have encountered a problem. When multiple threads update Solr data at the same time, can every thread have its own separate transaction? If this is possible, how can we realize it? Is there any suggestion here? Waiting online. Thank you for any useful reply.
Re: Joining Fields in an Index
Hi, I made a MappingUpdateRequestHandler which lets you map country codes to full country names with a config file. See https://issues.apache.org/jira/browse/SOLR-2151

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 3. des. 2010, at 00.03, Adam Estrada wrote:
> Hi, I was hoping to do it directly in the index, but it was more out of curiosity than anything. I can certainly map it in the DAO, but again... I was hoping to learn if it was possible in the index. Thanks for the feedback! Adam

On Dec 2, 2010, at 5:48 PM, Savvas-Andreas Moysidis wrote:
> Hi, if you are able to do a full re-index then you could index the full names and not the codes. When you later facet on the Country field you'll get the actual name rather than the code. If you are not able to re-index then this conversion could probably be added at your application layer prior to displaying your results (e.g. in your DAO object).

On 2 December 2010 22:05, Adam Estrada estrada.adam.gro...@gmail.com wrote:
> All, I have an index that has a field with country codes in it. I have 7 million or so documents in the index, and when displaying facets the country codes don't mean a whole lot to me. Is there any way to add a field with the full country names and then join the codes in there accordingly? I suppose I can do this before updating the records in the index, but before I do that I would like to know if there is a way to do this sort of join. Example: US -> United States. Thanks, Adam
Facet same field with different prefix
Hi everyone, can I facet the same field twice with a different prefix, as per the example below?

facet.field=myfield
f.myfield.facet.prefix=make
f.myfield.facet.sort=count
facet.field=myfield
f.myfield.facet.prefix=model
f.myfield.facet.sort=count

Thanks and regards,
Ericz
Re: [Wildcard query] Weird behaviour
On Fri, Dec 3, 2010 at 7:49 AM, Tanguy Moal tanguy.m...@gmail.com wrote:
> Thank you very much Robert for replying that fast and accurately. I effectively have another idea in mind to provide similar suggestions less expensively; I was balancing between the work-around and the report-issue options. I don't regret it since you came up with a possible fix. I'll give it a try as soon as possible and let the list know.

I'm afraid the patch is only a hack for the case where you have more than one * sequentially (e.g. foo**bar). It doesn't fix the more general problem, which is that WildcardQuery itself uses an inefficient algorithm: this more general problem is only fixed in lucene/solr trunk.

If you really need these queries I definitely suggest at least trying trunk, because you should get much better performance. But it sounds like you might already have an idea to avoid using these queries, so this is of course the best.
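[To see why these leading-wildcard queries are costly, here is a minimal sketch in plain Python, not Lucene code, with a made-up term list: without a literal prefix to seek to, every term in the index must be tested against the pattern, so cost grows with the number of unique terms even before the per-term matching cost Robert mentions.]

```python
from fnmatch import fnmatchcase

# Toy term dictionary standing in for the index's sorted unique terms.
terms = ["den haag", "den bosch", "term1xterm2y", "aterm1bterm2c", "wireless"]

def wildcard_matches(pattern, terms):
    """Brute-force wildcard evaluation: with a leading '*' there is no
    prefix to seek to, so every term must be checked individually."""
    return [t for t in terms if fnmatchcase(t, pattern)]

print(wildcard_matches("*term1*term2*", terms))
# ['term1xterm2y', 'aterm1bterm2c']
```

A prefixed pattern like den* could instead seek directly to the den range of the sorted term dictionary, which is why avoiding the leading * is so much cheaper.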
Re: Batch Update Fields
No, there's no equivalent to SQL's UPDATE for all values in a column. You'll have to reindex all the documents.

On Thu, Dec 2, 2010 at 10:52 PM, Adam Estrada estrada.adam.gro...@gmail.com wrote:
> OK, part 2 of my previous question... Is there a way to batch update field values based on certain criteria? For example, if thousands of documents have a field value of 'US', can I update all of them to 'United States' programmatically? Adam
Problem with dismax mm
Hi list, I've got a little problem with my mm definition:

2<-1 4<50% 5<66%

Here is what it *should* mean: if there are 2 clauses, at least one has to match. If there are more than 2 clauses, at least 50% should match (both rules seem to mean the same, don't they?). And if there are 5 or more clauses, at least 66% should match: in the case of 5 clauses, 3 should match; in the case of 6, at least 4 should match; and so on.

However, in some test cases I only get the intended behaviour with a 2-clause query when I say mm=1. With longer queries this would lead to very bad search-quality results. What is wrong with this mm definition? Thanks for suggestions.

- Em

--
View this message in context: http://lucene.472066.n3.nabble.com/Problem-with-dismax-mm-tp2011496p2011496.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Problem with dismax mm
On 12/3/2010 6:18 AM, Em wrote:
> I got a little problem with my mm definition: 2<-1 4<50% 5<66%

Are you defining this in a request handler in solrconfig.xml? If you have it entered just like that, I think it may not be understanding it. You need to encode the < character. Here's an excerpt from my dismax handler:

<str name="mm">2&lt;-1 4&lt;-50%</str>

If that's not the problem, then I am not sure what it is, and the experts will need more information: version, query URL, configs.

Shawn
Re: Problem with dismax mm
From http://wiki.apache.org/solr/DisMaxQParserPlugin#mm_.28Minimum_.27Should.27_Match.29:

> If there are less than 3 optional clauses, they all must match; for 3 to 5 clauses, one less than the number of clauses must match, for 6 or more clauses, 80% must match, rounded down: 2<-1 5<80%

Personally, the mm parameter makes my head hurt. As I read it, there are actually 4 buckets that rules apply to, not the three in your mm definition; see below. Your mm param says, I think, that:

clauses  required  rule
1        1         We haven't gotten to a rule yet; this is the default
2        2         We haven't gotten to a rule yet; this is the default
3        2         2<-1
4        3         2<-1
5        2         4<50%, rounded down
6        3         5<66% (6 * 0.66 = 3.96)
7        4         5<66%, rounded down

Personally, I think the percentages are mind-warping and lead to interesting behavior. I prefer to explicitly list the number of clauses required, or relatively constant numbers of required clauses, something like "between 3 and 5, one less; 6 to 9, two less", etc.; you don't get weird steps like between 4 and 5 above. Plus, by the time you get to, say, 7 clauses nobody can keep track of what correct behavior is anyway <G>.

So I think you're off by one position when applying your rules. Or the Wiki page is misleading. Or the Wiki page is exactly correct and I'm mis-reading it. Like I said, mm makes my head hurt.

Best,
Erick

On Fri, Dec 3, 2010 at 8:18 AM, Em mailformailingli...@yahoo.de wrote:
> Hi list, I got a little problem with my mm definition: 2<-1 4<50% 5<66%. Here is what it *should* mean: if there are 2 clauses, at least one has to match. If there are more than 2 clauses, at least 50% should match. And if there are 5 or more clauses, at least 66% should match. [...] What is wrong with this mm-definition? Thanks for suggestions.
> - Em
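[Erick's bucket arithmetic can be checked mechanically. The sketch below is a simplified reading of the conditional mm format, handling only the "<-k" and "<p%" forms that appear in this thread, not Solr's full parser, but it reproduces the per-clause-count table.]

```python
def required_clauses(mm_spec, n):
    """Resolve a conditional dismax mm spec like "2<-1 4<50% 5<66%"
    to the number of clauses required for an n-clause query.
    Conditions are checked in order and the last one whose threshold
    is exceeded wins; below the lowest threshold, all must match."""
    required = n  # default: every clause must match
    for cond in mm_spec.split():
        threshold, rule = cond.split("<")
        if n > int(threshold):
            if rule.endswith("%"):
                required = n * int(rule[:-1]) // 100  # percent of n, rounded down
            else:
                required = n + int(rule)  # rule is negative, e.g. "-1"
    return required

for n in range(1, 8):
    print(n, required_clauses("2<-1 4<50% 5<66%", n))
# 1..7 clauses -> 1, 2, 2, 3, 2, 3, 4 required
```

Note how the condition for exactly 2 clauses never fires (2 is not greater than 2), which is the off-by-one Erick describes.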
Re: Problem with dismax mm
Thank you both! Erick, what you said was absolutely correct. I misunderstood the definition completely. Now it works as intended. Thank you!

Kind regards

Erick Erickson wrote:
> Personally, the mm parameter makes my head hurt. As I read it, there are actually 4 buckets that rules apply to, not the three in your mm definition. [...] So I think you're off by one position when applying your rules. Or the Wiki page is misleading. Or the Wiki page is exactly correct and I'm mis-reading it.

--
View this message in context: http://lucene.472066.n3.nabble.com/Problem-with-dismax-mm-tp2011496p2012079.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Limit number of characters returned
Correct me if I am wrong, but I would like to return highlighted excerpts from the document, so I would still need to index and store the whole document, right (i.e. highlighting only works on stored fields)?

On 12/3/10 3:51 AM, Ahmet Arslan wrote:
> I don't think it is possible out-of-the-box. Maybe you can hack the highlighter to return the first 200 characters in the highlighting response. Or a custom response writer can do that. But if you will always be returning the first 200 characters of documents, I think creating an additional field with indexed="false" stored="true" will be more efficient. And you can make your original field indexed="true" stored="false"; your index size will be diminished.
>
> <copyField source="text" dest="textShort" maxChars="200"/>
finding exact case insensitive matches on single and multiword values
Users call this URL on my site: /?search=1&city=den+haag or even /?search=1&city=Den+Haag (casing of the city name can be anything).

Under the hood I call Solr:

http://localhost:8983/solr/db/select/?indent=on&facet=true&fq=city:den+haag&q=*:*&start=0&rows=25&fl=id,title,friendlyurl,city&facet.field=city

but this returns 0 results, even though I KNOW there are exactly 54 records that have an exact match on "den haag" (in this case even with lower casing in the DB). City names are stored with various casings in the DB, so when searching with Solr, the search must ignore casing.

My schema.xml:

<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
<field name="city" type="string" indexed="true" stored="true"/>

To check what was going on, I opened analysis.jsp; for field name I provide: city; for field value (index) I provide: den haag. When I analyze this I get: den haag. So that seems correct to me. Why is it that no results are returned?

My requirements summarized:
- I want to search independent of case on city name: when a user searches on "DEn HaAG" he will get the records that have the value "Den Haag", but also records that have "den haag", etc.
- City names may consist of multiple words, but only an exact match is valid: when a user searches for "den", he will not find "den haag" records. And when searching on "den haag" it will only return matches on that and not on other cities like "den bosch".

How can I achieve this? I think I need a new fieldtype in my schema.xml, but am not sure which tokenizers and analyzers I need. Here's what I tried:

<fieldType name="exactmatch" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_dutch.txt"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Help is really appreciated!
-- View this message in context: http://lucene.472066.n3.nabble.com/finding-exact-case-insensitive-matches-on-single-and-multiword-values-tp2012207p2012207.html Sent from the Solr - User mailing list archive at Nabble.com.
Negative fl param
When returning results is there a way I can say to return all fields except a certain one? So say I have stored fields foo, bar and baz but I only want to return foo and bar. Is it possible to do this without specifically listing out the fields I do want?
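[As far as I know there is no exclusion syntax for fl, so the usual workaround is to build the include list on the client by subtracting the unwanted fields from the known stored fields. A hypothetical sketch; the field names and the stored-field list are made up, and in practice you would mirror your schema.xml.]

```python
stored_fields = ["foo", "bar", "baz"]  # assumed to mirror the schema's stored fields

def fl_param(exclude):
    """Build an fl value containing every stored field except those
    in `exclude`, preserving the schema's field order."""
    excluded = set(exclude)
    return ",".join(f for f in stored_fields if f not in excluded)

print(fl_param(["baz"]))  # foo,bar
```

The resulting string is then passed as the fl request parameter.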
Re: Limit number of characters returned
Yep, you're correct. copyField is probably your simplest option here, as Ahmet suggested. A more complex solution would be your own response writer, but unless and until your index gets cumbersome, I'd avoid that. Plus, storing the copied contents only shouldn't impact search much, since this doesn't add any terms...

Best,
Erick

On Fri, Dec 3, 2010 at 10:32 AM, Mark static.void@gmail.com wrote:
> Correct me if I am wrong, but I would like to return highlighted excerpts from the document, so I would still need to index and store the whole document, right (i.e. highlighting only works on stored fields)? [...]
Re: Limit number of characters returned
Thanks for the response. Couldn't I just use the highlighter and configure it to use the alternate field to return the first 200 characters? In cases where there is a highlighter match I would prefer to show the excerpts anyway.

http://wiki.apache.org/solr/HighlightingParameters#hl.alternateField
http://wiki.apache.org/solr/HighlightingParameters#hl.maxAlternateFieldLength

Is there something wrong with this method?

On 12/3/10 8:03 AM, Erick Erickson wrote:
> Yep, you're correct. copyField is probably your simplest option here, as Ahmet suggested. A more complex solution would be your own response writer, but unless and until your index gets cumbersome, I'd avoid that. Plus, storing the copied contents only shouldn't impact search much, since this doesn't add any terms... [...]
Re: finding exact case insensitive matches on single and multiword values
The root of your problem, I think, is fq=city:den+haag, which parses into city:den defaultfield:haag. Try parens, i.e. city:(den haag). Attaching &debugQuery=on is often a way to see things like this quickly. Also, if you haven't seen the analysis page from the admin page, it's really valuable for figuring out the effects of analyzers.

You can probably do something like:

<fieldType name="myField" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

to get what you want.

Best,
Erick

On Fri, Dec 3, 2010 at 10:46 AM, PeterKerk vettepa...@hotmail.com wrote:
> Users call this URL on my site: /?search=1&city=den+haag or even /?search=1&city=Den+Haag (casing of the city name can be anything). Under the hood I call Solr: http://localhost:8983/solr/db/select/?indent=on&facet=true&fq=city:den+haag&q=*:*&start=0&rows=25&fl=id,title,friendlyurl,city&facet.field=city but this returns 0 results, even though I KNOW there are exactly 54 records that have an exact match on "den haag". [...]
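[For intuition, here is what that KeywordTokenizerFactory + LowerCaseFilterFactory chain amounts to, as a toy model in Python rather than Solr's implementation: the whole field value becomes a single lower-cased token, so multi-word values match exactly and case-insensitively, while a bare "den" cannot match "den haag".]

```python
def keyword_lowercase(value):
    """Toy model of KeywordTokenizer + LowerCaseFilter: the entire
    value is one token, lower-cased; nothing is split on whitespace."""
    return [value.lower()]

# Index-time and query-time analysis yield the same single token:
print(keyword_lowercase("Den Haag"))   # ['den haag']
print(keyword_lowercase("DEn HaAG"))   # ['den haag']
# A partial value produces a different token, so it does not match:
print(keyword_lowercase("den") == keyword_lowercase("Den Haag"))  # False
```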
Re: finding exact case insensitive matches on single and multiword values
You are right; this is what I see when I append the debug query (very, very useful btw!!!) in the old situation:

<arr name="parsed_filter_queries">
  <str>city:den title:haag</str>
  <str>PhraseQuery(themes:"hotel en restaur")</str>
</arr>

I then changed the schema.xml to:

<fieldType name="myField" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<field name="city" type="myField" indexed="true" stored="true"/> <!-- used to be string -->

I then tried adding parentheses:

http://localhost:8983/solr/db/select/?indent=on&facet=true&fq=city:(den+haag)&q=*:*&start=0&rows=25&fl=id,title,friendlyurl,city&facet.field=city

and also tried (without +):

http://localhost:8983/solr/db/select/?indent=on&facet=true&fq=city:(den haag)&q=*:*&start=0&rows=25&fl=id,title,friendlyurl,city&facet.field=city

Then I get:

<arr name="parsed_filter_queries">
  <str>city:den city:haag</str>
</arr>

and still 0 results. But as you can see, the query is split up into 2 separate words; I don't think that is what I need?

--
View this message in context: http://lucene.472066.n3.nabble.com/finding-exact-case-insensitive-matches-on-single-and-multiword-values-tp2012207p2012509.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: spellchecker results not as desired
Thanks, I was able to fix this issue with a combination of EdgeNGrams and a fuzzy query. Here are the details: http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/ I just added the fuzzy query operator and it seems to be working so far.

--
View this message in context: http://lucene.472066.n3.nabble.com/spellchecker-results-not-as-desired-tp1789192p2012887.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: finding exact case insensitive matches on single and multiword values
When you went from StrField to TextField in your config you enabled tokenizing (which I believe splits on spaces by default), which is why you see separate 'words' / terms in the debugQuery explanation.

I believe you want to keep your old StrField config and try quoting:

fq=city:"den+haag" or fq=city:"den haag"

Concerning the lower-casing: wouldn't it be easiest to do that at the client? (I'm not sure at the moment how to do lower-casing with a StrField.)

Geert-jan

2010/12/3 PeterKerk vettepa...@hotmail.com
> You are right, this is what I see when I append the debug query (very very useful btw!!!) in the old situation: [...] Then I get: city:den city:haag. And still 0 results. But as you can see the query is split up into 2 separate words; I don't think that is what I need?
Re: solr 1.4 suggester component
thanks .. i used http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/ with fuzzy operator.. -- View this message in context: http://lucene.472066.n3.nabble.com/solr-1-4-suggester-component-tp1766915p2012946.html Sent from the Solr - User mailing list archive at Nabble.com.
boosting certain docs based on a field value
Hi, I was looking to boost certain docs based on some values in an indexed field, e.g. pType: "post paid", "go phone". I would like to have the "post paid" docs first and then "go phone". I checked function queries but could not figure it out. Any help?

--
View this message in context: http://lucene.472066.n3.nabble.com/boosting-certain-docs-based-on-a-filed-value-tp2012962p2012962.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Limit number of characters returned
> Couldn't I just use the highlighter and configure it to use the alternate field to return the first 200 characters? In cases where there is a highlighter match I would prefer to show the excerpts anyway.
>
> http://wiki.apache.org/solr/HighlightingParameters#hl.alternateField
> http://wiki.apache.org/solr/HighlightingParameters#hl.maxAlternateFieldLength
>
> Is there something wrong with this method?

No, you can do that. It is perfectly fine.
Re: Batch Update Fields
I wonder... I know that sed would work to find and replace the terms in all of the CSV files that I am indexing, but would it work to find and replace key terms in the index?

find C:\\tmp\\index\\data -type f -exec sed -i 's/AF/AFGHANISTAN/g' {} \;

That command would iterate through all the files in the data directory and replace the country code with the full country name. I may just back up the directory and try it. I have it running on CSV files right now and it's working wonderfully.

For those of you interested, I am indexing the entire Geonames dataset http://download.geonames.org/export/dump/ (allCountries.zip), which gives me a pretty comprehensive world gazetteer. My next step is gonna be to display the results as KML to view over a Google globe. Thoughts?

Adam

On Fri, Dec 3, 2010 at 7:57 AM, Erick Erickson erickerick...@gmail.com wrote:
> No, there's no equivalent to SQL update for all values in a column. You'll have to reindex all the documents.
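[Since editing the index files directly is risky, a safer variant of the sed approach is to rewrite only the country-code column of each file before indexing. A sketch under assumptions: the two-entry code-to-name map and the column position are made up, and the Geonames allCountries dump is actually tab-separated, so you would adjust the delimiter accordingly.]

```python
import csv
import io

COUNTRY_NAMES = {"AF": "AFGHANISTAN", "US": "UNITED STATES"}  # hypothetical subset

def expand_country_codes(csv_text, code_column):
    """Replace country codes with full names in one CSV column.
    Unlike sed over the raw file, this cannot touch an 'AF' that
    happens to appear inside some other field's text."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    for row in rows:
        row[code_column] = COUNTRY_NAMES.get(row[code_column], row[code_column])
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()

print(expand_country_codes("1,Kabul,AF\n2,AF Base,US\n", 2))
```

Note that the "AF" inside the place name "AF Base" is left alone, which the global sed substitution would have mangled.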
nexus of synonyms and stemming, take 2
hi all, [This is a second attempt at emailing. The apache mailing list spam filter apparently did not like my synonyms entry, i.e., it classified my email as spam. I have replaced 'phone' with 'foo', 'cell' with 'sell' and 'mobile' with 'nubile'] This is a fairly basic synonyms question: how does synonyms handle stemming? Example: synonyms.txt has the entry: sell,sell foo,nubile,nubile foo,wireless foo If I want to match on 'sell foos'... a) do I need to add an entry for 'sell foos' (i.e. in addition to 'sell foo') b) or will the stemmer (porter/snowball) handle this already thanks will
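One way to reason about this is to look at where the synonym filter sits relative to the stemmer in the analyzer chain. A minimal sketch — the field type name, file name, and filter order are assumptions, not taken from the poster's schema. With the synonym filter before the stemmer (as below), synonym lookup sees surface forms, so 'sell foos' would need its own entry; if the stemmer ran first, 'foos' would already have been reduced to 'foo' before the lookup.

```xml
<!-- Sketch only: the point is the ordering of SynonymFilterFactory vs. the
     stemmer. Names here (text_syn, synonyms.txt) are assumptions. -->
<fieldType name="text_syn" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  </analyzer>
</fieldType>
```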
Re: Batch Update Fields
On Friday 03 December 2010 18:20:44 Adam Estrada wrote: I wonder...I know that sed would work to find and replace the terms in all of the csv files that I am indexing but would it work to find and replace key terms in the index? It'll most likely corrupt your index. Offsets, positions etc won't have the proper meaning anymore. find C:\\tmp\\index\\data -type f -exec sed -i 's/AF/AFGHANISTAN/g' {} \; That command would iterate through all the files in the data directory and replace the country code with the full country name. I many just back up the directory and try it. I have it running on csv files right now and it's working wonderfully. For those of you interested, I am indexing the entire Geonames dataset http://download.geonames.org/export/dump/ (allCountries.zip) which gives me a pretty comprehensive world gazetteer. My next step is gonna be to display the results as KML to view over a google globe. Thoughts? Adam On Fri, Dec 3, 2010 at 7:57 AM, Erick Erickson erickerick...@gmail.comwrote: No, there's no equivalent to SQL update for all values in a column. You'll have to reindex all the documents. On Thu, Dec 2, 2010 at 10:52 PM, Adam Estrada estrada.adam.gro...@gmail.com wrote: OK part 2 of my previous question... Is there a way to batch update field values based on a certain criteria? For example, if thousands of documents have a field value of 'US' can I update all of them to 'United States' programmatically? Adam -- Markus Jelsma - CTO - Openindex http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
Re: Negative fl param
When returning results is there a way I can say to return all fields except a certain one? So say I have stored fields foo, bar and baz but I only want to return foo and bar. Is it possible to do this without specifically listing out the fields I do want? There was a similar discussion: http://search-lucene.com/m/2qJaU1wImo3/ A workaround can be getting all stored field names from http://wiki.apache.org/solr/LukeRequestHandler and constructing fl accordingly.
Re: finding exact case insensitive matches on single and multiword values
Arrrgh, Geert-Jan is right, that's the 15th time at least this has tripped me up. I'm pretty sure that text will work if you escape the space, e.g. city:(den\ haag). The debug output is a little confusing since it has a line like city:den haag which almost looks wrong... but it worked out OK on a couple of queries I tried. Geert-Jan is also right in that filters aren't applied to string types, so there are two possibilities: either handle the casing on the client side as he suggests and use string, or make the text type work. Sorry for the confusion Erick On Fri, Dec 3, 2010 at 11:54 AM, Geert-Jan Brits gbr...@gmail.com wrote: when you went from strField to TextField in your config you enabled tokenizing (which I believe splits on spaces by default), which is why you see separate 'words' / terms in the debugQuery-explanation. I believe you want to keep your old strField config and try quoting: fq=city:"den+haag" or fq=city:"den haag" Concerning the lower-casing: wouldn't it be easiest to do that at the client? (I'm not sure at the moment how to do lowercasing with a strField). Geert-jan 2010/12/3 PeterKerk vettepa...@hotmail.com You are right, this is what I see when I append the debug query (very very useful btw!!!)
in the old situation:

<arr name="parsed_filter_queries">
  <str>city:den title:haag</str>
  <str>PhraseQuery(themes:"hotel en restaur")</str>
</arr>

I then changed the schema.xml to:

<fieldType name="myField" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<field name="city" type="myField" indexed="true" stored="true"/> <!-- used to be string -->

I then tried adding parentheses: http://localhost:8983/solr/db/select/?indent=on&facet=true&fq=city:(den+haag)&q=*:*&start=0&rows=25&fl=id,title,friendlyurl,city&facet.field=city also tried (without +): http://localhost:8983/solr/db/select/?indent=on&facet=true&fq=city:(den haag)&q=*:*&start=0&rows=25&fl=id,title,friendlyurl,city&facet.field=city Then I get:

<arr name="parsed_filter_queries">
  <str>city:den city:haag</str>
</arr>

And still 0 results. But as you can see the query is split up into 2 separate words, I don't think that is what I need? -- View this message in context: http://lucene.472066.n3.nabble.com/finding-exact-case-insensitive-matches-on-single-and-multiword-values-tp2012207p2012509.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: score from two cores
Uhhm, what are you trying to do? What do you want to do with the scores from two cores? Best Erick On Fri, Dec 3, 2010 at 11:21 AM, Ma, Xiaohui (NIH/NLM/LHC) [C] xiao...@mail.nlm.nih.gov wrote: I have multiple cores. How can I deal with score? Thanks so much for help! Xiaohui
Re: boosting certain docs based on a filed value
I was looking to boost certain docs based on some values in a indexed field. e.g. pType - post paid go phone Would like to have post paid docs first and then go phone. I checked the functional query but could not figure out. You can use http://wiki.apache.org/solr/DisMaxQParserPlugin#bq_.28Boost_Query.29 if you are using dismax: bq=pType:"post paid"^100 If you are using the default query parser then you can append this optional clause to your query: q=some other query pType:"post paid"^100
Re: Batch Update Fields
Have you considered defining synonyms for your code-to-country conversion at index time (or query time for that matter)? We may have an XY problem here. Could you state the high-level problem you're trying to solve? Maybe there's a better solution... Best Erick On Fri, Dec 3, 2010 at 12:20 PM, Adam Estrada estrada.adam.gro...@gmail.com wrote: I wonder...I know that sed would work to find and replace the terms in all of the csv files that I am indexing but would it work to find and replace key terms in the index? find C:\\tmp\\index\\data -type f -exec sed -i 's/AF/AFGHANISTAN/g' {} \; That command would iterate through all the files in the data directory and replace the country code with the full country name. I many just back up the directory and try it. I have it running on csv files right now and it's working wonderfully. For those of you interested, I am indexing the entire Geonames dataset http://download.geonames.org/export/dump/ (allCountries.zip) which gives me a pretty comprehensive world gazetteer. My next step is gonna be to display the results as KML to view over a google globe. Thoughts? Adam On Fri, Dec 3, 2010 at 7:57 AM, Erick Erickson erickerick...@gmail.com wrote: No, there's no equivalent to SQL update for all values in a column. You'll have to reindex all the documents. On Thu, Dec 2, 2010 at 10:52 PM, Adam Estrada estrada.adam.gro...@gmail.com wrote: OK part 2 of my previous question... Is there a way to batch update field values based on a certain criteria? For example, if thousands of documents have a field value of 'US' can I update all of them to 'United States' programmatically? Adam
can solrj swap cores?
hi all, Does solrj support swapping cores? One of our developers had initially tried swapping solr cores (e.g. core0 and core1) using the solrj api, but it failed. (don't have the exact error) He subsequently replaced the call with straight http (i.e. http client). Unfortunately I don't have the exact error in front of me... Solrj code:

CoreAdminRequest car = new CoreAdminRequest();
car.setCoreName("production");
car.setOtherCoreName("reindex");
car.setAction(CoreAdminParams.CoreAdminAction.SWAP);
SolrServer solrServer = SolrUtil.getSolrServer();
car.process(solrServer);
solrServer.commit();

Finally, can someone comment on the solrj javadoc on CoreAdminRequest: * This class is experimental and subject to change. thanks, will
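Swapping cores through solrj does work in general; one thing worth checking (an assumption about the failure, since the exact error is missing) is that the CoreAdminHandler is actually reachable, i.e. solr.xml declares an adminPath and the SolrServer passed to process() points at the base Solr URL rather than at an individual core. A sketch of the relevant solr.xml — the names and paths are illustrative:

```xml
<!-- Hypothetical multicore solr.xml; CoreAdmin requests such as SWAP are
     served at the configured adminPath under the root Solr URL. -->
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="production" instanceDir="core0"/>
    <core name="reindex" instanceDir="core1"/>
  </cores>
</solr>
```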
Re: Batch Update Fields
First off...I know enough about Solr to be VERY dangerous so please bear with me ;-) I am indexing the geonames database which only provides country codes. I can facet the codes but to the end user who may not know all 249 codes, it isn't really all that helpful. Therefore, I want to map the full country names to the country codes provided in the geonames db. http://download.geonames.org/export/dump/ I used a simple split function to chop the 850 meg txt file into manageable csv's that I can import into Solr. Now that all 7 million + documents are in there, I want to change the country codes to the actual country names. I would have liked to have done it in the index but finding and replacing the strings in the csv seems to be working fine. After that I can just reindex the entire thing. Adam On Fri, Dec 3, 2010 at 12:42 PM, Erick Erickson erickerick...@gmail.com wrote: Have you consider defining synonyms for your code -country conversion at index time (or query time for that matter)? We may have an XY problem here. Could you state the high-level problem you're trying to solve? Maybe there's a better solution... Best Erick On Fri, Dec 3, 2010 at 12:20 PM, Adam Estrada estrada.adam.gro...@gmail.com wrote: I wonder...I know that sed would work to find and replace the terms in all of the csv files that I am indexing but would it work to find and replace key terms in the index? find C:\\tmp\\index\\data -type f -exec sed -i 's/AF/AFGHANISTAN/g' {} \; That command would iterate through all the files in the data directory and replace the country code with the full country name. I many just back up the directory and try it. I have it running on csv files right now and it's working wonderfully. For those of you interested, I am indexing the entire Geonames dataset http://download.geonames.org/export/dump/(allCountries.zip) which gives me a pretty comprehensive world gazetteer.
My next step is gonna be to display the results as KML to view over a google globe. Thoughts? Adam On Fri, Dec 3, 2010 at 7:57 AM, Erick Erickson erickerick...@gmail.com wrote: No, there's no equivalent to SQL update for all values in a column. You'll have to reindex all the documents. On Thu, Dec 2, 2010 at 10:52 PM, Adam Estrada estrada.adam.gro...@gmail.com wrote: OK part 2 of my previous question... Is there a way to batch update field values based on a certain criteria? For example, if thousands of documents have a field value of 'US' can I update all of them to 'United States' programmatically? Adam
dataimports response returns before done?
Hi, After issuing a dataimport, I've noticed solr returns a response prior to finishing the import. Is this correct? Is there any way I can make solr not return until it finishes? If not, how do I ping for the status of whether it finished or not? thanks, tri
Question about Solr Fieldtypes, Chaining of Tokenizers
Hey folks, I'm working with a fairly specific set of requirements for our corpus that needs a somewhat tricky text type for both indexing and searching. The chain currently looks like this:

<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="(.*?)(\p{Punct}*)$" replacement="$1"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
<filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="\p{Punct}" replacement=" "/>
<tokenizer class="solr.WhitespaceTokenizerFactory"/>

Now you will notice that I'm trying to add in a second tokenizer to this chain at the very end; this is due to the final replacement of punctuation with whitespace. At that point I'd like to further break up these tokens into smaller tokens. The reason for this is that we have a mixed normal english word and scientific corpus. For example you could expect a string like "The symposium of TgThe(RX3fg+and) gene studies" being added to the index, and parts of those phrases being searched on. We want to be able to remove the stopwords in the mostly english parts of these types of statements, which the whitespace tokenizer, followed by removing trailing punctuation, followed by the stopfilter takes care of. We do not want to remove references to genetic information contained in allele symbols and the like. Sadly as far as I can tell, you cannot chain tokenizers in the schema.xml, so does anyone have some suggestions on how this could be accomplished? Oh, and let me add that the WordDelimiterFilter comes really close to what I want, but since we are unwilling to promote our solr version to the trunk (we are on the 1.4x version atm), the inability to turn off the automatic phrase queries makes it a no go. We need to be able to make searches on left/right match right/left.
My searches through the old material on this subject aren't really showing me much except some advice on using the copyField attribute. But my understanding is that this will simply take your original input to the field, and then analyze it in two different ways depending on the field definitions. It would be very nice if it were copying the already-analyzed version of the text... but that's not what it's doing, right? Thanks for any advice on this matter. Matt
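That understanding of copyField is correct: it copies the raw field input, and each destination field then runs its own analysis. A minimal sketch of the two-field setup under discussion — all names here are made up for illustration:

```xml
<!-- Hypothetical schema.xml fragment: the same raw text is analyzed two
     different ways; the "already-analyzed copy" the poster wished for is
     not something copyField provides. -->
<field name="body" type="text_english" indexed="true" stored="true"/>
<field name="body_split" type="text_split" indexed="true" stored="false"/>
<copyField source="body" dest="body_split"/>
```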
Re: dataimports response returns before done?
--- On Fri, 12/3/10, Tri Nguyen tringuye...@yahoo.com wrote: From: Tri Nguyen tringuye...@yahoo.com Subject: dataimports response returns before done? To: solr user solr-user@lucene.apache.org Date: Friday, December 3, 2010, 7:55 PM Hi, After issueing a dataimport, I've noticed solr returns a response prior to finishing the import. Is this correct? Is there anyway i can make solr not return until it finishes? If not, how do I ping for the status whether it finished or not? So you want to do something at the end of the import? http://wiki.apache.org/solr/DataImportHandler#EventListeners may help. Also you can always poll solr/dataimport url and check status (busy,idle)
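For completeness, the EventListeners hook mentioned above hangs off the <document> element of the DIH data-config.xml. A sketch, where the listener class name, JDBC driver, and entity query are placeholders rather than anything from this thread:

```xml
<!-- Hypothetical data-config.xml; com.example.ImportDoneListener would
     implement the DIH EventListener interface and run when the import ends. -->
<dataConfig>
  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db"/>
  <document onImportEnd="com.example.ImportDoneListener">
    <entity name="item" query="select id, name from item"/>
  </document>
</dataConfig>
```

The alternative remains polling the /dataimport status response until it reports idle.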
RE: score from two cores
Please correct me if I am doing something wrong. I really appreciate your help! I have a core for metadata (xml files) and a core for pdf documents. Sometimes I need to search them separately, sometimes I need to search both of them together. There is the same key which relates them for each item. For example, the xml files look like the following:

<?xml version="1.0" encoding="ISO-8859-1"?>
<List>
  <Item>
    <Key>rmaaac.pdf</Key>
    <TI>something</TI>
    <UI>rmaaac</UI>
  </Item>
  <Item>
    ...
  </Item>
</List>

I index the rmaaac.pdf file with the same Key and UI fields in another core. Here is the example after I index rmaaac.pdf:

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">3</int>
    <lst name="params">
      <str name="indent">on</str>
      <str name="start">0</str>
      <str name="q">collectionid: RM</str>
      <str name="rows">10</str>
      <str name="version">2.2</str>
    </lst>
  </lst>
  <result name="response" numFound="1" start="0">
    <doc>
      <str name="UI">rm</str>
      <str name="Key">rm.pdf</str>
      <str name="metadata_content">something</str>
    </doc>
  </result>
</response>

The result information which is displayed to the user comes from the metadata, not from the pdf files. If I search a term from the documents, in order to display search results to the user, I have to get Keys from the documents and then redo the search against the metadata. Then the score is different. Please give me some suggestions! Thanks so much, Xiaohui -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Friday, December 03, 2010 12:37 PM To: solr-user@lucene.apache.org Subject: Re: score from two cores Uhhm, what are you trying to do? What do you want to do with the scores from two cores? Best Erick On Fri, Dec 3, 2010 at 11:21 AM, Ma, Xiaohui (NIH/NLM/LHC) [C] xiao...@mail.nlm.nih.gov wrote: I have multiple cores. How can I deal with score? Thanks so much for help! Xiaohui
Re: Negative fl param
Ok simple enough. I just created a SearchComponent that removes values from the fl param. On 12/3/10 9:32 AM, Ahmet Arslan wrote: When returning results is there a way I can say to return all fields except a certain one? So say I have stored fields foo, bar and baz but I only want to return foo and bar. Is it possible to do this without specifically listing out the fields I do want? There were a similar discussion. http://search-lucene.com/m/2qJaU1wImo3/ A workaround can be getting all stored field names from http://wiki.apache.org/solr/LukeRequestHandler and construct fl accordingly.
Highlighting parameters
Is there a way I can specify separate configurations for 2 different fields? For field 1 I want to display only 100 chars, for field 2, 200 chars.
Syncing 'delta-import' with 'select' query
Hello everyone! I would like to ask you a question about DIH. I am using a database and DIH to sync against Solr, and a GUI to display and operate on the items retrieved from Solr. When I change the state of an item through the GUI, the following happens: a. The item is updated in the DB. b. A delta-import command is fired to sync the DB with Solr. c. The GUI is refreshed by making a query to Solr. My problem comes between (b) and (c). The delta-import operation is executed in a new thread, so my call returns immediately, refreshing the GUI before the Solr index is updated, causing the item state in the GUI to be outdated. I had two ideas so far: 1. Querying the status of the DIH after the delta-import operation and not returning until it is idle. The problem I see with this is that if other users execute delta-imports, the status will be busy until all operations are finished. 2. Use Zoie. The first problem is that configuring it is not as straightforward as it seems, so I don't want to spend more time trying it until I am sure that this will solve my issue. On the other hand, I think that I may suffer the same problem since the delta-import is still firing in another thread, so I can't be sure it will be called fast enough. Am I pointing in the right direction or is there another way to achieve my goal? Thanks in advance! Juan M.
Re: boosting certain docs based on a filed value
thanks!! that worked.. Can I enter the sequence too, like postpaid, free, costly? -- View this message in context: http://lucene.472066.n3.nabble.com/boosting-certain-docs-based-on-a-filed-value-tp2012962p2013895.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Highlighting parameters
Yes. Some parameters may be overridden on a per-field basis with the following syntax: f.fieldName.originalParam=value http://wiki.apache.org/solr/HighlightingParameters Is there a way I can specify separate configuration for 2 different fields. For field 1 I wan to display only 100 chars, Field 2 200 chars
Re: Highlighting parameters
Is there a way I can specify separate configuration for 2 different fields. For field 1 I wan to display only 100 chars, Field 2 200 chars Yes, the parameter accepts per-field overrides. The syntax is described at http://wiki.apache.org/solr/HighlightingParameters#HowToOverride e.g. f.TEXT.hl.maxAlternateFieldLength=80&f.CATEGORY.hl.maxAlternateFieldLength=100
Re: boosting certain docs based on a filed value
thanks!! that worked.. Can i enter the sequence too like postpaid,free,costly? Does that mean you want to display first postpaid, after that free, and lastly costly? If that's what you want, I think it is better to create a tint field with these values and then sort by this field: postpaid=300 free=200 costly=100 sort=newTintField desc, score desc http://wiki.apache.org/solr/CommonQueryParameters#sort
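A sketch of that rank-field idea — the field name and values here are assumptions about the schema: populate a numeric rank from pType at index time (postpaid=300, free=200, costly=100) and sort on it ahead of score.

```xml
<!-- Hypothetical schema.xml field; the query side would then use
     sort=pTypeRank desc, score desc -->
<field name="pTypeRank" type="tint" indexed="true" stored="false"/>
```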
Re: Batch Update Fields
That will certainly work. Another option, assuming the country codes are in their own field, would be to put the transformations into a synonym file that is only used on that field. That way you'd get this without having to do the pre-processing step on the raw data... That said, if your pre-processing is working for you it may not be worth your while to worry about doing it differently. Best Erick On Fri, Dec 3, 2010 at 12:51 PM, Adam Estrada estrada.adam.gro...@gmail.com wrote: First off...I know enough about Solr to be VERY dangerous so please bare with me ;-) I am indexing the geonames database which only provides country codes. I can facet the codes but to the end user who may not know all 249 codes, it isn't really all that helpful. Therefore, I want to map the full country names to the country codes provided in the geonames db. http://download.geonames.org/export/dump/ http://download.geonames.org/export/dump/I used a simple split function to chop the 850 meg txt file in to manageable csv's that I can import in to Solr. Now that all 7 million + documents are in there, I want to change the country codes to the actual country names. I would of liked to have done it in the index but finding and replacing the strings in the csv seems to be working fine. After that I can just reindex the entire thing. Adam On Fri, Dec 3, 2010 at 12:42 PM, Erick Erickson erickerick...@gmail.com wrote: Have you consider defining synonyms for your code -country conversion at index time (or query time for that matter)? We may have an XY problem here. Could you state the high-level problem you're trying to solve? Maybe there's a better solution... Best Erick On Fri, Dec 3, 2010 at 12:20 PM, Adam Estrada estrada.adam.gro...@gmail.com wrote: I wonder...I know that sed would work to find and replace the terms in all of the csv files that I am indexing but would it work to find and replace key terms in the index?
find C:\\tmp\\index\\data -type f -exec sed -i 's/AF/AFGHANISTAN/g' {} \; That command would iterate through all the files in the data directory and replace the country code with the full country name. I many just back up the directory and try it. I have it running on csv files right now and it's working wonderfully. For those of you interested, I am indexing the entire Geonames dataset http://download.geonames.org/export/dump/(allCountries.zip) which gives me a pretty comprehensive world gazetteer. My next step is gonna be to display the results as KML to view over a google globe. Thoughts? Adam On Fri, Dec 3, 2010 at 7:57 AM, Erick Erickson erickerick...@gmail.com wrote: No, there's no equivalent to SQL update for all values in a column. You'll have to reindex all the documents. On Thu, Dec 2, 2010 at 10:52 PM, Adam Estrada estrada.adam.gro...@gmail.com wrote: OK part 2 of my previous question... Is there a way to batch update field values based on a certain criteria? For example, if thousands of documents have a field value of 'US' can I update all of them to 'United States' programmatically? Adam
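Erick's synonym-file suggestion can be sketched concretely — the file name, field type name, and the two sample mappings below are illustrative assumptions, not the full Geonames code list:

```xml
<!-- country_synonyms.txt (sample entries, one mapping per line):
       AF => AFGHANISTAN
       US => UNITED STATES
     Applying it at index time on the country-code field: -->
<fieldType name="country" class="solr.TextField" omitNorms="true">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="country_synonyms.txt"
            ignoreCase="false" expand="false"/>
  </analyzer>
</fieldType>
```

With the => form and expand="false", each code is replaced by the full name at index time, so reindexing with this type in place would remove the need to pre-process the csv files.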
Re: score from two cores
The scores will not be comparable. Scores are only relevant within one search on one core, so comparing them across two queries (even if it's the same query but against two different cores) is meaningless. So, given your setup I would just use the results from one of the cores and fill in data from the other... But why do you have two cores in the first place? Is it really necessary or is it just making things more complex? Best Erick On Fri, Dec 3, 2010 at 1:36 PM, Ma, Xiaohui (NIH/NLM/LHC) [C] xiao...@mail.nlm.nih.gov wrote: Please correct me if I am doing something wrong. I really appreciate your help! I have a core for metadata (xml files) and a core for pdf documents. Sometimes I need to search them separately, sometimes I need to search both of them together. There is the same key which relates them for each item. For example, the xml files look like the following:

<?xml version="1.0" encoding="ISO-8859-1"?>
<List>
  <Item>
    <Key>rmaaac.pdf</Key>
    <TI>something</TI>
    <UI>rmaaac</UI>
  </Item>
  <Item>
    ...
  </Item>
</List>

I index the rmaaac.pdf file with the same Key and UI fields in another core. Here is the example after I index rmaaac.pdf:

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">3</int>
    <lst name="params">
      <str name="indent">on</str>
      <str name="start">0</str>
      <str name="q">collectionid: RM</str>
      <str name="rows">10</str>
      <str name="version">2.2</str>
    </lst>
  </lst>
  <result name="response" numFound="1" start="0">
    <doc>
      <str name="UI">rm</str>
      <str name="Key">rm.pdf</str>
      <str name="metadata_content">something</str>
    </doc>
  </result>
</response>

The result information which is displayed to the user comes from the metadata, not from the pdf files. If I search a term from the documents, in order to display search results to the user, I have to get Keys from the documents and then redo the search against the metadata. Then the score is different. Please give me some suggestions!
Thanks so much, Xiaohui -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Friday, December 03, 2010 12:37 PM To: solr-user@lucene.apache.org Subject: Re: score from two cores Uhhm, what are you trying to do? What do you want to do with the scores from two cores? Best Erick On Fri, Dec 3, 2010 at 11:21 AM, Ma, Xiaohui (NIH/NLM/LHC) [C] xiao...@mail.nlm.nih.gov wrote: I have multiple cores. How can I deal with score? Thanks so much for help! Xiaohui
Re: score from two cores
On Fri, Dec 3, 2010 at 4:47 PM, Erick Erickson erickerick...@gmail.com wrote: But why do you have two cores in the first place? Is it really necessary or is it just making things more complex? I don't know why the OP wants two cores, but I ran into this same problem and had to abandon using a second core. My use case is: I have lots of slow-changing documents, and a few often-changing documents. Those classes of documents are updated by different people using different processes. I wanted to split them into separate cores so that: 1) The large core wouldn't change except deliberately so there would be less chance of a bug creeping in. Also, that core is the same on different servers, so they could be replicated. 2) The small core would update and optimize quickly and the data in it is different on different servers. The problem is that the search results should return relevancy as if there were only one core.
highlighting wiki confusion
http://wiki.apache.org/solr/HighlightingParameters?#hl.highlightMultiTerm If the SpanScorer is also being used, enables highlighting for range/wildcard/fuzzy/prefix queries. Default is false. Solr1.4. This parameter makes sense for Highlighter only. I think this meant 'for PhraseHighlighter only'? -- Lance Norskog goks...@gmail.com
Re: Restrict access to localhost
If you are using another app to create the index, I think you can remove the update servlet mapping in the web.xml. -- View this message in context: http://lucene.472066.n3.nabble.com/Restrict-access-to-localhost-tp2004475p2014129.html Sent from the Solr - User mailing list archive at Nabble.com.