Re: AW: Leading wildcards
hey, we've stumbled on something weird while using wildcards we enabled leading wildcards in solr (see previous message from Christian Burkamp) when we do a search on a nonexisting field, we get a SolrException: undefined field (this was for query nonfield:test) but when we use wildcards in our query, we dont get the undefined field exception, so the query nonfield:*test works fine ... just zero results... is this normal behaviour ? Burkamp, Christian [EMAIL PROTECTED] 19/04/2007 12:37 Please respond to solr-user@lucene.apache.org To solr-user@lucene.apache.org cc Subject AW: Leading wildcards Hi there, Solr does not support leading wildcards, because it uses Lucene's standard QueryParser class without changing the defaults. You can easily change this by inserting the line parser.setAllowLeadingWildcards(true); in QueryParsing.java line 92. (This is after creating a QueryParser instance in QueryParsing.parseQuery(...)) and it obviously means that you have to change solr's source code. It would be nice to have an option in the schema to switch leading wildcards on or off per field. Leading wildcards really make no sense on richly populated fields because queries tend to result in too many clauses exceptions most of the time. This works for leading wildcards. Unfortunately it does not enable searches with leading AND trailing wildcards. (E.g. searching for *lega* does not find results even if the term elegance is in the index. If you put a second asterisk at the end, the term elegance is found. (search for *lega** to get hits). Can anybody explain this though it seems to be more of a lucene QueryParser issue? -- Christian -Ursprüngliche Nachricht- Von: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Gesendet: Donnerstag, 19. April 2007 08:35 An: solr-user@lucene.apache.org Betreff: Leading wildcards hi, we have been trying to get the leading wildcards to work. we have been looking around the Solr website, the Lucene website, wiki's and the mailing lists etc ... 
but we found a lot of contradictory information. so we have a few question : - is the latest version of lucene capable of handling leading wildcards ? - is the latest version of solr capable of handling leading wildcards ? - do we need to make adjustments to the solr source code ? - if we need to adjust the solr source, what do we need to change ? thanks in advance ! Maarten
browse a facet without a query?
When there is no q, Solr complains. How can I browse a facet without a keyword query? For example, I want to view all documents for a given state: ?q=fq=state:California Thank you. Jennifer Seaman
Re: browse a facet without a query?
On 4/23/07, Jennifer Seaman [EMAIL PROTECTED] wrote: When there is no q Solr complains. How can I browse a facet without a keyword query? For example, I want to view all document for a given state; ?q=fq=state:California With a relatively recent nightly build, you can use q=*:* Before that, use an open-ended range query like q=state:[* TO *] -Yonik
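As a side note, both forms need URL-escaping when sent over HTTP from a script, since *, :, and [ ] are all reserved characters. A minimal sketch using Python's standard library (the host, port, and /select handler path are typical defaults, not taken from this thread):

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint; adjust host, port, and handler to your setup.
base = "http://localhost:8983/solr/select"

# Match-all query (recent nightly builds), filtered to one facet value.
print(base + "?" + urlencode({"q": "*:*", "fq": "state:California"}))

# Open-ended range query for older versions.
print(base + "?" + urlencode({"q": "state:[* TO *]"}))
```

urlencode takes care of percent-escaping, so *:* arrives at Solr intact rather than being mangled by the HTTP layer.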
Re: solr utf 16 ?
Are there any plans to make Solr UTF-16 compliant in the future? If so, is it in the short term or the long term?

I'm curious what you mean by UTF-16 compliant. Do you mean being able to handle UTF-16 encoded XML? Thanks, -- Ken Krugler Krugle, Inc. +1 530-210-6378 Find Code, Find Answers
Re: solr utf 16 ?
Yes. I'm assuming that if you have UTF-16 encoded data in a document that needs to be added to the index, Solr would not be able to handle this?

I'm curious what you mean by UTF-16 compliant. Do you mean being able to handle UTF-16 encoded XML?
Re: solr utf 16 ?
I'm curious what you mean by UTF-16 compliant. Do you mean being able to handle UTF-16 encoded XML?

Yes. I'm assuming that if you have UTF-16 encoded data in a document that needs to be added to the index, Solr would not be able to handle this?

I've never tried sending anything but UTF-8 to Solr, so I can't comment on what issues you'll run into. But based on my experience to date, I'd strongly suggest converting it to UTF-8 before you post it to Solr. -- Ken Krugler Krugle, Inc. +1 530-210-6378 Find Code, Find Answers
Re: solr utf 16 ?
On 4/23/07, brian beard [EMAIL PROTECTED] wrote: Yes. I'm assuming if you have UTF-16 encoded data in a document that needs to be added to the index, that solr would not be able to handle this? I believe that handling arbitrary encodings is on the list of future enhancements, but I couldn't give you a timeline. For the time being, consider that 1. UTF-8 is the lingua franca of XML document encoding, and 2. it is very easy to convert it yourself (it would be a 3-4 line Python command-line filter, for instance). -Mike
Solr on Lucene/Solr site?
Hey there - It just occurred to me that the search on lucene.apache.org is powered by google. Shouldn't it be Solr? heh ++ | Matthew Runo | Zappos Development | [EMAIL PROTECTED] | 702-943-7833 ++
Re: does solr handle updates quickly?
This might also be a cool way to increase relevancy. Does Lucene/Solr do, or can it do, any sort of relevancy increase depending on which search result a user picks? Would it be feasible for me to update an index_id with a click count each time a user clicks a result, and give this field a boost in the results? ++ | Matthew Runo | Zappos Development | [EMAIL PROTECTED] | 702-943-7833 ++ On Apr 22, 2007, at 7:17 PM, Tait Larson wrote: Hi, I'm new to Solr. I've just started playing around with it and learning what it can do. I'd like to include a vote field on all of my indexed documents. Users vote on the content they like. A vote tally is displayed along with each document returned in the results of a search. Let's say I create a vote field of type SortableIntField. Users vote relatively frequently. Assume I send update commands to Solr which change only the vote field approximately 1 time for every 50 searches a user performs. What effects will this have on my index? Will search performance degrade? Thanks, Tait
Re: solr utf 16 ?
Thanks for all the comments. The conversion seems like a good alternative. From: Mike Klaas [EMAIL PROTECTED] Reply-To: solr-user@lucene.apache.org To: solr-user@lucene.apache.org Subject: Re: solr utf 16 ? Date: Mon, 23 Apr 2007 11:13:54 -0700 On 4/23/07, brian beard [EMAIL PROTECTED] wrote: Yes. I'm assuming if you have UTF-16 encoded data in a document that needs to be added to the index, that solr would not be able to handle this? I believe that handling arbitrary encodings is on the list of future enhancements, but I couldn't give you a timeline. For the time being, consider that 1. UTF-8 is the lingua franca of XML document encoding, and 2. it is very easy to convert it yourself (it would be a 3-4 line Python command-line filter, for instance). -Mike
Re: browse a facet without a query?
Hi - On 4/23/07, Yonik Seeley [EMAIL PROTECTED] wrote: On 4/23/07, Jennifer Seaman [EMAIL PROTECTED] wrote: When there is no q Solr complains. How can I browse a facet without a keyword query? For example, I want to view all document for a given state; ?q=fq=state:California With a relatively recent nightly build, you can use q=*:* Before that, use an open-ended range query like q=state:[* TO *] I was doing the q=state:[* TO *] for a short time, and found it very slow. I switched to doing a query on a single field that covered the part of the index I was interested in, for example: inStock:true And got much faster performance. I was getting execution times in seconds (for example, I just manually did this and got 2.2 seconds for the [* TO *] query, and 50 milliseconds for the latter (inStock:true), uncached). In my case the filter query hits about 80% of the docs, so it's doing a similar amount of work. I don't know how well *:* performs, but if it is similar to state:[* TO *], I would benchmark it before using. For us, facet queries are a high percentage, so the time was critical. It might even be worth adding a field, if you don't already have an appropriate one. Tom
Re: Leading wildcards
Here is a late response, apache.org was rejecting our e-mails... Allowing leading wildcards opens up a denial of service attack. It becomes trivial to overload the search engine and take it out of service: just hammer it with leading wildcard queries. Please leave the default as disabled. If we add a config option, there should be a security warning with it. wunder On 4/19/07 8:04 AM, Michael Kimsal [EMAIL PROTECTED] wrote: It still seems like it's only something that would be invoked by a user's query. If I queried for *foobar and leading wildcards were not on in the server, I'd get back nothing, which isn't really correct. I'd think the application should tell the user that that syntax isn't supported. Perhaps I'm simplifying it a bit. It would certainly help out our comfort level to have it either be on or configurable by default, rather than having to maintain a 'patched' version (yes, the patch is only one line, but it's the principle of the thing). I suspect this would be the same for others. On 4/19/07, Erik Hatcher [EMAIL PROTECTED] wrote: On Apr 19, 2007, at 10:39 AM, Yonik Seeley wrote: On 4/19/07, Erik Hatcher [EMAIL PROTECTED] wrote: parser.setAllowLeadingWildcards(true); I have also run into this issue and have intended to fix up Solr to allow configuring that switch on QueryParser. Any reason that parser.setAllowLeadingWildcards(true) shouldn't be the default? That's fine by me. But... Does it really need to be configurable? It all depends on how bad of a hit it'd take on Solr. What's the breaking point where the performance of full-term scanning (and subsequently faceting, of course) keels over or dies? FuzzyQueries die on my 3.7M index and not-super-beefy hardware and system setup. Erik
Re: browse a facet without a query?
: I was doing the q=state[* TO *] for a short time, and found it very slow. I : switched to doing a query on a single field that covered the part of the : index I was interested in, for example: : : inStock:true if you have the filterCache enabled and you aren't opening new searchers very often, the open ended range query should result in a cached bitset just as good as something like inStock:true ... i think yonik just suggested it because if you are faceting on state then you can be confident that you are only interested in docs that have a state field. : And got much faster performance. I was getting execution times in seconds : (for example, I just manually did this and got. 2.2 seconds for the [* TO : *], and 50 milliseconds for the latter (inStock:true), uncached) [* TO *] on the default field might be very slow (because it's iterating over all the terms) but on a field with a small number of discrete values (like state, or inStock) it should be very fast. : similar amount of work. I don't know how well *:* performs, but if it is : similar to state:[* TO *], I would benchmark it before using. *:* is implemented extremely efficiently ... it doesn't look at any term info, it just iterates over all the non-deleted docs. -Hoss
Re: browse a facet without a query?
On 4/23/07, Tom Hill [EMAIL PROTECTED] wrote: I was doing the q=state:[* TO *] for a short time, and found it very slow. I switched to doing a query on a single field that covered the part of the index I was interested in, for example: inStock:true And got much faster performance. Good point... the fewer the terms, the faster the performance. I don't know how well *:* performs, but if it is similar to state:[* TO *], I would benchmark it before using. *:* will be the fastest as it translates to a MatchAllDocsQuery, which does no term lookups at all, but just skips over deleted documents. -Yonik
Re: Snapshooting or replicating recently indexed data
Here's the Solr Wiki on collection distribution: http://wiki.apache.org/solr/CollectionDistribution It describes the incremental nature of the distribution: A collection is a directory of many files. Collections are distributed to the slaves as snapshots of these files. Each snapshot is made up of hard links to the files, so copying of the actual files is not necessary when snapshots are created. Lucene only significantly rewrites files following an optimization command. Generally, a file once written will change very little, if at all. This makes the underlying transport of rsync very useful. Files that have already been transferred and have not changed do not need to be re-transferred with the new edition of a collection. Bill On 4/21/07, Kevin Lewandowski [EMAIL PROTECTED] wrote: snapshooter does create incremental builds of the index. It doesn't appear so if you look at the contents, because the existing files are hard links. But it is incremental. On 4/20/07, Doss [EMAIL PROTECTED] wrote: Hi Yonik, Thanks for your quick response, my question is this: can we take incremental backup/replication in Solr? Regards, Doss. M. MOHANDOSS Software Engineer Ext: 507 (A BharatMatrimony Enterprise) - Original Message - From: Yonik Seeley [EMAIL PROTECTED] To: solr-user@lucene.apache.org Sent: Thursday, April 19, 2007 7:42 PM Subject: Re: Snapshooting or replicating recently indexed data On 4/19/07, Doss [EMAIL PROTECTED] wrote: It seems the snapshooter takes the exact copy of the indexed data, that is all the contents inside the index directory, how can we take the recently added ones? ... cp -lr ${data_dir}/index ${temp} mv ${temp} ${name} ... I don't quite understand your question, but since hard links are used, it's more like pointing to the index files instead of copying them. Rsync is used as a transport to only move the files that were changed from the master to slaves. -Yonik
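The hard-link behaviour Bill and Yonik describe is easy to observe for yourself. Here is a small standalone illustration (not Solr code, just a demonstration of what snapshooter's cp -lr relies on, with made-up file names):

```python
import os
import tempfile

# Simulate what snapshooter's `cp -lr` does: the snapshot entry is a
# hard link, i.e. a second directory entry for the same inode, so no
# index data is actually copied.
with tempfile.TemporaryDirectory() as d:
    segment = os.path.join(d, "index_segment")
    snapshot = os.path.join(d, "snapshot_segment")
    with open(segment, "wb") as f:
        f.write(b"lucene segment data")
    os.link(segment, snapshot)  # hard link, not a copy
    # Both names point at the same underlying file,
    # and the inode's link count is now 2.
    assert os.path.samefile(segment, snapshot)
    print(os.stat(segment).st_nlink)
```

This is also why rsync only ships files that actually changed: an unmodified segment in a new snapshot is the same inode as before, not a fresh copy.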
Re: snapshooter on OS X
You can also run the script with the -V option. It shows debugging info but not as much as bash -x. I tried snapshooter on OS X 10.4.9. I did get the cp: illegal option -- l error. But that's the only error I got. Bill On 4/23/07, Bertrand Delacretaz [EMAIL PROTECTED] wrote: On 4/23/07, Grant Ingersoll [EMAIL PROTECTED] wrote: ...The error says something about command not found line 15, but all the files I looked at, line 15 was a comment... Running your script with bash -x myscript should help, it will echo commands before executing them. -Bertrand