Re: Collection Distribution in Windows
Damn, there goes the platform independence ... Is there anybody with a little more experience when it comes to collection distribution on Windows? Thanks in advance!

Bill Au [EMAIL PROTECTED], 02/05/2007 15:09, to solr-user@lucene.apache.org (Subject: Re: Collection Distribution in windows):

The collection distribution scripts rely on hard links and rsync. It seems that both may be available on Windows:

hard links: http://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/fsutil_hardlink.mspx?mfr=true
rsync: http://samba.anu.edu.au/rsync/download.html

I say "may be" because I don't know if hard links on Windows work the same way as hard links on Linux/Unix. You will also need something like Cygwin to run the bash scripts.

Bill

On 5/2/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:

I know this is a stupid question, but are there any collection distribution scripts for Windows available? Thanks!
Re: Index corruptions?
In addition to snapshots, you can also make backup copies of your Solr index using the backup script. Backups are created the same way as snapshots, using hard links. Each one is a viable full index.

Bill

On 5/3/07, Charlie Jackson [EMAIL PROTECTED] wrote:

I have a couple of questions regarding index corruptions.

1) Has anyone using Solr in a production environment ever experienced an index corruption? If so, how frequently do they occur?

2) It seems like the CollectionDistribution setup would be a good way to put in place a recovery plan for (or at least have some viable backups of) the index. However, I have a small concern that if the index gets corrupted on the master server, the corruption would propagate down to the slave servers as well. Is this concern unfounded?

Also, each of the snapshots taken by snapshooter is a viable full index, correct? If so, that means I'd have a backup of the index each and every time a commit (or optimize, for that matter) is done, which would be awesome.

One of our biggest requirements for the indexing process is to have a good backup/recovery strategy in place, and I want to make sure Solr will be able to provide that.

Thanks in advance!

Charlie
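To make the hard-link point in Bill's reply concrete, here is a minimal Python sketch of a hard-link copy of an index directory. It is not the actual snapshooter or backup script; the function name and directory layout are made up for illustration, and it assumes a flat index directory on a filesystem that supports hard links.

```python
import os
import tempfile


def hardlink_snapshot(index_dir, snapshot_dir):
    """Hard-link every regular file in index_dir into a new snapshot_dir.

    Hypothetical helper for illustration only. No file data is copied:
    each snapshot entry is a second name for the same data on disk.
    """
    os.makedirs(snapshot_dir)
    linked = []
    for name in sorted(os.listdir(index_dir)):
        src = os.path.join(index_dir, name)
        if os.path.isfile(src):
            os.link(src, os.path.join(snapshot_dir, name))
            linked.append(name)
    return linked
```

Because a hard link is just another name for the same data, the snapshot stays readable after the original file name is deleted, but it lives on the same filesystem as the index, so it does not protect against losing the disk.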
Re: Searchproblem composite words
I agree that multi-word synonyms are an excellent way to do this. This may sound like a hack, but you'd end up doing this even if you had dedicated linguistic compound decomposition software. Those usually use a dictionary of common words, and the dictionary rarely has all the words that are important for your site.

I'll be doing this for my site to handle things like "dreamgirls" and "dream girls".

wunder

On 5/2/07 11:58 AM, Chris Hostetter [EMAIL PROTECTED] wrote:

: For example I have the composite word wishlist in my document. I can
: easily find the document by using the search string wishlist or wish*
: but I don't get any result with list.

What you are describing is basically a substring search problem ... sometimes this can be dealt with by using something like the WordDelimiterFilter -- but only if people are using WishList in their documents.

Another approach would be to use an NGram based tokenizer (built in support for this will probably be added soon) but then searches for things like "able" will match words like "cable" ... which may not be what you want (yes, it is a substring, but it is not what anyone would consider a composite word).

The best way to match what you want extremely accurately would be to use the SynonymFilter and enumerate every composite word you care about in the synonym list ... tedious yes, but also very accurate.

-Hoss
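The trade-off Hoss describes can be sketched in a few lines of plain Python. This is only an illustration of the matching behaviour, not Solr's NGram tokenizer or SynonymFilter; the function names and the COMPOUNDS table are hypothetical.

```python
def ngrams(word, n):
    """Return the set of all substrings of length n in word."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def ngram_match(query, word):
    """N-gram style matching: true whenever query is a substring of word."""
    return query in ngrams(word, len(query))


# Hand-maintained list of composite words (hypothetical example entries).
COMPOUNDS = {
    "wishlist": ["wish", "list"],
    "dreamgirls": ["dream", "girls"],
}


def synonym_match(query, word):
    """Synonym style matching: true only for parts that were listed explicitly."""
    return query == word or query in COMPOUNDS.get(word, [])
```

With these definitions, ngram_match("list", "wishlist") and ngram_match("able", "cable") both succeed, while synonym_match accepts "list" for "wishlist" but rejects "able" for "cable". That is the accuracy you buy with the tedious enumeration.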
Re[2]: Snippet Generation at Punctuation Marks?
Thanks. Looking forward to it!

We are working on this and hope to have a Solr patch soon. Doing simple splitting on punctuation is a new fragmenter, which trunk Solr does not support yet. But we're hoping to fix that asap.

-brian
Re: Sorting in Solr
: Just to be clear, I have multiple fields per document that are coming
: back in the queried XML. Let's say it's name, id, date, description. I
: want to sort dynamically on fields but for my test case on Description.
: Are you suggesting that there be one field defined per document, or you
: can only sort on one field per request? I'm not sure I understand this
: explanation.

If you want to sort on a field called description, then there must be at most one indexed term per document for that field. If you also want to sort on a field called date, there must also be at most one indexed value for that field per document.

For numeric or date type fields, ensuring that there is only one indexed value per document is a simple matter of making sure the field is defined as multiValued="false" in your schema, but for textish fields it's not as simple ... you may send only one <field>..</field> per doc for that field name, but if you are using a non-trivial analyzer you'll wind up with more than one indexed term.

So you define name and description to be whatever type you want with whatever analyzer you want, and then you use copyField to create a second version of each called nameSort and descriptionSort which use the StrField fieldtype ... now you can sort on either of those, or both at the same time (ie: nameSort asc, descriptionSort desc).

: Chris Hostetter wrote:
: : having issues with it. Some fields work some do not and my results seem
: : to suggest that it doesn't work when there are any non-alphaNumeric
: : values in the fields. Can someone out there either confirm this or let
: : me know what I may be doing wrong? is it a matter of using a different
: : analyzer setting or filter factory than the default setting for text.
:
: Sorting requires that there be a single Term/Token per doc ... most
: Analyzers do not have this behavior, so you need to use copyField to
: create a String version of the field that you use for sorting.
:
: the example schema in the trunk shows this using the name and nameSort
: fields ... in the 1.1 release there is a comment about the manu_exact
: field.
:
: I've added this as a FAQ.
:
: -Hoss
Re: Wondering about results from PhraseQuer
: the scenario, understand this that user runs a search for title which has
: pretty common terms such as how do I update {all of the words appears
: 1000s of times in indexes } and they want to search prison the last term
: appears not more than 1 or 2 times across the indexes. Now I have the
: problem, if I try to run phrase query on this I get zero results and if I

If the word "prison" doesn't appear anywhere near the words "how do i", then a phrase search on "how do i prison" isn't going to find any documents. Perhaps you should search on...

   +"how do i" +prison

...which will only return docs that match the phrase "how do i" and also contain the word prison.

: 0.0 = fieldWeight(subject_t:how do i prison in 9268), product of:
:   0.0 = tf(phraseFreq=0.0)
:   18.508762 = idf(subject_t: how=2225 do=3359 i=4918 prison=4)
:   0.5 = fieldNorm(field=subject_t, doc=9268)

This would be my point from before ... that phrase does not appear in the document (hence the tf is zero).

-Hoss
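A small Python sketch of building the "required phrase plus required term" query suggested above and putting it into a request URL. The helper name, host, port, and handler path are assumptions for illustration; subject_t is the field name from the debug output in the thread.

```python
from urllib.parse import urlencode


def phrase_plus_term(phrase, term, field=None):
    """Build a query that requires an exact phrase AND a separate term.

    Hypothetical helper: the phrase is quoted so it is matched as a phrase,
    and both clauses get a leading '+' to make them mandatory.
    """
    prefix = f"{field}:" if field else ""
    return f'+{prefix}"{phrase}" +{prefix}{term}'


query = phrase_plus_term("how do i", "prison", field="subject_t")

# Assumed local Solr URL. urlencode escapes '+' as %2B, which matters
# because a raw '+' in a query string is decoded as a space.
url = "http://localhost:8983/solr/select?" + urlencode({"q": query})
```

The one design point worth noting is the URL encoding: if the '+' operators are sent unescaped, the server sees spaces and both clauses silently become optional.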
solr.py - set boosts?
I've been using solr.py to post and search. It works well. Is it possible to specify doc boost and field boost with it?

Jack

Erik: There is a solr.py in the Solr clients directory:
Erik: http://svn.apache.org/repos/asf/lucene/solr/trunk/client/python/solr.py
Erik: It's got some utility methods for generating fields.

Mike: It is not documented very well, but you can pass in a multi-map to the
Mike: solr.py client:
Mike: .add(field_one=['one', 'two', 'three'], field_two='value', ...)
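Whether solr.py itself accepts boosts is not answered in this thread. As a fallback, the Solr XML update message allows a boost attribute on both the doc and field elements, so one option is to build that message directly. The sketch below does so with the standard library; build_add_xml and its parameters are hypothetical and are not part of solr.py.

```python
import xml.etree.ElementTree as ET


def build_add_xml(fields, doc_boost=None, field_boosts=None):
    """Build an <add><doc>...</doc></add> update message with optional boosts.

    Hypothetical helper, not part of solr.py. `fields` maps a field name to a
    single value or a list of values; `field_boosts` maps a field name to a
    boost applied to every value of that field.
    """
    field_boosts = field_boosts or {}
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    if doc_boost is not None:
        doc.set("boost", str(doc_boost))
    for name, value in fields.items():
        values = value if isinstance(value, (list, tuple)) else [value]
        for v in values:
            field = ET.SubElement(doc, "field", name=name)
            if name in field_boosts:
                field.set("boost", str(field_boosts[name]))
            field.text = str(v)
    return ET.tostring(add, encoding="unicode")
```

The resulting string would then be POSTed to the update handler in place of the message solr.py generates; how best to wire that into the client is left open here.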
Re: facet.sort does not work in python output
The Python output uses nested dictionaries for facet counts. I read online that Python dictionaries do not preserve order. So when a string is eval()'d, the sorted order is lost in the generated Python object.

Is it a good idea to use a list to wrap around the dictionary? This is only needed for the fields, sorted by counts.

--
Best regards,
Jack

Wednesday, May 2, 2007, 6:09:50 PM, you wrote:

When facet.sort is used, the facet fields are sorted by the count in the reply string when using python output. However, after calling eval(), the sort order seems to be lost. Not sure if anyone has come up with a way to avoid this problem.

Using the JSON output with a JSON parser for Python should work, but I haven't tested it yet.
Re: facet.sort does not work in python output
On 5/3/07, Jack L [EMAIL PROTECTED] wrote:

The Python output uses nested dictionaries for facet counts. I read online that Python dictionaries do not preserve order. So when a string is eval()'d, the sorted order is lost in the generated Python object.

Is it a good idea to use a list to wrap around the dictionary? This is only needed for the fields, sorted by counts.

This might be fixed in the future, but for now, either re-sort on the client side (a one- or zero-liner), or specify json.nl=arrarr (which affects the whole python response structure... probably not recommended).

There is some past discussion on the list if you search the archives.

-Mike
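The client-side re-sort Mike mentions could look like the sketch below. The function name is made up, and the input is assumed to be one field's facet counts as a plain dict of value to count, as produced by eval() on the python output.

```python
def facets_by_count(facet_counts):
    """Return (value, count) pairs ordered by descending count.

    Ties are broken alphabetically by value so the result is deterministic.
    """
    return sorted(facet_counts.items(), key=lambda kv: (-kv[1], kv[0]))
```

Returning a list of pairs instead of a dict keeps the ordering explicit, so it does not depend on how the dictionary happens to be stored.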