Re: Collection Distribution in Windows

2007-05-03 Thread Maarten . De . Vilder
damn, there goes the platform independence ...

is there anybody with a little more experience when it comes to collection
distribution on Windows?

thanks in advance!

Bill Au [EMAIL PROTECTED] wrote on 02/05/2007 15:09 (to solr-user@lucene.apache.org):

The collection distribution scripts rely on hard links and rsync.  It
seems that both may be available on Windows.

hard links:
http://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/fsutil_hardlink.mspx?mfr=true


rsync:
http://samba.anu.edu.au/rsync/download.html

I say "may be" because I don't know if hard links on Windows work the same
way as hard links on Linux/Unix.

You will also need something like cygwin to run the bash scripts.
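
One way to check the open question above is to test it directly on the target machine. The following is a minimal sketch, not part of the distribution scripts: it assumes Python is available and that os.link is supported on the filesystem holding the index. The function name and file names are made up for illustration.

```python
import os
import tempfile


def hard_links_behave_like_unix(directory):
    """Return True if a hard link in `directory` shares data with its source."""
    source = os.path.join(directory, "segment.dat")   # hypothetical file name
    link = os.path.join(directory, "segment.link")    # hypothetical file name
    with open(source, "w") as handle:
        handle.write("index data")
    try:
        os.link(source, link)  # second directory entry for the same file
    except (OSError, NotImplementedError, AttributeError):
        return False  # the platform or filesystem refused to create the link
    same_file = os.path.samefile(source, link)
    os.remove(source)  # removing one name must leave the data reachable
    with open(link) as handle:
        survived = handle.read() == "index data"
    return same_file and survived
```

Calling it with a scratch directory on the same volume as the index (for example one created with tempfile.TemporaryDirectory()) gives a yes/no answer for that volume only; other volumes may behave differently.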

Bill

On 5/2/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:

 i know this is a stupid question, but are there any collection
 distribution scripts for windows available ?

 thanks !



Re: Index corruptions?

2007-05-03 Thread Bill Au

In addition to snapshots, you can also make backup copies of your Solr
index using the backup script.
Backups are created the same way as snapshots, using hard links.  Each one
is a viable full index.

Bill

On 5/3/07, Charlie Jackson [EMAIL PROTECTED] wrote:


I have a couple of questions regarding index corruptions.



1) Has anyone using Solr in a production environment ever experienced an
index corruption? If so, how frequently do they occur?



2) It seems like the CollectionDistribution setup would be a good way to
put in place a recovery plan for (or at least have some viable backups
of) the index. However, I have a small concern that if the index gets
corrupted on the master server, the corruption would propagate down to
the slave servers as well. Is this concern unfounded? Also, each of the
snapshots taken by snapshooter are viable full indexes, correct? If so,
that means I'd have a backup of the index each and every time a commit
(or optimize for that matter) is done, which would be awesome.



One of our biggest requirements for the indexing process is to have a
good backup/recover strategy in place and I want to make sure Solr will
be able to provide that.



Thanks in advance!



Charlie




Re: Searchproblem composite words

2007-05-03 Thread Walter Underwood
I agree that multi-word synonyms are an excellent way to do this.

This may sound like a hack, but you'd end up doing this even if
you had dedicated linguistic compound decomposition software.
Those usually use a dictionary of common words and the dictionary
rarely has all the words that are important for your site.

I'll be doing this for my site to handle things like dreamgirls
and dream girls.
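
Since the list has to be enumerated by hand anyway, it can help to generate it. This is only a sketch: it assumes the synonym file takes one comma-separated group of equivalent terms per line, and the helper name and word list are invented for the example.

```python
def synonym_line(parts):
    """Build one synonym entry for a composite word.

    parts: the component words, e.g. ["dream", "girls"].
    """
    joined = "".join(parts)   # the composite form, e.g. "dreamgirls"
    spaced = " ".join(parts)  # the multi-word form, e.g. "dream girls"
    return f"{joined}, {spaced}"


# Hypothetical list of composites that matter for one site.
composites = [["dream", "girls"], ["wish", "list"]]
lines = [synonym_line(parts) for parts in composites]
print("\n".join(lines))
```

Each generated line pairs the composite with its spaced form, so either spelling in a query can be mapped to the other.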

wunder

On 5/2/07 11:58 AM, Chris Hostetter [EMAIL PROTECTED] wrote:

 
 : For example I have the composite word wishlist in my document. I can
 : easily find the document by using the search string wishlist or wish*
 : but I don't get any result with list.
 
 what you are describing is basically a substring search problem ...
 sometimes this can be dealt with by using something like the
 WordDelimiterFilter -- but only if people are using WishList in their
 documents.
 
 Another approach would be to use an NGram based tokenizer (built in
 support for this will probably be added soon) but then searches for things
 like able will match words like cable ... which may not be what you
 want (yes it is a substring, but it is not what anyone would consider a
 composite word)
 
 the best way to match what you want extremely accurately would be to use
 the SynonymFilter and enumerate every composite word you care about in the
 Synonym list ... tedious yes, but also very accurate.
 
 -Hoss
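
The n-gram trade-off described above can be seen with a toy model. This sketch is not Solr's tokenizer; it just splits words into character trigrams and treats a query as matching when all of its trigrams occur in the indexed word. The function names and the gram size are assumptions for illustration.

```python
def char_ngrams(word, n):
    """Return the set of all character n-grams of `word`."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def ngram_match(query, word, n=3):
    """True if every n-gram of `query` also occurs in `word`."""
    query_grams = char_ngrams(query, n)
    return bool(query_grams) and query_grams <= char_ngrams(word, n)


wanted = ngram_match("list", "wishlist")   # the match the poster asked for
unwanted = ngram_match("able", "cable")    # the side effect described above
unrelated = ngram_match("list", "cable")   # no shared trigrams
```

Both the wanted and the unwanted case match in this model, which is the accuracy problem the synonym approach avoids.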




Re[2]: Snippet Generation at Punctuation Marks?

2007-05-03 Thread Jack L
Thanks. Looking forward to it!

 We are working on this and hope to have a solr patch soon. Doing  
 simple splitting on punctuation is a new fragmenter, which trunk solr
 does not support yet. But we're hoping to fix that asap.

 -brian



Re: Sorting in Solr

2007-05-03 Thread Chris Hostetter

: Just to be clear, I have multiple fields per document that Are coming
: back in the queried XML. Let's say it's name, id, date, description.  I
: want to sort dynamically on fields but for my test case on Description.
: Are you suggesting that there be one field defined per document, or you
: can only sort on one field per request?  I'm not sure I understand this
: explanation.

if you want to sort on a field called description then there must be at
most one indexed term per document for that field.  if you also want to
sort on a field called date there must also be at most one indexed value
for that field per document.  for numeric or date type fields, ensuring
that there is only one indexed value per document is a simple matter of
making sure the field is defined as multiValued="false" in your schema,
but for textish fields it's not as simple ... you may send only one
<field>..</field> per doc for that field name but if you are using a
non-trivial analyzer you'll wind up with more than one indexed term.

so you define name and description to be whatever type you want with
whatever analyzer you want, and then you use copyField to create a second
version of each called nameSort and descriptionSort which use the StrField
fieldtype ... now you can sort on either of those, or both at the same
time (ie: nameSort asc, descriptionSort desc)
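
On the client side, a request sorting on those copies could be built along these lines. This is a sketch under one loud assumption: that the Solr version in use accepts a separate sort request parameter (some releases instead expect the sort clause appended to q after a semicolon). The function name is hypothetical.

```python
from urllib.parse import urlencode


def build_select_params(q, sort_fields):
    """Encode a query plus a multi-field sort clause as URL parameters.

    sort_fields: list of (field_name, direction) pairs, in priority order.
    """
    sort = ", ".join(f"{name} {direction}" for name, direction in sort_fields)
    # ASSUMPTION: the server understands a separate "sort" parameter.
    return urlencode({"q": q, "sort": sort})


params = build_select_params(
    "ipod", [("nameSort", "asc"), ("descriptionSort", "desc")]
)
```

The sort always names the StrField copies (nameSort, descriptionSort), never the analyzed originals.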

: Chris Hostetter wrote:
:  : having issues with it.  Some fields work some do not and my results seem
:  : to suggest that it doesn't work when there are any non-alphaNumeric
:  : values in the fields.  Can someone out there either confirm this or let
:  : me know what I may be doing wrong?  is it a matter of using a different
:  : analyzer setting or filter factory than the default setting for text.
: 
:  Sorting requires that there be a single Term/Token per doc ... most
:  Analyzers do not have this behavior, so you need to use copyField to
:  create a String version of the field that you use for sorting.
: 
:  the example schema in the trunk shows this using the name and nameSort
:  fields ... in the 1.1 release there is a comment about the manu_exact
:  field.
: 
:  I've added this as a FAQ.
: 
: 
: 
:  -Hoss



-Hoss



Re: Wondering about results from PhraseQuer

2007-05-03 Thread Chris Hostetter
: the scenario, understand this that user runs a search for title which has
: pretty common terms such as how do I update {all of the words appears
: 1000s of times in indexes } and they want to search prison the last term
: appears not more than 1 or 2 times across the indexes. Now I have the
: problem, if I try to run phrase query on this I get zero results and if I

if the word "prison" doesn't appear anywhere near the words "how do i"
then a phrase search on "how do i prison" isn't going to find any
documents.  perhaps you should search on...

+"how do i" +prison

...which will only return docs that match the phrase "how do i" and also
contain the word prison.
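
If the query string is assembled in client code, the same shape can be produced with a small helper. This is an illustrative sketch; the function name is made up, and it does no escaping of special characters in the input.

```python
def require_phrase_and_terms(phrase, terms):
    """Build a query where the quoted phrase and every extra term are required."""
    clauses = [f'+"{phrase}"']                  # required phrase, in double quotes
    clauses += [f"+{term}" for term in terms]   # each extra term is also required
    return " ".join(clauses)


query = require_phrase_and_terms("how do i", ["prison"])
```

The double quotes keep the common words together as one phrase clause, while the rare term is matched anywhere in the document.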

: 0.0 = fieldWeight(subject_t:how do i prison in 9268), product of:
:   0.0 = tf(phraseFreq=0.0)
:   18.508762 = idf(subject_t: how=2225 do=3359 i=4918 prison=4)
:   0.5 = fieldNorm(field=subject_t, doc=9268)

this is the point i made before ... that phrase does not appear in the
document (hence the tf is zero)
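
The explain output says the field weight is a product of the three listed factors, so the arithmetic can be replayed directly. A minimal sketch using only the numbers quoted above (the function name is invented, and the tf of 1.0 in the second call is a made-up value for comparison):

```python
def field_weight(tf, idf, field_norm):
    """Multiply the three factors listed in the explain output."""
    return tf * idf * field_norm


# Values copied from the explain output for doc 9268.
score = field_weight(tf=0.0, idf=18.508762, field_norm=0.5)

# Hypothetical: the same document if the phrase had matched with tf = 1.0.
score_if_matched = field_weight(tf=1.0, idf=18.508762, field_norm=0.5)
```

With a tf of zero the product is zero no matter how large the idf is, which is why the rare term does not help the phrase query.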



-Hoss



solr.py - set boosts?

2007-05-03 Thread Jack L
I've been using solr.py to post and search. It works well.
Is it possible to specify doc boost and field boost with it?

Jack

Erik There is a solr.py in the Solr clients directory:
Erik http://svn.apache.org/repos/asf/lucene/solr/trunk/client/python/solr.py
Erik It's got some utility methods for generating fields.

Mike It is not documented very well, but you can pass in a multi-map to the
Mike solr.py client:

Mike .add(field_one=['one', 'two', 'three'], field_two='value', ...)
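
Whether solr.py itself exposes boosts is not settled in this thread. One possible workaround, sketched below, is to build the update message by hand. It assumes the XML update format accepts a boost attribute on the doc and field elements; the function name and its parameters are hypothetical, and posting the result to the server is left out.

```python
import xml.etree.ElementTree as ET


def build_add_xml(fields, doc_boost=None, field_boosts=None):
    """Build an <add> message for one document, with optional boosts.

    fields: dict of field name -> value or list of values (multi-map style).
    field_boosts: dict of field name -> boost for that field.
    """
    field_boosts = field_boosts or {}
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    if doc_boost is not None:
        doc.set("boost", str(doc_boost))  # ASSUMPTION: doc-level boost attribute
    for name, values in fields.items():
        if not isinstance(values, (list, tuple)):
            values = [values]
        for value in values:
            field = ET.SubElement(doc, "field", name=name)
            if name in field_boosts:
                # ASSUMPTION: field-level boost attribute
                field.set("boost", str(field_boosts[name]))
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


message = build_add_xml(
    {"field_one": ["one", "two", "three"], "field_two": "value"},
    doc_boost=2.5,
    field_boosts={"field_two": 3.0},
)
```

The fields argument mirrors the multi-map shape Mike describes, so the same data could be passed to either approach.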




Re: facet.sort does not work in python output

2007-05-03 Thread Jack L
The Python output uses nested dictionaries for facet counts.
I read online that Python dictionaries do not preserve order.
So when the string is eval()'d, the sorted order is lost in the
generated Python object. Is it a good idea to use a list to wrap
around the dictionary? This is only needed for the fields sorted
by count.

-- 
Best regards,
Jack

Wednesday, May 2, 2007, 6:09:50 PM, you wrote:


 When facet.sort is used, the facet fields are sorted by the count
 in the reply string when using python output. However, after calling
 eval(), the sort order seems to be lost. Not sure if anyone has come
 up with a way to avoid this problem.

 Using the JSON output with a JSON parser for Python should work but
 I haven't tested it yet.
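
The JSON route could look roughly like this. It is an untested-against-Solr sketch: the response fragment is invented, real output may be shaped differently, and it relies only on the standard json module's object_pairs_hook to keep keys in the order they appear in the text.

```python
import json
from collections import OrderedDict

# Hypothetical response fragment; a real Solr response may be shaped differently.
raw = '{"facet_fields": {"cat": {"music": 30, "electronics": 14, "hard drive": 2}}}'

# object_pairs_hook receives the key/value pairs in document order,
# so an OrderedDict keeps the server's sort order.
parsed = json.loads(raw, object_pairs_hook=OrderedDict)
ordered_counts = list(parsed["facet_fields"]["cat"].items())
```

Unlike eval() into a plain dictionary of that era, this keeps the counts in the order the server wrote them.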




Re: facet.sort does not work in python output

2007-05-03 Thread Mike Klaas

On 5/3/07, Jack L [EMAIL PROTECTED] wrote:

The Python output uses nested dictionaries for facet counts.
I read online that Python dictionaries do not preserve order.
So when the string is eval()'d, the sorted order is lost in the
generated Python object. Is it a good idea to use a list to wrap
around the dictionary? This is only needed for the fields sorted
by count.


This might be fixed in the future, but for now, either re-sort on the
client side (a one- or zero-liner), or specify json.nl=arrarr (which
affects the whole python response structure... probably not
recommended).
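
The client-side re-sort can be sketched as follows, assuming the facet counts arrive as a dictionary of value to count. The function name and sample data are made up.

```python
def sort_facet_counts(counts):
    """Return (value, count) pairs ordered by count, highest first."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


# Hypothetical facet counts after eval() of the python response.
facets = {"hard drive": 2, "music": 30, "electronics": 14}
by_count = sort_facet_counts(facets)
```

The result is a list of pairs rather than a dictionary, so the order survives whatever the dictionary implementation does.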

There is some past discussion on the list if you search the archives.

-Mike