Re: Redundant indexing * 4 only solution (for par/sen and case sensitivity)

2007-11-12 Thread Erick Erickson
DISCLAIMER: This is from a Lucene-centric viewpoint. That said, this may be
useful.

For your line number, page number, etc. perspective, it is possible to index
special guaranteed-to-not-match tokens, then use the TermDocs/TermEnum
data, along with SpanQueries, to figure this out at search time. For instance,
coincident with the last term in each line, index the token "$". Coincident
with the last token of every paragraph, index the token "#". If you get the
offsets of the matching terms, you can quite quickly count the number
of line and paragraph tokens using TermDocs/TermEnums and correlate hits
to lines and paragraphs. The trick is to index your special tokens with a
position increment of 0 (see SynonymAnalyzer in Lucene in Action for more
on this).
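
(A minimal sketch of that trick against the Lucene 2.x TokenStream API;
isLastTokenOfLine() is hypothetical -- it would hook into however your
pre-parser marks line boundaries:)

  import java.io.IOException;
  import org.apache.lucene.analysis.Token;
  import org.apache.lucene.analysis.TokenFilter;
  import org.apache.lucene.analysis.TokenStream;

  // Emits a "$" marker coincident with each line-final token (position
  // increment 0), so phrase positions are unchanged but the markers can
  // still be counted and located at search time.
  public class LineMarkerFilter extends TokenFilter {
    private Token pending;  // marker waiting to be returned

    public LineMarkerFilter(TokenStream in) { super(in); }

    public Token next() throws IOException {
      if (pending != null) {
        Token marker = pending;
        pending = null;
        return marker;
      }
      Token t = input.next();
      if (t != null && isLastTokenOfLine(t)) {
        pending = new Token("$", t.startOffset(), t.endOffset());
        pending.setPositionIncrement(0);  // same position as the real token
      }
      return t;
    }

    // Hypothetical hook: detect the last token of a line.
    private boolean isLastTokenOfLine(Token t) {
      return false;
    }
  }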


Another possibility is to add a special field to each document with the
offsets of each end-of-sentence and end-of-paragraph boundary (stored, not
indexed). Again, given the offsets, you can read in this field and figure
out what line/paragraph your hits are in.
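
(The search-time side, as a sketch -- assuming a stored field holding
comma-separated end-of-paragraph character offsets; the format and names
are mine, nothing built into Lucene:)

  // Zero-based paragraph containing a hit that starts at hitOffset,
  // given e.g. storedOffsets = "120,310,585".
  static int paragraphOf(String storedOffsets, int hitOffset) {
    String[] ends = storedOffsets.split(",");
    for (int i = 0; i < ends.length; i++) {
      if (hitOffset <= Integer.parseInt(ends[i].trim())) {
        return i;  // first boundary at or past the hit
      }
    }
    return ends.length - 1;  // hit lies beyond the last recorded boundary
  }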

How suitable either of these is depends on a lot of characteristics of your
particular problem space. I'm not sure either of them is suitable for very
high
volume applications.

Also, I'm approaching this from an in-the-guts-of-Lucene perspective, so
don't even *think* of asking me how to really make this work in Solr <G>.

Best
Erick


Re: Redundant indexing * 4 only solution (for par/sen and case sensitivity)

2007-11-12 Thread David Neubert
Erik,

Probably because of my newness to Solr/Lucene, I see now what you/Yonik meant 
by a "case" field, but I am not clear about your wording "per-book setting 
attached at index time" - would you mind elaborating on that, so I am clear?

Dave


Re: Redundant indexing * 4 only solution (for par/sen and case sensitivity)

2007-11-12 Thread David Neubert
Erik - thanks, I am considering this approach, versus explicit redundant 
indexing -- and am also considering Lucene -- problem is, I am one week into 
both technologies (though I have years in the search space) -- wish I could go 
to Hong Kong -- any discounts available anywhere :)

Dave

Re: Redundant indexing * 4 only solution (for par/sen and case sensitivity)

2007-11-12 Thread Yonik Seeley
On Nov 12, 2007 2:20 PM, David Neubert [EMAIL PROTECTED] wrote:
 Erik - thanks, I am considering this approach, versus explicit redundant 
 indexing -- and am also considering Lucene -

There's not a well-defined solution in either, IMO.

 - problem is, I am one week into both technologies (though have years in the 
 search space) -- wish I could
 go to Hong Kong -- any discounts available anywhere :)

Unfortunately the OS Summit has been canceled.

-Yonik


Re: Redundant indexing * 4 only solution (for par/sen and case sensitivity)

2007-11-12 Thread Chris Hostetter

:  - problem is, I am one week into both technologies (though have years in 
the search space) -- wish I could
:  go to Hong Kong -- any discounts available anywhere :)
: 
: Unfortunately the OS Summit has been canceled.

Or rescheduled to 2008 ... depending on whether you are a half-empty / 
half-full kind of person.

And let's not forget Atlanta ... starting today and all...

http://us.apachecon.com/us2007/



-Hoss



Re: Redundant indexing * 4 only solution (for par/sen and case sensitivity)

2007-11-11 Thread Erik Hatcher
Solr query syntax is documented here:
http://wiki.apache.org/solr/SolrQuerySyntax


What Yonik is referring to is creating your own "case" field with the
per-book setting attached at index time.
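
(Concretely, that could mean every document from a given book carries the
book's preference as an ordinary field -- a sketch, names hypothetical:)

  <add>
    <doc>
      <field name="id">book1-ch2-p7</field>
      <field name="case">sensitive</field>
      <field name="content">In the beginning was the Word</field>
    </doc>
  </add>

Yonik's compound query then keys off that "case" field.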


Erik



Redundant indexing * 4 only solution (for par/sen and case sensitivity)

2007-11-10 Thread David Neubert
Hi all,

Using Solr, I believe I have to index the same content 4 times (not desirable) 
into 2 indexes -- and I don't know how you can practically do multiple indexes 
in Solr (if indeed there is no better solution than 4 indexing runs into two 
indexes).

My need is case-sensitive and case-insensitive searches over well-formed XML 
content (books), performing exact searches at the paragraph and sentence levels 
-- no errors over approximate boundaries -- the source content has exact 
par/sen tags.

I have already proven a pretty nice solution for par/sen indexing twice into 
the same index in Solr.  I have added a "tags" field, and put correlative XML 
tags (comma delimited) into this field (one of which is either a "para" or 
"sen" flag) which flags the (partial) document as a paragraph or sentence.  
Thus all paragraphs of the book are indexed as single documents (with their 
sentences combined and concatenated) and then all sentences in the book are 
indexed again as single documents.  Both go into the same Solr index.  I just 
add an "AND tags:para" or "AND tags:sen" to my search and everything works fine.
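
(For concreteness, a sketch of one sentence-level add under this scheme -- 
the id convention and field names are mine:)

  <add>
    <doc>
      <field name="id">book1-ch2-p7-s3</field>
      <field name="tags">sen</field>
      <field name="content">In the beginning was the Word.</field>
    </doc>
  </add>

The query side is then e.g. content:beginning AND tags:sen.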

The obvious downside to this approach is the 2X indexing, but it does execute 
quite nicely on a single index using Solr.  This obviously doesn't scale nicely, 
but will do for quite a while probably.

I thought I could live with that...

But then I moved on to case-sensitive and case-insensitive searches, and my 
research so far is pointing to one index for each case.

So now I have:
(1) 4X in content indexing
(2) 2X in actual Solr/Lucene indices
(3) I don't know how to practically do multiple indices using Solr?

If there is a better way of attacking this problem, I would appreciate 
recommendations!!!

Also, I don't know how to do multiple indices in Solr -- I have heard it might 
be available in 1.3.0?  If this is my only recourse, please advise me where 
really good documentation is available on building 1.3.0.  I am not admin 
savvy, but I did succeed in getting Solr up myself and navigating through it 
with the help of this forum.  But I have heard that building 1.3.0 (as opposed 
to downloading and installing it, like 1.2.0) is a whole different experience 
and much more complex.

Thanks

Dave






Re: Redundant indexing * 4 only solution (for par/sen and case sensitivity)

2007-11-10 Thread David Neubert
Ryan,

Thanks for your response.  I infer from your response that you can have a 
different analyzer for each field -- I guess I should have figured that out -- 
but because I had not thought of that, I concluded that I needed multiple 
indices (sorry, I am still very new to Solr/Lucene).

Does such an approach make querying difficult under the following condition?

The app that I am replacing (and trying to enhance) has the ability to search 
multiple books at once with sen/par and case sensitivity settings individually 
selectable per book (e.g. default search modes per book).  So with a single 
query request (just the query word(s)), you can search one book by par, with 
case, another by sen w/o case, etc. -- all settable as user defaults.  I need 
to try to figure out how to match that in Solr/Lucene -- I believe that the 
Analyzer approach you suggested requires the use of the same Analyzer at query 
time that was used during indexing.  So if I am hitting multiple fields (in 
the same search request) that invoke different Analyzers -- am I at a dead end, 
and have to resort to consecutive multiple queries instead (and sort/merge 
results afterwards)?  Or am I just over-complicating this?

Dave

- Original Message 
From: Ryan McKinley [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Saturday, November 10, 2007 2:18:00 PM
Subject: Re: Redundant indexing * 4 only solution (for par/sen and case 
sensitivity)



 So now I have:
 (1) 4X in content indexing
 (2) 2X in actual Solr/Lucene indices
 (3) I don't know how to practically do multiple indices using Solr?
 
 If there is a better way of attacking this problem, I would
 appreciate recommendations!!!
 

I don't quite follow your current approach, but it sounds like you just 
need some copyFields to index the same content with multiple analyzers.

for example, say you have fields:

  <field name="content" type="string" indexed="true" stored="true"/>
  <field name="content_sentence" type="sentence" indexed="true" stored="false"/>
  <field name="content_paragraph" type="paragraph" indexed="true" stored="false"/>
  <field name="content_text" type="text" indexed="true" stored="false"/>

and copy fields:

  <copyField source="content" dest="content_sentence"/>
  <copyField source="content" dest="content_paragraph"/>
  <copyField source="content" dest="content_text"/>
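
(Then each add only ever posts "content"; the copyFields fan it out, and
the query side just picks a granularity -- e.g. content_sentence:church
versus content_paragraph:church, with the field names above.)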


The 4X indexing cost?  If you *need* to index the content 4 different 
ways, you don't have any way around that - do you?  But is it really a 
big deal?  How often does it need to index?  How big is the data?

I'm not quite following your need for multiple Solr indices, but in 1.3 
it is possible.

ryan






Re: Redundant indexing * 4 only solution (for par/sen and case sensitivity)

2007-11-10 Thread Yonik Seeley
On Nov 10, 2007 4:24 PM, David Neubert [EMAIL PROTECTED] wrote:
 So if I am hitting multiple fields (in the same search request) that invoke 
 different Analyzers -- am I at a dead end, and have to resort to consecutive 
 multiple queries instead

Solr handles that for you automatically.

 The app that I am replacing (and trying to enhance) has the ability to search 
 multiple books at once
 with sen/par and case sensitivity settings individually selectable per book

You could easily select case sensitivity or not *per query* across all books.
You should step back and see what the requirements actually are (i.e.
the reasons why one needs to be able to select case
sensitive/insensitive on a book level... it doesn't make sense to me
at first blush).

It could be done on a per-book level in Solr with a more complex query
structure though...

(+case:sensitive +(normal relevancy query on the case sensitive fields
goes here)) OR (+case:insensitive +(normal relevancy query on the case
insensitive fields goes here))
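
(Spelled out with hypothetical field names -- say content_cs / content_ci
are the case-sensitive / case-insensitive copies -- a search for "church"
becomes:

  (+case:sensitive +(content_cs:Church))
  OR (+case:insensitive +(content_ci:church))

so each book only matches through the variant its own "case" field selects.)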

-Yonik


Re: Redundant indexing * 4 only solution (for par/sen and case sensitivity)

2007-11-10 Thread Ryan McKinley

David Neubert wrote:

Ryan,

Thanks for your response.  I infer from your response that you can have a 
different analyzer for each field


yes!  each field can have its own indexing strategy.


I believe that the Analyzer approach you suggested requires the use 
of the same Analyzer at query time that was used during indexing.


it does not require the *same* Analyzer - it just requires one that 
generates compatible tokens.  That is, you may want the indexing to 
split the input into sentences, but the query-time analyzer keeps the 
input as a single token.


check the example schema.xml file -- the 'text' field type applies 
synonyms at index time, but not at query time.
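
(Roughly what that looks like in the example schema.xml, trimmed down to
show just the separate index-time and query-time analyzer chains:)

  <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
              ignoreCase="true" expand="true"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>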


re searching across multiple fields, don't worry, Lucene handles this 
well.  You may want to do that explicitly or with the dismax handler.


I'd suggest you play around with indexing some data.  Check the 
analysis.jsp in the admin section.  It is a great tool to help figure 
out what analyzers do at index vs. query time.


ryan



Re: Redundant indexing * 4 only solution (for par/sen and case sensitivity)

2007-11-10 Thread David Neubert
Ryan (and others who need something to put them to sleep :) )

Wow -- the light-bulb finally went off -- the Analyzer admin page is very cool 
-- I just was not at all thinking the Solr/Lucene way.

I need to rethink my whole approach now that I understand (from reviewing the 
schema.xml closer and playing with the Analyzer) how compatible index and query 
policies can be applied automatically on a field-by-field basis by Solr at both 
index and query time.

I still may have a stumper here, but I need to give it some thought, and may 
return again with another question:

The problem is that my text is book text (fairly large) that looks very much 
like one would expect:

<book>
<chapter>
<para><sen>...</sen><sen>...</sen></para>
<para><sen>...</sen><sen>...</sen></para>
<para><sen>...</sen><sen>...</sen></para>
</chapter>
</book>

The search results need to return exact sentences or paragraphs with their 
exact page:line numbers (which is available in the embedded markup in the text).

There were previous responses by others, suggesting I look into payloads, but I 
did not fully understand that -- I may have to re-read those e-mails now that I 
am getting a clearer picture of SOLR/Lucene.

However, the reason I resorted to indexing each paragraph as a single document, 
and then redundantly indexing each sentence as a single document, is because I 
was planning on pre-parsing the text myself (outside of Solr) -- and feeding 
separate <doc> elements to the <add> because in that way I could produce the 
page:line reference in the pre-parsing (again outside of Solr) and feed it in 
as an explicit field in the <doc> elements of the <add> requests.  Therefore at 
query time, I will have the exact page:line corresponding to the start of the 
paragraph or sentence.

But I am beginning to suspect I was planning to do a lot of work that Solr can 
do for me.

I will continue to study this and respond when I am a bit clearer, but the 
closer I could get to just submitting the books a chapter at a time -- and 
letting Solr do the work -- the better (because I have all the books in 
well-formed XML at chapter level).  However, I don't see yet how I could get 
par/sen granular search result hits, along with their exact page:line 
coordinates, unless I approach it by explicitly indexing the pars and sens as 
single documents, not chapter hits, and also return the entire text of the sen 
or par, and highlight the keywords within (for the search result hit).  Once a 
search result hit is selected, it would then act as expected and position into 
the chapter, at the selected reference, and highlight the key words again, this 
time in the context of an entire chapter (the whole document to the user's 
mind).

Even with my new understanding you (and others) have given me, which I can use 
to certainly improve my approach -- it still seems to me that because 
multi-valued fields concatenate text -- even if you use the positionIncrementGap 
feature to prohibit unwanted phrase matches, how do you produce a well-defined 
search result hit, bounded by the exact sen or par, unless you index them as 
single documents?

Should I still read up on the payload discussion?

Dave





Re: Redundant indexing * 4 only solution (for par/sen and case sensitivity)

2007-11-10 Thread David Neubert
Yonik (or anyone else)

Do you know where on-line documentation on the "+case:" syntax is located?  I 
can't seem to find it.

Dave
