Re: Delete all docs in a SOLR index?

2007-11-10 Thread David Neubert
Thanks -- I didn't know that deleting the Index (offline) was safe and complete 
-- thanks.

- Original Message 
From: "Norskog, Lance" <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Friday, November 9, 2007 6:42:21 PM
Subject: RE: Delete all docs in a SOLR index?


A safer way is to stop Solr and remove the index directory. There is
less chance of corruption, and it will be faster. 

-Original Message-
From: David Neubert [mailto:[EMAIL PROTECTED] 
Sent: Friday, November 09, 2007 10:56 AM
To: solr-user@lucene.apache.org
Subject: Re: Delete all docs in a SOLR index?

Thanks!

- Original Message 
From: Chris Hostetter <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Friday, November 9, 2007 1:51:03 PM
Subject: Re: Delete all docs in a SOLR index?



: Sorry for another basic question -- but what is the best safe way to
: delete all docs in a SOLR index.

I thought this was a FAQ, but it's hidden in another question
(rebuilding if schema changes).  I'll pull it out into a top-level
question...

<delete><query>*:*</query></delete>
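For example, against the tutorial's example server (URL assumed to be the
stock one), post the delete and then a commit:

  # assumes the default example-server URL
  curl http://localhost:8983/solr/update -H 'Content-type:text/xml; charset=utf-8' \
    --data-binary '<delete><query>*:*</query></delete>'
  curl http://localhost:8983/solr/update -H 'Content-type:text/xml; charset=utf-8' \
    --data-binary '<commit/>'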

: I am in my first few days using SOLR and Lucene, am iterating the schema
: often, starting and stopping with test docs, etc.  I'd like to know a
: very quick way to clean out the index and start over repeatedly -- can't
: seem to find it on the wiki -- maybe it's Friday :)

Huh .. that's actually the FAQ that does talk about deleting all docs :)

"How can I rebuild my index from scratch if I change my schema?"

http://wiki.apache.org/solr/FAQ#head-9aafb5d8dff5308e8ea4fcf4b71f19f029c4bb99



-Hoss












Re: Delete all docs in a SOLR index?

2007-11-10 Thread David Neubert
I guess I'd better look into trunk -- I'm not familiar with it yet.

- Original Message 
From: Mike Klaas <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Friday, November 9, 2007 6:49:40 PM
Subject: Re: Delete all docs in a SOLR index?


On 9-Nov-07, at 3:42 PM, Norskog, Lance wrote:

> A safer way is to stop Solr and remove the index directory. There is
> less chance of corruption, and it will be faster.

In trunk, it should be quicker and safer than stopping/restarting.

Also, to clarify the 'corruption' issue, this should only be possible  
in the event of cold process termination (like power loss).

-Mike

> -Original Message-
> From: David Neubert [mailto:[EMAIL PROTECTED]
> Sent: Friday, November 09, 2007 10:56 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Delete all docs in a SOLR index?
>
> Thanks!
>
> - Original Message 
> From: Chris Hostetter <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Friday, November 9, 2007 1:51:03 PM
> Subject: Re: Delete all docs in a SOLR index?
>
>
>
> : Sorry for another basic question -- but what is the best safe way to
> : delete all docs in a SOLR index.
>
> I thought this was a FAQ, but it's hidden in another question
> (rebuilding if schema changes).  I'll pull it out into a top-level
> question...
>
> <delete><query>*:*</query></delete>
>
> : I am in my first few days using SOLR and Lucene, am iterating the schema
> : often, starting and stopping with test docs, etc.  I'd like to know a
> : very quick way to clean out the index and start over repeatedly -- can't
> : seem to find it on the wiki -- maybe it's Friday :)
>
> Huh .. that's actually the FAQ that does talk about deleting all docs :)
>
> "How can I rebuild my index from scratch if I change my schema?"
>
> http://wiki.apache.org/solr/FAQ#head-9aafb5d8dff5308e8ea4fcf4b71f19f029c4bb99
>
>
>
> -Hoss
>
>
>
>
>
>







Re: where to hook in to SOLR to read field-label from functionquery

2007-11-10 Thread Britske



hossman wrote:
> 
> 
> : Say I have a custom functionquery MinFloatFunction which takes as its
> : arguments an array of valuesources. 
> : 
> : MinFloatFunction(ValueSource[] sources)
> : 
> : In my case all these valuesources are the values of a collection of
> : fields.
> 
> a ValueSource isn't required to be field specific (it may already be the 
> mathematical combination of multiple other fields) so there is no generic 
> way to get the "field name" from a ValueSource ... but you could define 
> your MinFloatFunction to only accept FieldCacheSource[] as input ... hmmm, 
> except that FieldCacheSource doesn't expose the field name.  so instead you 
> write...
> 
>   public class MyFieldCacheSource extends FieldCacheSource {
> public MyFieldCacheSource(String field) {
>   super(field);
> }
> public String getField() {
>   return field;
> }
>   }
>   public class MinFloatFunction ... {
> public MinFloatFunction(MyFieldCacheSource[] values);
>   }
> 
Thanks for this. I'm going to look into this a little further. 
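From your pointers, here is a rough sketch of what I have in mind -- written
against my reading of the ValueSource/DocValues API, so names and signatures
may be off for other versions, and it is untested:

import java.io.IOException;
import java.util.Arrays;
import org.apache.lucene.index.IndexReader;
import org.apache.solr.search.function.DocValues;
import org.apache.solr.search.function.ValueSource;

public class MinFloatFunction extends ValueSource {
  private final MyFieldCacheSource[] sources;

  public MinFloatFunction(MyFieldCacheSource[] sources) {
    this.sources = sources;
  }

  public String description() {
    StringBuilder sb = new StringBuilder("min(");
    for (int i = 0; i < sources.length; i++) {
      if (i > 0) sb.append(',');
      sb.append(sources[i].getField()); // getField() as defined above
    }
    return sb.append(')').toString();
  }

  public DocValues getValues(IndexReader reader) throws IOException {
    // one DocValues per underlying field
    final DocValues[] vals = new DocValues[sources.length];
    for (int i = 0; i < sources.length; i++) {
      vals[i] = sources[i].getValues(reader);
    }
    return new DocValues() {
      public float floatVal(int doc) {
        float min = Float.POSITIVE_INFINITY;
        for (int i = 0; i < vals.length; i++) {
          min = Math.min(min, vals[i].floatVal(doc)); // min over all fields
        }
        return min;
      }
      public int intVal(int doc) { return (int) floatVal(doc); }
      public long longVal(int doc) { return (long) floatVal(doc); }
      public double doubleVal(int doc) { return (double) floatVal(doc); }
      public String strVal(int doc) { return Float.toString(floatVal(doc)); }
      public String toString(int doc) { return description() + '=' + floatVal(doc); }
    };
  }

  public boolean equals(Object o) {
    return o instanceof MinFloatFunction
        && Arrays.equals(sources, ((MinFloatFunction) o).sources);
  }

  public int hashCode() {
    return Arrays.hashCode(sources) + MinFloatFunction.class.hashCode();
  }
}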


hossman wrote:
> 
> 
> : For this I designed a schema in which each 'row' in the index represents a
> : product (independent of variants) (which takes care of the 1 variant max) and
> : every variant is represented as 2 fields in this row:
> : 
> : variant_p_* <-- represents price (stored / indexed)
> : variant_source_* <-- represents the other fields dependent on the
> : variant (stored / multivalued)
> 
> Note: if you have a lot of variants you may wind up with the same problem 
> as described here...
> 
> http://www.nabble.com/sorting-on-dynamic-fields---good%2C-bad%2C-neither--tf4694098.html
> 
> ...because of the underlying FieldCache usage in FieldCacheValueSource
> 
> 
> -Hoss
> 
> 
> 

Hmmm, thanks for pointing me to that one (I guess ;-). I totally
underestimated the memory requirements of the underlying Lucene field-cache
implementation. 
Having the option to sort on about 10,000 variant fields with about 400,000
docs would consume about 16 GB max (10,000 fields x 400,000 docs x 4 bytes per
float). Definitely not doable in my situation. An LRU implementation of the
Lucene field cache would help big time in this situation, to at least not get
OOM errors -- see the sketch below.  Perhaps you know of any
existing implementations?
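What I'm picturing is nothing fancier than an access-ordered LinkedHashMap
that evicts the eldest entry once full -- a hypothetical sketch only, I know
of no such class in Lucene/Solr itself:

import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical LRU holder for per-field value arrays (keyed by field name,
// holding the float[] that the FieldCache would otherwise keep forever).
public class LruFieldCache extends LinkedHashMap<String, float[]> {
  private final int maxEntries;

  public LruFieldCache(int maxEntries) {
    super(16, 0.75f, true); // accessOrder=true -> LRU iteration order
    this.maxEntries = maxEntries;
  }

  protected boolean removeEldestEntry(Map.Entry<String, float[]> eldest) {
    return size() > maxEntries; // evict the least-recently-used field array
  }
}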

Thanks a lot, 
Geert-Jan
-- 
View this message in context: 
http://www.nabble.com/where-to-hook-in-to-SOLR-to-read-field-label-from-functionquery-tf4751109.html#a13682698
Sent from the Solr - User mailing list archive at Nabble.com.



Redundant indexing * 4 only solution (for par/sen and case sensitivity)

2007-11-10 Thread David Neubert
Hi all,

Using SOLR, I believe I have to index the same content 4 times (not desirable) 
into 2 indexes -- and I don't know how you can practically do multiple indexes 
in SOLR (if indeed there is no better solution than 4 indexing runs into two 
indexes)?

My need is case-sensitive and case-insensitive searches over well-formed XML 
content (books), performing exact searches at the paragraph and sentence levels 
-- no errors over approximate boundaries -- the source content has exact 
par/sen tags.

I have already proven a pretty nice solution for par/sen indexing twice into 
the same index in SOLR.  I have added a tags field, and put correlative XML 
tags (comma delimited) into this field (one of which is either a para or sen 
flag) which flags the (partial) document as a paragraph or sentence.  Thus all 
paragraphs of the book are indexed as single documents (each with its sentences 
combined and concatenated), and then all sentences in the book are indexed again 
as single documents.  Both go into the same SOLR index. I just add an AND 
"tags:para" or "tags:sen" to my search and everything works fine.

The obvious downside to this approach is the 2X indexing, but it does execute 
quite nicely on a single Index using SOLR. This obviously doesn't scale nicely, 
but will do for quite a while probably.

I thought I could live with that

But then I moved on to case-sensitive and case-insensitive searches, and my 
research so far is pointing to one index for each case.

So now I have:
(1) 4X in content indexing
(2) 2X in actual SOLR/Lucene indices
(3) I don't know how to practically do multiple indices using SOLR?

If there is a better way of attacking this problem, I would appreciate 
recommendations!!!

Also, I don't know how to do multiple indices in SOLR -- I have heard it might 
be available in 1.3.0?  If this is my only recourse, please advise me where 
really good documentation is available on building 1.3.0.  I am not admin 
savvy, but I did succeed in getting SOLR up myself and navigating through it 
with the help of this forum.  But I have heard that building 1.3.0 (as opposed 
to downloading and installing it, like in 1.2.0) is a whole different 
experience and much more complex.

Thanks

Dave






Re: Redundant indexing * 4 only solution (for par/sen and case sensitivity)

2007-11-10 Thread Ryan McKinley



So now I have:
(1) 4X in content indexing
(2) 2X in actual SOLR/Lucene indices
(3) I don't know how to practically do multiple indices using SOLR?

If there is a better way of attacking this problem, I would appreciate 
recommendations!!!



I don't quite follow your current approach, but it sounds like you just 
need some copyFields to index the same content with multiple analyzers.


for example, say you have fields:

  <field name="text" type="text_ci" indexed="true" stored="true"/>
  <field name="text_cs" type="text_cs" indexed="true" stored="false"/>
  <field name="text_sen" type="text_sen_ci" indexed="true" stored="false"/>
  <field name="text_sen_cs" type="text_sen_cs" indexed="true" stored="false"/>

and copy fields:

  <copyField source="text" dest="text_cs"/>
  <copyField source="text" dest="text_sen"/>
  <copyField source="text" dest="text_sen_cs"/>

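The types themselves would differ only in their analyzers -- as a sketch (type 
names made up to match the fields above; the tokenizer/filter factories are 
the stock Solr ones):

  <fieldType name="text_ci" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <!-- lowercases both index and query terms: case-insensitive -->
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

  <fieldType name="text_cs" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <!-- no LowerCaseFilter, so matching stays case-sensitive -->
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    </analyzer>
  </fieldType>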

The 4X indexing cost?  If you *need* to index the content 4 different 
ways, you don't have any way around that - do you?  But is it really a 
big deal?  How often does it need to index?  How big is the data?


I'm not quite following your need for multiple solr indices, but in 1.3 
it is possible.


ryan


Re: Redundant indexing * 4 only solution (for par/sen and case sensitivity)

2007-11-10 Thread David Neubert
Ryan,

Thanks for your response.  I infer from your response that you can have a 
different analyzer for each field -- I guess I should have figured that out 
-- but because I had not thought of that, I concluded that I needed multiple 
indices (sorry, I am still very new to Solr/Lucene).  

Does such an approach make querying difficult under the following condition?

The app that I am replacing (and trying to enhance) has the ability to search 
multiple books at once with sen/par and case sensitivity settings individually 
selectable per book (e.g. default search modes per book).  So with a single 
query request (just the query word(s)), you can search one book by par, with 
case, another by sen w/o case, etc. -- all settable as user defaults.  I need 
to try to figure out how to match that in Solr/Lucene -- I believe that the 
Analyzer approach you suggested requires the use of the same Analyzer at query 
time that was used during indexing.  So if I am hitting multiple fields (in 
the same search request) that invoke different Analyzers -- am I at a dead end, 
and have to resort to consecutive multiple queries instead (and sort-merge 
results afterwards)?  Or am I just over-complicating this?

Dave

- Original Message 
From: Ryan McKinley <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Saturday, November 10, 2007 2:18:00 PM
Subject: Re: Redundant indexing * 4 only solution (for par/sen and case 
sensitivity)



> So now I have:
> (1) 4X in content indexing
> (2) 2X in actual SOLR/Lucene indices
> (3) I don't know how to practically do multiple indices using SOLR?
> 
> If there is a better way of attacking this problem, I would appreciate
> recommendations!!!
> 

I don't quite follow your current approach, but it sounds like you just 
need some copyFields to index the same content with multiple analyzers.

for example, say you have fields:

  <field name="text" type="text_ci" indexed="true" stored="true"/>
  <field name="text_cs" type="text_cs" indexed="true" stored="false"/>
  <field name="text_sen" type="text_sen_ci" indexed="true" stored="false"/>
  <field name="text_sen_cs" type="text_sen_cs" indexed="true" stored="false"/>

and copy fields:

  <copyField source="text" dest="text_cs"/>
  <copyField source="text" dest="text_sen"/>
  <copyField source="text" dest="text_sen_cs"/>


The 4X indexing cost?  If you *need* to index the content 4 different 
ways, you don't have any way around that - do you?  But is it really a 
big deal?  How often does it need to index?  How big is the data?

I'm not quite following your need for multiple solr indices, but in 1.3 
it is possible.

ryan






Re: Redundant indexing * 4 only solution (for par/sen and case sensitivity)

2007-11-10 Thread Yonik Seeley
On Nov 10, 2007 4:24 PM, David Neubert <[EMAIL PROTECTED]> wrote:
> So if I am hitting multiple fields (in the same search request) that invoke 
> different Analyzers -- am I at a dead end, and have to resort to consecutive 
> multiple queries instead

Solr handles that for you automatically.

> The app that I am replacing (and trying to enhance) has the ability to search 
> multiple books at once
> with sen/par and case sensitivity settings individually selectable per book

You could easily select case sensitivity or not *per query* across all books.
You should step back and see what the requirements actually are (i.e.
the reasons why one needs to be able to select case
sensitive/insensitive on a book level... it doesn't make sense to me
at first blush).

It could be done on a per-book level in solr with a more complex query
structure though...

(+case:sensitive +(normal relevancy query on the case sensitive fields
goes here)) OR (+case:insensitive +(normal relevancy query on the case
insensitive fields goes here))
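Concretely (field names made up: text_cs/text_ci for the case-sensitive and
case-insensitive copies, and "case" a per-book default flag indexed on each
doc), that could be:

(+case:sensitive +(text_cs:Mercy text_cs:Grace))
  OR (+case:insensitive +(text_ci:mercy text_ci:grace))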

-Yonik


Re: Redundant indexing * 4 only solution (for par/sen and case sensitivity)

2007-11-10 Thread Ryan McKinley

David Neubert wrote:

Ryan,

Thanks for your response.  I infer from your response that you can have a 
different analyzer for each field


yes!  each field can have its own indexing strategy.


I believe that the Analyzer approach you suggested requires the use 
of the same Analyzer at query time that was used during indexing.  


it does not require the *same* Analyzer - it just requires one that 
generates compatible tokens.  That is, you may want the indexing to 
split the input into sentences, but the query time analyzer keeps the 
input as a single token.


check the example schema.xml file -- the 'text' field type applies 
synonyms at index time, but does not at query time.
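i.e., roughly this pattern, abridged from the example schema (the factory 
names are the stock ones; the real filter chain there is longer):

  <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
              ignoreCase="true" expand="true"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <!-- same chain minus the synonym filter -->
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>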


re searching across multiple fields, don't worry, lucene handles this 
well.  You may want to do that explicitly or with the dismax handler.


I'd suggest you play around with indexing some data.  check the 
analysis.jsp in the admin section.  It is a great tool to help figure 
out what analyzers do at index vs query time.


ryan



Re: Redundant indexing * 4 only solution (for par/sen and case sensitivity)

2007-11-10 Thread David Neubert
Ryan (and others who need something to put them to sleep :) )

Wow -- the light-bulb finally went off -- the Analyzer admin page is very cool 
-- I just was not at all thinking the SOLR/Lucene way.

I need to rethink my whole approach now that I understand (from reviewing the 
schema.xml closer and playing with the Analyzer) how compatible index and query 
policies can be applied automatically on a field-by-field basis by SOLR at both 
index and query time.

I still may have a stumper here, but I need to give it some thought, and may 
return again with another question:

The problem is that my text is book text (fairly large) that looks very much 
like one would expect:

<book>
  <chapter>
    <para>
      <sen>...</sen>
      <sen>...</sen>
    </para>
    ...
  </chapter>
</book>

My plan was to pre-parse this and add page:line references to the <para> and 
<sen> elements, because in that way I could produce the page:line reference 
in the pre-parsing (again outside of SOLR) and feed it in as an explicit field 
in the <doc> elements of the <add> requests.  Therefore at query time, I will 
have the exact page:line corresponding to the start of the 
paragraph or sentence.
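In other words, each par/sen would be posted as its own document, roughly like 
this (field names of my own devising):

  <add>
    <doc>
      <field name="id">bookA-ch3-sen142</field>
      <field name="tags">sen</field>
      <field name="pageline">212:14</field>
      <field name="text">the text of the sentence itself ...</field>
    </doc>
  </add>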

But I am beginning to suspect that I was planning to do a lot of work that 
SOLR can do for me.

I will continue to study this and respond when I am a bit clearer, but the 
closer I could get to just submitting the books a chapter at a time -- and 
letting SOLR do the work -- the better (because I have all the books in 
well-formed XML at chapter levels).  However, I don't see yet how I could get 
par/sen granular search result hits, along with their exact page:line 
coordinates, unless I approach it by explicitly indexing the pars and sens as 
single documents, not as chapter hits, and also return the entire text of the 
sen or par, and highlight the keywords within (for the search result hit).  
Once a search result hit is selected, it would then act as expected and 
position into the chapter, at the selected reference, highlighting the key 
words again, but this time in the context of an entire chapter (the whole 
document to the user's mind).

Even with my new understanding you (and others) have given me, which I can 
certainly use to improve my approach -- it still seems to me that because 
multi-valued fields concatenate text -- even if you use the 
positionIncrementGap feature to prohibit unwanted phrase matches -- how do you 
produce a well-defined search result hit, bounded by the exact sen or par, 
unless you index them as single documents?

Should I still read up on the payload discussion?

Dave




- Original Message 
From: Ryan McKinley <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Saturday, November 10, 2007 5:00:43 PM
Subject: Re: Redundant indexing * 4 only solution (for par/sen and case 
sensitivity)


David Neubert wrote:
> Ryan,
> 
> Thanks for your response.  I infer from your response that you can have a
> different analyzer for each field

yes!  each field can have its own indexing strategy.


> I believe that the Analyzer approach you suggested requires the use 
> of the same Analyzer at query time that was used during indexing.  

it does not require the *same* Analyzer - it just requires one that 
generates compatible tokens.  That is, you may want the indexing to 
split the input into sentences, but the query time analyzer keeps the 
input as a single token.

check the example schema.xml file -- the 'text' field type applies 
synonyms at index time, but does not at query time.

re searching across multiple fields, don't worry, lucene handles this 
well.  You may want to do that explicitly or with the dismax handler.

I'd suggest you play around with indexing some data.  check the 
analysis.jsp in the admin section.  It is a great tool to help figure 
out what analyzers do at index vs query time.

ryan







Re: Redundant indexing * 4 only solution (for par/sen and case sensitivity)

2007-11-10 Thread David Neubert
Yonik (or anyone else)

Do you know where on-line documentation on the +case: syntax is located?  I 
can't seem to find it.

Dave

- Original Message 
From: Yonik Seeley <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Saturday, November 10, 2007 4:56:40 PM
Subject: Re: Redundant indexing * 4 only solution (for par/sen and case 
sensitivity)


On Nov 10, 2007 4:24 PM, David Neubert <[EMAIL PROTECTED]> wrote:
> So if I am hitting multiple fields (in the same search request) that invoke
> different Analyzers -- am I at a dead end, and have to resort to consecutive
> multiple queries instead

Solr handles that for you automatically.

> The app that I am replacing (and trying to enhance) has the ability to search
> multiple books at once with sen/par and case sensitivity settings individually
> selectable per book

You could easily select case sensitivity or not *per query* across all books.
You should step back and see what the requirements actually are (i.e.
the reasons why one needs to be able to select case
sensitive/insensitive on a book level... it doesn't make sense to me
at first blush).

It could be done on a per-book level in solr with a more complex query
structure though...

(+case:sensitive +(normal relevancy query on the case sensitive fields
goes here)) OR (+case:insensitive +(normal relevancy query on the case
insensitive fields goes here))

-Yonik




