Re: Delete all docs in a SOLR index?
Thanks -- I didn't know that deleting the index (offline) was safe and complete -- thanks.

----- Original Message -----
From: "Norskog, Lance" <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Friday, November 9, 2007 6:42:21 PM
Subject: RE: Delete all docs in a SOLR index?

A safer way is to stop Solr and remove the index directory. There is less chance of corruption, and it will be faster.

-----Original Message-----
From: David Neubert [mailto:[EMAIL PROTECTED]]
Sent: Friday, November 09, 2007 10:56 AM
To: solr-user@lucene.apache.org
Subject: Re: Delete all docs in a SOLR index?

Thanks!

----- Original Message -----
From: Chris Hostetter <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Friday, November 9, 2007 1:51:03 PM
Subject: Re: Delete all docs in a SOLR index?

: Sorry for another basic question -- but what is the best safe way to
: delete all docs in a SOLR index.

I thought this was a FAQ, but it's hidden in another question (rebuilding if schema changes). I'll pull it out into a top-level question...

<delete><query>*:*</query></delete>

: I am in my first few days using SOLR and Lucene, am iterating the schema
: often, starting and stopping with test docs, etc. I'd like to know a very
: quick way to clean out the index and start over repeatedly -- can't seem
: to find it on the wiki -- maybe it's Friday :)

Huh .. that's actually the FAQ that does talk about deleting all docs :)

"How can I rebuild my index from scratch if I change my schema?"

http://wiki.apache.org/solr/FAQ#head-9aafb5d8dff5308e8ea4fcf4b71f19f029c4bb99

-Hoss
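For readers finding this thread later: the delete-all approach from the FAQ is a pair of update messages POSTed to Solr's update handler -- the delete-by-query followed by a commit, since the deletes are not visible until committed:

```xml
<delete><query>*:*</query></delete>
<commit/>
```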
Re: Delete all docs in a SOLR index?
I guess I better look into trunk -- not familiar with it yet.

----- Original Message -----
From: Mike Klaas <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Friday, November 9, 2007 6:49:40 PM
Subject: Re: Delete all docs in a SOLR index?

On 9-Nov-07, at 3:42 PM, Norskog, Lance wrote:
> A safer way is to stop Solr and remove the index directory. There is
> less chance of corruption, and it will be faster.

In trunk, it should be quicker and safer than stopping/restarting. Also, to clarify the 'corruption' issue, this should only be possible in the event of cold process termination (like power loss).

-Mike
Re: where to hook in to SOLR to read field-label from functionquery
hossman wrote:
>
> : Say I have a custom functionquery MinFloatFunction which takes as its
> : arguments an array of valuesources.
> :
> : MinFloatFunction(ValueSource[] sources)
> :
> : In my case all these valuesources are the values of a collection of fields.
>
> a ValueSource isn't required to be field specific (it may already be the
> mathematical combination of multiple other fields) so there is no generic
> way to get the "field name" from a ValueSource ... but you could define
> your MinFloatFunction to only accept FieldCacheSource[] as input ... hmmm,
> except that FieldCacheSource doesn't expose the field name, so instead you
> write...
>
> public class MyFieldCacheSource extends FieldCacheSource {
>   public MyFieldCacheSource(String field) {
>     super(field);
>   }
>   public String getField() {
>     return field;
>   }
> }
>
> public class MinFloatFunction ... {
>   public MinFloatFunction(MyFieldCacheSource[] values);
> }

Thanks for this. I'm going to look into this a little further.

hossman wrote:
>
> : For this I designed a schema in which each 'row' in the index represents a
> : product (independent of variants) (which takes care of the 1 variant max) and
> : every variant is represented as 2 fields in this row:
> :
> : variant_p_*      <-- represents price (stored / indexed)
> : variant_source_* <-- represents the other fields dependent on the
> :                      variant (stored / multivalued)
>
> Note: if you have a lot of variants you may wind up with the same problem
> as described here...
>
> http://www.nabble.com/sorting-on-dynamic-fields---good%2C-bad%2C-neither--tf4694098.html
>
> ...because of the underlying FieldCache usage in FieldCacheValueSource
>
> -Hoss

Hmmm, thanks for pointing me to that one (I guess ;-). I totally underestimated the memory requirements of the underlying Lucene FieldCache implementation. Having the option to sort on about 10,000 variant fields with about 400,000 docs will consume about 16 GB max. Definitely not doable in my situation.
An LRU implementation of the Lucene field cache would help big time in this situation, at least to avoid OOM errors. Perhaps you know of any existing implementations?

Thanks a lot,
Geert-Jan
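As far as I know there was no stock LRU field cache in Lucene at the time, but the core eviction mechanics are small. A minimal sketch of a size-bounded LRU map (class name and capacity are invented for illustration), the kind of structure such a cache could be built on:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: a size-bounded LRU map. LinkedHashMap constructed with
// accessOrder=true keeps entries in least-recently-used order and calls
// removeEldestEntry after every put, letting us evict the LRU entry.
public class LruFieldCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public LruFieldCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder=true -> LRU iteration order
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Returning true drops the least-recently-used entry.
        return size() > maxEntries;
    }
}
```

A real field cache would hold the per-field value arrays as map values and would want to bound by memory rather than entry count, but the eviction logic is the same shape.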
Redundant indexing * 4 only solution (for par/sen and case sensitivity)
Hi all,

Using SOLR, I believe I have to index the same content 4 times (not desirable) into 2 indexes -- and I don't know how you can practically do multiple indexes in SOLR (if indeed there is no better solution than 4 indexing runs into two indexes).

My need is case-sensitive and case-insensitive searches over well-formed XML content (books), performing exact searches at the paragraph and sentence levels -- no errors over approximate boundaries -- the source content has exact par/sen tags.

I have already proven a pretty nice solution for par/sen indexing twice into the same index in SOLR. I have added a tags field, and put correlative XML tags (comma delimited) into this field (one of which is either a para or sen flag) which flags the document (partial) as a paragraph or sentence. Thus all paragraphs of the book are indexed as single documents (with their sentences concatenated), and then all sentences in the book are indexed again as single documents. Both go into the same SOLR index. I just add an AND "tags:para" or "tags:sen" to my search and everything works fine. The obvious downside to this approach is the 2X indexing, but it does execute quite nicely on a single index using SOLR. This obviously doesn't scale nicely, but will do for quite a while probably. I thought I could live with that.

But then I moved on to case-sensitive and case-insensitive searches, and my research so far is pointing to one index for each case. So now I have:

(1) 4X in content indexing
(2) 2X in actual SOLR/Lucene indices
(3) I don't know how to practically do multiple indices using SOLR?

If there is a better way of attacking this problem, I would appreciate recommendations!!!

Also, I don't know how to do multiple indices in SOLR -- I have heard it might be available in 1.3.0? If this is my only recourse, please advise me where really good documentation is available on building 1.3.0. I am not admin savvy, but I did succeed in getting SOLR up myself and navigating through it with the help of this forum. But I have heard that building 1.3.0 (as opposed to downloading and installing it, like 1.2.0) is a whole different experience and much more complex.

Thanks

Dave
Re: Redundant indexing * 4 only solution (for par/sen and case sensitivity)
> So now I have:
> (1) 4X in content indexing
> (2) 2X in actual SOLR/Lucene indices
> (3) I don't know how to practically do multiple indices using SOLR?
>
> If there is a better way of attacking this problem, I would appreciate recommendations!!!

I don't quite follow your current approach, but it sounds like you just need some copyFields to index the same content with multiple analyzers -- for example, a single stored source field copied into several indexed-only fields, each with its own analyzer.

The 4X indexing cost? If you *need* to index the content 4 different ways, you don't have any way around that - do you? But is it really a big deal? How often does it need to index? How big is the data?

I'm not quite following your need for multiple Solr indices, but in 1.3 it is possible.

ryan
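The field definitions in Ryan's mail were stripped by the list archive; to make the suggestion concrete (all field and type names below are invented, not from his message), a schema.xml fragment along these lines would do it -- one stored copy of the text, plus indexed-only variants per analysis strategy:

```xml
<!-- Hypothetical schema.xml fragment -->
<field name="text"    type="string"          indexed="false" stored="true"/>
<field name="text_cs" type="text_casesens"   indexed="true"  stored="false"/>
<field name="text_ci" type="text_caseinsens" indexed="true"  stored="false"/>

<copyField source="text" dest="text_cs"/>
<copyField source="text" dest="text_ci"/>
```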
Re: Redundant indexing * 4 only solution (for par/sen and case sensitivity)
Ryan,

Thanks for your response. I infer from your response that you can have a different analyzer for each field -- I guess I should have figured that out -- but because I had not thought of that, I concluded that I needed multiple indices (sorry, I am still very new to Solr/Lucene).

Does such an approach make querying difficult under the following condition? The app that I am replacing (and trying to enhance) has the ability to search multiple books at once, with sen/par and case sensitivity settings individually selectable per book (e.g. default search modes per book). So with a single query request (just the query word(s)), you can search one book by par, with case, another by sen w/o case, etc. -- all settable as user defaults. I need to try to figure out how to match that in Solr/Lucene -- I believe that the Analyzer approach you suggested requires the use of the same Analyzer at query time that was used during indexing. So if I am hitting multiple fields (in the same search request) that invoke different Analyzers -- am I at a dead end, and have to resort to consecutive multiple queries instead (and sort/merge results afterwards)? Or am I just overcomplicating this?

Dave

----- Original Message -----
From: Ryan McKinley <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Saturday, November 10, 2007 2:18:00 PM
Subject: Re: Redundant indexing * 4 only solution (for par/sen and case sensitivity)

> I don't quite follow your current approach, but it sounds like you just
> need some copyFields to index the same content with multiple analyzers.
> ...
> I'm not quite following your need for multiple solr indicies, but in
> 1.3 it is possible.
>
> ryan
Re: Redundant indexing * 4 only solution (for par/sen and case sensitivity)
On Nov 10, 2007 4:24 PM, David Neubert <[EMAIL PROTECTED]> wrote:
> So if I am hitting multiple fields (in the same search request) that invoke
> different Analyzers -- am I at a dead end, and have to result to consequetive
> multiple queries instead

Solr handles that for you automatically.

> The app that I am replacing (and trying to enhance) has the ability to search
> multiple books at once
> with sen/par and case sensitivity settings individually selectable per book

You could easily select case sensitivity or not *per query* across all books. You should step back and see what the requirements actually are (i.e. the reasons why one needs to be able to select case sensitive/insensitive on a book level... it doesn't make sense to me at first blush).

It could be done on a per-book level in solr with a more complex query structure though...

  (+case:sensitive +(normal relevancy query on the case sensitive fields goes here))
OR
  (+case:insensitive +(normal relevancy query on the case insensitive fields goes here))

-Yonik
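Spelled out with hypothetical field names (these are invented for illustration: case as a per-document flag, text_cs/text_ci as the case-sensitive and case-insensitive copies), Yonik's structure would look roughly like:

```
(+case:sensitive   +(text_cs:grace text_cs:truth))
OR
(+case:insensitive +(text_ci:grace text_ci:truth))
```

Each top-level clause only matches documents carrying the corresponding flag, so each book is scored against whichever field matches its own default.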
Re: Redundant indexing * 4 only solution (for par/sen and case sensitivity)
David Neubert wrote:
> Ryan,
>
> Thanks for your response. I infer from your response that you can have
> a different analyzer for each field

yes! each field can have its own indexing strategy.

> I believe that the Analyzer approach you suggested requires the use
> of the same Analyzer at query time that was used during indexing.

it does not require the *same* Analyzer - it just requires one that generates compatible tokens. That is, you may want the indexing to split the input into sentences, but the query time analyzer keeps the input as a single token.

check the example schema.xml file -- the 'text' field type applies synonyms at index time, but does not at query time.

re searching across multiple fields, don't worry, lucene handles this well. You may want to do that explicitly or with the dismax handler.

I'd suggest you play around with indexing some data. check the analysis.jsp in the admin section. It is a great tool to help figure out what analyzers do at index vs query time.

ryan
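The index/query split Ryan describes is declared per field type in schema.xml. A hypothetical case-insensitive type (the type name is invented; the factory classes are the stock Solr ones) lower-cases at both index and query time; a case-sensitive sibling would simply omit the filter:

```xml
<fieldType name="text_caseinsens" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

When index and query analysis are identical, a single <analyzer> element without the type attribute covers both.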
Re: Redundant indexing * 4 only solution (for par/sen and case sensitivity)
Ryan (and others who need something to put them to sleep :) )

Wow -- the light bulb finally went off -- the analysis admin page is very cool -- I just was not at all thinking the SOLR/Lucene way. I need to rethink my whole approach now that I understand (from reviewing the schema.xml closer and playing with the Analyzer) how compatible index and query policies can be applied automatically on a field-by-field basis by SOLR at both index and query time.

I still may have a stumper here, but I need to give it some thought, and may return again with another question. The problem is that my text is book text (fairly large), marked up very much as one would expect (book/chapter/para/sen tags). I was planning to add page and line elements to the markup, because in that way I could produce the page:line reference in the pre-parsing (again outside of SOLR) and feed it in as an explicit field in the add requests. Therefore at query time, I will have the exact page:line corresponding to the start of the paragraph or sentence. But I am beginning to suspect I was planning to do a lot of work that SOLR can do for me.

I will continue to study this and respond when I am a bit clearer, but the closer I could get to just submitting the books a chapter at a time -- and letting SOLR do the work -- the better (cause I have all the books in well formed xml at chapter levels). However, I don't see yet how I could get par/sen granular search result hits, along with their exact page:line coordinates, unless I approach it by explicitly indexing the pars and sens as single documents, not chapters, and also return the entire text of the sen or par, and highlight the keywords within (for the search result hit). Once a search result hit is selected, it would then act as expected and position into the chapter, at the selected reference, highlight again the key words, but this time in the context of an entire chapter (the whole document to the user's mind).
Even with my new understanding you (and others) have given me, which I can certainly use to improve my approach -- it still seems to me that because multi-valued fields concatenate text, even if you use the positionIncrementGap feature to prohibit unwanted phrase matches, how do you produce a well-defined search result hit, bounded by the exact sen or par, unless you index them as single documents? Should I still read up on the payload discussion?

Dave

----- Original Message -----
From: Ryan McKinley <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Saturday, November 10, 2007 5:00:43 PM
Subject: Re: Redundant indexing * 4 only solution (for par/sen and case sensitivity)

> it does not require the *same* Analyzer - it just requires one that
> generates compatible tokens.
> ...
> I'd suggest you play around with indexing some data. check the
> analysis.jsp in the admin section. It is a great tool to help figure
> out what analyzers do at index vs query time.
>
> ryan
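If the per-sentence documents stay (as the last paragraph argues they must for exact hit boundaries), each add request would simply carry the precomputed coordinates as plain fields. A hypothetical example (all field names and values invented for illustration):

```xml
<add>
  <doc>
    <field name="id">bookA-ch3-sen042</field>
    <field name="tags">sen</field>
    <field name="page_line">127:14</field>
    <field name="text">The quick brown fox jumps over the lazy dog.</field>
  </doc>
</add>
```

The page_line field is just stored data riding along with the document, so it comes back verbatim with each sentence-level hit.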
Re: Redundant indexing * 4 only solution (for par/sen and case sensitivity)
Yonik (or anyone else),

Do you know where on-line documentation on the +case: syntax is located? I can't seem to find it.

Dave

----- Original Message -----
From: Yonik Seeley <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Saturday, November 10, 2007 4:56:40 PM
Subject: Re: Redundant indexing * 4 only solution (for par/sen and case sensitivity)

> It could be done on a per-book level in solr with a more complex query
> structure though...
> (+case:sensitive +(normal relevancy query on the case sensitive fields goes here))
> OR (+case:insensitive +(normal relevancy query on the case insensitive fields goes here))
>
> -Yonik