Re: solr index reusable with nutch?
Hi,

Solr should be able to search any Lucene index, not just those created by Solr itself, as long as you configure it properly via schema.xml. Thus, you should be able to use Solr to search an index created by Nutch. Haven't tried it. It would be nice if you could contribute the configuration for doing this.

Otis

----- Original Message ----
From: Thorsten Scherler [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Wednesday, December 13, 2006 8:26:51 AM
Subject: solr index reusable with nutch?

Hi all,

is it possible to directly use the solr index in nutch? My client is creating a portal search based on nutch. This portal also includes my project, and ATM I prefer to go with solr instead of nutch since it is much better for my use case. Now the question is whether the portal search engine could use the solr index for my part of the portal. Can somebody point me to related documentation?

TIA

salu2
--
thorsten

Together we stand, divided we fall!
Hey you (Pink Floyd)
Re: automatic index time field?
: Is there a way to automatically set a field when a document is indexed?
: Specifically, I'd like to have a date field updated to the current time when
: a document is indexed.

Your message reminded me that I never announced the new Date Math parsing code, which does let you say something like...

  <field name="timestamp">NOW</field>

...in your <add><doc> calls, but there is currently no way to have default values for fields in your schema... it's on the wishlist, but no one is currently pursuing it as far as I know.

: I have a bunch of stuff stored in SQL, my plan is to:
: * note the current time

...the gist of your plan is sound, but to eliminate possible headaches from clock sync issues, instead of getting the current time from somewhere, I would query your index for all docs (of the type you are interested in) sorted by date desc, note the date of the newest doc, and later delete all docs with dates up to and including that one.

: My options are:
: 1) Send the index time along with the document.
: 2) extend UpdateHandler (DirectUpdateHandler2) to do this automatically
:
: 1) is the easiest but requires that everyone sending data sends a valid
: index_time field.
: 2) more complicated, but then we know everything has a valid index_time
: field.

As I said, you could just put NOW in all of your docs, but if you are interested in pursuing option #2, the most general-purpose and reusable approach might be to add an optional default="value" attribute to the field declarations in schema.xml (the relevant classes are SchemaField and IndexSchema) and then modify the DocumentBuilder.getDoc method to check for any fields with default values that the Document doesn't already have values for, and add them... then your timestamp field becomes...

  <field name="timestamp" type="date" indexed="true" stored="true" default="NOW" />

...but you can also have other default fields...

  <field name="forSale" type="boolean" indexed="true" stored="true" default="false" />
  <field name="type" type="string" indexed="true" stored="true" default="unknown" />

...etc.

-Hoss
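The option #2 proposal above (a default="..." attribute on field declarations, filled in by DocumentBuilder.getDoc for fields the document didn't supply) can be sketched roughly as follows. This is a hypothetical Java model, not Solr code: `DefaultFieldSketch`, `FieldDecl`, and `applyDefaults` are invented names standing in for what SchemaField/IndexSchema would hold and what getDoc would do, and a document is assumed to be a plain map from field name to raw string values. Defaults are added as raw strings, exactly as if the client had sent them, so a `NOW` default would still be converted by the date field type afterwards.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: these classes are NOT Solr's SchemaField, IndexSchema,
 * or DocumentBuilder; they only model the proposed default="..." behaviour.
 */
public class DefaultFieldSketch {

    /** One field declaration from schema.xml; defaultValue is null when no default= attribute is given. */
    static final class FieldDecl {
        final String name;
        final String defaultValue;

        FieldDecl(String name, String defaultValue) {
            this.name = name;
            this.defaultValue = defaultValue;
        }
    }

    /**
     * Returns a copy of the incoming document in which every declared field that
     * has a default, and for which the document supplied no values, gets that
     * default as its single raw value. Values the client did send are never overwritten.
     */
    static Map<String, List<String>> applyDefaults(Map<String, List<String>> doc,
                                                   List<FieldDecl> schema) {
        // Copy the client's document so the input map is left untouched.
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : doc.entrySet()) {
            result.put(e.getKey(), new ArrayList<>(e.getValue()));
        }
        for (FieldDecl f : schema) {
            List<String> existing = result.get(f.name);
            boolean missing = existing == null || existing.isEmpty();
            if (f.defaultValue != null && missing) {
                List<String> values = new ArrayList<>();
                values.add(f.defaultValue);
                result.put(f.name, values);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Mirrors the timestamp/forSale/type declarations from the message above.
        List<FieldDecl> schema = List.of(
                new FieldDecl("id", null),
                new FieldDecl("timestamp", "NOW"),
                new FieldDecl("forSale", "false"),
                new FieldDecl("type", "unknown"));

        Map<String, List<String>> doc = new LinkedHashMap<>();
        doc.put("id", List.of("42"));
        doc.put("type", List.of("book"));

        // Prints {id=[42], type=[book], timestamp=[NOW], forSale=[false]}
        System.out.println(applyDefaults(doc, schema));
    }
}
```

Treating an empty value list the same as an absent field is an assumption of this sketch; the proposal only says "fields the Document doesn't already have values for".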
Case sensitivity on hostnames and email addresses
I've run into some unexpected case sensitivity on searches, at least unexpected by me. If you index a text field containing this sentence:

  A sentence containing CamelCase words by [EMAIL PROTECTED] is found at StudlyCaps.org

the document will be found by searching for camelcase but not for [EMAIL PROTECTED] or studlycaps.org. This happens with both the Standard and the DisMax query handlers.

This is a bit of a problem for me, because I'm indexing a bunch of business magazines, and domain names are frequently capitalized, often in CamelCase. Is this maybe a bug? Or is it WAD (working as designed)?

--
Wade Leftwich
Ithaca, NY
Re: solr index reusable with nutch?
On Wed, 2006-12-13 at 07:45 -0800, Otis Gospodnetic wrote:
> Hi,
>
> Solr should be able to search any Lucene index,

OK, good to know. :) So can I assume the same is true for nutch? Meaning the index solr creates could be used by a nutch searcher.

> not just those created by Solr itself, as long as you configure it properly via schema.xml.

http://wiki.apache.org/solr/SchemaXml?highlight=%28schema%29

> Thus, you should be able to use Solr to search an index created by Nutch.

In my use case I need the reverse: Nutch searches the index created by my solr application. The application is just one component in the portal, and the portal will provide a global search engine which should use the index from solr.

> Haven't tried it. It would be nice if you could contribute the configuration for doing this.

As I figure it out I will keep you informed. Thanks for the feedback.

salu2