Re: solr index reusable with nutch?

2006-12-13 Thread Otis Gospodnetic
Hi,

Solr should be able to search any Lucene index, not just those created by Solr 
itself, as long as you configure it properly via schema.xml.  Thus, you should 
be able to use Solr to search an index created by Nutch.  Haven't tried it.  It 
would be nice if you could contribute the configuration for doing this.

Otis

----- Original Message -----
From: Thorsten Scherler [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Wednesday, December 13, 2006 8:26:51 AM
Subject: solr index reusable with nutch?

Hi all,

is it possible to directly use the solr index in nutch?

My client is creating a portal search based on nutch. In this portal
there is as well my project and ATM I prefer to go with solr instead of
nutch since it is much better for my use case.

Now the question is whether the portal search engine could use the solr
index for my part of the portal.

Can somebody point me to related documentation?

TIA

salu2
-- 
thorsten

Together we stand, divided we fall! 
Hey you (Pink Floyd)






Re: automatic index time field?

2006-12-13 Thread Chris Hostetter

: Is there a way to automatically set a field when a document is indexed?
: Specifically, I'd like to have a date field updated to the current time when
: a document is indexed.

Your message reminded me that i never announced the new Date Math
parsing code, which does let you say something like...

  <field name="timestamp">NOW</field>

...in your <add><doc> calls, but there is currently no way to have
default values for fields in your schema ... it's on the wishlist, but
no one is currently pursuing it as far as i know.
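
For anyone building these messages programmatically, here is a minimal sketch in Python of an add message that uses NOW for the timestamp. The "id" field and its value are hypothetical, and the code only builds the XML string; posting it to the update handler is left out.

```python
import xml.etree.ElementTree as ET


def build_add_message(fields):
    """Build an <add><doc> message from a dict of field name -> value."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in fields.items():
        field = ET.SubElement(doc, "field", name=name)
        field.text = str(value)
    # encoding="unicode" returns a str without an XML declaration
    return ET.tostring(add, encoding="unicode")


# "id" is a hypothetical field; "timestamp" is the field from the example above
message = build_add_message({"id": "42", "timestamp": "NOW"})
```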

: I have a bunch of stuff stored in SQL, my plan is to:
:  * note the current time

...the gist of your plan is sound, but to eliminate possible headaches
from clock sync issues, instead of getting the current time from
somewhere, i would query your index for all docs (of the type
you are interested in) sorted by date desc, and then note the date of the
newest doc and later delete all docs with dates up to and including that
one.
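
A rough sketch of that cutoff logic in Python, under some assumptions: the date field is called "timestamp" (hypothetical), the dates are UTC strings in the same ISO 8601 format so they compare correctly as plain strings, and the docs have already been fetched from the index. Only the delete message is built here; sending it is left out.

```python
from xml.sax.saxutils import escape


def newest_timestamp(docs, field="timestamp"):
    """Return the latest date among the docs (same-format ISO strings sort correctly)."""
    return max(doc[field] for doc in docs)


def delete_up_to(cutoff, field="timestamp"):
    """Build a delete-by-query message for all docs dated up to and including cutoff."""
    # Square brackets make the range inclusive at both ends
    query = f"{field}:[* TO {cutoff}]"
    return f"<delete><query>{escape(query)}</query></delete>"


# Hypothetical docs as they might come back from the index before re-indexing
docs = [
    {"id": "1", "timestamp": "2006-12-12T23:59:59Z"},
    {"id": "2", "timestamp": "2006-12-13T08:26:51Z"},
    {"id": "3", "timestamp": "2006-12-01T00:00:00Z"},
]
cutoff = newest_timestamp(docs)
delete_message = delete_up_to(cutoff)
```

Taking the cutoff from the index itself means the machine doing the re-index never has to agree with the Solr server about what time it is.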

: My options are:
: 1) Send the index time along with the document.
: 2) extend UpdateHandler (DirectUpdateHandler2) to do this automatically
:
: 1) is the easiest but requires that everyone sending data sends a valid
: index_time field.
: 2) more complicated, but then we know everything has a valid index_time
: field.

As i said, you could just put NOW in all of your docs, but if you are
interested in pursuing option#2, the most general purpose and reusable
approach might be to add an optional default="value" attribute to the
field declarations in the schema.xml (relevant classes are SchemaField
and IndexSchema) and then modify the DocumentBuilder.getDoc method to
check for any default values of fields the Document doesn't already have
values for and add them ... then your timestamp field becomes...

<field name="timestamp" type="date" indexed="true" stored="true" default="NOW" />

...but you can also have other default fields...

<field name="forSale" type="boolean" indexed="true" stored="true" default="false" />
<field name="type" type="string" indexed="true" stored="true" default="unknown" />

...etc.
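
To make the proposed behaviour concrete, here is a small Python sketch of the "fill in missing fields from schema defaults" step. It is not Solr code: the real change would live in the Java classes named above, and apply_defaults, its arguments and the date format are all assumptions made for illustration.

```python
from datetime import datetime, timezone


def apply_defaults(doc, defaults, now=None):
    """Return a copy of doc with schema defaults added for any missing fields.

    A default of "NOW" is resolved to the current UTC time at indexing.
    """
    now = now or datetime.now(timezone.utc)
    filled = dict(doc)  # leave the caller's document untouched
    for name, value in defaults.items():
        if name not in filled:
            if value == "NOW":
                value = now.strftime("%Y-%m-%dT%H:%M:%SZ")
            filled[name] = value
    return filled


# Defaults as they would be declared in the schema fields above
defaults = {"timestamp": "NOW", "forSale": "false", "type": "unknown"}

# A hypothetical incoming document that sets "type" but nothing else
incoming = {"id": "42", "type": "book"}
fixed_now = datetime(2006, 12, 13, 8, 26, 51, tzinfo=timezone.utc)
result = apply_defaults(incoming, defaults, now=fixed_now)
```

Values the client sends always win; defaults only fill the gaps, so existing clients that already send a timestamp keep working.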


-Hoss



Case sensitivity on hostnames and email addresses

2006-12-13 Thread Wade Leftwich
I've run into some unexpected case sensitivity on searches, at least
unexpected by me.

If you index a text field containing this sentence:

A sentence containing CamelCase words by [EMAIL PROTECTED] is found
at StudlyCaps.org

The document will be found by searching for camelcase but not for
[EMAIL PROTECTED] or studlycaps.org.

This happens with the Standard or the DisMax query handler.

A bit of a problem for me, because I'm indexing a bunch of business
magazines, and domain names are frequently capitalized, often in CamelCase.

Is this maybe a bug? Or a WAD?

-- Wade Leftwich
Ithaca, NY



Re: solr index reusable with nutch?

2006-12-13 Thread Thorsten Scherler
On Wed, 2006-12-13 at 07:45 -0800, Otis Gospodnetic wrote:
> Hi,
> 
> Solr should be able to search any Lucene index,

ok, good to know. :) 

So can I guess that the same is true for nutch? Meaning the index solr
is creating could be used by a nutch searcher.

> not just those created by Solr itself, as long as you configure it properly
> via schema.xml.

http://wiki.apache.org/solr/SchemaXml?highlight=%28schema%29

> Thus, you should be able to use Solr to search an index created by Nutch.

In my use case I need the reverse. Nutch searches the index created by
my solr application. The application is just one component in the portal
and the portal will provide a global search engine which should use
the index from solr.

> Haven't tried it.  It would be nice if you could contribute the
> configuration for doing this.

As I figure it out I will keep you informed.

Thanks for the feedback.

salu2
