automatic index time field?

2006-12-12 Thread ryan mckinley

Is there a way to automatically set a field when a document is indexed?
Specifically, I'd like to have a date field updated to the current time when
a document is indexed.

I am trying to find the best way to re-index content on a live server.  I
can't wipe the index and start over as there will be active queries through
the whole process.

I have a bunch of stuff stored in SQL; my plan is to:
* Note the current time
* Cycle through everything, 100 documents at a time
* When it is done, delete everything that was not updated since we started
this process.  Something like:
 index_time:[0 TO 2006-12-13T05:40:08,703]
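
The plan above could be sketched roughly as follows. This is only an
illustration: it builds Solr XML update messages as strings and does not send
them anywhere. The helper names (solr_date, batches, add_message,
delete_stale_message) and the row layout are made up for the example, and it
assumes "index_time" is declared as a date field in the schema. The cutoff is
pulled back by one millisecond because the range query is inclusive.

```python
from datetime import datetime, timedelta, timezone
from xml.sax.saxutils import escape, quoteattr


def solr_date(dt):
    """Format a timezone-aware datetime as UTC with milliseconds and a Z suffix."""
    dt = dt.astimezone(timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%S.") + "%03dZ" % (dt.microsecond // 1000)


def batches(rows, size=100):
    """Yield the rows in chunks of at most `size` documents."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def add_message(rows, index_time):
    """Build one <add> message, stamping every document with index_time."""
    docs = []
    for row in rows:
        fields = dict(row, index_time=solr_date(index_time))
        body = "".join(
            "<field name=%s>%s</field>" % (quoteattr(str(name)), escape(str(value)))
            for name, value in fields.items()
        )
        docs.append("<doc>%s</doc>" % body)
    return "<add>%s</add>" % "".join(docs)


def delete_stale_message(started):
    """Build a delete-by-query for documents not re-indexed since `started`."""
    # Inclusive range, so back off 1 ms to keep documents stamped exactly at `started`.
    cutoff = started - timedelta(milliseconds=1)
    return "<delete><query>index_time:[* TO %s]</query></delete>" % solr_date(cutoff)
```

Each message would then be POSTed to the update URL, with a <commit/> after
the final delete so the old documents stay searchable until the run finishes.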

My options are:
1) Send the index time along with the document.
2) extend UpdateHandler (DirectUpdateHandler2) to do this automatically

1) is the easiest but requires that everyone sending data sends a valid
"index_time" field.
2) more complicated, but then we know everything has a valid "index_time"
field.
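
To make the trade-off concrete, here is a minimal sketch of the idea behind
option 2, written as a plain Python function rather than as a real
UpdateHandler extension. The name stamp_index_time is hypothetical and this is
not Solr's API; it only shows a single choke point overwriting whatever the
sender supplied, so a missing or bogus "index_time" can never reach the index.

```python
from datetime import datetime, timezone


def stamp_index_time(doc, now=None):
    """Return a copy of doc whose index_time is set by the receiving side."""
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    stamped = dict(doc)  # leave the caller's document untouched
    # Always overwrite: a client-supplied value is not trusted.
    stamped["index_time"] = now.strftime("%Y-%m-%dT%H:%M:%SZ")
    return stamped
```

With option 1 the same stamping has to be repeated, correctly, by every
client that sends documents.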

Thanks for any pointers!
ryan


Re: How to query a parent child relationship returning result set of parents?

2006-12-12 Thread Eric Van Dewoestine

> You can do pretty much anything you want in a custom request handler, but
> i must admit that off the top of my head i can't think of any elegant way
> to solve your problem.
>
> Most people i know are happy with option #1 :)
>
> -Hoss


I appreciate the input, Hoss.  Unfortunately, I don't see option 1
working for us given the number of comments we expect our site to
generate.  If Solr had some sort of append command to index only the
appended content, then this might be a more viable solution. However,
I'm afraid of the performance impact that will result from re-indexing
the parent content and all child content every time a new child is
added.

It looks as though option 3 is the 'proper' solution and it's just a
matter of determining what to plug in and how.  I've seen a couple of
topics on the Lucene mailing list which seem promising, so now I just
have to figure out how to fit that into the Solr environment.

If you or anyone else have any more suggestions, tips, etc. I'd
appreciate the help as I'm a bit time constrained.

Thanks again for the response.

--
eric


Re: How to query a parent child relationship returning result set of parents?

2006-12-12 Thread Chris Hostetter
: Given this type of layout, how would I go about querying and returning
: a list of blogs which contain text in either the blog content or any
: of the comments' content?

a big issue is how quickly your comments have to show up in the index
... for some people an acceptable tradeoff is that new/edited "blogs" get
sent to the index immediately, but a cron running at fixed regular
intervals indexes the comments ... in that approach your first idea is
usually the most straightforward...

: 1) aggregate comment content into the blog content index, allowing me
: to query directly on the blog.  However we are expecting the site to


A hybrid of your second and third suggestions is a much more involved
approach that might also work...

: 2) Use facets to get a list of parent items and issue an additional
: query (or hit the database) to pull in the parent content.  Again,
...
: 3) Plug into the solr code and implement a custom request handler,
: HitCollector, or ...?  I've spent some time digging into the solr code
: and I don't see any obvious place to plug this type of functionality

You can do pretty much anything you want in a custom request handler, but
i must admit that off the top of my head i can't think of any elegant way
to solve your problem.

Most people i know are happy with option #1 :)

-Hoss