Field Compression

2008-02-03 Thread Stu Hood
I just finished watching this talk about a column-store RDBMS, which has a long 
section on column compression. Specifically, it talks about the gains from 
compressing similar data together, and how lazily decompressing data only when 
it must be processed is great for memory/CPU cache usage.

http://youtube.com/watch?v=yrLd-3lnZ58

While interesting, it's not directly relevant to Lucene's stored field storage. On the
other hand, it did get me thinking about stored field compression and lazy 
field loading.

Can anyone give me some pointers about compressThreshold values that would be 
worth experimenting with? Our stored fields are often between 20 and 300 
characters, and we're willing to spend more time indexing if it will make 
searching less IO bound.

Thanks,

Stu Hood
Architecture Software Developer
Mailtrust, a Rackspace Company



Re: solr with hadoop

2008-01-07 Thread Stu Hood
As Mike suggested, we use Hadoop to organize our data en route to Solr. Hadoop 
allows us to load balance the indexing stage, and then we use the raw Lucene 
IndexWriter.addIndexes method to merge the data to be hosted on Solr
instances.
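
A rough sketch of that merge step with the Lucene 2.x API; the shard paths and class name are hypothetical:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    // Merge several separately-built shard indexes into one target index.
    public class MergeShards {
        public static void main(String[] args) throws Exception {
            Directory target = FSDirectory.getDirectory("/data/solr/index");
            Directory[] shards = {
                FSDirectory.getDirectory("/data/shards/part-0/index"),
                FSDirectory.getDirectory("/data/shards/part-1/index"),
            };
            // 'true' = create (or overwrite) the target index
            IndexWriter writer = new IndexWriter(target, new StandardAnalyzer(), true);
            writer.addIndexes(shards); // merges the shard segments into the target
            writer.optimize();
            writer.close();
        }
    }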

Thanks,
Stu



-Original Message-
From: Mike Klaas [EMAIL PROTECTED]
Sent: Friday, January 4, 2008 3:04pm
To: solr-user@lucene.apache.org
Subject: Re: solr with hadoop

On 4-Jan-08, at 11:37 AM, Evgeniy Strokin wrote:

 I have a huge index (about 110 million documents, 100 fields
 each), but the size of the index is reasonable, about 70 GB.
 All I need is to increase performance, since some queries, which match
 a big number of documents, are running slow.
 So I was wondering: are there any benefits to using Hadoop for this? And if
 so, what direction should I go? Has anybody done anything to integrate
 Solr with Hadoop? Does it give any performance boost?

Hadoop might be useful for organizing your data enroute to Solr, but  
I don't see how it could be used to boost performance over a huge  
Solr index.  To accomplish that, you need to split it up over two  
machines (for which you might find hadoop useful).

-Mike




maxBooleanClauses

2007-12-20 Thread Stu Hood
Hello,

Is the 'maxBooleanClauses' setting just there for sanity checking, to protect
me from my users?
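
For context, the setting lives in the <query> section of solrconfig.xml; queries that expand to more boolean clauses than the cap throw BooleanQuery.TooManyClauses instead of executing:

    <query>
      <!-- hard cap on clauses in a single BooleanQuery (Lucene's default: 1024) -->
      <maxBooleanClauses>1024</maxBooleanClauses>
      ...
    </query>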

Thanks,

Stu Hood
Webmail.us
You manage your business. We'll manage your email.®



RE: Query multiple fields

2007-11-18 Thread Stu Hood
 q=description:(test)&!(type:10)&!(type:14)

You can't use an '&' symbol in your query without escaping it (as %26), because
the servlet container treats everything after '&' as a separate parameter. The
boolean operator for 'and' in Lucene is 'AND', and it is case sensitive. Your
query should probably look like:

 q=description:test AND -type:10 AND -type:14

See the Lucene query syntax here:

http://lucene.apache.org/java/docs/queryparsersyntax.html#Boolean%20operators
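
Assuming the standard select handler, the corrected query URL-encodes to something like:

    http://localhost:8983/solr/select?q=description%3Atest+AND+-type%3A10+AND+-type%3A14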

Thanks,
Stu


-Original Message-
From: Dave C. [EMAIL PROTECTED]
Sent: Sunday, November 18, 2007 1:50am
To: solr-user@lucene.apache.org
Subject: RE: Query multiple fields

Hi Nick,

Maybe you can help me with this related problem I am having.
My query is: q=description:(test)&!(type:10)&!(type:14).

However, my results are not as expected (55 results instead of the expected 23)

The response header shows:

"responseHeader": {
  "status": 0,
  "QTime": 1,
  "params": {
    "wt": "json",
    "!(type:10)": "",
    "!(type:14)": "",
    "indent": "on",
    "q": "description:(test)",
    "fl": "*"}},

I am confused about why the &!(type:10)&!(type:14) is not in the 'q'
parameter.

Any ideas?

Thanks,
David


 From: [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Subject: RE: Query multiple fields
 Date: Sun, 18 Nov 2007 03:18:12 +
 
 oh, awesome thanks
 
 -david
 
 
 
  Date: Sun, 18 Nov 2007 15:24:00 +1300
  From: [EMAIL PROTECTED]
  To: solr-user@lucene.apache.org
  Subject: Re: Query multiple fields
  
  Hi David
  You had it right in your example :)
  
  description:test AND type:10
  
  But it would probably be wise to wrap any text in parentheses:
  
  description:(test foo bar baz) AND type:10
  
  You can find more info on the query syntax here:
  http://lucene.apache.org/java/docs/queryparsersyntax.html
  -Nick
  On 11/18/07, Dave C. [EMAIL PROTECTED] wrote:
   Hello,
  
   I've been trying to figure out how to query multiple fields at a time.
   For example, I want to do something like: description:test AND type:10.
   I've tried things like: ?q=description:test&type:10 etc, but I keep
   getting syntax errors.
  
   Can anyone tell me how this can be done?
  
   Thanks,
   David
  
   P.S. Perhaps the solution to this could/should be added to the 
   FAQ/tutorial?
  
   _
   You keep typing, we keep giving. Download Messenger and join the i'm 
   Initiative now.
   http://im.live.com/messenger/im/home/?source=TAGLM
 
 _
 You keep typing, we keep giving. Download Messenger and join the i’m 
 Initiative now.
 http://im.live.com/messenger/im/home/?source=TAGLM

_
You keep typing, we keep giving. Download Messenger and join the i’m Initiative 
now.
http://im.live.com/messenger/im/home/?source=TAGLM



RE: Exception in SOLR when querying for fields of type string

2007-11-13 Thread Stu Hood
The first question is, what version of Solr are you using?

Thanks,
Stu


-Original Message-
From: Kasi Sankaralingam [EMAIL PROTECTED]
Sent: Tuesday, November 13, 2007 2:27pm
To: solr-user@lucene.apache.org
Subject: Exception in SOLR when querying for fields of type string

Hi,

I am running into a NullPointerException on the Solr side when I do the
following:

a)  Define a dynamic field in the schema of type string (say title_s)

b)  Do a query in the Solr admin tool: title_s: photo book

I get a null pointer exception when I run a search query on this.

If I enclose the search term within double quotes, like "photo book", it works
fine.
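
For reference, the quoted form as typed in the admin tool, and as it would be URL-encoded in a raw request:

    q=title_s:"photo book"
    q=title_s%3A%22photo+book%22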

Any ideas?

Thanks,

kasi




RE: Solr + Tomcat Undeploy Leaks

2007-10-18 Thread Stu Hood
Any ideas?

Has anyone experienced this problem with other containers? I'm not tied to
Tomcat if I can find another servlet host with a REST API for deploying apps.

Thanks,
Stu

-Original Message-
From: Stu Hood [EMAIL PROTECTED]
Sent: Wednesday, October 17, 2007 4:46pm
To: solr-user@lucene.apache.org
Subject: Solr + Tomcat Undeploy Leaks

Hello,

I'm using the Tomcat Manager app with 6.0.14 to start and stop Solr instances, 
and I believe I am running into a variant of the linked issue:

http://wiki.apache.org/jakarta-commons/Logging/UndeployMemoryLeak?action=print

According to `top`, the 'size' of the Tomcat process reaches the limit I have 
set for it with the Java -Xmx flag soon after starting and launching a few 
instances. The 'RSS' varies based on how full the caches are at any particular 
time, but I don't think it ever reaches the 'size'.

After a few days, I will get OOM errors in the logs when I try and start new 
instances (note: this is typically in the middle of the night, when usage is 
low), and all of the instances will stop responding until I (hard) restart 
Tomcat.



Has anyone run into this issue before? Is logging the culprit? If so, what 
options do I have (besides setting up a cron job to restart Tomcat nightly...)

Thanks,

Stu Hood
Webmail.us
You manage your business. We'll manage your email.®





Re: Solr + Tomcat Undeploy Leaks

2007-10-18 Thread Stu Hood
I'm running SVN r583865 (1.3-dev).

Mike: when you say 'process level management', do you mean starting them 
statically? Or starting them dynamically, but using a different container for 
each instance?

A little explanation is probably in order:
--
We're using Solr to provide log search capability to our customer service 
department. We keep 7 days of logs available by starting up a new instance 
every 6 hours or so and stopping the oldest instance. We merge in pre-indexed 
logs every 10 minutes.

We have a few nodes providing search, so we have one machine that schedules 
Solr instances on the nodes with Tomcat Manager, and all instances are searched 
simultaneously using the patch from SOLR-303.

An option I'm considering is keeping a static number of instances on each node, 
and doing an id:[* TO *] delete when an instance needs to be reused, but I'd
rather figure this bug out than refactor the scheduling/merging code for static 
instances. Java ought to be able to GC its way out of situations like this...
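
That reset would be a single delete-by-query followed by a commit, POSTed to the update handler; a sketch, assuming a local instance:

    curl http://localhost:8983/solr/update -H 'Content-type: text/xml; charset=utf-8' \
      --data-binary '<delete><query>id:[* TO *]</query></delete>'
    curl http://localhost:8983/solr/update -H 'Content-type: text/xml; charset=utf-8' \
      --data-binary '<commit/>'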

Thanks,
Stu


-Original Message-
From: Tom Hill [EMAIL PROTECTED]
Sent: Thursday, October 18, 2007 3:34pm
To: solr-user@lucene.apache.org
Subject: Re: Solr + Tomcat Undeploy Leaks

I certainly have seen memory problems when I just drop a new war file in
place. So now I usually stop tomcat and restart.

I used to see problems (pre-1.0) when I just redeployed repeatedly, without
even accessing the app, but I've got a little script running in the
background that has done that 50 times now, without running out of space.
Are you on a current version? I'm on 1.2.

Tom

On 10/18/07, Mike Klaas [EMAIL PROTECTED] wrote:

 I'm not sure that many people are dynamically taking down/starting up
 Solr webapps in servlet containers.  I certainly prefer process-level
 management of my (many) Solr instances.

 -Mike

 On 18-Oct-07, at 10:40 AM, Stu Hood wrote:

  Any ideas?
 
  Has anyone experienced this problem with other containers? I'm
  not tied to Tomcat if I can find another servlet host with a REST
  API for deploying apps.
 
  Thanks,
  Stu
 
  -Original Message-
  From: Stu Hood [EMAIL PROTECTED]
  Sent: Wednesday, October 17, 2007 4:46pm
  To: solr-user@lucene.apache.org
  Subject: Solr + Tomcat Undeploy Leaks
 
  Hello,
 
  I'm using the Tomcat Manager app with 6.0.14 to start and stop Solr
  instances, and I believe I am running into a variant of the linked
  issue:
 
  http://wiki.apache.org/jakarta-commons/Logging/UndeployMemoryLeak?
  action=print
 
  According to `top`, the 'size' of the Tomcat process reaches the
  limit I have set for it with the Java -Xmx flag soon after starting
  and launching a few instances. The 'RSS' varies based on how full
  the caches are at any particular time, but I don't think it ever
  reaches the 'size'.
 
  After a few days, I will get OOM errors in the logs when I try and
  start new instances (note: this is typically in the middle of the
  night, when usage is low), and all of the instances will stop
  responding until I (hard) restart Tomcat.
 
  
 
  Has anyone run into this issue before? Is logging the culprit? If
  so, what options do I have (besides setting up a cron job to
  restart Tomcat nightly...)
 
  Thanks,
 
  Stu Hood
  Webmail.us
  You manage your business. We'll manage your email.(R)
 
 
 






Solr + Tomcat Undeploy Leaks

2007-10-17 Thread Stu Hood
Hello,

I'm using the Tomcat Manager app with 6.0.14 to start and stop Solr instances, 
and I believe I am running into a variant of the linked issue:

http://wiki.apache.org/jakarta-commons/Logging/UndeployMemoryLeak?action=print

According to `top`, the 'size' of the Tomcat process reaches the limit I have 
set for it with the Java -Xmx flag soon after starting and launching a few 
instances. The 'RSS' varies based on how full the caches are at any particular 
time, but I don't think it ever reaches the 'size'.

After a few days, I will get OOM errors in the logs when I try and start new 
instances (note: this is typically in the middle of the night, when usage is 
low), and all of the instances will stop responding until I (hard) restart 
Tomcat.



Has anyone run into this issue before? Is logging the culprit? If so, what 
options do I have (besides setting up a cron job to restart Tomcat nightly...)

Thanks,

Stu Hood
Webmail.us
You manage your business. We'll manage your email.®



Re: Facets and running out of Heap Space

2007-10-09 Thread Stu Hood
 Using the filter cache method on things like media type and
 location will occupy ~2.3MB of memory _per unique value_

Mike, how did you calculate that value? I'm trying to tune my caches, and any 
equations that could be used to determine some balanced settings would be 
extremely helpful. I'm in a memory limited environment, so I can't afford to 
throw a ton of cache at the problem.

(I don't want to thread-jack, but I'm also wondering whether anyone has any 
notes on how to tune cache sizes for the filterCache, queryResultCache and 
documentCache).

Thanks,
Stu


-Original Message-
From: Mike Klaas [EMAIL PROTECTED]
Sent: Tuesday, October 9, 2007 9:30pm
To: solr-user@lucene.apache.org
Subject: Re: Facets and running out of Heap Space

On 9-Oct-07, at 12:36 PM, David Whalen wrote:

(snip)
 I'm sure we could stop storing many of these columns, especially
 if someone told me that would make a big difference.

I don't think that it would make a difference in memory consumption,  
but storage is certainly not necessary for faceting.  Extra stored  
fields can slow down search if they are large (in terms of bytes),  
but don't really occupy extra memory, unless they are polluting the  
doc cache.  Does 'text' need to be stored?

 what does the LukeReqeust Handler tell you about the # of
 distinct terms in each field that you facet on?

 Where would I find that?  I could probably estimate that myself
 on a per-column basis.  it ranges from 4 distinct values for
 media_type to 30-ish for location to 200-ish for country_code
 to almost 10,000 for site_id to almost 100,000 for journalist_id.

Using the filter cache method on things like media type and
location will occupy ~2.3MB of memory _per unique value_, so it
should be a net win for those (although quite close in space  
requirements for a 30-ary field on your index size).

-Mike
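
A plausible basis for that figure, assuming each filterCache entry is stored as an uncompressed bitset with one bit per document in the index:

    memory per cached filter ≈ maxDoc / 8 bytes
    2.3 MB ≈ 2,400,000 bytes  →  maxDoc ≈ 19 million documents

Faceting a field through the filter cache needs one such entry per unique value in the field, hence "per unique value".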


Cache Memory Usage (was: Facets and running out of Heap Space)

2007-10-09 Thread Stu Hood
Sorry... where do the unique values come into the equation?



Also, you say that the queryResultCache memory usage is very low... how
could this be when it is storing the same information as the
filterCache, but with the addition of sorting?



Your answers are very helpful, thanks!

Stu Hood
Webmail.us
You manage your business. We'll manage your email.®

Re: How can i make a distribute search on Solr?

2007-09-19 Thread Stu Hood
Nutch implements federated search separately from their index generation.

My understanding is that MapReduce jobs generate the indexes (Nutch calls them 
segments) from raw data that has been downloaded, and then makes them available 
to be searched via remote procedure calls. Queries never pass through MapReduce 
in any shape or form, only the raw data and indexes.

If you take a look at the org.apache.nutch.searcher.DistributedSearch class, 
specifically the #Client.search method, you can see how they handle the actual 
federation of results.

Thanks,
Stu


-Original Message-
From: Norberto Meijome 
Sent: Wednesday, September 19, 2007 10:23am
To: solr-user@lucene.apache.org
Cc: [EMAIL PROTECTED]
Subject: Re: How can i make a distribute search on Solr?

On Wed, 19 Sep 2007 01:46:53 -0400
Ryan McKinley  wrote:

 Stu is referring to Federated Search - where each index has some of the 
 data and results are combined before they are returned.  This is not yet 
 supported out of the box

Maybe this is related. How does this compare to the map-reduce functionality in 
Nutch/Hadoop ? 
cheers,
B

_
{Beto|Norberto|Numard} Meijome

With sufficient thrust, pigs fly just fine. However, this is not necessarily a 
good idea. 
It is hard to be sure where they are going to land, and it could be dangerous 
sitting under them as they fly overhead.
   [RFC1925 - section 2, subsection 3]

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


RE: How can i make a distribute search on Solr?

2007-09-18 Thread Stu Hood
There are two federated/distributed search implementations that are still a few 
weeks away from maturity:
https://issues.apache.org/jira/browse/SOLR-255
https://issues.apache.org/jira/browse/SOLR-303

Any help in testing them would definitely be appreciated.
BUT, if you decide to roll your own, take a look at the following wiki page for 
details on the complexity of the task:
http://wiki.apache.org/solr/FederatedSearch
Good luck!


Thanks,
Stu


-Original Message-
From: 过佳 
Sent: Wednesday, September 19, 2007 12:24am
To: solr-user@lucene.apache.org
Subject: How can i make a distribute search on Solr?

Hi everyone,

I successfully set up Collection Distribution on two Linux servers - one
master with one slave - and synced the index data.

How can I make a search request to the master server and receive the
responses from all slave servers? Or does this have to be manually controlled?



Thanks & Best Regards.



Jarvis .


RE: Solr - rudimentary problems

2007-09-16 Thread Stu Hood
With regards to #3, it is recommended that for faceting, you use a separate 
copy of the field with stemming/tokenizing disabled. See : 
http://wiki.apache.org/solr/SolrFacetingOverview#head-fc68926c8421055de872acc694a6a966fab705d6
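
A sketch of that arrangement in schema.xml, with hypothetical field names (the string type is neither tokenized nor stemmed, so facet values come back whole):

    <field name="title" type="text" indexed="true" stored="true"/>
    <field name="title_exact" type="string" indexed="true" stored="false"/>
    <copyField source="title" dest="title_exact"/>

You then facet on title_exact (facet.field=title_exact) while still searching against title.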

Thanks,
Stu


-Original Message-
From: Venkatraman S 
Sent: Monday, September 17, 2007 1:05am
To: solr-user@lucene.apache.org
Subject: Solr - rudimentary problems

We are using Lucene and are migrating to Solr 1.2 (we are using Embedded
Solr). During this process we are stumbling on certain problems :

1) If the same document is added again, it is getting added to the
index again (duplicated), in spite of the fact that the IDs are unique across
documents. This document should be updated in the index.
 The corresponding entry for this field in schema.xml is:

 <field ... stored="true" multiValued="false" required="true"/>

2) Also, at the time of deleting a document by providing its ID (exactly
like the deleteById call in the Embedded Solr example), we find that
the document is not getting deleted (and we also do not get any errors).

3) While using facets, we are getting the stemmed versions of the
corresponding words in the faceted fields - how do we get the 'original'
word?
As in, 'intenti' for 'intentional', etc.

As I am new to Solr and did not find any documentation on this (or anything
on JIRA), I have posted these here. Any help would be highly appreciated.

-Venkat

--


RE: Re: multiple solr home directories

2007-08-31 Thread Stu Hood
You can use a combination of the Tomcat Manager app: 
http://tomcat.apache.org/tomcat-6.0-doc/manager-howto.html and this patch: 
https://issues.apache.org/jira/browse/SOLR-336 to create instances on the fly.

My three types of instances have separate home directories, but each running 
instance uses a different data directory.
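
A sketch of how those pieces can fit together: a per-instance context file points a shared solr home at the instance via JNDI (the per-instance data directory override is what the SOLR-336 patch adds), and the Manager deploys it. All paths, names, and the port here are hypothetical:

    <!-- /data/contexts/inst1.xml -->
    <Context docBase="/opt/solr/dist/solr.war" debug="0" crossContext="true">
      <Environment name="solr/home" type="java.lang.String"
                   value="/opt/solr/homes/typeA" override="true"/>
    </Context>

    http://localhost:8080/manager/deploy?path=/inst1&config=file:/data/contexts/inst1.xml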

Thanks,
Stu


-Original Message-
From: Ozgur Yilmazel 
Sent: Friday, August 31, 2007 4:48am
To: solr-user@lucene.apache.org
Subject: Re: multiple solr home directories

I have a related question on this topic. I have a web application
which I would like to create indexes for individual users on the fly,
is it possible to do JNDI configuration without restarting Tomcat?
Here is some more detail on what I am trying to do:
Our search application has a web based administration page in which
administrators can select set of documents and make them available for
search on different URLs or with different user privileges. I know we
could use the same index and filter results based on a indexname
field, but having separate indexes would make it easy for us to
migrate an index to a different machine easier.

Thank you for your help.

Ozgur




On 8/31/07, Chris Hostetter  wrote:

  Just to make sure: you mean we can create a directory containing the shared
  jars, and each solr home/lib will symlink to the jar files in that
  directory. Right?

 correct.


 -Hoss



RE: Too many open files

2007-08-09 Thread Stu Hood


If you check out the documentation for mergeFactor, you'll find that adjusting
it downward can lower the number of open files. Just remember that it is a
speed tradeoff, and only lower it as much as you need to in order to stop
getting the 'too many open files' errors.

See this section:
http://www.onjava.com/pub/a/onjava/2003/03/05/lucene.html#indexing_speed
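
As a rough rule of thumb, assuming compound files are disabled: a Lucene segment consists of roughly 7-8 files (.fnm, .fdt, .fdx, .tis, .tii, .frq, .prx, .nrm), so:

    open files per index ≈ number of segments × ~8
    number of segments can approach mergeFactor × number of merge levels

which is why lowering mergeFactor (or enabling useCompoundFile, as in the reply below) reduces the descriptor count.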

Thanks,
Stu

-Original Message-
From: Ard Schrijvers [EMAIL PROTECTED]
Sent: Thu, August 9, 2007 10:52 am
To: solr-user@lucene.apache.org
Subject: RE: Too many open files

Hello,

Setting useCompoundFile to true should avoid the problem. You could also try to
raise the maximum open files limit, something like (I assume Linux)

ulimit -n 8192

Ard


 
 You're a gentleman and a scholar.  I will donate the M&Ms to
 myself :).
 Can you tell me from this snippet of my solrconfig.xml what I might
 tweak to make this more betterer?
 
 -KH
 
   
 <indexDefaults> <!-- default unless overridden. -->
   <useCompoundFile>false</useCompoundFile>
   <mergeFactor>10</mergeFactor>
   <maxBufferedDocs>1000</maxBufferedDocs>
   <maxMergeDocs>2147483647</maxMergeDocs>
   <maxFieldLength>10000</maxFieldLength>
   <writeLockTimeout>1000</writeLockTimeout>
   <commitLockTimeout>10000</commitLockTimeout>
 </indexDefaults>
   
 


Re: Optimize index

2007-08-08 Thread Stu Hood


While we're on the subject of optimizing: Are there any benefits to optimizing 
an index before merging it into another index?

Thanks,
Stu


-Original Message-
From: Mike Klaas [EMAIL PROTECTED]
Sent: Wed, August 8, 2007 5:16 pm
To: solr-user@lucene.apache.org
Subject: Re: Optimize index

On 8-Aug-07, at 2:09 PM, Jae Joo wrote:

 How about standard format optimization?
 Jae

Optimized indexes are always faster at query time than their non-
optimized counterparts.  Sometimes significantly so.

-Mike


Logging in Solr Embedded

2007-08-01 Thread Stu Hood


Hello,

I've been using Solr in an embedded situation, and it's been working quite well.
But as I've started scaling up the application, the logging that Solr does to 
stderr is getting excessive.

I'd like to disable the INFO messages, and leave the WARNINGs. According to the 
Java API I should be able to use `SolrCore.log.setLevel(Level.WARNING)`, but 
that doesn't seem to stem the tide. Neither does 
`Config.log.setLevel(Level.WARNING)`.

Is there another log object that I've missed?
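
Since Solr at this point logs through java.util.logging, one approach that may work is raising the level on the parent "org.apache.solr" logger, which every per-class Solr logger inherits from, rather than on SolrCore.log alone; a sketch:

    import java.util.logging.Level;
    import java.util.logging.Logger;

    public class QuietSolr {
        // Hold a reference so the configured logger can't be garbage collected.
        private static final Logger SOLR_LOG = Logger.getLogger("org.apache.solr");

        public static void quiet() {
            // Children like org.apache.solr.core.SolrCore inherit this level,
            // so INFO from Solr is dropped while WARNINGs still get through.
            SOLR_LOG.setLevel(Level.WARNING);
        }
    }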

Thanks!

Stu Hood
Webmail.us
You manage your business. We'll manage your email.®

Re: Problems with embedded Solr

2007-07-26 Thread Stu Hood


I've filed this as https://issues.apache.org/jira/browse/SOLR-320

Sorry for the mixups!

Stu



-Original Message-
From: Ryan McKinley [EMAIL PROTECTED]
Sent: Thu, July 26, 2007 1:27 am
To: solr-user@lucene.apache.org
Subject: Re: Problems with embedded Solr

no attachments came through...

Off hand, SolrCore.close() should not exit the program, it just closes 
the searchers and cleans up after itself.

System.exit(0);

will terminate the program.


Stu Hood wrote:
 
 I'll try that again... (don't let my e-mail failures reflect badly on 
 Webmail.us =) 
 
 Hey everyone,
 
 I'm having some trouble getting embedded Solr working exactly as I'd like.
 I've attached some code that almost works how I want it to, except that
 even after calling SolrCore.close(), the program doesn't exit.
 
 Take a look in SolrRunner.java and see if you can spot any problems with the
 code. If not, edit the `run` script, and change the SOLR_INSTALL path
 to point to a copy of Solr. Then execute `run` from inside the
 solr-runner directory, and you should see that everything executes as
 expected, but doesn't exit when it is finished.
 
 Any ideas?
 
 Thanks!
 
 Stu Hood
 Webmail.us
 You manage your business. We'll manage your email.®



Re: Problems with embedded Solr

2007-07-26 Thread Stu Hood


The previous message that came through with no body holds the attachment.

I believe my issue is due to threads... after using JDB on the app I attached, 
I noticed one of the threads created by the SolrCore doesn't die after the core 
is closed. I need the thread to die on its own, because calling System.exit is 
not an option for my application.

Is there a step I'm missing?

Thanks,
Stu


-Original Message-
From: Ryan McKinley [EMAIL PROTECTED]
Sent: Thu, July 26, 2007 1:27 am
To: solr-user@lucene.apache.org
Subject: Re: Problems with embedded Solr

no attachments came through...

Off hand, SolrCore.close() should not exit the program, it just closes 
the searchers and cleans up after itself.

System.exit(0);

will terminate the program.


Stu Hood wrote:
 
 I'll try that again... (don't let my e-mail failures reflect badly on 
 Webmail.us =) 
 
 Hey everyone,
 
 I'm having some trouble getting embedded Solr working exactly as I'd like.
 I've attached some code that almost works how I want it to, except that
 even after calling SolrCore.close(), the program doesn't exit.
 
 Take a look in SolrRunner.java and see if you can spot any problems with the
 code. If not, edit the `run` script, and change the SOLR_INSTALL path
 to point to a copy of Solr. Then execute `run` from inside the
 solr-runner directory, and you should see that everything executes as
 expected, but doesn't exit when it is finished.
 
 Any ideas?
 
 Thanks!
 
 Stu Hood
 Webmail.us
 You manage your business. We'll manage your email.®



Problems with embedded Solr

2007-07-25 Thread Stu Hood


solr-runner.tgz
Description: application/compressed-tar


Problems with embedded Solr

2007-07-25 Thread Stu Hood


I'll try that again... (don't let my e-mail failures reflect badly on 
Webmail.us =) 

Hey everyone,

I'm having some trouble getting embedded Solr working exactly as I'd like.
I've attached some code that almost works how I want it to, except that
even after calling SolrCore.close(), the program doesn't exit.

Take a look in SolrRunner.java and see if you can spot any problems with the
code. If not, edit the `run` script, and change the SOLR_INSTALL path
to point to a copy of Solr. Then execute `run` from inside the
solr-runner directory, and you should see that everything executes as
expected, but doesn't exit when it is finished.

Any ideas?

Thanks!

Stu Hood
Webmail.us
You manage your business. We'll manage your email.®

Merging Solr Collections

2007-07-11 Thread Stu Hood


Hello,

I'm considering using embedded Solr in a distributed manner, such that an 
intermediate result will be N separate Solr indexes which would then be merged 
to a single final index.

I know that Lucene can merge indexes ( 
http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/index/IndexWriter.html#addIndexes(org.apache.lucene.store.Directory[])
 ), but I also know that there is more to a Solr Collection than the index.

Does anyone have any suggestions for merging Solr Collections? Or should I 
forget this plan of attack...

Thanks,

Stu Hood
Webmail.us
You manage your business. We'll manage your email.®

Re: Merging Solr Collections

2007-07-11 Thread Stu Hood


Thanks Otis,

So Lucene's functions for merging indexes will correctly merge Solr 
Collections, without losing stored data? Thats good news!

Webmail.us is currently using vanilla Lucene for indexing our customer e-mail, 
which is working just fine. But our Log Search implementation isn't scalable 
enough to handle our growth, so I've been charged with finding a better 
solution for keeping track of the 60GB+ of log data we generate daily.

Solr is definitely being considered.

Thanks!
Stu


-Original Message-
From: Otis Gospodnetic [EMAIL PROTECTED]
Sent: Wed, July 11, 2007 11:44 am
To: solr-user@lucene.apache.org
Subject: Re: Merging Solr Collections

Hi Stu,

You can simply take your N Lucene indices managed by your N Solr instances and 
merge them into 1 Lucene index.  Then you can take your N Solr schema.xml's and 
solrconfig.xml's and merge them into a single schema.xml and a single 
solrconfig.xml that act as a union of the previous N configs.  Point your Solr 
to a newly merged index and use the newly merged configs, and I think you 
should be all set.

Is Webmail.us using Solr for email search? 


Otis
--
Lucene Consulting -- http://lucene-consulting.com/



- Original Message 
From: Stu Hood 
To: solr-user@lucene.apache.org
Sent: Wednesday, July 11, 2007 5:02:07 PM
Subject: Merging Solr Collections



Hello,

I'm considering using embedded Solr in a distributed manner, such that an 
intermediate result will be N separate Solr indexes which would then be merged 
to a single final index.

I know that Lucene can merge indexes ( 
http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/index/IndexWriter.html#addIndexes(org.apache.lucene.store.Directory[])
 ), but I also know that there is more to a Solr Collection than the index.

Does anyone have any suggestions for merging Solr Collections? Or should I 
forget this plan of attack...

Thanks,

Stu Hood
Webmail.us
You manage your business. We'll manage your email.®




Date range problem

2007-06-25 Thread Stu Hood
Hello,

Searching by date ranges doesn't seem to work in the example Solr install. A
query like `timestamp:[20070101 TO 20080101]` returns:

message: Invalid Date String:'20070101'
description: The request sent by the client was syntactically incorrect
(Invalid Date String:'20070101').

That query should be valid according to
http://lucene.apache.org/java/docs/queryparsersyntax.html#Range%20Searches

Any ideas?

Stu Hood
Webmail.us
"You manage your business. We'll manage your email."®



Re: Date range problem

2007-06-25 Thread Stu Hood
Ok, the full time format works fine (and quickly too). Thanks for the quick
answers!

Stu

-Original Message-
From: Ryan McKinley <[EMAIL PROTECTED]>
Sent: Mon, June 25, 2007 1:19pm
To: solr-user@lucene.apache.org
Subject: Re: Date range problem

the solr date format is a bit more strict (ISO 8601):

yyyy-MM-dd'T'HH:mm:ss.SSS

there is talk of a more lenient date parser, but nothing exists yet..

The format you suggest would be ok if you index your dates as a string
'20070101' and then use a range query.

Stu Hood wrote:
 Hello,
 Searching by date ranges doesn't seem to work in the example Solr
 install. A query like `timestamp:[20070101 TO 20080101]` returns:
 message: Invalid Date String:'20070101'
 description: The request sent by the client was syntactically
 incorrect (Invalid Date String:'20070101').
 That query should be valid according to
 http://lucene.apache.org/java/docs/queryparsersyntax.html#Range%20Searches
 Any ideas?
 Stu Hood
 Webmail.us
 "You manage your business. We'll manage your email."®
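
For reference, a legal range query against the example schema's timestamp field spells out full ISO 8601 dates with the trailing 'Z':

    timestamp:[2007-01-01T00:00:00Z TO 2008-01-01T00:00:00Z]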