corrupt solr index on ec2

2008-10-30 Thread Bill Graham
Hi,

I've been running Solr 1.3 on an EC2 instance for a couple of weeks and I've 
had some stability issues. It seems like I need to bounce the app once a day. 
That I could live with and ultimately maybe troubleshoot, but what's more 
disturbing is that three times in the last 2 weeks my index has been corrupted, 
at which point FileNotFoundExceptions started to appear. 

I'm running in Jetty and had my index on the local file system until I lost the 
index the first time. Then I moved it to my mounted EBS volume so I could 
restore from a snapshot if needed. I'm wondering if perhaps there are issues 
with the locking mechanism on either the local directory (which is really a 
virtual instance), or the mounted XFS volume. Has anyone seen this, or have 
suggestions regarding the cause? I'm using the single lockType.

I'm running a single solr instance that gets frequent updates from multiple 
threads, and commits about every hour. 

A few things I see in the logs:

- From time to time I see write lock timeouts:
SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed 
out: SingleInstanceLock: write.lock

- I've seen OOM exceptions during warming. I've changed maxWarmingSearchers=1, 
which I suspect will do the trick

- Then finally, this is what I found in the logs today when the index got corrupted:

Oct 29, 2008 12:18:39 AM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start commit(optimize=false,waitFlush=true,waitSearcher=true)
Oct 29, 2008 12:18:41 AM org.apache.solr.common.SolrException log
SEVERE: java.lang.RuntimeException: java.io.FileNotFoundException: /var/local/solr/data/production/index/_2rv.fdt (No such file or directory)
        at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:960)
        at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:368)
        at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:77)
        at org.apache.solr.handler.XmlUpdateRequestHandler.processUpdate(XmlUpdateRequestHandler.java:226)
        at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:123)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
        at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
        at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
        at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
        at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
        at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
        at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
        at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
        at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
        at org.mortbay.jetty.Server.handle(Server.java:285)
        at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
        at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:202)
        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
        at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
        at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
Caused by: java.io.FileNotFoundException: /var/local/solr/data/production/index/_2rv.fdt (No such file or directory)
        at java.io.RandomAccessFile.open(Native Method)
        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
        at org.apache.lucene.store.FSDirectory$FSIndexInput$Descriptor.<init>(FSDirectory.java:552)
        at org.apache.lucene.store.FSDirectory$FSIndexInput.<init>(FSDirectory.java:582)
        at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:488)
        at org.apache.lucene.index.FieldsReader.<init>(FieldsReader.java:77)
        at org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:355)
        at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:304)
        at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:226)
        at org.apache.lucene.index.MultiSegmentReader.<init>(MultiSegmentReader.java:56)
        at org.apache.lucene.index.ReadOnlyMultiSegmentReader.<init>(ReadOnlyMultiSegmentReader.java:27)
        at ...

Re: Highlighting and fields

2008-10-30 Thread christophe

Hi Lars,

Thanks for it: it works great.

BR
Christophe

Lars Kotthoff wrote:

I'm doing the following query:
q=text:abc AND type:typeA
And I ask it to return highlighting (query.setHighlight(true);). The search 
term for the type field (typeA) is also highlighted in the text field.

Any way to avoid this?



Use setHighlightRequireFieldMatch(true) on the query object [1].

Lars


[1] 
http://lucene.apache.org/solr/api/org/apache/solr/client/solrj/SolrQuery.html#setHighlightRequireFieldMatch(boolean)
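
A minimal SolrJ sketch of that fix, using the query from the question above
(the server URL is illustrative, and exception handling is via throws):

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.client.solrj.response.QueryResponse;

  public class HighlightExample {
      public static void main(String[] args) throws Exception {
          CommonsHttpSolrServer server =
              new CommonsHttpSolrServer("http://localhost:8983/solr");

          SolrQuery query = new SolrQuery("text:abc AND type:typeA");
          query.setHighlight(true);
          query.addHighlightField("text");
          // Only highlight terms that matched in the highlighted field
          // itself, so typeA no longer lights up inside the text field.
          query.setHighlightRequireFieldMatch(true);

          QueryResponse response = server.query(query);
          System.out.println(response.getHighlighting());
      }
  }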
  


How to get the min and max values from facets?

2008-10-30 Thread Vincent Pérès

Hello,

I'm using Solr 1.3. I would like to get only the minimum and maximum values from
a facet.
I'm using a range to get the results: [value TO value], and I don't
need the full facet list in my XML results (which could contain more than a
hundred thousand entries)... so I have to display the range (minimum and
maximum values) of a facet. Is there any way to do that?
I found the new statistics component, at this link:
http://wiki.apache.org/solr/StatsComponent
But it's for Solr 1.4.

Does anyone have any idea?

Thank you !
Vincent



Performance Lucene / Solr

2008-10-30 Thread Kraus, Ralf | pixelhouse GmbH

Hello,

I have been evaluating Solr 1.3 for about 3 weeks now... My goal is to
migrate from Lucene to Solr because of the much better plugins and search
functions.

Right now I am stress testing performance by sending 2500 search 
requests via the JSON protocol from my PHPUnit test case.

All search requests are different, so caching doesn't do it for me.
Right now our old Lucene JSPs are about 4 times faster than my Solr 
solution :-(


Any chance I can tweak my solrconfig.xml?

Greets -Ralf-


Re: Performance Lucene / Solr

2008-10-30 Thread Shalin Shekhar Mangar
On Thu, Oct 30, 2008 at 4:12 PM, Kraus, Ralf | pixelhouse GmbH 
[EMAIL PROTECTED] wrote:

 I have been evaluating Solr 1.3 for about 3 weeks now... My goal is to
 migrate from Lucene to Solr because of the much better plugins and search
 functions.


Very nice!


 Right now I am stress testing performance by sending 2500 search
 requests via the JSON protocol from my PHPUnit test case.
 All search requests are different, so caching doesn't do it for me.
 Right now our old Lucene JSPs are about 4 times faster than my Solr
 solution :-(


Well, with Lucene it is an API call in the same JVM in the same web
application. With Solr, you are making HTTP calls across the network,
serializing requests and de-serializing responses. So the comparison is not
exactly apples to apples.

Look at what Solr offers -- replication, caching, plugins etc. Will you
really need to go over 2500 requests per second? Do you need to be concerned
with performance above and beyond that? Will it be easier to scale out to
more boxes?

-- 
Regards,
Shalin Shekhar Mangar.


Max Number of Facets

2008-10-30 Thread Jeryl Cook
Is there a limit on the number of facets that I can create in
Solr? (Dynamically generated facets.)

-- 
Jeryl Cook
/^\ Pharaoh /^\
http://pharaohofkush.blogspot.com/
Whether we bring our enemies to justice, or bring justice to our
enemies, justice will be done.
--George W. Bush, Address to a Joint Session of Congress and the
American People, September 20, 2001


Re: Performance Lucene / Solr

2008-10-30 Thread Kraus, Ralf | pixelhouse GmbH

Mark Miller schrieb:

 Right now I am stress testing performance and sending 2500 search
 requests via the JSON protocol from my PHPUnit test case.
 All search requests are different, so caching doesn't do it for me.
 Right now our old Lucene JSPs are about 4 times faster than my Solr
 solution :-(

 Well, with Lucene it is an API call in the same JVM in the same web
 application. With Solr, you are making HTTP calls across the network,
 serializing requests and de-serializing responses. So the comparison is not
 exactly apples to apples.

 Look at what Solr offers -- replication, caching, plugins etc. Will you
 really need to go over 2500 requests per second? Do you need to be concerned
 with performance above and beyond that? Will it be easier to scale out to
 more boxes?

And have you tried solrj without http?


Right now I am using these PHP classes to send and receive my requests:

- Apache_Solr_Service.php
- Response.php

It has the advantage that I don't need to write extra JSP or Java code...

Greets -Ralf-


Re: Performance Lucene / Solr

2008-10-30 Thread Mark Miller

Kraus, Ralf | pixelhouse GmbH wrote:

Mark Miller schrieb:
 [...]

And have you tried solrj without http?


Right now I am using these PHP classes to send and receive my requests:

- Apache_Solr_Service.php
- Response.php

It has the advantage that I don't need to write extra JSP or Java 
code...


Greets -Ralf-

I think it will have the disadvantage of being a lot slower though...

How were you handling things with Lucene? You must have used Java then? 
If you even want to get close to that performance I think you need to 
use non-HTTP embedded Solr.
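
For reference, a minimal sketch of the non-HTTP embedded approach Mark
mentions, following the SolrJ EmbeddedSolrServer pattern (the solr home path
and the empty core name are illustrative):

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
  import org.apache.solr.core.CoreContainer;

  public class EmbeddedExample {
      public static void main(String[] args) throws Exception {
          System.setProperty("solr.solr.home", "/path/to/solr/home");
          CoreContainer.Initializer initializer = new CoreContainer.Initializer();
          CoreContainer container = initializer.initialize();

          // Runs in the same JVM: no HTTP, no request/response serialization.
          SolrServer server = new EmbeddedSolrServer(container, "");
          System.out.println(server.query(new SolrQuery("text:abc")));

          container.shutdown();
      }
  }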


Re: Performance Lucene / Solr

2008-10-30 Thread Mark Miller

Shalin Shekhar Mangar wrote:

On Thu, Oct 30, 2008 at 4:12 PM, Kraus, Ralf | pixelhouse GmbH 
[EMAIL PROTECTED] wrote:

  

 [...]


Well, with Lucene it is an API call in the same JVM in the same web
application. With Solr, you are making HTTP calls across the network,
serializing requests and de-serializing responses. So the comparison is not
exactly apples to apples.

Look at what Solr offers -- replication, caching, plugins etc. Will you
really need to go over 2500 requests per second? Do you need to be concerned
with performance above and beyond that? Will it be easier to scale out to
more boxes?

  

And have you tried solrj without http?


Using Solrj

2008-10-30 Thread Raghunandan Rao
Hi,

I am trying to use Solrj for my web application. I am indexing a table
using the @Field annotation tag. Now I need to index or query multiple
tables. Like, get all the employees who are managers in the Finance
department (interacting with 3 entities). How do I do that?

 

Does anyone have any idea?

 

Thanks
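
For context, a minimal sketch of the SolrJ @Field-annotated bean approach
described above; the bean and field names are hypothetical, and assume the
three entities are denormalized into one document per employee:

  import org.apache.solr.client.solrj.beans.Field;

  public class Employee {
      @Field("id")
      String id;

      @Field("name")
      String name;

      @Field("department")      // denormalized from the department table
      String department;

      @Field("isManager")       // denormalized from the role table
      boolean isManager;
  }

  // Indexing:  server.addBean(employee); server.commit();
  // Querying the managers in Finance then becomes a single-index query:
  //   q=department:Finance AND isManager:true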



Re: Performance Lucene / Solr

2008-10-30 Thread Shalin Shekhar Mangar
On Thu, Oct 30, 2008 at 5:22 PM, Kraus, Ralf | pixelhouse GmbH 
[EMAIL PROTECTED] wrote:

 Right now I am using these PHP classes to send and receive my requests:

 - Apache_Solr_Service.php
 - Response.php

 It has the advantage that I don't need to write extra JSP or Java code...


Unfortunately, the PHP client in Solr does not take advantage of the binary
response format. It is supported only by SolrJ (java client). It can reduce
a lot of the overhead associated with JSON/XML parsing.
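
A sketch of enabling the binary (javabin) response format from SolrJ,
assuming SolrJ 1.3's BinaryResponseParser (URL illustrative; exception
handling omitted):

  import org.apache.solr.client.solrj.impl.BinaryResponseParser;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

  CommonsHttpSolrServer server =
      new CommonsHttpSolrServer("http://localhost:8983/solr");
  // Parse responses as wt=javabin instead of XML, skipping text parsing.
  server.setParser(new BinaryResponseParser());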

-- 
Regards,
Shalin Shekhar Mangar.


Re: where's the bottleneck

2008-10-30 Thread Yonik Seeley
On Thu, Oct 30, 2008 at 1:02 AM, Barnett, Jeffrey
[EMAIL PROTECTED] wrote:
 I thought it was turned off already.  ( Lucene vs Solr ?) Where do I make 
 this change?

Comment out this part in your solrconfig.xml:

<autoCommit>
  <maxDocs>2</maxDocs>
  <maxTime>4</maxTime>
</autoCommit>

-Yonik

 -Original Message-
 From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley
 Sent: Wednesday, October 29, 2008 11:28 PM
 To: solr-user@lucene.apache.org
 Subject: Re: where's the bottleneck

 On Wed, Oct 29, 2008 at 9:48 PM, Barnett, Jeffrey
 [EMAIL PROTECTED] wrote:
 Reported import rates start at 70 docs per second, and decrease as more 
 records are added.

 It might just be segment merges (that takes more time as segments grow in 
 size).
 From the solrconfig.xml I see you have autocommit turned on... try
 with it off and see if it helps.

 -Yonik



Re: Performance Lucene / Solr

2008-10-30 Thread Kraus, Ralf | pixelhouse GmbH

Mark Miller schrieb:

Kraus, Ralf | pixelhouse GmbH wrote:
 [...]

I think it will have the disadvantage of being a lot slower though...

How were you handling things with Lucene? You must have used Java 
then? If you even want to get close to that performance I think you 
need to use non-HTTP embedded Solr.


Okay okay :-) I am writing a new JSP handler for my requests as we speak 
:-) I really hope performance will be better than with {wt=javabin}

Greets -Ralf-


Re: Using Solrj

2008-10-30 Thread Noble Paul നോബിള്‍ नोब्ळ्
Hi,
There are two sides to this:
1. Indexing (getting data into Solr): SolrJ or DataImportHandler can be
used for this.
2. Querying (getting data out of Solr): here you do not have the choice
of joining multiple tables. There is only one index for Solr.
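
For the indexing side (point 1), a sketch of a DataImportHandler
data-config.xml that flattens several tables into one document with a SQL
join; the driver, tables, and field names are all hypothetical:

  <dataConfig>
    <dataSource driver="org.hsqldb.jdbcDriver" url="jdbc:hsqldb:/tmp/example" />
    <document>
      <entity name="employee"
              query="select e.id, e.name, d.name as department
                     from employee e join department d on e.dept_id = d.id">
        <field column="id"         name="id" />
        <field column="name"       name="name" />
        <field column="department" name="department" />
      </entity>
    </document>
  </dataConfig>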



On Thu, Oct 30, 2008 at 5:34 PM, Raghunandan Rao
[EMAIL PROTECTED] wrote:
 Hi,

 I am trying to use Solrj for my web application. I am indexing a table
 using the @Field annotation tag. Now I need to index or query multiple
 tables. Like, get all the employees who are managers in Finance
 department (interacting with 3 entities). How do I do that?



 Does anyone have any idea?



 Thanks





-- 
--Noble Paul


Re: Performance Lucene / Solr

2008-10-30 Thread Yonik Seeley
On Thu, Oct 30, 2008 at 8:39 AM, Kraus, Ralf | pixelhouse GmbH
[EMAIL PROTECTED] wrote:
 Okay okay :-) I am writing a new JSP handler for my requests as we speak :-)
 I really hope performance will be better than with {wt=javabin}

What are your requirements for requests/sec, and how many are you
getting from Solr now?
Is the bottleneck Solr, the network, or PHP?  If Solr and PHP are on
the same box, at least do a top to see where the CPU is going.
If most of the CPU is going to Solr, post some of the URLs you are
using to search - perhaps they can be optimized.

-Yonik


RE: Using Solrj

2008-10-30 Thread Raghunandan Rao
Thanks Noble. 

So you mean to say that I need to create a view according to my query and then 
index on the view and fetch? 

-Original Message-
From: Noble Paul നോബിള്‍ नोब्ळ् [mailto:[EMAIL PROTECTED] 
Sent: Thursday, October 30, 2008 6:16 PM
To: solr-user@lucene.apache.org
Subject: Re: Using Solrj

Hi,
There are two sides to this:
1. Indexing (getting data into Solr): SolrJ or DataImportHandler can be
used for this.
2. Querying (getting data out of Solr): here you do not have the choice
of joining multiple tables. There is only one index for Solr.



On Thu, Oct 30, 2008 at 5:34 PM, Raghunandan Rao
[EMAIL PROTECTED] wrote:
 Hi,

 I am trying to use Solrj for my web application. I am indexing a table
 using the @Field annotation tag. Now I need to index or query multiple
 tables. Like, get all the employees who are managers in Finance
 department (interacting with 3 entities). How do I do that?



 Does anyone have any idea?



 Thanks





-- 
--Noble Paul


Re: Performance Lucene / Solr

2008-10-30 Thread Mark Miller


 All search requests are different, so caching doesn't do it for me.

P.S. If caching is not helping you, turn it off. It costs to populate / 
maintain the cache, so if it's not helping, it's only hurting.





Re: Performance Lucene / Solr

2008-10-30 Thread Grant Ingersoll
Have you gone through http://wiki.apache.org/solr/SolrPerformanceFactors ?


Can you explain a little more about your test case, maybe even share 
code?  I only know a little PHP, but maybe someone else who is better 
versed might spot something.


On Oct 30, 2008, at 8:39 AM, Kraus, Ralf | pixelhouse GmbH wrote:


Mark Miller schrieb:
 [...]
 I think it will have the disadvantage of being a lot slower though...

 How were you handling things with Lucene? You must have used Java
 then? If you even want to get close to that performance I think you
 need to use non-HTTP embedded Solr.


Okay okay :-) I am writing a new JSP handler for my requests as we
speak :-) I really hope performance will be better than with
{wt=javabin}


Greets -Ralf-


--
Grant Ingersoll
Lucene Boot Camp Training Nov. 3-4, 2008, ApacheCon US New Orleans.
http://www.lucenebootcamp.com


Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ











Re: Using Solrj

2008-10-30 Thread Noble Paul നോബിള്‍ नोब्ळ्
Not really. If you explain your use case it will become more clear.

On Thu, Oct 30, 2008 at 6:20 PM, Raghunandan Rao
[EMAIL PROTECTED] wrote:
 Thanks Noble.

 So you mean to say that I need to create a view according to my query and 
 then index on the view and fetch?

 -Original Message-
 From: Noble Paul നോബിള്‍ नोब्ळ् [mailto:[EMAIL PROTECTED]
 Sent: Thursday, October 30, 2008 6:16 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Using Solrj

 Hi,
 There are two sides to this:
 1. Indexing (getting data into Solr): SolrJ or DataImportHandler can be
 used for this.
 2. Querying (getting data out of Solr): here you do not have the choice
 of joining multiple tables. There is only one index for Solr.



 On Thu, Oct 30, 2008 at 5:34 PM, Raghunandan Rao
 [EMAIL PROTECTED] wrote:
 Hi,

 I am trying to use Solrj for my web application. I am indexing a table
 using the @Field annotation tag. Now I need to index or query multiple
 tables. Like, get all the employees who are managers in Finance
 department (interacting with 3 entities). How do I do that?



 Does anyone have any idea?



 Thanks





 --
 --Noble Paul




-- 
--Noble Paul


ApacheCon Reminder

2008-10-30 Thread Grant Ingersoll
For those attending ApacheCon in New Orleans next week, the Lucene  
Search and Machine Learning Birds of a Feather (BOF) will be held  
Wednesday night.  Please indicate your interest at: http://wiki.apache.org/apachecon/BirdsOfaFeatherUs08


Also, note there are a number of Lucene/Solr/Mahout talks on 
Wednesday, and, of course, it's not too late to sign up for Lucene or 
Solr training.  See the ApacheCon website for more info: http://us.apachecon.com/c/acus2008/


See you there,
Grant


Re: Max Number of Facets

2008-10-30 Thread Yonik Seeley
On Thu, Oct 30, 2008 at 7:28 AM, Jeryl Cook [EMAIL PROTECTED] wrote:
 is there a limit on the number of facets that i can create in
 Solr?(dynamically generated facets.)

Not really. It's practically limited by CPU and memory, which can vary
widely with what the facet fields look like (number of unique terms,
whether it's multi-valued, etc).
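
For illustration, a faceted request of the kind being discussed (host and
field name are hypothetical):

  http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=category&facet.limit=-1&rows=0

Each additional facet.field costs CPU and memory roughly in proportion to
that field's number of unique terms.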

-Yonik


Re: Performance Lucene / Solr

2008-10-30 Thread Kraus, Ralf | pixelhouse GmbH

Grant Ingersoll schrieb:
Have you gone through 
http://wiki.apache.org/solr/SolrPerformanceFactors ?


Can you explain a little more about your testcase, maybe even share 
code?  I only know a little PHP, but maybe someone else who is better 
versed might spot something.

I just rewrote my JSP script to use SolrJ instead;
performance is much, much better now!

Greets -Ralf-


Re: Using Solrj

2008-10-30 Thread Erick Erickson
Generally, you need to get your head out of the database world and into
the search world to be successful with Lucene. For instance, one
of the cardinal tenets of database design is to normalize your
data. It goes against every instinct to *denormalize* your data when
creating a Lucene index, explicitly so you do NOT have to think
in terms of joins or sub-queries. Whenever I start thinking this
way, I try to back up and think again.

Both your posts indicate to me that you're thinking in database
terms. There are no views in Lucene, for instance. You refer
to tables. There are no tables in Lucene, there are only documents
with various numbers of fields. You could conceivably make your index
look like a database by creatively naming your document fields. But
that doesn't play to the strengths of Lucene *or* the database.

In fact, there is NO requirement that documents have the *same* fields,
which is really difficult to get used to when thinking like a DBA.

Lucene is designed to search text. Fast and well. It is NOT intended to
efficiently manipulate relationships *between* documents. There
are various hybrid solutions that people have used. That is, put the
data you really need to do text searching on in a Lucene index,
along with enough data to be able to get the *rest* of what you need
from your database. But it all depends upon the problem you're trying to
solve.

But as Noble says, all this is too general to be really useful; you need
to provide quite a bit more detail about the problem you're trying to
solve to get useful recommendations.

Best
Erick

On Thu, Oct 30, 2008 at 8:50 AM, Raghunandan Rao 
[EMAIL PROTECTED] wrote:

 Thanks Noble.

 So you mean to say that I need to create a view according to my query and
 then index on the view and fetch?

 -Original Message-
 From: Noble Paul നോബിള്‍ नोब्ळ् [mailto:[EMAIL PROTECTED]
 Sent: Thursday, October 30, 2008 6:16 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Using Solrj

 Hi,
 There are two sides to this:
 1. Indexing (getting data into Solr): SolrJ or DataImportHandler can be
 used for this.
 2. Querying (getting data out of Solr): here you do not have the choice
 of joining multiple tables. There is only one index for Solr.



 On Thu, Oct 30, 2008 at 5:34 PM, Raghunandan Rao
 [EMAIL PROTECTED] wrote:
  Hi,
 
  I am trying to use Solrj for my web application. I am indexing a table
  using the @Field annotation tag. Now I need to index or query multiple
  tables. Like, get all the employees who are managers in Finance
  department (interacting with 3 entities). How do I do that?
 
 
 
  Does anyone have any idea?
 
 
 
  Thanks
 
 



 --
 --Noble Paul



Re: replication handler - compression

2008-10-30 Thread Erik Hatcher

+1 - the GzipServletFilter is the way to go.

Regarding request handlers reading HTTP headers, yeah,... this will  
improve, for sure.


Erik

On Oct 30, 2008, at 12:18 AM, Chris Hostetter wrote:



: You are partially right. Instead of the HTTP header, we use a request
: parameter. (RequestHandlers cannot read HTTP headers). If the param is

hmmm, I'm with Walter: we shouldn't invent new mechanisms for
clients to request compression over HTTP from servers.

Replication is both special enough and important enough that if we had to
add special support to make that information available to the handler on
the master we could.

But frankly I don't think that's necessary: the logic to turn on
compression if the client requests it using Accept-Encoding: gzip is
generic enough that there is no reason for it to be in a handler.  We
could easily put it in the SolrDispatchFilter, or even in a new
ServletFilter (I'm guessing I've seen about 74 different implementations of
a GzipServletFilter in the wild that could be used as is).

Then we'd have double wins: compression for replication, and compression
of all responses generated by Solr if the client requests it.

-Hoss
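
For reference, a minimal sketch of the kind of GzipServletFilter being
discussed; this is illustrative, not any particular implementation, and it
ignores getWriter(), buffering, and error handling:

  import java.io.IOException;
  import java.util.zip.GZIPOutputStream;
  import javax.servlet.*;
  import javax.servlet.http.*;

  public class GzipFilter implements Filter {

      public void init(FilterConfig config) {}
      public void destroy() {}

      public void doFilter(ServletRequest req, ServletResponse res,
                           FilterChain chain)
              throws IOException, ServletException {
          HttpServletRequest request = (HttpServletRequest) req;
          HttpServletResponse response = (HttpServletResponse) res;
          String accept = request.getHeader("Accept-Encoding");

          // Client did not ask for gzip: pass the response through untouched.
          if (accept == null || accept.indexOf("gzip") < 0) {
              chain.doFilter(req, res);
              return;
          }

          response.setHeader("Content-Encoding", "gzip");
          final GZIPOutputStream gzip =
              new GZIPOutputStream(response.getOutputStream());

          // Wrap the response so everything the handler writes is
          // compressed on the fly.
          HttpServletResponse wrapped = new HttpServletResponseWrapper(response) {
              public ServletOutputStream getOutputStream() {
                  return new ServletOutputStream() {
                      public void write(int b) throws IOException {
                          gzip.write(b);
                      }
                  };
              }
          };

          chain.doFilter(req, wrapped);
          gzip.finish();
      }
  }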




RE: Performance Lucene / Solr

2008-10-30 Thread Feak, Todd
I realize you said caching won't help because the searches are
different, but what about document caching? Is every document returned
different? What's your hit rate on the document cache? Can you throw
memory at the problem by increasing the document cache size?

I ask all this, as the document cache was the biggest win for my
application when it came to increasing performance. Hit rates of 50%
resulted in 30% GC time. Hit rates above 95% had GC rates below 2%.
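
For reference, the documentCache is configured in solrconfig.xml; a sketch
with illustrative sizes:

  <documentCache
    class="solr.LRUCache"
    size="16384"
    initialSize="4096"
    autowarmCount="0"/>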

-Todd

-Original Message-
From: Kraus, Ralf | pixelhouse GmbH [mailto:[EMAIL PROTECTED] 
Sent: Thursday, October 30, 2008 6:18 AM
To: solr-user@lucene.apache.org
Subject: Re: Performance Lucene / Solr

Grant Ingersoll schrieb:
 Have you gone through 
 http://wiki.apache.org/solr/SolrPerformanceFactors ?

 Can you explain a little more about your testcase, maybe even share 
 code?  I only know a little PHP, but maybe someone else who is better 
 versed might spot something.
I just rewrote my JSP script to use SolrJ instead;
performance is much, much better now!

Greets -Ralf-



Re: replication handler - compression

2008-10-30 Thread Otis Gospodnetic
Yeah.  I'm just not sure how much benefit in terms of data transfer this will 
save.  Has anyone tested this to see if this is even worth it?


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Erik Hatcher [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Sent: Thursday, October 30, 2008 9:54:28 AM
 Subject: Re: replication handler - compression
 
 +1 - the GzipServletFilter is the way to go.
 
 Regarding request handlers reading HTTP headers, yeah,... this will improve, 
 for 
 sure.
 
 Erik
 
 On Oct 30, 2008, at 12:18 AM, Chris Hostetter wrote:
 
  
  [...]
  -Hoss



Re: corrupt solr index on ec2

2008-10-30 Thread Yonik Seeley
On Thu, Oct 30, 2008 at 2:06 AM, Bill Graham [EMAIL PROTECTED] wrote:
 I've been running Solr 1.3 on an EC2 instance for a couple of weeks and I've 
 had some stability issues. It seems like I need to bounce the app once a day. 
 That I could live with and ultimately maybe troubleshoot, but what's more 
 disturbing is that three times in the last 2 weeks my index has been 
 corrupted, at which point FileNotFoundExceptions started to appear.

 I'm running in Jetty and had my index on the local file system until I lost 
 the index the first time. Then I moved it to my mounted EBS volume so I could 
 restore from a snapshot if needed. I'm wondering if perhaps there are issues 
 with the locking mechanism on either the local directory (which is really a 
 virtual instance), or the mounted XFS volume. Has anyone seen this, or have 
 suggestions regarding the cause? I'm using the single lockType.

 I'm running a single Solr instance that gets frequent updates from multiple 
 threads, and commits about every hour.

 A few things I see in the logs:

 - From time to time I see write lock timeouts:
 SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed 
 out: SingleInstanceLock: write.lock

This is really strange.  It suggests that there is another in-process
writer that is holding the lock.  That should be impossible, unless
it's caused by a previous exception trying to open an IndexWriter and
the lock is simply stale.  What seems to be the first exception that
occurs?

Also, you might try changing the lock type from single to simple to
make it visible cross-process.  That would rule out another Solr
instance being started on the same index directory; opening two
writers on the same directory is one way to get missing files like you
appear to have.
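
A sketch of the solrconfig.xml change being suggested (the element lives in
the <indexDefaults> / <mainIndex> sections):

  <lockType>simple</lockType>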

 - I've seen OOM exceptions during warming. I've changed 
 maxWarmingSearchers=1, which I suspect will do the trick

OOM errors are really tricky - if they happen in the wrong place, it's
hard to recover from gracefully. Correctly cleaning up after an OOM
error in the IndexWriter recently got some little fixes in Lucene
trunk - you might want to try the latest dev version of Lucene and see
if it helps.


-Yonik


 - Then finally, this is what I found in the logs today when the index got 
 corrupted:

 Oct 29, 2008 12:18:39 AM org.apache.solr.update.DirectUpdateHandler2 commit
 INFO: start commit(optimize=false,waitFlush=true,waitSearcher=true)
 Oct 29, 2008 12:18:41 AM org.apache.solr.common.SolrException log
 SEVERE: java.lang.RuntimeException: java.io.FileNotFoundException: 
 /var/local/solr/data/production/index/_2rv.fdt (No such file or directory)
 [...]

Re: replication handler - compression

2008-10-30 Thread Walter Underwood
About a factor of 2 on a small, optimized index. Gzipping took 20 seconds,
so it isn't free.

$ cd index-copy
$ du -sk
134336  .
$ gzip *
$ du -sk
62084   .

wunder

On 10/30/08 8:20 AM, Otis Gospodnetic [EMAIL PROTECTED] wrote:

 Yeah.  I'm just not sure how much benefit in terms of data transfer this will
 save.  Has anyone tested this to see if this is even worth it?
 
 
 Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 
 
 
 - Original Message 
 From: Erik Hatcher [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Sent: Thursday, October 30, 2008 9:54:28 AM
 Subject: Re: replication handler - compression
 
 +1 - the GzipServletFilter is the way to go.
 
 Regarding request handlers reading HTTP headers, yeah,... this will improve,
 for 
 sure.
 
 Erik
 
 On Oct 30, 2008, at 12:18 AM, Chris Hostetter wrote:
 
 
 : You are partially right. Instead of the HTTP header , we use a request
 : parameter. (RequestHandlers cannot read HTP headers). If the param is
 
 hmmm, i'm with walter: we shouldn't invent new mechanisms for
 clients to request compression over HTTP from servers.
 
 replicatoin is both special enough and important enough that if we had to
 add special support to make that information available to the handler on
 the master we could.
 
 but frankly i don't think that's neccessary: the logic to turn on
 compression if the client requests it using Accept-Encoding: gzip is
 generic enough that there is no reason for it to be in a handler.  we
 could easily put it in the SolrDispatchFilter, or even in a new
 ServletFilte (i'm guessing iv'e seen about 74 different implementations of
 a GzipServletFilter in the wild that could be used as is.
 
 then we'd have double wins: compression for replication, and compression
 of all responses generated by Solr if hte client requests it.
 
 -Hoss
 



Solr Searching on other fields which are not in query

2008-10-30 Thread Yerraguntla

Hi,

I have a data set with the following schema.

PersonName:Text
AnimalName:Text
PlantName:Text

 lot more attributes about each of them like nick name, animal nick
name, plant generic name etc which are multually exclusive
UniqueId:long

For each of the document data set, there will be only one value of the above
three.

In my solr query from client

I am using AnimalName:German Shepard.

The return result contains 
PersonName with 'Shepard' in it, even though I am querying on AnimalName
field.
Can anyone point me whats happening and how to prevent scanning other
columns/fields.

I appreciate your help.

Thanks
Ravi

-- 
View this message in context: 
http://www.nabble.com/Solr-Searching-on-other-fields-which-are-not-in-query-tp20249798p20249798.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: replication handler - compression

2008-10-30 Thread christophe
Gzipping on disk requires quite some I/O. I guess that on-the-fly zipping 
should be faster.


C.

Walter Underwood wrote:

About a factor of 2 on a small, optimized index. Gzipping took 20 seconds,
so it isn't free.

$ cd index-copy
$ du -sk
134336  .
$ gzip *
$ du -sk
62084   .

wunder

On 10/30/08 8:20 AM, Otis Gospodnetic [EMAIL PROTECTED] wrote:

  

 [...]


Re: replication handler - compression

2008-10-30 Thread Walter Underwood
CPU was at 100%, it was not IO bound. --wunder

On 10/30/08 8:58 AM, christophe [EMAIL PROTECTED] wrote:

  Gzipping on disk requires quite some I/O. I guess that on-the-fly zipping
  should be faster.
 
 C.
 
 Walter Underwood wrote:
  [...]



Re: corrupt solr index on ec2

2008-10-30 Thread Michael McCandless


One small correction below:

Yonik Seeley wrote:

- I've seen OOM exceptions during warming. I've changed  
maxWarmingSearchers=1, which I suspect will do the trick


OOM errors are really tricky - if they happen in the wrong place, it's
hard to recover gracefully from. Correctly cleaning up after an OOM
error in the IndexWriter recently had some little fixes in lucene
trunk - you might want to try the latest dev version of Lucene and see
if it helps.


This change (to not commit index changes after IndexWriter hits OOME)  
went in Feb 2008.  Solr 1.3 should already have it.


(I'm working now on adding javadocs to IW explaining this).

Mike


Re: Solr Searching on other fields which are not in query

2008-10-30 Thread Jorge Solari
Your query

AnimalName:German Shepard

means

AnimalName:German defaultField:Shepard

whichever the default field is.

Try with

AnimalName:"German Shepard"

or

AnimalName:German AND AnimalName:Shepard



On Thu, Oct 30, 2008 at 12:58 PM, Yerraguntla [EMAIL PROTECTED] wrote:


 [...]




Re: corrupt solr index on ec2

2008-10-30 Thread Bill Graham
Thanks Yonik, I'll try changing the lock type to see how that works.

Looking closer at the logs I see the app was started at Oct 28, 2008 9:49:38, 
but not long afterwards it got its first exception when warming the index:

INFO: [] webapp=/solr path=/update params={} status=0 QTime=3
Oct 28, 2008 9:49:47 PM org.apache.solr.common.SolrException log
SEVERE: Error during auto-warming of key:[EMAIL 
PROTECTED]:java.lang.OutOfMemoryError: Java heap space

2008-10-28 21:49:47.674::INFO:  Shutdown hook executing
Oct 28, 2008 9:49:47 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.InterruptedException
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:915)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1217)
        at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
        at java.util.concurrent.FutureTask.get(FutureTask.java:83)
        at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:389)
        at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:77)
        at org.apache.solr.handler.XmlUpdateRequestHandler.processUpdate(XmlUpdateRequestHandler.java:226)
        at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:123)
...
Oct 28, 2008 9:49:47 PM org.apache.solr.update.processor.LogUpdateProcessor 
finish
INFO: {commit=} 0 81896
Oct 28, 2008 9:49:47 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/update params={} status=0 QTime=81896
Oct 28, 2008 9:49:47 PM org.apache.solr.core.SolrCore close
INFO: []  CLOSING SolrCore [EMAIL PROTECTED]
Oct 28, 2008 9:49:47 PM org.apache.solr.core.SolrCore closeSearcher
INFO: [] Closing main searcher on request.
Oct 28, 2008 9:49:47 PM org.apache.solr.update.DirectUpdateHandler2 close
INFO: closing 
DirectUpdateHandler2{commits=7,autocommits=0,optimizes=1,docsPending=10,adds=10,deletesById=0,deletesByQuery=0,errors=0,cu
mulative_adds=11511,cumulative_deletesById=0,cumulative_deletesByQuery=0,cumulative_errors=1}
2008-10-28 21:49:48.744::INFO:  Shutdown hook complete
Oct 28, 2008 9:49:52 PM org.apache.solr.update.processor.LogUpdateProcessor 
finish
INFO: {add=[Inta:2113254]} 0 3
Oct 28, 2008 9:49:52 PM org.apache.solr.core.SolrCore execute

Then it seemed to run well for about an hour and I saw this:

Oct 28, 2008 10:38:51 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start commit(optimize=false,waitFlush=true,waitSearcher=true)
Oct 28, 2008 10:38:51 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.RuntimeException: after flush: fdx size mismatch: 1156 docs vs 0 length in bytes of _2rv.fdx
        at org.apache.lucene.index.StoredFieldsWriter.closeDocStore(StoredFieldsWriter.java:94)
        at org.apache.lucene.index.DocFieldConsumers.closeDocStore(DocFieldConsumers.java:83)
        at org.apache.lucene.index.DocFieldProcessor.closeDocStore(DocFieldProcessor.java:47)
        at org.apache.lucene.index.DocumentsWriter.closeDocStore(DocumentsWriter.java:367)
        at org.apache.lucene.index.IndexWriter.flushDocStores(IndexWriter.java:1774)
...
Oct 28, 2008 10:38:53 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: SingleInstanceLock: write.lock
        at org.apache.lucene.store.Lock.obtain(Lock.java:85)
        at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:1140)
        at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:938)
        at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:116)
        at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:122)
        at org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:167)
        at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:221)
        at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:59)
        at org.apache.solr.handler.XmlUpdateRequestHandler.processUpdate(XmlUpdateRequestHandler.java:196)

These lock errors continued for about an hour, before what appears to be a 
successful commit/warm occurred. Then things appeared normal for about another 
half hour until the missing index file exceptions below appeared. 

There really should only be one Solr process with access to this index in my 
environment, so I'm puzzled how two processes could mess up the index. 
The only thing that comes to mind is that I'm running monit to monitor my 
processes, including Solr. It's set to bounce the port if it seems to be 
struggling. Maybe that bounce isn't happening cleanly and I somehow get 
overlapping processes. Seems unlikely, but who knows. I'll look into that too.

Also, I'm in the process of 

Re: Distributed search, standard request handler and more like this

2008-10-30 Thread Chris Hostetter
: I'm doing some expirements with the morelikethis functionality using the
: standard request handler to see if it also works with distributed search (I
: saw that it will not yet work with the MoreLikeThis handler,
: https://issues.apache.org/jira/browse/SOLR-788). As far as I can see, this
: also does not work when using the standard request handler, i.e.:

I think perhaps you are confused about that issue... SOLR-788 isn't about 
the MLT Handler -- it's about the MLT Component (as mentioned in the 
description of the issue), which provides the exact functionality you seem 
to be trying to test (as a component of SearchHandler)...

: http://localhost:8080/solr/select?q=ID:*documentID*&mlt=true&mlt.fl=Text&mlt.mindf=1&mlt.mintf=1&shards=shard1,shard2


The MLT Handler also doesn't support distributed searching, and 
as far as I know there aren't any plans to add it (distributed searching 
is a feature of SearchHandler; as a separate handler, MoreLikeThisHandler 
doesn't take advantage of that at all).


-Hoss



Re: replication handler - compression

2008-10-30 Thread Chris Hostetter

: Yeah.  I'm just not sure how much benefit in terms of data transfer this 
: will save.  Has anyone tested this to see if this is even worth it?

One man's trash is another man's treasure... if you're replicating 
snapshots very frequently within a single datacenter, speed is critical
and bandwidth is free -- if you're replicating once a day from one data 
center to another over a very expensive, very small pipe, spending some 
time+cpu to compress may be worth it.

Either way: it should be almost trivial to implement if people want to 
supply a patch, and with a simple new requestDispatcher config option, 
easy to disable completely on the server for people who might have 
clients sending Accept-Encoding: gzip willy nilly.


-Hoss



Re: How to get the min and max values from facets?

2008-10-30 Thread Chris Hostetter

: hundred thousands)... so, I have to display the range (minimum and maximum
: values) from a facet. Is there any way to do that?
: I found the new statistics components, follow the link :
: http://wiki.apache.org/solr/StatsComponent
: But it's for solr 1.4.

there haven't been many changes on the trunk since 1.3 that StatsComponent 
would depend on, you can probably use it as is.

: Does anyone have any idea?

assuming the field you want the min/max for is stored, you can do 
multiple hits sorted on that field to get the highest and lowest value.
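
Sketches of both approaches (host and field name are hypothetical; price
stands in for the field being faceted on):

  # StatsComponent, per the wiki page linked above:
  http://localhost:8983/solr/select?q=*:*&stats=true&stats.field=price&rows=0

  # Two sorted hits, reading the field off the first doc of each:
  http://localhost:8983/solr/select?q=*:*&sort=price+asc&fl=price&rows=1
  http://localhost:8983/solr/select?q=*:*&sort=price+desc&fl=price&rows=1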



-Hoss



Re: Solr Searching on other fields which are not in query

2008-10-30 Thread Yerraguntla

Hmm,

I don't have any defaultField defined in the schema.xml.
Can you give the exact syntax of how it looks in schema.xml?

I have <defaultSearchField>text</defaultSearchField>. 
Does it mean that if a sufficient requested count is not available, it looks
for the search string in any of the text fields that are indexed?

Thanks
Ravi


Jorge Solari wrote:
 
 Your query
 
 AnimalName:German Shepard
 
 means
 
 AnimalName:German defaultField:Shepard
 
 whichever the default field is.
 
 Try with
 AnimalName:"German Shepard"
 
 or
 
 AnimalName:German AND AnimalName:Shepard
 
 
 
 On Thu, Oct 30, 2008 at 12:58 PM, Yerraguntla [EMAIL PROTECTED]
 wrote:
 

 [...]


 
 




Re: Solr Searching on other fields which are not in query

2008-10-30 Thread Yerraguntla

Never mind, 
I understand now.

  I have <defaultSearchField>text</defaultSearchField>. 

I was searching on a string field with a space in it and with no quotes.

This causes the search to scan text fields (since the default search field is
text) in the schema.
Also in my schema there is an indexed field (AnimalNameText), a text field,
which was not populated.

After I populated the text field, I get only the results I am expecting.

Thanks for the pointer, Jorge!





Yerraguntla wrote:
 
 [...]




Re: Solr Searching on other fields which are not in query

2008-10-30 Thread Jorge Solari
I didn't mean by defaultField that that was the way to define the default
field in the schema; it was only a generic way to say default field name.

The default field name seems to be text in your case.

If the search query doesn't say which field to search, the word will be
searched in that field.

In the query

AnimalName:German Shepard

you are saying:

search for the word German in the field AnimalName and for the word Shepard
in the default field.

I think the search you want to do is

AnimalName:German AND AnimalName:Shepard

I don't know if there is a way to tell Solr to search on all fields.

You may be copying the content of other fields to the text field.

See if there is something like

<copyField source="*" dest="text"/>

in the schema file.

Jorge


On Thu, Oct 30, 2008 at 3:13 PM, Yerraguntla [EMAIL PROTECTED] wrote:


 Hmm,

 I dont have any defaultField defined in the schema.xml.
 Can you give the exact syntax how it looks like in schema.xml

 I have defaultSearchFieldtext/defaultSearchField.
 Does it mean if sufficient requested count not available, it looks for the
 search string in any of the text fields that are indexed?

 Thanks
 Ravi


 Jorge Solari wrote:
 
  Your query
 
  AnimalName:German Shepard.
 
  means
 
  AnimalName:German defaultField:Shepard.
 
  whichever the defaultField is
 
  Try with
  AnimalName:German Shepard
 
  or
 
  AnimalName:German AND AnimalName:Shepard.
 
 
 
  On Thu, Oct 30, 2008 at 12:58 PM, Yerraguntla [EMAIL PROTECTED]
  wrote:
 
 
  Hi,
 
  I have a data set with the following schema.
 
  PersonName:Text
  AnimalName:Text
  PlantName:Text
 
   lot more attributes about each of them like nick name, animal nick
  name, plant generic name etc which are multually exclusive
  UniqueId:long
 
   For each document in the data set, there will be only one value of the
   above three.
 
  In my solr query from client
 
  I am using AnimalName:German Shepard.
 
   The returned results contain
   PersonName with 'Shepard' in it, even though I am querying on the AnimalName
   field.
   Can anyone point me to what's happening and how to prevent scanning other
   columns/fields?
 
  I appreciate your help.
 
  Thanks
  Ravi
 
 
 
 
 





Changing mergeFactor in mid-stream?

2008-10-30 Thread Barnett, Jeffrey
The http://wiki.apache.org/lucene-java/ImproveIndexingSpeed page suggests that 
indexing will be sped up by using higher values of mergeFactor, while search 
speed improves with lower values.  I need to create an index using multiple 
batches of documents.  My question is, can I begin building with a high 
mergeFactor for the bulk of the load and then switch to a lower value for the 
final batch?  I build the indices offline, and only swap them to online when 
complete.  The online index is never updated.


Re: How to get the min and max values from facets?

2008-10-30 Thread Chris Hostetter

: myfacet, ASC, limit 1
: myfacet, DESC, limit 1
: So I can get the first value and the last one.
: 
: Do you think I will get more performance with this way than using stats?

I'm guessing that by all measurable metrics, the StatsComponent will blow 
that out of the water -- I was just putting it out there as a possible 
alternative if you didn't feel comfortable enough with Java to compile the 
StatsComponent and use it with Solr 1.3.



-Hoss



Re: Max Number of Facets

2008-10-30 Thread Ryan McKinley
the only 'limit' is the effect on your query times...  you could have  
1000+ facets if you are ok with the response time.


Sorry to give the "it depends" answer, but it totally depends on your
data and your needs.




On Oct 30, 2008, at 7:28 AM, Jeryl Cook wrote:


Is there a limit on the number of facets that I can create in
Solr? (dynamically generated facets)

--
Jeryl Cook
/^\ Pharaoh /^\
http://pharaohofkush.blogspot.com/
Whether we bring our enemies to justice, or bring justice to our
enemies, justice will be done.
--George W. Bush, Address to a Joint Session of Congress and the
American People, September 20, 2001




Re: Max Number of Facets

2008-10-30 Thread Stephen Weiss
I've actually seen cases on our site where it's possible to bring up  
over 30,000 facets for one query.  And they actually come up quickly -  
like, 3 seconds.  It takes longer for the browser to render them.


--
Steve

On Oct 30, 2008, at 6:04 PM, Ryan McKinley wrote:

the only 'limit' is the effect on your query times...  you could  
have 1000+ facets if you are ok with the response time.


Sorry to give the "it depends" answer, but it totally depends on
your data and your needs.




On Oct 30, 2008, at 7:28 AM, Jeryl Cook wrote:


Is there a limit on the number of facets that I can create in
Solr? (dynamically generated facets)

--
Jeryl Cook
/^\ Pharaoh /^\
http://pharaohofkush.blogspot.com/
Whether we bring our enemies to justice, or bring justice to our
enemies, justice will be done.
--George W. Bush, Address to a Joint Session of Congress and the
American People, September 20, 2001






Re: Max Number of Facets

2008-10-30 Thread Jeryl Cook
I understand what you mean. I am building a system that will
dynamically generate facets, which could possibly be thousands, but
at most about 6 or 7 facets will be returned, using a facet ranking
algorithm. So I get what you mean: if I request in my query that I
want 1000 facets back, compared to just 6 or 7, I could take a
performance hit.

On 10/30/08, Ryan McKinley [EMAIL PROTECTED] wrote:
 the only 'limit' is the effect on your query times...  you could have
 1000+ facets if you are ok with the response time.

 Sorry to give the "it depends" answer, but it totally depends on your
 data and your needs.



 On Oct 30, 2008, at 7:28 AM, Jeryl Cook wrote:

 Is there a limit on the number of facets that I can create in
 Solr? (dynamically generated facets)

 --
 Jeryl Cook
 /^\ Pharaoh /^\
 http://pharaohofkush.blogspot.com/
 Whether we bring our enemies to justice, or bring justice to our
 enemies, justice will be done.
 --George W. Bush, Address to a Joint Session of Congress and the
 American People, September 20, 2001




-- 
Jeryl Cook
/^\ Pharaoh /^\
http://pharaohofkush.blogspot.com/
Whether we bring our enemies to justice, or bring justice to our
enemies, justice will be done.
--George W. Bush, Address to a Joint Session of Congress and the
American People, September 20, 2001


Re: Max Number of Facets

2008-10-30 Thread Jeryl Cook
Wow, 30k in under 3 seconds.

On 10/30/08, Stephen Weiss [EMAIL PROTECTED] wrote:
 I've actually seen cases on our site where it's possible to bring up
 over 30,000 facets for one query.  And they actually come up quickly -
 like, 3 seconds.  It takes longer for the browser to render them.

 --
 Steve

 On Oct 30, 2008, at 6:04 PM, Ryan McKinley wrote:

 the only 'limit' is the effect on your query times...  you could
 have 1000+ facets if you are ok with the response time.

 Sorry to give the "it depends" answer, but it totally depends on
 your data and your needs.



 On Oct 30, 2008, at 7:28 AM, Jeryl Cook wrote:

 Is there a limit on the number of facets that I can create in
 Solr? (dynamically generated facets)

 --
 Jeryl Cook
 /^\ Pharaoh /^\
 http://pharaohofkush.blogspot.com/
 Whether we bring our enemies to justice, or bring justice to our
 enemies, justice will be done.
 --George W. Bush, Address to a Joint Session of Congress and the
 American People, September 20, 2001





-- 
Jeryl Cook
/^\ Pharaoh /^\
http://pharaohofkush.blogspot.com/
Whether we bring our enemies to justice, or bring justice to our
enemies, justice will be done.
--George W. Bush, Address to a Joint Session of Congress and the
American People, September 20, 2001


Re: Solr Searching on other fields which are not in query

2008-10-30 Thread Norberto Meijome
On Thu, 30 Oct 2008 15:50:58 -0300
Jorge Solari [EMAIL PROTECTED] wrote:

 <copyField source="*" dest="text"/>
 
 in the schema file.

or use the dismax query handler, as sketched below.
b

_
{Beto|Norberto|Numard} Meijome

Windows: Where do you want to go today?
Linux: Where do you want to go tomorrow?
FreeBSD: Are you guys coming, or what?

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


DIH and rss feeds

2008-10-30 Thread Lance Norskog
I have a DataImportHandler configured to index from an RSS feed. It is a
"latest stuff" feed. It reads the feed and indexes the 100 documents
harvested from the feed. So far, works great.
 
Now: a few hours later there are a different 100 latest documents. How do
I add those to the index so I will have 200 documents?  'full-import' throws
away the first 100. 'delta-import' is not implemented. What is the special
trick here?  I'm using the Solr-1.3.0 release.
 
Thanks,
 
Lance Norskog


RE: Using Solrj

2008-10-30 Thread Raghunandan Rao
Thank you so much. 

Here goes my Use case:

I need to search the database for a collection of input parameters, which touches
'n' tables. The data is very large. The search query itself is highly
dynamic, and I use a lot of views for the same search. How do I make use of Solr in this
case?

-Original Message-
From: Erick Erickson [mailto:[EMAIL PROTECTED] 
Sent: Thursday, October 30, 2008 7:01 PM
To: solr-user@lucene.apache.org
Subject: Re: Using Solrj

Generally, you need to get your head out of the database world and into
the search world to be successful with Lucene. For instance, one
of the cardinal tenets of database design is to normalize your
data. It goes against every instinct to *denormalize* your data when
creating a Lucene index explicitly so you do NOT have to think
in terms of joins or sub-queries. Whenever I start thinking this
way, I try to back up and think again.

Both your posts indicate to me that you're thinking in database
terms. There are no views in Lucene, for instance. You refer
to tables. There are no tables in Lucene, there are only documents
with various numbers of fields. You could conceivably make your index
look like a database by creatively naming your document fields. But
that doesn't play to the strengths of Lucene *or* the database.

In fact, there is NO requirement that documents have the *same* fields.
Which is really difficult to get into when thinking like a DBA.

Lucene is designed to search text. Fast and well. It is NOT intended to
efficiently manipulate relationships *between* documents. There
are various hybrid solutions that people have used. That is, put the
data you really need to do text searching on in a Lucene index,
along with enough data to be able to get the *rest* of what you need
from your database. But it all depends upon the problem you're trying to
solve.

But as Noble says, all this is too general to be really useful; you need
to provide quite a bit more detail about the problem you're trying to
solve to get useful recommendations.

Best
Erick

On Thu, Oct 30, 2008 at 8:50 AM, Raghunandan Rao 
[EMAIL PROTECTED] wrote:

 Thanks Noble.

 So you mean to say that I need to create a view according to my query and
 then index on the view and fetch?

 -Original Message-
 From: Noble Paul നോബിള്‍ नोब्ळ् [mailto:[EMAIL PROTECTED]
 Sent: Thursday, October 30, 2008 6:16 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Using Solrj

 hi,
 There are two sides to this.
 1. indexing (getting data into Solr). SolrJ or DataImportHandler can be
 used for this
 2. querying (getting data out of Solr). Here you do not have the choice
 of joining multiple tables. There is only one index for Solr.



 On Thu, Oct 30, 2008 at 5:34 PM, Raghunandan Rao
 [EMAIL PROTECTED] wrote:
  Hi,
 
  I am trying to use Solrj for my web application. I am indexing a table
  using the @Field annotation tag. Now I need to index or query multiple
  tables. Like, get all the employees who are managers in Finance
  department (interacting with 3 entities). How do I do that?
 
 
 
  Does anyone have any idea?
 
 
 
  Thanks
 
 



 --
 --Noble Paul



Re: DIH and rss feeds

2008-10-30 Thread Norberto Meijome
On Thu, 30 Oct 2008 20:46:16 -0700
Lance Norskog [EMAIL PROTECTED] wrote:

 Now: a few hours later there are a different 100 latest documents. How do
 I add those to the index so I will have 200 documents?  'full-import' throws
 away the first 100. 'delta-import' is not implemented. What is the special
 trick here?  I'm using the Solr-1.3.0 release.
  

Lance,

1) DIH has a clean parameter that, when set to true (the default, I think), will 
delete all existing docs in the index - set it to false to keep them (sketch below).

2) ensure your new documents have different values in the field defined as your 
unique key (schema.xml).

let us know how it goes,
B

_
{Beto|Norberto|Numard} Meijome

Lack of planning on your part does not constitute an emergency on ours.

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Re: Changing mergeFactor in mid-stream?

2008-10-30 Thread Otis Gospodnetic
Yes, you can change the mergeFactor.  More important than the mergeFactor is 
this:

<ramBufferSizeMB>32</ramBufferSizeMB>

Pump it up as much as your hardware/JVM allows.  And use appropriate -Xmx, of 
course.


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Barnett, Jeffrey [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org solr-user@lucene.apache.org
 Sent: Thursday, October 30, 2008 3:49:30 PM
 Subject: Changing mergeFactor in mid-stream?
 
 The http://wiki.apache.org/lucene-java/ImproveIndexingSpeed page suggests 
 that 
 indexing will be sped up by using higher values of mergeFactor, while search 
 speed improves with lower values.  I need to create an index using multiple 
 batches of documents.  My question is, can I begin building with a high 
 mergeFactor for the bulk of the load and then switch to a lower value for the 
 final batch?  I build the indices offline, and only swap them to online when 
 complete.  The online index is never updated.



Re: replication handler - compression

2008-10-30 Thread Otis Gospodnetic
man gzip:

   -# --fast --best
          Regulate the speed of compression using the specified digit #,
          where -1 or --fast indicates the fastest compression method
          (less compression) and -9 or --best indicates the slowest
          compression method (best compression). The default compression
          level is -6 (that is, biased towards high compression at
          expense of speed).

 
So it could be better than the factor of 2, but also take longer. :)

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Walter Underwood [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Sent: Thursday, October 30, 2008 11:52:47 AM
 Subject: Re: replication handler - compression
 
 About a factor of 2 on a small, optimized index. Gzipping took 20 seconds,
 so it isn't free.
 
 $ cd index-copy
 $ du -sk
 134336  .
 $ gzip *
 $ du -sk
 62084   .
 
 wunder
 
 On 10/30/08 8:20 AM, Otis Gospodnetic wrote:
 
  Yeah.  I'm just not sure how much benefit in terms of data transfer this
  will save.  Has anyone tested this to see if this is even worth it?
  
  
  Otis
  --
  Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
  
  
  
  - Original Message 
  From: Erik Hatcher 
  To: solr-user@lucene.apache.org
  Sent: Thursday, October 30, 2008 9:54:28 AM
  Subject: Re: replication handler - compression
  
  +1 - the GzipServletFilter is the way to go.
  
  Regarding request handlers reading HTTP headers, yeah,... this will
  improve, for sure.
  
  Erik
  
  On Oct 30, 2008, at 12:18 AM, Chris Hostetter wrote:
  
  
  : You are partially right. Instead of the HTTP header, we use a request
  : parameter. (RequestHandlers cannot read HTTP headers). If the param is
  
  hmmm, I'm with Walter: we shouldn't invent new mechanisms for
  clients to request compression over HTTP from servers.
  
  replication is both special enough and important enough that if we had to
  add special support to make that information available to the handler on
  the master we could.
  
  but frankly I don't think that's necessary: the logic to turn on
  compression if the client requests it using "Accept-Encoding: gzip" is
  generic enough that there is no reason for it to be in a handler.  We
  could easily put it in the SolrDispatchFilter, or even in a new
  ServletFilter (I'm guessing I've seen about 74 different implementations of
  a GzipServletFilter in the wild that could be used as-is).
  
  then we'd have double wins: compression for replication, and compression
  of all responses generated by Solr if the client requests it.
  
  -Hoss
  



Re: replication handler - compression

2008-10-30 Thread Walter Underwood
It could also be that the C version is a lot more efficient than
the Java version, in which case the Java one could take longer regardless. I could not
find a benchmark on that, but C is usually better for bit twiddling.

wunder

On 10/30/08 10:36 PM, Otis Gospodnetic [EMAIL PROTECTED] wrote:

 man gzip:
 
    -# --fast --best
           Regulate the speed of compression using the specified digit #,
           where -1 or --fast indicates the fastest compression method
           (less compression) and -9 or --best indicates the slowest
           compression method (best compression). The default compression
           level is -6 (that is, biased towards high compression at
           expense of speed).
 
  
 So it could be better than the factor of 2, but also take longer. :)
 
 Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 
 
 
 - Original Message 
 From: Walter Underwood [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Sent: Thursday, October 30, 2008 11:52:47 AM
 Subject: Re: replication handler - compression
 
 About a factor of 2 on a small, optimized index. Gzipping took 20 seconds,
 so it isn't free.
 
 $ cd index-copy
 $ du -sk
 134336  .
 $ gzip *
 $ du -sk
 62084   .
 
 wunder
 
 On 10/30/08 8:20 AM, Otis Gospodnetic wrote:
 
 Yeah.  I'm just not sure how much benefit in terms of data transfer this
 will save.  Has anyone tested this to see if this is even worth it?
 
 
 Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 
 
 
 - Original Message 
 From: Erik Hatcher
 To: solr-user@lucene.apache.org
 Sent: Thursday, October 30, 2008 9:54:28 AM
 Subject: Re: replication handler - compression
 
 +1 - the GzipServletFilter is the way to go.
 
 Regarding request handlers reading HTTP headers, yeah,... this will
 improve, for sure.
 
 Erik
 
 On Oct 30, 2008, at 12:18 AM, Chris Hostetter wrote:
 
 
 : You are partially right. Instead of the HTTP header, we use a request
 : parameter. (RequestHandlers cannot read HTTP headers). If the param is
 
 hmmm, I'm with Walter: we shouldn't invent new mechanisms for
 clients to request compression over HTTP from servers.
 
 replication is both special enough and important enough that if we had to
 add special support to make that information available to the handler on
 the master we could.
 
 but frankly I don't think that's necessary: the logic to turn on
 compression if the client requests it using "Accept-Encoding: gzip" is
 generic enough that there is no reason for it to be in a handler.  We
 could easily put it in the SolrDispatchFilter, or even in a new
 ServletFilter (I'm guessing I've seen about 74 different implementations of
 a GzipServletFilter in the wild that could be used as-is).
 
 then we'd have double wins: compression for replication, and compression
 of all responses generated by Solr if the client requests it.
 
 -Hoss