Re: Query for Distributed search -

2008-11-24 Thread James liu
That is up to your Solr client.

On Mon, Nov 24, 2008 at 1:24 PM, souravm [EMAIL PROTECTED] wrote:

 Hi,

 Looking for some insight on distributed search.

 Say I have an index distributed across 3 boxes, and the index contains time and
 text data (a typical log file). Each box holds the index for a different time
 range: say Box 1 for January to April, Box 2 for May to August, and Box 3 for
 September to December.

 Now if I search for a text string, will the search happen in parallel on all
 3 boxes, or sequentially?

 Regards,
 Sourav

 CAUTION - Disclaimer
 This e-mail contains PRIVILEGED AND CONFIDENTIAL INFORMATION intended solely
 for the use of the addressee(s). If you are not the intended recipient, please
 notify the sender by e-mail and delete the original message. Further, you are
 not to copy, disclose, or distribute this e-mail or its contents to any other
 person and any such actions are unlawful. This e-mail may contain viruses.
 Infosys has taken every reasonable precaution to minimize this risk, but is
 not liable for any damage you may sustain as a result of any virus in this
 e-mail. You should carry out your own virus checks before opening the e-mail
 or attachment. Infosys reserves the right to monitor and review the content
 of all messages sent to or from this e-mail address. Messages sent to or from
 this e-mail address may be stored on the Infosys e-mail system.
 ***INFOSYS End of Disclaimer INFOSYS***




-- 
regards
j.L


RE: Query for Distributed search -

2008-11-24 Thread souravm
Hi,

I understand your point about how I could do it myself in my Java code.

However, I'm more interested in how DistributedSearch behaves by default when I
issue a command like

curl 'http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=ipod+solr'

as mentioned in the wiki.
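
For reference, the parameters in that wiki URL are separated by `&`, and the entries in `shards` carry no `http://` prefix. Below is a minimal Python sketch that assembles the same kind of URL; the helper name `distributed_select_url` is made up for illustration, and the hosts are just the ones from the wiki example. It only builds the request and says nothing about whether Solr queries the shards in parallel.

```python
from urllib.parse import urlencode


def distributed_select_url(coordinator, shards, query, indent=True):
    """Build a /select URL asking `coordinator` to query every entry in `shards`.

    Hypothetical helper for illustration; only the parameter names
    (shards, q, indent) come from the wiki example.
    """
    params = {
        # Comma-separated host:port/path entries, without an "http://" prefix.
        "shards": ",".join(shards),
        "q": query,
    }
    if indent:
        params["indent"] = "true"
    # safe="/:," leaves the shard list readable instead of percent-encoding it.
    return coordinator.rstrip("/") + "/select?" + urlencode(params, safe="/:,")


url = distributed_select_url(
    "http://localhost:8983/solr",
    ["localhost:8983/solr", "localhost:7574/solr"],
    "ipod solr",
)
print(url)
```

The resulting string can be passed to curl inside single quotes, so that the shell does not interpret the `&` characters.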

Regards,
Sourav

-----Original Message-----
From: Aleksander M. Stensby [mailto:[EMAIL PROTECTED] 
Sent: Monday, November 24, 2008 12:37 AM
To: solr-user@lucene.apache.org
Subject: Re: Query for Distributed search -

If you use SolrJ and HttpSolrServer, for instance, you can add logic to your
querying to make your searches more efficient! That is partly the idea of
sharding, right? :) So if the user wants to search for a log file from June,
your application knows that the June logs are stored on the second box and
can direct the search to that box. Alternatively, if the search spans two
boxes, you add the shards parameter to your query and include the paths to
just those two shards. I'm not really sure how Solr handles the merging of
results, or whether the requests are made in parallel or sequentially, but I
do know that you could easily manage this on your own in Java if you want
to: set up one HttpSolrServer in your code for each shard, search them in
parallel in separate threads, and then reduce the results afterwards.
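
The client-side approach described above (route by month, fan out in threads, then reduce) could look roughly like the following Python sketch. It is only an illustration under stated assumptions: the shard addresses `box1`..`box3`, the month layout, and the `search_one` callback are all hypothetical, and the actual HTTP call is left to the caller. Scores coming from separate indexes are not strictly comparable, so sorting the merged hits by score is a simplification.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical layout following the example in this thread:
# Jan-Apr on box 1, May-Aug on box 2, Sep-Dec on box 3.
SHARDS_BY_MONTH = {
    **{month: "box1:8983/solr" for month in range(1, 5)},
    **{month: "box2:8983/solr" for month in range(5, 9)},
    **{month: "box3:8983/solr" for month in range(9, 13)},
}


def shards_for_months(first_month, last_month):
    """Return the distinct shards covering an inclusive month range, in order."""
    shards = []
    for month in range(first_month, last_month + 1):
        shard = SHARDS_BY_MONTH[month]
        if shard not in shards:
            shards.append(shard)
    return shards


def search_in_parallel(shards, query, search_one, rows=10):
    """Query each shard in its own thread, then merge hits by descending score.

    `search_one(shard, query)` is supplied by the caller and must return a
    list of (score, doc_id) tuples; a real client would do the HTTP request there.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        per_shard = list(pool.map(lambda shard: search_one(shard, query), shards))
    merged = [hit for hits in per_shard for hit in hits]
    merged.sort(key=lambda hit: hit[0], reverse=True)
    return merged[:rows]


def fake_search(shard, query):
    """Stand-in for a real shard request, returning canned hits."""
    canned = {
        "box2:8983/solr": [(0.9, "jun-17"), (0.4, "may-02")],
        "box3:8983/solr": [(0.7, "sep-30")],
    }
    return canned.get(shard, [])


june_to_september = shards_for_months(6, 9)
print(june_to_september)
print(search_in_parallel(june_to_september, "timeout", fake_search))
```

A query for June alone resolves to a single shard, so no fan-out or merge is needed in that case.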

Have a look at http://wiki.apache.org/solr/DistributedSearch for more info.
You could also take a look at Hadoop. (http://hadoop.apache.org/)

regards,
  Aleks

On Mon, 24 Nov 2008 06:24:51 +0100, souravm [EMAIL PROTECTED] wrote:

 Hi,

 Looking for some insight on distributed search.

 Say I have an index distributed across 3 boxes, and the index contains time
 and text data (a typical log file). Each box holds the index for a different
 time range: say Box 1 for January to April, Box 2 for May to August, and
 Box 3 for September to December.

 Now if I search for a text string, will the search happen in parallel on
 all 3 boxes, or sequentially?

 Regards,
 Sourav


-- 
Aleksander M. Stensby
Senior software developer
Integrasco A/S
www.integrasco.no


Re: Query for Distributed search -

2008-11-24 Thread Chris Hostetter

: Subject: Query for Distributed search -
: In-Reply-To: [EMAIL PROTECTED]



http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to 
an existing message, instead start a fresh email.  Even if you change the 
subject line of your email, other mail headers still track which thread 
you replied to and your question is hidden in that thread and gets less 
attention.   It makes following discussions in the mailing list archives 
particularly difficult.
See Also:  http://en.wikipedia.org/wiki/Thread_hijacking



-Hoss



Query for Distributed search -

2008-11-23 Thread souravm
Hi,

Looking for some insight on distributed search.

Say I have an index distributed across 3 boxes, and the index contains time and
text data (a typical log file). Each box holds the index for a different time
range: say Box 1 for January to April, Box 2 for May to August, and Box 3 for
September to December.

Now if I search for a text string, will the search happen in parallel on all
3 boxes, or sequentially?

Regards,
Sourav



Re: Query on distributed search ...

2008-11-04 Thread Shalin Shekhar Mangar
Yes, StatsComponent can be used in a distributed Solr environment.

StatsComponent is in the 1.4 nightly builds (not yet released).
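
As a rough illustration, the Python sketch below builds a stats request and shows why an average can be combined across shards. The `stats`, `stats.field` and `shards` parameters are the StatsComponent and distributed-search request parameters; the field names `processing_time` and `request_type`, the shard addresses, and both helper functions are assumptions made up for this example. `merged_mean` illustrates the arithmetic only and is not Solr's own merging code.

```python
from urllib.parse import urlencode


def stats_query(field, filter_query, shards):
    """Build a query string asking for statistics on `field` across `shards`."""
    params = [
        ("q", "*:*"),
        ("fq", filter_query),   # e.g. restrict to one type of request
        ("rows", "0"),          # only the statistics are wanted, not documents
        ("stats", "true"),
        ("stats.field", field),
        ("shards", ",".join(shards)),
    ]
    return urlencode(params, safe="/:,*")


def merged_mean(per_shard):
    """Combine per-shard sums and counts into one overall mean.

    Averaging the per-shard means directly would be wrong whenever the
    shards hold different numbers of matching records.
    """
    total = sum(stats["sum"] for stats in per_shard)
    count = sum(stats["count"] for stats in per_shard)
    return total / count if count else None


query_string = stats_query(
    "processing_time",            # hypothetical field name
    "request_type:login",         # hypothetical filter
    ["box1:8983/solr", "box2:8983/solr"],
)
print(query_string)

# Shard means are 120.0 and 60.0, but the overall mean is 1500 / 15.
overall = merged_mean([{"sum": 1200.0, "count": 10}, {"sum": 300.0, "count": 5}])
print(overall)
```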

On Tue, Nov 4, 2008 at 2:40 AM, souravm [EMAIL PROTECTED] wrote:

 Hi,

 I'm new to Solr. Here is a query on distributed search.

 I have a huge volume of log files which I would like to search. Apart from
 generic text search I would also like to get statistics: say each record has
 a field giving the request processing time, and I would like to get the
 average processing time for a given type of request. For this I'm planning
 to use StatsComponent.

 Since the log file volume is huge, I plan to distribute it across multiple
 physical boxes and use distributed search. However, I'm not sure whether
 StatsComponent can be used in a distributed search scenario.

 Any pointer on this query would be really helpful.

 Regards,
 Sourav


-- 
Regards,
Shalin Shekhar Mangar.


Query on distributed search ...

2008-11-03 Thread souravm
Hi,

I'm new to Solr. Here is a query on distributed search.

I have a huge volume of log files which I would like to search. Apart from
generic text search I would also like to get statistics: say each record has a
field giving the request processing time, and I would like to get the average
processing time for a given type of request. For this I'm planning to use
StatsComponent.

Since the log file volume is huge, I plan to distribute it across multiple
physical boxes and use distributed search. However, I'm not sure whether
StatsComponent can be used in a distributed search scenario.

Any pointer on this query would be really helpful.

Regards,
Sourav
