Re: Multi index

2009-11-30 Thread Shalin Shekhar Mangar
On Sat, Nov 28, 2009 at 3:12 PM, Jörg Agatz wrote:

> Hello users,
>
> At the moment I am testing multicore Solr, but I can't search in more than one
> core directly.
>
> Is there a way to use multiple indexes (3-5 indexes in one core) and search
> directly in all of them? Or only in one?
>
>
>
You can search across all cores if they share the same schema.xml. See
http://wiki.apache.org/solr/DistributedSearch

If the schema.xml files differ, you can search only one core at a time. You can
denormalize the data and combine the cores if you want to search all of them.
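
To illustrate the distributed search linked above, here is a minimal sketch. It assumes three cores with an identical schema.xml; the host and core names are hypothetical. The `shards` request parameter is the one described on the DistributedSearch wiki page.

```python
from urllib.parse import urlencode

def distributed_query_url(base_core, shard_cores, query, rows=10):
    """Build a select URL that fans one query out to every listed core."""
    # `shards` is a comma-separated list of host:port/path entries (no scheme).
    params = {
        "q": query,
        "rows": rows,
        "shards": ",".join(shard_cores),
    }
    return "http://" + base_core + "/select?" + urlencode(params)

# Hypothetical host and core names, for illustration only.
cores = [
    "localhost:8983/solr/core0",
    "localhost:8983/solr/core1",
    "localhost:8983/solr/core2",
]
url = distributed_query_url(cores[0], cores, "title:solr")
print(url)
```

The request is sent to one core, which forwards the query to the listed shards and merges the results, so the client does not have to combine result lists itself.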


-- 
Regards,
Shalin Shekhar Mangar.


Re: Multi Index

2009-11-29 Thread Shalin Shekhar Mangar
On Sun, Nov 29, 2009 at 11:26 AM, Bhuvi HN  wrote:

> Hi all
> I need to use a single Solr instance for multiple indexes. Please let me
> know if this is possible.
>

See http://wiki.apache.org/solr/MultipleIndexes

As a best practice, consider de-normalizing your data, if possible.

-- 
Regards,
Shalin Shekhar Mangar.


Multi Index

2009-11-28 Thread Bhuvi HN
Hi all
I need to use a single Solr instance for multiple indexes. Please let me
know if this is possible.
Regards
Bhuvi


Multi index

2009-11-28 Thread Jörg Agatz
Hello users,

At the moment I am testing multicore Solr, but I can't search in more than one
core directly.

Is there a way to use multiple indexes (3-5 indexes in one core) and search
directly in all of them? Or only in one?

It is really important for my project.

Thanks

King


Re: Multi-index Design

2009-05-06 Thread Michael Ludwig

Matt Weber wrote:


http://wiki.apache.org/solr/MultipleIndexes


Thanks, Mark. Your explanation and the pointer to the Wiki have
clarified things for me.

Michael Ludwig


RE: Multi-index Design

2009-05-05 Thread Manepalli, Kalyan
That's how we do it at Orbitz. We use a "type" field to separate content, review,
and promotional information in one single index. We then use the
last-components to plug these data together.

The only thing we haven't tested yet is the scalability of this model, since
our data is small.

Thanks,
Kalyan Manepalli

-----Original Message-----
From: Chris Masters [mailto:roti...@yahoo.com]
Sent: Tuesday, May 05, 2009 10:00 AM
To: solr-user@lucene.apache.org
Subject: Multi-index Design


Hi All,

I'm [still!] evaluating Solr and setting up a PoC. The requirements are to 
index the following objects:

 - people - name, status, date added, address, profile, other people specific 
fields like group...
 - organisations - name, status, date added, address, profile, other 
organisational specific fields like size...
 - products - name, status, date added, profile, other product specific fields 
like product groups..

AND...I need to isolate indexes to a number of dynamic domains (customerA, 
customerB...) that will grow over time.

So, my initial thoughts are to do the following:

 - flatten the searchable objects as much as I can - use a type field to 
distinguish - into a single index
 - use multi-core approach to segregate domains of data

So, a couple questions on this:

 1) Is this approach/design sensible and do others use it?

 2) By flattening the data we will only index common fields; is it unreasonable 
to do a second database search and union the results when doing advanced 
searches on non indexed fields? Do others do this?

 3) I've read that I can dynamically add a new core - this fits well with the 
ability to dynamically add new domains; how scalable is this approach? Would 
it be unreasonable to have 20-30 dynamically created cores? I guess, 
redundancy aside and given our one core per domain approach, we could easily 
spill onto other physical servers without the need for replication?

Thanks again for your help!
rotis





Re: Multi-index Design

2009-05-05 Thread Matt Weber
1 - A field called "type", probably a string field, in which you index
values such as "people", "organization", "product".


2 - Yes, for each document you are indexing, you will include its
type, i.e. "person".


3, 4, 5 - You would have a core for each domain.  Each domain will
then have its own index that contains documents of all types.  See
http://wiki.apache.org/solr/MultipleIndexes
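
As a rough illustration of the type-field approach, here is a minimal sketch. The documents, field values, and the helper name are hypothetical; `fq` is Solr's standard filter-query parameter.

```python
from urllib.parse import urlencode

def typed_query_params(query, doc_type, rows=10):
    """Restrict a query to one document type with a filter query (fq)."""
    return urlencode({"q": query, "fq": 'type:"%s"' % doc_type, "rows": rows})

# Hypothetical documents for one domain's core: each one carries its type.
docs = [
    {"id": "p1", "type": "person", "name": "Jane Doe"},
    {"id": "o1", "type": "organization", "name": "Acme"},
    {"id": "pr1", "type": "product", "name": "Widget"},
]

# Search only organizations; omit fq to search all types together.
params = typed_query_params("name:acme", "organization")
print(params)
```

Each domain (customerA, customerB, ...) would get its own core, and this query would be sent to that domain's core only.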


Thanks,

Matt Weber




On May 5, 2009, at 11:14 AM, Michael Ludwig wrote:


Chris Masters wrote:


- flatten the searchable objects as much as I can - use a type field
  to distinguish - into a single index
- use multi-core approach to segregate domains of data


Some newbie questions:

(1) What is a "type field"? Is it to designate different types of
documents, e.g. product descriptions and forum postings?

(2) Would I include such a "type field" in the data I send to the update
facility and maybe configure Solr to take special action depending on
the value of the update field?

(3) Like, write the processing results to a domain dedicated to that
type of data that I could limit my search to, as per Otis' post?

(4) And is that what's called a "core" here?

(5) Or, failing (3), and lumping everything together in one search
domain (core?), would I use that "type field" to limit my search to
a particular type of data?

Michael Ludwig




Re: Multi-index Design

2009-05-05 Thread Michael Ludwig

Chris Masters wrote:


 - flatten the searchable objects as much as I can - use a type field
   to distinguish - into a single index
 - use multi-core approach to segregate domains of data


Some newbie questions:

(1) What is a "type field"? Is it to designate different types of
documents, e.g. product descriptions and forum postings?

(2) Would I include such a "type field" in the data I send to the update
facility and maybe configure Solr to take special action depending on
the value of the update field?

(3) Like, write the processing results to a domain dedicated to that
type of data that I could limit my search to, as per Otis' post?

(4) And is that what's called a "core" here?

(5) Or, failing (3), and lumping everything together in one search
domain (core?), would I use that "type field" to limit my search to
a particular type of data?

Michael Ludwig


Re: Multi-index Design

2009-05-05 Thread Otis Gospodnetic

Chris,

1) I'd put different types of data in different cores/instances, unless you 
really need to search them all together.  By using only common attributes 
you are kind of killing the richness of data and your ability to do something 
useful with it.

2) I'd triple-check the "do a second database search and union the results when 
doing advanced searches on non indexed field" part if you are dealing with 
non-trivial query rate.

3) Some people have thousands of Solr cores.  Not sure on how many machines, 
but it's all a function of data size, hardware specs, query complexity and rate.


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message -----
> From: Chris Masters 
> To: solr-user@lucene.apache.org
> Sent: Tuesday, May 5, 2009 10:59:40 AM
> Subject: Multi-index Design
> 
> 
> Hi All,
> 
> I'm [still!] evaluating Solr and setting up a PoC. The requirements are to 
> index 
> the following objects:
> 
>  - people - name, status, date added, address, profile, other people specific 
> fields like group...
>  - organisations - name, status, date added, address, profile, other 
> organisational specific fields like size...
>  - products - name, status, date added, profile, other product specific 
> fields 
> like product groups..
> 
> AND...I need to isolate indexes to a number of dynamic domains (customerA, 
> customerB...) that will grow over time.
> 
> So, my initial thoughts are to do the following:
> 
>  - flatten the searchable objects as much as I can - use a type field to 
> distinguish - into a single index
>  - use multi-core approach to segregate domains of data
> 
> So, a couple questions on this:
> 
>  1) Is this approach/design sensible and do others use it?
> 
>  2) By flattening the data we will only index common fields; is it 
> unreasonable 
> to do a second database search and union the results when doing advanced 
> searches on non indexed fields? Do others do this?
> 
>  3) I've read that I can dynamically add a new core - this fits well with the 
> ability to dynamically add new domains; how scalable is this approach? Would 
> it be unreasonable to have 20-30 dynamically created cores? I guess, redundancy 
> aside and given our one core per domain approach, we could easily spill onto 
> other physical servers without the need for replication? 
> 
> Thanks again for your help!
> rotis



Re: Multi-index Design

2009-05-05 Thread Walter Underwood
More precisely, we use a single core, flat schema, with a type field.

wunder

On 5/5/09 8:48 AM, "Walter Underwood"  wrote:

> That is how we do it at Netflix. --wunder
> 
> On 5/5/09 7:59 AM, "Chris Masters"  wrote:
> 
>>  1) Is this approach/design sensible and do others use it?
> 



Re: Multi-index Design

2009-05-05 Thread Walter Underwood
That is how we do it at Netflix. --wunder

On 5/5/09 7:59 AM, "Chris Masters"  wrote:

>  1) Is this approach/design sensible and do others use it?



Multi-index Design

2009-05-05 Thread Chris Masters

Hi All,

I'm [still!] evaluating Solr and setting up a PoC. The requirements are to 
index the following objects:

 - people - name, status, date added, address, profile, other people specific 
fields like group...
 - organisations - name, status, date added, address, profile, other 
organisational specific fields like size...
 - products - name, status, date added, profile, other product specific fields 
like product groups..

AND...I need to isolate indexes to a number of dynamic domains (customerA, 
customerB...) that will grow over time.

So, my initial thoughts are to do the following:

 - flatten the searchable objects as much as I can - use a type field to 
distinguish - into a single index
 - use multi-core approach to segregate domains of data

So, a couple questions on this:

 1) Is this approach/design sensible and do others use it?

 2) By flattening the data we will only index common fields; is it unreasonable 
to do a second database search and union the results when doing advanced 
searches on non indexed fields? Do others do this?

 3) I've read that I can dynamically add a new core - this fits well with the 
ability to dynamically add new domains; how scalable is this approach? Would 
it be unreasonable to have 20-30 dynamically created cores? I guess, 
redundancy aside and given our one core per domain approach, we could easily 
spill onto other physical servers without the need for replication? 

Thanks again for your help!
rotis





Re: Multi-index searches

2007-12-17 Thread Ryan McKinley

Kirk Beers wrote:

Kirk Beers wrote:

Hi,

I am interested in using solr and I ran the tutorial but I was 
wondering if it supports multi-index searching ?


Kirk

Allow me to clear that up! I would like to have the documents of 2 
indices returned at once. Does Solr support that? Or will it only 
return the documents of one index at a time?




one index at a time...

ryan


Re: Multi-index searches

2007-12-17 Thread Kirk Beers

Kirk Beers wrote:

Hi,

I am interested in using solr and I ran the tutorial but I was 
wondering if it supports multi-index searching ?


Kirk

Allow me to clear that up! I would like to have the documents of 2 
indices returned at once. Does Solr support that? Or will it only 
return the documents of one index at a time?


Kirk


Multi-index searches

2007-12-17 Thread Kirk Beers

Hi,

I am interested in using solr and I ran the tutorial but I was wondering 
if it supports multi-index searching ?


Kirk


Re: Question: Pagination with multi index box

2007-05-15 Thread Mike Klaas


On 14-May-07, at 10:05 PM, James liu wrote:


2007/5/15, Mike Klaas <[EMAIL PROTECTED]>:


I'm not ignoring it: I'm implying that the above is the correct
descending score-sorted order.  You have to perform that sort  
manually.



i mean merged results(from 60 p) and sort it, not solr's sort.
every result from box have been  sorted by score.


Yep, me too.




so it will not sorted by score correctly.
>
> and if user click page 2 to see, how to show data?
>
> p1 start from 10 or query other partitions?

Assemble results 1 through 20, then display 11-20 to the user.



For example, I want to query "solr".

p1 has 100 results with scores greater than 80.

p2 has 100 results with scores less than 20.

So if I use rows=10, the order by score is not correct.

If I want to guarantee 10 pages that are correctly sorted by score,

I have to get 100 results (rows=100) from every box,

then merge the results, sort them, and finally take the top 100.

But that will be very slow.


I don't know how other search engines solve this. Maybe they do not sort by
score entirely correctly.


Hmm, I feel as though we are going in circles.

If you want to cache the top 100 documents for a query, there is  
essentially no efficient means of accumulating these results in one  
request--as you note, to be sure of having the top 100 documents, 100  
documents from each partition must be requested.


Your options are essentially:

1) request a smaller number of documents, and accept some
inaccuracies (for instance, if you request 10 docs, then the first page
is guaranteed to be correct, but page 10 probably won't be quite right)


2) request a smaller number of documents and attempt to assemble the
top 100 docs.  If you can't, then request more documents from the
partitions that were exhausted soonest.


Keep in mind also that the scores across independent solr partitions  
are comparable, but not exact, due to idf differences.  The relative  
exactitude of page 10 results might not be too important.
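
Option 2 can be sketched roughly as follows. This is only an illustration under stated assumptions: `fetch(partition, rows)` is a hypothetical callback that returns a partition's top `rows` hits as (score, doc_id) pairs, best first, and the scores are treated as directly comparable across partitions even though idf differences make them approximate.

```python
import heapq

def top_n_multi_round(fetch, partitions, n, initial_rows):
    """Assemble the global top n, re-querying partitions that run out."""
    rows = {p: initial_rows for p in partitions}
    results = {p: fetch(p, initial_rows) for p in partitions}
    while True:
        streams = [[(s, d, p) for s, d in results[p]] for p in partitions]
        top = []
        used = dict.fromkeys(partitions, 0)
        exhausted = None
        # Each stream is sorted best first, so merge on the negated score.
        for score, doc, p in heapq.merge(*streams, key=lambda t: -t[0]):
            top.append((score, doc))
            used[p] += 1
            if len(top) == n:
                return top
            # Every requested doc of p was consumed: p may hold better docs
            # than whatever the other partitions would contribute next.
            if used[p] == rows[p]:
                exhausted = p
                break
        if exhausted is None:
            return top  # fewer than n matching docs exist overall
        rows[exhausted] *= 2
        results[exhausted] = fetch(exhausted, rows[exhausted])

# Hypothetical in-memory partitions standing in for Solr boxes.
data = {
    "p1": [(0.9, "a"), (0.8, "b"), (0.7, "c"), (0.6, "d")],
    "p2": [(0.65, "x"), (0.1, "y")],
    "p3": [],
}
fake_fetch = lambda p, rows: data[p][:rows]
top4 = top_n_multi_round(fake_fetch, ["p1", "p2", "p3"], 4, 2)
print(top4)
```

Only the partition that ran out is asked again, so most requests stay small; the price is an extra round trip when one partition dominates the results.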


-Mike


Re: Question: Pagination with multi index box

2007-05-14 Thread James liu

Maybe for full-text search, a perfectly correct sort order is not very important.


2007/5/15, James liu <[EMAIL PROTECTED]>:




2007/5/15, Mike Klaas <[EMAIL PROTECTED]>:
>
> On 14-May-07, at 8:55 PM, James liu wrote:
>
> > thks for your detail answer.
> >
> > but u ignore "sorted by score"
> >
> > p1, p2,p1,p1,p3,p4,p1,p1
> >
> > maybe their max score is lower than from p19,p20.
> >
>
> I'm not ignoring it: I'm implying that the above is the correct
> descending score-sorted order.  You have to perform that sort manually.


i mean merged results(from 60 p) and sort it, not solr's sort.
every result from box have been  sorted by score.


> so it will not sorted by score correctly.
> >
> > and if user click page 2 to see, how to show data?
> >
> > p1 start from 10 or query other partitions?
>
> Assemble results 1 through 20, then display 11-20 to the user.


for example, i wanna query "solr"

p1 have 100 results which score is bigger than 80

p2 have 100 results which score is smaller than 20

so if i use rows=10, score not correct.

if i wanna promise 10 pages which sort by score correctly.

so i have to get 100(rows=100) results from every box.

and merge results, sort it, finallay get top 100 results.

but it will very slow.


i don't know other search how to solve it? maybe they not sort by score
very correctly.




-Mike
>
> >
> > 2007/5/15, Mike Klaas <[EMAIL PROTECTED] >:
> >>
> >> On 14-May-07, at 6:49 PM, James liu wrote:
> >>
> >> > 2007/5/15, Mike Klaas <[EMAIL PROTECTED]>:
> >> >>
> >> >> On 14-May-07, at 1:35 AM, James liu wrote:
> >> >>
> >> >> When you get up to 60 partitions, you should make it a multi stage
> >> >> process.  Assuming your partitions are disjoint and evenly
> >> >> distributed, estimate the number of documents that will appear
> >> in the
> >> >> final result from each.
> >> >
> >> >
> >> > yes, partitions distrbuted.
> >> >
> >> >
> >> > Double or triple that (and put a minimum
> >> >> threshold), try to assemble the number of documents you
> >> require, and
> >> >> if one partition "runs out" of docs before it is done, request
> >> a new
> >> >> round.
> >> >
> >> >
> >> > i dont' know what u mean "runs out"
> >>
> >> Say you request 5 docs from each of 60 partitions, and are interested
>
> >> in docs 1-10.  If, sorted by score, the docs come from:
> >>
> >> p1, p2, p1, p1, p3, p4, p1, p1
> >>
> >> Then p1 has "run out" at n=8, and there is no way to be sure if the
> >> remaining two needed docs come from p1 or somewhere else.  So you
> >> have to now request at least two additional documents from p1.
> >>
> >> > one user request will generate 60 partitions request.
> >> >
> >> > they work in parallel。
> >> >
> >> > so i don't know every partion's status before they done.
> >>
> >> Normally, you would wait for them to finish, and execute a subsequent
>
> >> request if more docs are needed.
> >>
> >> -Mike
> >
> >
> >
> >
> > --
> > regards
> > jl
>
>


--
regards
jl





--
regards
jl


Re: Question: Pagination with multi index box

2007-05-14 Thread James liu

2007/5/15, Mike Klaas <[EMAIL PROTECTED]>:


On 14-May-07, at 8:55 PM, James liu wrote:

> thks for your detail answer.
>
> but u ignore "sorted by score"
>
> p1, p2,p1,p1,p3,p4,p1,p1
>
> maybe their max score is lower than from p19,p20.
>

I'm not ignoring it: I'm implying that the above is the correct
descending score-sorted order.  You have to perform that sort manually.



i mean merged results(from 60 p) and sort it, not solr's sort.
every result from box have been  sorted by score.



so it will not sorted by score correctly.
>
> and if user click page 2 to see, how to show data?
>
> p1 start from 10 or query other partitions?

Assemble results 1 through 20, then display 11-20 to the user.



For example, I want to query "solr".

p1 has 100 results with scores greater than 80.

p2 has 100 results with scores less than 20.

So if I use rows=10, the order by score is not correct.

If I want to guarantee 10 pages that are correctly sorted by score,

I have to get 100 results (rows=100) from every box,

then merge the results, sort them, and finally take the top 100.

But that will be very slow.


I don't know how other search engines solve this. Maybe they do not sort by
score entirely correctly.




-Mike


>
> 2007/5/15, Mike Klaas <[EMAIL PROTECTED]>:
>>
>> On 14-May-07, at 6:49 PM, James liu wrote:
>>
>> > 2007/5/15, Mike Klaas <[EMAIL PROTECTED]>:
>> >>
>> >> On 14-May-07, at 1:35 AM, James liu wrote:
>> >>
>> >> When you get up to 60 partitions, you should make it a multi stage
>> >> process.  Assuming your partitions are disjoint and evenly
>> >> distributed, estimate the number of documents that will appear
>> in the
>> >> final result from each.
>> >
>> >
>> > yes, partitions distrbuted.
>> >
>> >
>> > Double or triple that (and put a minimum
>> >> threshold), try to assemble the number of documents you
>> require, and
>> >> if one partition "runs out" of docs before it is done, request
>> a new
>> >> round.
>> >
>> >
>> > i dont' know what u mean "runs out"
>>
>> Say you request 5 docs from each of 60 partitions, and are interested
>> in docs 1-10.  If, sorted by score, the docs come from:
>>
>> p1, p2, p1, p1, p3, p4, p1, p1
>>
>> Then p1 has "run out" at n=8, and there is no way to be sure if the
>> remaining two needed docs come from p1 or somewhere else.  So you
>> have to now request at least two additional documents from p1.
>>
>> > one user request will generate 60 partitions request.
>> >
>> > they work in parallel。
>> >
>> > so i don't know every partion's status before they done.
>>
>> Normally, you would wait for them to finish, and execute a subsequent
>> request if more docs are needed.
>>
>> -Mike
>
>
>
>
> --
> regards
> jl





--
regards
jl


Re: Question: Pagination with multi index box

2007-05-14 Thread Mike Klaas

On 14-May-07, at 8:55 PM, James liu wrote:


thks for your detail answer.

but u ignore "sorted by score"

p1, p2,p1,p1,p3,p4,p1,p1

maybe their max score is lower than from p19,p20.



I'm not ignoring it: I'm implying that the above is the correct  
descending score-sorted order.  You have to perform that sort manually.



so it will not sorted by score correctly.

and if user click page 2 to see, how to show data?

p1 start from 10 or query other partitions?


Assemble results 1 through 20, then display 11-20 to the user.

-Mike



2007/5/15, Mike Klaas <[EMAIL PROTECTED]>:


On 14-May-07, at 6:49 PM, James liu wrote:

> 2007/5/15, Mike Klaas <[EMAIL PROTECTED]>:
>>
>> On 14-May-07, at 1:35 AM, James liu wrote:
>>
>> When you get up to 60 partitions, you should make it a multi stage
>> process.  Assuming your partitions are disjoint and evenly
>> distributed, estimate the number of documents that will appear  
in the

>> final result from each.
>
>
> yes, partitions distrbuted.
>
>
> Double or triple that (and put a minimum
>> threshold), try to assemble the number of documents you  
require, and
>> if one partition "runs out" of docs before it is done, request  
a new

>> round.
>
>
> i dont' know what u mean "runs out"

Say you request 5 docs from each of 60 partitions, and are interested
in docs 1-10.  If, sorted by score, the docs come from:

p1, p2, p1, p1, p3, p4, p1, p1

Then p1 has "run out" at n=8, and there is no way to be sure if the
remaining two needed docs come from p1 or somewhere else.  So you
have to now request at least two additional documents from p1.

> one user request will generate 60 partitions request.
>
> they work in parallel。
>
> so i don't know every partion's status before they done.

Normally, you would wait for them to finish, and execute a subsequent
request if more docs are needed.

-Mike





--
regards
jl




Re: Question: Pagination with multi index box

2007-05-14 Thread James liu

For example, I want to query "lucene"; its numFound is 234300.

The results should be sorted by score.

If you do that, how do you paginate and sort by score?


2007/5/15, Mike Klaas <[EMAIL PROTECTED]>:



On 14-May-07, at 7:15 PM, James liu wrote:

> if i set rows=(page-1)*10,,,it will lose more result which fits query.
>
> how to set start when pagination.

I'm not sure I understand the question.

When combining results from partitions, you can't use startAt.




If I don't use startAt, how do I choose rows so that the user can still find the results?


You must always assemble the docs from 0 to N for each partition (whether
through one request or multiple).



If rows is bigger it will be slow; if it is smaller it will lose data and the
sort by score will not be correct.

-Mike


>
>
> 2007/5/15, James liu <[EMAIL PROTECTED]>:
>>
>>
>>
>> 2007/5/15, Mike Klaas <[EMAIL PROTECTED]>:
>> >
>> > On 14-May-07, at 1:35 AM, James liu wrote:
>> >
>> > > if use multi index box, how to pagination with sort by score
>> > > correctly?
>> > >
>> > > for example, i wanna query "search" with 60 index box and sort by
>> > > score.
>> > >
>> > > i don't know the num found from every index box which have
>> different
>> > > content.
>> > >
>> > > if promise 10 page with sort score correctly, i think solr 's
>> start
>> > > is 0,
>> > > and rows is 100.(10 result per page)
>> > >
>> > > 60*100=6000, sort it and get top 100 to cache.
>> >
>> > > it is very slove although it promise 10 page with sort score
>> > > correctly.
>> >
>> > With few index partitions, you it is sufficient to ask for startAt
>> > +numNeeded docs from each partition and sort globally.  Normally if
>> > you wanted 10 for the first page, you would ask for 10 from each
>> > server and cache the remainder.  It is better to ask for more later
>> > if the user asks for page ten.
>> >
>> >
>> > When you get up to 60 partitions, you should make it a multi stage
>> > process.  Assuming your partitions are disjoint and evenly
>> > distributed, estimate the number of documents that will appear
>> in the
>> > final result from each.
>>
>>
>> yes, partitions distrbuted.
>>
>>
>>  Double or triple that (and put a minimum
>> > threshold), try to assemble the number of documents you require,
>> and
>> > if one partition "runs out" of docs before it is done, request a
>> new
>> > round.
>>
>>
>> i dont' know what u mean "runs out"
>>
>> one user request will generate 60 partitions request.
>>
>> they work in parallel。
>>
>> so i don't know every partion's status before they done.
>>
>>
>> To promise 10 page result sorted by score correctly, the only way
>> seems to
>> get 100 results(rows=100) from each partitioin. but it very slow.
>>
>> now i wanna find a way to get result sorted by score correctly and
>> search
>> fast.
>>
>>
>> -Mike
>> >
>>
>> Thks Mike. But it not i want.
>>
>>
>> --
>> regards
>> jl
>
>
>
>
> --
> regards
> jl





--
regards
jl


Re: Question: Pagination with multi index box

2007-05-14 Thread James liu

Thanks for your detailed answer.

But you ignore "sorted by score":

p1, p2, p1, p1, p3, p4, p1, p1

Maybe their max score is lower than that of docs from p19 or p20,

so the results will not be sorted by score correctly.

And if the user clicks page 2, how do I show the data?

Does p1 start from 10, or do I query other partitions?


2007/5/15, Mike Klaas <[EMAIL PROTECTED]>:


On 14-May-07, at 6:49 PM, James liu wrote:

> 2007/5/15, Mike Klaas <[EMAIL PROTECTED]>:
>>
>> On 14-May-07, at 1:35 AM, James liu wrote:
>>
>> When you get up to 60 partitions, you should make it a multi stage
>> process.  Assuming your partitions are disjoint and evenly
>> distributed, estimate the number of documents that will appear in the
>> final result from each.
>
>
> yes, partitions distrbuted.
>
>
> Double or triple that (and put a minimum
>> threshold), try to assemble the number of documents you require, and
>> if one partition "runs out" of docs before it is done, request a new
>> round.
>
>
> i dont' know what u mean "runs out"

Say you request 5 docs from each of 60 partitions, and are interested
in docs 1-10.  If, sorted by score, the docs come from:

p1, p2, p1, p1, p3, p4, p1, p1

Then p1 has "run out" at n=8, and there is no way to be sure if the
remaining two needed docs come from p1 or somewhere else.  So you
have to now request at least two additional documents from p1.

> one user request will generate 60 partitions request.
>
> they work in parallel。
>
> so i don't know every partion's status before they done.

Normally, you would wait for them to finish, and execute a subsequent
request if more docs are needed.

-Mike





--
regards
jl


Re: Question: Pagination with multi index box

2007-05-14 Thread Mike Klaas


On 14-May-07, at 7:15 PM, James liu wrote:


if i set rows=(page-1)*10,,,it will lose more result which fits query.

how to set start when pagination.


I'm not sure I understand the question.

When combining results from partitions, you can't use startAt.  You  
must always assemble the docs from 0 to N for each partition (whether  
through one request or multiple).
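
A minimal sketch of that rule, with made-up scores and document IDs: each partition is queried with offset 0 (Solr's request parameter is `start`) for start + rows docs, and the page offset is applied only after the global merge.

```python
def global_page(partition_results, start, rows):
    """Return docs start..start+rows-1 of the globally sorted result list.

    partition_results holds, per partition, that partition's top
    (start + rows) hits as (score, doc_id) pairs, fetched with offset 0.
    """
    merged = sorted(
        (hit for hits in partition_results for hit in hits),
        key=lambda hit: hit[0],
        reverse=True,
    )
    return merged[start:start + rows]

# Page 2 with 2 docs per page: start=2, rows=2, so 4 docs per partition.
p1 = [(0.9, "a"), (0.8, "b"), (0.3, "c"), (0.2, "d")]
p2 = [(0.7, "x"), (0.6, "y"), (0.5, "z"), (0.1, "w")]
page2 = global_page([p1, p2], start=2, rows=2)
print(page2)
```

Asking each partition for its own docs 3-4 instead would have returned c, d, z, and w, none of which belong on the second page.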


-Mike




2007/5/15, James liu <[EMAIL PROTECTED]>:




2007/5/15, Mike Klaas <[EMAIL PROTECTED]>:
>
> On 14-May-07, at 1:35 AM, James liu wrote:
>
> > if use multi index box, how to pagination with sort by score
> > correctly?
> >
> > for example, i wanna query "search" with 60 index box and sort by
> > score.
> >
> > i don't know the num found from every index box which have  
different

> > content.
> >
> > if promise 10 page with sort score correctly, i think solr 's  
start

> > is 0,
> > and rows is 100.(10 result per page)
> >
> > 60*100=6000, sort it and get top 100 to cache.
>
> > it is very slove although it promise 10 page with sort score
> > correctly.
>
> With few index partitions, you it is sufficient to ask for startAt
> +numNeeded docs from each partition and sort globally.  Normally if
> you wanted 10 for the first page, you would ask for 10 from each
> server and cache the remainder.  It is better to ask for more later
> if the user asks for page ten.
>
>
> When you get up to 60 partitions, you should make it a multi stage
> process.  Assuming your partitions are disjoint and evenly
> distributed, estimate the number of documents that will appear  
in the

> final result from each.


yes, partitions distrbuted.


 Double or triple that (and put a minimum
> threshold), try to assemble the number of documents you require,  
and
> if one partition "runs out" of docs before it is done, request a  
new

> round.


i dont' know what u mean "runs out"

one user request will generate 60 partitions request.

they work in parallel。

so i don't know every partion's status before they done.


To promise 10 page result sorted by score correctly, the only way  
seems to

get 100 results(rows=100) from each partitioin. but it very slow.

now i wanna find a way to get result sorted by score correctly and  
search

fast.


-Mike
>

Thks Mike. But it not i want.


--
regards
jl





--
regards
jl




Re: Question: Pagination with multi index box

2007-05-14 Thread Mike Klaas

On 14-May-07, at 6:49 PM, James liu wrote:


2007/5/15, Mike Klaas <[EMAIL PROTECTED]>:


On 14-May-07, at 1:35 AM, James liu wrote:

When you get up to 60 partitions, you should make it a multi stage
process.  Assuming your partitions are disjoint and evenly
distributed, estimate the number of documents that will appear in the
final result from each.



yes, partitions distrbuted.


Double or triple that (and put a minimum

threshold), try to assemble the number of documents you require, and
if one partition "runs out" of docs before it is done, request a new
round.
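
The estimate can be written down as a small calculation. This is a sketch only; the default factor of 3 and the minimum of 5 are assumed values, not figures from this thread.

```python
import math

def rows_per_partition(needed, num_partitions, factor=3, minimum=5):
    """Rows to request from each partition in the first round."""
    # With disjoint, evenly distributed partitions, each one is expected
    # to contribute about needed / num_partitions docs to the final result.
    expected = math.ceil(needed / num_partitions)
    # Double or triple the estimate, and never go below the minimum.
    return max(minimum, expected * factor)

first_round = rows_per_partition(100, 60)
print(first_round, 60 * first_round)
```

For 100 docs over 60 partitions this asks each partition for 6 rows (360 docs in total) instead of 100 rows each (6000 docs in total).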



i dont' know what u mean "runs out"


Say you request 5 docs from each of 60 partitions, and are interested  
in docs 1-10.  If, sorted by score, the docs come from:


p1, p2, p1, p1, p3, p4, p1, p1

Then p1 has "run out" at n=8, and there is no way to be sure if the  
remaining two needed docs come from p1 or somewhere else.  So you  
have to now request at least two additional documents from p1.



one user request will generate 60 partitions request.

they work in parallel。

so i don't know every partion's status before they done.


Normally, you would wait for them to finish, and execute a subsequent  
request if more docs are needed.
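
The "run out" check can be sketched like this. The scores are invented so that the merge reproduces the order above, and the function name is hypothetical.

```python
def confirmed_prefix(partition_hits, requested):
    """Merge hits by score and stop at the first partition that runs out.

    Returns the part of the global order that is known to be correct and
    the name of the exhausted partition (None if no partition ran out).
    """
    merged = sorted(
        ((score, doc, name)
         for name, hits in partition_hits.items()
         for score, doc in hits),
        key=lambda t: t[0],
        reverse=True,
    )
    seen = {name: 0 for name in partition_hits}
    prefix = []
    for score, doc, name in merged:
        prefix.append((doc, name))
        seen[name] += 1
        if seen[name] == requested:
            return prefix, name
    return prefix, None

def hits(name, scores):
    """Label made-up scores with made-up doc IDs such as p1-0, p1-1, ..."""
    return [(s, "%s-%d" % (name, i)) for i, s in enumerate(scores)]

# 5 docs requested from each partition (4 of the 60 shown).
partition_hits = {
    "p1": hits("p1", [100, 80, 70, 40, 30]),
    "p2": hits("p2", [90, 20, 19, 18, 17]),
    "p3": hits("p3", [60, 16, 15, 14, 13]),
    "p4": hits("p4", [50, 12, 11, 10, 9]),
}
prefix, exhausted = confirmed_prefix(partition_hits, requested=5)
print(len(prefix), exhausted)
```

Here the first 8 positions are certain, and p1 must be asked for more docs before positions 9 and 10 can be filled.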


-Mike

Re: Question: Pagination with multi index box

2007-05-14 Thread James liu

If I set rows=(page-1)*10, it will lose more results that match the query.

How do I set start when paginating?



2007/5/15, James liu <[EMAIL PROTECTED]>:




2007/5/15, Mike Klaas <[EMAIL PROTECTED]>:
>
> On 14-May-07, at 1:35 AM, James liu wrote:
>
> > if use multi index box, how to pagination with sort by score
> > correctly?
> >
> > for example, i wanna query "search" with 60 index box and sort by
> > score.
> >
> > i don't know the num found from every index box which have different
> > content.
> >
> > if promise 10 page with sort score correctly, i think solr 's start
> > is 0,
> > and rows is 100.(10 result per page)
> >
> > 60*100=6000, sort it and get top 100 to cache.
>
> > it is very slove although it promise 10 page with sort score
> > correctly.
>
> With few index partitions, you it is sufficient to ask for startAt
> +numNeeded docs from each partition and sort globally.  Normally if
> you wanted 10 for the first page, you would ask for 10 from each
> server and cache the remainder.  It is better to ask for more later
> if the user asks for page ten.
>
>
> When you get up to 60 partitions, you should make it a multi stage
> process.  Assuming your partitions are disjoint and evenly
> distributed, estimate the number of documents that will appear in the
> final result from each.


yes, partitions distrbuted.


 Double or triple that (and put a minimum
> threshold), try to assemble the number of documents you require, and
> if one partition "runs out" of docs before it is done, request a new
> round.


i dont' know what u mean "runs out"

one user request will generate 60 partitions request.

they work in parallel。

so i don't know every partion's status before they done.


To promise 10 page result sorted by score correctly, the only way seems to
get 100 results(rows=100) from each partitioin. but it very slow.

now i wanna find a way to get result sorted by score correctly and search
fast.


-Mike
>

Thks Mike. But it not i want.


--
regards
jl





--
regards
jl


Re: Question: Pagination with multi index box

2007-05-14 Thread James liu

2007/5/15, Mike Klaas <[EMAIL PROTECTED]>:


On 14-May-07, at 1:35 AM, James liu wrote:

> if use multi index box, how to pagination with sort by score
> correctly?
>
> for example, i wanna query "search" with 60 index box and sort by
> score.
>
> i don't know the num found from every index box which have different
> content.
>
> if promise 10 page with sort score correctly, i think solr 's start
> is 0,
> and rows is 100.(10 result per page)
>
> 60*100=6000, sort it and get top 100 to cache.

> it is very slove although it promise 10 page with sort score
> correctly.

With few index partitions, you it is sufficient to ask for startAt
+numNeeded docs from each partition and sort globally.  Normally if
you wanted 10 for the first page, you would ask for 10 from each
server and cache the remainder.  It is better to ask for more later
if the user asks for page ten.


When you get up to 60 partitions, you should make it a multi stage
process.  Assuming your partitions are disjoint and evenly
distributed, estimate the number of documents that will appear in the
final result from each.



Yes, the partitions are distributed.


Double or triple that (and put a minimum

threshold), try to assemble the number of documents you require, and
if one partition "runs out" of docs before it is done, request a new
round.



I don't know what you mean by "runs out".

One user request generates 60 partition requests.

They work in parallel.

So I don't know each partition's status before they are done.


To guarantee 10 pages of results sorted correctly by score, the only way seems
to be to get 100 results (rows=100) from each partition, but that is very slow.

Now I want to find a way to get results sorted correctly by score and still
search fast.


-Mike




Thanks, Mike, but that is not what I want.


--
regards
jl


Re: Question: Pagination with multi index box

2007-05-14 Thread Mike Klaas

On 14-May-07, at 1:35 AM, James liu wrote:

if use multi index box, how to pagination with sort by score  
correctly?


for example, i wanna query "search" with 60 index box and sort by  
score.


i don't know the num found from every index box which have different
content.

if promise 10 page with sort score correctly, i think solr 's start  
is 0,

and rows is 100.(10 result per page)

60*100=6000, sort it and get top 100 to cache.


it is very slove although it promise 10 page with sort score  
correctly.


With few index partitions, it is sufficient to ask for startAt 
+numNeeded docs from each partition and sort globally.  Normally if  
you wanted 10 for the first page, you would ask for 10 from each  
server and cache the remainder.  It is better to ask for more later  
if the user asks for page ten.



When you get up to 60 partitions, you should make it a multi stage  
process.  Assuming your partitions are disjoint and evenly  
distributed, estimate the number of documents that will appear in the  
final result from each.  Double or triple that (and put a minimum  
threshold), try to assemble the number of documents you require, and  
if one partition "runs out" of docs before it is done, request a new  
round.
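
The multi-stage process described above could look roughly like this. It is only a sketch under assumptions, not anything Solr provides: `fetch(shard, query, start, rows)` is a hypothetical callable that returns one partition's documents sorted by descending score, each document is assumed to carry a `score` field, and `factor` and `minimum` stand in for the "double or triple" and "minimum threshold" knobs. A partition "runs out" when every document fetched from it so far has been used in the merged result while it may still hold more matches; only then is a new round requested from it.

```python
import math


def staged_top_k(shards, fetch, query, k, factor=2, minimum=5):
    """Return the global top k docs by score, fetching from shards in rounds.

    fetch(shard, query, start, rows) is a hypothetical callable returning a
    list of {"id": ..., "score": ...} dicts sorted by descending score.
    """
    # Estimate each shard's share of the final result, then over-fetch.
    per_shard = max(minimum, factor * math.ceil(k / len(shards)))
    buffers = {s: [] for s in shards}  # docs fetched so far, per shard
    pos = {s: 0 for s in shards}       # next unused doc in each buffer
    exhausted = set()                  # shards known to have no more matches

    def refill(shard):
        got = fetch(shard, query, len(buffers[shard]), per_shard)
        buffers[shard].extend(got)
        if len(got) < per_shard:
            exhausted.add(shard)

    for s in shards:                   # first round (could run in parallel)
        refill(s)

    result = []
    while len(result) < k:
        # A shard "ran out" if its buffer is used up but it may hold more.
        for s in shards:
            if pos[s] == len(buffers[s]) and s not in exhausted:
                refill(s)
        candidates = [s for s in shards if pos[s] < len(buffers[s])]
        if not candidates:
            break                      # fewer than k matches overall
        best = max(candidates, key=lambda s: buffers[s][pos[s]]["score"])
        result.append(buffers[best][pos[best]])
        pos[best] += 1
    return result
```

With 60 partitions and k = 100, the first round asks each partition for max(5, 2 * 2) = 5 documents, 300 in total, instead of 6000. Further requests go only to partitions that contribute more than their estimated share.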


-Mike


Question: Pagination with multi index box

2007-05-14 Thread James liu

If I use multiple index boxes, how do I paginate correctly when sorting by
score?

For example, I want to run the query "search" against 60 index boxes and sort
by score.

I don't know the number found from each index box, since each one holds
different content.

To guarantee 10 pages sorted correctly by score, I think Solr's start must be
0 and rows must be 100 (10 results per page).

60*100=6000 results; I sort them and cache the top 100.

It is very slow, although it guarantees 10 pages sorted correctly by score.


Any idea how to fix it?

It needs to be fast and correct.



--
regards
jl


Re: Question to php to do with multi index

2007-04-27 Thread James liu

I think curl_multi is slow.

Thanks, I will try it.

2007/4/27, Michael Kimsal <[EMAIL PROTECTED]>:


The curl_multi is probably the most effective way, using straight PHP.
Another option would be to spawn several jobs, assuming unix/linux, and
wait
for them to get done.  It doesn't give you very good error handling (well,
none at all actually!) but would let you run multiple indexing jobs at
once.

Visit http://us.php.net/shell_exec and look at the 'class exec'
contributed
note about halfway down the page.  It'll give you an idea of how to easily
spawn multiple jobs.

If you're using PHP5, the proc_open function may be another way to go.
proc_open was available in 4, but there were a number of extra parameters
and controls made available in 5.
http://us.php.net/manual/en/function.proc-open.php

An adventurous soul could combine the two concepts in to one class to
manage
pipes communication between multiple child processes effectively.

On 4/26/07, James liu <[EMAIL PROTECTED]> wrote:
>
> php not support multi thread,,,and how can u solve with multi index in
> parallel?
>
> now i use curl_multi
>
> maybe more effect way i don't know,,,so if u know, tell me. thks.
>
>
> --
> regards
> jl
>



--
Michael Kimsal
http://webdevradio.com





--
regards
jl


Re: Question to php to do with multi index

2007-04-27 Thread Michael Kimsal

The curl_multi is probably the most effective way, using straight PHP.
Another option would be to spawn several jobs, assuming unix/linux, and wait
for them to get done.  It doesn't give you very good error handling (well,
none at all actually!) but would let you run multiple indexing jobs at once.

Visit http://us.php.net/shell_exec and look at the 'class exec' contributed
note about halfway down the page.  It'll give you an idea of how to easily
spawn multiple jobs.

If you're using PHP5, the proc_open function may be another way to go.
proc_open was available in 4, but there were a number of extra parameters
and controls made available in 5.
http://us.php.net/manual/en/function.proc-open.php

An adventurous soul could combine the two concepts into one class to manage
pipe communication between multiple child processes effectively.
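
The spawn-and-wait idea can be sketched as below. The thread is about PHP; this sketch uses Python only to illustrate the pattern that proc_open would express in PHP. `run_jobs` is a hypothetical helper, and the commands in the usage part are placeholders rather than real indexing jobs. Unlike the bare shell_exec approach, it keeps each job's exit code and output, so some error handling is possible.

```python
import subprocess
import sys


def run_jobs(commands):
    """Start every command at once, then wait for all of them to finish.

    Returns a list of (exit_code, stdout, stderr) tuples in input order.
    """
    # Launch all child processes before waiting on any of them.
    procs = [
        subprocess.Popen(
            cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True
        )
        for cmd in commands
    ]
    results = []
    for proc in procs:
        out, err = proc.communicate()  # blocks until this child exits
        results.append((proc.returncode, out, err))
    return results


if __name__ == "__main__":
    # Placeholder jobs: each child just prints a number.
    jobs = [[sys.executable, "-c", "print(%d * 2)" % i] for i in range(3)]
    for code, out, err in run_jobs(jobs):
        print(code, out.strip(), err.strip())
```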

On 4/26/07, James liu <[EMAIL PROTECTED]> wrote:


php not support multi thread,,,and how can u solve with multi index in
parallel?

now i use curl_multi

maybe more effect way i don't know,,,so if u know, tell me. thks.


--
regards
jl





--
Michael Kimsal
http://webdevradio.com


Question to php to do with multi index

2007-04-26 Thread James liu

PHP does not support multithreading, so how can you work with multiple
indexes in parallel?

Right now I use curl_multi.

There may be a more efficient way that I don't know of, so if you know one,
please tell me. Thanks.


--
regards
jl


Re: Does solr support Multi index and return by score and datetime

2007-04-05 Thread James liu

2007/4/5, Otis Gospodnetic <[EMAIL PROTECTED]>:


How to cache results?
Put them in a cache like memcached, for example, keyed off of query (can't
exceed 250 bytes in the case of memcached, so you'll want to pack that
query, perhaps use its MD5 as the cache key)



Yes, I use memcached, and the key is the MD5 of the query. Thanks for your
advice.
I reduced the number of documents because the machine has only 1 GB of RAM.

My plan is for the master to use Tomcat running 20 Solr instances, and for
slaveA and slaveB to have 10 Solr instances.
The web server uses lighttpd+php+memcached.

That is my design, but it is not tested yet. Maybe you can share your
experience.

Otis

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Simpy -- http://www.simpy.com/  -  Tag  -  Search  -  Share

- Original Message 
From: James liu <[EMAIL PROTECTED] >
To: solr-user@lucene.apache.org
Sent: Thursday, April 5, 2007 1:57:07 AM
Subject: Re: Does solr support Multi index and return by score and
datetime

2007/4/5, Otis Gospodnetic <[EMAIL PROTECTED]>:
>
> James,
>
> It looks like people already answered your questions.
> Split your big index.
> Put it on multiple servers.
> Put Solr on each of those servers.
> Write an application that searches multiple Solr instances in parallel.
> Get N results from each, combine them, order by score.


How to cache its result? I hesitate that It will cache many data.


As far as I know, this is the best you can do with what is available from
> Solr today.
> For anything else, you'll have to roll up your sleeves and dig into the
> code.
>
> Good luck!
>
> Otis
> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
> Simpy -- http://www.simpy.com/  -  Tag  -  Search  -  Share
>
> - Original Message 
> From: James liu <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Thursday, April 5, 2007 1:18:30 AM
> Subject: Re: Does solr support Multi index and return by score and
> datetime
>
> Anyone have problem like this and how to solve it?
>
>
>
>
> 2007/4/5, James liu <[EMAIL PROTECTED]>:
> >
> >
> >
> > 2007/4/5, Mike Klaas < [EMAIL PROTECTED]>:
> > >
> > > On 4/4/07, James liu <[EMAIL PROTECTED]> wrote:
> > >
> > > > > > I think it is part of full-text search.
> > > >
> > > > I think query slavers and combin result by score should be the
part
> of
> > > solr.
> > > >
> > > > I find it http://dev.lucene-ws.net/wiki/MultiIndexOperations
> > > > but i wanna use solr and i like it.
> > > >
> > > > Now i wanna find a good method to solve it by using solr and less
> > > > coding.(More code will cost more time to write and test.)
> > >
> > > I agree that it would be an excellent addition to Solr, but it is a
> > > major undertaking, and so I wouldn't wait around for it if it is
> > > important to you.  Solr devs have code to write and test too :).
> > >
> > > > > >  If you document
> > > > > > > distribution is uniform random, then the norms converge to
> > > > > > > approximately equal values anyway.
> > > > > >
> > > > > > I don't know it.
> > > >
> > > > I don't know why u say "document distribution". Does it mean if i
> > > write code
> > > > independently, i will consider it?
> > >
> > > One of the complexities of queries multiple remote Solr/lucene
> > > instances is that the scores are not directly comparable as the term

> > > idf scores will be different.  However, in practical situations,
this
> > > can be glossed over.
> > >
> > > This is the basic algorithm for single-pass querying multiple solr
> > > slaves.  Say you want results N to N + M (e.g 10 to 20).
> > >
> > > 1. query each solr instance independently for N+M documents for the
> > > given query.  This should be done asynchronously (or you could spawn
a
> > > thread per server).
> > > 2. wait for all responses (or for a certain timeout)
> > > 3. put all returned documents into an array, and reverse sort by
score
> > > 4. select documents [N, N+M) from this array.
> > >
> > > This is a relatively simple task.  It gets more complicated once
> > > multiple passes, idf compensation, deduplication, etc. are added.
> > >
> > > -Mike
> > >
> >
> > Thks Mike.
> >
> > I find it more complicate than i think.
> >
> > Is it the only way to solve my problem:
> >
> > I have a project, it have 100g data, now i have 3-4 server for solr.
> >
> >
> >
> >
> >
> >
> > --
> > regards
> > jl
>
>
>
>
> --
> regards
> jl
>
>
>
>


--
regards
jl







--
regards
jl


Re: Does solr support Multi index and return by score and datetime

2007-04-05 Thread Otis Gospodnetic
How to cache results?
Put them in a cache like memcached, for example, keyed off of query (can't 
exceed 250 bytes in the case of memcached, so you'll want to pack that query, 
perhaps use its MD5 as the cache key)
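
A minimal sketch of that cache key, under some assumptions: the `solr:` prefix and the choice to fold `start` and `rows` into the key are illustrative additions, not something from this thread, and a plain dict stands in for a real memcached client.

```python
import hashlib

MEMCACHED_MAX_KEY_BYTES = 250  # memcached's key length limit


def cache_key(query, start=0, rows=10, prefix="solr:"):
    """Build a short, fixed-length key from an arbitrarily long query."""
    # Including start/rows (an assumption) keeps different pages apart.
    raw = "{}|{}|{}".format(query, start, rows).encode("utf-8")
    return prefix + hashlib.md5(raw).hexdigest()


if __name__ == "__main__":
    cache = {}  # stand-in for a memcached client
    key = cache_key("title:solr AND body:" + "x" * 1000)
    cache[key] = ["doc1", "doc2"]
    print(len(key) <= MEMCACHED_MAX_KEY_BYTES)
```

The hex digest is always 32 characters, so the key stays well under the limit no matter how long the query is.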

Otis
 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Simpy -- http://www.simpy.com/  -  Tag  -  Search  -  Share

- Original Message 
From: James liu <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Thursday, April 5, 2007 1:57:07 AM
Subject: Re: Does solr support Multi index and return by score and datetime

2007/4/5, Otis Gospodnetic <[EMAIL PROTECTED]>:
>
> James,
>
> It looks like people already answered your questions.
> Split your big index.
> Put it on multiple servers.
> Put Solr on each of those servers.
> Write an application that searches multiple Solr instances in parallel.
> Get N results from each, combine them, order by score.


How to cache its result? I hesitate that It will cache many data.


As far as I know, this is the best you can do with what is available from
> Solr today.
> For anything else, you'll have to roll up your sleeves and dig into the
> code.
>
> Good luck!
>
> Otis
> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
> Simpy -- http://www.simpy.com/  -  Tag  -  Search  -  Share
>
> - Original Message 
> From: James liu <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Thursday, April 5, 2007 1:18:30 AM
> Subject: Re: Does solr support Multi index and return by score and
> datetime
>
> Anyone have problem like this and how to solve it?
>
>
>
>
> 2007/4/5, James liu <[EMAIL PROTECTED]>:
> >
> >
> >
> > 2007/4/5, Mike Klaas <[EMAIL PROTECTED]>:
> > >
> > > On 4/4/07, James liu <[EMAIL PROTECTED]> wrote:
> > >
> > > > > > I think it is part of full-text search.
> > > >
> > > > I think query slavers and combin result by score should be the part
> of
> > > solr.
> > > >
> > > > I find it http://dev.lucene-ws.net/wiki/MultiIndexOperations
> > > > but i wanna use solr and i like it.
> > > >
> > > > Now i wanna find a good method to solve it by using solr and less
> > > > coding.(More code will cost more time to write and test.)
> > >
> > > I agree that it would be an excellent addition to Solr, but it is a
> > > major undertaking, and so I wouldn't wait around for it if it is
> > > important to you.  Solr devs have code to write and test too :).
> > >
> > > > > >  If you document
> > > > > > > distribution is uniform random, then the norms converge to
> > > > > > > approximately equal values anyway.
> > > > > >
> > > > > > I don't know it.
> > > >
> > > > I don't know why u say "document distribution". Does it mean if i
> > > write code
> > > > independently, i will consider it?
> > >
> > > One of the complexities of queries multiple remote Solr/lucene
> > > instances is that the scores are not directly comparable as the term
> > > idf scores will be different.  However, in practical situations, this
> > > can be glossed over.
> > >
> > > This is the basic algorithm for single-pass querying multiple solr
> > > slaves.  Say you want results N to N + M (e.g 10 to 20).
> > >
> > > 1. query each solr instance independently for N+M documents for the
> > > given query.  This should be done asynchronously (or you could spawn a
> > > thread per server).
> > > 2. wait for all responses (or for a certain timeout)
> > > 3. put all returned documents into an array, and reverse sort by score
> > > 4. select documents [N, N+M) from this array.
> > >
> > > This is a relatively simple task.  It gets more complicated once
> > > multiple passes, idf compensation, deduplication, etc. are added.
> > >
> > > -Mike
> > >
> >
> > Thks Mike.
> >
> > I find it more complicate than i think.
> >
> > Is it the only way to solve my problem:
> >
> > I have a project, it have 100g data, now i have 3-4 server for solr.
> >
> >
> >
> >
> >
> >
> > --
> > regards
> > jl
>
>
>
>
> --
> regards
> jl
>
>
>
>


-- 
regards
jl





Re: Does solr support Multi index and return by score and datetime

2007-04-04 Thread James liu

2007/4/5, Otis Gospodnetic <[EMAIL PROTECTED]>:


James,

It looks like people already answered your questions.
Split your big index.
Put it on multiple servers.
Put Solr on each of those servers.
Write an application that searches multiple Solr instances in parallel.
Get N results from each, combine them, order by score.



How should I cache the results? I worry that it will cache a lot of data.


As far as I know, this is the best you can do with what is available from

Solr today.
For anything else, you'll have to roll up your sleeves and dig into the
code.

Good luck!

Otis
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Simpy -- http://www.simpy.com/  -  Tag  -  Search  -  Share

- Original Message 
From: James liu <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Thursday, April 5, 2007 1:18:30 AM
Subject: Re: Does solr support Multi index and return by score and
datetime

Anyone have problem like this and how to solve it?




2007/4/5, James liu <[EMAIL PROTECTED]>:
>
>
>
> 2007/4/5, Mike Klaas <[EMAIL PROTECTED]>:
> >
> > On 4/4/07, James liu <[EMAIL PROTECTED]> wrote:
> >
> > > > > I think it is part of full-text search.
> > >
> > > I think query slavers and combin result by score should be the part
of
> > solr.
> > >
> > > I find it http://dev.lucene-ws.net/wiki/MultiIndexOperations
> > > but i wanna use solr and i like it.
> > >
> > > Now i wanna find a good method to solve it by using solr and less
> > > coding.(More code will cost more time to write and test.)
> >
> > I agree that it would be an excellent addition to Solr, but it is a
> > major undertaking, and so I wouldn't wait around for it if it is
> > important to you.  Solr devs have code to write and test too :).
> >
> > > > >  If you document
> > > > > > distribution is uniform random, then the norms converge to
> > > > > > approximately equal values anyway.
> > > > >
> > > > > I don't know it.
> > >
> > > I don't know why u say "document distribution". Does it mean if i
> > write code
> > > independently, i will consider it?
> >
> > One of the complexities of queries multiple remote Solr/lucene
> > instances is that the scores are not directly comparable as the term
> > idf scores will be different.  However, in practical situations, this
> > can be glossed over.
> >
> > This is the basic algorithm for single-pass querying multiple solr
> > slaves.  Say you want results N to N + M (e.g 10 to 20).
> >
> > 1. query each solr instance independently for N+M documents for the
> > given query.  This should be done asynchronously (or you could spawn a
> > thread per server).
> > 2. wait for all responses (or for a certain timeout)
> > 3. put all returned documents into an array, and reverse sort by score
> > 4. select documents [N, N+M) from this array.
> >
> > This is a relatively simple task.  It gets more complicated once
> > multiple passes, idf compensation, deduplication, etc. are added.
> >
> > -Mike
> >
>
> Thks Mike.
>
> I find it more complicate than i think.
>
> Is it the only way to solve my problem:
>
> I have a project, it have 100g data, now i have 3-4 server for solr.
>
>
>
>
>
>
> --
> regards
> jl




--
regards
jl







--
regards
jl


Re: Does solr support Multi index and return by score and datetime

2007-04-04 Thread James liu

Has anyone had a problem like this, and how did you solve it?




2007/4/5, James liu <[EMAIL PROTECTED]>:




2007/4/5, Mike Klaas <[EMAIL PROTECTED]>:
>
> On 4/4/07, James liu <[EMAIL PROTECTED]> wrote:
>
> > > > I think it is part of full-text search.
> >
> > I think query slavers and combin result by score should be the part of
> solr.
> >
> > I find it http://dev.lucene-ws.net/wiki/MultiIndexOperations
> > but i wanna use solr and i like it.
> >
> > Now i wanna find a good method to solve it by using solr and less
> > coding.(More code will cost more time to write and test.)
>
> I agree that it would be an excellent addition to Solr, but it is a
> major undertaking, and so I wouldn't wait around for it if it is
> important to you.  Solr devs have code to write and test too :).
>
> > > >  If you document
> > > > > distribution is uniform random, then the norms converge to
> > > > > approximately equal values anyway.
> > > >
> > > > I don't know it.
> >
> > I don't know why u say "document distribution". Does it mean if i
> write code
> > independently, i will consider it?
>
> One of the complexities of queries multiple remote Solr/lucene
> instances is that the scores are not directly comparable as the term
> idf scores will be different.  However, in practical situations, this
> can be glossed over.
>
> This is the basic algorithm for single-pass querying multiple solr
> slaves.  Say you want results N to N + M (e.g 10 to 20).
>
> 1. query each solr instance independently for N+M documents for the
> given query.  This should be done asynchronously (or you could spawn a
> thread per server).
> 2. wait for all responses (or for a certain timeout)
> 3. put all returned documents into an array, and reverse sort by score
> 4. select documents [N, N+M) from this array.
>
> This is a relatively simple task.  It gets more complicated once
> multiple passes, idf compensation, deduplication, etc. are added.
>
> -Mike
>

Thks Mike.

I find it more complicate than i think.

Is it the only way to solve my problem:

I have a project, it have 100g data, now i have 3-4 server for solr.






--
regards
jl





--
regards
jl


Re: Does solr support Multi index and return by score and datetime

2007-04-04 Thread Otis Gospodnetic
James,

It looks like people already answered your questions.
Split your big index.
Put it on multiple servers.
Put Solr on each of those servers.
Write an application that searches multiple Solr instances in parallel.
Get N results from each, combine them, order by score.

As far as I know, this is the best you can do with what is available from Solr 
today.
For anything else, you'll have to roll up your sleeves and dig into the code.

Good luck!

Otis
 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Simpy -- http://www.simpy.com/  -  Tag  -  Search  -  Share

- Original Message 
From: James liu <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Thursday, April 5, 2007 1:18:30 AM
Subject: Re: Does solr support Multi index and return by score and datetime

Anyone have problem like this and how to solve it?




2007/4/5, James liu <[EMAIL PROTECTED]>:
>
>
>
> 2007/4/5, Mike Klaas <[EMAIL PROTECTED]>:
> >
> > On 4/4/07, James liu <[EMAIL PROTECTED]> wrote:
> >
> > > > > I think it is part of full-text search.
> > >
> > > I think query slavers and combin result by score should be the part of
> > solr.
> > >
> > > I find it http://dev.lucene-ws.net/wiki/MultiIndexOperations
> > > but i wanna use solr and i like it.
> > >
> > > Now i wanna find a good method to solve it by using solr and less
> > > coding.(More code will cost more time to write and test.)
> >
> > I agree that it would be an excellent addition to Solr, but it is a
> > major undertaking, and so I wouldn't wait around for it if it is
> > important to you.  Solr devs have code to write and test too :).
> >
> > > > >  If you document
> > > > > > distribution is uniform random, then the norms converge to
> > > > > > approximately equal values anyway.
> > > > >
> > > > > I don't know it.
> > >
> > > I don't know why u say "document distribution". Does it mean if i
> > write code
> > > independently, i will consider it?
> >
> > One of the complexities of queries multiple remote Solr/lucene
> > instances is that the scores are not directly comparable as the term
> > idf scores will be different.  However, in practical situations, this
> > can be glossed over.
> >
> > This is the basic algorithm for single-pass querying multiple solr
> > slaves.  Say you want results N to N + M (e.g 10 to 20).
> >
> > 1. query each solr instance independently for N+M documents for the
> > given query.  This should be done asynchronously (or you could spawn a
> > thread per server).
> > 2. wait for all responses (or for a certain timeout)
> > 3. put all returned documents into an array, and reverse sort by score
> > 4. select documents [N, N+M) from this array.
> >
> > This is a relatively simple task.  It gets more complicated once
> > multiple passes, idf compensation, deduplication, etc. are added.
> >
> > -Mike
> >
>
> Thks Mike.
>
> I find it more complicate than i think.
>
> Is it the only way to solve my problem:
>
> I have a project, it have 100g data, now i have 3-4 server for solr.
>
>
>
>
>
>
> --
> regards
> jl




-- 
regards
jl





Re: Does solr support Multi index and return by score and datetime

2007-04-04 Thread James liu

2007/4/5, Mike Klaas <[EMAIL PROTECTED]>:


On 4/4/07, James liu <[EMAIL PROTECTED]> wrote:

> > > I think it is part of full-text search.
>
> I think query slavers and combin result by score should be the part of
solr.
>
> I find it http://dev.lucene-ws.net/wiki/MultiIndexOperations
> but i wanna use solr and i like it.
>
> Now i wanna find a good method to solve it by using solr and less
> coding.(More code will cost more time to write and test.)

I agree that it would be an excellent addition to Solr, but it is a
major undertaking, and so I wouldn't wait around for it if it is
important to you.  Solr devs have code to write and test too :).

> > >  If you document
> > > > distribution is uniform random, then the norms converge to
> > > > approximately equal values anyway.
> > >
> > > I don't know it.
>
> I don't know why u say "document distribution". Does it mean if i write
code
> independently, i will consider it?

One of the complexities of queries multiple remote Solr/lucene
instances is that the scores are not directly comparable as the term
idf scores will be different.  However, in practical situations, this
can be glossed over.

This is the basic algorithm for single-pass querying multiple solr
slaves.  Say you want results N to N + M (e.g 10 to 20).

1. query each solr instance independently for N+M documents for the
given query.  This should be done asynchronously (or you could spawn a
thread per server).
2. wait for all responses (or for a certain timeout)
3. put all returned documents into an array, and reverse sort by score
4. select documents [N, N+M) from this array.

This is a relatively simple task.  It gets more complicated once
multiple passes, idf compensation, deduplication, etc. are added.

-Mike



Thanks, Mike.

I find it more complicated than I thought.

Is it the only way to solve my problem?

I have a project with 100 GB of data, and I have 3-4 servers for Solr.






--
regards
jl


Re: Does solr support Multi index and return by score and datetime

2007-04-04 Thread Mike Klaas

On 4/4/07, James liu <[EMAIL PROTECTED]> wrote:


> > I think it is part of full-text search.

I think query slavers and combin result by score should be the part of solr.

I find it http://dev.lucene-ws.net/wiki/MultiIndexOperations
but i wanna use solr and i like it.

Now i wanna find a good method to solve it by using solr and less
coding.(More code will cost more time to write and test.)


I agree that it would be an excellent addition to Solr, but it is a
major undertaking, and so I wouldn't wait around for it if it is
important to you.  Solr devs have code to write and test too :).


> >  If you document
> > > distribution is uniform random, then the norms converge to
> > > approximately equal values anyway.
> >
> > I don't know it.

I don't know why u say "document distribution". Does it mean if i write code
independently, i will consider it?


One of the complexities of querying multiple remote Solr/Lucene
instances is that the scores are not directly comparable, as the term
idf scores will be different.  However, in practical situations, this
can be glossed over.

This is the basic algorithm for single-pass querying of multiple Solr
slaves.  Say you want results N to N + M (e.g. 10 to 20).

1. query each solr instance independently for N+M documents for the
given query.  This should be done asynchronously (or you could spawn a
thread per server).
2. wait for all responses (or for a certain timeout)
3. put all returned documents into an array, and reverse sort by score
4. select documents [N, N+M) from this array.

This is a relatively simple task.  It gets more complicated once
multiple passes, idf compensation, deduplication, etc. are added.
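
The four steps above can be sketched as follows. This is a minimal Python sketch, not Solr code: `fetch(shard, query, rows)` is a hypothetical callable standing in for the HTTP request to one Solr instance, and each returned document is assumed to carry a `score` field. The timeout from step 2 is left out for brevity; a shard whose request raises an error is simply skipped.

```python
from concurrent.futures import ThreadPoolExecutor


def merged_page(shards, fetch, query, n, m):
    """Return results [n, n+m) of the score-sorted union of all shards.

    fetch(shard, query, rows) is a hypothetical callable returning the top
    `rows` docs of one shard as {"id": ..., "score": ...} dicts.
    """
    want = n + m
    docs = []
    # Step 1: query every shard for N+M docs, one thread per shard.
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        futures = [pool.submit(fetch, shard, query, want) for shard in shards]
        # Step 2: wait for all responses.
        for future in futures:
            try:
                docs.extend(future.result())
            except Exception:
                continue  # skip a shard that failed
    # Step 3: sort the combined docs by score, highest first.
    docs.sort(key=lambda d: d["score"], reverse=True)
    # Step 4: select documents [N, N+M).
    return docs[n:n + m]
```

Each shard has to return N+M documents rather than M, because in the worst case the whole requested page comes from a single shard.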

-Mike


Re: Does solr support Multi index and return by score and datetime

2007-04-04 Thread James liu

2007/4/5, Mike Klaas <[EMAIL PROTECTED]>:


On 4/4/07, James liu <[EMAIL PROTECTED]> wrote:
> 2007/4/5, Mike Klaas <[EMAIL PROTECTED]>:
> >
> > On 4/4/07, James liu <[EMAIL PROTECTED]> wrote:
> > > That means now i can' solve it with solr?
> >
> > Not out-of-the-box, no.  But you can certainly query your slaves
> > independently can combine based on score.
>
> I think it is part of full-text search.



I think querying the slaves and combining the results by score should be
part of Solr.

I found http://dev.lucene-ws.net/wiki/MultiIndexOperations
but I want to use Solr, and I like it.

Now I want to find a good way to solve this using Solr with less
coding. (More code costs more time to write and test.)



>  If you document
> > distribution is uniform random, then the norms converge to
> > approximately equal values anyway.
>
> I don't know it.



I don't know why you mention "document distribution". Does it mean that if I
write the code myself, I have to take it into account?


I'm afraid I didn't understand either of these comments.


-Mike





--
regards
jl


Re: Does solr support Multi index and return by score and datetime

2007-04-04 Thread Mike Klaas

On 4/4/07, James liu <[EMAIL PROTECTED]> wrote:

2007/4/5, Mike Klaas <[EMAIL PROTECTED]>:
>
> On 4/4/07, James liu <[EMAIL PROTECTED]> wrote:
> > That means now i can' solve it with solr?
>
> Not out-of-the-box, no.  But you can certainly query your slaves
> independently can combine based on score.

I think it is part of full-text search.

 If you document
> distribution is uniform random, then the norms converge to
> approximately equal values anyway.

I don't know it.


I'm afraid I didn't understand either of these comments.

-Mike


Re: Does solr support Multi index and return by score and datetime

2007-04-04 Thread James liu

2007/4/5, Mike Klaas <[EMAIL PROTECTED]>:


On 4/4/07, James liu <[EMAIL PROTECTED]> wrote:
> That means now i can' solve it with solr?

Not out-of-the-box, no.  But you can certainly query your slaves
independently can combine based on score.



I think it is part of full-text search.

If you document

distribution is uniform random, then the norms converge to
approximately equal values anyway.



I don't know it.


-Mike






--
regards
jl


Re: Does solr support Multi index and return by score and datetime

2007-04-04 Thread Mike Klaas

On 4/4/07, James liu <[EMAIL PROTECTED]> wrote:

That means now i can' solve it with solr?


Not out-of-the-box, no.  But you can certainly query your slaves
independently and combine based on score.  If your document
distribution is uniform random, then the norms converge to
approximately equal values anyway.

-Mike


Re: Does solr support Multi index and return by score and datetime

2007-04-04 Thread James liu

Does that mean I can't solve it with Solr right now?


2007/4/4, Yonik Seeley <[EMAIL PROTECTED]>:


On 4/4/07, James liu <[EMAIL PROTECTED]> wrote:
> i find it http://wiki.apache.org/solr/FederatedSearch

That was design brainstorming.  Nothing there has been implemented,
and it's not currently  at the top of my personal todo list.

-Yonik





--
regards
jl


Re: Does solr support Multi index and return by score and datetime

2007-04-04 Thread Yonik Seeley

On 4/4/07, James liu <[EMAIL PROTECTED]> wrote:

i find it http://wiki.apache.org/solr/FederatedSearch


That was design brainstorming.  Nothing there has been implemented,
and it's not currently  at the top of my personal todo list.

-Yonik


Re: Does solr support Multi index and return by score and datetime

2007-04-04 Thread James liu

I found http://wiki.apache.org/solr/FederatedSearch

Can we use it, and how?




2007/4/4, James liu <[EMAIL PROTECTED]>:


i have a project, it have 100g data, now i have 3-4 server for solr.

so i wanna use multi solr to decrease index's time.

but how to search by using solr, if solr not support multi index.



--
regards
jl





--
regards
jl


Does solr support Multi index and return by score and datetime

2007-04-04 Thread James liu

I have a project with 100 GB of data, and I have 3-4 servers for Solr.

So I want to use multiple Solr instances to reduce indexing time.

But how do I search with Solr if it does not support multiple indexes?



--
regards
jl