Re: best way for sum of fields

2011-11-07 Thread stockii
Sorry.

I need the sum of the values of the found documents, e.g. the total amount for
one day. Each doc in the index has its own amount.

I tried something with the StatsComponent, but with 48 million docs in the index
it is too slow.

-
--- System 

One server, 12 GB RAM, 2 Solr instances, 8 cores,
1 core with 45 million documents, the other cores 200,000

- Solr1 for search requests - commit every minute - 5 GB Xmx
- Solr2 for update requests - delta every minute - 4 GB Xmx
--
View this message in context: 
http://lucene.472066.n3.nabble.com/best-way-for-sum-of-fields-tp3477517p3486406.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: best way for sum of fields

2011-11-07 Thread pravesh
I guess,

this has nothing to do with the search part. You can post-process the search
results (I mean, iterate through your results and sum them up).
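As an illustration of that post-processing idea (my own sketch, not from the thread, and in Python rather than the PHP mentioned later; the field name `amount` and the response shape are assumptions):

```python
# Sum a numeric field over the documents returned by a Solr query,
# entirely on the client side (post-processing the hits).

def sum_field(solr_response, field):
    """Sum `field` over the docs in a parsed Solr JSON response."""
    docs = solr_response["response"]["docs"]
    return sum(doc.get(field, 0) for doc in docs)

# Example with a hand-built response; the real one would come from
# fetching /solr/select?q=...&wt=json and parsing the JSON:
response = {
    "response": {
        "numFound": 3,
        "docs": [
            {"id": "1", "amount": 10.5},
            {"id": "2", "amount": 4.5},
            {"id": "3"},  # a doc without the field counts as 0
        ],
    }
}

print(sum_field(response, "amount"))  # 15.0
```

Note this only sums over the rows actually returned, so it works for "sum of displayed results" but gets expensive if you must fetch every matching doc.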

Regds
Pravesh

--
View this message in context: 
http://lucene.472066.n3.nabble.com/best-way-for-sum-of-fields-tp3477517p3486536.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: best way for sum of fields

2011-11-07 Thread stockii
Yes, I am already using that approach in another part of my application. I had
hoped there was another way, to avoid the round trip through PHP.

View this message in context: 
http://lucene.472066.n3.nabble.com/best-way-for-sum-of-fields-tp3477517p3486593.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: best way for sum of fields

2011-11-07 Thread Tanguy Moal

Hi,

If you only need to sum over the displayed results, go with the 
post-processing-of-hits solution; that's fast and easy.
If you sum over the whole data set (i.e. your sum is not query 
dependent), have it computed at indexing time, depending on your 
indexing workflow.
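A sketch of that indexing-time approach (my own illustration, not from the thread; the "date" and "amount" field names are assumptions): accumulate a running total per day while feeding documents, then index the totals as small summary documents so queries never touch the millions of individual docs.

```python
from collections import defaultdict

def accumulate_daily_totals(docs):
    """Accumulate per-day sums while documents flow through indexing."""
    totals = defaultdict(float)
    for doc in docs:
        totals[doc["date"]] += doc["amount"]
    return dict(totals)

docs = [
    {"id": "1", "date": "2011-11-07", "amount": 10.0},
    {"id": "2", "date": "2011-11-07", "amount": 2.5},
    {"id": "3", "date": "2011-11-06", "amount": 7.0},
]

# These totals could then be indexed as one "summary" doc per day,
# so the daily total is a single-document lookup at query time.
print(accumulate_daily_totals(docs))
# {'2011-11-07': 12.5, '2011-11-06': 7.0}
```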


Otherwise (sum over the whole result set, query dependent but 
independent of the displayed results), you should give sharding a try...
You generally want that when your index is too large to be searched 
quickly (see http://wiki.apache.org/solr/DistributedSearch); here the 
sum operation is part of the search query.


Basically what you need is:
- On the master host: n master instances (each being a shard)
- On the slave host: n slave instances (each being a replica of its 
master-side counterpart)


Only the slave instances need a comfortable amount of RAM in order 
to serve queries rapidly. Slave instances can be deployed over several 
hosts if the total amount of RAM required is high.


Your main effort here might be in finding the value of n.
You have 45M documents in a single shard, and that may be the cause of 
your issue, especially for queries returning a high number of results.

You may need to split it into more shards to achieve your goal.

This should enable you to reduce the time needed to perform the sum 
operation at search time (but it adds complexity at indexing time: you 
need to define a way to send documents to shard #1, #2, ..., or #n).
If you keep getting more and more documents over time, maybe you'll want 
to have a fixed maximum shard size (say 5M docs, if performing the sum 
over 5M docs is fast enough) and simply add shards as required, when more 
documents are to be indexed/searched. This also addresses the importing 
issue, because you simply switch to a new target shard every 5M documents.

The last shard is always the smallest.
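That fill-then-roll routing rule can be sketched in a few lines (my own illustration; the 1-based shard numbering is an assumption):

```python
# "Fixed maximum shard size" routing: fill shard 1 until it holds
# MAX_DOCS documents, then switch to shard 2, and so on. The last
# shard in use is always the smallest.

MAX_DOCS = 5_000_000

def target_shard(doc_sequence_number):
    """Map the i-th indexed document (0-based) to a shard number (1-based)."""
    return doc_sequence_number // MAX_DOCS + 1

print(target_shard(0))           # 1  (first 5M docs land on shard 1)
print(target_shard(4_999_999))   # 1
print(target_shard(5_000_000))   # 2  (next 5M docs on shard 2)
print(target_shard(45_000_000))  # 10
```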

Such sharding can involve a little overhead at search time:
- Make sure you don't allow retrieval of deep results (start=k, where k 
is high -- see 
http://wiki.apache.org/solr/DistributedSearch#Distributed_Searching_Limitations).
- When using the stats component, set the start and rows parameters to 0 
if you don't need the documents themselves.
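Putting those parameters together, a distributed StatsComponent request might be built like this (a sketch; `stats`, `stats.field`, `rows` and `shards` are standard Solr parameters, while the host names and the `amount` field are assumptions):

```python
from urllib.parse import urlencode

def stats_sum_url(base, field, shards, query="*:*"):
    """Build a distributed stats query URL for summing `field`."""
    params = {
        "q": query,
        "rows": 0,             # skip document retrieval, keep only stats
        "stats": "true",
        "stats.field": field,  # Solr returns sum/min/max/... for this field
        "shards": ",".join(shards),
        "wt": "json",
    }
    return base + "?" + urlencode(params)

url = stats_sum_url(
    "http://localhost:8983/solr/select",
    "amount",
    ["host1:8983/solr", "host2:8983/solr"],
)
print(url)
```

The sum then comes back under `stats.stats_fields.amount.sum` in the JSON response, with no documents transferred.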



After that, if you face high search-load issues, you can still 
duplicate the slave host to match your load requirements, and 
load-balance your search traffic over the slaves as required.


Hope this helps,

Tanguy

On 07/11/2011 09:49, stockii wrote:

[...]




Re: best way for sum of fields

2011-11-07 Thread stockii
Hi, thanks for the big reply ;)

I had the idea of several small 5M shards too, 
and I think that's the next step I have to take, because our biggest index
grows by about 50K documents per day on average.
But does it make sense to keep the searcher AND updater cores on one big server? I
don't want to use replication, because it is not possible with our own
high-availability solution.

My system is split into searcher and updater cores, each with its own index.
Some search requests go over all these 8 cores with distributed search.



View this message in context: 
http://lucene.472066.n3.nabble.com/best-way-for-sum-of-fields-tp3477517p3486652.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: best way for sum of fields

2011-11-07 Thread Tanguy Moal

Hi again,
Since you have a custom high-availability solution on top of your Solr 
instances, I can't help much, I guess... :-)


I usually rely on master/slave replication to separate the index-build 
and index-search processes.


The fact is that resource consumption at build time and at search time is 
not necessarily the same, and therefore the hardware sizing can be 
adapted as required.
I like to have the service-related processes isolated and easy to deploy 
wherever needed, just in case things go wrong and hardware failures occur.
Build services, on the other hand, don't have the same availability 
constraints and can be off for a while; that's no issue (unless near-
real-time indexing comes into play, but that's another story).


In a slave configuration, the index doesn't need to commit. It simply 
replicates its data from its associated master whenever the master 
changes, and then reopens the searcher. Replication can be 
triggered at commit, startup and/or optimize time (see 
http://wiki.apache.org/solr/SolrReplication , although you seemed not 
to be interested in this feature :) )
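For reference, that setup is configured in solrconfig.xml roughly like this (a sketch based on the wiki page above; the master host name and the poll interval are placeholders):

```xml
<!-- master side -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <!-- publish a new index version on commit and on startup -->
    <str name="replicateAfter">commit</str>
    <str name="replicateAfter">startup</str>
  </lst>
</requestHandler>

<!-- slave side -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <str name="pollInterval">00:00:60</str> <!-- HH:MM:SS -->
  </lst>
</requestHandler>
```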


Having search and build on the same host is not a bad thing in itself.
It simply depends on the available resources and on the build vs. service 
load requirements.
For example, with a big core such as the one you have, segment merging 
can occur from time to time, which is an IO-bound operation 
(i.e. its duration depends on disk performance). Under high IO load, a 
server can become less responsive, and therefore having the service 
separated from the build can become handy at that time.


As you see, I can't tell you what makes sense and what doesn't.
It all depends on what you're doing, at which frequency, etc. :-)

Regards,

Tanguy

On 07/11/2011 12:12, stockii wrote:

[...]




Re: best way for sum of fields

2011-11-04 Thread Erick Erickson
Please define "sum of fields". The total number of unique terms for
all the fields?
The sum of some values of some fields for each document?
The count of the number of fields in the index?
Other???

Best
Erick

On Thu, Nov 3, 2011 at 11:43 AM, stockii stock.jo...@googlemail.com wrote:
 I am searching for the best way to get the sum of fields.

 I know the StatsComponent, but this component is not fast enough for 40-60
 thousand documents.

 Do any other components or methods exist in Solr?

 View this message in context: 
 http://lucene.472066.n3.nabble.com/best-way-for-sum-of-fields-tp3477517p3477517.html
 Sent from the Solr - User mailing list archive at Nabble.com.