Re: best way for sum of fields
Sorry, I need the sum of the values of the found documents, e.g. the total amount for one day. Each doc in the index has its own amount. I tried something with the StatsComponent, but with 48 million docs in the index it's too slow.

---
System: One Server, 12 GB RAM, 2 Solr Instances, 8 Cores, 1 Core with 45 Million Documents, other Cores 200,000
- Solr1 for Search-Requests - commit every Minute - 5 GB Xmx
- Solr2 for Update-Requests - delta every Minute - 4 GB Xmx
--
View this message in context: http://lucene.472066.n3.nabble.com/best-way-for-sum-of-fields-tp3477517p3486406.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: best way for sum of fields
I guess this has nothing to do with the search part. You can post-process the search results (i.e. iterate through your results and sum them up).

Regards,
Pravesh
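The post-processing Pravesh suggests is a few lines in any client language. Here is a minimal Python sketch (the thread uses PHP, but the idea is language-agnostic), assuming Solr's wt=json response shape; the "amount" field comes from the thread, the sample data is made up for illustration:

```python
import json

def sum_field(response_json: str, field: str) -> float:
    """Sum a numeric field over the docs returned in a Solr JSON response."""
    response = json.loads(response_json)
    docs = response["response"]["docs"]
    # Each returned doc contributes its own value; a missing field counts as 0.
    return sum(doc.get(field, 0) for doc in docs)

# Example with a response shaped like Solr's wt=json output:
sample = json.dumps({
    "response": {
        "numFound": 3,
        "docs": [
            {"id": "1", "amount": 10.5},
            {"id": "2", "amount": 4.5},
            {"id": "3", "amount": 5.0},
        ],
    }
})
print(sum_field(sample, "amount"))  # 20.0
```

Note that this sums only the documents actually returned (the rows window), which is exactly why it doesn't scale to result sets of millions of docs.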
Re: best way for sum of fields
Yes, I'm already using that approach in another part of my application. I had hoped there was another way, to avoid the round trip through PHP.
Re: best way for sum of fields
Hi,

If you only need to sum over the displayed results, go with the post-processing-of-hits solution; that's fast and easy.

If you sum over the whole data set (i.e. your sum is not query-dependent), have it computed at indexing time, depending on your indexing workflow.

Otherwise (sum over the whole result set, query-dependent but independent of the displayed results), you should give sharding a try. You generally want that when your index is too large to be searched quickly (see http://wiki.apache.org/solr/DistributedSearch); here the sum operation is part of a search query. Basically what you need is:
- On the master host: n master instances (each being a shard)
- On the slave host: n slave instances (each being a replica of its master-side counterpart)

Only the slave instances need a comfortable amount of RAM in order to serve queries rapidly. Slave instances can be spread over several hosts if the total amount of RAM required is high.

Your main effort here might be in finding the value of n. You have 45M documents in a single shard, and that may be the cause of your issue, especially for queries returning a high number of results. You may need to split it into more shards to achieve your goal. This should let you reduce the time needed for the sum operation at search time (but it adds complexity at indexing time: you need to define a way to send documents to shard #1, #2, ..., or #n).

If you keep getting more and more documents over time, maybe you'll want a fixed maximum shard size (say 5M docs, if performing the sum on 5M docs is fast enough) and simply add shards as required when more documents are to be indexed/searched. This also simplifies the routing question, because you just switch to a new target shard every 5M documents. The last shard is always the smallest.
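The fixed-maximum-shard-size routing described above can be sketched in a few lines. This assumes (purely for illustration) that the import process maintains a monotonically increasing document counter; the 5M cap is the figure suggested above:

```python
SHARD_SIZE = 5_000_000  # fixed maximum number of docs per shard, as suggested

def target_shard(doc_counter: int) -> int:
    """Pick the target shard for the next imported document.

    doc_counter is a monotonically increasing import counter (0-based).
    Shard 1 receives docs 0..4,999,999, shard 2 the next 5M, and so on;
    the last shard is always the smallest, still-filling one.
    """
    return doc_counter // SHARD_SIZE + 1
```

At roughly 50K new documents per day (the growth rate mentioned later in the thread), a new shard would be needed about every 100 days under this scheme.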
Such sharding can involve a little overhead at search time:
- Make sure you don't allow retrieval of far-away documents (start=k, where k is high -- see http://wiki.apache.org/solr/DistributedSearch#Distributed_Searching_Limitations).
- When using the stats component, set the start and rows parameters to 0 if you don't need the documents themselves.

After that, if you face high search load, you can still duplicate the slave host to match your load requirements and load-balance your search traffic over the slaves as needed.

Hope this helps,
Tanguy
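For the stats-only case, the request parameters could look like the following sketch. The "amount" field comes from the thread; the date field name and range are assumptions for the "total amount of one day" example. stats, stats.field, start and rows are standard Solr parameters:

```
q=timestamp:[2011-11-06T00:00:00Z TO 2011-11-07T00:00:00Z]
start=0
rows=0
stats=true
stats.field=amount
```

The sum then appears in the response under stats/stats_fields/amount/sum, and no documents are fetched or transferred.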
Re: best way for sum of fields
Hi, thanks for the detailed reply ;) I had the idea of several small 5M shards too, and I think that's the next step I have to take, because our biggest index grows by about 50K documents per day.

But does it make sense to keep the searcher AND updater cores on one big server? I don't want to use replication, because it isn't possible with our own high-availability solution. My system is split into searcher and updater cores, each with its own index. Some search requests go across all 8 of these cores with distributed search.
Re: best way for sum of fields
Hi again,

Since you have a custom high-availability solution over your Solr instances, I can't help much, I guess... :-)

I usually rely on master/slave replication to separate the index-build and index-search processes. Resource consumption at build time and at search time are not necessarily the same, so the hardware for each can be dimensioned as required. I like to have the service-related processes isolated and easy to deploy wherever needed, just in case things go wrong: hardware failures do occur. Build services, on the other hand, don't have the same availability constraints and can be down for a while without trouble (unless near-real-time indexing comes into play; that's another matter).

In a slave configuration, the index doesn't need to commit. It simply replicates its data from its associated master whenever the master changes, and reopens the searcher. Replication events can be triggered on commit, startup and/or optimize (see http://wiki.apache.org/solr/SolrReplication, although you seemed not to be interested in this feature :) ).

Having search and build on the same host is not a bad thing in itself; it simply depends on the available resources and on build vs. service load. For example, with a big core such as yours, segment merging occurs from time to time, and that operation is IO-bound (i.e. its duration depends on disk performance). Under high IO load a server can become less responsive, and that's when having the search service separated from the build comes in handy.

As you see, I can't tell you what makes sense and what doesn't. It all depends on what you're doing, at which frequency, etc. :-)

Regards,
Tanguy
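For reference, in case replication becomes an option later: the master/slave setup Tanguy describes lives in solrconfig.xml, roughly as sketched below. Host and core names are placeholders; see the SolrReplication wiki page for the full set of options:

```xml
<!-- solrconfig.xml on the master (build) instance -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="replicateAfter">startup</str>
  </lst>
</requestHandler>

<!-- solrconfig.xml on the slave (search) instance -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/core1/replication</str>
    <str name="pollInterval">00:01:00</str>
  </lst>
</requestHandler>
```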
Re: best way for sum of fields
Please define "sum of fields". The total number of unique terms across all fields? The sum of the values of some fields for each document? The count of the number of fields in the index? Other?

Best,
Erick

On Thu, Nov 3, 2011 at 11:43 AM, stockii stock.jo...@googlemail.com wrote:
> I am searching for the best way to get the sum of fields. I know the StatsComponent, but this component is not fast enough for 40-60 thousand documents. Do other components or methods exist in Solr?