Re: Best practice for Delta every 2 Minutes.
Don't use a RAMDirectory. The operating system is better at managing memory (disk buffers) than Java. Just use the disk-based index and it will be just as fast.

On Fri, Dec 17, 2010 at 5:10 AM, Erick Erickson wrote:
> In this context, delta refers to the changes over some interval.
>
> Best
> Erick

-- Lance Norskog goks...@gmail.com
Re: Best practice for Delta every 2 Minutes.
In this context, delta refers to the changes over some interval.

Best
Erick

On Fri, Dec 17, 2010 at 2:03 AM, Dennis Gearon wrote:
> BTW, what is a Delta (in this context, not an equipment line or a rocket, please :-)
> Dennis Gearon
Re: Best practice for Delta every 2 Minutes.
BTW, what is a Delta (in this context, not an equipment line or a rocket, please :-)

Dennis Gearon

--- On Thu, 12/16/10, Li Li wrote:
> From: Li Li
> Subject: Re: Best practice for Delta every 2 Minutes.
> To: solr-user@lucene.apache.org
> Date: Thursday, December 16, 2010, 10:54 PM
> I don't think it will, because the default configuration allows only 2 newSearcher threads, but the delay will grow longer and longer. Each newer newSearcher has to wait for the 2 earlier ones to finish.
Re: Best practice for Delta every 2 Minutes.
We are now in the same situation and want to implement it like this: we add new documents to a RAMDirectory and search two indices, the index on disk and the RAM index. Regularly (e.g. every hour) we flush the RAMDirectory to disk and make a new segment. To guard against errors, we write each document to a log file before adding it to the RAMDirectory, and after flushing we delete the corresponding lines from the log file. If the program crashes, we redo the log and add the documents to the RAMDirectory again. Has anyone done similar work?

2010/12/1 Li Li :
> You may implement your own MergePolicy to keep one large index and merge all the other small ones, or simply set the merge factor to 2 and keep the largest index from being merged by setting maxMergeDocs to less than the number of docs in the largest one. So there is one large index and a small one. When you add a few docs, they will be merged into the small one. And you can, e.g., optimize the index weekly and merge all indices into one index.
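A minimal sketch of the log-and-replay idea described in the message above, written in Python rather than against the Lucene API. Everything here is a hypothetical stand-in: `BufferedIndex`, `ram_docs` (playing the role of the RAMDirectory) and `disk_docs` (playing the role of the on-disk index) are illustration names, not anything from Lucene or Solr.

```python
import json
import os


class BufferedIndex:
    """Hypothetical stand-in for a RAM index protected by a recovery log."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.ram_docs = []    # plays the role of the RAMDirectory
        self.disk_docs = []   # plays the role of the on-disk index
        self._replay()

    def _replay(self):
        # After a crash, re-add every logged document to the RAM buffer.
        if os.path.exists(self.log_path):
            with open(self.log_path, encoding="utf-8") as log:
                self.ram_docs = [json.loads(line) for line in log if line.strip()]

    def add(self, doc):
        # Write to the log first, so the document survives a crash.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps(doc) + "\n")
            log.flush()
            os.fsync(log.fileno())
        self.ram_docs.append(doc)

    def flush(self):
        # Move buffered documents to "disk", then clear the log entries.
        self.disk_docs.extend(self.ram_docs)
        self.ram_docs = []
        with open(self.log_path, "w", encoding="utf-8"):
            pass  # truncating the file drops the flushed entries
```

Creating a second `BufferedIndex` on the same log path simulates a restart: any documents added but not yet flushed reappear in `ram_docs`. The sketch truncates the whole log on flush, which is simpler than deleting individual lines but assumes no documents are added while a flush is in progress.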
Re: Best practice for Delta every 2 Minutes.
I don't think it will, because the default configuration allows only 2 newSearcher threads, but the delay will grow longer and longer. Each newer newSearcher has to wait for the 2 earlier ones to finish.

2010/12/1 Jonathan Rochkind :
> If your index warmings take longer than two minutes, but you're doing a commit every two minutes, you're going to run into trouble with overlapping index preparations, eventually leading to an OOM. Could this be it?
Re: Best practice for Delta every 2 Minutes.
In fact, having a master/slave setup where the master is the indexing/updating machine and the slave(s) are searchers is one of the recommended configurations. Replication is used at many, many sites, so it's pretty solid.

It's generally not recommended, though, to run separate instances on the *same* server. No matter how many cores/instances/etc. you have, you're still running on the same physical hardware, so I/O contention, memory issues, etc. are still bounded by your hardware.

Best
Erick

On Thu, Dec 2, 2010 at 5:12 AM, stockii wrote:
> At the moment no OOM occurs, but we are not on the real live system yet ... I thought I might run into this problem ...
>
> We are running seven cores, and each needs to be updated very quickly. Only one core has a huge index, with 28M docs. Maybe it makes sense in the future to use Solr with replication? Or can I run two instances, one for search and one for updating? Or is there a danger of corrupt indices?
Re: Best practice for Delta every 2 Minutes.
At the moment no OOM occurs, but we are not on the real live system yet ... I thought I might run into this problem ...

We are running seven cores, and each needs to be updated very quickly. Only one core has a huge index, with 28M docs. Maybe it makes sense in the future to use Solr with replication? Or can I run two instances, one for search and one for updating? Or is there a danger of corrupt indices?
--
View this message in context: http://lucene.472066.n3.nabble.com/Best-practice-for-Delta-every-2-Minutes-tp1992714p2005108.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Best practice for Delta every 2 Minutes.
If your index warmings take longer than two minutes, but you're doing a commit every two minutes, you're going to run into trouble with overlapping index preparations, eventually leading to an OOM. Could this be it?

On 11/30/2010 11:36 AM, Erick Erickson wrote:
> I don't know, you'll have to debug it to see if it's the thing that takes so long. Solr should be able to handle 1,200 updates in a very short time unless there's something else going on, like you're committing after every update or something.
>
> This may help you track down performance with DIH:
> http://wiki.apache.org/solr/DataImportHandler#interactive
>
> Best
> Erick
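The overlap described in the message above can be illustrated with a little arithmetic. This is only a sketch under simplifying assumptions that are mine, not the thread's: every warm-up takes the same fixed time, nothing limits how many run at once, and the function name `overlapping_warmups` is made up for the example.

```python
def overlapping_warmups(commit_interval_s, warm_time_s, commits):
    """For each commit, count the earlier warm-ups that are still running."""
    finish_times = []   # when each earlier warm-up completes
    overlaps = []
    for k in range(commits):
        now = k * commit_interval_s
        # A warm-up is still running if it finishes after this commit starts.
        overlaps.append(sum(1 for t in finish_times if t > now))
        finish_times.append(now + warm_time_s)
    return overlaps


# Commits every 2 minutes, warm-up takes 90 s: no overlap.
print(overlapping_warmups(120, 90, 5))    # [0, 0, 0, 0, 0]
# Commits every 2 minutes, warm-up takes 5 minutes: warm-ups pile up.
print(overlapping_warmups(120, 300, 5))   # [0, 1, 2, 2, 2]
```

With a constant warm-up time the overlap levels off. On a real server, concurrent warm-ups compete for CPU and memory, so each one is likely to take longer than it would alone and the backlog can keep growing.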
Re: Best practice for Delta every 2 Minutes.
http://10.1.0.10:8983/solr/payment/dataimport?commad=delta-import&debug=on doesn't work. No debug is started =(

Thanks. I will try mergeFactor=2.
Re: Best practice for Delta every 2 Minutes.
You may implement your own MergePolicy to keep one large index and merge all the other small ones, or simply set the merge factor to 2 and keep the largest index from being merged by setting maxMergeDocs to less than the number of docs in the largest one. So there is one large index and a small one. When you add a few docs, they will be merged into the small one. And you can, e.g., optimize the index weekly and merge all indices into one index.

2010/11/30 stockii :
> Hello.
>
> The index is about 28 million documents large. When I start a delta-import, it looks at modified. But the delta import takes too long: Solr needs over an hour for a delta.
>
> This is my query. All sessions from the last hour should be updated, and all changed ones. I think it's normal that Solr needs a long time for the queries. How can I optimize this?
>
> deltaQuery="SELECT id FROM sessions
> WHERE created BETWEEN DATE_ADD( NOW(), INTERVAL - 10 HOUR ) AND NOW()
> OR modified BETWEEN '${dataimporter.last_index_time}' AND DATE_ADD( NOW(), INTERVAL - 1 HOUR ) "
Re: Best practice for Delta every 2 Minutes.
I don't know, you'll have to debug it to see if it's the thing that takes so long. Solr should be able to handle 1,200 updates in a very short time unless there's something else going on, like you're committing after every update or something.

This may help you track down performance with DIH:
http://wiki.apache.org/solr/DataImportHandler#interactive

Best
Erick

On Tue, Nov 30, 2010 at 9:01 AM, stockii wrote:
> How do you think the deltaQuery could be better? XD
Re: Best practice for Delta every 2 Minutes.
I copied the wrong query, hence the 10 hours ;)

I didn't test the query with 28 million records, only with a few million, and it works fine. ...

Before I used DIH, I used PHP and imported documents directly into Solr. But I want to use DIH because of the better performance, I think ... grml ...
Re: Best practice for Delta every 2 Minutes.
How do you think the deltaQuery could be better? XD
Re: Best practice for Delta every 2 Minutes.
Every day ~30,000 documents, and every hour ~1,200.

Multiple threads with DIH? How does that work?
Re: Best practice for Delta every 2 Minutes.
Please provide more data. Specifically:
> How many documents are updated?
> Have you tried running this query without Solr? In other words, have you investigated whether the speed issue is simply your SQL executing slowly?
> Why are you selecting the last 10 hours' data when all you want is the last hour?

You could always partition the problem across multiple threads if you really have that many documents to update, but I'd look at the efficiency of your SQL query first.

Best
Erick

On Tue, Nov 30, 2010 at 8:50 AM, stockii wrote:
> Hello.
>
> The index is about 28 million documents large. When I start a delta-import, it looks at modified. But the delta import takes too long: Solr needs over an hour for a delta.
>
> This is my query. All sessions from the last hour should be updated, and all changed ones. I think it's normal that Solr needs a long time for the queries. How can I optimize this?
>
> deltaQuery="SELECT id FROM sessions
> WHERE created BETWEEN DATE_ADD( NOW(), INTERVAL - 10 HOUR ) AND NOW()
> OR modified BETWEEN '${dataimporter.last_index_time}' AND DATE_ADD( NOW(), INTERVAL - 1 HOUR ) "
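One way to act on the suggestion above, running the selection without Solr and timing it, might look like the following. This is a sketch only: it uses Python's built-in SQLite with a three-row in-memory table standing in for the real `sessions` table, so the SQL is a simplified analogue and not the MySQL deltaQuery from the thread. The function name `timed_delta_ids` and the sample timestamps are made up for the example.

```python
import sqlite3
import time


def timed_delta_ids(conn, since):
    """Run the delta selection directly against the database and time it."""
    start = time.perf_counter()
    rows = conn.execute(
        "SELECT id FROM sessions WHERE created >= ? OR modified >= ? ORDER BY id",
        (since, since),
    ).fetchall()
    elapsed = time.perf_counter() - start
    return [row[0] for row in rows], elapsed


# Tiny in-memory table standing in for the real `sessions` table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sessions (id INTEGER PRIMARY KEY, created TEXT, modified TEXT)"
)
conn.executemany(
    "INSERT INTO sessions VALUES (?, ?, ?)",
    [
        (1, "2010-11-30 01:00:00", "2010-11-30 01:00:00"),
        (2, "2010-11-30 01:00:00", "2010-11-30 08:30:00"),
        (3, "2010-11-30 08:45:00", "2010-11-30 08:45:00"),
    ],
)

# Select everything created or modified since 08:00 and report the timing.
ids, elapsed = timed_delta_ids(conn, "2010-11-30 08:00:00")
print(ids, f"{elapsed:.6f}s")
```

Here the window starting at 08:00 selects ids 2 and 3. Against the real database, the same kind of measurement would show whether the hour-long delta is spent in the SQL itself or somewhere in the import.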
Best practice for Delta every 2 Minutes.
Hello.

The index is about 28 million documents large. When I start a delta-import, it looks at modified. But the delta import takes too long: Solr needs over an hour for a delta.

This is my query. All sessions from the last hour should be updated, and all changed ones. I think it's normal that Solr needs a long time for the queries. How can I optimize this?

deltaQuery="SELECT id FROM sessions
WHERE created BETWEEN DATE_ADD( NOW(), INTERVAL - 10 HOUR ) AND NOW()
OR modified BETWEEN '${dataimporter.last_index_time}' AND DATE_ADD( NOW(), INTERVAL - 1 HOUR ) "