Re: Best practice for Delta every 2 Minutes.

2010-12-17 Thread Lance Norskog
Don't use a RAMDirectory. The operating system is better at managing
memory (disk buffers) than Java. Just use the disk-based index and it
will be just as fast.
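
For reference, a stock Solr install is already disk-backed; in solrconfig.xml
that corresponds to something like the entry below (the exact element and the
available factories vary between Solr versions, so treat this as a sketch):

<!-- Disk-backed index directory: the OS page cache keeps the hot parts in RAM. -->
<directoryFactory name="DirectoryFactory" class="solr.StandardDirectoryFactory"/>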

-- 
Lance Norskog
goks...@gmail.com


Re: Best practice for Delta every 2 Minutes.

2010-12-17 Thread Erick Erickson
In this context, delta refers to the changes over some interval.

Best
Erick



Re: Best practice for Delta every 2 Minutes.

2010-12-16 Thread Dennis Gearon
BTW, what is a delta (in this context -- not an equipment line or a rocket,
please :-)?

Dennis Gearon

Signature Warning
-----------------
It is always a good idea to learn from your own mistakes. It is usually a
better idea to learn from others' mistakes, so you do not have to make them
yourself. (from http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036)

EARTH has a Right To Life,
  otherwise we all die.




Re: Best practice for Delta every 2 Minutes.

2010-12-16 Thread Li Li
We are now facing the same situation and plan to implement it like this:
we add new documents to a RAMDirectory and search across two indices -- the
on-disk index and the RAM index. Periodically (e.g. every hour) we flush the
RAMDirectory to disk and create a new segment.
To guard against errors, we write each document to a log file before adding
it to the RAMDirectory, and after flushing we delete the corresponding lines
from the log file.
If the program crashes, we replay the log and add those documents back into
the RAMDirectory.
Has anyone done similar work?
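
A minimal sketch of that write-ahead log, assuming one serialized document per
line. The class and method names are only illustrative, and the calls that feed
the RAMDirectory-backed index are left as comments because the exact
IndexWriter API differs between Lucene versions:

import java.io.*;
import java.util.ArrayList;
import java.util.List;

public class DocLog {
    private final File logFile;

    public DocLog(File logFile) { this.logFile = logFile; }

    // 1. Append the document to the log *before* adding it to the RAM index.
    public synchronized void append(String serializedDoc) throws IOException {
        FileWriter out = new FileWriter(logFile, true); // append mode
        try {
            out.write(serializedDoc);
            out.write('\n');
            out.flush();
        } finally {
            out.close();
        }
        // ramIndexWriter.addDocument(...);  // now add it to the RAMDirectory index
    }

    // 2. After the periodic flush of the RAM index into the on-disk index,
    //    the logged documents are durable, so the log can be cleared.
    public synchronized void clearAfterFlush() throws IOException {
        new FileWriter(logFile, false).close(); // opening without append truncates
    }

    // 3. After a crash, whatever is still in the log was never flushed;
    //    replay it and re-add those documents to a fresh RAMDirectory index.
    public synchronized List<String> replay() throws IOException {
        List<String> docs = new ArrayList<String>();
        if (!logFile.exists()) {
            return docs;
        }
        BufferedReader in = new BufferedReader(new FileReader(logFile));
        try {
            String line;
            while ((line = in.readLine()) != null) {
                docs.add(line);
            }
        } finally {
            in.close();
        }
        return docs;
    }
}

On startup the caller would rebuild the RAM index from replay() before
accepting new documents, which covers the crash case described above.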



Re: Best practice for Delta every 2 Minutes.

2010-12-16 Thread Li Li
I don't think it will, because the default configuration allows only two
warming (newSearcher) searchers at a time -- but the delay will keep growing,
since each newer searcher has to wait for the two earlier ones to finish.
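
For reference, that limit is the maxWarmingSearchers setting in solrconfig.xml
(typically 2 in the example config); raising it only postpones the problem,
since warming still has to keep up with the commit rate:

<maxWarmingSearchers>2</maxWarmingSearchers>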



Re: Best practice for Delta every 2 Minutes.

2010-12-02 Thread Erick Erickson
In fact, having a master/slave setup where the master is the
indexing/updating machine and the slave(s) are searchers
is one of the recommended configurations. Replication
is used at many, many sites, so it's pretty solid.

It's generally not recommended, though, to run separate
instances on the *same* server. No matter how many
cores/instances/etc. you run, you're still on the same
physical hardware, so I/O contention, memory issues, etc.
are still bounded by that hardware.
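
A rough sketch of the master/slave setup above, using Solr 1.4-style Java
replication (hostname, core name and poll interval are placeholders, so check
them against the SolrReplication wiki page for your version).

On the master (the box doing the delta-imports), in solrconfig.xml:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>

On each slave (the boxes serving searches):

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/corename/replication</str>
    <str name="pollInterval">00:02:00</str>
  </lst>
</requestHandler>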

Best
Erick



Re: Best practice for Delta every 2 Minutes.

2010-12-02 Thread stockii

At the moment no OOM occurs, but we are not on the real live system yet ...

I thought I might run into this problem ...

We are running seven cores, and each needs to be updated very quickly. Only
one core has a huge index, with 28M docs. Maybe it makes sense to use Solr
with replication in the future!? Or can I run two instances, one for searching
and one for updating? Or is there a danger of corrupt indexes?


Re: Best practice for Delta every 2 Minutes.

2010-12-01 Thread Jonathan Rochkind
If your index warmings take longer than two minutes, but you're doing a
commit every two minutes -- you're going to run into trouble with
overlapping index preparations, eventually leading to an OOM.  Could
this be it?
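
One common way to keep warming shorter than the commit interval is to cut
back autowarming in solrconfig.xml -- a sketch, with the sizes as placeholders:

<filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>
<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>

(and keep any newSearcher warm-up queries registered on the QuerySenderListener
to a minimum).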





Re: Best practice for Delta every 2 Minutes.

2010-12-01 Thread stockii

http://10.1.0.10:8983/solr/payment/dataimport?commad=delta-import&debug=on
doesn't work; no debug is started =(
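
Side note: the URL above has commad= rather than command=, so DIH most likely
never started a run at all. Something like the following should kick off a
debug delta-import; if I remember the DIH docs correctly, debug mode does not
commit unless commit=true is added:

http://10.1.0.10:8983/solr/payment/dataimport?command=delta-import&debug=on&commit=true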

Thanks. I will try mergeFactor=2.


Re: Best practice for Delta every 2 Minutes.

2010-11-30 Thread Li Li
You may implement your own MergePolicy that keeps one large segment and
merges all the other small ones, or simply set mergeFactor to 2 and keep the
largest segment from being merged by setting maxMergeDocs to less than the
number of docs in that segment.
Then there is one large segment and one small one: when you add a few docs,
they are merged into the small one, and you can, e.g., optimize the index
weekly to merge everything back into a single segment.
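
In solrconfig.xml terms that is roughly the following, in the <mainIndex> (or
<indexDefaults>) section -- the maxMergeDocs value is a placeholder and would
be set just below the document count of the big segment:

<mergeFactor>2</mergeFactor>
<maxMergeDocs>20000000</maxMergeDocs>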



Re: Best practice for Delta every 2 Minutes.

2010-11-30 Thread Erick Erickson
I don't know -- you'll have to debug it to see if it's the thing that takes
so long. Solr should be able to handle 1,200 updates in a very short time
unless there's something else going on, like you're committing after every
update or something.

This may help you track down performance with DIH:

http://wiki.apache.org/solr/DataImportHandler#interactive

Best
Erick



Re: Best practice for Delta every 2 Minutes.

2010-11-30 Thread stockii

I copied the wrong query -- hence the 10 hours ;)

I didn't test the query with 28 million records, but with a few million it
works fine ...

Before I used DIH I used PHP and imported documents directly into Solr, but
I want to use DIH because of the better performance -- I think so, anyway ... grml ...




Re: Best practice for Delta every 2 Minutes.

2010-11-30 Thread stockii

How do you think the deltaQuery could be made better? XD


Re: Best practice for Delta every 2 Minutes.

2010-11-30 Thread stockii

Every day ~30,000 documents, and every hour ~1,200.

Multiple threads with DIH? How does that work?


Re: Best practice for Delta every 2 Minutes.

2010-11-30 Thread Erick Erickson
Please provide more data. Specifically:

- How many documents are updated?
- Have you tried running this query without Solr? In other words, have you
  investigated whether the speed issue is simply your SQL executing slowly?
- Why are you selecting the last 10 hours' data when all you want is the
  last hour?

You could always partition the problem across multiple threads if you
really have that many documents to update, but I'd look at the
efficiency of your SQL query first.
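
For example, running the delta query by hand against MySQL and checking the
plan will show whether it can use an index at all (a sketch -- the index name
is an assumption, the table and column come from your deltaQuery):

EXPLAIN SELECT id FROM sessions
WHERE modified >= '2010-11-30 08:00:00';

-- if the plan shows a full table scan, an index on the date column helps:
CREATE INDEX idx_sessions_modified ON sessions (modified);

Also note that OR-ing two date ranges, as the original deltaQuery does, often
keeps MySQL from using either index efficiently.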

Best
Erick



Best practice for Delta every 2 Minutes.

2010-11-30 Thread stockii

Hello.

The index is about 28 million documents. When I start a delta-import it
looks at modified, but the delta-import takes too long -- Solr needs over an
hour for the delta.

That's my query below: all sessions from the last hour should be updated,
plus everything that changed. I think it's normal that Solr needs a long
time for these queries. How can I optimize this?

deltaQuery="SELECT id FROM sessions 
WHERE created BETWEEN DATE_ADD( NOW(), INTERVAL - 10 HOUR ) AND NOW() 
OR modified BETWEEN '${dataimporter.last_index_time}' AND DATE_ADD( NOW(),
INTERVAL - 1 HOUR  ) "  
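
For comparison, a narrower delta query that only looks back one hour and
relies on indexed created/modified columns might look like the sketch below;
the one-hour window and the indexing assumptions need checking against the
real schema:

deltaQuery="SELECT id FROM sessions
WHERE modified >= '${dataimporter.last_index_time}'
OR created >= DATE_SUB( NOW(), INTERVAL 1 HOUR )"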