[jira] Commented: (SOLR-126) Auto-commit documents after time interval

2007-02-08 Thread Mike Klaas (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12471550
 ] 

Mike Klaas commented on SOLR-126:
-

extra patch committed in r505114

 Auto-commit documents after time interval
 -

 Key: SOLR-126
 URL: https://issues.apache.org/jira/browse/SOLR-126
 Project: Solr
  Issue Type: Improvement
  Components: update
Reporter: Ryan McKinley
Priority: Minor
 Fix For: 1.2

 Attachments: AutoCommit.patch, 
 AutocommitingUpdateRequestHandler.patch, SOLR-126-ClosePending.patch


 If an index is getting updated from multiple sources and needs to add 
 documents reasonably quickly, there should be a good solr side mechanism to 
 help prevent the client from spawning multiple overlapping <commit/> commands.
 My specific use case is sending each document to solr every time hibernate 
 saves an object (see SOLR-20).  This happens from multiple machines 
 simultaneously.  I'd like solr to make sure the documents are committed 
 within a second.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-126) Auto-commit documents after time interval

2007-02-01 Thread Mike Klaas (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12469559
 ] 

Mike Klaas commented on SOLR-126:
-

Committed in r502328.  Thanks!

Ryan, the last comment of mine was about the units of time that maxTime was 
specified in--the old version had the time specified in seconds and the 
parameter name was maxSec.  I committed it the way it stands in the patch; if 
anyone has a strong opinion, this can be changed before being closed.




[jira] Commented: (SOLR-126) Auto-commit documents after time interval

2007-02-01 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12469663
 ] 

Ryan McKinley commented on SOLR-126:



It looks like the test solrconfig.xml got committed with some mangled XML 
comments:

 <!-- autocommit pending docs if certain criteria are met 
<autoCommit> 
  <maxDocs>1</maxDocs>
  <maxTime>3600000</maxTime> <!-- one hour in milliseconds -->  
</autoCommit>
-->

so it fails the tests.

It needs to be:

 <!-- autocommit pending docs if certain criteria are met -->
<autoCommit> 
  <maxDocs>1</maxDocs>
  <maxTime>3600000</maxTime> <!-- one hour in milliseconds -->  
</autoCommit>

thanks!
ryan
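
The mangled version fails because XML does not allow "--" inside a comment, so a comment cannot wrap a block that itself contains a comment. A minimal sketch of that check follows, using the JDK's built-in parser. The class name, the <config> wrapper element, and the element values are assumptions for illustration only and are not taken from the Solr test configuration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper: checks whether a config fragment is well-formed XML. */
class NestedCommentCheck {

    // The outer comment is never closed before the inner comment starts.
    static final String MANGLED =
          "<config>\n"
        + "<!-- autocommit pending docs if certain criteria are met\n"
        + "<autoCommit>\n"
        + "  <maxDocs>1</maxDocs>\n"
        + "  <maxTime>3600000</maxTime> <!-- one hour in milliseconds -->\n"
        + "</autoCommit>\n"
        + "-->\n"
        + "</config>\n";

    // The outer comment is closed on its own line, so nothing is nested.
    static final String FIXED =
          "<config>\n"
        + "<!-- autocommit pending docs if certain criteria are met -->\n"
        + "<autoCommit>\n"
        + "  <maxDocs>1</maxDocs>\n"
        + "  <maxTime>3600000</maxTime> <!-- one hour in milliseconds -->\n"
        + "</autoCommit>\n"
        + "</config>\n";

    /** Returns true if the text parses as well-formed XML, false on a parse error. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("mangled well-formed: " + isWellFormed(MANGLED));
        System.out.println("fixed well-formed:   " + isWellFormed(FIXED));
    }
}
```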






Re: [jira] Commented: (SOLR-126) Auto-commit documents after time interval

2007-02-01 Thread Mike Klaas

Thanks! That's what I get for not running the test suite One Last Time
before committing.

-Mike

On 2/1/07, Ryan McKinley (JIRA) [EMAIL PROTECTED] wrote:


> [ 
> https://issues.apache.org/jira/browse/SOLR-126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12469663
>  ]
>
> Ryan McKinley commented on SOLR-126:
>
> It looks like the test solrconfig.xml got committed with some mangled XML 
> comments:
>
>  <!-- autocommit pending docs if certain criteria are met
> <autoCommit>
>   <maxDocs>1</maxDocs>
>   <maxTime>3600000</maxTime> <!-- one hour in milliseconds -->
> </autoCommit>
> -->


[jira] Commented: (SOLR-126) Auto-commit documents after time interval

2007-01-31 Thread Mike Klaas (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12469187
 ] 

Mike Klaas commented on SOLR-126:
-

Ryan: looking good!  A few comments:

 - You notify the tracker that the document is added before actually adding the 
document.  This is okay--commit() cannot run until addDoc() is complete--but it 
does mean that the autocommit maxTime is measured from the start of the 
document being added until after it has been processed.  I'm not sure it 
matters in practice.

 - Similarly, didCommit() is invoked before the searcher is warmed.  Autocommits 
will never occur simultaneously (as you note; due to synchronization of run()), 
but they could be invoked continually if warming takes a long time.

 - If 250ms is a small enough time to not care about, does it make sense to 
force the user to specify the time in milliseconds?

These are all relatively minor things--if no one else has any thoughts this can 
probably be committed soon.  




[jira] Commented: (SOLR-126) Auto-commit documents after time interval

2007-01-31 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12469204
 ] 

Ryan McKinley commented on SOLR-126:


 
> - You notify the tracker that the document is added before actually adding 
> the document.  This is okay--commit() cannot run until addDoc() is 
> complete--but it does mean that the autocommit maxTime is measured from the 
> start of the document being added until after it has been processed.  I'm not 
> sure it matters in practice.

I'm looking at it from the client perspective.  The timer should start as 
close to the request time as possible.

> - similarly, didCommit() is invoked before the searcher is warmed.  
> Autocommits will never occur simultaneously (as you note; due to 
> synchronization of run()), but they could be invoked continually if warming 
> takes a long time.

I just left it where it was in the existing code.  I think it makes sense 
because the searcher has the proper data at that point - a second commit won't 
change the results.

Also, it will not start a new autocommit until the first has warmed the 
searcher anyway:

  CommitUpdateCommand command = new CommitUpdateCommand( false );
  command.waitFlush = true;
  command.waitSearcher = true; 

> - If 250ms is a small enough time to not care about, does it make sense to 
> force the user to specify the time in milliseconds?

This is trying to avoid the case where 100 documents are added at the same 
time with maxDocs=10.  We don't want to commit 10 times, so it waits 1/4 sec 
(could be shorter or longer in my opinion).

If anyone is worried about the timing, they should use maxTime, not maxDocs.
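
The maxDocs behaviour described above (a burst of 100 adds with maxDocs=10 should produce one commit, not ten) can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from AutoCommit.patch: the class name, the counter, and the close() helper are hypothetical, and the "commit" is only a counter increment.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of a doc-count trigger with a short commit delay. */
class DocCountCommitSketch implements Runnable {
    static final long DOC_COMMIT_DELAY_MS = 250; // the 1/4 sec wait discussed above

    private final ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
    private final int maxDocs;
    private int docsSinceCommit = 0;     // guarded by "this"
    private int commits = 0;             // guarded by "this"
    private ScheduledFuture<?> pending;  // non-null while a commit is scheduled

    DocCountCommitSketch(int maxDocs) {
        this.maxDocs = maxDocs;
    }

    /** Counts the document and schedules at most one delayed commit. */
    synchronized void addedDocument() {
        docsSinceCommit++;
        if (docsSinceCommit >= maxDocs && pending == null) {
            pending = scheduler.schedule(this, DOC_COMMIT_DELAY_MS, TimeUnit.MILLISECONDS);
        }
    }

    /** Runs on the scheduler thread; a real tracker would issue the commit here. */
    @Override
    public synchronized void run() {
        commits++;
        docsSinceCommit = 0;
        pending = null;
    }

    synchronized int commitCount() {
        return commits;
    }

    /** Lets an already scheduled commit run, then stops the scheduler thread. */
    void close() throws InterruptedException {
        scheduler.shutdown();
        scheduler.awaitTermination(5, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        DocCountCommitSketch tracker = new DocCountCommitSketch(10);
        for (int i = 0; i < 100; i++) {
            tracker.addedDocument();
        }
        tracker.close();
        System.out.println("commits after a burst of 100 adds: " + tracker.commitCount());
    }
}
```

Because the commit is delayed rather than run as soon as the threshold is crossed, every add that arrives during the delay is folded into the same commit.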






[jira] Commented: (SOLR-126) Auto-commit documents after time interval

2007-01-30 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12468561
 ] 

Ryan McKinley commented on SOLR-126:


I just posted AutoCommit.patch 

This patch modifies DirectUpdateHandler2.CommitTracker to automatically commit 
after a certain period.  As written, it should never start multiple commits 
at the same time.

I think this is ready to go.  





[jira] Commented: (SOLR-126) Auto-commit documents after time interval

2007-01-29 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12468482
 ] 

Ryan McKinley commented on SOLR-126:


I'm not sure of the best design for this problem.  As a quick way to get 
something working NOW without modifying UpdateHandler, I made a custom 
UpdateHandler (extended from XmlUpdateRequestHandler) that starts a timer and 
executes <commit/> after a configured time.

- - - - - - 

The auto-commit logic should probably be in the UpdateHandler along with the 
existing CommitTracker.  The existing CommitTracker lets you specify a number of 
documents it should keep before indexing.  

For a time limit, I *think* we need to run a thread - I know that is bad for 
webapps, but I'm not sure there is any other option.  In this example, I used:

final ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);

then:

scheduler.schedule( this, interval, TimeUnit.MILLISECONDS );

Is this a reasonable approach?  Is there any other threading/timing mechanism 
to consider?
Should this be applied directly to the UpdateHandler and configured along with 

 <autoCommit> 
   <maxDocs>1</maxDocs>
   <maxTime>1</maxTime> <!-- milliseconds -->
 </autoCommit>

or should it be a custom request handler (as implemented in this attachment)?
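
For reference, the scheduling approach asked about above could look roughly like the sketch below: the first add in a window starts a one-shot timer, and the timer callback performs the commit. The class and method names are hypothetical, the commit is simulated with a counter, and this is not the code in the attached patch.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a handler that commits a fixed time after the first add. */
class TimedCommitSketch implements Runnable {
    private final ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
    private final long intervalMs;
    private final AtomicInteger commits = new AtomicInteger();
    private boolean pending = false; // guarded by "this"

    TimedCommitSketch(long intervalMs) {
        this.intervalMs = intervalMs;
    }

    /** Called for every added document; only the first add in a window starts the timer. */
    synchronized void addedDocument() {
        if (!pending) {
            pending = true;
            scheduler.schedule(this, intervalMs, TimeUnit.MILLISECONDS);
        }
    }

    /** Runs on the scheduler thread when the interval expires. */
    @Override
    public synchronized void run() {
        commits.incrementAndGet(); // a real handler would execute the commit here
        pending = false;
    }

    int commitCount() {
        return commits.get();
    }

    /** Lets already scheduled commits finish, then stops the scheduler thread. */
    void close() throws InterruptedException {
        scheduler.shutdown();
        scheduler.awaitTermination(5, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        TimedCommitSketch handler = new TimedCommitSketch(200);
        for (int i = 0; i < 50; i++) {
            handler.addedDocument();
        }
        handler.close();
        System.out.println("commits: " + handler.commitCount());
    }
}
```

The scheduler owns a single worker thread, so the webapp must shut it down when the handler is closed; otherwise the thread outlives a redeploy.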







[jira] Commented: (SOLR-126) Auto-commit documents after time interval

2007-01-29 Thread Mike Klaas (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12468485
 ] 

Mike Klaas commented on SOLR-126:
-

A few high-level comments:
 - commitTer/commitTing.  It seems pedantic to gripe about spelling, but making 
it right really helps in finding things at a later date.
 - autoCommitAfter doesn't particularly document the semantics, which are 
autocommit interval upper bound.  The reason that this is important is that 
we'll want to implement autocommit interval lower bound at some point (and/or 
autocommit after idle time).
 - I think this would fit pretty easily into the existing committracker (just 
gut the time-based things that are already there).   Unless there is any reason 
why this should be limited to a custom handler?




[jira] Commented: (SOLR-126) Auto-commit documents after time interval

2007-01-29 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12468493
 ] 

Ryan McKinley commented on SOLR-126:


 
> A few high-level comments:
>  - commitTer/commitTing.  It seems pedantic to gripe about spelling,...

Yes, thanks.  I'm notoriously bad anywhere spellcheck can't help (and often 
where it does!)  Please fix or point out anything; it does make it much easier 
to find in the future.

>  - autoCommitAfter doesn't particularly document the semantics,...

Perhaps:

 <autoCommit>
   <maxDocs>1</maxDocs>
   <maxTime>1</maxTime> 
  <!--
   In the future we may add:
   <minTime>1</minTime> 
   <idleTime>1</idleTime> 
  -->
 </autoCommit>

>  Unless there is any reason why this should be limited to a custom handler?

Yes, it should probably go in the UpdateHandler.








[jira] Commented: (SOLR-126) Auto-commit documents after time interval

2007-01-29 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12468497
 ] 

Yonik Seeley commented on SOLR-126:
---

Thanks for tackling this Ryan, this is much needed functionality (planned since 
inception but never coded!)
This should definitely go into the existing committracker/updatehandler.

Timer/TimerTask for the timers?  Haven't used them myself... are they 
appropriate?

Some future issues (i.e. a different JIRA issue)... one setting may make 
perfect sense when doing incremental updates, to enforce a minimum freshness 
and take the burden of commit off of the clients.  When building an index from 
scratch, you want to do it as quickly as possible, and not do commits all the 
time (or even expose a partial index to searches).
Perhaps something like autocommit=false in the update params?
It might even be nice to be able to change the mergeFactor for a complete index 
build vs incremental updates.







[jira] Commented: (SOLR-126) Auto-commit documents after time interval

2007-01-29 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12468499
 ] 

Ryan McKinley commented on SOLR-126:


 
> Timer/TimerTask for the timers?  Haven't used them myself... are they 
> appropriate?

I'm no expert on this, but it looks like Java 5 added:

http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/ScheduledExecutorService.html

It does everything Timer/TimerTask did - I'm not sure (and can't seem to find) 
whether the difference is just syntax or something more profound.
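
One behavioural difference that goes beyond syntax is how the two handle a task that throws. An uncaught exception in a TimerTask ends the Timer's only thread, so later schedule() calls are rejected; a scheduled executor reports the failure through the returned Future and keeps running. The sketch below illustrates this with hypothetical class and method names, and the 500 ms sleep is just a generous wait for the failing task to run.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical comparison of failure handling in Timer and ScheduledExecutorService. */
class TimerVsExecutorSketch {

    /** Returns true if a java.util.Timer still accepts tasks after one task threw. */
    static boolean timerSurvivesFailure() throws InterruptedException {
        Timer timer = new Timer(true); // daemon thread so the JVM can exit
        timer.schedule(new TimerTask() {
            @Override public void run() { throw new IllegalStateException("boom"); }
        }, 0);
        Thread.sleep(500); // give the failing task time to run and end the timer thread
        try {
            timer.schedule(new TimerTask() {
                @Override public void run() { }
            }, 0);
            return true;
        } catch (IllegalStateException e) {
            return false; // the timer refuses new work once its thread has died
        }
    }

    /** Returns true if a scheduled executor still runs tasks after one task threw. */
    static boolean executorSurvivesFailure() throws InterruptedException, ExecutionException {
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
        try {
            Runnable failing = () -> { throw new IllegalStateException("boom"); };
            try {
                scheduler.schedule(failing, 0, TimeUnit.MILLISECONDS).get();
            } catch (ExecutionException expected) {
                // the failure is reported through the Future; the worker thread lives on
            }
            return "ok".equals(scheduler.schedule(() -> "ok", 0, TimeUnit.MILLISECONDS).get());
        } finally {
            scheduler.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Timer usable after failure:    " + timerSurvivesFailure());
        System.out.println("Executor usable after failure: " + executorSurvivesFailure());
    }
}
```

The failing TimerTask prints a stack trace to stderr when its thread dies; that is expected in this sketch.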










[jira] Commented: (SOLR-126) Auto-commit documents after time interval

2007-01-29 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12468505
 ] 

Yonik Seeley commented on SOLR-126:
---

Another future idea along the lines of autocommit=true/false, is supporting 
clients with different timeliness needs.
For example, a client could send an update request, telling solr that a commit 
should be done within 5 minutes.
maxwait=300 or something like that.
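
A rough sketch of how per-request timeliness might be tracked: keep the earliest deadline any client has asked for and commit once it is reached. Everything here is hypothetical, including the class, the method names, and the reading of maxwait as seconds; nothing like this exists in the patches attached to this issue.

```java
/** Hypothetical sketch: tracks the earliest commit deadline requested by any client. */
class CommitDeadlineSketch {
    private long deadlineMs = Long.MAX_VALUE; // no commit requested yet

    /**
     * Records that a client wants a commit within maxWaitSec seconds of nowMs.
     * Returns the deadline now in force, which is the earliest one seen so far.
     */
    synchronized long requestCommitWithin(long nowMs, long maxWaitSec) {
        deadlineMs = Math.min(deadlineMs, nowMs + maxWaitSec * 1000L);
        return deadlineMs;
    }

    /** True once the earliest requested deadline has been reached. */
    synchronized boolean commitDue(long nowMs) {
        return nowMs >= deadlineMs;
    }

    /** Clears the deadline after a commit has been performed. */
    synchronized void didCommit() {
        deadlineMs = Long.MAX_VALUE;
    }

    public static void main(String[] args) {
        CommitDeadlineSketch tracker = new CommitDeadlineSketch();
        tracker.requestCommitWithin(0, 300);    // relaxed client: within 5 minutes
        tracker.requestCommitWithin(1_000, 10); // impatient client: within 10 seconds
        System.out.println("due at 10.999s: " + tracker.commitDue(10_999));
        System.out.println("due at 11.000s: " + tracker.commitDue(11_000));
    }
}
```

Passing the clock value in as a parameter keeps the sketch free of timers and threads; a real tracker would feed the deadline into whatever scheduler the update handler already uses.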


