[jira] Commented: (SOLR-126) Auto-commit documents after time interval
[ https://issues.apache.org/jira/browse/SOLR-126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12471550 ] Mike Klaas commented on SOLR-126: - extra patch committed in r505114 Auto-commit documents after time interval - Key: SOLR-126 URL: https://issues.apache.org/jira/browse/SOLR-126 Project: Solr Issue Type: Improvement Components: update Reporter: Ryan McKinley Priority: Minor Fix For: 1.2 Attachments: AutoCommit.patch, AutocommitingUpdateRequestHandler.patch, SOLR-126-ClosePending.patch If an index is getting updated from multiple sources and needs to add documents reasonably quickly, there should be a good solr side mechanism to help prevent the client from spawning multiple overlapping commit/ commands. My specific use case is sending each document to solr every time hibernate saves an object (see SOLR-20). This happens from multiple machines simultaneously. I'd like solr to make sure the documents are committed within a second. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-126) Auto-commit documents after time interval
[ https://issues.apache.org/jira/browse/SOLR-126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12469559 ] Mike Klaas commented on SOLR-126: - Committed in r502328. Thanks! Ryan, the last comment of mine was about the units of time that maxTime was specified in--the old version had the time specified in seconds and the parameter name was maxSec. I committed it the way it stands in the patch; if anyone has a strong opinion, this can be changed before being closed. Auto-commit documents after time interval - Key: SOLR-126 URL: https://issues.apache.org/jira/browse/SOLR-126 Project: Solr Issue Type: Improvement Components: update Reporter: Ryan McKinley Priority: Minor Attachments: AutoCommit.patch, AutocommitingUpdateRequestHandler.patch If an index is getting updated from multiple sources and needs to add documents reasonably quickly, there should be a good solr side mechanism to help prevent the client from spawning multiple overlapping commit/ commands. My specific use case is sending each document to solr every time hibernate saves an object (see SOLR-20). This happens from multiple machines simultaneously. I'd like solr to make sure the documents are committed within a second. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-126) Auto-commit documents after time interval
[ https://issues.apache.org/jira/browse/SOLR-126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12469663 ] Ryan McKinley commented on SOLR-126: It looks like the test solrconfi.xml got commited with some mangled xml comments: !-- autocommit pending docs if certain criteria are met autoCommit maxDocs1/maxDocs maxTime360/maxTime !-- one hour in milliseconds -- /autoCommit -- so it fails the tests it needs to be: !-- autocommit pending docs if certain criteria are met -- autoCommit maxDocs1/maxDocs maxTime360/maxTime !-- one hour in milliseconds -- /autoCommit thanks! ryan Auto-commit documents after time interval - Key: SOLR-126 URL: https://issues.apache.org/jira/browse/SOLR-126 Project: Solr Issue Type: Improvement Components: update Reporter: Ryan McKinley Priority: Minor Attachments: AutoCommit.patch, AutocommitingUpdateRequestHandler.patch If an index is getting updated from multiple sources and needs to add documents reasonably quickly, there should be a good solr side mechanism to help prevent the client from spawning multiple overlapping commit/ commands. My specific use case is sending each document to solr every time hibernate saves an object (see SOLR-20). This happens from multiple machines simultaneously. I'd like solr to make sure the documents are committed within a second. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: [jira] Commented: (SOLR-126) Auto-commit documents after time interval
Thanks! That's what I get for not running the test suite One Last Time before committing. -Mike On 2/1/07, Ryan McKinley (JIRA) [EMAIL PROTECTED] wrote: [ https://issues.apache.org/jira/browse/SOLR-126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12469663 ] Ryan McKinley commented on SOLR-126: It looks like the test solrconfi.xml got commited with some mangled xml comments: !-- autocommit pending docs if certain criteria are met autoCommit maxDocs1/maxDocs maxTime360/maxTime !-- one hour in milliseconds -- /autoCommit --
[jira] Commented: (SOLR-126) Auto-commit documents after time interval
[ https://issues.apache.org/jira/browse/SOLR-126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12469187 ] Mike Klaas commented on SOLR-126: - Ryan: looking good! A few comments: - You notify the tracker that the document is added before actually adding the document. This is okay--commit() cannot run until addDoc() is complete--but it does mean that the autocommit maxTime is measured from the start of the document being added until after it has been processed. I'm not sure it matters in practice. - similarly, didCommit() is invoked before the searcher is warmed. Autocommits will never occur simulatneously (as you note; due to synchronization of run()), but they could be invoked continually if warming takes a long time. - If 250ms is a small enough time to not care about, does it make sense to force the user to specify the time in milliseconds? These are all relatively minor things--if no one else has any thoughts this can probably be committed soon. Auto-commit documents after time interval - Key: SOLR-126 URL: https://issues.apache.org/jira/browse/SOLR-126 Project: Solr Issue Type: Improvement Components: update Reporter: Ryan McKinley Priority: Minor Attachments: AutoCommit.patch, AutocommitingUpdateRequestHandler.patch If an index is getting updated from multiple sources and needs to add documents reasonably quickly, there should be a good solr side mechanism to help prevent the client from spawning multiple overlapping commit/ commands. My specific use case is sending each document to solr every time hibernate saves an object (see SOLR-20). This happens from multiple machines simultaneously. I'd like solr to make sure the documents are committed within a second. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-126) Auto-commit documents after time interval
[ https://issues.apache.org/jira/browse/SOLR-126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12469204 ] Ryan McKinley commented on SOLR-126: - You notify the tracker that the document is added before actually adding the document. This is okay--commit() cannot run until addDoc() is complete--but it does mean that the autocommit maxTime is measured from the start of the document being added until after it has been processed. I'm not sure it matters in practice. I'm looking at it from the client perspective. The timer should start as soon as close to the request time as possible. - similarly, didCommit() is invoked before the searcher is warmed. Autocommits will never occur simulatneously (as you note; due to synchronization of run()), but they could be invoked continually if warming takes a long time. I just left at were it was in the existing code. I think it makes sense because the searcher has the proper data at that point - a second commit wont change the results. Also, it will not start a new autocommit until the first has warmed the searcher anyway: CommitUpdateCommand command = new CommitUpdateCommand( false ); command.waitFlush = true; command.waitSearcher = true; - If 250ms is a small enough time to not care about, does it make sense to force the user to specify the time in milliseconds? This is trying to avoid is the case where 100 documents are added at the same time with maxDocs=10. We don't want to commit 10 times, so it waits 1/4 sec. (could be shorter or longer in my opinion) If anyone is worried about the timing, they should use maxTime, not maxDocs Auto-commit documents after time interval - Key: SOLR-126 URL: https://issues.apache.org/jira/browse/SOLR-126 Project: Solr Issue Type: Improvement Components: update Reporter: Ryan McKinley Priority: Minor Attachments: AutoCommit.patch, AutocommitingUpdateRequestHandler.patch If an index is getting updated from multiple sources and needs to add documents reasonably quickly, there should be a good solr side mechanism to help prevent the client from spawning multiple overlapping commit/ commands. My specific use case is sending each document to solr every time hibernate saves an object (see SOLR-20). This happens from multiple machines simultaneously. I'd like solr to make sure the documents are committed within a second. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-126) Auto-commit documents after time interval
[ https://issues.apache.org/jira/browse/SOLR-126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12468561 ] Ryan McKinley commented on SOLR-126: I just posted AutoCommit.patch This patch modifies DirectUpdateHandler2.CommentTracker to automatically commit a after a certain period. As written, It should never start multiple commits at the same time. I think this is ready to go. Auto-commit documents after time interval - Key: SOLR-126 URL: https://issues.apache.org/jira/browse/SOLR-126 Project: Solr Issue Type: Improvement Components: update Reporter: Ryan McKinley Priority: Minor Attachments: AutoCommit.patch, AutocommitingUpdateRequestHandler.patch If an index is getting updated from multiple sources and needs to add documents reasonably quickly, there should be a good solr side mechanism to help prevent the client from spawning multiple overlapping commit/ commands. My specific use case is sending each document to solr every time hibernate saves an object (see SOLR-20). This happens from multiple machines simultaneously. I'd like solr to make sure the documents are committed within a second. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-126) Auto-commit documents after time interval
[ https://issues.apache.org/jira/browse/SOLR-126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12468482 ] Ryan McKinley commented on SOLR-126: I'm not sure the best design for this problem, as a quick way to get something working NOW without modifying UpdateHandler, I made a custom UpdateHandler (extended from XmlUpdateRequestHandler) that starts a timer and executes commit/ after a configured time. - - - - - - The auto-commit logic should probably be in the UpdateHandler along with the exiting CommitTracker. The existing CommitTracker lets you specify a number of documents it should keep before indexing. For a time limit, I *think* we need to run a thread - I know that is bad for webapps, but I'm not sure there is any other option. In this example, I used: final ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1); then: scheduler.schedule( this, interval, TimeUnit.MILLISECONDS ); Is this a reasonable approach? Is there any other threading/timing mechanism to consider? Should this be applied directly to the UpdateHandler and configured along with autoCommit maxDocs1/maxDocs maxTime1/maxTime !-- milliseconds -- /autoCommit or should it be a custom request handler (as implemented in this attachment)? Auto-commit documents after time interval - Key: SOLR-126 URL: https://issues.apache.org/jira/browse/SOLR-126 Project: Solr Issue Type: Improvement Components: update Reporter: Ryan McKinley Priority: Minor Attachments: AutocommitingUpdateRequestHandler.patch If an index is getting updated from multiple sources and needs to add documents reasonably quickly, there should be a good solr side mechanism to help prevent the client from spawning multiple overlapping commit/ commands. My specific use case is sending each document to solr every time hibernate saves an object (see SOLR-20). This happens from multiple machines simultaneously. I'd like solr to make sure the documents are committed within a second. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-126) Auto-commit documents after time interval
[ https://issues.apache.org/jira/browse/SOLR-126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12468485 ] Mike Klaas commented on SOLR-126: - A few high-level comments: - commitTer/commitTing. It seems pedantic to gripe about spelling, but making it right really helps in finding things at a later date. - autoCommitAfter doesn't particularly document the semantics, which are autocommit interval upper bound. The reason that this is important is that we'll want to implement autocommit interval lower bound at some point (and/or autocommit after idle time). - I think this would fit pretty easily into the existing committracker (just gut the time-based things that are already there). Unless there is any reason why this should be limited to a custom handler? Auto-commit documents after time interval - Key: SOLR-126 URL: https://issues.apache.org/jira/browse/SOLR-126 Project: Solr Issue Type: Improvement Components: update Reporter: Ryan McKinley Priority: Minor Attachments: AutocommitingUpdateRequestHandler.patch If an index is getting updated from multiple sources and needs to add documents reasonably quickly, there should be a good solr side mechanism to help prevent the client from spawning multiple overlapping commit/ commands. My specific use case is sending each document to solr every time hibernate saves an object (see SOLR-20). This happens from multiple machines simultaneously. I'd like solr to make sure the documents are committed within a second. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-126) Auto-commit documents after time interval
[ https://issues.apache.org/jira/browse/SOLR-126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12468493 ] Ryan McKinley commented on SOLR-126: A few high-level comments: - commitTer/commitTing. It seems pedantic to gripe about spelling,... yes, thanks. I'm notoriously bad anywhere spellcheck can't help (and often where it does!) Please fix or point out anything, it does make it much easier to find in the future. - autoCommitAfter doesn't particularly document the semantics,... perhaps: autoCommit maxDocs1/maxDocs maxTime1/maxTime !-- In the future we may add: minTime1/minTime idleTime1/idleTime -- /autoCommit Unless there is any reason why this should be limited to a custom handler? Yes, it should probably go in the UpdateHandler Auto-commit documents after time interval - Key: SOLR-126 URL: https://issues.apache.org/jira/browse/SOLR-126 Project: Solr Issue Type: Improvement Components: update Reporter: Ryan McKinley Priority: Minor Attachments: AutocommitingUpdateRequestHandler.patch If an index is getting updated from multiple sources and needs to add documents reasonably quickly, there should be a good solr side mechanism to help prevent the client from spawning multiple overlapping commit/ commands. My specific use case is sending each document to solr every time hibernate saves an object (see SOLR-20). This happens from multiple machines simultaneously. I'd like solr to make sure the documents are committed within a second. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-126) Auto-commit documents after time interval
[ https://issues.apache.org/jira/browse/SOLR-126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12468497 ] Yonik Seeley commented on SOLR-126: --- Thanks for tackling this Ryan, this is much needed functionality (planned since inception but never coded!) This should definitely go into the existing committracker/updatehandler. Timer/TimerTask for the timers? Haven't used them myself... are they appropriate? Some future issues (i.e. a different JIRA issue)... one setting may make perfect sense when doing incremental updates, to enforce a minimum freshness and take the burden of commit off of the clients. When building an index from scratch, you want to do it as quickly as possible, and not do commits all the time (or even expose a partial index to searches). Perhaps something like autocommit=false in the update params? It might even be nice to be able to change the mergeFactor for a complete index build vs incremental updates. Auto-commit documents after time interval - Key: SOLR-126 URL: https://issues.apache.org/jira/browse/SOLR-126 Project: Solr Issue Type: Improvement Components: update Reporter: Ryan McKinley Priority: Minor Attachments: AutocommitingUpdateRequestHandler.patch If an index is getting updated from multiple sources and needs to add documents reasonably quickly, there should be a good solr side mechanism to help prevent the client from spawning multiple overlapping commit/ commands. My specific use case is sending each document to solr every time hibernate saves an object (see SOLR-20). This happens from multiple machines simultaneously. I'd like solr to make sure the documents are committed within a second. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-126) Auto-commit documents after time interval
[ https://issues.apache.org/jira/browse/SOLR-126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12468499 ] Ryan McKinley commented on SOLR-126: Timer/TimerTask for the timers? Haven't used them myself... are they appropriate? I'm no expert on this, but it looks like Java5 added: http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/ScheduledExecutorService.html it does everything Timer/TimerTask did - I'm not sure (and can't seem to find) if the difference is just syntax or something more profound. Auto-commit documents after time interval - Key: SOLR-126 URL: https://issues.apache.org/jira/browse/SOLR-126 Project: Solr Issue Type: Improvement Components: update Reporter: Ryan McKinley Priority: Minor Attachments: AutocommitingUpdateRequestHandler.patch If an index is getting updated from multiple sources and needs to add documents reasonably quickly, there should be a good solr side mechanism to help prevent the client from spawning multiple overlapping commit/ commands. My specific use case is sending each document to solr every time hibernate saves an object (see SOLR-20). This happens from multiple machines simultaneously. I'd like solr to make sure the documents are committed within a second. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-126) Auto-commit documents after time interval
[ https://issues.apache.org/jira/browse/SOLR-126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12468505 ] Yonik Seeley commented on SOLR-126: --- Another future idea along the lines of autocommit=true/false, is supporting clients with different timeliness needs. For example, a client could send an update request, telling solr that a commit should be done within 5 minutes. maxwait=300 or something like that. Auto-commit documents after time interval - Key: SOLR-126 URL: https://issues.apache.org/jira/browse/SOLR-126 Project: Solr Issue Type: Improvement Components: update Reporter: Ryan McKinley Priority: Minor Attachments: AutocommitingUpdateRequestHandler.patch If an index is getting updated from multiple sources and needs to add documents reasonably quickly, there should be a good solr side mechanism to help prevent the client from spawning multiple overlapping commit/ commands. My specific use case is sending each document to solr every time hibernate saves an object (see SOLR-20). This happens from multiple machines simultaneously. I'd like solr to make sure the documents are committed within a second. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.