[jira] [Updated] (LUCENE-2573) Tiered flushing of DWPTs by RAM with low/high water marks
[ https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-2573:
    Attachment: LUCENE-2573.patch

Next iteration:

* Added JavaDoc to all classes and interfaces.
* Fixed the possible hot loop in DWPTThreadPool.
* Changed the stalling logic to block if more DWPTs are flushing than are active.
* Check IWC settings on init and listen to live changes for those that have not been disabled.
* Made the hard per-thread RAM limit configurable and added DEFAULT_RAM_PER_THREAD_HARD_LIMIT, set to 1945MB.
* Renamed DefaultFP --> FlushByRAMOrCounts.
* Added setFlushDeletes to DWFlushControl, which is checked each time we add a delete; if set, we flush the global deletes.

This seems somewhat close now. It's time to benchmark it again.

> Tiered flushing of DWPTs by RAM with low/high water marks
> ---------------------------------------------------------
>
>                 Key: LUCENE-2573
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2573
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael Busch
>            Assignee: Simon Willnauer
>            Priority: Minor
>             Fix For: Realtime Branch
>
>         Attachments: LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch
>
> Now that we have DocumentsWriterPerThreads we need to track total consumed RAM across all DWPTs.
> A flushing strategy idea that was discussed in LUCENE-2324 was to use a tiered approach:
> - Flush the first DWPT at a low water mark (e.g. at 90% of allowed RAM)
> - Flush all DWPTs at a high water mark (e.g. at 110%)
> - Use linear steps in between high and low watermark: e.g. when 5 DWPTs are used, flush at 90%, 95%, 100%, 105% and 110%.
> Should we allow the user to configure the low and high water mark values explicitly using total values (e.g. low water mark at 120MB, high water mark at 140MB)? Or shall we keep, for simplicity, the single setRAMBufferSizeMB() config method and use something like 90% and 110% for the water marks?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
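The linear-step scheme from the issue description can be sketched as follows. This is a hypothetical illustrative helper, not code from any of the patches; it just assumes the per-tier thresholds are spread evenly between the low (90%) and high (110%) marks:

```java
// Hypothetical sketch of the tiered water marks discussed above: with n DWPTs,
// flush thresholds are spread linearly between the low and high marks, e.g.
// 90%, 95%, 100%, 105%, 110% of the configured RAM buffer for 5 DWPTs.
class TieredWaterMarks {
    private final double lowFraction;   // e.g. 0.90
    private final double highFraction;  // e.g. 1.10
    private final long ramBufferBytes;  // setRAMBufferSizeMB() converted to bytes

    TieredWaterMarks(long ramBufferBytes, double lowFraction, double highFraction) {
        this.ramBufferBytes = ramBufferBytes;
        this.lowFraction = lowFraction;
        this.highFraction = highFraction;
    }

    /** Threshold in bytes for the i-th tier (0-based) out of numDwpts tiers. */
    long threshold(int tier, int numDwpts) {
        double step = numDwpts > 1 ? (highFraction - lowFraction) / (numDwpts - 1) : 0.0;
        return (long) (ramBufferBytes * (lowFraction + step * tier));
    }
}
```

With a 1000-byte buffer and 5 DWPTs this yields the 900/950/1000/1050/1100 ladder described above.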
[jira] [Updated] (LUCENE-2573) Tiered flushing of DWPTs by RAM with low/high water marks
[ https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-2573:
    Attachment: LUCENE-2573.patch

Here is my current state on this issue. I didn't add all the JavaDocs needed (by far); I will wait until we have settled on the API for FlushPolicy.

* I removed the complex TieredFlushPolicy entirely and added one DefaultFlushPolicy that flushes at IWC.getRAMBufferSizeMB() / sets the biggest DWPT pending.
* DW will stall threads if we reach 2 x maxNetRam, which is retrieved from the FlushPolicy so folks can lower it depending on their environment.
* DWFlushControl checks if a single DWPT grows too large and forcefully sets it pending once its RAM consumption exceeds 1.9 GB. That should be enough buffer to not reach the 2048MB limit. We should consider making this configurable.
* FlushPolicy now has three methods, onInsert, onUpdate and onDelete, while DefaultFlushPolicy only implements onInsert and onDelete; the abstract base class just calls those on an update.
* I removed FlushControl from IW.
* Added documentation on IWC for FlushPolicy and removed the JavaDocs for the RAM limit. I think we should add some lines about how RAM is now used and that users should balance the RAM with the number of threads they are using. Will do that later, though.
* For testing I added a ThrottledIndexOutput that makes flushing slow, so I can test whether we are stalled and/or blocked. This is passed to MockDirectoryWrapper. It's currently under util, but it should rather go under store, no?
* Byte consumption is now committed before the FlushPolicy is called, since we no longer have the multi-tier flush which required that to reliably proceed across tier boundaries (not required, but it was easier to test). So the FP doesn't need to take care of the delta.
* FlushPolicy now also flushes on maxBufferedDeleteTerms, while the buffered delete terms are not yet connected to DW#getNumBufferedDeleteTerms(), which causes some failures. I added //nocommit & @Ignore to those tests.
* This patch also contains an @Ignore on TestPersistentSnapshotDeletionPolicy, which I couldn't figure out why it is failing; it could be due to an old version of LUCENE-2881 on this branch. I will see if it still fails once we have merged.
* Healthiness now doesn't stall if we are not flushing on RAM consumption, to ensure we don't lock in threads.

Overall this seems much closer now. I will start writing JavaDocs. Flush on buffered delete terms might need some tests, and I should also write a more reliable test for Healthiness... currently it relies on the ThrottledIndexOutput slowing down indexing enough to block, which might not always be true. It hasn't failed yet.
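The FlushPolicy split sketched in the comment above (an abstract base class whose update hook forwards to the insert and delete hooks) might look roughly like this. The method names onInsert/onUpdate/onDelete come from the comment; the signatures and bodies are hypothetical:

```java
// Rough sketch of the FlushPolicy shape described above. Only the method names
// (onInsert/onUpdate/onDelete) are taken from the comment; everything else is
// a hypothetical simplification.
abstract class FlushPolicySketch {
    /** Called after a document was added to a DWPT. */
    abstract void onInsert(long activeBytes);

    /** Called after a delete was buffered. */
    abstract void onDelete(int numBufferedDeleteTerms);

    /** An update is a delete followed by an insert, so the base class simply
     *  forwards to both hooks; concrete policies may override. */
    void onUpdate(long activeBytes, int numBufferedDeleteTerms) {
        onDelete(numBufferedDeleteTerms);
        onInsert(activeBytes);
    }
}

// A minimal concrete policy mirroring the described DefaultFlushPolicy: mark a
// flush pending once the RAM buffer is exceeded or too many deletes are buffered.
class DefaultFlushPolicySketch extends FlushPolicySketch {
    private final long ramBufferBytes;
    private final int maxBufferedDeleteTerms;
    boolean flushPending;   // the real code would mark the biggest DWPT pending

    DefaultFlushPolicySketch(long ramBufferBytes, int maxBufferedDeleteTerms) {
        this.ramBufferBytes = ramBufferBytes;
        this.maxBufferedDeleteTerms = maxBufferedDeleteTerms;
    }

    @Override void onInsert(long activeBytes) {
        if (activeBytes > ramBufferBytes) flushPending = true;
    }

    @Override void onDelete(int numBufferedDeleteTerms) {
        if (numBufferedDeleteTerms >= maxBufferedDeleteTerms) flushPending = true;
    }
}
```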
[jira] Updated: (LUCENE-2573) Tiered flushing of DWPTs by RAM with low/high water marks
[ https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-2573:
    Attachment: LUCENE-2573.patch

Next iteration, containing a large number of refactorings:

* I moved all responsibilities related to flushing, including synchronization, into the DocsWriterSession and renamed it to DocumentsWriterFlushControl.
* DWFC now only tracks active and flush bytes, since the relict from my initial patch where pending memory was tracked is no longer needed.
* DWFC took over all synchronization, so there is no synchronized (flushControl) {...} in DocumentsWriter anymore. Seems way cleaner too.
* Healthiness now blocks once we reach 2x maxMemory, and SingleTierFlushPolicy uses 0.9 x maxRam as its low watermark and 2x the low watermark as its high watermark to flush all threads. The multi-tier one is still unchanged and flushes in linear steps from 0.9 to 1.10 x maxRam. We should actually test whether this does better or worse than the single-tier FP.
* FlushPolicy now has only a visit method and uses IW.message to write to the info stream.
* ThreadState now holds a boolean flag that indicates whether a flush is pending, which is synced and written by DWFC. States[] is gone in DWFC.
* FlushSpecification is gone, and DWFC returns the DWPT upon checkoutForFlush. Yet, I still track the memory for the flushing DWPT separately, since DWPT#bytesUsed() changes during flush and I don't want to rely on it not changing. As a nice side effect, I can check whether a checked-out DWPT is passed to doAfterFlush and assert on that.

Next steps here are benchmarking and getting good defaults for the flush policies. I think we are close though.
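The active/flush byte accounting described in the comment above can be sketched as follows. This is a hypothetical simplification of what a flush control might track, not the actual DocumentsWriterFlushControl code; in particular, freezing a DWPT's size at checkout mirrors the remark that DWPT#bytesUsed() changes during flush:

```java
// Hypothetical sketch of the two-bucket byte accounting described above:
// bytes move from the "active" bucket to the "flushing" bucket when a DWPT
// is checked out for flush, and are released when the flush completes.
class FlushControlSketch {
    private long activeBytes;
    private long flushBytes;

    synchronized void docWritten(long delta) {
        activeBytes += delta;               // indexing grows the active bucket
    }

    /** Check a DWPT out for flushing: its size is frozen at checkout time so
     *  a changing DWPT#bytesUsed() during flush cannot skew the totals. */
    synchronized long checkoutForFlush(long dwptBytes) {
        activeBytes -= dwptBytes;
        flushBytes += dwptBytes;
        return dwptBytes;                   // caller remembers the frozen size
    }

    synchronized void doAfterFlush(long frozenBytes) {
        flushBytes -= frozenBytes;          // flush finished, release its bytes
    }

    synchronized long netBytes() {
        return activeBytes + flushBytes;    // input to the stall/healthiness check
    }
}
```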
[jira] Updated: (LUCENE-2573) Tiered flushing of DWPTs by RAM with low/high water marks
[ https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-2573:
    Attachment: LUCENE-2573.patch

Next iteration on this patch. I changed some naming issues and separated out a ByRAMFlushPolicy as an abstract base class. This patch contains the original MultiTierFlushPolicy and a SingleTierFlushPolicy that has only a low and a high watermark. This policy tries to flush the biggest DWPT once the LW is crossed and flushes all DWPTs once the HW is crossed.

This patch also adds a "flush if stalled" control that hijacks indexing threads if the DW is stalled and there are still pending flushes. If so, the incoming thread tries to check out a pending DWPT and flushes it before it adds the actual document.

I didn't benchmark the more complex MultiTierFP vs. SingleTierFP yet. I hope I get to this soon.
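The "flush if stalled" hijacking described above amounts to roughly the following control flow on the indexing path. This is a hypothetical sketch, not code from the patch; all names here are illustrative:

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical sketch of the "flush if stalled" control described above: when
// the writer is stalled and flushes are still pending, an incoming indexing
// thread is hijacked to drain a pending flush before adding its own document.
class StalledFlushSketch {
    interface Dwpt { void flush(); }

    private final Queue<Dwpt> pendingFlushes = new ArrayDeque<>();
    volatile boolean stalled;
    int hijackedFlushes;    // for illustration: how often a thread was hijacked

    synchronized void markPending(Dwpt dwpt) {
        pendingFlushes.add(dwpt);
    }

    private synchronized Dwpt checkoutPending() {
        return pendingFlushes.poll();
    }

    /** Called on the indexing path before a document is added. */
    void preUpdate() {
        while (stalled) {
            Dwpt pending = checkoutPending();
            if (pending == null) {
                break;              // nothing left to help with; real code would wait
            }
            pending.flush();        // hijack this thread to do flush work
            hijackedFlushes++;
        }
    }
}
```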
[jira] Updated: (LUCENE-2573) Tiered flushing of DWPTs by RAM with low/high water marks
[ https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-2573:
    Attachment: LUCENE-2573.patch

Here is an updated patch that writes more to the infostream, including whether we are unhealthy and blocking threads. I also fixed some issues with TestIndexWriterExceptions.

Another idea we should follow, IMO, is to see whether we can piggyback the indexing threads on commit / flushAll instead of waiting to be able to lock each DWPT and flush it sequentially. This should be fairly easy, since we can simply mark them as flushPending and let incoming indexing threads do the flush in parallel. Depending on how we index and how big the DWPTs are, this could give us another sizable gain. For instance, if you index and frequently commit, let's say every 10k docs (many folks do stuff like that), but keep on indexing, we should see concurrency helping us a lot, since commit would not block all incoming indexing threads. I think we should spin off another issue once this is ready.
[jira] Updated: (LUCENE-2573) Tiered flushing of DWPTs by RAM with low/high water marks
[ https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-2573:
    Attachment: LUCENE-2573.patch

Here is my first cut / current status on this issue. First of all, I have a couple of failures related to deletes, but they seem not to be related (directly) to this patch, since I can reproduce them even without it. All of the failures are related to deletes in some way, so I suspect there is another issue for that, no?

This patch implements a tiered flush strategy combined with a concurrent flush approach.

* All decisions are based on a FlushPolicy which operates on a DocumentsWriterSession (which does the RAM tracking and housekeeping). Once the flush policy encounters a transition to the next tier, it marks the "largest" RAM-consuming thread as flushPending if we transition from a lower level, and all threads if we transition from the upper watermark (level). DocumentsWriterSession shifts the memory of a pending thread to a new memory "level" (pendingBytes) and marks the thread as pending.
* Once FlushPolicy#findFlushes(..) returns, the caller checks whether it itself needs to flush; if so, it "checks out" its DWPT, replaces it with a completely new instance, releases the lock on the ThreadState, and continues to flush the "checked-out" DWPT. After this is done, or if the current DWPT doesn't need flushing, the indexing thread checks whether there are any other pending flushes and tries to obtain their lock (non-blocking). It only tries to get the lock, and only tries once, since if the lock is taken, another thread is already holding it and will see the flushPending once it has finished adding its document.

This approach tries to utilize as much concurrency as possible while flushing a DWPT and releasing its ThreadState with an entirely new DWPT. Yet, this might also cause problems, especially if IO is slow and we are filling up indexing RAM too fast. To prevent us from bloating memory too much, I introduced a notion of "healthiness" which operates on the net bytes used in the DocumentsWriterSession (flushBytes + pendingBytes + activeBytes, where flushBytes is the memory consumption of currently flushing DWPTs, pendingBytes the memory consumption of ThreadStates/DWPTs marked as pending, and activeBytes the memory consumption of the indexing DWPTs). If net bytes reach a certain threshold (currently 2*maxRam), I stop incoming threads until the session becomes healthy again.

I ran luceneutil with trunk vs. LUCENE-2573, indexing 300k Wikipedia docs with a 1GB max RAM buffer and 4 threads. Searches on both indexes yield identical results (phew!). Indexing times in ms look promising:

||trunk||patch||diff||
|134129 ms|102932 ms|{color:green}23.25%{color}|

This patch is still kind of rough and needs iterations, so reviews and questions are very much welcome.
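The "healthiness" gate described above (stall incoming threads once net bytes reach 2*maxRam) can be sketched roughly as follows. This is a hypothetical simplification of the described mechanism, not the patch's code:

```java
// Hypothetical sketch of the "healthiness" check described above: net bytes
// are flushBytes + pendingBytes + activeBytes, and incoming indexing threads
// are stalled once that total reaches 2 * maxRam, until it drops back.
class HealthinessSketch {
    private final long maxRamBytes;
    long flushBytes, pendingBytes, activeBytes;

    HealthinessSketch(long maxRamBytes) {
        this.maxRamBytes = maxRamBytes;
    }

    long netBytes() {
        return flushBytes + pendingBytes + activeBytes;
    }

    /** True once indexing has outrun flushing and threads should be stalled. */
    boolean isUnhealthy() {
        return netBytes() >= 2 * maxRamBytes;
    }
}
```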
[jira] Updated: (LUCENE-2573) Tiered flushing of DWPTs by RAM with low/high water marks
[ https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-2573:
    Component/s: Index
[jira] Updated: (LUCENE-2573) Tiered flushing of DWPTs by RAM with low/high water marks
[ https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated LUCENE-2573:
    Attachment: LUCENE-2573.patch

There was a small bug in the choice of the max DWPT: all DWPTs, including ones that were already scheduled to flush, were being compared against the current DWPT (i.e. the one being examined for possible flushing).
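The fix described above boils down to excluding already-flushing DWPTs when picking the largest one. A hypothetical sketch of that selection (the class and field names are illustrative, not from the patch):

```java
// Hypothetical sketch of the selection the bug fix above is about: when
// looking for the biggest DWPT to flush, DWPTs already scheduled to flush
// must be skipped, otherwise they keep "winning" the size comparison.
class MaxDwptSelector {
    static final class DwptState {
        final long bytesUsed;
        final boolean flushPending;   // already scheduled to flush
        DwptState(long bytesUsed, boolean flushPending) {
            this.bytesUsed = bytesUsed;
            this.flushPending = flushPending;
        }
    }

    /** Returns the index of the largest non-pending DWPT, or -1 if none. */
    static int findBiggestNonPending(DwptState[] states) {
        int best = -1;
        long bestBytes = -1;
        for (int i = 0; i < states.length; i++) {
            if (states[i].flushPending) continue;   // the fix: skip flushing DWPTs
            if (states[i].bytesUsed > bestBytes) {
                bestBytes = states[i].bytesUsed;
                best = i;
            }
        }
        return best;
    }
}
```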
[jira] Updated: (LUCENE-2573) Tiered flushing of DWPTs by RAM with low/high water marks
[ https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated LUCENE-2573:
    Attachment: LUCENE-2573.patch

* perDocAllocator is removed from DocumentsWriterRAMAllocator.
* getByteBlock and getIntBlock always increment numBytesUsed.

The test that simply prints out debugging messages looks better. I need to figure out unit tests.
[jira] Updated: (LUCENE-2573) Tiered flushing of DWPTs by RAM with low/high water marks
[ https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated LUCENE-2573:
    Attachment: LUCENE-2573.patch

Here's a first cut... The TestDWPTFlushByRAM test doesn't do much at this point. It adds documents in 2 threads and prints the RAM usage to stdout. It more or less shows the tiered flushing working. I don't think we're tracking all of the RAM usage yet, maybe just the terms? I need to review.