[jira] [Commented] (ACCUMULO-3294) Need mechanism to update iterator settings
[ https://issues.apache.org/jira/browse/ACCUMULO-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15009804#comment-15009804 ] Jonathan Park commented on ACCUMULO-3294: - No substantial reason. My original reason was that since the method was called "update", I added a check so that it couldn't be used like a "set". It adds a little bit of protection from surprises for clients since, if I as a client (and perhaps it's just me) am trying to "update" an iterator setting, I'm assuming that the iterator is there. If it suddenly disappears while I'm trying to update, then perhaps something happened that means I shouldn't update. It's not a spectacular solution to provide this protection since it's performing a read/modify/write with no locks, so the iterator could still disappear. I don't feel too strongly about the check since there are use-cases for it not being there, so if removing the check and making this behave more like a "set" is better aligned with other efforts, I can remove it. What would the new interface look like if we want this to update all of a table's iterators at once? A Collection? > Need mechanism to update iterator settings > -- > > Key: ACCUMULO-3294 > URL: https://issues.apache.org/jira/browse/ACCUMULO-3294 > Project: Accumulo > Issue Type: Improvement > Components: client >Reporter: John Vines >Assignee: Jonathan Park > Fix For: 1.8.0 > > Attachments: bug3294.patch, bug3294.patch.1 > > > Currently our API supports attachIterator, removeIterator, and > getIteratorSettings. There is no mechanism to directly change an iterators > settings. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
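As a rough illustration of the read/modify/write "update" being discussed, the sketch below builds the operation out of the existing public {{TableOperations}} calls ({{listIterators}}, {{getIteratorSetting}}, {{removeIterator}}, {{attachIterator}}). The class and method names are illustrative, not the proposed API, and the gap between the remove and the re-attach is exactly the non-atomicity noted in the comment.
{code:java}
import java.util.EnumSet;
import java.util.Map;

import org.apache.accumulo.core.client.AccumuloException;
import org.apache.accumulo.core.client.AccumuloSecurityException;
import org.apache.accumulo.core.client.IteratorSetting;
import org.apache.accumulo.core.client.TableNotFoundException;
import org.apache.accumulo.core.client.admin.TableOperations;
import org.apache.accumulo.core.iterators.IteratorUtil.IteratorScope;

public class IteratorSettingUpdater {

  // Client-side approximation of an "update": fail if the iterator is not attached
  // (the check discussed above), then remove and re-attach with merged options.
  // Not atomic -- another client can change the config between the read and the write.
  public static void updateIteratorOptions(TableOperations tableOps, String table,
      String iterName, IteratorScope scope, Map<String,String> newOptions)
      throws AccumuloException, AccumuloSecurityException, TableNotFoundException {

    Map<String,EnumSet<IteratorScope>> attached = tableOps.listIterators(table);
    if (!attached.containsKey(iterName) || !attached.get(iterName).contains(scope))
      throw new IllegalStateException("Iterator " + iterName + " is not configured on " + table);

    IteratorSetting existing = tableOps.getIteratorSetting(table, iterName, scope);
    IteratorSetting updated =
        new IteratorSetting(existing.getPriority(), existing.getName(), existing.getIteratorClass());
    updated.addOptions(existing.getOptions()); // keep the old options
    updated.addOptions(newOptions);            // overlay the changed ones

    EnumSet<IteratorScope> scopes = EnumSet.of(scope);
    tableOps.removeIterator(table, iterName, scopes);
    tableOps.attachIterator(table, updated, scopes);
  }
}
{code}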
[jira] [Commented] (ACCUMULO-3294) Need mechanism to update iterator settings
[ https://issues.apache.org/jira/browse/ACCUMULO-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15007002#comment-15007002 ] Jonathan Park commented on ACCUMULO-3294: - [~elserj] thanks for the feedback! I'll apply it to the patch if the patch is still desirable. [~ctubbsii] I agree that this can enable users shooting themselves in the foot due to the lack of atomicity guarantees in Accumulo regarding the propagation of config. I still think such an API call is valuable and would become safer and easier to consume if/when Accumulo has the ability to support atomic updates to config (although if Accumulo adds transactional support to updates, then I would admit a dedicated update call would not add much value). The example use-case we have is 1 non-trivial iterator which is logically the composition of a set of iterators. This iterator will at run-time read its configuration and determine what it needs to do. The issue we're running into is that we can't afford to have a compaction start when we're in the middle of removing/adding. We could get pretty far by re-designing our system to disable major compactions so we can manually launch the compactions and control the iterator settings used (we would probably need some other design around minor compactions to handle those safely as well). If there's a particular direction that Accumulo is headed to support these changing configurations for compactions, I'd love to help out where possible to help make Accumulo easier to consume w.r.t configurations. I do admit this patch is less than ideal and doesn't address some other issues present. [~kturner] Thanks for the example and explanation of the benefits. I agree that it's safer for scans to follow such a pattern. > Need mechanism to update iterator settings > -- > > Key: ACCUMULO-3294 > URL: https://issues.apache.org/jira/browse/ACCUMULO-3294 > Project: Accumulo > Issue Type: Improvement > Components: client >Reporter: John Vines >Assignee: Jonathan Park > Fix For: 1.8.0 > > Attachments: bug3294.patch, bug3294.patch.1 > > > Currently our API supports attachIterator, removeIterator, and > getIteratorSettings. There is no mechanism to directly change an iterators > settings. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
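For reference, a hedged sketch of the workaround mentioned above (discourage system-initiated major compactions, then launch compactions manually with the iterator configuration pinned for that compaction) using the {{List<IteratorSetting>}} variant of {{compact()}} available in 1.6+. The property value and the class/method names here are illustrative, not a recommendation.
{code:java}
import java.util.Collections;
import java.util.List;

import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.IteratorSetting;
import org.apache.accumulo.core.conf.Property;

public class ManualCompactionExample {

  // Discourage system-initiated major compactions, then run a user compaction whose
  // iterator settings are fixed for the duration of that compaction only.
  public static void compactWithFixedIterators(Connector conn, String table,
      IteratorSetting composedIterator) throws Exception {

    // Raise the ratio so the tserver rarely (ideally never) starts a major compaction on its own.
    conn.tableOperations().setProperty(table, Property.TABLE_MAJC_RATIO.getKey(), "1000");

    // User-initiated compaction over the whole table; the iterators passed here apply only to
    // this compaction, so table config changes elsewhere cannot race with it.
    List<IteratorSetting> perCompactionIters = Collections.singletonList(composedIterator);
    conn.tableOperations().compact(table, null, null, perCompactionIters,
        true /* flush first */, true /* wait */);
  }
}
{code}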
[jira] [Updated] (ACCUMULO-3294) Need mechanism to update iterator settings
[ https://issues.apache.org/jira/browse/ACCUMULO-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Park updated ACCUMULO-3294: Status: Patch Available (was: Open) Changed updates to remove unset properties. > Need mechanism to update iterator settings > -- > > Key: ACCUMULO-3294 > URL: https://issues.apache.org/jira/browse/ACCUMULO-3294 > Project: Accumulo > Issue Type: Improvement > Components: client >Reporter: John Vines >Assignee: Jonathan Park > Attachments: bug3294.patch, bug3294.patch.1 > > > Currently our API supports attachIterator, removeIterator, and > getIteratorSettings. There is no mechanism to directly change an iterators > settings. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (ACCUMULO-3294) Need mechanism to update iterator settings
[ https://issues.apache.org/jira/browse/ACCUMULO-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Park updated ACCUMULO-3294: Attachment: bug3294.patch.1 > Need mechanism to update iterator settings > -- > > Key: ACCUMULO-3294 > URL: https://issues.apache.org/jira/browse/ACCUMULO-3294 > Project: Accumulo > Issue Type: Improvement > Components: client >Reporter: John Vines >Assignee: Jonathan Park > Attachments: bug3294.patch, bug3294.patch.1 > > > Currently our API supports attachIterator, removeIterator, and > getIteratorSettings. There is no mechanism to directly change an iterators > settings. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (ACCUMULO-3294) Need mechanism to update iterator settings
[ https://issues.apache.org/jira/browse/ACCUMULO-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Park updated ACCUMULO-3294: Status: Open (was: Patch Available) > Need mechanism to update iterator settings > -- > > Key: ACCUMULO-3294 > URL: https://issues.apache.org/jira/browse/ACCUMULO-3294 > Project: Accumulo > Issue Type: Improvement > Components: client >Reporter: John Vines >Assignee: Jonathan Park > Attachments: bug3294.patch > > > Currently our API supports attachIterator, removeIterator, and > getIteratorSettings. There is no mechanism to directly change an iterators > settings. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (ACCUMULO-3294) Need mechanism to update iterator settings
[ https://issues.apache.org/jira/browse/ACCUMULO-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Park updated ACCUMULO-3294: Status: Patch Available (was: Open) > Need mechanism to update iterator settings > -- > > Key: ACCUMULO-3294 > URL: https://issues.apache.org/jira/browse/ACCUMULO-3294 > Project: Accumulo > Issue Type: Improvement > Components: client >Reporter: John Vines >Assignee: Jonathan Park > Attachments: bug3294.patch > > > Currently our API supports attachIterator, removeIterator, and > getIteratorSettings. There is no mechanism to directly change an iterators > settings. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (ACCUMULO-3294) Need mechanism to update iterator settings
[ https://issues.apache.org/jira/browse/ACCUMULO-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Park updated ACCUMULO-3294: Attachment: bug3294.patch > Need mechanism to update iterator settings > -- > > Key: ACCUMULO-3294 > URL: https://issues.apache.org/jira/browse/ACCUMULO-3294 > Project: Accumulo > Issue Type: Improvement > Components: client >Reporter: John Vines >Assignee: Jonathan Park > Attachments: bug3294.patch > > > Currently our API supports attachIterator, removeIterator, and > getIteratorSettings. There is no mechanism to directly change an iterators > settings. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (ACCUMULO-3294) Need mechanism to update iterator settings
[ https://issues.apache.org/jira/browse/ACCUMULO-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Park reassigned ACCUMULO-3294: --- Assignee: Jonathan Park > Need mechanism to update iterator settings > -- > > Key: ACCUMULO-3294 > URL: https://issues.apache.org/jira/browse/ACCUMULO-3294 > Project: Accumulo > Issue Type: Improvement > Components: client >Reporter: John Vines >Assignee: Jonathan Park > > Currently our API supports attachIterator, removeIterator, and > getIteratorSettings. There is no mechanism to directly change an iterators > settings. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ACCUMULO-3530) alterTable/NamespaceProperty should use Fate locks
[ https://issues.apache.org/jira/browse/ACCUMULO-3530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14292152#comment-14292152 ] Jonathan Park commented on ACCUMULO-3530: - I believe ACCUMULO-1568 can satisfy our use-case as well. I agree that I'm not sure Fate locks are the way to go. Some way of obtaining a snapshot read of table properties is sufficient. > alterTable/NamespaceProperty should use Fate locks > -- > > Key: ACCUMULO-3530 > URL: https://issues.apache.org/jira/browse/ACCUMULO-3530 > Project: Accumulo > Issue Type: Bug >Reporter: John Vines > > Fate operations, such as clone table, have logic in place to ensure > consistency as the operation occurs. However, operaitons like > alterTableProperty can still interfere because there is no locking done. We > should add identical locking to these methods in MasterClientServiceHandler > to help ensure consistency. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ACCUMULO-3530) alterTable/NamespaceProperty should use Fate locks
[ https://issues.apache.org/jira/browse/ACCUMULO-3530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14292138#comment-14292138 ] Jonathan Park commented on ACCUMULO-3530: - [~ctubbsii] Is there a particular case you're concerned with? I'm not too familiar with the Fate locks, but the general problem we observed was that while a clone was in progress, a table had an iterator configuration removed. The clone operation ended up failing because the ZK node associated with the iterator config disappeared. I believe the ask here is that while an iterative read + copy of the table properties is in progress, changes should be disallowed so that a consistent snapshot of the table properties is copied. > alterTable/NamespaceProperty should use Fate locks > -- > > Key: ACCUMULO-3530 > URL: https://issues.apache.org/jira/browse/ACCUMULO-3530 > Project: Accumulo > Issue Type: Bug >Reporter: John Vines > > Fate operations, such as clone table, have logic in place to ensure > consistency as the operation occurs. However, operaitons like > alterTableProperty can still interfere because there is no locking done. We > should add identical locking to these methods in MasterClientServiceHandler > to help ensure consistency. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (ACCUMULO-3330) Tserver "Running low on memory" appears more frequently than necessary when min heap != max heap
[ https://issues.apache.org/jira/browse/ACCUMULO-3330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Park resolved ACCUMULO-3330. - Resolution: Duplicate > Tserver "Running low on memory" appears more frequently than necessary when > min heap != max heap > > > Key: ACCUMULO-3330 > URL: https://issues.apache.org/jira/browse/ACCUMULO-3330 > Project: Accumulo > Issue Type: Bug >Reporter: Jonathan Park >Priority: Minor > > I'm not sure if this is JVM specific behavior, but I suspect the way we > compute when to log "Running low on memory" could be improved. > Currently we use {{Runtime.getRuntime()}} and rely on the formula > {{freeMemory() < maxMemory() * 0.05}} to determine whether or not to log the > warning. With Oracle's HotSpot VM, {{freeMemory()}} appears to return the > amount of free memory relative to the current JVM heap size (as returned by > {{totalMemory()}}. If {{totalMemory()}} != {{maxMemory()}} then this warning > will start appearing before I think it was intended to which is misleading. > Easiest workaround is to configure the JVM heap to have the min size = max > size. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ACCUMULO-3330) Tserver "Running low on memory" appears more frequently than necessary when min heap != max heap
[ https://issues.apache.org/jira/browse/ACCUMULO-3330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14210106#comment-14210106 ] Jonathan Park commented on ACCUMULO-3330: - Linked to the wrong issue. > Tserver "Running low on memory" appears more frequently than necessary when > min heap != max heap > > > Key: ACCUMULO-3330 > URL: https://issues.apache.org/jira/browse/ACCUMULO-3330 > Project: Accumulo > Issue Type: Bug >Reporter: Jonathan Park >Priority: Minor > > I'm not sure if this is JVM specific behavior, but I suspect the way we > compute when to log "Running low on memory" could be improved. > Currently we use {{Runtime.getRuntime()}} and rely on the formula > {{freeMemory() < maxMemory() * 0.05}} to determine whether or not to log the > warning. With Oracle's HotSpot VM, {{freeMemory()}} appears to return the > amount of free memory relative to the current JVM heap size (as returned by > {{totalMemory()}}. If {{totalMemory()}} != {{maxMemory()}} then this warning > will start appearing before I think it was intended to which is misleading. > Easiest workaround is to configure the JVM heap to have the min size = max > size. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ACCUMULO-3330) Tserver "Running low on memory" appears more frequently than necessary when min heap != max heap
[ https://issues.apache.org/jira/browse/ACCUMULO-3330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14210105#comment-14210105 ] Jonathan Park commented on ACCUMULO-3330: - Didn't notice ACCUMULO-3320 earlier. This one is a duplicate of that one. > Tserver "Running low on memory" appears more frequently than necessary when > min heap != max heap > > > Key: ACCUMULO-3330 > URL: https://issues.apache.org/jira/browse/ACCUMULO-3330 > Project: Accumulo > Issue Type: Bug >Reporter: Jonathan Park >Priority: Minor > > I'm not sure if this is JVM specific behavior, but I suspect the way we > compute when to log "Running low on memory" could be improved. > Currently we use {{Runtime.getRuntime()}} and rely on the formula > {{freeMemory() < maxMemory() * 0.05}} to determine whether or not to log the > warning. With Oracle's HotSpot VM, {{freeMemory()}} appears to return the > amount of free memory relative to the current JVM heap size (as returned by > {{totalMemory()}}. If {{totalMemory()}} != {{maxMemory()}} then this warning > will start appearing before I think it was intended to which is misleading. > Easiest workaround is to configure the JVM heap to have the min size = max > size. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (ACCUMULO-3330) Tserver "Running low on memory" appears more frequently than necessary when min heap != max heap
[ https://issues.apache.org/jira/browse/ACCUMULO-3330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Park updated ACCUMULO-3330: Summary: Tserver "Running low on memory" appears more frequently than necessary when min heap != max heap (was: Tserver "Running low on memory" might be miscomputed) > Tserver "Running low on memory" appears more frequently than necessary when > min heap != max heap > > > Key: ACCUMULO-3330 > URL: https://issues.apache.org/jira/browse/ACCUMULO-3330 > Project: Accumulo > Issue Type: Bug >Reporter: Jonathan Park >Priority: Minor > > I'm not sure if this is JVM specific behavior, but I suspect the way we > compute when to log "Running low on memory" could be improved. > Currently we use {{Runtime.getRuntime()}} and rely on the formula > {{freeMemory() < maxMemory() * 0.05}} to determine whether or not to log the > warning. With Oracle's HotSpot VM, {{freeMemory()}} appears to return the > amount of free memory relative to the current JVM heap size (as returned by > {{totalMemory()}}. If {{totalMemory()}} != {{maxMemory()}} then this warning > will start appearing before I think it was intended to which is misleading. > Easiest workaround is to configure the JVM heap to have the min size = max > size. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (ACCUMULO-3330) Tserver "Running low on memory" might be miscomputed
Jonathan Park created ACCUMULO-3330: --- Summary: Tserver "Running low on memory" might be miscomputed Key: ACCUMULO-3330 URL: https://issues.apache.org/jira/browse/ACCUMULO-3330 Project: Accumulo Issue Type: Bug Reporter: Jonathan Park Priority: Minor I'm not sure if this is JVM-specific behavior, but I suspect the way we compute when to log "Running low on memory" could be improved. Currently we use {{Runtime.getRuntime()}} and rely on the formula {{freeMemory() < maxMemory() * 0.05}} to determine whether or not to log the warning. With Oracle's HotSpot VM, {{freeMemory()}} appears to return the amount of free memory relative to the current JVM heap size (as returned by {{totalMemory()}}). If {{totalMemory()}} != {{maxMemory()}}, then this warning will start appearing before it was intended to, which is misleading. The easiest workaround is to configure the JVM heap to have the min size = max size. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
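For illustration only, a minimal sketch of one way to account for a heap that has not yet grown to {{maxMemory()}}; this is just the arithmetic described above, not necessarily the fix ultimately adopted (see ACCUMULO-3320).
{code:java}
public class LowMemoryCheck {

  // Fraction of max heap below which we would warn; mirrors the 0.05 in the description.
  private static final double LOW_MEMORY_FRACTION = 0.05;

  // freeMemory() is relative to the current heap size (totalMemory()), so compare the
  // memory still available -- free space now plus the room the heap can still grow --
  // against maxMemory(), instead of comparing freeMemory() against maxMemory() directly.
  public static boolean runningLowOnMemory() {
    Runtime rt = Runtime.getRuntime();
    long used = rt.totalMemory() - rt.freeMemory();
    long stillAvailable = rt.maxMemory() - used;
    return stillAvailable < rt.maxMemory() * LOW_MEMORY_FRACTION;
  }
}
{code}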
[jira] [Updated] (ACCUMULO-2889) Batch metadata table updates for new walogs
[ https://issues.apache.org/jira/browse/ACCUMULO-2889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Park updated ACCUMULO-2889: Attachment: ACCUMULO-2889.2.patch I'll gather a new set of #s when I get access to a cluster of machines. > Batch metadata table updates for new walogs > --- > > Key: ACCUMULO-2889 > URL: https://issues.apache.org/jira/browse/ACCUMULO-2889 > Project: Accumulo > Issue Type: Improvement >Affects Versions: 1.5.1, 1.6.0 >Reporter: Jonathan Park >Assignee: Jonathan Park > Attachments: ACCUMULO-2889.0.patch.txt, ACCUMULO-2889.1.patch, > ACCUMULO-2889.2.patch, accumulo-2889-withpatch.png, > accumulo-2889_withoutpatch.png, batch_perf_test.sh, run_all.sh, > start-ingest.sh > > > Currently, when we update the Metadata table with new loggers, we will update > the metadata for each tablet serially. We could optimize this to instead use > a batchwriter to send all metadata updates for all tablets in a batch. > A few special cases include: > - What if the !METADATA tablet was included in the batch? > - What about the root tablet? > Benefit: > In one of our clusters, we're experiencing particularly slow HDFS operations > leading to large oscillations in ingest performance. We haven't isolated the > cause in HDFS but when we profile the tservers, we noticed that they were > waiting for metadata table operations to complete. This would target the > waiting. > Potential downsides: > Given the existing locking scheme, it looks like we may have to lock a tablet > for slightly longer (we'll lock for the duration of the batch). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (ACCUMULO-2889) Batch metadata table updates for new walogs
[ https://issues.apache.org/jira/browse/ACCUMULO-2889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Park updated ACCUMULO-2889: Attachment: accumulo-2889_withoutpatch.png accumulo-2889-withpatch.png ACCUMULO-2889.1.patch start-ingest.sh batch_perf_test.sh run_all.sh Results from performance tests: Test design: - Run continuous ingest with 4 ingesters each ingesting 25million entries and then measure time until completion - We varied # of minor compactors and tablets per server (in retrospect, # of minor compactors didn't really matter in these tests, it may have been better to vary # of clients). - Each trial was run 3x and the average was taken. Tests were run on a single node (24 logical cores, 64 GB RAM, 8 drives) ||minc||tablets/server||w/o patch(ms)||w/ patch(ms)||ratio|| |4|32|269790.33|257537.33|0.95458325| |12|32|271124.33|255952|0.94403922| |12|320|355962.67|323737|0.90946896| |24|32|268709|261362.67|0.97266065| |24|320|355182.33|324308.67|0.91307659| I'll try to run this on a multi-node cluster if I can get around to it. > Batch metadata table updates for new walogs > --- > > Key: ACCUMULO-2889 > URL: https://issues.apache.org/jira/browse/ACCUMULO-2889 > Project: Accumulo > Issue Type: Improvement >Affects Versions: 1.5.1, 1.6.0 >Reporter: Jonathan Park >Assignee: Jonathan Park > Attachments: ACCUMULO-2889.0.patch.txt, ACCUMULO-2889.1.patch, > accumulo-2889-withpatch.png, accumulo-2889_withoutpatch.png, > batch_perf_test.sh, run_all.sh, start-ingest.sh > > > Currently, when we update the Metadata table with new loggers, we will update > the metadata for each tablet serially. We could optimize this to instead use > a batchwriter to send all metadata updates for all tablets in a batch. > A few special cases include: > - What if the !METADATA tablet was included in the batch? > - What about the root tablet? > Benefit: > In one of our clusters, we're experiencing particularly slow HDFS operations > leading to large oscillations in ingest performance. We haven't isolated the > cause in HDFS but when we profile the tservers, we noticed that they were > waiting for metadata table operations to complete. This would target the > waiting. > Potential downsides: > Given the existing locking scheme, it looks like we may have to lock a tablet > for slightly longer (we'll lock for the duration of the batch). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (ACCUMULO-2827) HeapIterator optimization
[ https://issues.apache.org/jira/browse/ACCUMULO-2827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Park updated ACCUMULO-2827: Attachment: ACCUMULO-2827-compaction-performance-test.patch.2 I decided to re-run the tests, but this time running each case 10 times in order to get rid of some of the noise. I threw away the first few test results and averaged rates across the 10 runs. These numbers look more promising and seem to better correlate with [~kturner]'s earlier results. ||files||rows||cols||rate w/o patch||rate w/ patch PQ||PQ speedup|| |10|100|1|430636.7|457304.2|1.061925749| |10|10|10|550790.1|759692.6|1.379277877| |10|1|100|584660.3|851496.9|1.456395962| |20|50|1|397171|426878.5|1.074797757| |20|5|10|509081.4|735482.6|1.44472495| |1|1000|1|513712.2|539288|1.049786242| [~dlmarion] I'm not entirely sure what was going on in my earlier tests, as we shouldn't have seen that large of a hit. The single-column case is a close approximation of the worst case for this optimization since it should cause a high percentage of interleaving across iterators, which goes against our assumption in the optimization. I've posted an updated patch (from Keith's original) that includes the test harness used for this test. > HeapIterator optimization > - > > Key: ACCUMULO-2827 > URL: https://issues.apache.org/jira/browse/ACCUMULO-2827 > Project: Accumulo > Issue Type: Improvement >Affects Versions: 1.5.1, 1.6.0 >Reporter: Jonathan Park >Assignee: Jonathan Park >Priority: Minor > Fix For: 1.5.2, 1.6.1, 1.7.0 > > Attachments: ACCUMULO-2827-compaction-performance-test.patch, > ACCUMULO-2827-compaction-performance-test.patch.2, ACCUMULO-2827.0.patch.txt, > ACCUMULO-2827.1.patch.txt, ACCUMULO-2827.2.patch.txt, > ACCUMULO-2827.3.patch.txt, BenchmarkMultiIterator.java, > accumulo-2827.raw_data, new_heapiter.png, old_heapiter.png, together.png > > > We've been running a few performance tests of our iterator stack and noticed > a decent amount of time spent in the HeapIterator specifically related to > add/removal into the heap. > This may not be a general enough optimization but we thought we'd see what > people thought. Our assumption is that it's more probable that the current > "top iterator" will supply the next value in the iteration than not. The > current implementation takes the other assumption by always removing + > inserting the minimum iterator back into the heap. With the implementation of > a binary heap that we're using, this can get costly if our assumption is > wrong because we pay the log penalty of percolating up the iterator in the > heap upon insertion and again when percolating down upon removal. > We believe our assumption is a fair one to hold given that as major > compactions create a log distribution of file sizes, it's likely that we may > see a long chain of consecutive entries coming from 1 iterator. > Understandably, taking this assumption comes at an additional cost in the > case that we're wrong. Therefore, we've run a few benchmarking tests to see > how much of a cost we pay as well as what kind of benefit we see. I've > attached a potential patch (which includes a test harness) + image that > captures the results of our tests. The x-axis represents # of repeated keys > before switching to another iterator. The y-axis represents iteration time. > The sets of blue + red lines varies in # of iterators present in the heap. -- This message was sent by Atlassian JIRA (v6.2#6252)
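To make the assumption being benchmarked concrete, here is a simplified, illustrative merging iterator (not the actual Accumulo {{HeapIterator}}): it keeps reading from the current minimum source while that source's next key is still no greater than the best key waiting in the heap, and only pays the heap re-insert when the assumption fails.
{code:java}
import java.util.PriorityQueue;

class MergingIteratorSketch<K extends Comparable<K>> {

  interface Source<T> {
    T peek(); // smallest key remaining in this source, or null if exhausted
    T next(); // return and consume the smallest key
  }

  private final PriorityQueue<Source<K>> heap =
      new PriorityQueue<>((a, b) -> a.peek().compareTo(b.peek()));
  private Source<K> current; // source we keep reading from while it stays the minimum

  void addSource(Source<K> s) {
    if (s.peek() != null)
      heap.add(s);
  }

  K next() {
    if (current == null)
      current = heap.poll();
    if (current == null)
      return null; // everything exhausted

    K result = current.next();

    K upcoming = current.peek();
    Source<K> challenger = heap.peek();
    if (upcoming == null) {
      current = null; // current source exhausted; pull a new minimum on the next call
    } else if (challenger != null && challenger.peek().compareTo(upcoming) < 0) {
      // assumption failed: pay the log-cost insert/remove and switch sources
      heap.add(current);
      current = heap.poll();
    }
    // otherwise keep reading from 'current' and skip the percolate up/down entirely
    return result;
  }
}
{code}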
[jira] [Commented] (ACCUMULO-2827) HeapIterator optimization
[ https://issues.apache.org/jira/browse/ACCUMULO-2827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041441#comment-14041441 ] Jonathan Park commented on ACCUMULO-2827: - Re-ran the tests. 3 runs each (with first few test results thrown away) + averaged rates across the 3 runs: ||files||rows||cols||rate w/o patch||rate w/ patch PQ||PQ speedup|| |10|100|1|431762.|417590.|0.9671763864| |10|10|10|575699.6667|675052.|1.172577252| |10|1|100|623398.6667|721883.|1.157980233| |20|50|1|421120.|391037|0.9285635697| |20|5|10|546200.6667|643364.6667|1.177890665| |1|1000|1|521046.|483500.6667|0.9279417889| > HeapIterator optimization > - > > Key: ACCUMULO-2827 > URL: https://issues.apache.org/jira/browse/ACCUMULO-2827 > Project: Accumulo > Issue Type: Improvement >Affects Versions: 1.5.1, 1.6.0 >Reporter: Jonathan Park >Assignee: Jonathan Park >Priority: Minor > Fix For: 1.5.2, 1.6.1, 1.7.0 > > Attachments: ACCUMULO-2827-compaction-performance-test.patch, > ACCUMULO-2827.0.patch.txt, ACCUMULO-2827.1.patch.txt, > ACCUMULO-2827.2.patch.txt, ACCUMULO-2827.3.patch.txt, > BenchmarkMultiIterator.java, accumulo-2827.raw_data, new_heapiter.png, > old_heapiter.png, together.png > > > We've been running a few performance tests of our iterator stack and noticed > a decent amount of time spent in the HeapIterator specifically related to > add/removal into the heap. > This may not be a general enough optimization but we thought we'd see what > people thought. Our assumption is that it's more probable that the current > "top iterator" will supply the next value in the iteration than not. The > current implementation takes the other assumption by always removing + > inserting the minimum iterator back into the heap. With the implementation of > a binary heap that we're using, this can get costly if our assumption is > wrong because we pay the log penalty of percolating up the iterator in the > heap upon insertion and again when percolating down upon removal. > We believe our assumption is a fair one to hold given that as major > compactions create a log distribution of file sizes, it's likely that we may > see a long chain of consecutive entries coming from 1 iterator. > Understandably, taking this assumption comes at an additional cost in the > case that we're wrong. Therefore, we've run a few benchmarking tests to see > how much of a cost we pay as well as what kind of benefit we see. I've > attached a potential patch (which includes a test harness) + image that > captures the results of our tests. The x-axis represents # of repeated keys > before switching to another iterator. The y-axis represents iteration time. > The sets of blue + red lines varies in # of iterators present in the heap. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (ACCUMULO-2827) HeapIterator optimization
[ https://issues.apache.org/jira/browse/ACCUMULO-2827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Park updated ACCUMULO-2827: Attachment: BenchmarkMultiIterator.java ACCUMULO-2827.3.patch.txt ACCUMULO-2827.3.patch.txt: Only contains HeapIterator changes that uses java.util.PriorityQueue instead of org.apache.commons.collections.buffer.PriorityBuffer. > HeapIterator optimization > - > > Key: ACCUMULO-2827 > URL: https://issues.apache.org/jira/browse/ACCUMULO-2827 > Project: Accumulo > Issue Type: Improvement >Affects Versions: 1.5.1, 1.6.0 >Reporter: Jonathan Park >Assignee: Jonathan Park >Priority: Minor > Fix For: 1.5.2, 1.6.1, 1.7.0 > > Attachments: ACCUMULO-2827-compaction-performance-test.patch, > ACCUMULO-2827.0.patch.txt, ACCUMULO-2827.1.patch.txt, > ACCUMULO-2827.2.patch.txt, ACCUMULO-2827.3.patch.txt, > BenchmarkMultiIterator.java, accumulo-2827.raw_data, new_heapiter.png, > old_heapiter.png, together.png > > > We've been running a few performance tests of our iterator stack and noticed > a decent amount of time spent in the HeapIterator specifically related to > add/removal into the heap. > This may not be a general enough optimization but we thought we'd see what > people thought. Our assumption is that it's more probable that the current > "top iterator" will supply the next value in the iteration than not. The > current implementation takes the other assumption by always removing + > inserting the minimum iterator back into the heap. With the implementation of > a binary heap that we're using, this can get costly if our assumption is > wrong because we pay the log penalty of percolating up the iterator in the > heap upon insertion and again when percolating down upon removal. > We believe our assumption is a fair one to hold given that as major > compactions create a log distribution of file sizes, it's likely that we may > see a long chain of consecutive entries coming from 1 iterator. > Understandably, taking this assumption comes at an additional cost in the > case that we're wrong. Therefore, we've run a few benchmarking tests to see > how much of a cost we pay as well as what kind of benefit we see. I've > attached a potential patch (which includes a test harness) + image that > captures the results of our tests. The x-axis represents # of repeated keys > before switching to another iterator. The y-axis represents iteration time. > The sets of blue + red lines varies in # of iterators present in the heap. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (ACCUMULO-2827) HeapIterator optimization
[ https://issues.apache.org/jira/browse/ACCUMULO-2827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041141#comment-14041141 ] Jonathan Park commented on ACCUMULO-2827: - Results of performance tests (I also re-ran [~kturner]'s cases for comparison): pq => priorityqueue pb => prioritybuffer ||files||rows per file||cols per row||rate w/o patch||rate w/ patch pq||pq speedup||rate w/ patch pb||pb speedup|| |10|1,000,000|1| 449,355 |418,128| .93 |433,477|.965| |10|100,000|10| 598,155 |678,698| 1.135 | 667,465|1.116| |10|10,000|100| 641,190 |739,809| 1.154 |729,501|1.138| |20|500,000|1| 405,915 |400,614| .987 | 405,571|.999| |20|50,000|10| 551,997 |659,932| 1.196 | 643,250|1.165| |1|10,000,000|1| 506,719 |483,178| .954 | 517,362|1.021| Not entirely sure why PriorityQueue is performing worse than PriorityBuffer in the worst cases (high interleaving). Might just be noise in the tests? > HeapIterator optimization > - > > Key: ACCUMULO-2827 > URL: https://issues.apache.org/jira/browse/ACCUMULO-2827 > Project: Accumulo > Issue Type: Improvement >Affects Versions: 1.5.1, 1.6.0 >Reporter: Jonathan Park >Assignee: Jonathan Park >Priority: Minor > Fix For: 1.5.2, 1.6.1, 1.7.0 > > Attachments: ACCUMULO-2827-compaction-performance-test.patch, > ACCUMULO-2827.0.patch.txt, ACCUMULO-2827.1.patch.txt, > ACCUMULO-2827.2.patch.txt, accumulo-2827.raw_data, new_heapiter.png, > old_heapiter.png, together.png > > > We've been running a few performance tests of our iterator stack and noticed > a decent amount of time spent in the HeapIterator specifically related to > add/removal into the heap. > This may not be a general enough optimization but we thought we'd see what > people thought. Our assumption is that it's more probable that the current > "top iterator" will supply the next value in the iteration than not. The > current implementation takes the other assumption by always removing + > inserting the minimum iterator back into the heap. With the implementation of > a binary heap that we're using, this can get costly if our assumption is > wrong because we pay the log penalty of percolating up the iterator in the > heap upon insertion and again when percolating down upon removal. > We believe our assumption is a fair one to hold given that as major > compactions create a log distribution of file sizes, it's likely that we may > see a long chain of consecutive entries coming from 1 iterator. > Understandably, taking this assumption comes at an additional cost in the > case that we're wrong. Therefore, we've run a few benchmarking tests to see > how much of a cost we pay as well as what kind of benefit we see. I've > attached a potential patch (which includes a test harness) + image that > captures the results of our tests. The x-axis represents # of repeated keys > before switching to another iterator. The y-axis represents iteration time. > The sets of blue + red lines varies in # of iterators present in the heap. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (ACCUMULO-2827) HeapIterator optimization
[ https://issues.apache.org/jira/browse/ACCUMULO-2827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Park updated ACCUMULO-2827: Attachment: ACCUMULO-2827.2.patch.txt Removed trailing whitespace on all lines. I'm currently running [~kturner]'s tests to make sure changing from a PriorityBuffer to PriorityQueue doesn't affect performance too much. > HeapIterator optimization > - > > Key: ACCUMULO-2827 > URL: https://issues.apache.org/jira/browse/ACCUMULO-2827 > Project: Accumulo > Issue Type: Improvement >Affects Versions: 1.5.1, 1.6.0 >Reporter: Jonathan Park >Assignee: Jonathan Park >Priority: Minor > Fix For: 1.5.2, 1.6.1, 1.7.0 > > Attachments: ACCUMULO-2827-compaction-performance-test.patch, > ACCUMULO-2827.0.patch.txt, ACCUMULO-2827.1.patch.txt, > ACCUMULO-2827.2.patch.txt, accumulo-2827.raw_data, new_heapiter.png, > old_heapiter.png, together.png > > > We've been running a few performance tests of our iterator stack and noticed > a decent amount of time spent in the HeapIterator specifically related to > add/removal into the heap. > This may not be a general enough optimization but we thought we'd see what > people thought. Our assumption is that it's more probable that the current > "top iterator" will supply the next value in the iteration than not. The > current implementation takes the other assumption by always removing + > inserting the minimum iterator back into the heap. With the implementation of > a binary heap that we're using, this can get costly if our assumption is > wrong because we pay the log penalty of percolating up the iterator in the > heap upon insertion and again when percolating down upon removal. > We believe our assumption is a fair one to hold given that as major > compactions create a log distribution of file sizes, it's likely that we may > see a long chain of consecutive entries coming from 1 iterator. > Understandably, taking this assumption comes at an additional cost in the > case that we're wrong. Therefore, we've run a few benchmarking tests to see > how much of a cost we pay as well as what kind of benefit we see. I've > attached a potential patch (which includes a test harness) + image that > captures the results of our tests. The x-axis represents # of repeated keys > before switching to another iterator. The y-axis represents iteration time. > The sets of blue + red lines varies in # of iterators present in the heap. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (ACCUMULO-2827) HeapIterator optimization
[ https://issues.apache.org/jira/browse/ACCUMULO-2827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Park updated ACCUMULO-2827: Attachment: ACCUMULO-2827.1.patch.txt Updating patch to use PriorityQueue instead of PriorityBuffer. > HeapIterator optimization > - > > Key: ACCUMULO-2827 > URL: https://issues.apache.org/jira/browse/ACCUMULO-2827 > Project: Accumulo > Issue Type: Improvement >Affects Versions: 1.5.1, 1.6.0 >Reporter: Jonathan Park >Assignee: Jonathan Park >Priority: Minor > Fix For: 1.5.2, 1.6.1, 1.7.0 > > Attachments: ACCUMULO-2827-compaction-performance-test.patch, > ACCUMULO-2827.0.patch.txt, ACCUMULO-2827.1.patch.txt, accumulo-2827.raw_data, > new_heapiter.png, old_heapiter.png, together.png > > > We've been running a few performance tests of our iterator stack and noticed > a decent amount of time spent in the HeapIterator specifically related to > add/removal into the heap. > This may not be a general enough optimization but we thought we'd see what > people thought. Our assumption is that it's more probable that the current > "top iterator" will supply the next value in the iteration than not. The > current implementation takes the other assumption by always removing + > inserting the minimum iterator back into the heap. With the implementation of > a binary heap that we're using, this can get costly if our assumption is > wrong because we pay the log penalty of percolating up the iterator in the > heap upon insertion and again when percolating down upon removal. > We believe our assumption is a fair one to hold given that as major > compactions create a log distribution of file sizes, it's likely that we may > see a long chain of consecutive entries coming from 1 iterator. > Understandably, taking this assumption comes at an additional cost in the > case that we're wrong. Therefore, we've run a few benchmarking tests to see > how much of a cost we pay as well as what kind of benefit we see. I've > attached a potential patch (which includes a test harness) + image that > captures the results of our tests. The x-axis represents # of repeated keys > before switching to another iterator. The y-axis represents iteration time. > The sets of blue + red lines varies in # of iterators present in the heap. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (ACCUMULO-2827) HeapIterator optimization
[ https://issues.apache.org/jira/browse/ACCUMULO-2827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14036680#comment-14036680 ] Jonathan Park commented on ACCUMULO-2827: - Sorry I've been delayed in my responses. I agree with [~kturner] on the lack of performance gain due to the uniform random data. Thanks for volunteering to run the test with multiple columns per row Keith! :) Looking forward to seeing the results! > HeapIterator optimization > - > > Key: ACCUMULO-2827 > URL: https://issues.apache.org/jira/browse/ACCUMULO-2827 > Project: Accumulo > Issue Type: Improvement >Affects Versions: 1.5.1, 1.6.0 >Reporter: Jonathan Park >Assignee: Jonathan Park >Priority: Minor > Fix For: 1.5.2, 1.6.1, 1.7.0 > > Attachments: ACCUMULO-2827.0.patch.txt, accumulo-2827.raw_data, > new_heapiter.png, old_heapiter.png, together.png > > > We've been running a few performance tests of our iterator stack and noticed > a decent amount of time spent in the HeapIterator specifically related to > add/removal into the heap. > This may not be a general enough optimization but we thought we'd see what > people thought. Our assumption is that it's more probable that the current > "top iterator" will supply the next value in the iteration than not. The > current implementation takes the other assumption by always removing + > inserting the minimum iterator back into the heap. With the implementation of > a binary heap that we're using, this can get costly if our assumption is > wrong because we pay the log penalty of percolating up the iterator in the > heap upon insertion and again when percolating down upon removal. > We believe our assumption is a fair one to hold given that as major > compactions create a log distribution of file sizes, it's likely that we may > see a long chain of consecutive entries coming from 1 iterator. > Understandably, taking this assumption comes at an additional cost in the > case that we're wrong. Therefore, we've run a few benchmarking tests to see > how much of a cost we pay as well as what kind of benefit we see. I've > attached a potential patch (which includes a test harness) + image that > captures the results of our tests. The x-axis represents # of repeated keys > before switching to another iterator. The y-axis represents iteration time. > The sets of blue + red lines varies in # of iterators present in the heap. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (ACCUMULO-2889) Batch metadata table updates for new walogs
[ https://issues.apache.org/jira/browse/ACCUMULO-2889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Park updated ACCUMULO-2889: Attachment: ACCUMULO-2889.0.patch.txt > Batch metadata table updates for new walogs > --- > > Key: ACCUMULO-2889 > URL: https://issues.apache.org/jira/browse/ACCUMULO-2889 > Project: Accumulo > Issue Type: Bug >Affects Versions: 1.5.1, 1.6.0 >Reporter: Jonathan Park >Assignee: Jonathan Park > Attachments: ACCUMULO-2889.0.patch.txt > > > Currently, when we update the Metadata table with new loggers, we will update > the metadata for each tablet serially. We could optimize this to instead use > a batchwriter to send all metadata updates for all tablets in a batch. > A few special cases include: > - What if the !METADATA tablet was included in the batch? > - What about the root tablet? > Benefit: > In one of our clusters, we're experiencing particularly slow HDFS operations > leading to large oscillations in ingest performance. We haven't isolated the > cause in HDFS but when we profile the tservers, we noticed that they were > waiting for metadata table operations to complete. This would target the > waiting. > Potential downsides: > Given the existing locking scheme, it looks like we may have to lock a tablet > for slightly longer (we'll lock for the duration of the batch). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (ACCUMULO-2889) Batch metadata table updates for new walogs
[ https://issues.apache.org/jira/browse/ACCUMULO-2889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Park updated ACCUMULO-2889: Affects Version/s: 1.6.0 Status: Patch Available (was: In Progress) First pass at batching metadata updates for new WALs. I'll attach a screenshot of its effects as well. > Batch metadata table updates for new walogs > --- > > Key: ACCUMULO-2889 > URL: https://issues.apache.org/jira/browse/ACCUMULO-2889 > Project: Accumulo > Issue Type: Bug >Affects Versions: 1.6.0, 1.5.1 >Reporter: Jonathan Park >Assignee: Jonathan Park > Attachments: ACCUMULO-2889.0.patch.txt > > > Currently, when we update the Metadata table with new loggers, we will update > the metadata for each tablet serially. We could optimize this to instead use > a batchwriter to send all metadata updates for all tablets in a batch. > A few special cases include: > - What if the !METADATA tablet was included in the batch? > - What about the root tablet? > Benefit: > In one of our clusters, we're experiencing particularly slow HDFS operations > leading to large oscillations in ingest performance. We haven't isolated the > cause in HDFS but when we profile the tservers, we noticed that they were > waiting for metadata table operations to complete. This would target the > waiting. > Potential downsides: > Given the existing locking scheme, it looks like we may have to lock a tablet > for slightly longer (we'll lock for the duration of the batch). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (ACCUMULO-2827) HeapIterator optimization
[ https://issues.apache.org/jira/browse/ACCUMULO-2827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14029547#comment-14029547 ] Jonathan Park commented on ACCUMULO-2827: - This was only a single compaction using each of the old and new iterators. Could you give more details on hooking up a profiler and seeing how long the HeapIterator takes? By profiler, do you mean something like jvisualvm? Do you want us to try and profile a major compaction as it's running in a tserver vs a test harness? How would you like the profiler hooked up? Typically we've found that attaching profilers to the iterators greatly affects the performance. It should still show the benefit from this change though. Would those results be valid? > HeapIterator optimization > - > > Key: ACCUMULO-2827 > URL: https://issues.apache.org/jira/browse/ACCUMULO-2827 > Project: Accumulo > Issue Type: Improvement >Affects Versions: 1.5.1, 1.6.0 >Reporter: Jonathan Park >Assignee: Jonathan Park >Priority: Minor > Fix For: 1.5.2, 1.6.1, 1.7.0 > > Attachments: ACCUMULO-2827.0.patch.txt, accumulo-2827.raw_data, > new_heapiter.png, old_heapiter.png, together.png > > > We've been running a few performance tests of our iterator stack and noticed > a decent amount of time spent in the HeapIterator specifically related to > add/removal into the heap. > This may not be a general enough optimization but we thought we'd see what > people thought. Our assumption is that it's more probable that the current > "top iterator" will supply the next value in the iteration than not. The > current implementation takes the other assumption by always removing + > inserting the minimum iterator back into the heap. With the implementation of > a binary heap that we're using, this can get costly if our assumption is > wrong because we pay the log penalty of percolating up the iterator in the > heap upon insertion and again when percolating down upon removal. > We believe our assumption is a fair one to hold given that as major > compactions create a log distribution of file sizes, it's likely that we may > see a long chain of consecutive entries coming from 1 iterator. > Understandably, taking this assumption comes at an additional cost in the > case that we're wrong. Therefore, we've run a few benchmarking tests to see > how much of a cost we pay as well as what kind of benefit we see. I've > attached a potential patch (which includes a test harness) + image that > captures the results of our tests. The x-axis represents # of repeated keys > before switching to another iterator. The y-axis represents iteration time. > The sets of blue + red lines varies in # of iterators present in the heap. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (ACCUMULO-2827) HeapIterator optimization
[ https://issues.apache.org/jira/browse/ACCUMULO-2827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14028114#comment-14028114 ] Jonathan Park commented on ACCUMULO-2827: - Results of accumulo continuous ingest (against 1.5.1 on hadoop-2.2.0). Tests were run against a 12 physical-core, 64GB RAM, 8 drive single-node machine: Test: - Ingest roughly 1 billion entries (set NUM=1,000,000,000 (without commas)) - Pre-split into 8 tablets - table.split.threshold=100G (Avoid splits so we can have more entries per tablet) - table.compaction.major.ratio=4 - table.file.max=10 - tserver.compaction.major.concurrent.max=9 (enough to have all compactions running concurrently) - tserver.compaction.major.thread.files.open.max=20 (all files open at once during majc) - tserver.memory.maps.max=4G We only used 1 ingester instance (so a single batchwriter thread). Results: After ingest completed, we triggered a full majc and timed how long it took to complete. {noformat} time accumulo shell -u root -p -e 'compact -t ci -w' {noformat} 1.5.1 old heap iterator {noformat} real 21m48.785s user 0m6.014s sys 0m0.475s {noformat} 1.5.1 new heap iterator {noformat} real 20m45.002s user 0m5.693s sys 0m0.456s {noformat} > HeapIterator optimization > - > > Key: ACCUMULO-2827 > URL: https://issues.apache.org/jira/browse/ACCUMULO-2827 > Project: Accumulo > Issue Type: Improvement >Affects Versions: 1.5.1, 1.6.0 >Reporter: Jonathan Park >Assignee: Jonathan Park >Priority: Minor > Fix For: 1.5.2, 1.6.1, 1.7.0 > > Attachments: ACCUMULO-2827.0.patch.txt, accumulo-2827.raw_data, > new_heapiter.png, old_heapiter.png, together.png > > > We've been running a few performance tests of our iterator stack and noticed > a decent amount of time spent in the HeapIterator specifically related to > add/removal into the heap. > This may not be a general enough optimization but we thought we'd see what > people thought. Our assumption is that it's more probable that the current > "top iterator" will supply the next value in the iteration than not. The > current implementation takes the other assumption by always removing + > inserting the minimum iterator back into the heap. With the implementation of > a binary heap that we're using, this can get costly if our assumption is > wrong because we pay the log penalty of percolating up the iterator in the > heap upon insertion and again when percolating down upon removal. > We believe our assumption is a fair one to hold given that as major > compactions create a log distribution of file sizes, it's likely that we may > see a long chain of consecutive entries coming from 1 iterator. > Understandably, taking this assumption comes at an additional cost in the > case that we're wrong. Therefore, we've run a few benchmarking tests to see > how much of a cost we pay as well as what kind of benefit we see. I've > attached a potential patch (which includes a test harness) + image that > captures the results of our tests. The x-axis represents # of repeated keys > before switching to another iterator. The y-axis represents iteration time. > The sets of blue + red lines varies in # of iterators present in the heap. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (ACCUMULO-2889) Batch metadata table updates for new walogs
[ https://issues.apache.org/jira/browse/ACCUMULO-2889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Park reassigned ACCUMULO-2889: --- Assignee: Jonathan Park > Batch metadata table updates for new walogs > --- > > Key: ACCUMULO-2889 > URL: https://issues.apache.org/jira/browse/ACCUMULO-2889 > Project: Accumulo > Issue Type: Bug >Affects Versions: 1.5.1 >Reporter: Jonathan Park >Assignee: Jonathan Park > > Currently, when we update the Metadata table with new loggers, we will update > the metadata for each tablet serially. We could optimize this to instead use > a batchwriter to send all metadata updates for all tablets in a batch. > A few special cases include: > - What if the !METADATA tablet was included in the batch? > - What about the root tablet? > Benefit: > In one of our clusters, we're experiencing particularly slow HDFS operations > leading to large oscillations in ingest performance. We haven't isolated the > cause in HDFS but when we profile the tservers, we noticed that they were > waiting for metadata table operations to complete. This would target the > waiting. > Potential downsides: > Given the existing locking scheme, it looks like we may have to lock a tablet > for slightly longer (we'll lock for the duration of the batch). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (ACCUMULO-2827) HeapIterator optimization
[ https://issues.apache.org/jira/browse/ACCUMULO-2827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Park updated ACCUMULO-2827: Attachment: accumulo-2827.raw_data Attaching raw data for the together.png image. Values are in ms so there may be some amount of noise involved. Continuous ingest tests are still running. Sorry for the delay. > HeapIterator optimization > - > > Key: ACCUMULO-2827 > URL: https://issues.apache.org/jira/browse/ACCUMULO-2827 > Project: Accumulo > Issue Type: Improvement >Affects Versions: 1.5.1, 1.6.0 >Reporter: Jonathan Park >Assignee: Jonathan Park >Priority: Minor > Fix For: 1.5.2, 1.6.1, 1.7.0 > > Attachments: ACCUMULO-2827.0.patch.txt, accumulo-2827.raw_data, > new_heapiter.png, old_heapiter.png, together.png > > > We've been running a few performance tests of our iterator stack and noticed > a decent amount of time spent in the HeapIterator specifically related to > add/removal into the heap. > This may not be a general enough optimization but we thought we'd see what > people thought. Our assumption is that it's more probable that the current > "top iterator" will supply the next value in the iteration than not. The > current implementation takes the other assumption by always removing + > inserting the minimum iterator back into the heap. With the implementation of > a binary heap that we're using, this can get costly if our assumption is > wrong because we pay the log penalty of percolating up the iterator in the > heap upon insertion and again when percolating down upon removal. > We believe our assumption is a fair one to hold given that as major > compactions create a log distribution of file sizes, it's likely that we may > see a long chain of consecutive entries coming from 1 iterator. > Understandably, taking this assumption comes at an additional cost in the > case that we're wrong. Therefore, we've run a few benchmarking tests to see > how much of a cost we pay as well as what kind of benefit we see. I've > attached a potential patch (which includes a test harness) + image that > captures the results of our tests. The x-axis represents # of repeated keys > before switching to another iterator. The y-axis represents iteration time. > The sets of blue + red lines varies in # of iterators present in the heap. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (ACCUMULO-2801) define tablet syncs walog for each tablet in a batch
[ https://issues.apache.org/jira/browse/ACCUMULO-2801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14027036#comment-14027036 ] Jonathan Park commented on ACCUMULO-2801: - Oops, sorry for spam... Unsure if I referenced the right Keith Turner in the 1st comment so wanted to link kturner. > define tablet syncs walog for each tablet in a batch > > > Key: ACCUMULO-2801 > URL: https://issues.apache.org/jira/browse/ACCUMULO-2801 > Project: Accumulo > Issue Type: Bug >Affects Versions: 1.5.0, 1.5.1, 1.6.0 >Reporter: Keith Turner > > When the batch writer sends a batch of mutations for N tablets that were not > currently using a walog, then define tablet will be called for each tablet. > Define tablet will sync the walog. In hadoop 2 hsync is used, which is much > slower than hadoop1 sync calls. If hsync takes 50ms and there are 100 > tablets, then this operation would take 5 secs. The calls to define tablet > do not occur frequently, just when walogs switch or tablets are loaded so the > cost will be amortized. Ideally there could be one walog sync call for all > of the tablets in a batch of mutations. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (ACCUMULO-2801) define tablet syncs walog for each tablet in a batch
[ https://issues.apache.org/jira/browse/ACCUMULO-2801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14027032#comment-14027032 ] Jonathan Park commented on ACCUMULO-2801: - [~kturner] > define tablet syncs walog for each tablet in a batch > > > Key: ACCUMULO-2801 > URL: https://issues.apache.org/jira/browse/ACCUMULO-2801 > Project: Accumulo > Issue Type: Bug >Affects Versions: 1.5.0, 1.5.1, 1.6.0 >Reporter: Keith Turner > > When the batch writer sends a batch of mutations for N tablets that were not > currently using a walog, then define tablet will be called for each tablet. > Define tablet will sync the walog. In hadoop 2 hsync is used, which is much > slower than hadoop1 sync calls. If hsync takes 50ms and there are 100 > tablets, then this operation would take 5 secs. The calls to define tablet > do not occur frequently, just when walogs switch or tablets are loaded so the > cost will be amortized. Ideally there could be one walog sync call for all > of the tablets in a batch of mutations. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (ACCUMULO-2889) Batch metadata table updates for new walogs
[ https://issues.apache.org/jira/browse/ACCUMULO-2889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14027009#comment-14027009 ] Jonathan Park commented on ACCUMULO-2889: - Our current proposal:
TabletServer client threads:
  order(commitSessions)  // this is to avoid deadlock across multiple client threads
  batch.start
  foreach tablet:
    tablet.logLock.lock
    if (tablet.mustRegisterNewLoggers) then
      defineTablet(tablet)  // write WAL entry for tablet
      tablet.addLoggerToMetadataBatch(batch)  // hold onto the lock
    else
      tablet.logLock.release
  batch.flush
  release(allCurrentlyHeldLocks);
> Batch metadata table updates for new walogs > --- > > Key: ACCUMULO-2889 > URL: https://issues.apache.org/jira/browse/ACCUMULO-2889 > Project: Accumulo > Issue Type: Bug >Affects Versions: 1.5.1 >Reporter: Jonathan Park > > Currently, when we update the Metadata table with new loggers, we will update > the metadata for each tablet serially. We could optimize this to instead use > a batchwriter to send all metadata updates for all tablets in a batch. > A few special cases include: > - What if the !METADATA tablet was included in the batch? > - What about the root tablet? > Benefit: > In one of our clusters, we're experiencing particularly slow HDFS operations > leading to large oscillations in ingest performance. We haven't isolated the > cause in HDFS but when we profile the tservers, we noticed that they were > waiting for metadata table operations to complete. This would target the > waiting. > Potential downsides: > Given the existing locking scheme, it looks like we may have to lock a tablet > for slightly longer (we'll lock for the duration of the batch). -- This message was sent by Atlassian JIRA (v6.2#6252)
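For readability, the same flow as a compilable Java sketch. The Tablet and MetadataBatch types below are illustrative stand-ins, not the actual tserver classes; only the lock-then-batch-then-flush ordering is taken from the proposal above.
{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-ins for the tserver's tablet and metadata-batch types.
class MetadataBatch {
  void add(String mutation) { /* queue a metadata mutation */ }
  void flush() { /* one batched write to the metadata table */ }
}

class Tablet {
  final ReentrantLock logLock = new ReentrantLock();
  boolean mustRegisterNewLoggers() { return true; }
  void defineTablet() { /* write the DEFINE_TABLET entry to the WAL */ }
  String newLoggerMutation() { return "new logger entry"; }
}

class BatchedLoggerUpdate {
  // Lock tablets in a consistent order, queue all metadata mutations,
  // flush once, then release every lock that is still held.
  static void registerLoggers(List<Tablet> tabletsInCommitSessionOrder) {
    MetadataBatch batch = new MetadataBatch();
    List<Tablet> held = new ArrayList<>();
    for (Tablet t : tabletsInCommitSessionOrder) {
      t.logLock.lock();
      if (t.mustRegisterNewLoggers()) {
        t.defineTablet();
        batch.add(t.newLoggerMutation());
        held.add(t); // keep the lock until the batch is flushed
      } else {
        t.logLock.unlock();
      }
    }
    batch.flush();
    for (Tablet t : held) {
      t.logLock.unlock();
    }
  }
}
{code}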
[jira] [Created] (ACCUMULO-2889) Batch metadata table updates for new walogs
Jonathan Park created ACCUMULO-2889: --- Summary: Batch metadata table updates for new walogs Key: ACCUMULO-2889 URL: https://issues.apache.org/jira/browse/ACCUMULO-2889 Project: Accumulo Issue Type: Bug Affects Versions: 1.5.1 Reporter: Jonathan Park Currently, when we update the Metadata table with new loggers, we will update the metadata for each tablet serially. We could optimize this to instead use a batchwriter to send all metadata updates for all tablets in a batch. A few special cases include: - What if the !METADATA tablet was included in the batch? - What about the root tablet? Benefit: In one of our clusters, we're experiencing particularly slow HDFS operations leading to large oscillations in ingest performance. We haven't isolated the cause in HDFS but when we profile the tservers, we noticed that they were waiting for metadata table operations to complete. This would target the waiting. Potential downsides: Given the existing locking scheme, it looks like we may have to lock a tablet for slightly longer (we'll lock for the duration of the batch). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (ACCUMULO-2801) define tablet syncs walog for each tablet in a batch
[ https://issues.apache.org/jira/browse/ACCUMULO-2801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14026951#comment-14026951 ] Jonathan Park commented on ACCUMULO-2801: - [~keith_turner] what are your thoughts on not calling sync for define tablet and instead relying on the sync for a data write to ensure that it exists? It will make it possible for there to be a metadata table entry for the WAL without there being an associated DEFINE_TABLET in the WAL which I think recovery will currently ignore (looking at 1.5.1). It might change our recovery semantics (I'm not fully familiar with what our current guarantees are) in the case of log rollovers/defines. > define tablet syncs walog for each tablet in a batch > > > Key: ACCUMULO-2801 > URL: https://issues.apache.org/jira/browse/ACCUMULO-2801 > Project: Accumulo > Issue Type: Bug >Affects Versions: 1.5.0, 1.5.1, 1.6.0 >Reporter: Keith Turner > > When the batch writer sends a batch of mutations for N tablets that were not > currently using a walog, then define tablet will be called for each tablet. > Define tablet will sync the walog. In hadoop 2 hsync is used, which is much > slower than hadoop1 sync calls. If hsync takes 50ms and there are 100 > tablets, then this operation would take 5 secs. The calls to define tablet > do not occur frequently, just when walogs switch or tablets are loaded so the > cost will be amortized. Ideally there could be one walog sync call for all > of the tablets in a batch of mutations. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (ACCUMULO-2827) HeapIterator optimization
[ https://issues.apache.org/jira/browse/ACCUMULO-2827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018832#comment-14018832 ] Jonathan Park commented on ACCUMULO-2827: - My apologies, I haven't gotten to this yet. I'll try to find some time this weekend. > HeapIterator optimization > - > > Key: ACCUMULO-2827 > URL: https://issues.apache.org/jira/browse/ACCUMULO-2827 > Project: Accumulo > Issue Type: Improvement >Affects Versions: 1.5.1, 1.6.0 >Reporter: Jonathan Park >Assignee: Jonathan Park >Priority: Minor > Fix For: 1.5.2, 1.6.1, 1.7.0 > > Attachments: ACCUMULO-2827.0.patch.txt, new_heapiter.png, > old_heapiter.png, together.png > > > We've been running a few performance tests of our iterator stack and noticed > a decent amount of time spent in the HeapIterator specifically related to > add/removal into the heap. > This may not be a general enough optimization but we thought we'd see what > people thought. Our assumption is that it's more probable that the current > "top iterator" will supply the next value in the iteration than not. The > current implementation takes the other assumption by always removing + > inserting the minimum iterator back into the heap. With the implementation of > a binary heap that we're using, this can get costly if our assumption is > wrong because we pay the log penalty of percolating up the iterator in the > heap upon insertion and again when percolating down upon removal. > We believe our assumption is a fair one to hold given that as major > compactions create a log distribution of file sizes, it's likely that we may > see a long chain of consecutive entries coming from 1 iterator. > Understandably, taking this assumption comes at an additional cost in the > case that we're wrong. Therefore, we've run a few benchmarking tests to see > how much of a cost we pay as well as what kind of benefit we see. I've > attached a potential patch (which includes a test harness) + image that > captures the results of our tests. The x-axis represents # of repeated keys > before switching to another iterator. The y-axis represents iteration time. > The sets of blue + red lines varies in # of iterators present in the heap. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (ACCUMULO-2827) HeapIterator optimization
[ https://issues.apache.org/jira/browse/ACCUMULO-2827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Park updated ACCUMULO-2827: Assignee: Jonathan Park Status: Patch Available (was: Open) > HeapIterator optimization > - > > Key: ACCUMULO-2827 > URL: https://issues.apache.org/jira/browse/ACCUMULO-2827 > Project: Accumulo > Issue Type: Improvement >Affects Versions: 1.5.1 >Reporter: Jonathan Park >Assignee: Jonathan Park >Priority: Minor > Attachments: ACCUMULO-2827.0.patch.txt, new_heapiter.png, > old_heapiter.png, together.png > > > We've been running a few performance tests of our iterator stack and noticed > a decent amount of time spent in the HeapIterator specifically related to > add/removal into the heap. > This may not be a general enough optimization but we thought we'd see what > people thought. Our assumption is that it's more probable that the current > "top iterator" will supply the next value in the iteration than not. The > current implementation takes the other assumption by always removing + > inserting the minimum iterator back into the heap. With the implementation of > a binary heap that we're using, this can get costly if our assumption is > wrong because we pay the log penalty of percolating up the iterator in the > heap upon insertion and again when percolating down upon removal. > We believe our assumption is a fair one to hold given that as major > compactions create a log distribution of file sizes, it's likely that we may > see a long chain of consecutive entries coming from 1 iterator. > Understandably, taking this assumption comes at an additional cost in the > case that we're wrong. Therefore, we've run a few benchmarking tests to see > how much of a cost we pay as well as what kind of benefit we see. I've > attached a potential patch (which includes a test harness) + image that > captures the results of our tests. The x-axis represents # of repeated keys > before switching to another iterator. The y-axis represents iteration time. > The sets of blue + red lines varies in # of iterators present in the heap. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (ACCUMULO-2827) HeapIterator optimization
[ https://issues.apache.org/jira/browse/ACCUMULO-2827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Park updated ACCUMULO-2827: Attachment: together.png old_heapiter.png new_heapiter.png ACCUMULO-2827.0.patch.txt > HeapIterator optimization > - > > Key: ACCUMULO-2827 > URL: https://issues.apache.org/jira/browse/ACCUMULO-2827 > Project: Accumulo > Issue Type: Improvement >Affects Versions: 1.5.1 >Reporter: Jonathan Park >Priority: Minor > Attachments: ACCUMULO-2827.0.patch.txt, new_heapiter.png, > old_heapiter.png, together.png > > > We've been running a few performance tests of our iterator stack and noticed > a decent amount of time spent in the HeapIterator specifically related to > add/removal into the heap. > This may not be a general enough optimization but we thought we'd see what > people thought. Our assumption is that it's more probable that the current > "top iterator" will supply the next value in the iteration than not. The > current implementation takes the other assumption by always removing + > inserting the minimum iterator back into the heap. With the implementation of > a binary heap that we're using, this can get costly if our assumption is > wrong because we pay the log penalty of percolating up the iterator in the > heap upon insertion and again when percolating down upon removal. > We believe our assumption is a fair one to hold given that as major > compactions create a log distribution of file sizes, it's likely that we may > see a long chain of consecutive entries coming from 1 iterator. > Understandably, taking this assumption comes at an additional cost in the > case that we're wrong. Therefore, we've run a few benchmarking tests to see > how much of a cost we pay as well as what kind of benefit we see. I've > attached a potential patch (which includes a test harness) + image that > captures the results of our tests. The x-axis represents # of repeated keys > before switching to another iterator. The y-axis represents iteration time. > The sets of blue + red lines varies in # of iterators present in the heap. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (ACCUMULO-2827) HeapIterator optimization
Jonathan Park created ACCUMULO-2827: --- Summary: HeapIterator optimization Key: ACCUMULO-2827 URL: https://issues.apache.org/jira/browse/ACCUMULO-2827 Project: Accumulo Issue Type: Improvement Affects Versions: 1.5.1 Reporter: Jonathan Park Priority: Minor We've been running a few performance tests of our iterator stack and noticed a decent amount of time spent in the HeapIterator specifically related to add/removal into the heap. This may not be a general enough optimization but we thought we'd see what people thought. Our assumption is that it's more probable that the current "top iterator" will supply the next value in the iteration than not. The current implementation takes the other assumption by always removing + inserting the minimum iterator back into the heap. With the implementation of a binary heap that we're using, this can get costly if our assumption is wrong because we pay the log(n) penalty of percolating up the iterator in the heap upon insertion and again when percolating down upon removal. We believe our assumption is a fair one to hold given that as major compactions create a log distribution of file sizes, it's likely that we may see a long chain of consecutive entries coming from 1 iterator. Understandably, taking this assumption comes at an additional cost in the case that we're wrong. Therefore, we've run a few benchmarking tests to see how much of a cost we pay as well as what kind of benefit we see. I've attached a potential patch (which includes a test harness) + image that captures the results of our tests. The x-axis represents # of repeated keys before switching to another iterator. The y-axis represents iteration time. The sets of blue + red lines varies in # of iterators present in the heap. -- This message was sent by Atlassian JIRA (v6.2#6252)
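To make the trade-off concrete, here is a minimal, self-contained sketch of the idea outside of Accumulo's iterator stack: the winning source is kept out of the heap and only pushed back when another source holds a smaller key. It uses plain sorted Iterator<Long> sources and hypothetical class names, not the attached patch's code.
{code:java}
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.PriorityQueue;

// Illustrative sketch only; Accumulo's HeapIterator operates on
// SortedKeyValueIterators rather than Iterator<Long>.
class OptimizedHeapIterator implements Iterator<Long> {

  // A source that exposes its next value without consuming it.
  private static final class Source implements Comparable<Source> {
    private final Iterator<Long> it;
    private Long top;
    Source(Iterator<Long> it) { this.it = it; top = it.hasNext() ? it.next() : null; }
    boolean exhausted() { return top == null; }
    Long peek() { return top; }
    Long advance() { Long v = top; top = it.hasNext() ? it.next() : null; return v; }
    @Override public int compareTo(Source o) { return top.compareTo(o.top); }
  }

  private final PriorityQueue<Source> heap = new PriorityQueue<>();
  private Source current; // stays out of the heap while it keeps supplying the minimum

  OptimizedHeapIterator(Iterable<Iterator<Long>> sources) {
    for (Iterator<Long> s : sources) {
      Source src = new Source(s);
      if (!src.exhausted()) heap.add(src);
    }
    current = heap.poll();
  }

  @Override public boolean hasNext() { return current != null; }

  @Override public Long next() {
    if (current == null) throw new NoSuchElementException();
    Long result = current.advance();
    // Only pay the heap's log cost when the current source stops winning.
    if (current.exhausted()
        || (!heap.isEmpty() && heap.peek().peek().compareTo(current.peek()) < 0)) {
      if (!current.exhausted()) heap.add(current);
      current = heap.poll();
    }
    return result;
  }
}
{code}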
[jira] [Updated] (ACCUMULO-2827) HeapIterator optimization
[ https://issues.apache.org/jira/browse/ACCUMULO-2827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Park updated ACCUMULO-2827: Description: We've been running a few performance tests of our iterator stack and noticed a decent amount of time spent in the HeapIterator specifically related to add/removal into the heap. This may not be a general enough optimization but we thought we'd see what people thought. Our assumption is that it's more probable that the current "top iterator" will supply the next value in the iteration than not. The current implementation takes the other assumption by always removing + inserting the minimum iterator back into the heap. With the implementation of a binary heap that we're using, this can get costly if our assumption is wrong because we pay the log penalty of percolating up the iterator in the heap upon insertion and again when percolating down upon removal. We believe our assumption is a fair one to hold given that as major compactions create a log distribution of file sizes, it's likely that we may see a long chain of consecutive entries coming from 1 iterator. Understandably, taking this assumption comes at an additional cost in the case that we're wrong. Therefore, we've run a few benchmarking tests to see how much of a cost we pay as well as what kind of benefit we see. I've attached a potential patch (which includes a test harness) + image that captures the results of our tests. The x-axis represents # of repeated keys before switching to another iterator. The y-axis represents iteration time. The sets of blue + red lines varies in # of iterators present in the heap. was: We've been running a few performance tests of our iterator stack and noticed a decent amount of time spent in the HeapIterator specifically related to add/removal into the heap. This may not be a general enough optimization but we thought we'd see what people thought. Our assumption is that it's more probable that the current "top iterator" will supply the next value in the iteration than not. The current implementation takes the other assumption by always removing + inserting the minimum iterator back into the heap. With the implementation of a binary heap that we're using, this can get costly if our assumption is wrong because we pay the log(n) penalty of percolating up the iterator in the heap upon insertion and again when percolating down upon removal. We believe our assumption is a fair one to hold given that as major compactions create a log distribution of file sizes, it's likely that we may see a long chain of consecutive entries coming from 1 iterator. Understandably, taking this assumption comes at an additional cost in the case that we're wrong. Therefore, we've run a few benchmarking tests to see how much of a cost we pay as well as what kind of benefit we see. I've attached a potential patch (which includes a test harness) + image that captures the results of our tests. The x-axis represents # of repeated keys before switching to another iterator. The y-axis represents iteration time. The sets of blue + red lines varies in # of iterators present in the heap. 
> HeapIterator optimization > - > > Key: ACCUMULO-2827 > URL: https://issues.apache.org/jira/browse/ACCUMULO-2827 > Project: Accumulo > Issue Type: Improvement >Affects Versions: 1.5.1 >Reporter: Jonathan Park >Priority: Minor > > We've been running a few performance tests of our iterator stack and noticed > a decent amount of time spent in the HeapIterator specifically related to > add/removal into the heap. > This may not be a general enough optimization but we thought we'd see what > people thought. Our assumption is that it's more probable that the current > "top iterator" will supply the next value in the iteration than not. The > current implementation takes the other assumption by always removing + > inserting the minimum iterator back into the heap. With the implementation of > a binary heap that we're using, this can get costly if our assumption is > wrong because we pay the log penalty of percolating up the iterator in the > heap upon insertion and again when percolating down upon removal. > We believe our assumption is a fair one to hold given that as major > compactions create a log distribution of file sizes, it's likely that we may > see a long chain of consecutive entries coming from 1 iterator. > Understandably, taking this assumption comes at an additional cost in the > case that we're wrong. Therefore, we've run a few benchmarking tests to see > how much of a cost we pay as well as what kind of benefit we see. I've > attached a potential patch (which includes a test harness) + image that > captures the results of our tests. The x-axis represents # of repeated keys > before switching to another iterator. The y-axis represents iteration time. > The sets of blue + red lines varies in # of iterators present in the heap. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (ACCUMULO-2668) slow WAL writes
[ https://issues.apache.org/jira/browse/ACCUMULO-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969705#comment-13969705 ] Jonathan Park commented on ACCUMULO-2668: - Hey Sean. It would be great to be listed as a contributor. Name: Jonathan Park Company name: sqrrl Timezone: ET Thanks for all the help everyone! > slow WAL writes > --- > > Key: ACCUMULO-2668 > URL: https://issues.apache.org/jira/browse/ACCUMULO-2668 > Project: Accumulo > Issue Type: Bug >Affects Versions: 1.6.0 >Reporter: Jonathan Park >Assignee: Jonathan Park >Priority: Blocker > Labels: 16_qa_bug > Fix For: 1.6.1 > > Attachments: ACCUMULO-2668.0.patch.txt, noflush.diff > > > During continuous ingest, we saw over 70% of our ingest time taken up by > writes to the WAL. When we ran the DfsLogger in isolation (created one > outside of the Tserver), we saw about ~25MB/s throughput as opposed to nearly > 100MB/s from just writing directly to an hdfs outputstream (computed by > taking the estimated size of the mutations sent to the DfsLogger class > divided by the time it took for it to flush + sync the data to HDFS). > After investigating, we found one possible culprit was the > NoFlushOutputStream. It is a subclass of java.io.FilterOutputStream but does > not override the write(byte[], int, int) method signature. The javadoc > indicates that subclasses of the FilterOutputStream should provide a more > efficient implementation. > I've attached a small diff that illustrates and addresses the issue but this > may not be how we ultimately want to fix it. > As a side note, I may be misreading the implementation of DfsLogger, but it > looks like we always make use of the NoFlushOutputStream, even if encryption > isn't enabled. There appears to be a faulty check in the DfsLogger.open() > implementation that I don't believe can be satisfied (line 384). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (ACCUMULO-2671) BlockedOutputStream can hit a StackOverflowError
Jonathan Park created ACCUMULO-2671: --- Summary: BlockedOutputStream can hit a StackOverflowError Key: ACCUMULO-2671 URL: https://issues.apache.org/jira/browse/ACCUMULO-2671 Project: Accumulo Issue Type: Bug Affects Versions: 1.6.0 Reporter: Jonathan Park This issue mostly came up after a resolution to ACCUMULO-2668 that allows a byte[] to be passed directly to the underlying stream from the NoFlushOutputStream. The problem appears to be due to the BlockedOutputStream.write(byte[], int, int) implementation, which recursively writes blocks/buffers out. When the stream is passed a large mutation (128MB was sufficient to trigger the error for me), this will cause a StackOverflowError. This appears to happen specifically with encryption at rest turned on. A simple fix would be to unroll the recursion. -- This message was sent by Atlassian JIRA (v6.2#6252)
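A minimal sketch of what "unroll the recursion" could look like, assuming a hypothetical blocked stream that forwards fixed-size chunks; this is not the actual BlockedOutputStream code, only the shape of the iterative loop.
{code:java}
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical class; only the iterative chunking loop is the point.
class IterativeBlockedOutputStream extends FilterOutputStream {
  private final int blockSize;

  IterativeBlockedOutputStream(OutputStream out, int blockSize) {
    super(out);
    this.blockSize = blockSize;
  }

  @Override
  public void write(byte[] b, int off, int len) throws IOException {
    // Loop over the buffer in block-sized chunks instead of recursing,
    // so a 128MB mutation adds no extra stack frames.
    while (len > 0) {
      int chunk = Math.min(blockSize, len);
      out.write(b, off, chunk);
      off += chunk;
      len -= chunk;
    }
  }
}
{code}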
[jira] [Updated] (ACCUMULO-2668) slow WAL writes
[ https://issues.apache.org/jira/browse/ACCUMULO-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Park updated ACCUMULO-2668: Attachment: ACCUMULO-2668.0.patch.txt Reuploading the file from the format-patch command. Microbenchmark: ran continuous ingest on my laptop (2013 MBP: 2.6 GHz quad-core i7, 16 GB RAM) using the default 3GB Accumulo config with native maps. Used a single continuous ingester instance against a table with 4 tablets. Results: with fix, 120K entries/s (12.66 MB/s); without fix, 83K entries/s (9.05 MB/s). Numbers were obtained at some point during the ingest. > slow WAL writes > --- > > Key: ACCUMULO-2668 > URL: https://issues.apache.org/jira/browse/ACCUMULO-2668 > Project: Accumulo > Issue Type: Bug >Affects Versions: 1.6.0 >Reporter: Jonathan Park >Assignee: Jonathan Park >Priority: Blocker > Labels: 16_qa_bug > Fix For: 1.6.1 > > Attachments: ACCUMULO-2668.0.patch.txt, noflush.diff > > > During continuous ingest, we saw over 70% of our ingest time taken up by > writes to the WAL. When we ran the DfsLogger in isolation (created one > outside of the Tserver), we saw about ~25MB/s throughput as opposed to nearly > 100MB/s from just writing directly to an hdfs outputstream (computed by > taking the estimated size of the mutations sent to the DfsLogger class > divided by the time it took for it to flush + sync the data to HDFS). > After investigating, we found one possible culprit was the > NoFlushOutputStream. It is a subclass of java.io.FilterOutputStream but does > not override the write(byte[], int, int) method signature. The javadoc > indicates that subclasses of the FilterOutputStream should provide a more > efficient implementation. > I've attached a small diff that illustrates and addresses the issue but this > may not be how we ultimately want to fix it. > As a side note, I may be misreading the implementation of DfsLogger, but it > looks like we always make use of the NoFlushOutputStream, even if encryption > isn't enabled. There appears to be a faulty check in the DfsLogger.open() > implementation that I don't believe can be satisfied (line 384). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (ACCUMULO-2669) NoFlushOutputStream always in use in DfsLogger
Jonathan Park created ACCUMULO-2669: --- Summary: NoFlushOutputStream always in use in DfsLogger Key: ACCUMULO-2669 URL: https://issues.apache.org/jira/browse/ACCUMULO-2669 Project: Accumulo Issue Type: Bug Affects Versions: 1.6.0 Reporter: Jonathan Park Priority: Minor I may be misreading the implementation of DfsLogger, but it looks like we always make use of the NoFlushOutputStream, even if encryption isn't enabled. There appears to be a faulty check in the DfsLogger.open() implementation that I don't believe can be satisfied (line 384). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (ACCUMULO-2668) slow WAL writes
[ https://issues.apache.org/jira/browse/ACCUMULO-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Park updated ACCUMULO-2668: Description: During continuous ingest, we saw over 70% of our ingest time taken up by writes to the WAL. When we ran the DfsLogger in isolation (created one outside of the Tserver), we saw about ~25MB/s throughput as opposed to nearly 100MB/s from just writing directly to an hdfs outputstream (computed by taking the estimated size of the mutations sent to the DfsLogger class divided by the time it took for it to flush + sync the data to HDFS). After investigating, we found one possible culprit was the NoFlushOutputStream. It is a subclass of java.io.FilterOutputStream but does not override the write(byte[], int, int) method signature. The javadoc indicates that subclasses of the FilterOutputStream should provide a more efficient implementation. I've attached a small diff that illustrates and addresses the issue but this may not be how we ultimately want to fix it. As a side note, I may be misreading the implementation of DfsLogger, but it looks like we always make use of the NoFlushOutputStream, even if encryption isn't enabled. There appears to be a faulty check in the DfsLogger.open() implementation that I don't believe can be satisfied (line 384). was: During continuous ingest, we saw over 70% of our ingest time taken up by writes to the WAL. When we ran the DfsLogger in isolation (created one outside of the Tserver), we saw about ~25MB/s throughput (computed by taking the estimated size of the mutations sent to the DfsLogger class divided by the time it took for it to flush + sync the data to HDFS). After investigating, we found one possible culprit was the NoFlushOutputStream. It is a subclass of java.io.FilterOutputStream but does not override the write(byte[], int, int) method signature. The javadoc indicates that subclasses of the FilterOutputStream should provide a more efficient implementation. I've attached a small diff that illustrates and addresses the issue but this may not be how we ultimately want to fix it. As a side note, I may be misreading the implementation of DfsLogger, but it looks like we always make use of the NoFlushOutputStream, even if encryption isn't enabled. There appears to be a faulty check in the DfsLogger.open() implementation that I don't believe can be satisfied (line 384). > slow WAL writes > --- > > Key: ACCUMULO-2668 > URL: https://issues.apache.org/jira/browse/ACCUMULO-2668 > Project: Accumulo > Issue Type: Bug >Affects Versions: 1.6.0 >Reporter: Jonathan Park > Attachments: noflush.diff > > > During continuous ingest, we saw over 70% of our ingest time taken up by > writes to the WAL. When we ran the DfsLogger in isolation (created one > outside of the Tserver), we saw about ~25MB/s throughput as opposed to nearly > 100MB/s from just writing directly to an hdfs outputstream (computed by > taking the estimated size of the mutations sent to the DfsLogger class > divided by the time it took for it to flush + sync the data to HDFS). > After investigating, we found one possible culprit was the > NoFlushOutputStream. It is a subclass of java.io.FilterOutputStream but does > not override the write(byte[], int, int) method signature. The javadoc > indicates that subclasses of the FilterOutputStream should provide a more > efficient implementation. > I've attached a small diff that illustrates and addresses the issue but this > may not be how we ultimately want to fix it. 
> As a side note, I may be misreading the implementation of DfsLogger, but it > looks like we always make use of the NoFlushOutputStream, even if encryption > isn't enabled. There appears to be a faulty check in the DfsLogger.open() > implementation that I don't believe can be satisfied (line 384). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (ACCUMULO-2668) slow WAL writes
[ https://issues.apache.org/jira/browse/ACCUMULO-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Park updated ACCUMULO-2668: Status: Patch Available (was: Open) Attaching a possible fix. > slow WAL writes > --- > > Key: ACCUMULO-2668 > URL: https://issues.apache.org/jira/browse/ACCUMULO-2668 > Project: Accumulo > Issue Type: Bug >Affects Versions: 1.6.0 >Reporter: Jonathan Park > Attachments: noflush.diff > > > During continuous ingest, we saw over 70% of our ingest time taken up by > writes to the WAL. When we ran the DfsLogger in isolation (created one > outside of the Tserver), we saw about ~25MB/s throughput (computed by taking > the estimated size of the mutations sent to the DfsLogger class divided by > the time it took for it to flush + sync the data to HDFS). > After investigating, we found one possible culprit was the > NoFlushOutputStream. It is a subclass of java.io.FilterOutputStream but does > not override the write(byte[], int, int) method signature. The javadoc > indicates that subclasses of the FilterOutputStream should provide a more > efficient implementation. > I've attached a small diff that illustrates and addresses the issue but this > may not be how we ultimately want to fix it. > As a side note, I may be misreading the implementation of DfsLogger, but it > looks like we always make use of the NoFlushOutputStream, even if encryption > isn't enabled. There appears to be a faulty check in the DfsLogger.open() > implementation that I don't believe can be satisfied (line 384). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (ACCUMULO-2668) slow WAL writes
Jonathan Park created ACCUMULO-2668: --- Summary: slow WAL writes Key: ACCUMULO-2668 URL: https://issues.apache.org/jira/browse/ACCUMULO-2668 Project: Accumulo Issue Type: Bug Affects Versions: 1.6.0 Reporter: Jonathan Park Attachments: noflush.diff During continuous ingest, we saw over 70% of our ingest time taken up by writes to the WAL. When we ran the DfsLogger in isolation (created one outside of the Tserver), we saw about ~25MB/s throughput (computed by taking the estimated size of the mutations sent to the DfsLogger class divided by the time it took for it to flush + sync the data to HDFS). After investigating, we found one possible culprit was the NoFlushOutputStream. It is a subclass of java.io.FilterOutputStream but does not override the write(byte[], int, int) method signature. The javadoc indicates that subclasses of the FilterOutputStream should provide a more efficient implementation. I've attached a small diff that illustrates and addresses the issue but this may not be how we ultimately want to fix it. As a side note, I may be misreading the implementation of DfsLogger, but it looks like we always make use of the NoFlushOutputStream, even if encryption isn't enabled. There appears to be a faulty check in the DfsLogger.open() implementation that I don't believe can be satisfied (line 384). -- This message was sent by Atlassian JIRA (v6.2#6252)
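A minimal sketch of the kind of change described: a FilterOutputStream subclass that overrides the array variant of write so the whole buffer is forwarded in one call. This is an assumption-laden illustration with a hypothetical class name, not the attached noflush.diff, and the real class's flush semantics may differ.
{code:java}
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Without the write(byte[], int, int) override, FilterOutputStream
// forwards the buffer one byte at a time to the wrapped stream.
class NoFlushOutputStreamSketch extends FilterOutputStream {
  NoFlushOutputStreamSketch(OutputStream out) {
    super(out);
  }

  @Override
  public void write(byte[] b, int off, int len) throws IOException {
    out.write(b, off, len); // delegate the whole buffer in a single call
  }

  @Override
  public void flush() {
    // Intentionally a no-op here, mirroring the "NoFlush" name; the caller
    // decides when the underlying stream is flushed or synced.
  }
}
{code}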
[jira] [Updated] (ACCUMULO-2668) slow WAL writes
[ https://issues.apache.org/jira/browse/ACCUMULO-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Park updated ACCUMULO-2668: Attachment: noflush.diff > slow WAL writes > --- > > Key: ACCUMULO-2668 > URL: https://issues.apache.org/jira/browse/ACCUMULO-2668 > Project: Accumulo > Issue Type: Bug >Affects Versions: 1.6.0 >Reporter: Jonathan Park > Attachments: noflush.diff > > > During continuous ingest, we saw over 70% of our ingest time taken up by > writes to the WAL. When we ran the DfsLogger in isolation (created one > outside of the Tserver), we saw about ~25MB/s throughput (computed by taking > the estimated size of the mutations sent to the DfsLogger class divided by > the time it took for it to flush + sync the data to HDFS). > After investigating, we found one possible culprit was the > NoFlushOutputStream. It is a subclass of java.io.FilterOutputStream but does > not override the write(byte[], int, int) method signature. The javadoc > indicates that subclasses of the FilterOutputStream should provide a more > efficient implementation. > I've attached a small diff that illustrates and addresses the issue but this > may not be how we ultimately want to fix it. > As a side note, I may be misreading the implementation of DfsLogger, but it > looks like we always make use of the NoFlushOutputStream, even if encryption > isn't enabled. There appears to be a faulty check in the DfsLogger.open() > implementation that I don't believe can be satisfied (line 384). -- This message was sent by Atlassian JIRA (v6.2#6252)