[jira] [Commented] (ACCUMULO-3294) Need mechanism to update iterator settings

2015-11-17 Thread Jonathan Park (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15009804#comment-15009804
 ] 

Jonathan Park commented on ACCUMULO-3294:
-

No substantial reason. My original reason was that since the method was called 
"update", I added a check so that it couldn't be used like a "set". It adds a 
little bit of protection from surprises for clients: if I as a client (and 
perhaps it's just me) am trying to "update" an iterator setting, I'm assuming 
that the iterator is there. If it suddenly disappears while I'm trying to 
update, then perhaps something happened that means I shouldn't update. It's not 
a spectacular way to provide this protection, since it performs a 
read/modify/write with no locks, so the iterator could still disappear. 
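
To make the read/modify/write concrete, here is a minimal client-side sketch of 
that pattern using the existing public TableOperations calls (the method name and 
error handling are simplified; this illustrates the pattern being discussed, not 
the patch itself):

{code:java}
import java.util.EnumSet;

import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.IteratorSetting;
import org.apache.accumulo.core.client.admin.TableOperations;
import org.apache.accumulo.core.iterators.IteratorUtil.IteratorScope;

public class UpdateSketch {
  // Read the current setting, modify it, then remove + re-attach. Nothing stops
  // the iterator from disappearing between these calls, which is the race
  // described above.
  static void updateOption(Connector conn, String table, String iterName,
      String key, String value) throws Exception {
    TableOperations ops = conn.tableOperations();
    EnumSet<IteratorScope> scopes = EnumSet.allOf(IteratorScope.class);

    IteratorSetting current = ops.getIteratorSetting(table, iterName, IteratorScope.scan); // read
    current.addOption(key, value);                                                         // modify
    ops.removeIterator(table, iterName, scopes);                                           // write, step 1
    ops.attachIterator(table, current, scopes);                                            // write, step 2
  }
}
{code}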

I don't feel too strongly about the check since there are use cases for not 
having it, so if removing the check and making this behave more like a "set" is 
better aligned with other efforts, I can remove it.

What would the new interface look like if we want this to update all of a 
table's iterators at once? A Collection?
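
For what it's worth, one purely hypothetical shape for that (these signatures are 
not part of the existing API; they only sketch the question above):

{code:java}
import java.util.Collection;
import java.util.EnumSet;

import org.apache.accumulo.core.client.AccumuloException;
import org.apache.accumulo.core.client.AccumuloSecurityException;
import org.apache.accumulo.core.client.IteratorSetting;
import org.apache.accumulo.core.client.TableNotFoundException;
import org.apache.accumulo.core.iterators.IteratorUtil.IteratorScope;

// Hypothetical interface only -- not part of TableOperations today.
interface IteratorUpdates {
  // Update a single iterator's setting in place.
  void updateIteratorSetting(String tableName, IteratorSetting setting,
      EnumSet<IteratorScope> scopes)
      throws AccumuloException, AccumuloSecurityException, TableNotFoundException;

  // Replace all of a table's iterator settings in one call.
  void updateIteratorSettings(String tableName, Collection<IteratorSetting> settings,
      EnumSet<IteratorScope> scopes)
      throws AccumuloException, AccumuloSecurityException, TableNotFoundException;
}
{code}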

> Need mechanism to update iterator settings
> --
>
> Key: ACCUMULO-3294
> URL: https://issues.apache.org/jira/browse/ACCUMULO-3294
> Project: Accumulo
>  Issue Type: Improvement
>  Components: client
>Reporter: John Vines
>Assignee: Jonathan Park
> Fix For: 1.8.0
>
> Attachments: bug3294.patch, bug3294.patch.1
>
>
> Currently our API supports attachIterator, removeIterator, and 
> getIteratorSettings. There is no mechanism to directly change an iterator's 
> settings.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ACCUMULO-3294) Need mechanism to update iterator settings

2015-11-16 Thread Jonathan Park (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15007002#comment-15007002
 ] 

Jonathan Park commented on ACCUMULO-3294:
-

[~elserj] thanks for the feedback! I'll apply it to the patch if the patch is 
still desirable.

[~ctubbsii] I agree that this can enable users to shoot themselves in the foot 
due to the lack of atomicity guarantees in Accumulo regarding the propagation 
of config. I still think such an API call is valuable and would become safer 
and easier to consume if/when Accumulo has the ability to support atomic 
updates to config (although if Accumulo adds transactional support to updates, 
then I would admit a dedicated update call would not add much value). 

The example use case we have is one non-trivial iterator which is logically the 
composition of a set of iterators. This iterator reads its configuration at run 
time and determines what it needs to do. The issue we're running into is that 
we can't afford to have a compaction start while we're in the middle of 
removing/adding. We could get pretty far by re-designing our system to disable 
major compactions so that we can manually launch compactions and control the 
iterator settings used (we would probably need some other design around minor 
compactions to handle those safely as well). 
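
As a rough illustration of that pattern (the class name and option below are made 
up, not our actual iterator), a single configured iterator reading its options at 
init() time:

{code:java}
import java.io.IOException;
import java.util.Map;

import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.iterators.IteratorEnvironment;
import org.apache.accumulo.core.iterators.SortedKeyValueIterator;
import org.apache.accumulo.core.iterators.WrappingIterator;

public class CompositeConfigIterator extends WrappingIterator {
  private boolean filteringEnabled;

  @Override
  public void init(SortedKeyValueIterator<Key,Value> source, Map<String,String> options,
      IteratorEnvironment env) throws IOException {
    super.init(source, options, env);
    // Behavior is driven entirely by the table config read here. If that config
    // changes while a compaction is starting, the iterator may see a half-updated
    // set of options -- the race described above.
    String opt = options.get("composite.filtering.enabled");
    filteringEnabled = opt != null && Boolean.parseBoolean(opt);
  }
}
{code}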

If there's a particular direction Accumulo is headed in to support these 
changing configurations for compactions, I'd love to help out where possible to 
make Accumulo easier to consume w.r.t. configuration. I do admit this patch is 
less than ideal and doesn't address some other issues present. 

[~kturner] Thanks for the example and explanation of the benefits. I agree that 
it's safer for scans to follow such a pattern. 

> Need mechanism to update iterator settings
> --
>
> Key: ACCUMULO-3294
> URL: https://issues.apache.org/jira/browse/ACCUMULO-3294
> Project: Accumulo
>  Issue Type: Improvement
>  Components: client
>Reporter: John Vines
>Assignee: Jonathan Park
> Fix For: 1.8.0
>
> Attachments: bug3294.patch, bug3294.patch.1
>
>
> Currently our API supports attachIterator, removeIterator, and 
> getIteratorSettings. There is no mechanism to directly change an iterator's 
> settings.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ACCUMULO-3294) Need mechanism to update iterator settings

2015-11-13 Thread Jonathan Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Park updated ACCUMULO-3294:

Status: Patch Available  (was: Open)

Changed updates to remove unset properties.

> Need mechanism to update iterator settings
> --
>
> Key: ACCUMULO-3294
> URL: https://issues.apache.org/jira/browse/ACCUMULO-3294
> Project: Accumulo
>  Issue Type: Improvement
>  Components: client
>Reporter: John Vines
>Assignee: Jonathan Park
> Attachments: bug3294.patch, bug3294.patch.1
>
>
> Currently our API supports attachIterator, removeIterator, and 
> getIteratorSettings. There is no mechanism to directly change an iterator's 
> settings.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ACCUMULO-3294) Need mechanism to update iterator settings

2015-11-13 Thread Jonathan Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Park updated ACCUMULO-3294:

Attachment: bug3294.patch.1

> Need mechanism to update iterator settings
> --
>
> Key: ACCUMULO-3294
> URL: https://issues.apache.org/jira/browse/ACCUMULO-3294
> Project: Accumulo
>  Issue Type: Improvement
>  Components: client
>Reporter: John Vines
>Assignee: Jonathan Park
> Attachments: bug3294.patch, bug3294.patch.1
>
>
> Currently our API supports attachIterator, removeIterator, and 
> getIteratorSettings. There is no mechanism to directly change an iterator's 
> settings.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ACCUMULO-3294) Need mechanism to update iterator settings

2015-11-13 Thread Jonathan Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Park updated ACCUMULO-3294:

Status: Open  (was: Patch Available)

> Need mechanism to update iterator settings
> --
>
> Key: ACCUMULO-3294
> URL: https://issues.apache.org/jira/browse/ACCUMULO-3294
> Project: Accumulo
>  Issue Type: Improvement
>  Components: client
>Reporter: John Vines
>Assignee: Jonathan Park
> Attachments: bug3294.patch
>
>
> Currently our API supports attachIterator, removeIterator, and 
> getIteratorSettings. There is no mechanism to directly change an iterator's 
> settings.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ACCUMULO-3294) Need mechanism to update iterator settings

2015-11-13 Thread Jonathan Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Park updated ACCUMULO-3294:

Status: Patch Available  (was: Open)

> Need mechanism to update iterator settings
> --
>
> Key: ACCUMULO-3294
> URL: https://issues.apache.org/jira/browse/ACCUMULO-3294
> Project: Accumulo
>  Issue Type: Improvement
>  Components: client
>Reporter: John Vines
>Assignee: Jonathan Park
> Attachments: bug3294.patch
>
>
> Currently our API supports attachIterator, removeIterator, and 
> getIteratorSettings. There is no mechanism to directly change an iterator's 
> settings.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ACCUMULO-3294) Need mechanism to update iterator settings

2015-11-13 Thread Jonathan Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Park updated ACCUMULO-3294:

Attachment: bug3294.patch

> Need mechanism to update iterator settings
> --
>
> Key: ACCUMULO-3294
> URL: https://issues.apache.org/jira/browse/ACCUMULO-3294
> Project: Accumulo
>  Issue Type: Improvement
>  Components: client
>Reporter: John Vines
>Assignee: Jonathan Park
> Attachments: bug3294.patch
>
>
> Currently our API supports attachIterator, removeIterator, and 
> getIteratorSettings. There is no mechanism to directly change an iterator's 
> settings.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (ACCUMULO-3294) Need mechanism to update iterator settings

2015-11-13 Thread Jonathan Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Park reassigned ACCUMULO-3294:
---

Assignee: Jonathan Park

> Need mechanism to update iterator settings
> --
>
> Key: ACCUMULO-3294
> URL: https://issues.apache.org/jira/browse/ACCUMULO-3294
> Project: Accumulo
>  Issue Type: Improvement
>  Components: client
>Reporter: John Vines
>Assignee: Jonathan Park
>
> Currently our API supports attachIterator, removeIterator, and 
> getIteratorSettings. There is no mechanism to directly change an iterator's 
> settings.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ACCUMULO-3530) alterTable/NamespaceProperty should use Fate locks

2015-01-26 Thread Jonathan Park (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-3530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14292152#comment-14292152
 ] 

Jonathan Park commented on ACCUMULO-3530:
-

I believe ACCUMULO-1568 can satisfy our use case as well. I agree that Fate 
locks may not be the way to go; some way of obtaining a snapshot read of table 
properties would be sufficient. 

> alterTable/NamespaceProperty should use Fate locks
> --
>
> Key: ACCUMULO-3530
> URL: https://issues.apache.org/jira/browse/ACCUMULO-3530
> Project: Accumulo
>  Issue Type: Bug
>Reporter: John Vines
>
> Fate operations, such as clone table, have logic in place to ensure 
> consistency as the operation occurs. However, operations like 
> alterTableProperty can still interfere because there is no locking done. We 
> should add identical locking to these methods in MasterClientServiceHandler 
> to help ensure consistency.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ACCUMULO-3530) alterTable/NamespaceProperty should use Fate locks

2015-01-26 Thread Jonathan Park (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-3530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14292138#comment-14292138
 ] 

Jonathan Park commented on ACCUMULO-3530:
-

[~ctubbsii] Is there a particular case you're concerned with? 

I'm not too familiar with the Fate locks, but the general problem we observed 
was that while a clone was in progress, a table had an iterator configuration 
removed. The clone operation ended up failing because the ZK node associated 
with the iterator config disappeared. I believe the ask here is that while an 
iterative read + copy of the table properties is in progress, changes should be 
disallowed so that a consistent view of the set of table properties is copied. 

> alterTable/NamespaceProperty should use Fate locks
> --
>
> Key: ACCUMULO-3530
> URL: https://issues.apache.org/jira/browse/ACCUMULO-3530
> Project: Accumulo
>  Issue Type: Bug
>Reporter: John Vines
>
> Fate operations, such as clone table, have logic in place to ensure 
> consistency as the operation occurs. However, operations like 
> alterTableProperty can still interfere because there is no locking done. We 
> should add identical locking to these methods in MasterClientServiceHandler 
> to help ensure consistency.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (ACCUMULO-3330) Tserver "Running low on memory" appears more frequently than necessary when min heap != max heap

2014-11-13 Thread Jonathan Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-3330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Park resolved ACCUMULO-3330.
-
Resolution: Duplicate

> Tserver "Running low on memory" appears more frequently than necessary when 
> min heap != max heap
> 
>
> Key: ACCUMULO-3330
> URL: https://issues.apache.org/jira/browse/ACCUMULO-3330
> Project: Accumulo
>  Issue Type: Bug
>Reporter: Jonathan Park
>Priority: Minor
>
> I'm not sure if this is JVM specific behavior, but I suspect the way we 
> compute when to log "Running low on memory" could be improved. 
> Currently we use {{Runtime.getRuntime()}} and rely on the formula 
> {{freeMemory() < maxMemory() * 0.05}} to determine whether or not to log the 
> warning. With Oracle's HotSpot VM, {{freeMemory()}} appears to return the 
> amount of free memory relative to the current JVM heap size (as returned by 
> {{totalMemory()}}). If {{totalMemory()}} != {{maxMemory()}}, then this warning 
> will start appearing before it was intended to, which is misleading.
> Easiest workaround is to configure the JVM heap to have the min size = max 
> size.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ACCUMULO-3330) Tserver "Running low on memory" appears more frequently than necessary when min heap != max heap

2014-11-13 Thread Jonathan Park (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-3330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14210106#comment-14210106
 ] 

Jonathan Park commented on ACCUMULO-3330:
-

Linked to the wrong issue.

> Tserver "Running low on memory" appears more frequently than necessary when 
> min heap != max heap
> 
>
> Key: ACCUMULO-3330
> URL: https://issues.apache.org/jira/browse/ACCUMULO-3330
> Project: Accumulo
>  Issue Type: Bug
>Reporter: Jonathan Park
>Priority: Minor
>
> I'm not sure if this is JVM specific behavior, but I suspect the way we 
> compute when to log "Running low on memory" could be improved. 
> Currently we use {{Runtime.getRuntime()}} and rely on the formula 
> {{freeMemory() < maxMemory() * 0.05}} to determine whether or not to log the 
> warning. With Oracle's HotSpot VM, {{freeMemory()}} appears to return the 
> amount of free memory relative to the current JVM heap size (as returned by 
> {{totalMemory()}}). If {{totalMemory()}} != {{maxMemory()}}, then this warning 
> will start appearing before it was intended to, which is misleading.
> Easiest workaround is to configure the JVM heap to have the min size = max 
> size.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ACCUMULO-3330) Tserver "Running low on memory" appears more frequently than necessary when min heap != max heap

2014-11-13 Thread Jonathan Park (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-3330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14210105#comment-14210105
 ] 

Jonathan Park commented on ACCUMULO-3330:
-

Didn't notice ACCUMULO-3320 earlier. This one is a duplicate of that one.

> Tserver "Running low on memory" appears more frequently than necessary when 
> min heap != max heap
> 
>
> Key: ACCUMULO-3330
> URL: https://issues.apache.org/jira/browse/ACCUMULO-3330
> Project: Accumulo
>  Issue Type: Bug
>Reporter: Jonathan Park
>Priority: Minor
>
> I'm not sure if this is JVM specific behavior, but I suspect the way we 
> compute when to log "Running low on memory" could be improved. 
> Currently we use {{Runtime.getRuntime()}} and rely on the formula 
> {{freeMemory() < maxMemory() * 0.05}} to determine whether or not to log the 
> warning. With Oracle's HotSpot VM, {{freeMemory()}} appears to return the 
> amount of free memory relative to the current JVM heap size (as returned by 
> {{totalMemory()}}). If {{totalMemory()}} != {{maxMemory()}}, then this warning 
> will start appearing before it was intended to, which is misleading.
> Easiest workaround is to configure the JVM heap to have the min size = max 
> size.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ACCUMULO-3330) Tserver "Running low on memory" appears more frequently than necessary when min heap != max heap

2014-11-13 Thread Jonathan Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-3330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Park updated ACCUMULO-3330:

Summary: Tserver "Running low on memory" appears more frequently than 
necessary when min heap != max heap  (was: Tserver "Running low on memory" 
might be miscomputed)

> Tserver "Running low on memory" appears more frequently than necessary when 
> min heap != max heap
> 
>
> Key: ACCUMULO-3330
> URL: https://issues.apache.org/jira/browse/ACCUMULO-3330
> Project: Accumulo
>  Issue Type: Bug
>Reporter: Jonathan Park
>Priority: Minor
>
> I'm not sure if this is JVM specific behavior, but I suspect the way we 
> compute when to log "Running low on memory" could be improved. 
> Currently we use {{Runtime.getRuntime()}} and rely on the formula 
> {{freeMemory() < maxMemory() * 0.05}} to determine whether or not to log the 
> warning. With Oracle's HotSpot VM, {{freeMemory()}} appears to return the 
> amount of free memory relative to the current JVM heap size (as returned by 
> {{totalMemory()}}). If {{totalMemory()}} != {{maxMemory()}}, then this warning 
> will start appearing before it was intended to, which is misleading.
> Easiest workaround is to configure the JVM heap to have the min size = max 
> size.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (ACCUMULO-3330) Tserver "Running low on memory" might be miscomputed

2014-11-13 Thread Jonathan Park (JIRA)
Jonathan Park created ACCUMULO-3330:
---

 Summary: Tserver "Running low on memory" might be miscomputed
 Key: ACCUMULO-3330
 URL: https://issues.apache.org/jira/browse/ACCUMULO-3330
 Project: Accumulo
  Issue Type: Bug
Reporter: Jonathan Park
Priority: Minor


I'm not sure if this is JVM specific behavior, but I suspect the way we compute 
when to log "Running low on memory" could be improved. 

Currently we use {{Runtime.getRuntime()}} and rely on the formula 
{{freeMemory() < maxMemory() * 0.05}} to determine whether or not to log the 
warning. With Oracle's HotSpot VM, {{freeMemory()}} appears to return the 
amount of free memory relative to the current JVM heap size (as returned by 
{{totalMemory()}}). If {{totalMemory()}} != {{maxMemory()}}, then this warning 
will start appearing before it was intended to, which is misleading.

Easiest workaround is to configure the JVM heap to have the min size = max size.
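
A minimal sketch contrasting the existing formula with an adjusted one that also 
counts the uncommitted portion of the heap (the adjusted check is just an 
illustration of the point above, not a proposed patch):

{code:java}
public class LowMemoryCheckSketch {
  public static void main(String[] args) {
    Runtime rt = Runtime.getRuntime();

    // Existing-style check: free space within the currently committed heap
    // (totalMemory()), against a threshold derived from the max heap.
    boolean lowCurrent = rt.freeMemory() < rt.maxMemory() * 0.05;

    // Adjusted check: also treat the not-yet-committed portion of the heap
    // (maxMemory() - totalMemory()) as available before warning.
    long available = rt.maxMemory() - rt.totalMemory() + rt.freeMemory();
    boolean lowAdjusted = available < rt.maxMemory() * 0.05;

    System.out.println("lowCurrent=" + lowCurrent + " lowAdjusted=" + lowAdjusted);
  }
}
{code}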



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ACCUMULO-2889) Batch metadata table updates for new walogs

2014-09-01 Thread Jonathan Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-2889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Park updated ACCUMULO-2889:

Attachment: ACCUMULO-2889.2.patch

I'll gather a new set of #s when I get access to a cluster of machines. 

> Batch metadata table updates for new walogs
> ---
>
> Key: ACCUMULO-2889
> URL: https://issues.apache.org/jira/browse/ACCUMULO-2889
> Project: Accumulo
>  Issue Type: Improvement
>Affects Versions: 1.5.1, 1.6.0
>Reporter: Jonathan Park
>Assignee: Jonathan Park
> Attachments: ACCUMULO-2889.0.patch.txt, ACCUMULO-2889.1.patch, 
> ACCUMULO-2889.2.patch, accumulo-2889-withpatch.png, 
> accumulo-2889_withoutpatch.png, batch_perf_test.sh, run_all.sh, 
> start-ingest.sh
>
>
> Currently, when we update the Metadata table with new loggers, we will update 
> the metadata for each tablet serially. We could optimize this to instead use 
> a batchwriter to send all metadata updates for all tablets in a batch.
> A few special cases include:
> - What if the !METADATA tablet was included in the batch?
> - What about the root tablet?
> Benefit:
> In one of our clusters, we're experiencing particularly slow HDFS operations 
> leading to large oscillations in ingest performance. We haven't isolated the 
> cause in HDFS but when we profile the tservers, we noticed that they were 
> waiting for metadata table operations to complete. This would target the 
> waiting.
> Potential downsides:
> Given the existing locking scheme, it looks like we may have to lock a tablet 
> for slightly longer (we'll lock for the duration of the batch).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ACCUMULO-2889) Batch metadata table updates for new walogs

2014-06-28 Thread Jonathan Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-2889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Park updated ACCUMULO-2889:


Attachment: accumulo-2889_withoutpatch.png
accumulo-2889-withpatch.png
ACCUMULO-2889.1.patch
start-ingest.sh
batch_perf_test.sh
run_all.sh

Results from performance tests:

Test design:
- Run continuous ingest with 4 ingesters, each ingesting 25 million entries, and 
then measure the time until completion.
- We varied the # of minor compactors and tablets per server (in retrospect, the 
# of minor compactors didn't really matter in these tests; it may have been 
better to vary the # of clients).
- Each trial was run 3x and the average was taken.

Tests were run on a single node (24 logical cores, 64 GB RAM, 8 drives)

||minc||tablets/server||w/o patch(ms)||w/ patch(ms)||ratio||
|4|32|269790.33|257537.33|0.95458325|
|12|32|271124.33|255952|0.94403922|
|12|320|355962.67|323737|0.90946896|
|24|32|268709|261362.67|0.97266065|
|24|320|355182.33|324308.67|0.91307659|

I'll try to run this on a multi-node cluster if I can get around to it.

> Batch metadata table updates for new walogs
> ---
>
> Key: ACCUMULO-2889
> URL: https://issues.apache.org/jira/browse/ACCUMULO-2889
> Project: Accumulo
>  Issue Type: Improvement
>Affects Versions: 1.5.1, 1.6.0
>Reporter: Jonathan Park
>Assignee: Jonathan Park
> Attachments: ACCUMULO-2889.0.patch.txt, ACCUMULO-2889.1.patch, 
> accumulo-2889-withpatch.png, accumulo-2889_withoutpatch.png, 
> batch_perf_test.sh, run_all.sh, start-ingest.sh
>
>
> Currently, when we update the Metadata table with new loggers, we will update 
> the metadata for each tablet serially. We could optimize this to instead use 
> a batchwriter to send all metadata updates for all tablets in a batch.
> A few special cases include:
> - What if the !METADATA tablet was included in the batch?
> - What about the root tablet?
> Benefit:
> In one of our clusters, we're experiencing particularly slow HDFS operations 
> leading to large oscillations in ingest performance. We haven't isolated the 
> cause in HDFS but when we profile the tservers, we noticed that they were 
> waiting for metadata table operations to complete. This would target the 
> waiting.
> Potential downsides:
> Given the existing locking scheme, it looks like we may have to lock a tablet 
> for slightly longer (we'll lock for the duration of the batch).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (ACCUMULO-2827) HeapIterator optimization

2014-06-24 Thread Jonathan Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-2827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Park updated ACCUMULO-2827:


Attachment: ACCUMULO-2827-compaction-performance-test.patch.2

I decided to re-run the tests, but this time running each case 10 times in order 
to get rid of some of the noise. I threw away the first few test results and 
averaged rates across the 10 runs. These #s look more promising and seem to 
better correlate with [~kturner]'s earlier results.

||files||rows||cols||rate w/o patch||rate w/ patch PQ||PQ speedup||
|10|100|1|430636.7|457304.2|1.061925749|
|10|10|10|550790.1|759692.6|1.379277877|
|10|1|100|584660.3|851496.9|1.456395962|
|20|50|1|397171|426878.5|1.074797757|
|20|5|10|509081.4|735482.6|1.44472495|
|1|1000|1|513712.2|539288|1.049786242|

[~dlmarion] I'm not entirely sure what was going on in my earlier tests, as we 
shouldn't have seen that large of a hit. The single-column case is a close 
approximation of the worst case for this optimization, since it should cause a 
high % of interleaving across iterators, which goes against the assumption 
behind the optimization. 

I've posted an updated patch (from Keith's original) that includes the test 
harness used for this test.

> HeapIterator optimization
> -
>
> Key: ACCUMULO-2827
> URL: https://issues.apache.org/jira/browse/ACCUMULO-2827
> Project: Accumulo
>  Issue Type: Improvement
>Affects Versions: 1.5.1, 1.6.0
>Reporter: Jonathan Park
>Assignee: Jonathan Park
>Priority: Minor
> Fix For: 1.5.2, 1.6.1, 1.7.0
>
> Attachments: ACCUMULO-2827-compaction-performance-test.patch, 
> ACCUMULO-2827-compaction-performance-test.patch.2, ACCUMULO-2827.0.patch.txt, 
> ACCUMULO-2827.1.patch.txt, ACCUMULO-2827.2.patch.txt, 
> ACCUMULO-2827.3.patch.txt, BenchmarkMultiIterator.java, 
> accumulo-2827.raw_data, new_heapiter.png, old_heapiter.png, together.png
>
>
> We've been running a few performance tests of our iterator stack and noticed 
> a decent amount of time spent in the HeapIterator specifically related to 
> add/removal into the heap.
> This may not be a general enough optimization but we thought we'd see what 
> people thought. Our assumption is that it's more probable that the current 
> "top iterator" will supply the next value in the iteration than not. The 
> current implementation takes the other assumption by always removing + 
> inserting the minimum iterator back into the heap. With the implementation of 
> a binary heap that we're using, this can get costly if our assumption is 
> wrong because we pay the log penalty of percolating up the iterator in the 
> heap upon insertion and again when percolating down upon removal.
> We believe our assumption is a fair one to hold given that as major 
> compactions create a log distribution of file sizes, it's likely that we may 
> see a long chain of consecutive entries coming from 1 iterator. 
> Understandably, taking this assumption comes at an additional cost in the 
> case that we're wrong. Therefore, we've run a few benchmarking tests to see 
> how much of a cost we pay as well as what kind of benefit we see. I've 
> attached a potential patch (which includes a test harness) + image that 
> captures the results of our tests. The x-axis represents # of repeated keys 
> before switching to another iterator. The y-axis represents iteration time. 
> The sets of blue + red lines vary in the # of iterators present in the heap.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (ACCUMULO-2827) HeapIterator optimization

2014-06-23 Thread Jonathan Park (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-2827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041441#comment-14041441
 ] 

Jonathan Park commented on ACCUMULO-2827:
-

Re-ran the tests. 3 runs each (with first few test results thrown away) + 
averaged rates across the 3 runs:

||files||rows||cols||rate w/o patch||rate w/ patch PQ||PQ speedup||
|10|100|1|431762|417590|0.9671763864|
|10|10|10|575699.6667|675052|1.172577252|
|10|1|100|623398.6667|721883|1.157980233|
|20|50|1|421120|391037|0.9285635697|
|20|5|10|546200.6667|643364.6667|1.177890665|
|1|1000|1|521046|483500.6667|0.9279417889|

> HeapIterator optimization
> -
>
> Key: ACCUMULO-2827
> URL: https://issues.apache.org/jira/browse/ACCUMULO-2827
> Project: Accumulo
>  Issue Type: Improvement
>Affects Versions: 1.5.1, 1.6.0
>Reporter: Jonathan Park
>Assignee: Jonathan Park
>Priority: Minor
> Fix For: 1.5.2, 1.6.1, 1.7.0
>
> Attachments: ACCUMULO-2827-compaction-performance-test.patch, 
> ACCUMULO-2827.0.patch.txt, ACCUMULO-2827.1.patch.txt, 
> ACCUMULO-2827.2.patch.txt, ACCUMULO-2827.3.patch.txt, 
> BenchmarkMultiIterator.java, accumulo-2827.raw_data, new_heapiter.png, 
> old_heapiter.png, together.png
>
>
> We've been running a few performance tests of our iterator stack and noticed 
> a decent amount of time spent in the HeapIterator specifically related to 
> add/removal into the heap.
> This may not be a general enough optimization but we thought we'd see what 
> people thought. Our assumption is that it's more probable that the current 
> "top iterator" will supply the next value in the iteration than not. The 
> current implementation takes the other assumption by always removing + 
> inserting the minimum iterator back into the heap. With the implementation of 
> a binary heap that we're using, this can get costly if our assumption is 
> wrong because we pay the log penalty of percolating up the iterator in the 
> heap upon insertion and again when percolating down upon removal.
> We believe our assumption is a fair one to hold given that as major 
> compactions create a log distribution of file sizes, it's likely that we may 
> see a long chain of consecutive entries coming from 1 iterator. 
> Understandably, taking this assumption comes at an additional cost in the 
> case that we're wrong. Therefore, we've run a few benchmarking tests to see 
> how much of a cost we pay as well as what kind of benefit we see. I've 
> attached a potential patch (which includes a test harness) + image that 
> captures the results of our tests. The x-axis represents # of repeated keys 
> before switching to another iterator. The y-axis represents iteration time. 
> The sets of blue + red lines vary in the # of iterators present in the heap.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (ACCUMULO-2827) HeapIterator optimization

2014-06-23 Thread Jonathan Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-2827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Park updated ACCUMULO-2827:


Attachment: BenchmarkMultiIterator.java
ACCUMULO-2827.3.patch.txt

ACCUMULO-2827.3.patch.txt:

Only contains the HeapIterator changes, which use java.util.PriorityQueue instead 
of org.apache.commons.collections.buffer.PriorityBuffer.
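
For readers skimming the ticket, a simplified k-way-merge sketch of the idea 
(this is not the actual HeapIterator code; the Source interface and names are 
made up) using java.util.PriorityQueue and the lazy re-insert described in the 
issue:

{code:java}
import java.util.PriorityQueue;

public class LazyMergeSketch<K extends Comparable<K>> {
  public interface Source<K> {
    boolean hasTop();
    K top();
    void next();
  }

  private final PriorityQueue<Source<K>> heap =
      new PriorityQueue<Source<K>>(11, (a, b) -> a.top().compareTo(b.top()));
  private Source<K> current;  // source that produced the last minimum key

  public void add(Source<K> s) {
    if (s.hasTop())
      heap.add(s);
  }

  // Return the next key in sorted order across all sources.
  public K nextKey() {
    if (current == null)
      current = heap.poll();
    if (current == null)
      return null;  // all sources exhausted
    K result = current.top();
    current.next();
    Source<K> challenger = heap.peek();
    // Only pay the O(log n) re-insert when 'current' no longer holds the minimum.
    if (!current.hasTop()
        || (challenger != null && challenger.top().compareTo(current.top()) < 0)) {
      if (current.hasTop())
        heap.add(current);
      current = heap.poll();
    }
    return result;
  }
}
{code}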

> HeapIterator optimization
> -
>
> Key: ACCUMULO-2827
> URL: https://issues.apache.org/jira/browse/ACCUMULO-2827
> Project: Accumulo
>  Issue Type: Improvement
>Affects Versions: 1.5.1, 1.6.0
>Reporter: Jonathan Park
>Assignee: Jonathan Park
>Priority: Minor
> Fix For: 1.5.2, 1.6.1, 1.7.0
>
> Attachments: ACCUMULO-2827-compaction-performance-test.patch, 
> ACCUMULO-2827.0.patch.txt, ACCUMULO-2827.1.patch.txt, 
> ACCUMULO-2827.2.patch.txt, ACCUMULO-2827.3.patch.txt, 
> BenchmarkMultiIterator.java, accumulo-2827.raw_data, new_heapiter.png, 
> old_heapiter.png, together.png
>
>
> We've been running a few performance tests of our iterator stack and noticed 
> a decent amount of time spent in the HeapIterator specifically related to 
> add/removal into the heap.
> This may not be a general enough optimization but we thought we'd see what 
> people thought. Our assumption is that it's more probable that the current 
> "top iterator" will supply the next value in the iteration than not. The 
> current implementation takes the other assumption by always removing + 
> inserting the minimum iterator back into the heap. With the implementation of 
> a binary heap that we're using, this can get costly if our assumption is 
> wrong because we pay the log penalty of percolating up the iterator in the 
> heap upon insertion and again when percolating down upon removal.
> We believe our assumption is a fair one to hold given that as major 
> compactions create a log distribution of file sizes, it's likely that we may 
> see a long chain of consecutive entries coming from 1 iterator. 
> Understandably, taking this assumption comes at an additional cost in the 
> case that we're wrong. Therefore, we've run a few benchmarking tests to see 
> how much of a cost we pay as well as what kind of benefit we see. I've 
> attached a potential patch (which includes a test harness) + image that 
> captures the results of our tests. The x-axis represents # of repeated keys 
> before switching to another iterator. The y-axis represents iteration time. 
> The sets of blue + red lines vary in the # of iterators present in the heap.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (ACCUMULO-2827) HeapIterator optimization

2014-06-23 Thread Jonathan Park (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-2827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041141#comment-14041141
 ] 

Jonathan Park commented on ACCUMULO-2827:
-

Results of performance tests (I also re-ran [~kturner]'s cases for comparison):
pq => priorityqueue
pb => prioritybuffer

||files||rows per file||cols per row||rate w/o patch||rate w/ patch pq||pq speedup||rate w/ patch pb||pb speedup||
|10|1,000,000|1| 449,355 |418,128| .93 |433,477|.965|
|10|100,000|10| 598,155 |678,698| 1.135 | 667,465|1.116|
|10|10,000|100| 641,190 |739,809| 1.154 |729,501|1.138|
|20|500,000|1| 405,915 |400,614| .987 | 405,571|.999|
|20|50,000|10| 551,997 |659,932| 1.196 | 643,250|1.165|
|1|10,000,000|1| 506,719 |483,178| .954 | 517,362|1.021|

Not entirely sure why PriorityQueue is performing worse than PriorityBuffer in 
the worst cases (high interleaving). Might just be noise in the tests?

> HeapIterator optimization
> -
>
> Key: ACCUMULO-2827
> URL: https://issues.apache.org/jira/browse/ACCUMULO-2827
> Project: Accumulo
>  Issue Type: Improvement
>Affects Versions: 1.5.1, 1.6.0
>Reporter: Jonathan Park
>Assignee: Jonathan Park
>Priority: Minor
> Fix For: 1.5.2, 1.6.1, 1.7.0
>
> Attachments: ACCUMULO-2827-compaction-performance-test.patch, 
> ACCUMULO-2827.0.patch.txt, ACCUMULO-2827.1.patch.txt, 
> ACCUMULO-2827.2.patch.txt, accumulo-2827.raw_data, new_heapiter.png, 
> old_heapiter.png, together.png
>
>
> We've been running a few performance tests of our iterator stack and noticed 
> a decent amount of time spent in the HeapIterator specifically related to 
> add/removal into the heap.
> This may not be a general enough optimization but we thought we'd see what 
> people thought. Our assumption is that it's more probable that the current 
> "top iterator" will supply the next value in the iteration than not. The 
> current implementation takes the other assumption by always removing + 
> inserting the minimum iterator back into the heap. With the implementation of 
> a binary heap that we're using, this can get costly if our assumption is 
> wrong because we pay the log penalty of percolating up the iterator in the 
> heap upon insertion and again when percolating down upon removal.
> We believe our assumption is a fair one to hold given that as major 
> compactions create a log distribution of file sizes, it's likely that we may 
> see a long chain of consecutive entries coming from 1 iterator. 
> Understandably, taking this assumption comes at an additional cost in the 
> case that we're wrong. Therefore, we've run a few benchmarking tests to see 
> how much of a cost we pay as well as what kind of benefit we see. I've 
> attached a potential patch (which includes a test harness) + image that 
> captures the results of our tests. The x-axis represents # of repeated keys 
> before switching to another iterator. The y-axis represents iteration time. 
> The sets of blue + red lines vary in the # of iterators present in the heap.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (ACCUMULO-2827) HeapIterator optimization

2014-06-23 Thread Jonathan Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-2827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Park updated ACCUMULO-2827:


Attachment: ACCUMULO-2827.2.patch.txt

Removed trailing whitespace on all lines. 

I'm currently running [~kturner]'s tests to make sure changing from a 
PriorityBuffer to a PriorityQueue doesn't affect performance too much.

> HeapIterator optimization
> -
>
> Key: ACCUMULO-2827
> URL: https://issues.apache.org/jira/browse/ACCUMULO-2827
> Project: Accumulo
>  Issue Type: Improvement
>Affects Versions: 1.5.1, 1.6.0
>Reporter: Jonathan Park
>Assignee: Jonathan Park
>Priority: Minor
> Fix For: 1.5.2, 1.6.1, 1.7.0
>
> Attachments: ACCUMULO-2827-compaction-performance-test.patch, 
> ACCUMULO-2827.0.patch.txt, ACCUMULO-2827.1.patch.txt, 
> ACCUMULO-2827.2.patch.txt, accumulo-2827.raw_data, new_heapiter.png, 
> old_heapiter.png, together.png
>
>
> We've been running a few performance tests of our iterator stack and noticed 
> a decent amount of time spent in the HeapIterator specifically related to 
> add/removal into the heap.
> This may not be a general enough optimization but we thought we'd see what 
> people thought. Our assumption is that it's more probable that the current 
> "top iterator" will supply the next value in the iteration than not. The 
> current implementation takes the other assumption by always removing + 
> inserting the minimum iterator back into the heap. With the implementation of 
> a binary heap that we're using, this can get costly if our assumption is 
> wrong because we pay the log penalty of percolating up the iterator in the 
> heap upon insertion and again when percolating down upon removal.
> We believe our assumption is a fair one to hold given that as major 
> compactions create a log distribution of file sizes, it's likely that we may 
> see a long chain of consecutive entries coming from 1 iterator. 
> Understandably, taking this assumption comes at an additional cost in the 
> case that we're wrong. Therefore, we've run a few benchmarking tests to see 
> how much of a cost we pay as well as what kind of benefit we see. I've 
> attached a potential patch (which includes a test harness) + image that 
> captures the results of our tests. The x-axis represents # of repeated keys 
> before switching to another iterator. The y-axis represents iteration time. 
> The sets of blue + red lines vary in the # of iterators present in the heap.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (ACCUMULO-2827) HeapIterator optimization

2014-06-23 Thread Jonathan Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-2827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Park updated ACCUMULO-2827:


Attachment: ACCUMULO-2827.1.patch.txt

Updating patch to use PriorityQueue instead of PriorityBuffer.

> HeapIterator optimization
> -
>
> Key: ACCUMULO-2827
> URL: https://issues.apache.org/jira/browse/ACCUMULO-2827
> Project: Accumulo
>  Issue Type: Improvement
>Affects Versions: 1.5.1, 1.6.0
>Reporter: Jonathan Park
>Assignee: Jonathan Park
>Priority: Minor
> Fix For: 1.5.2, 1.6.1, 1.7.0
>
> Attachments: ACCUMULO-2827-compaction-performance-test.patch, 
> ACCUMULO-2827.0.patch.txt, ACCUMULO-2827.1.patch.txt, accumulo-2827.raw_data, 
> new_heapiter.png, old_heapiter.png, together.png
>
>
> We've been running a few performance tests of our iterator stack and noticed 
> a decent amount of time spent in the HeapIterator specifically related to 
> add/removal into the heap.
> This may not be a general enough optimization but we thought we'd see what 
> people thought. Our assumption is that it's more probable that the current 
> "top iterator" will supply the next value in the iteration than not. The 
> current implementation takes the other assumption by always removing + 
> inserting the minimum iterator back into the heap. With the implementation of 
> a binary heap that we're using, this can get costly if our assumption is 
> wrong because we pay the log penalty of percolating up the iterator in the 
> heap upon insertion and again when percolating down upon removal.
> We believe our assumption is a fair one to hold given that as major 
> compactions create a log distribution of file sizes, it's likely that we may 
> see a long chain of consecutive entries coming from 1 iterator. 
> Understandably, taking this assumption comes at an additional cost in the 
> case that we're wrong. Therefore, we've run a few benchmarking tests to see 
> how much of a cost we pay as well as what kind of benefit we see. I've 
> attached a potential patch (which includes a test harness) + image that 
> captures the results of our tests. The x-axis represents # of repeated keys 
> before switching to another iterator. The y-axis represents iteration time. 
> The sets of blue + red lines vary in the # of iterators present in the heap.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (ACCUMULO-2827) HeapIterator optimization

2014-06-18 Thread Jonathan Park (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-2827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14036680#comment-14036680
 ] 

Jonathan Park commented on ACCUMULO-2827:
-

Sorry I've been delayed in my responses.

I agree with [~kturner] on the lack of performance gain due to the uniform 
random data. Thanks for volunteering to run the test with multiple columns per 
row, Keith! :) Looking forward to seeing the results!

> HeapIterator optimization
> -
>
> Key: ACCUMULO-2827
> URL: https://issues.apache.org/jira/browse/ACCUMULO-2827
> Project: Accumulo
>  Issue Type: Improvement
>Affects Versions: 1.5.1, 1.6.0
>Reporter: Jonathan Park
>Assignee: Jonathan Park
>Priority: Minor
> Fix For: 1.5.2, 1.6.1, 1.7.0
>
> Attachments: ACCUMULO-2827.0.patch.txt, accumulo-2827.raw_data, 
> new_heapiter.png, old_heapiter.png, together.png
>
>
> We've been running a few performance tests of our iterator stack and noticed 
> a decent amount of time spent in the HeapIterator specifically related to 
> add/removal into the heap.
> This may not be a general enough optimization but we thought we'd see what 
> people thought. Our assumption is that it's more probable that the current 
> "top iterator" will supply the next value in the iteration than not. The 
> current implementation takes the other assumption by always removing + 
> inserting the minimum iterator back into the heap. With the implementation of 
> a binary heap that we're using, this can get costly if our assumption is 
> wrong because we pay the log penalty of percolating up the iterator in the 
> heap upon insertion and again when percolating down upon removal.
> We believe our assumption is a fair one to hold given that as major 
> compactions create a log distribution of file sizes, it's likely that we may 
> see a long chain of consecutive entries coming from 1 iterator. 
> Understandably, taking this assumption comes at an additional cost in the 
> case that we're wrong. Therefore, we've run a few benchmarking tests to see 
> how much of a cost we pay as well as what kind of benefit we see. I've 
> attached a potential patch (which includes a test harness) + image that 
> captures the results of our tests. The x-axis represents # of repeated keys 
> before switching to another iterator. The y-axis represents iteration time. 
> The sets of blue + red lines vary in the # of iterators present in the heap.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (ACCUMULO-2889) Batch metadata table updates for new walogs

2014-06-16 Thread Jonathan Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-2889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Park updated ACCUMULO-2889:


Attachment: ACCUMULO-2889.0.patch.txt

> Batch metadata table updates for new walogs
> ---
>
> Key: ACCUMULO-2889
> URL: https://issues.apache.org/jira/browse/ACCUMULO-2889
> Project: Accumulo
>  Issue Type: Bug
>Affects Versions: 1.5.1, 1.6.0
>Reporter: Jonathan Park
>Assignee: Jonathan Park
> Attachments: ACCUMULO-2889.0.patch.txt
>
>
> Currently, when we update the Metadata table with new loggers, we will update 
> the metadata for each tablet serially. We could optimize this to instead use 
> a batchwriter to send all metadata updates for all tablets in a batch.
> A few special cases include:
> - What if the !METADATA tablet was included in the batch?
> - What about the root tablet?
> Benefit:
> In one of our clusters, we're experiencing particularly slow HDFS operations 
> leading to large oscillations in ingest performance. We haven't isolated the 
> cause in HDFS but when we profile the tservers, we noticed that they were 
> waiting for metadata table operations to complete. This would target the 
> waiting.
> Potential downsides:
> Given the existing locking scheme, it looks like we may have to lock a tablet 
> for slightly longer (we'll lock for the duration of the batch).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (ACCUMULO-2889) Batch metadata table updates for new walogs

2014-06-16 Thread Jonathan Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-2889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Park updated ACCUMULO-2889:


Affects Version/s: 1.6.0
   Status: Patch Available  (was: In Progress)

First pass at batching metadata updates for new WALs. I'll attach a screenshot 
of its effects as well.

> Batch metadata table updates for new walogs
> ---
>
> Key: ACCUMULO-2889
> URL: https://issues.apache.org/jira/browse/ACCUMULO-2889
> Project: Accumulo
>  Issue Type: Bug
>Affects Versions: 1.6.0, 1.5.1
>Reporter: Jonathan Park
>Assignee: Jonathan Park
> Attachments: ACCUMULO-2889.0.patch.txt
>
>
> Currently, when we update the Metadata table with new loggers, we will update 
> the metadata for each tablet serially. We could optimize this to instead use 
> a batchwriter to send all metadata updates for all tablets in a batch.
> A few special cases include:
> - What if the !METADATA tablet was included in the batch?
> - What about the root tablet?
> Benefit:
> In one of our clusters, we're experiencing particularly slow HDFS operations 
> leading to large oscillations in ingest performance. We haven't isolated the 
> cause in HDFS but when we profile the tservers, we noticed that they were 
> waiting for metadata table operations to complete. This would target the 
> waiting.
> Potential downsides:
> Given the existing locking scheme, it looks like we may have to lock a tablet 
> for slightly longer (we'll lock for the duration of the batch).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (ACCUMULO-2827) HeapIterator optimization

2014-06-12 Thread Jonathan Park (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-2827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14029547#comment-14029547
 ] 

Jonathan Park commented on ACCUMULO-2827:
-

This was only a single compaction using each of the old and new iterators.

Any more details on hooking up a profiler and seeing how long the HeapIterator 
takes? By profiler, do you mean something like jvisualvm? Do you want us to try 
and profile a major compaction as it's running in a tserver vs. in a test 
harness? How would you like the profiler hooked up? Typically we've found that 
attaching profilers to the iterators greatly affects the performance. It should 
still show the benefit from this change, though. Would those results be valid?

> HeapIterator optimization
> -
>
> Key: ACCUMULO-2827
> URL: https://issues.apache.org/jira/browse/ACCUMULO-2827
> Project: Accumulo
>  Issue Type: Improvement
>Affects Versions: 1.5.1, 1.6.0
>Reporter: Jonathan Park
>Assignee: Jonathan Park
>Priority: Minor
> Fix For: 1.5.2, 1.6.1, 1.7.0
>
> Attachments: ACCUMULO-2827.0.patch.txt, accumulo-2827.raw_data, 
> new_heapiter.png, old_heapiter.png, together.png
>
>
> We've been running a few performance tests of our iterator stack and noticed 
> a decent amount of time spent in the HeapIterator specifically related to 
> add/removal into the heap.
> This may not be a general enough optimization but we thought we'd see what 
> people thought. Our assumption is that it's more probable that the current 
> "top iterator" will supply the next value in the iteration than not. The 
> current implementation takes the other assumption by always removing + 
> inserting the minimum iterator back into the heap. With the implementation of 
> a binary heap that we're using, this can get costly if our assumption is 
> wrong because we pay the log penalty of percolating up the iterator in the 
> heap upon insertion and again when percolating down upon removal.
> We believe our assumption is a fair one to hold given that as major 
> compactions create a log distribution of file sizes, it's likely that we may 
> see a long chain of consecutive entries coming from 1 iterator. 
> Understandably, taking this assumption comes at an additional cost in the 
> case that we're wrong. Therefore, we've run a few benchmarking tests to see 
> how much of a cost we pay as well as what kind of benefit we see. I've 
> attached a potential patch (which includes a test harness) + image that 
> captures the results of our tests. The x-axis represents # of repeated keys 
> before switching to another iterator. The y-axis represents iteration time. 
> The sets of blue + red lines vary in the # of iterators present in the heap.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (ACCUMULO-2827) HeapIterator optimization

2014-06-11 Thread Jonathan Park (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-2827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14028114#comment-14028114
 ] 

Jonathan Park commented on ACCUMULO-2827:
-

Results of accumulo continuous ingest (against 1.5.1 on hadoop-2.2.0). Tests 
were run against a 12 physical-core, 64GB RAM, 8 drive single-node machine:

Test:
- Ingest roughly 1 billion entries (set NUM=1,000,000,000 (without commas))
- Pre-split into 8 tablets
- table.split.threshold=100G (Avoid splits so we can have more entries per 
tablet)
- table.compaction.major.ratio=4
- table.file.max=10
- tserver.compaction.major.concurrent.max=9 (enough to have all compactions 
running concurrently)
- tserver.compaction.major.thread.files.open.max=20 (all files open at once 
during majc)
- tserver.memory.maps.max=4G

We only used 1 ingester instance (so a single batchwriter thread).

Results:
After ingest completed, we triggered a full majc and timed how long it took to 
complete.
{noformat}
time accumulo shell -u root -p  -e 'compact -t ci -w'
{noformat}

1.5.1 old heap iterator
{noformat}
real    21m48.785s
user    0m6.014s
sys     0m0.475s
{noformat}

1.5.1 new heap iterator
{noformat}
real    20m45.002s
user    0m5.693s
sys     0m0.456s
{noformat}

> HeapIterator optimization
> -
>
> Key: ACCUMULO-2827
> URL: https://issues.apache.org/jira/browse/ACCUMULO-2827
> Project: Accumulo
>  Issue Type: Improvement
>Affects Versions: 1.5.1, 1.6.0
>Reporter: Jonathan Park
>Assignee: Jonathan Park
>Priority: Minor
> Fix For: 1.5.2, 1.6.1, 1.7.0
>
> Attachments: ACCUMULO-2827.0.patch.txt, accumulo-2827.raw_data, 
> new_heapiter.png, old_heapiter.png, together.png
>
>
> We've been running a few performance tests of our iterator stack and noticed 
> a decent amount of time spent in the HeapIterator specifically related to 
> add/removal into the heap.
> This may not be a general enough optimization but we thought we'd see what 
> people thought. Our assumption is that it's more probable that the current 
> "top iterator" will supply the next value in the iteration than not. The 
> current implementation takes the other assumption by always removing + 
> inserting the minimum iterator back into the heap. With the implementation of 
> a binary heap that we're using, this can get costly if our assumption is 
> wrong because we pay the log penalty of percolating up the iterator in the 
> heap upon insertion and again when percolating down upon removal.
> We believe our assumption is a fair one to hold given that as major 
> compactions create a log distribution of file sizes, it's likely that we may 
> see a long chain of consecutive entries coming from 1 iterator. 
> Understandably, taking this assumption comes at an additional cost in the 
> case that we're wrong. Therefore, we've run a few benchmarking tests to see 
> how much of a cost we pay as well as what kind of benefit we see. I've 
> attached a potential patch (which includes a test harness) + image that 
> captures the results of our tests. The x-axis represents # of repeated keys 
> before switching to another iterator. The y-axis represents iteration time. 
> The sets of blue + red lines vary in the # of iterators present in the heap.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (ACCUMULO-2889) Batch metadata table updates for new walogs

2014-06-10 Thread Jonathan Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-2889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Park reassigned ACCUMULO-2889:
---

Assignee: Jonathan Park

> Batch metadata table updates for new walogs
> ---
>
> Key: ACCUMULO-2889
> URL: https://issues.apache.org/jira/browse/ACCUMULO-2889
> Project: Accumulo
>  Issue Type: Bug
>Affects Versions: 1.5.1
>Reporter: Jonathan Park
>Assignee: Jonathan Park
>
> Currently, when we update the Metadata table with new loggers, we will update 
> the metadata for each tablet serially. We could optimize this to instead use 
> a batchwriter to send all metadata updates for all tablets in a batch.
> A few special cases include:
> - What if the !METADATA tablet was included in the batch?
> - What about the root tablet?
> Benefit:
> In one of our clusters, we're experiencing particularly slow HDFS operations 
> leading to large oscillations in ingest performance. We haven't isolated the 
> cause in HDFS but when we profile the tservers, we noticed that they were 
> waiting for metadata table operations to complete. This would target the 
> waiting.
> Potential downsides:
> Given the existing locking scheme, it looks like we may have to lock a tablet 
> for slightly longer (we'll lock for the duration of the batch).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (ACCUMULO-2827) HeapIterator optimization

2014-06-10 Thread Jonathan Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-2827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Park updated ACCUMULO-2827:


Attachment: accumulo-2827.raw_data

Attaching raw data for the together.png image. Values are in ms so there may be 
some amount of noise involved.

Continuous ingest tests are still running. Sorry for the delay.

> HeapIterator optimization
> -
>
> Key: ACCUMULO-2827
> URL: https://issues.apache.org/jira/browse/ACCUMULO-2827
> Project: Accumulo
>  Issue Type: Improvement
>Affects Versions: 1.5.1, 1.6.0
>Reporter: Jonathan Park
>Assignee: Jonathan Park
>Priority: Minor
> Fix For: 1.5.2, 1.6.1, 1.7.0
>
> Attachments: ACCUMULO-2827.0.patch.txt, accumulo-2827.raw_data, 
> new_heapiter.png, old_heapiter.png, together.png
>
>
> We've been running a few performance tests of our iterator stack and noticed 
> a decent amount of time spent in the HeapIterator specifically related to 
> add/removal into the heap.
> This may not be a general enough optimization but we thought we'd see what 
> people thought. Our assumption is that it's more probable that the current 
> "top iterator" will supply the next value in the iteration than not. The 
> current implementation takes the other assumption by always removing + 
> inserting the minimum iterator back into the heap. With the implementation of 
> a binary heap that we're using, this can get costly if our assumption is 
> wrong because we pay the log penalty of percolating up the iterator in the 
> heap upon insertion and again when percolating down upon removal.
> We believe our assumption is a fair one to hold given that as major 
> compactions create a log distribution of file sizes, it's likely that we may 
> see a long chain of consecutive entries coming from 1 iterator. 
> Understandably, taking this assumption comes at an additional cost in the 
> case that we're wrong. Therefore, we've run a few benchmarking tests to see 
> how much of a cost we pay as well as what kind of benefit we see. I've 
> attached a potential patch (which includes a test harness) + image that 
> captures the results of our tests. The x-axis represents # of repeated keys 
> before switching to another iterator. The y-axis represents iteration time. 
> The sets of blue + red lines vary in the # of iterators present in the heap.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (ACCUMULO-2801) define tablet syncs walog for each tablet in a batch

2014-06-10 Thread Jonathan Park (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-2801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14027036#comment-14027036
 ] 

Jonathan Park commented on ACCUMULO-2801:
-

Oops, sorry for the spam... I wasn't sure whether I had referenced the right 
Keith Turner in the first comment, so I wanted to link [~kturner].

> define tablet syncs walog for each tablet in a batch
> 
>
> Key: ACCUMULO-2801
> URL: https://issues.apache.org/jira/browse/ACCUMULO-2801
> Project: Accumulo
>  Issue Type: Bug
>Affects Versions: 1.5.0, 1.5.1, 1.6.0
>Reporter: Keith Turner
>
> When the batch writer sends a batch of mutations for N tablets that were not 
> currently using a walog, then define tablet will be called for each tablet.  
> Define tablet will sync the walog.   In hadoop 2 hsync is used, which is much 
> slower than hadoop1 sync calls.  If hsync takes 50ms and there are 100 
> tablets, then this operation would take 5 secs.  The calls to define tablet 
> do not occur frequently, just when walogs switch or tablets are loaded so the 
> cost will be amortized.  Ideally there could be one walog sync call for all 
> of the tablets in a batch of mutations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (ACCUMULO-2801) define tablet syncs walog for each tablet in a batch

2014-06-10 Thread Jonathan Park (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-2801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14027032#comment-14027032
 ] 

Jonathan Park commented on ACCUMULO-2801:
-

[~kturner]

> define tablet syncs walog for each tablet in a batch
> 
>
> Key: ACCUMULO-2801
> URL: https://issues.apache.org/jira/browse/ACCUMULO-2801
> Project: Accumulo
>  Issue Type: Bug
>Affects Versions: 1.5.0, 1.5.1, 1.6.0
>Reporter: Keith Turner
>
> When the batch writer sends a batch of mutations for N tablets that were not 
> currently using a walog, then define tablet will be called for each tablet.  
> Define tablet will sync the walog.   In hadoop 2 hsync is used, which is much 
> slower than hadoop1 sync calls.  If hsync takes 50ms and there are 100 
> tablets, then this operation would take 5 secs.  The calls to define tablet 
> do not occur frequently, just when walogs switch or tablets are loaded so the 
> cost will be amortized.  Ideally there could be one walog sync call for all 
> of the tablets in a batch of mutations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (ACCUMULO-2889) Batch metadata table updates for new walogs

2014-06-10 Thread Jonathan Park (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-2889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14027009#comment-14027009
 ] 

Jonathan Park commented on ACCUMULO-2889:
-

Our current proposal:

TabletServer client threads:

order(commitSessions)  // order commit sessions consistently to avoid deadlock
                       // across multiple client threads

batch.start
foreach tablet:
  tablet.logLock.lock
  if (tablet.mustRegisterNewLoggers) then
    defineTablet(tablet)                    // write WAL entry for tablet
    tablet.addLoggerToMetadataBatch(batch)
    // hold onto the lock
  else
    tablet.logLock.release

batch.flush
release(allCurrentlyHeldLocks);
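
Below is a rough, self-contained Java sketch of the same idea, in case it helps 
the discussion. It is only illustrative: Tablet, the list-based "batch", and the 
method names are placeholders rather than the actual tserver classes. The points 
it tries to show are the consistent lock-acquisition order, the single flush for 
the whole batch, and releasing only the locks that were held across the flush.

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of the proposal above; not Accumulo code.
final class BatchedLoggerUpdateSketch {

  static final class Tablet {
    final String extent;
    final ReentrantLock logLock = new ReentrantLock();
    volatile boolean mustRegisterNewLoggers = true;

    Tablet(String extent) { this.extent = extent; }

    void defineTablet() { /* append the DEFINE_TABLET entry to the WAL */ }

    void addLoggerToMetadataBatch(List<String> batch) {
      batch.add("new logger entry for " + extent);  // stand-in for a Mutation
    }
  }

  static void registerLoggers(List<Tablet> tablets) {
    // Lock tablets in a stable order so concurrent client threads cannot
    // deadlock against each other.
    tablets.sort(Comparator.comparing((Tablet t) -> t.extent));

    List<String> batch = new ArrayList<>();
    List<Tablet> held = new ArrayList<>();
    try {
      for (Tablet tablet : tablets) {
        tablet.logLock.lock();
        if (tablet.mustRegisterNewLoggers) {
          tablet.defineTablet();
          tablet.addLoggerToMetadataBatch(batch);
          held.add(tablet);          // hold the lock until the batch is flushed
        } else {
          tablet.logLock.unlock();   // nothing to register for this tablet
        }
      }
      flush(batch);                  // one metadata write for the whole batch
    } finally {
      for (Tablet tablet : held) {
        tablet.logLock.unlock();
      }
    }
  }

  static void flush(List<String> batch) {
    // stand-in for a BatchWriter sending all metadata mutations at once
  }
}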

> Batch metadata table updates for new walogs
> ---
>
> Key: ACCUMULO-2889
> URL: https://issues.apache.org/jira/browse/ACCUMULO-2889
> Project: Accumulo
>  Issue Type: Bug
>Affects Versions: 1.5.1
>Reporter: Jonathan Park
>
> Currently, when we update the Metadata table with new loggers, we will update 
> the metadata for each tablet serially. We could optimize this to instead use 
> a batchwriter to send all metadata updates for all tablets in a batch.
> A few special cases include:
> - What if the !METADATA tablet was included in the batch?
> - What about the root tablet?
> Benefit:
> In one of our clusters, we're experiencing particularly slow HDFS operations 
> leading to large oscillations in ingest performance. We haven't isolated the 
> cause in HDFS but when we profile the tservers, we noticed that they were 
> waiting for metadata table operations to complete. This would target the 
> waiting.
> Potential downsides:
> Given the existing locking scheme, it looks like we may have to lock a tablet 
> for slightly longer (we'll lock for the duration of the batch).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (ACCUMULO-2889) Batch metadata table updates for new walogs

2014-06-10 Thread Jonathan Park (JIRA)
Jonathan Park created ACCUMULO-2889:
---

 Summary: Batch metadata table updates for new walogs
 Key: ACCUMULO-2889
 URL: https://issues.apache.org/jira/browse/ACCUMULO-2889
 Project: Accumulo
  Issue Type: Bug
Affects Versions: 1.5.1
Reporter: Jonathan Park


Currently, when we update the Metadata table with new loggers, we will update 
the metadata for each tablet serially. We could optimize this to instead use a 
batchwriter to send all metadata updates for all tablets in a batch.

A few special cases include:
- What if the !METADATA tablet was included in the batch?
- What about the root tablet?

Benefit:
In one of our clusters, we're experiencing particularly slow HDFS operations 
leading to large oscillations in ingest performance. We haven't isolated the 
cause in HDFS, but when we profiled the tservers, we noticed that they were 
waiting for metadata table operations to complete. This change would target 
that waiting.

Potential downsides:
Given the existing locking scheme, it looks like we may have to lock a tablet 
for slightly longer (we'll lock for the duration of the batch).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (ACCUMULO-2801) define tablet syncs walog for each tablet in a batch

2014-06-10 Thread Jonathan Park (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-2801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14026951#comment-14026951
 ] 

Jonathan Park commented on ACCUMULO-2801:
-

[~keith_turner] what are your thoughts on not calling sync for define tablet 
and instead relying on the sync for a data write to ensure that it exists?

It would make it possible for a metadata table entry for the WAL to exist 
without an associated DEFINE_TABLET entry in the WAL, which I think recovery 
will currently ignore (looking at 1.5.1). It might also change our recovery 
semantics (I'm not fully familiar with what our current guarantees are) in the 
case of log rollovers/defines.
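
To make the suggestion a bit more concrete, here is a hedged sketch (with 
invented names, not the DfsLogger API): DEFINE_TABLET entries are appended 
without their own hsync, and the hsync that already protects the data write 
makes them durable at the same time, so a batch that touches N new tablets pays 
for one sync instead of N. Whether this is acceptable hinges on the recovery 
question above.

import java.io.IOException;
import java.util.List;

// Hypothetical sketch only; WalWriter and the entry formats are made up.
final class DeferredDefineSketch {

  interface WalWriter {
    void append(String entry) throws IOException;  // buffered write, no sync
    void hsync() throws IOException;               // durable but expensive
  }

  private final WalWriter wal;

  DeferredDefineSketch(WalWriter wal) {
    this.wal = wal;
  }

  // Called for each tablet not yet using this walog. Note: no hsync here.
  void defineTablet(String extent) throws IOException {
    wal.append("DEFINE_TABLET " + extent);
  }

  // The single hsync covers the mutations and every DEFINE_TABLET entry
  // appended before it.
  void logMutations(List<String> mutations) throws IOException {
    for (String m : mutations) {
      wal.append("MUTATION " + m);
    }
    wal.hsync();
  }
}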

> define tablet syncs walog for each tablet in a batch
> 
>
> Key: ACCUMULO-2801
> URL: https://issues.apache.org/jira/browse/ACCUMULO-2801
> Project: Accumulo
>  Issue Type: Bug
>Affects Versions: 1.5.0, 1.5.1, 1.6.0
>Reporter: Keith Turner
>
> When the batch writer sends a batch of mutations for N tablets that were not 
> currently using a walog, then define tablet will be called for each tablet.  
> Define tablet will sync the walog.   In hadoop 2 hsync is used, which is much 
> slower than hadoop1 sync calls.  If hsync takes 50ms and there are 100 
> tablets, then this operation would take 5 secs.  The calls to define tablet 
> do not occur frequently, just when walogs switch or tablets are loaded so the 
> cost will be amortized.  Ideally there could be one walog sync call for all 
> of the tablets in a batch of mutations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (ACCUMULO-2827) HeapIterator optimization

2014-06-05 Thread Jonathan Park (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-2827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018832#comment-14018832
 ] 

Jonathan Park commented on ACCUMULO-2827:
-

My apologies, I haven't gotten to this yet. I'll try to find some time this 
weekend.

> HeapIterator optimization
> -
>
> Key: ACCUMULO-2827
> URL: https://issues.apache.org/jira/browse/ACCUMULO-2827
> Project: Accumulo
>  Issue Type: Improvement
>Affects Versions: 1.5.1, 1.6.0
>Reporter: Jonathan Park
>Assignee: Jonathan Park
>Priority: Minor
> Fix For: 1.5.2, 1.6.1, 1.7.0
>
> Attachments: ACCUMULO-2827.0.patch.txt, new_heapiter.png, 
> old_heapiter.png, together.png
>
>
> We've been running a few performance tests of our iterator stack and noticed 
> a decent amount of time spent in the HeapIterator specifically related to 
> add/removal into the heap.
> This may not be a general enough optimization but we thought we'd see what 
> people thought. Our assumption is that it's more probable that the current 
> "top iterator" will supply the next value in the iteration than not. The 
> current implementation takes the other assumption by always removing + 
> inserting the minimum iterator back into the heap. With the implementation of 
> a binary heap that we're using, this can get costly if our assumption is 
> wrong because we pay the log penalty of percolating up the iterator in the 
> heap upon insertion and again when percolating down upon removal.
> We believe our assumption is a fair one to hold given that as major 
> compactions create a log distribution of file sizes, it's likely that we may 
> see a long chain of consecutive entries coming from 1 iterator. 
> Understandably, taking this assumption comes at an additional cost in the 
> case that we're wrong. Therefore, we've run a few benchmarking tests to see 
> how much of a cost we pay as well as what kind of benefit we see. I've 
> attached a potential patch (which includes a test harness) + image that 
> captures the results of our tests. The x-axis represents # of repeated keys 
> before switching to another iterator. The y-axis represents iteration time. 
> The sets of blue + red lines varies in # of iterators present in the heap.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (ACCUMULO-2827) HeapIterator optimization

2014-05-19 Thread Jonathan Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-2827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Park updated ACCUMULO-2827:


Assignee: Jonathan Park
  Status: Patch Available  (was: Open)

> HeapIterator optimization
> -
>
> Key: ACCUMULO-2827
> URL: https://issues.apache.org/jira/browse/ACCUMULO-2827
> Project: Accumulo
>  Issue Type: Improvement
>Affects Versions: 1.5.1
>Reporter: Jonathan Park
>Assignee: Jonathan Park
>Priority: Minor
> Attachments: ACCUMULO-2827.0.patch.txt, new_heapiter.png, 
> old_heapiter.png, together.png
>
>
> We've been running a few performance tests of our iterator stack and noticed 
> a decent amount of time spent in the HeapIterator specifically related to 
> add/removal into the heap.
> This may not be a general enough optimization but we thought we'd see what 
> people thought. Our assumption is that it's more probable that the current 
> "top iterator" will supply the next value in the iteration than not. The 
> current implementation takes the other assumption by always removing + 
> inserting the minimum iterator back into the heap. With the implementation of 
> a binary heap that we're using, this can get costly if our assumption is 
> wrong because we pay the log penalty of percolating up the iterator in the 
> heap upon insertion and again when percolating down upon removal.
> We believe our assumption is a fair one to hold given that as major 
> compactions create a log distribution of file sizes, it's likely that we may 
> see a long chain of consecutive entries coming from 1 iterator. 
> Understandably, taking this assumption comes at an additional cost in the 
> case that we're wrong. Therefore, we've run a few benchmarking tests to see 
> how much of a cost we pay as well as what kind of benefit we see. I've 
> attached a potential patch (which includes a test harness) + image that 
> captures the results of our tests. The x-axis represents # of repeated keys 
> before switching to another iterator. The y-axis represents iteration time. 
> The sets of blue + red lines varies in # of iterators present in the heap.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (ACCUMULO-2827) HeapIterator optimization

2014-05-19 Thread Jonathan Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-2827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Park updated ACCUMULO-2827:


Attachment: together.png
old_heapiter.png
new_heapiter.png
ACCUMULO-2827.0.patch.txt

> HeapIterator optimization
> -
>
> Key: ACCUMULO-2827
> URL: https://issues.apache.org/jira/browse/ACCUMULO-2827
> Project: Accumulo
>  Issue Type: Improvement
>Affects Versions: 1.5.1
>Reporter: Jonathan Park
>Priority: Minor
> Attachments: ACCUMULO-2827.0.patch.txt, new_heapiter.png, 
> old_heapiter.png, together.png
>
>
> We've been running a few performance tests of our iterator stack and noticed 
> a decent amount of time spent in the HeapIterator specifically related to 
> add/removal into the heap.
> This may not be a general enough optimization but we thought we'd see what 
> people thought. Our assumption is that it's more probable that the current 
> "top iterator" will supply the next value in the iteration than not. The 
> current implementation takes the other assumption by always removing + 
> inserting the minimum iterator back into the heap. With the implementation of 
> a binary heap that we're using, this can get costly if our assumption is 
> wrong because we pay the log penalty of percolating up the iterator in the 
> heap upon insertion and again when percolating down upon removal.
> We believe our assumption is a fair one to hold given that as major 
> compactions create a log distribution of file sizes, it's likely that we may 
> see a long chain of consecutive entries coming from 1 iterator. 
> Understandably, taking this assumption comes at an additional cost in the 
> case that we're wrong. Therefore, we've run a few benchmarking tests to see 
> how much of a cost we pay as well as what kind of benefit we see. I've 
> attached a potential patch (which includes a test harness) + image that 
> captures the results of our tests. The x-axis represents # of repeated keys 
> before switching to another iterator. The y-axis represents iteration time. 
> The sets of blue + red lines varies in # of iterators present in the heap.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (ACCUMULO-2827) HeapIterator optimization

2014-05-19 Thread Jonathan Park (JIRA)
Jonathan Park created ACCUMULO-2827:
---

 Summary: HeapIterator optimization
 Key: ACCUMULO-2827
 URL: https://issues.apache.org/jira/browse/ACCUMULO-2827
 Project: Accumulo
  Issue Type: Improvement
Affects Versions: 1.5.1
Reporter: Jonathan Park
Priority: Minor


We've been running a few performance tests of our iterator stack and noticed a 
decent amount of time spent in the HeapIterator, specifically in the additions 
to and removals from the heap.

This may not be a general enough optimization, but we thought we'd see what 
people thought. Our assumption is that it's more probable than not that the 
current "top iterator" will supply the next value in the iteration. The current 
implementation takes the opposite assumption by always removing the minimum 
iterator and inserting it back into the heap. With the binary heap 
implementation that we're using, this can get costly if our assumption is 
wrong, because we pay the log(n) penalty of percolating the iterator up the 
heap upon insertion and again when percolating it down upon removal.

We believe our assumption is a fair one to hold: since major compactions create 
a log distribution of file sizes, it's likely that we will see a long chain of 
consecutive entries coming from one iterator. Understandably, taking this 
assumption comes at an additional cost in the case that we're wrong, so we've 
run a few benchmarking tests to see how much of a cost we pay as well as what 
kind of benefit we see. I've attached a potential patch (which includes a test 
harness) plus images that capture the results of our tests. The x-axis 
represents the # of repeated keys before switching to another iterator; the 
y-axis represents iteration time. The sets of blue + red lines vary in the # of 
iterators present in the heap.
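
For anyone skimming, here is a minimal, self-contained sketch of the idea (not 
the actual HeapIterator; Source and the long keys below are stand-ins for 
SortedKeyValueIterator and Key). The source that produced the last key is kept 
out of the heap, and we only pay the heap's log-cost remove + insert when 
another source actually holds the minimum, so a long run of consecutive entries 
from one source touches the heap only once.

import java.util.Comparator;
import java.util.PriorityQueue;

// Illustrative merging-iterator sketch, not the Accumulo HeapIterator.
final class MergingIteratorSketch {

  interface Source {
    boolean hasTop();
    long topKey();   // stand-in for Accumulo's Key
    void next();
  }

  private final PriorityQueue<Source> heap =
      new PriorityQueue<>(Comparator.comparingLong(Source::topKey));
  private Source current;  // the source currently holding the smallest key

  MergingIteratorSketch(Iterable<Source> sources) {
    for (Source s : sources) {
      if (s.hasTop()) {
        heap.add(s);
      }
    }
    current = heap.poll();
  }

  boolean hasTop() {
    return current != null;
  }

  long topKey() {
    return current.topKey();
  }

  void next() {
    current.next();
    if (!current.hasTop()) {
      current = heap.poll();  // this source is exhausted; fall back to the heap
      return;
    }
    Source challenger = heap.peek();
    if (challenger != null && challenger.topKey() < current.topKey()) {
      // The bet lost: another source now holds the minimum, so pay the
      // log-cost swap that the old implementation paid on every call.
      heap.add(current);
      current = heap.poll();
    }
    // Otherwise the current source still has the smallest key and the heap is
    // untouched.
  }
}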



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (ACCUMULO-2827) HeapIterator optimization

2014-05-19 Thread Jonathan Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-2827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Park updated ACCUMULO-2827:


Description: 
We've been running a few performance tests of our iterator stack and noticed a 
decent amount of time spent in the HeapIterator specifically related to 
add/removal into the heap.

This may not be a general enough optimization but we thought we'd see what 
people thought. Our assumption is that it's more probable that the current "top 
iterator" will supply the next value in the iteration than not. The current 
implementation takes the other assumption by always removing + inserting the 
minimum iterator back into the heap. With the implementation of a binary heap 
that we're using, this can get costly if our assumption is wrong because we pay 
the log penalty of percolating up the iterator in the heap upon insertion and 
again when percolating down upon removal.

We believe our assumption is a fair one to hold given that as major compactions 
create a log distribution of file sizes, it's likely that we may see a long 
chain of consecutive entries coming from 1 iterator. Understandably, taking 
this assumption comes at an additional cost in the case that we're wrong. 
Therefore, we've run a few benchmarking tests to see how much of a cost we pay 
as well as what kind of benefit we see. I've attached a potential patch (which 
includes a test harness) + image that captures the results of our tests. The 
x-axis represents # of repeated keys before switching to another iterator. The 
y-axis represents iteration time. The sets of blue + red lines varies in # of 
iterators present in the heap.

  was:
We've been running a few performance tests of our iterator stack and noticed a 
decent amount of time spent in the HeapIterator specifically related to 
add/removal into the heap.

This may not be a general enough optimization but we thought we'd see what 
people thought. Our assumption is that it's more probable that the current "top 
iterator" will supply the next value in the iteration than not. The current 
implementation takes the other assumption by always removing + inserting the 
minimum iterator back into the heap. With the implementation of a binary heap 
that we're using, this can get costly if our assumption is wrong because we pay 
the log(n) penalty of percolating up the iterator in the heap upon insertion 
and again when percolating down upon removal.

We believe our assumption is a fair one to hold given that as major compactions 
create a log distribution of file sizes, it's likely that we may see a long 
chain of consecutive entries coming from 1 iterator. Understandably, taking 
this assumption comes at an additional cost in the case that we're wrong. 
Therefore, we've run a few benchmarking tests to see how much of a cost we pay 
as well as what kind of benefit we see. I've attached a potential patch (which 
includes a test harness) + image that captures the results of our tests. The 
x-axis represents # of repeated keys before switching to another iterator. The 
y-axis represents iteration time. The sets of blue + red lines varies in # of 
iterators present in the heap.


> HeapIterator optimization
> -
>
> Key: ACCUMULO-2827
> URL: https://issues.apache.org/jira/browse/ACCUMULO-2827
> Project: Accumulo
>  Issue Type: Improvement
>Affects Versions: 1.5.1
>Reporter: Jonathan Park
>Priority: Minor
>
> We've been running a few performance tests of our iterator stack and noticed 
> a decent amount of time spent in the HeapIterator specifically related to 
> add/removal into the heap.
> This may not be a general enough optimization but we thought we'd see what 
> people thought. Our assumption is that it's more probable that the current 
> "top iterator" will supply the next value in the iteration than not. The 
> current implementation takes the other assumption by always removing + 
> inserting the minimum iterator back into the heap. With the implementation of 
> a binary heap that we're using, this can get costly if our assumption is 
> wrong because we pay the log penalty of percolating up the iterator in the 
> heap upon insertion and again when percolating down upon removal.
> We believe our assumption is a fair one to hold given that as major 
> compactions create a log distribution of file sizes, it's likely that we may 
> see a long chain of consecutive entries coming from 1 iterator. 
> Understandably, taking this assumption comes at an additional cost in the 
> case that we're wrong. Therefore, we've run a few benchmarking tests to see 
> how much of a cost we pay as well as what kind of benefit we see. I've 
> attached a potential patch (which includes a test harness) + image that 
> captures the results of our tests. The x-axis represents # of repeated keys 
> before switching to another iterator. The y-axis represents iteration time. 
> The sets of blue + red lines varies in # of iterators present in the heap.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (ACCUMULO-2668) slow WAL writes

2014-04-15 Thread Jonathan Park (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969705#comment-13969705
 ] 

Jonathan Park commented on ACCUMULO-2668:
-

Hey Sean. It would be great to be listed as a contributor.

Name: Jonathan Park
Company name: sqrrl
Timezone: ET

Thanks for all the help everyone! 

> slow WAL writes
> ---
>
> Key: ACCUMULO-2668
> URL: https://issues.apache.org/jira/browse/ACCUMULO-2668
> Project: Accumulo
>  Issue Type: Bug
>Affects Versions: 1.6.0
>Reporter: Jonathan Park
>Assignee: Jonathan Park
>Priority: Blocker
>  Labels: 16_qa_bug
> Fix For: 1.6.1
>
> Attachments: ACCUMULO-2668.0.patch.txt, noflush.diff
>
>
> During continuous ingest, we saw over 70% of our ingest time taken up by 
> writes to the WAL. When we ran the DfsLogger in isolation (created one 
> outside of the Tserver), we saw about ~25MB/s throughput as opposed to nearly 
> 100MB/s from just writing directly to an hdfs outputstream (computed by 
> taking the estimated size of the mutations sent to the DfsLogger class 
> divided by the time it took for it to flush + sync the data to HDFS).
> After investigating, we found one possible culprit was the 
> NoFlushOutputStream. It is a subclass of java.io.FilterOutputStream but does 
> not override the write(byte[], int, int) method signature. The javadoc 
> indicates that subclasses of the FilterOutputStream should provide a more 
> efficient implementation.
> I've attached a small diff that illustrates and addresses the issue but this 
> may not be how we ultimately want to fix it.
> As a side note, I may be misreading the implementation of DfsLogger, but it 
> looks like we always make use of the NoFlushOutputStream, even if encryption 
> isn't enabled. There appears to be a faulty check in the DfsLogger.open() 
> implementation that I don't believe can be satisfied (line 384).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (ACCUMULO-2671) BlockedOutputStream can hit a StackOverflowError

2014-04-14 Thread Jonathan Park (JIRA)
Jonathan Park created ACCUMULO-2671:
---

 Summary: BlockedOutputStream can hit a StackOverflowError
 Key: ACCUMULO-2671
 URL: https://issues.apache.org/jira/browse/ACCUMULO-2671
 Project: Accumulo
  Issue Type: Bug
Affects Versions: 1.6.0
Reporter: Jonathan Park


This issue mostly came up after a resolution to ACCUMULO-2668 that allows a 
byte[] to be passed directly to the underlying stream from the 
NoFlushOutputStream.

The problem appears to be due to the BlockedOutputStream.write(byte[], int, 
int) implementation, which recursively writes out blocks/buffers. When the 
stream is passed a large mutation (128MB was sufficient to trigger the error 
for me), this will cause a StackOverflowError.

This appears to happen specifically with encryption at rest turned on.

A simple fix would be to unroll the recursion.
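
A hedged sketch of what the unrolled version could look like (illustrative 
only; it simplifies away the real BlockedOutputStream's block headers and 
padding): the loop below replaces the recursive call on the remainder of the 
array, so the stack depth no longer grows with the size of the mutation.

import java.io.IOException;
import java.io.OutputStream;

// Not the real BlockedOutputStream; just the iterative write pattern.
final class BlockedWriteSketch extends OutputStream {
  private final OutputStream out;
  private final byte[] block;
  private int used;

  BlockedWriteSketch(OutputStream out, int blockSize) {
    this.out = out;
    this.block = new byte[blockSize];
  }

  @Override
  public void write(int b) throws IOException {
    write(new byte[] {(byte) b}, 0, 1);
  }

  @Override
  public void write(byte[] b, int off, int len) throws IOException {
    // Iterate instead of recursing on (b, off + n, len - n).
    while (len > 0) {
      int n = Math.min(len, block.length - used);
      System.arraycopy(b, off, block, used, n);
      used += n;
      off += n;
      len -= n;
      if (used == block.length) {
        out.write(block, 0, used);  // emit one full block
        used = 0;
      }
    }
  }

  @Override
  public void flush() throws IOException {
    if (used > 0) {
      out.write(block, 0, used);    // emit the partial block
      used = 0;
    }
    out.flush();
  }
}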



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (ACCUMULO-2668) slow WAL writes

2014-04-14 Thread Jonathan Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Park updated ACCUMULO-2668:


Attachment: ACCUMULO-2668.0.patch.txt

Re-uploading the file generated by the format-patch command.

Microbenchmark:
- Ran continuous ingest on my laptop (2013 MBP: 2.6 GHz quad-core i7, 16 GB 
RAM) using the default 3GB Accumulo config with native maps. Used a single 
continuous ingester instance against a table with 4 tablets.

Results:
with fix: 120K entries/s, 12.66 MB/s
without fix: 83K entries/s, 9.05 MB/s

Numbers were sampled at a single point during the ingest.


> slow WAL writes
> ---
>
> Key: ACCUMULO-2668
> URL: https://issues.apache.org/jira/browse/ACCUMULO-2668
> Project: Accumulo
>  Issue Type: Bug
>Affects Versions: 1.6.0
>Reporter: Jonathan Park
>Assignee: Jonathan Park
>Priority: Blocker
>  Labels: 16_qa_bug
> Fix For: 1.6.1
>
> Attachments: ACCUMULO-2668.0.patch.txt, noflush.diff
>
>
> During continuous ingest, we saw over 70% of our ingest time taken up by 
> writes to the WAL. When we ran the DfsLogger in isolation (created one 
> outside of the Tserver), we saw about ~25MB/s throughput as opposed to nearly 
> 100MB/s from just writing directly to an hdfs outputstream (computed by 
> taking the estimated size of the mutations sent to the DfsLogger class 
> divided by the time it took for it to flush + sync the data to HDFS).
> After investigating, we found one possible culprit was the 
> NoFlushOutputStream. It is a subclass of java.io.FilterOutputStream but does 
> not override the write(byte[], int, int) method signature. The javadoc 
> indicates that subclasses of the FilterOutputStream should provide a more 
> efficient implementation.
> I've attached a small diff that illustrates and addresses the issue but this 
> may not be how we ultimately want to fix it.
> As a side note, I may be misreading the implementation of DfsLogger, but it 
> looks like we always make use of the NoFlushOutputStream, even if encryption 
> isn't enabled. There appears to be a faulty check in the DfsLogger.open() 
> implementation that I don't believe can be satisfied (line 384).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (ACCUMULO-2669) NoFlushOutputStream always in use in DfsLogger

2014-04-14 Thread Jonathan Park (JIRA)
Jonathan Park created ACCUMULO-2669:
---

 Summary: NoFlushOutputStream always in use in DfsLogger
 Key: ACCUMULO-2669
 URL: https://issues.apache.org/jira/browse/ACCUMULO-2669
 Project: Accumulo
  Issue Type: Bug
Affects Versions: 1.6.0
Reporter: Jonathan Park
Priority: Minor


I may be misreading the implementation of DfsLogger, but it looks like we 
always make use of the NoFlushOutputStream, even if encryption isn't enabled. 
There appears to be a faulty check in the DfsLogger.open() implementation that 
I don't believe can be satisfied (line 384).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (ACCUMULO-2668) slow WAL writes

2014-04-14 Thread Jonathan Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Park updated ACCUMULO-2668:


Description: 
During continuous ingest, we saw over 70% of our ingest time taken up by writes 
to the WAL. When we ran the DfsLogger in isolation (created one outside of the 
Tserver), we saw about ~25MB/s throughput as opposed to nearly 100MB/s from 
just writing directly to an hdfs outputstream (computed by taking the estimated 
size of the mutations sent to the DfsLogger class divided by the time it took 
for it to flush + sync the data to HDFS).

After investigating, we found one possible culprit was the NoFlushOutputStream. 
It is a subclass of java.io.FilterOutputStream but does not override the 
write(byte[], int, int) method signature. The javadoc indicates that subclasses 
of the FilterOutputStream should provide a more efficient implementation.

I've attached a small diff that illustrates and addresses the issue but this 
may not be how we ultimately want to fix it.

As a side note, I may be misreading the implementation of DfsLogger, but it 
looks like we always make use of the NoFlushOutputStream, even if encryption 
isn't enabled. There appears to be a faulty check in the DfsLogger.open() 
implementation that I don't believe can be satisfied (line 384).

  was:
During continuous ingest, we saw over 70% of our ingest time taken up by writes 
to the WAL. When we ran the DfsLogger in isolation (created one outside of the 
Tserver), we saw about ~25MB/s throughput (computed by taking the estimated 
size of the mutations sent to the DfsLogger class divided by the time it took 
for it to flush + sync the data to HDFS).

After investigating, we found one possible culprit was the NoFlushOutputStream. 
It is a subclass of java.io.FilterOutputStream but does not override the 
write(byte[], int, int) method signature. The javadoc indicates that subclasses 
of the FilterOutputStream should provide a more efficient implementation.

I've attached a small diff that illustrates and addresses the issue but this 
may not be how we ultimately want to fix it.

As a side note, I may be misreading the implementation of DfsLogger, but it 
looks like we always make use of the NoFlushOutputStream, even if encryption 
isn't enabled. There appears to be a faulty check in the DfsLogger.open() 
implementation that I don't believe can be satisfied (line 384).


> slow WAL writes
> ---
>
> Key: ACCUMULO-2668
> URL: https://issues.apache.org/jira/browse/ACCUMULO-2668
> Project: Accumulo
>  Issue Type: Bug
>Affects Versions: 1.6.0
>Reporter: Jonathan Park
> Attachments: noflush.diff
>
>
> During continuous ingest, we saw over 70% of our ingest time taken up by 
> writes to the WAL. When we ran the DfsLogger in isolation (created one 
> outside of the Tserver), we saw about ~25MB/s throughput as opposed to nearly 
> 100MB/s from just writing directly to an hdfs outputstream (computed by 
> taking the estimated size of the mutations sent to the DfsLogger class 
> divided by the time it took for it to flush + sync the data to HDFS).
> After investigating, we found one possible culprit was the 
> NoFlushOutputStream. It is a subclass of java.io.FilterOutputStream but does 
> not override the write(byte[], int, int) method signature. The javadoc 
> indicates that subclasses of the FilterOutputStream should provide a more 
> efficient implementation.
> I've attached a small diff that illustrates and addresses the issue but this 
> may not be how we ultimately want to fix it.
> As a side note, I may be misreading the implementation of DfsLogger, but it 
> looks like we always make use of the NoFlushOutputStream, even if encryption 
> isn't enabled. There appears to be a faulty check in the DfsLogger.open() 
> implementation that I don't believe can be satisfied (line 384).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (ACCUMULO-2668) slow WAL writes

2014-04-14 Thread Jonathan Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Park updated ACCUMULO-2668:


Status: Patch Available  (was: Open)

Attaching a possible fix.

> slow WAL writes
> ---
>
> Key: ACCUMULO-2668
> URL: https://issues.apache.org/jira/browse/ACCUMULO-2668
> Project: Accumulo
>  Issue Type: Bug
>Affects Versions: 1.6.0
>Reporter: Jonathan Park
> Attachments: noflush.diff
>
>
> During continuous ingest, we saw over 70% of our ingest time taken up by 
> writes to the WAL. When we ran the DfsLogger in isolation (created one 
> outside of the Tserver), we saw about ~25MB/s throughput (computed by taking 
> the estimated size of the mutations sent to the DfsLogger class divided by 
> the time it took for it to flush + sync the data to HDFS).
> After investigating, we found one possible culprit was the 
> NoFlushOutputStream. It is a subclass of java.io.FilterOutputStream but does 
> not override the write(byte[], int, int) method signature. The javadoc 
> indicates that subclasses of the FilterOutputStream should provide a more 
> efficient implementation.
> I've attached a small diff that illustrates and addresses the issue but this 
> may not be how we ultimately want to fix it.
> As a side note, I may be misreading the implementation of DfsLogger, but it 
> looks like we always make use of the NoFlushOutputStream, even if encryption 
> isn't enabled. There appears to be a faulty check in the DfsLogger.open() 
> implementation that I don't believe can be satisfied (line 384).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (ACCUMULO-2668) slow WAL writes

2014-04-14 Thread Jonathan Park (JIRA)
Jonathan Park created ACCUMULO-2668:
---

 Summary: slow WAL writes
 Key: ACCUMULO-2668
 URL: https://issues.apache.org/jira/browse/ACCUMULO-2668
 Project: Accumulo
  Issue Type: Bug
Affects Versions: 1.6.0
Reporter: Jonathan Park
 Attachments: noflush.diff

During continuous ingest, we saw over 70% of our ingest time taken up by writes 
to the WAL. When we ran the DfsLogger in isolation (created one outside of the 
Tserver), we saw ~25MB/s throughput (computed by taking the estimated 
size of the mutations sent to the DfsLogger class divided by the time it took 
for it to flush + sync the data to HDFS).

After investigating, we found one possible culprit was the NoFlushOutputStream. 
It is a subclass of java.io.FilterOutputStream but does not override the 
write(byte[], int, int) method signature. The javadoc indicates that subclasses 
of the FilterOutputStream should provide a more efficient implementation.

I've attached a small diff that illustrates and addresses the issue but this 
may not be how we ultimately want to fix it.

As a side note, I may be misreading the implementation of DfsLogger, but it 
looks like we always make use of the NoFlushOutputStream, even if encryption 
isn't enabled. There appears to be a faulty check in the DfsLogger.open() 
implementation that I don't believe can be satisfied (line 384).
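
For reference, the core of the problem and the fix fit in a few lines. This is 
a hedged, minimal sketch, not the actual NoFlushOutputStream: 
FilterOutputStream's inherited write(byte[], int, int) falls back to one 
write(int) call per byte, so simply overriding it to hand the whole range to 
the wrapped stream removes that per-byte overhead.

import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Illustrative stand-in for the NoFlushOutputStream fix.
class PassThroughNoFlushStream extends FilterOutputStream {

  PassThroughNoFlushStream(OutputStream out) {
    super(out);
  }

  @Override
  public void write(byte[] b, int off, int len) throws IOException {
    out.write(b, off, len);  // delegate the whole range, not byte-by-byte
  }

  @Override
  public void flush() {
    // Presumably a no-op, as the class name suggests: the logger decides when
    // to flush/sync the underlying stream. (Assumption, not verified.)
  }
}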



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (ACCUMULO-2668) slow WAL writes

2014-04-14 Thread Jonathan Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-2668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Park updated ACCUMULO-2668:


Attachment: noflush.diff

> slow WAL writes
> ---
>
> Key: ACCUMULO-2668
> URL: https://issues.apache.org/jira/browse/ACCUMULO-2668
> Project: Accumulo
>  Issue Type: Bug
>Affects Versions: 1.6.0
>Reporter: Jonathan Park
> Attachments: noflush.diff
>
>
> During continuous ingest, we saw over 70% of our ingest time taken up by 
> writes to the WAL. When we ran the DfsLogger in isolation (created one 
> outside of the Tserver), we saw about ~25MB/s throughput (computed by taking 
> the estimated size of the mutations sent to the DfsLogger class divided by 
> the time it took for it to flush + sync the data to HDFS).
> After investigating, we found one possible culprit was the 
> NoFlushOutputStream. It is a subclass of java.io.FilterOutputStream but does 
> not override the write(byte[], int, int) method signature. The javadoc 
> indicates that subclasses of the FilterOutputStream should provide a more 
> efficient implementation.
> I've attached a small diff that illustrates and addresses the issue but this 
> may not be how we ultimately want to fix it.
> As a side note, I may be misreading the implementation of DfsLogger, but it 
> looks like we always make use of the NoFlushOutputStream, even if encryption 
> isn't enabled. There appears to be a faulty check in the DfsLogger.open() 
> implementation that I don't believe can be satisfied (line 384).



--
This message was sent by Atlassian JIRA
(v6.2#6252)