Accumulo-Integration-Tests - Build # 720 - Aborted! -- 1.7
Accumulo-Integration-Tests - Build # 720 - Aborted: Check console output at https://secure.penguinsinabox.com/jenkins/job/Accumulo-Integration-Tests/720/ to view the results.
Accumulo-Integration-Tests - Build # 719 - Still unstable! -- 1.6
Accumulo-Integration-Tests - Build # 719 - Still unstable: Check console output at https://secure.penguinsinabox.com/jenkins/job/Accumulo-Integration-Tests/719/ to view the results.
[jira] [Commented] (ACCUMULO-1755) BatchWriter blocks all addMutation calls while binning mutations
[ https://issues.apache.org/jira/browse/ACCUMULO-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15176713#comment-15176713 ] Dave Marion commented on ACCUMULO-1755: --- I took the test that I created and ran it against master and my feature branch with 1 to 6 threads. I didn't see much difference, but looking back at it now I think it's because the test pre-creates all of the mutations and adds them as fast as possible. The test is really for multi-threaded correctness rather than performance. In the new code there is still a synchronization point when adding the binned mutations to the queues for the tablet servers. The send threads in the test (local MiniAccumuloCluster) must be able to keep up with the adding of the binned mutations. I don't expect that to be the case in a real deployment. Good news: performance wasn't worse. I think a better test is to write a simple multi-threaded client that creates and adds mutations to a common batch writer, then time the application as a whole inserting N mutations with 1 to N client threads. The previous implementation blocked all client threads from calling BatchWriter.addMutation(), meaning the clients could not do any work. In the new implementation the clients can continue to do work, adding mutations and even binning them in their own thread if necessary, before blocking. I'll see if I can re-test with this new approach in the next few days. Do you have a different thought about how to test this? 
> BatchWriter blocks all addMutation calls while binning mutations > > > Key: ACCUMULO-1755 > URL: https://issues.apache.org/jira/browse/ACCUMULO-1755 > Project: Accumulo > Issue Type: Improvement > Components: client >Reporter: Adam Fuchs >Assignee: Dave Marion > Fix For: 1.6.6, 1.7.2, 1.8.0 > > Attachments: ACCUMULO-1755.patch > > Time Spent: 2h > Remaining Estimate: 0h > > Through code inspection, we found that the BatchWriter bins mutations inside > of a synchronized block that covers calls to addMutation. Binning potentially > involves lookups of tablet metadata and processes a fair amount of > information. We will get better parallelism if we can either unlock the lock > while binning, dedicate another thread to do the binning, or use one of the > send threads to do the binning. > This has not been verified empirically yet, so there is not yet any profiling > info to indicate the level of improvement that we should expect. Profiling > and repeatable demonstration of this performance bottleneck should be the > first step on this ticket. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
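Dave's proposed benchmark (client threads that each create their own mutations and push them into one shared writer, timed end to end for 1 to N threads) could look roughly like the Java sketch below. `SharedWriter` and `CountingWriter` are hypothetical stand-ins so the harness runs without a cluster, and they take a row string rather than a `Mutation`; for a real measurement you would feed an Accumulo `BatchWriter` instead. Java 8 lambdas are assumed for brevity.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a BatchWriter so the harness runs without a cluster.
interface SharedWriter {
  void addMutation(String row) throws InterruptedException;
}

// Counts rows; synchronized to mimic a writer that serializes addMutation calls.
class CountingWriter implements SharedWriter {
  private long count;

  public synchronized void addMutation(String row) {
    count++;
  }

  synchronized long count() {
    return count;
  }
}

class ClientThroughputHarness {
  // Inserts totalMutations through one shared writer using `threads` client
  // threads; each thread builds its own rows (the client-side "work").
  static long timeInsert(SharedWriter writer, int totalMutations, int threads)
      throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    AtomicLong next = new AtomicLong();
    long start = System.nanoTime();
    for (int t = 0; t < threads; t++) {
      pool.execute(() -> {
        long i;
        // Each index is claimed by exactly one thread, so N rows total.
        while ((i = next.getAndIncrement()) < totalMutations) {
          try {
            writer.addMutation(String.format("row_%010d", i));
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return;
          }
        }
      });
    }
    pool.shutdown();
    pool.awaitTermination(10, TimeUnit.MINUTES);
    return System.nanoTime() - start;
  }

  public static void main(String[] args) throws InterruptedException {
    int n = 100_000;
    for (int threads = 1; threads <= 6; threads++) {
      CountingWriter w = new CountingWriter();
      long nanos = timeInsert(w, n, threads);
      System.out.printf("%d threads: %d rows in %d ms%n",
          threads, w.count(), TimeUnit.NANOSECONDS.toMillis(nanos));
    }
  }
}
```

Comparing elapsed time across thread counts for the same total N is the measurement described above; the numbers printed with `CountingWriter` only exercise the harness and say nothing about Accumulo's performance.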
[jira] [Commented] (ACCUMULO-1755) BatchWriter blocks all addMutation calls while binning mutations
[ https://issues.apache.org/jira/browse/ACCUMULO-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15176585#comment-15176585 ] Adam Fuchs commented on ACCUMULO-1755: -- Thanks, Dave. Got any perf testing results that show how much this improved things?
[jira] [Commented] (ACCUMULO-4156) Tunable replication frequency
[ https://issues.apache.org/jira/browse/ACCUMULO-4156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15176574#comment-15176574 ] Josh Elser commented on ACCUMULO-4156: -- To expand some more (for my own benefit): I was initially considering using an offset into the WAL entries as the tracking point; however, it might be more consistent to reuse the "last compaction after the last start" logic that WAL recovery uses. I'm not sure if we need it for the same consistency reasons WAL recovery does, but it might be a little more natural than just tracking offsets into the WALs (the flow inside the tablet server is already set up to record the flushID). > Tunable replication frequency > - > > Key: ACCUMULO-4156 > URL: https://issues.apache.org/jira/browse/ACCUMULO-4156 > Project: Accumulo > Issue Type: Improvement > Components: core >Affects Versions: 1.7.1 >Reporter: William Slacum > Fix For: 1.8.0 > > > Currently, replication happens when a write ahead log file is closed. The > only parameter to toggle when this event occurs is write ahead log size, and > is only applicable to the tablet servers themselves. > By default this means that when replication happens isn't tied to the table > it is configured on, but also exogenous factors such as total write load and > failures. If a system receives ~100MB/day/TServer, and the WAL size is its > default 1GB, it will take 10 days for any replication event to occur. Another > possibility is that an unreplicated table is receiving many writes, which > will cause more frequent replication events, but proportionally the work will > involve less data for the table being replicated. > I don't have a specific implementation in mind, but I'd like to see a > solution that involves isolating the work down to specific table events such > as time-since-last-replication and data-added-since-last-replication. 
> [~elserj] has had some ideas about doing things incrementally within WAL > files (ie, replicating between two sync points) that can also help with this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
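The 10-day figure in the description is the WAL size divided by the per-tablet-server write rate. This assumes decimal units and that the WAL is closed only when it fills (no failure or restart rolling it earlier); with binary units (1GiB = 1024MiB) it comes to about 10.24 days.

```latex
t_{\text{first replication}} \approx \frac{\text{WAL size}}{\text{write rate per tserver}}
  = \frac{1\,\text{GB}}{100\,\text{MB/day}}
  = \frac{1000\,\text{MB}}{100\,\text{MB/day}}
  = 10\ \text{days}
```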
[jira] [Commented] (ACCUMULO-4156) Tunable replication frequency
[ https://issues.apache.org/jira/browse/ACCUMULO-4156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15176521#comment-15176521 ] William Slacum commented on ACCUMULO-4156: -- I was specifically talking about how Accumulo handles WAL replay in the face of failures, unrelated to replication. We clarified offline; to summarize: we could piggyback off the flush ID that prevents WAL data from being replayed after it has been flushed to disk via a minor compaction.
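The flush-ID idea can be illustrated with a deliberately simplified model: if each WAL entry carries an increasing sequence number and a completed minor compaction records the highest sequence number it persisted, then recovery (or replication) only needs the entries above that mark. `WalEntry`, `FlushAwareReplay`, and the sequence-number scheme are hypothetical; this is not Accumulo's recovery code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical WAL record: an increasing sequence number plus a payload.
class WalEntry {
  final long seq;
  final String mutation;

  WalEntry(long seq, String mutation) {
    this.seq = seq;
    this.mutation = mutation;
  }
}

class FlushAwareReplay {
  // Entries at or below lastFlushedSeq are already in files written by a
  // minor compaction, so only later entries still need replay (or replication).
  static List<WalEntry> entriesAfterFlush(List<WalEntry> log, long lastFlushedSeq) {
    List<WalEntry> pending = new ArrayList<>();
    for (WalEntry e : log) {
      if (e.seq > lastFlushedSeq) {
        pending.add(e);
      }
    }
    return pending;
  }
}
```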
[jira] [Updated] (ACCUMULO-1755) BatchWriter blocks all addMutation calls while binning mutations
[ https://issues.apache.org/jira/browse/ACCUMULO-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dave Marion updated ACCUMULO-1755: -- Attachment: ACCUMULO-1755.patch Attaching original patch
[jira] [Resolved] (ACCUMULO-1755) BatchWriter blocks all addMutation calls while binning mutations
[ https://issues.apache.org/jira/browse/ACCUMULO-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dave Marion resolved ACCUMULO-1755. --- Resolution: Fixed Committed to 1.6 and merged up to master. Built with 'mvn clean verify -DskipITs' on each branch and ran the new IT separately.
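The fix direction discussed on this ticket (bin mutations in the caller's thread and keep only the hand-off to the per-tablet-server queues synchronized) can be illustrated with a simplified Java sketch. This is not Accumulo's `TabletServerBatchWriter`; `BinningWriter`, the `locator` function, and the server names are hypothetical, and row strings stand in for mutations.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Callers bin (locate) their rows without holding the shared lock, then lock
// only briefly to append the pre-binned lists to the per-server queues.
class BinningWriter {
  private final Function<String, String> locator; // hypothetical tablet-location lookup
  private final Map<String, List<String>> serverQueues = new HashMap<>();

  BinningWriter(Function<String, String> locator) {
    this.locator = locator;
  }

  void addMutations(List<String> rows) {
    // 1. Bin in the caller's thread: potentially slow lookups, no lock held.
    Map<String, List<String>> binned = new HashMap<>();
    for (String row : rows) {
      binned.computeIfAbsent(locator.apply(row), k -> new ArrayList<>()).add(row);
    }
    // 2. Short critical section: only the queue append is serialized.
    synchronized (serverQueues) {
      for (Map.Entry<String, List<String>> e : binned.entrySet()) {
        serverQueues.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).addAll(e.getValue());
      }
    }
  }

  // Number of rows currently queued for one server.
  int queuedFor(String server) {
    synchronized (serverQueues) {
      List<String> q = serverQueues.get(server);
      return q == null ? 0 : q.size();
    }
  }
}
```

The design choice is simply to shrink the critical section: lookups that may touch tablet metadata run concurrently, while the per-server queues remain consistent under one lock.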
[jira] [Updated] (ACCUMULO-1755) BatchWriter blocks all addMutation calls while binning mutations
[ https://issues.apache.org/jira/browse/ACCUMULO-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dave Marion updated ACCUMULO-1755: -- Fix Version/s: 1.7.2 1.6.6
[jira] [Commented] (ACCUMULO-4156) Tunable replication frequency
[ https://issues.apache.org/jira/browse/ACCUMULO-4156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15176380#comment-15176380 ] Josh Elser commented on ACCUMULO-4156: -- bq. I didn't see any special offsets for WAL files Sorry, I meant to say, definitively, that this doesn't exist. I think this was something I considered as a future improvement which would enable more responsive replication. bq. I think you'd need some marker for a WAL that lives through a flush so you don't do a double-insert in case of a failure after a flush. But a flush is just pushing the in-memory map (IMM) to disk -- the records should already be recorded in the WAL by the time they make it into the IMM. Am I misunderstanding?
[jira] [Commented] (ACCUMULO-1755) BatchWriter blocks all addMutation calls while binning mutations
[ https://issues.apache.org/jira/browse/ACCUMULO-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15176370#comment-15176370 ] ASF GitHub Bot commented on ACCUMULO-1755: -- Github user dlmarion closed the pull request at: https://github.com/apache/accumulo/pull/75
[jira] [Commented] (ACCUMULO-4156) Tunable replication frequency
[ https://issues.apache.org/jira/browse/ACCUMULO-4156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15176372#comment-15176372 ] William Slacum commented on ACCUMULO-4156: -- Yeah, I wouldn't doubt it. I didn't see any special offsets for WAL files, though I think you'd need some marker for a WAL that lives through a flush so you don't do a double-insert in case of a failure after a flush.
[jira] [Commented] (ACCUMULO-1755) BatchWriter blocks all addMutation calls while binning mutations
[ https://issues.apache.org/jira/browse/ACCUMULO-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15176369#comment-15176369 ] ASF GitHub Bot commented on ACCUMULO-1755: -- Github user dlmarion commented on the pull request: https://github.com/apache/accumulo/pull/75#issuecomment-191403081 Will apply manually. Thx.
[jira] [Commented] (ACCUMULO-4156) Tunable replication frequency
[ https://issues.apache.org/jira/browse/ACCUMULO-4156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15176288#comment-15176288 ] Josh Elser commented on ACCUMULO-4156: -- {quote} I don't have a specific implementation in mind, but I'd like to see a solution that involves isolating the work down to specific table events such as time-since-last-replication and data-added-since-last-replication. Josh Elser has had some ideas about doing things incrementally within WAL files (ie, replicating between two sync points) that can also help with this. {quote} I wish I remembered more of the model for "doing this safely" (replicating from some offsetA to offsetB in a WAL), but my brain has evicted what little I once had figured out. The original design was meant to work like this (proactively replicate the data once it was synced to the WAL, the point at which we are guaranteed that the data is "written"), but I ran into something along the way. I wish I remembered exactly what it was; it would be great to remove the little flag that ignores replication until a WAL is "closed" (i.e., no tserver can use it anymore). Maybe it was related to the lack of implicit entries in a WAL? We don't explicitly track how many entries are in a WAL now (just an "infinite length", which equates to reading the entire WAL for replication); that would make it very difficult to track this. If we could keep a simple one-level index somewhere (byte offset to WAL entry record offset), that might be enough. It might be easy to force a roll of WALs from some client admin API, but that also has local write performance implications. I think we'd need to think about it from both sides: operational use and developer enablement/ease-of-use. 
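The "simple one-level index" mentioned above (byte offset to WAL entry record offset) might look like a sparse sorted map consulted with a floor lookup, so a reader can seek near a sync point and scan forward a bounded number of entries. `WalOffsetIndex` and the every-`interval`-entries sampling policy are assumptions for illustration, not an existing Accumulo structure.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sparse index for one WAL: every `interval` entries, record
// (byte offset where that entry starts -> entry number).
class WalOffsetIndex {
  private final TreeMap<Long, Long> byteToEntry = new TreeMap<>();
  private final int interval;

  WalOffsetIndex(int interval) {
    this.interval = interval;
  }

  // Called as entries are appended; only every interval-th entry is indexed.
  void onAppend(long entryNumber, long byteOffset) {
    if (entryNumber % interval == 0) {
      byteToEntry.put(byteOffset, entryNumber);
    }
  }

  // Closest indexed position at or before byteOffset: a reader can seek there
  // and scan forward at most interval - 1 entries. Returns null if none exists.
  Map.Entry<Long, Long> seekPoint(long byteOffset) {
    return byteToEntry.floorEntry(byteOffset);
  }
}
```

A larger `interval` keeps the index smaller at the cost of a longer forward scan after each seek.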
[jira] [Commented] (ACCUMULO-1755) BatchWriter blocks all addMutation calls while binning mutations
[ https://issues.apache.org/jira/browse/ACCUMULO-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15176270#comment-15176270 ] ASF GitHub Bot commented on ACCUMULO-1755: -- Github user keith-turner commented on the pull request: https://github.com/apache/accumulo/pull/75#issuecomment-191376367 > I guarded the two methods that update the stats with trace logging checks. That's a nice improvement. I think this patch looks good now. +1
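Guarding stats bookkeeping behind a log-level check, as described in the quoted comment, means the atomic updates are skipped entirely when trace logging is off. This sketch uses `java.util.logging`, where `Level.FINEST` roughly corresponds to trace; with log4j or SLF4J the guard would be `isTraceEnabled()`. `GuardedStats` is a hypothetical name, not the patch's code.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.logging.Level;
import java.util.logging.Logger;

// Skip per-call stats bookkeeping unless trace-level (FINEST) logging is on.
class GuardedStats {
  private static final Logger LOG = Logger.getLogger(GuardedStats.class.getName());
  private final AtomicLong mutationsAdded = new AtomicLong();

  void onMutationAdded() {
    // Cheap level check first; the atomic increment only happens when enabled.
    if (LOG.isLoggable(Level.FINEST)) {
      mutationsAdded.incrementAndGet();
    }
  }

  long mutationsAdded() {
    return mutationsAdded.get();
  }
}
```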
[jira] [Created] (ACCUMULO-4156) Tunable replication frequency
William Slacum created ACCUMULO-4156: Summary: Tunable replication frequency Key: ACCUMULO-4156 URL: https://issues.apache.org/jira/browse/ACCUMULO-4156 Project: Accumulo Issue Type: Improvement Components: core Affects Versions: 1.7.1 Reporter: William Slacum Fix For: 1.8.0 Currently, replication happens when a write-ahead log file is closed. The only parameter that controls when this event occurs is the write-ahead log size, and it applies only to the tablet servers themselves. By default, this means that when replication happens depends not only on the table it is configured on, but also on exogenous factors such as total write load and failures. If a system receives ~100MB/day/TServer and the WAL size is its default 1GB, it will take 10 days for any replication event to occur. Another possibility is that an unreplicated table is receiving many writes, which will cause more frequent replication events, but proportionally the work will involve less data for the table being replicated. I don't have a specific implementation in mind, but I'd like to see a solution that involves isolating the work down to specific table events such as time-since-last-replication and data-added-since-last-replication. [~elserj] has had some ideas about doing things incrementally within WAL files (i.e., replicating between two sync points) that can also help with this.
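One way to read the request for table-level triggers is a per-table policy that fires when either the time since the last replication or the data added since then crosses a threshold, independent of when a WAL is closed. `ReplicationTrigger` and its thresholds are hypothetical (no Accumulo configuration properties are implied); it is single-threaded for brevity, and the current time is passed in explicitly to keep it testable.

```java
// Hypothetical per-table trigger based on time-since-last-replication and
// data-added-since-last-replication, rather than on WAL closure.
class ReplicationTrigger {
  private final long maxIntervalMillis;
  private final long maxPendingBytes;
  private long lastReplicationMillis;
  private long pendingBytes;

  ReplicationTrigger(long maxIntervalMillis, long maxPendingBytes, long nowMillis) {
    this.maxIntervalMillis = maxIntervalMillis;
    this.maxPendingBytes = maxPendingBytes;
    this.lastReplicationMillis = nowMillis;
  }

  // Record bytes written to the replicated table.
  void onDataAdded(long bytes) {
    pendingBytes += bytes;
  }

  // Replicate only if something is pending and either threshold is crossed.
  boolean shouldReplicate(long nowMillis) {
    return pendingBytes > 0
        && (nowMillis - lastReplicationMillis >= maxIntervalMillis
            || pendingBytes >= maxPendingBytes);
  }

  // Reset both counters after a successful replication.
  void onReplicated(long nowMillis) {
    lastReplicationMillis = nowMillis;
    pendingBytes = 0;
  }
}
```

With the ~100MB/day/TServer example and a one-hour `maxIntervalMillis`, replication would run roughly hourly instead of waiting about 10 days for a 1GB WAL to fill.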
Accumulo-Pull-Requests - Build # 220 - Fixed
The Apache Jenkins build system has built Accumulo-Pull-Requests (build #220) Status: Fixed Check console output at https://builds.apache.org/job/Accumulo-Pull-Requests/220/ to view the results.
[jira] [Commented] (ACCUMULO-1755) BatchWriter blocks all addMutation calls while binning mutations
[ https://issues.apache.org/jira/browse/ACCUMULO-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175819#comment-15175819 ] ASF GitHub Bot commented on ACCUMULO-1755: -- Github user keith-turner commented on the pull request: https://github.com/apache/accumulo/pull/75#issuecomment-191295793 Java 8 added accumulateAndGet to AtomicInteger, which can be used with lambdas or method references to compute the min and max. Java 8 is so nice, but we can't use it yet. In Java 8 you could do the following:
```java
atomicInt.accumulateAndGet(update, Math::min);
```
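A self-contained version of the Java 8 approach, tracking both a minimum and a maximum, might look like this; `MinMaxStat` is a hypothetical name. `accumulateAndGet` may re-apply the function when an update loses a race with another thread, which is harmless here because `Math::min` and `Math::max` have no side effects.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Java 8+: accumulateAndGet performs the read-modify-write atomically,
// so min/max tracking needs no explicit CAS loop or lock.
class MinMaxStat {
  private final AtomicInteger min = new AtomicInteger(Integer.MAX_VALUE);
  private final AtomicInteger max = new AtomicInteger(Integer.MIN_VALUE);

  void update(int value) {
    min.accumulateAndGet(value, Math::min);
    max.accumulateAndGet(value, Math::max);
  }

  int min() {
    return min.get();
  }

  int max() {
    return max.get();
  }
}
```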
[jira] [Commented] (ACCUMULO-1755) BatchWriter blocks all addMutation calls while binning mutations
[ https://issues.apache.org/jira/browse/ACCUMULO-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175815#comment-15175815 ] ASF GitHub Bot commented on ACCUMULO-1755: -- Github user keith-turner commented on the pull request: https://github.com/apache/accumulo/pull/75#issuecomment-191293786 To make findbugs happy, you could CAS in a loop to compute the min and max, something like:
```java
// Lock-free minimum: retry if another thread changed the value between get() and compareAndSet().
private static void computeMin(AtomicInteger stat, int update) {
  int old = stat.get();
  while (!stat.compareAndSet(old, Math.min(old, update))) {
    old = stat.get();
  }
}
```
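The CAS loop suggested here is correct only because it retries whenever another thread wins the race between `get()` and `compareAndSet()`. This Java 7-compatible sketch (anonymous `Runnable`, no lambdas) adds the matching `computeMax` and checks that concurrent updates produce the same result as a serial run; `CasStats` and `run` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class CasStats {
  // Retry until the value we read is still current when we write the new minimum.
  static void computeMin(AtomicInteger stat, int update) {
    int old = stat.get();
    while (!stat.compareAndSet(old, Math.min(old, update))) {
      old = stat.get();
    }
  }

  // Same pattern for the maximum.
  static void computeMax(AtomicInteger stat, int update) {
    int old = stat.get();
    while (!stat.compareAndSet(old, Math.max(old, update))) {
      old = stat.get();
    }
  }

  // Thread t feeds values t*perThread .. t*perThread + perThread - 1 into both
  // stats; returns {min, max}, which must match a serial run.
  static int[] run(int threads, final int perThread) throws InterruptedException {
    final AtomicInteger min = new AtomicInteger(Integer.MAX_VALUE);
    final AtomicInteger max = new AtomicInteger(Integer.MIN_VALUE);
    List<Thread> workers = new ArrayList<Thread>();
    for (int t = 0; t < threads; t++) {
      final int base = t * perThread;
      Thread w = new Thread(new Runnable() {
        public void run() {
          for (int i = 0; i < perThread; i++) {
            computeMin(min, base + i);
            computeMax(max, base + i);
          }
        }
      });
      workers.add(w);
      w.start();
    }
    for (Thread w : workers) {
      w.join();
    }
    return new int[] {min.get(), max.get()};
  }
}
```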
[jira] [Commented] (ACCUMULO-1755) BatchWriter blocks all addMutation calls while binning mutations
[ https://issues.apache.org/jira/browse/ACCUMULO-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175793#comment-15175793 ] ASF GitHub Bot commented on ACCUMULO-1755: -- Github user joshelser commented on the pull request: https://github.com/apache/accumulo/pull/75#issuecomment-191289655 > which is a findbugs issue, and I don't see the issue that it's complaining about Can you reproduce it locally via `mvn verify -DskipTests -Dcheckstyle.skip`? I know the Jenkins output can sometimes be... a little weird to parse for whatever reason.
[jira] [Commented] (ACCUMULO-1755) BatchWriter blocks all addMutation calls while binning mutations
[ https://issues.apache.org/jira/browse/ACCUMULO-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175643#comment-15175643 ] ASF GitHub Bot commented on ACCUMULO-1755: -- Github user dlmarion commented on the pull request: https://github.com/apache/accumulo/pull/75#issuecomment-191251406
I looked into the build failure, which is a findbugs issue, and I don't see the issue that it's complaining about. AtomicInteger / Long do not implement Lock.
Accumulo-Pull-Requests - Build # 219 - Failure
The Apache Jenkins build system has built Accumulo-Pull-Requests (build #219) Status: Failure Check console output at https://builds.apache.org/job/Accumulo-Pull-Requests/219/ to view the results.
[jira] [Commented] (ACCUMULO-1755) BatchWriter blocks all addMutation calls while binning mutations
[ https://issues.apache.org/jira/browse/ACCUMULO-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175625#comment-15175625 ] ASF GitHub Bot commented on ACCUMULO-1755: -- Github user dlmarion commented on the pull request: https://github.com/apache/accumulo/pull/75#issuecomment-191248202
So, I took a different approach. I believe I resolved the race conditions by synchronizing on the objects being updated. This still incurs the performance penalty you mentioned of going to main memory. However, the stats objects being updated are only used when trace logging is enabled, so I guarded the two methods that update the stats with trace logging checks. As a result, you only pay the penalty when trace logging is enabled, and turning on trace logging already implies some performance hit anyway.
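A minimal sketch of the "only update stats when trace logging is on" idea from dlmarion's comment. Everything here is an assumption for illustration: the class `TraceGuardedStats`, its fields, and `recordBatch` are hypothetical, and `java.util.logging` at `Level.FINEST` stands in for trace logging (Accumulo's own logger and stats classes differ). The sketch keeps plain `int`/`long` fields behind a dedicated private lock object rather than synchronizing on the `AtomicInteger`/`AtomicLong` instances themselves, since mixing monitor locks with `java.util.concurrent` atomics is the kind of pattern static analyzers such as FindBugs may flag.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical stats holder; not Accumulo's actual implementation.
public class TraceGuardedStats {
  private static final Logger LOG = Logger.getLogger(TraceGuardedStats.class.getName());

  // Plain fields guarded by a dedicated lock object.
  private final Object lock = new Object();
  private int minBatchSize = Integer.MAX_VALUE;
  private int maxBatchSize = Integer.MIN_VALUE;
  private long totalMutations = 0;

  // Record one batch. The level check comes first, so when trace-level logging
  // is off no lock is taken and no shared state is touched.
  void recordBatch(int batchSize) {
    if (!LOG.isLoggable(Level.FINEST)) {
      return;
    }
    synchronized (lock) {
      minBatchSize = Math.min(minBatchSize, batchSize);
      maxBatchSize = Math.max(maxBatchSize, batchSize);
      totalMutations += batchSize;
    }
  }

  String summary() {
    synchronized (lock) {
      return "min=" + minBatchSize + " max=" + maxBatchSize + " total=" + totalMutations;
    }
  }

  public static void main(String[] args) {
    TraceGuardedStats stats = new TraceGuardedStats();
    stats.recordBatch(5);        // skipped: FINEST is not enabled by default
    LOG.setLevel(Level.FINEST);  // stand-in for turning on trace logging
    stats.recordBatch(5);
    stats.recordBatch(12);
    System.out.println(stats.summary());
  }
}
```

With the default logging configuration, `main` prints `min=5 max=12 total=17`: the first call is ignored because the level check fails, and only the two calls made after enabling FINEST are counted.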
Accumulo-Integration-Tests - Build # 718 - Aborted! -- master
Accumulo-Integration-Tests - Build # 718 - Aborted: Check console output at https://secure.penguinsinabox.com/jenkins/job/Accumulo-Integration-Tests/718/ to view the results.