Accumulo-Integration-Tests - Build # 720 - Aborted! -- 1.7

2016-03-02 Thread elserj
Accumulo-Integration-Tests - Build # 720 - Aborted:

Check console output at 
https://secure.penguinsinabox.com/jenkins/job/Accumulo-Integration-Tests/720/ 
to view the results.

Accumulo-Integration-Tests - Build # 719 - Still unstable! -- 1.6

2016-03-02 Thread elserj
Accumulo-Integration-Tests - Build # 719 - Still unstable:

Check console output at 
https://secure.penguinsinabox.com/jenkins/job/Accumulo-Integration-Tests/719/ 
to view the results.

[jira] [Commented] (ACCUMULO-1755) BatchWriter blocks all addMutation calls while binning mutations

2016-03-02 Thread Dave Marion (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15176713#comment-15176713
 ] 

Dave Marion commented on ACCUMULO-1755:
---

I took the test that I created and ran it against master and my feature branch 
with 1 to 6 threads. I didn't see much difference, but looking back at it now I 
think it's because the test pre-creates all of the mutations and adds them as 
fast as possible. The test is really for multi-threaded correctness rather than 
performance. In the new code there is still a synchronization point when adding 
the binned mutations to the queues for the tablet servers. The send threads in 
the test (local mini Accumulo cluster) must be able to keep up with the adding 
of the binned mutations. I don't expect that to be the case in a real 
deployment. Good news: performance wasn't worse.

I think a better test is to write a simple multi-threaded client that creates 
and adds mutations to a common batch writer. Then, time the application as a 
whole trying to insert N mutations with 1 to N client threads. The previous 
implementation blocked all client threads from calling 
BatchWriter.addMutation(), meaning the clients could not do any work. In the 
new implementation the clients will be able to continue to do work, adding 
mutations, and even binning them in their own thread if necessary, before 
blocking. I'll see if I can re-test with this new approach in the next few 
days. Do you have a different thought about how to test this?
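
A minimal sketch of the kind of timing harness described above, written for Java 7 since the thread notes that Java 8 is not yet available. `MutationSink`, `SharedWriterTimer`, and `timeInserts` are hypothetical names introduced for illustration; the sink stands in for a shared batch writer, and no Accumulo classes are used. Each client thread creates its rows inside the timed region instead of pre-creating them.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a shared batch writer; not an Accumulo type.
interface MutationSink {
  void add(String row) throws Exception;
}

class SharedWriterTimer {

  // Inserts `total` rows into one shared sink from `threads` client threads
  // and returns the elapsed wall-clock time in nanoseconds.
  static long timeInserts(final MutationSink sink, int threads, final int total) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    final CountDownLatch start = new CountDownLatch(1);
    final CountDownLatch done = new CountDownLatch(threads);
    final AtomicLong next = new AtomicLong();
    for (int t = 0; t < threads; t++) {
      pool.execute(new Runnable() {
        @Override
        public void run() {
          try {
            start.await();
            long i;
            // Rows are created here, so creation cost is part of the measurement.
            while ((i = next.getAndIncrement()) < total) {
              sink.add("row_" + i);
            }
          } catch (Exception e) {
            throw new RuntimeException(e);
          } finally {
            done.countDown();
          }
        }
      });
    }
    long begin = System.nanoTime();
    start.countDown(); // release all client threads at once
    try {
      done.await();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } finally {
      pool.shutdown();
    }
    return System.nanoTime() - begin;
  }

  public static void main(String[] args) {
    final AtomicLong received = new AtomicLong();
    MutationSink counting = new MutationSink() {
      @Override
      public void add(String row) {
        received.incrementAndGet();
      }
    };
    for (int threads = 1; threads <= 6; threads++) {
      received.set(0);
      long nanos = timeInserts(counting, threads, 100000);
      System.out.println(threads + " thread(s): " + TimeUnit.NANOSECONDS.toMillis(nanos)
          + " ms for " + received.get() + " rows");
    }
  }
}
```

To measure a real writer, the counting sink in `main` would be replaced with one that forwards each row to the shared batch writer under test; the numbers printed by the counting sink say nothing about Accumulo itself.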

> BatchWriter blocks all addMutation calls while binning mutations
> 
>
> Key: ACCUMULO-1755
> URL: https://issues.apache.org/jira/browse/ACCUMULO-1755
> Project: Accumulo
>  Issue Type: Improvement
>  Components: client
>Reporter: Adam Fuchs
>Assignee: Dave Marion
> Fix For: 1.6.6, 1.7.2, 1.8.0
>
> Attachments: ACCUMULO-1755.patch
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Through code inspection, we found that the BatchWriter bins mutations inside 
> of a synchronized block that covers calls to addMutation. Binning potentially 
> involves lookups of tablet metadata and processes a fair amount of 
> information. We will get better parallelism if we can either unlock the lock 
> while binning, dedicate another thread to do the binning, or use one of the 
> send threads to do the binning.
> This has not been verified empirically yet, so there is not yet any profiling 
> info to indicate the level of improvement that we should expect. Profiling 
> and repeatable demonstration of this performance bottleneck should be the 
> first step on this ticket.
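
The first option in the description above (releasing the lock while binning) could look roughly like the following sketch. This is not the Accumulo client code or the committed patch: `UnlockedBinningWriter`, its string rows, and the first-character "tablet" lookup are all assumptions made up for illustration. The idea shown is that `addMutation` holds the lock only long enough to buffer a row or swap out a full buffer, bins in the caller's thread without the lock, and then takes the lock again briefly to hand the binned rows to the queues.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; this is not Accumulo's BatchWriter implementation.
class UnlockedBinningWriter {
  private final Object lock = new Object();
  private final int batchSize;
  // Unbinned rows waiting for a full batch; guarded by `lock`.
  private List<String> pending = new ArrayList<String>();
  // Binned rows keyed by a made-up "tablet" name; guarded by `lock`.
  private final Map<String,List<String>> queues = new HashMap<String,List<String>>();

  UnlockedBinningWriter(int batchSize) {
    this.batchSize = batchSize;
  }

  void addMutation(String row) {
    List<String> toBin = null;
    synchronized (lock) {
      pending.add(row);
      if (pending.size() >= batchSize) {
        // Swap the full buffer out so other callers can keep adding to a fresh one.
        toBin = pending;
        pending = new ArrayList<String>();
      }
    }
    if (toBin == null) {
      return;
    }
    // The potentially slow binning runs in the caller's thread, outside the lock.
    Map<String,List<String>> binned = bin(toBin);
    synchronized (lock) {
      // Only the hand-off to the per-tablet queues is serialized.
      for (Map.Entry<String,List<String>> e : binned.entrySet()) {
        List<String> queue = queues.get(e.getKey());
        if (queue == null) {
          queue = new ArrayList<String>();
          queues.put(e.getKey(), queue);
        }
        queue.addAll(e.getValue());
      }
    }
  }

  // Stand-in for a tablet location lookup: bins rows by their first character.
  private static Map<String,List<String>> bin(List<String> rows) {
    Map<String,List<String>> bins = new HashMap<String,List<String>>();
    for (String row : rows) {
      String tablet = row.isEmpty() ? "" : row.substring(0, 1);
      List<String> rowsForTablet = bins.get(tablet);
      if (rowsForTablet == null) {
        rowsForTablet = new ArrayList<String>();
        bins.put(tablet, rowsForTablet);
      }
      rowsForTablet.add(row);
    }
    return bins;
  }

  int pendingCount() {
    synchronized (lock) {
      return pending.size();
    }
  }

  int queuedCount(String tablet) {
    synchronized (lock) {
      List<String> queue = queues.get(tablet);
      return queue == null ? 0 : queue.size();
    }
  }

  public static void main(String[] args) {
    UnlockedBinningWriter writer = new UnlockedBinningWriter(4);
    for (String row : new String[] {"a1", "a2", "b1", "b2", "a3"}) {
      writer.addMutation(row);
    }
    System.out.println("queued for a: " + writer.queuedCount("a")
        + ", pending: " + writer.pendingCount());
  }
}
```

The second critical section is still a synchronization point, so whether this helps in practice depends on how expensive the real binning step is relative to the hand-off.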



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ACCUMULO-1755) BatchWriter blocks all addMutation calls while binning mutations

2016-03-02 Thread Adam Fuchs (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15176585#comment-15176585
 ] 

Adam Fuchs commented on ACCUMULO-1755:
--

Thanks, Dave. Got any perf testing results that show how much this improved 
things?

> BatchWriter blocks all addMutation calls while binning mutations
> 
>
> Key: ACCUMULO-1755
> URL: https://issues.apache.org/jira/browse/ACCUMULO-1755
> Project: Accumulo
>  Issue Type: Improvement
>  Components: client
>Reporter: Adam Fuchs
>Assignee: Dave Marion
> Fix For: 1.6.6, 1.7.2, 1.8.0
>
> Attachments: ACCUMULO-1755.patch
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>


[jira] [Commented] (ACCUMULO-4156) Tunable replication frequency

2016-03-02 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15176574#comment-15176574
 ] 

Josh Elser commented on ACCUMULO-4156:
--

To expand some more (for my own benefit): I was initially considering using the 
offset into WAL entries as the tracking point. However, it might be more 
consistent to use the same last compaction after the last start (the same logic 
WAL recovery uses). I'm not sure if we need to do that for the same consistency 
reasons that the WAL does, but it might be a little more natural (the flow inside 
the tabletserver is already set up to record the flushID) than just offset 
tracking into the WALs.

> Tunable replication frequency
> -
>
> Key: ACCUMULO-4156
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4156
> Project: Accumulo
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 1.7.1
>Reporter: William Slacum
> Fix For: 1.8.0
>
>
> Currently, replication happens when a write ahead log file is closed. The 
> only parameter that controls when this event occurs is the write ahead log size, 
> and it applies only to the tablet servers themselves.
> By default this means that when replication happens depends not only on the 
> table it is configured on, but also on exogenous factors such as total write 
> load and failures. If a system receives ~100MB/day/TServer, and the WAL size is its 
> default 1GB, it will take 10 days for any replication event to occur. Another 
> possibility is that an unreplicated table is receiving many writes, which 
> will cause more frequent replication events, but proportionally the work will 
> involve less data for the table being replicated.
> I don't have a specific implementation in mind, but I'd like to see a 
> solution that involves isolating the work down to specific table events such 
> as time-since-last-replication and data-added-since-last-replication.
> [~elserj] has had some ideas about doing things incrementally within WAL 
> files (ie, replicating between two sync points) that can also help with this. 





[jira] [Commented] (ACCUMULO-4156) Tunable replication frequency

2016-03-02 Thread William Slacum (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15176521#comment-15176521
 ] 

William Slacum commented on ACCUMULO-4156:
--

I was specifically talking about just how Accumulo handles WAL replay in the 
face of failures, unrelated to replication. We clarified offline and to 
summarize: there is the possibility of piggybacking off the flush ID used to 
prevent WAL data from being replayed after it has been flushed to disk via 
minor compaction. 

> Tunable replication frequency
> -
>
> Key: ACCUMULO-4156
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4156
> Project: Accumulo
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 1.7.1
>Reporter: William Slacum
> Fix For: 1.8.0
>
>


[jira] [Updated] (ACCUMULO-1755) BatchWriter blocks all addMutation calls while binning mutations

2016-03-02 Thread Dave Marion (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dave Marion updated ACCUMULO-1755:
--
Attachment: ACCUMULO-1755.patch

Attaching original patch

> BatchWriter blocks all addMutation calls while binning mutations
> 
>
> Key: ACCUMULO-1755
> URL: https://issues.apache.org/jira/browse/ACCUMULO-1755
> Project: Accumulo
>  Issue Type: Improvement
>  Components: client
>Reporter: Adam Fuchs
>Assignee: Dave Marion
> Fix For: 1.6.6, 1.7.2, 1.8.0
>
> Attachments: ACCUMULO-1755.patch
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>


[jira] [Resolved] (ACCUMULO-1755) BatchWriter blocks all addMutation calls while binning mutations

2016-03-02 Thread Dave Marion (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dave Marion resolved ACCUMULO-1755.
---
Resolution: Fixed

Committed to 1.6 and merged up to master. Built with 'mvn clean verify 
-DskipITs' on each branch and ran the new IT separately.

> BatchWriter blocks all addMutation calls while binning mutations
> 
>
> Key: ACCUMULO-1755
> URL: https://issues.apache.org/jira/browse/ACCUMULO-1755
> Project: Accumulo
>  Issue Type: Improvement
>  Components: client
>Reporter: Adam Fuchs
>Assignee: Dave Marion
> Fix For: 1.6.6, 1.7.2, 1.8.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>


[jira] [Updated] (ACCUMULO-1755) BatchWriter blocks all addMutation calls while binning mutations

2016-03-02 Thread Dave Marion (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dave Marion updated ACCUMULO-1755:
--
Fix Version/s: 1.7.2
   1.6.6

> BatchWriter blocks all addMutation calls while binning mutations
> 
>
> Key: ACCUMULO-1755
> URL: https://issues.apache.org/jira/browse/ACCUMULO-1755
> Project: Accumulo
>  Issue Type: Improvement
>  Components: client
>Reporter: Adam Fuchs
>Assignee: Dave Marion
> Fix For: 1.6.6, 1.7.2, 1.8.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>


[jira] [Commented] (ACCUMULO-4156) Tunable replication frequency

2016-03-02 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15176380#comment-15176380
 ] 

Josh Elser commented on ACCUMULO-4156:
--

bq. I didn't see any special offsets for WAL files

Sorry, I meant to say, definitively, that this doesn't exist. I think this was 
something I considered as a future improvement which would enable more 
responsive replication.

bq. I think you'd need some marker for a WAL that lives through a flush so you 
don't do a double-insert in case of a failure after a flush.

But a flush is just pushing the IMM to disk -- the records should already be 
recorded in the WAL by the time they make it into the IMM. Am I 
misunderstanding?

> Tunable replication frequency
> -
>
> Key: ACCUMULO-4156
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4156
> Project: Accumulo
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 1.7.1
>Reporter: William Slacum
> Fix For: 1.8.0
>
>


[jira] [Commented] (ACCUMULO-1755) BatchWriter blocks all addMutation calls while binning mutations

2016-03-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15176370#comment-15176370
 ] 

ASF GitHub Bot commented on ACCUMULO-1755:
--

Github user dlmarion closed the pull request at:

https://github.com/apache/accumulo/pull/75


> BatchWriter blocks all addMutation calls while binning mutations
> 
>
> Key: ACCUMULO-1755
> URL: https://issues.apache.org/jira/browse/ACCUMULO-1755
> Project: Accumulo
>  Issue Type: Improvement
>  Components: client
>Reporter: Adam Fuchs
>Assignee: Dave Marion
> Fix For: 1.8.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>


[jira] [Commented] (ACCUMULO-4156) Tunable replication frequency

2016-03-02 Thread William Slacum (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15176372#comment-15176372
 ] 

William Slacum commented on ACCUMULO-4156:
--

Yeah, I wouldn't doubt it. I didn't see any special offsets for WAL files, 
though I think you'd need some marker for a WAL that lives through a flush so 
you don't do a double-insert in case of a failure after a flush.

> Tunable replication frequency
> -
>
> Key: ACCUMULO-4156
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4156
> Project: Accumulo
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 1.7.1
>Reporter: William Slacum
> Fix For: 1.8.0
>
>


[jira] [Commented] (ACCUMULO-1755) BatchWriter blocks all addMutation calls while binning mutations

2016-03-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15176369#comment-15176369
 ] 

ASF GitHub Bot commented on ACCUMULO-1755:
--

Github user dlmarion commented on the pull request:

https://github.com/apache/accumulo/pull/75#issuecomment-191403081
  
Will apply manually. Thx.


> BatchWriter blocks all addMutation calls while binning mutations
> 
>
> Key: ACCUMULO-1755
> URL: https://issues.apache.org/jira/browse/ACCUMULO-1755
> Project: Accumulo
>  Issue Type: Improvement
>  Components: client
>Reporter: Adam Fuchs
>Assignee: Dave Marion
> Fix For: 1.8.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>


[jira] [Commented] (ACCUMULO-4156) Tunable replication frequency

2016-03-02 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15176288#comment-15176288
 ] 

Josh Elser commented on ACCUMULO-4156:
--

{quote}
I don't have a specific implementation in mind, but I'd like to see a solution 
that involves isolating the work down to specific table events such as 
time-since-last-replication and data-added-since-last-replication.

Josh Elser has had some ideas about doing things incrementally within WAL files 
(ie, replicating between two sync points) that can also help with this. 
{quote}

I wish I remembered more of the model for "doing this safely", replicating 
from some offsetA to offsetB in a WAL, but my brain has evicted what little I 
once had figured out. The original design was meant to work like this 
(proactively replicate the data once it was synced to the WAL -- as this is the 
point we are guaranteed that the data is "written"), but there was something I 
had run into along the way. I wish I remembered what exactly it was, but it 
would be great to remove the little flag that ignores replication until a WAL 
is "closed" (impossible to be used by any tserver anymore). Maybe it was 
related to the lack of implicit entries in a WAL? We don't explicitly track how 
many entries are in a WAL now (just an "infinite length" equating to reading 
the entire WAL for replication); that would make it very difficult to track 
this. If we could keep a simple one-level index somewhere (byte offset to WAL 
entry record offset), that might be enough.
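
As a rough illustration of the "simple one-level index" idea above, the sketch below keeps a sparse map between WAL entry numbers and the byte offsets where those entries start. It is an assumption-laden toy, not part of Accumulo's WAL format: `WalOffsetIndex`, the `stride` parameter, and the method names are hypothetical, and the direction of the mapping (entry number to byte offset) is one possible reading of the comment.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sparse index; not part of Accumulo's WAL format.
class WalOffsetIndex {
  // Maps an indexed entry number to the byte offset where that entry starts.
  private final TreeMap<Long,Long> entryToByte = new TreeMap<Long,Long>();
  private final int stride;
  private long entries = 0;

  // Only every `stride`-th entry is indexed, which keeps the index small.
  WalOffsetIndex(int stride) {
    if (stride < 1) {
      throw new IllegalArgumentException("stride must be at least 1");
    }
    this.stride = stride;
  }

  // Called once per appended entry with the byte offset at which it starts.
  void recordEntry(long byteOffset) {
    if (entries % stride == 0) {
      entryToByte.put(entries, byteOffset);
    }
    entries++;
  }

  // Number of entries appended so far, so the log has a known finite length.
  long entryCount() {
    return entries;
  }

  // Byte offset of the closest indexed entry at or before `entry`; a reader
  // would seek there and skip forward to reach `entry` itself.
  long seekOffsetFor(long entry) {
    Map.Entry<Long,Long> floor = entryToByte.floorEntry(entry);
    if (floor == null) {
      throw new IllegalArgumentException("no indexed entry at or before " + entry);
    }
    return floor.getValue();
  }

  public static void main(String[] args) {
    WalOffsetIndex index = new WalOffsetIndex(2);
    for (long offset : new long[] {0, 10, 25, 40, 70}) {
      index.recordEntry(offset);
    }
    System.out.println("entries=" + index.entryCount()
        + ", seek offset for entry 3=" + index.seekOffsetFor(3));
  }
}
```

With an explicit entry count and a way to seek near a given entry, replicating a range such as offsetA to offsetB would no longer require treating the WAL as having "infinite length"; where such an index would be stored is left open here, as it is in the comment.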

It might be easy to force a roll of WALs from some client admin API, but that 
also has local write performance implications. I think we'd need to think about 
it from both sides: operational use and developer enablement/ease-of-use.

> Tunable replication frequency
> -
>
> Key: ACCUMULO-4156
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4156
> Project: Accumulo
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 1.7.1
>Reporter: William Slacum
> Fix For: 1.8.0
>
>


[jira] [Commented] (ACCUMULO-1755) BatchWriter blocks all addMutation calls while binning mutations

2016-03-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15176270#comment-15176270
 ] 

ASF GitHub Bot commented on ACCUMULO-1755:
--

Github user keith-turner commented on the pull request:

https://github.com/apache/accumulo/pull/75#issuecomment-191376367
  
>  I guarded the two methods that update the stats with trace logging 
checks. 

That's a nice improvement.

I think this patch looks good now. +1


> BatchWriter blocks all addMutation calls while binning mutations
> 
>
> Key: ACCUMULO-1755
> URL: https://issues.apache.org/jira/browse/ACCUMULO-1755
> Project: Accumulo
>  Issue Type: Improvement
>  Components: client
>Reporter: Adam Fuchs
>Assignee: Dave Marion
> Fix For: 1.8.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>


[jira] [Created] (ACCUMULO-4156) Tunable replication frequency

2016-03-02 Thread William Slacum (JIRA)
William Slacum created ACCUMULO-4156:


 Summary: Tunable replication frequency
 Key: ACCUMULO-4156
 URL: https://issues.apache.org/jira/browse/ACCUMULO-4156
 Project: Accumulo
  Issue Type: Improvement
  Components: core
Affects Versions: 1.7.1
Reporter: William Slacum
 Fix For: 1.8.0


Currently, replication happens when a write ahead log file is closed. The only 
parameter that controls when this event occurs is the write ahead log size, and 
it applies only to the tablet servers themselves.

By default this means that when replication happens depends not only on the 
table it is configured on, but also on exogenous factors such as total write 
load and failures. If a system receives ~100MB/day/TServer, and the WAL size is its 
default 1GB, it will take 10 days for any replication event to occur. Another 
possibility is that an unreplicated table is receiving many writes, which will 
cause more frequent replication events, but proportionally the work will 
involve less data for the table being replicated.

I don't have a specific implementation in mind, but I'd like to see a solution 
that involves isolating the work down to specific table events such as 
time-since-last-replication and data-added-since-last-replication.

[~elserj] has had some ideas about doing things incrementally within WAL files 
(ie, replicating between two sync points) that can also help with this. 





Accumulo-Pull-Requests - Build # 220 - Fixed

2016-03-02 Thread Apache Jenkins Server
The Apache Jenkins build system has built Accumulo-Pull-Requests (build #220)

Status: Fixed

Check console output at 
https://builds.apache.org/job/Accumulo-Pull-Requests/220/ to view the results.

[jira] [Commented] (ACCUMULO-1755) BatchWriter blocks all addMutation calls while binning mutations

2016-03-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175819#comment-15175819
 ] 

ASF GitHub Bot commented on ACCUMULO-1755:
--

Github user keith-turner commented on the pull request:

https://github.com/apache/accumulo/pull/75#issuecomment-191295793
  
Java 8 added accumulateAndGet to AtomicInteger, which can be used with lambdas 
to compute min and max.  Java 8 is so nice, but we can't use it yet.

In Java 8 we could do the following:

```java
   atomicInt.accumulateAndGet(update, Math::min);
```
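
For readers on Java 8 or later, a slightly fuller sketch of the same idea follows. `MinMaxStats` is a hypothetical class name, not something from the pull request; the only library call relied on is `AtomicInteger.accumulateAndGet(int, IntBinaryOperator)`, with `Math::min` and `Math::max` as the accumulator functions.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stats holder for illustration; requires Java 8 or later.
class MinMaxStats {
  // Start at the extremes so the first update always replaces them.
  private final AtomicInteger min = new AtomicInteger(Integer.MAX_VALUE);
  private final AtomicInteger max = new AtomicInteger(Integer.MIN_VALUE);

  // Safe to call from many threads; each call retries internally on contention.
  void update(int value) {
    min.accumulateAndGet(value, Math::min);
    max.accumulateAndGet(value, Math::max);
  }

  int min() {
    return min.get();
  }

  int max() {
    return max.get();
  }

  public static void main(String[] args) {
    MinMaxStats stats = new MinMaxStats();
    for (int value : new int[] {5, 2, 9}) {
      stats.update(value);
    }
    System.out.println(stats.min() + " " + stats.max()); // prints "2 9"
  }
}
```

The min and max are updated independently, so a concurrent reader may briefly see one updated before the other; that is usually acceptable for trace-level statistics.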




> BatchWriter blocks all addMutation calls while binning mutations
> 
>
> Key: ACCUMULO-1755
> URL: https://issues.apache.org/jira/browse/ACCUMULO-1755
> Project: Accumulo
>  Issue Type: Improvement
>  Components: client
>Reporter: Adam Fuchs
>Assignee: Dave Marion
> Fix For: 1.8.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>


[jira] [Commented] (ACCUMULO-1755) BatchWriter blocks all addMutation calls while binning mutations

2016-03-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175815#comment-15175815
 ] 

ASF GitHub Bot commented on ACCUMULO-1755:
--

Github user keith-turner commented on the pull request:

https://github.com/apache/accumulo/pull/75#issuecomment-191293786
  

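To make findbugs happy, you could CAS in a loop to compute the min and max, 
something like:

```java
   // Needs: import java.util.concurrent.atomic.AtomicInteger;
   private static void computeMin(AtomicInteger stat, int update) {
     int old = stat.get();
     // Retry until the value we read is still current when the minimum is stored.
     while (!stat.compareAndSet(old, Math.min(old, update))) {
       old = stat.get();
     }
   }
```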



> BatchWriter blocks all addMutation calls while binning mutations
> 
>
> Key: ACCUMULO-1755
> URL: https://issues.apache.org/jira/browse/ACCUMULO-1755
> Project: Accumulo
>  Issue Type: Improvement
>  Components: client
>Reporter: Adam Fuchs
>Assignee: Dave Marion
> Fix For: 1.8.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>


[jira] [Commented] (ACCUMULO-1755) BatchWriter blocks all addMutation calls while binning mutations

2016-03-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175793#comment-15175793
 ] 

ASF GitHub Bot commented on ACCUMULO-1755:
--

Github user joshelser commented on the pull request:

https://github.com/apache/accumulo/pull/75#issuecomment-191289655
  
> which is a findbugs issue, and I don't see the issue that it's 
> complaining about

Can you reproduce it locally via `mvn verify -DskipTests -Dcheckstyle.skip`? 
I know the Jenkins output can sometimes be... a little weird to parse for 
whatever reason.


[jira] [Commented] (ACCUMULO-1755) BatchWriter blocks all addMutation calls while binning mutations

2016-03-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175643#comment-15175643
 ] 

ASF GitHub Bot commented on ACCUMULO-1755:
--

Github user dlmarion commented on the pull request:

https://github.com/apache/accumulo/pull/75#issuecomment-191251406
  
I looked into the build failure, which is a findbugs issue, and I don't see 
the issue that it's complaining about. AtomicInteger and AtomicLong do not 
implement Lock.


Accumulo-Pull-Requests - Build # 219 - Failure

2016-03-02 Thread Apache Jenkins Server
The Apache Jenkins build system has built Accumulo-Pull-Requests (build #219)

Status: Failure

Check console output at 
https://builds.apache.org/job/Accumulo-Pull-Requests/219/ to view the results.

[jira] [Commented] (ACCUMULO-1755) BatchWriter blocks all addMutation calls while binning mutations

2016-03-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175625#comment-15175625
 ] 

ASF GitHub Bot commented on ACCUMULO-1755:
--

Github user dlmarion commented on the pull request:

https://github.com/apache/accumulo/pull/75#issuecomment-191248202
  
So, I took a different approach. I believe I resolved the race conditions by 
synchronizing on the objects being updated. This would still incur the 
performance penalty you mentioned from going to main memory. However, the 
stats objects being updated are only used if trace logging is enabled, so I 
guarded the two methods that update the stats with trace logging checks. You 
therefore only pay the penalty when trace logging is enabled, and turning on 
trace logging should be expected to cost a little performance anyway.
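
A minimal sketch of that guard, under stated assumptions: it uses `java.util.logging` with `Level.FINEST` as a stand-in for the project's trace-level check, and `GuardedStats`, `recordBinTime`, and the two fields are hypothetical names rather than the ones in the pull request.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical stats holder; FINEST stands in for "trace logging enabled".
final class GuardedStats {
  private static final Logger LOG = Logger.getLogger(GuardedStats.class.getName());

  private final Object lock = new Object();
  private long totalBinTimeMs;
  private int maxBinTimeMs;

  void recordBinTime(int millis) {
    // Skip the synchronized update entirely unless trace logging is on.
    if (!LOG.isLoggable(Level.FINEST)) {
      return;
    }
    synchronized (lock) {
      totalBinTimeMs += millis;
      maxBinTimeMs = Math.max(maxBinTimeMs, millis);
    }
  }

  long totalBinTimeMs() {
    synchronized (lock) {
      return totalBinTimeMs;
    }
  }

  int maxBinTimeMs() {
    synchronized (lock) {
      return maxBinTimeMs;
    }
  }

  public static void main(String[] args) {
    GuardedStats stats = new GuardedStats();
    stats.recordBinTime(7); // ignored at the default logging level
    LOG.setLevel(Level.FINEST);
    stats.recordBinTime(7);
    stats.recordBinTime(3);
    System.out.println("total=" + stats.totalBinTimeMs() + " max=" + stats.maxBinTimeMs());
  }
}
```

With the guard in place, callers that never enable trace logging skip the monitor entirely and pay only for the level check.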


Accumulo-Integration-Tests - Build # 718 - Aborted! -- master

2016-03-02 Thread elserj
Accumulo-Integration-Tests - Build # 718 - Aborted:

Check console output at 
https://secure.penguinsinabox.com/jenkins/job/Accumulo-Integration-Tests/718/ 
to view the results.