[jira] [Commented] (ACCUMULO-1833) MultiTableBatchWriterImpl.getBatchWriter() is not performant for multiple threads

ASF subversion and git services (JIRA) Sun, 17 Nov 2013 19:35:17 -0800

    [ https://issues.apache.org/jira/browse/ACCUMULO-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13825071#comment-13825071 ]


ASF subversion and git services commented on ACCUMULO-1833:
-----------------------------------------------------------

Commit 6b87c870d9475f024911649deb6eeb614325d00a in branch refs/heads/master 
from [~elserj]
[ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=6b87c87 ]

ACCUMULO-1833 Squashed merge of multiple commits that let MTBW work much more 
efficiently with concurrent access.

Squashed commit of the following:

commit 58d61759cdc673cc5ee86ad1176b7db3b2955679
Author: Josh Elser <els...@apache.org>
Date:   Fri Nov 15 14:26:29 2013 -0800

    ACCUMULO-1833 Recommended changes from Keith regarding previous fixes.

    Guava Cache Exception throwing is covered in tests. Added additional test
    to exercise table rename. Updated state check to be more active and be
    less susceptible to a paused thread.

commit dd73f52180ca00623469850c4b2d4b03c3768837
Author: Josh Elser <els...@apache.org>
Date:   Tue Nov 12 18:00:07 2013 -0800

    ACCUMULO-1833 Change out the AtomicInteger to AtomicLong to make it
    slightly more robust.

commit 9f7916db23adfd561254b432e5f5a5c4e9b02e54
Author: Josh Elser <josh.el...@gmail.com>
Date:   Fri Nov 8 11:36:06 2013 -0500

    ACCUMULO-1833 Simple usage of AtomicInteger to catch table cache
    invalidations and propagate them through MTBW's cache.

commit e8cb6c8ef53afaf41eb9e574607cb03093eec1e8
Author: Josh Elser <josh.el...@gmail.com>
Date:   Fri Nov 8 10:55:01 2013 -0500

    ACCUMULO-1833 Remove Connector client methods, but leave constructor on
    MTBW in place for testing purposes.

commit b6c6c0270a8bf52d99e0463b2acc98910c4087ca
Author: Josh Elser <josh.el...@gmail.com>
Date:   Thu Nov 7 22:22:19 2013 -0500

    ACCUMULO-1833 Ensure that we close the MTBW at the end of the test to
    avoid it getting GC'ed later and trying to flush when ZK and the
    instance is already gone.

commit a11883e62de57eaacf0aba6a5019b7abe79563ec
Author: Josh Elser <josh.el...@gmail.com>
Date:   Thu Nov 7 22:21:29 2013 -0500

    ACCUMULO-1833 Update MTBW close method to match what TSBW is doing
    (update internal boolean then perform the close)

commit e634ca03f326070a42a811d1ed9a181df5214a03
Author: Josh Elser <josh.el...@gmail.com>
Date:   Thu Nov 7 20:54:20 2013 -0500

    ACCUMULO-1833 Another instance of primitive without synchronization being
    used instead of AtomicBoolean with expected concurrent access.

commit ffe8c243dec4d7c7947cc6512394e9a70a29bc77
Author: Josh Elser <josh.el...@gmail.com>
Date:   Thu Nov 7 20:37:28 2013 -0500

    ACCUMULO-1833 Tests for expected functionality in the face of table
    operations.

commit 721616e3ff6a4200fd326b7f1ce4be6e1298a7ec
Author: Josh Elser <josh.el...@gmail.com>
Date:   Thu Nov 7 16:49:41 2013 -0500

    ACCUMULO-1833 Rework the getBatchWriter method on MTBW to remove
    zookeeper lock contention and get better concurrent throughput.
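
For readers following the commits above, here is a minimal sketch of the
general idea of using an atomic counter to detect invalidations of a shared
cache and propagate them to a local one. It is an illustration only and is not
the code from these commits: the class name GenerationCache, its fields, and
the loader parameter are all hypothetical, and the sketch uses Java 8 APIs that
the 2013 code base would not have used.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;

/** Hypothetical illustration; not an Accumulo class. */
class GenerationCache {
  // Incremented by whoever invalidates the shared (upstream) cache.
  private final AtomicLong upstreamGeneration;
  // The upstream generation this local cache was last cleared against.
  private final AtomicLong seenGeneration;
  private final ConcurrentHashMap<String, String> nameToId = new ConcurrentHashMap<>();

  GenerationCache(AtomicLong upstreamGeneration) {
    this.upstreamGeneration = upstreamGeneration;
    this.seenGeneration = new AtomicLong(upstreamGeneration.get());
  }

  /** Returns the cached value, dropping all local entries if upstream changed. */
  String get(String tableName, Function<String, String> loader) {
    long current = upstreamGeneration.get();
    long seen = seenGeneration.get();
    // Only the thread that wins the compareAndSet clears the local entries.
    if (current != seen && seenGeneration.compareAndSet(seen, current)) {
      nameToId.clear();
    }
    return nameToId.computeIfAbsent(tableName, loader);
  }
}
```

An invalidation then costs readers a single atomic read on the common path.
A long counter leaves far more headroom before wrap-around than an int one.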


> MultiTableBatchWriterImpl.getBatchWriter() is not performant for multiple 
> threads
> ---------------------------------------------------------------------------------
>
>                 Key: ACCUMULO-1833
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-1833
>             Project: Accumulo
>          Issue Type: Improvement
>    Affects Versions: 1.5.0, 1.6.0
>            Reporter: Chris McCubbin
>            Assignee: Josh Elser
>             Fix For: 1.5.1, 1.6.0
>
>         Attachments: ACCUMULO-1833-test.patch, ZooKeeperThreadUtilization.png
>
>
> This issue comes from profiling our application. We have a 
> MultiTableBatchWriter created by normal means. I am attempting to write to it 
> with multiple threads by doing things like the following:
> {code}
> batchWriter.getBatchWriter(table).addMutations(mutations);
> {code}
> In my test with 4 threads writing to one table, this call is quite 
> inefficient and results in a large performance degradation over a single 
> BatchWriter.
> I believe the culprit is the fact that the call is synchronized. Also there 
> is the possibility that the zookeeper call to Tables.getTableState on every 
> call is negatively affecting performance:
> {code}
>   @Override
>   public synchronized BatchWriter getBatchWriter(String tableName)
>       throws AccumuloException, AccumuloSecurityException, TableNotFoundException {
>     ArgumentChecker.notNull(tableName);
>     String tableId = Tables.getNameToIdMap(instance).get(tableName);
>     if (tableId == null)
>       throw new TableNotFoundException(tableId, tableName, null);
>     
>     if (Tables.getTableState(instance, tableId) == TableState.OFFLINE)
>       throw new TableOfflineException(instance, tableId);
>     
>     BatchWriter tbw = tableWriters.get(tableId);
>     if (tbw == null) {
>       tbw = new TableBatchWriter(tableId);
>       tableWriters.put(tableId, tbw);
>     }
>     return tbw;
>   }
> {code}
> I recommend moving the synchronized block to happen only if the batchwriter 
> is not present, and also only checking if the table is online at that time:
> {code}
>   @Override
>   public BatchWriter getBatchWriter(String tableName)
>       throws AccumuloException, AccumuloSecurityException, TableNotFoundException {
>     ArgumentChecker.notNull(tableName);
>     String tableId = Tables.getNameToIdMap(instance).get(tableName);
>     if (tableId == null)
>       throw new TableNotFoundException(tableId, tableName, null);
>     BatchWriter tbw = tableWriters.get(tableId);
>     if (tbw == null) {
>       if (Tables.getTableState(instance, tableId) == TableState.OFFLINE)
>           throw new TableOfflineException(instance, tableId);
>       synchronized(tableWriters){
>           // Re-check under the lock and reuse the writer if another thread beat us to it.
>           tbw = tableWriters.get(tableId);
>           if (tbw == null) {
>               tbw = new TableBatchWriter(tableId);
>               tableWriters.put(tableId, tbw);
>           }
>       }
>     }
>     return tbw;
>   }
> {code}
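
The lookup pattern suggested in the quoted description (fast path without a
lock, creation only on a miss) can also be sketched with a concurrent map
instead of an explicit synchronized block. This is a minimal sketch under
stated assumptions, not the change that was committed: WriterRegistry and
WriterFactory are hypothetical names and are not Accumulo classes, and the
table state check is left out.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical illustration; not an Accumulo class. */
class WriterRegistry<W> {
  /** Hypothetical factory for per-table writers. */
  interface WriterFactory<T> {
    T create(String tableId);
  }

  private final ConcurrentMap<String, W> writers = new ConcurrentHashMap<>();
  private final WriterFactory<W> factory;

  WriterRegistry(WriterFactory<W> factory) {
    this.factory = factory;
  }

  W getWriter(String tableId) {
    // Fast path: a plain concurrent read, no lock taken.
    W existing = writers.get(tableId);
    if (existing != null) {
      return existing;
    }
    // Slow path: create a candidate, then publish it atomically.
    W created = factory.create(tableId);
    W raced = writers.putIfAbsent(tableId, created);
    // If another thread published first, return its writer and drop ours.
    return raced != null ? raced : created;
  }
}
```

Every caller gets the same writer for a given table id. A thread that loses
the race creates one writer that is thrown away, which is only acceptable when
construction is cheap and has no side effects.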



--
This message was sent by Atlassian JIRA
(v6.1#6144)
