[ https://issues.apache.org/jira/browse/ACCUMULO-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817353#comment-13817353 ]
Keith Turner commented on ACCUMULO-1833: ---------------------------------------- As I said earlier, I am only in favor of a change in 1.5.x that does not change API or behavior. I do not think we should change the behavior of something like the following. {code:java} mtbw = conn.createMultilTableBatchWriter(...) conn.tableOperations().rename("a","a_old"); conn.tableOperations().rename("a_tmp","a"); bw = mtbw.getBatchWriter("a"); {code} > MultiTableBatchWriterImpl.getBatchWriter() is not performant for multiple > threads > --------------------------------------------------------------------------------- > > Key: ACCUMULO-1833 > URL: https://issues.apache.org/jira/browse/ACCUMULO-1833 > Project: Accumulo > Issue Type: Improvement > Affects Versions: 1.5.0, 1.6.0 > Reporter: Chris McCubbin > Attachments: ACCUMULO-1833-test.patch, ZooKeeperThreadUtilization.png > > > This issue comes from profiling our application. We have a > MultiTableBatchWriter created by normal means. I am attempting to write to it > with multiple threads by doing things like the following: > {code} > batchWriter.getBatchWriter(table).addMutations(mutations); > {code} > In my test with 4 threads writing to one table, this call is quite > inefficient and results in a large performance degradation over a single > BatchWriter. > I believe the culprit is the fact that the call is synchronized. Also there > is the possibility that the zookeeper call to Tables.getTableState on every > call is negatively affecting performance: > {code} > @Override > public synchronized BatchWriter getBatchWriter(String tableName) throws > AccumuloException, AccumuloSecurityException, TableNotFoundException { > ArgumentChecker.notNull(tableName); > String tableId = Tables.getNameToIdMap(instance).get(tableName); > if (tableId == null) > throw new TableNotFoundException(tableId, tableName, null); > > if (Tables.getTableState(instance, tableId) == TableState.OFFLINE) > throw new TableOfflineException(instance, tableId); > > BatchWriter tbw = tableWriters.get(tableId); > if (tbw == null) { > tbw = new TableBatchWriter(tableId); > tableWriters.put(tableId, tbw); > } > return tbw; > } > {code} > I recommend moving the synchronized block to happen only if the batchwriter > is not present, and also only checking if the table is online at that time: > {code} > @Override > public BatchWriter getBatchWriter(String tableName) throws > AccumuloException, AccumuloSecurityException, TableNotFoundException { > ArgumentChecker.notNull(tableName); > String tableId = Tables.getNameToIdMap(instance).get(tableName); > if (tableId == null) > throw new TableNotFoundException(tableId, tableName, null); > BatchWriter tbw = tableWriters.get(tableId); > if (tbw == null) { > if (Tables.getTableState(instance, tableId) == TableState.OFFLINE) > throw new TableOfflineException(instance, tableId); > tbw = new TableBatchWriter(tableId); > synchronized(tableWriters){ > //only create a new table writer if we haven't been beaten to it. > if (tableWriters.get(tableId) == null) > tableWriters.put(tableId, tbw); > } > } > return tbw; > } > {code} -- This message was sent by Atlassian JIRA (v6.1#6144)