[ https://issues.apache.org/jira/browse/HBASE-12728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14279186#comment-14279186 ]
Solomon Duskis commented on HBASE-12728: ---------------------------------------- I'm certainly not against you doing the work. I'm not against picking up the work again myself either. What would be nice at this point is to have a clear notion of whether Table ought to extend BufferedMutator or not. > buffered writes substantially less useful after removal of HTablePool > --------------------------------------------------------------------- > > Key: HBASE-12728 > URL: https://issues.apache.org/jira/browse/HBASE-12728 > Project: HBase > Issue Type: Bug > Components: hbase > Affects Versions: 0.98.0 > Reporter: Aaron Beppu > Priority: Blocker > Fix For: 1.0.0, 2.0.0, 1.1.0 > > Attachments: 12728.connection-owns-buffers.example.branch-1.0.patch, > HBASE-12728-2.patch, HBASE-12728.patch, bulk-mutator.patch > > > In previous versions of HBase, when use of HTablePool was encouraged, HTable > instances were long-lived in that pool, and for that reason, if autoFlush was > set to false, the table instance could accumulate a full buffer of writes > before a flush was triggered. Writes from the client to the cluster could > then be substantially larger and less frequent than without buffering. > However, when HTablePool was deprecated, the primary justification seems to > have been that creating HTable instances is cheap, so long as the connection > and executor service being passed to it are pre-provided. A use pattern was > encouraged where users should create a new HTable instance for every > operation, using an existing connection and executor service, and then close > the table. In this pattern, buffered writes are substantially less useful; > writes are as small and as frequent as they would have been with > autoflush=true, except the synchronous write is moved from the operation > itself to the table close call which immediately follows. > More concretely : > ``` > // Given these two helpers ... > private HTableInterface getAutoFlushTable(String tableName) throws > IOException { > // (autoflush is true by default) > return storedConnection.getTable(tableName, executorService); > } > private HTableInterface getBufferedTable(String tableName) throws IOException > { > HTableInterface table = getAutoFlushTable(tableName); > table.setAutoFlush(false); > return table; > } > // it's my contention that these two methods would behave almost identically, > // except the first will hit a synchronous flush during the put call, > and the second will > // flush during the (hidden) close call on table. > private void writeAutoFlushed(Put somePut) throws IOException { > try (HTableInterface table = getAutoFlushTable(tableName)) { > table.put(somePut); // will do synchronous flush > } > } > private void writeBuffered(Put somePut) throws IOException { > try (HTableInterface table = getBufferedTable(tableName)) { > table.put(somePut); > } // auto-close will trigger synchronous flush > } > ``` > For buffered writes to actually provide a performance benefit to users, one > of two things must happen: > - The writeBuffer itself shouldn't live, flush and die with the lifecycle of > it's HTableInstance. If the writeBuffer were managed elsewhere and had a long > lifespan, this could cease to be an issue. However, if the same writeBuffer > is appended to by multiple tables, then some additional concurrency control > will be needed around it. > - Alternatively, there should be some pattern for having long-lived HTable > instances. However, since HTable is not thread-safe, we'd need multiple > instances, and a mechanism for leasing them out safely -- which sure sounds a > lot like the old HTablePool to me. > See discussion on mailing list here : > http://mail-archives.apache.org/mod_mbox/hbase-user/201412.mbox/%3CCAPdJLkEzmUQZ_kvD%3D8mrxi4V%3DhCmUp3g9MUZsddD%2Bmon%2BAvNtg%40mail.gmail.com%3E -- This message was sent by Atlassian JIRA (v6.3.4#6332)