[ https://issues.apache.org/jira/browse/HBASE-12728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14267678#comment-14267678 ]
Solomon Duskis commented on HBASE-12728:
----------------------------------------

I like having a separate interface for bulk writing that's accessible from a new method on Connection.

At this point, I'm rethinking the AsyncPutter / BufferedTable approach. Asynchronous bulk writing is geared to a couple of very specific cases. Table currently has 37 methods on it, most of which would not be implemented any differently in the asynchronous use cases. Given those two complexities, I think Separation of Concerns and Keep It Simple might be the best guides here. A BulkWriter (or BulkMutator) interface with a limited number of methods on it might work better than extending Table.

Perhaps a simplified API like this might work:

{code}
public interface BulkWriter {
  void put(Put p);
  void delete(Delete d);
  void flush();
  void close();
}

public interface Connection {
  ...
  BulkWriter getBulkWriter(int maxBufferSize [, some other configuration parameters]);
}
{code}

Thoughts?

> buffered writes substantially less useful after removal of HTablePool
> ---------------------------------------------------------------------
>
>                 Key: HBASE-12728
>                 URL: https://issues.apache.org/jira/browse/HBASE-12728
>             Project: HBase
>          Issue Type: Bug
>          Components: hbase
>    Affects Versions: 0.98.0
>            Reporter: Aaron Beppu
>            Assignee: Solomon Duskis
>            Priority: Blocker
>             Fix For: 1.0.0, 2.0.0, 1.1.0
>
>
> In previous versions of HBase, when use of HTablePool was encouraged, HTable
> instances were long-lived in that pool, and for that reason, if autoFlush was
> set to false, the table instance could accumulate a full buffer of writes
> before a flush was triggered. Writes from the client to the cluster could
> then be substantially larger and less frequent than without buffering.
> However, when HTablePool was deprecated, the primary justification seems to
> have been that creating HTable instances is cheap, so long as the connection
> and executor service being passed to it are pre-provided.
> A use pattern was encouraged where users should create a new HTable
> instance for every operation, using an existing connection and executor
> service, and then close the table. In this pattern, buffered writes are
> substantially less useful; writes are as small and as frequent as they would
> have been with autoflush=true, except the synchronous write is moved from
> the operation itself to the table close call which immediately follows.
> More concretely:
> ```
> // Given these two helpers ...
> private HTableInterface getAutoFlushTable(String tableName) throws IOException {
>   // (autoflush is true by default)
>   return storedConnection.getTable(tableName, executorService);
> }
> private HTableInterface getBufferedTable(String tableName) throws IOException {
>   HTableInterface table = getAutoFlushTable(tableName);
>   table.setAutoFlush(false);
>   return table;
> }
> // it's my contention that these two methods would behave almost identically,
> // except the first will hit a synchronous flush during the put call,
> // and the second will flush during the (hidden) close call on table.
> private void writeAutoFlushed(Put somePut) throws IOException {
>   try (HTableInterface table = getAutoFlushTable(tableName)) {
>     table.put(somePut); // will do synchronous flush
>   }
> }
> private void writeBuffered(Put somePut) throws IOException {
>   try (HTableInterface table = getBufferedTable(tableName)) {
>     table.put(somePut);
>   } // auto-close will trigger synchronous flush
> }
> ```
> For buffered writes to actually provide a performance benefit to users, one
> of two things must happen:
> - The writeBuffer itself shouldn't live, flush and die with the lifecycle of
> its HTableInstance. If the writeBuffer were managed elsewhere and had a long
> lifespan, this could cease to be an issue. However, if the same writeBuffer
> is appended to by multiple tables, then some additional concurrency control
> will be needed around it.
> - Alternatively, there should be some pattern for having long-lived HTable
> instances. However, since HTable is not thread-safe, we'd need multiple
> instances, and a mechanism for leasing them out safely -- which sure sounds a
> lot like the old HTablePool to me.
> See discussion on mailing list here:
> http://mail-archives.apache.org/mod_mbox/hbase-user/201412.mbox/%3CCAPdJLkEzmUQZ_kvD%3D8mrxi4V%3DhCmUp3g9MUZsddD%2Bmon%2BAvNtg%40mail.gmail.com%3E
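
One way to picture the first option in the description (a write buffer that outlives any single table handle), in the spirit of the BulkWriter idea from the comment, is the minimal, HBase-free sketch below. Everything in it is hypothetical and for illustration only: BufferedBulkWriter, the String mutations standing in for Put/Delete, and the sink callback standing in for the RPC to the cluster are assumptions, not HBase API. The synchronized methods are just one simple way to provide the "additional concurrency control" the description mentions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for the proposed interface. Mutations are plain
// strings here so the sketch compiles without HBase on the classpath.
interface BulkWriter extends AutoCloseable {
    void write(String mutation);
    void flush();
    @Override
    void close();
}

// Long-lived writer: buffers mutations and hands them to the sink in
// batches once maxBufferSize entries have accumulated.
final class BufferedBulkWriter implements BulkWriter {
    private final int maxBufferSize;
    private final Consumer<List<String>> sink; // assumption: models the cluster RPC
    private final List<String> buffer = new ArrayList<>();

    BufferedBulkWriter(int maxBufferSize, Consumer<List<String>> sink) {
        if (maxBufferSize < 1) {
            throw new IllegalArgumentException("maxBufferSize must be >= 1");
        }
        this.maxBufferSize = maxBufferSize;
        this.sink = sink;
    }

    // synchronized so several threads can share one long-lived writer
    @Override
    public synchronized void write(String mutation) {
        buffer.add(mutation);
        if (buffer.size() >= maxBufferSize) {
            flush();
        }
    }

    @Override
    public synchronized void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(buffer)); // hand over a copy, then reset
        buffer.clear();
    }

    // Closing flushes whatever is still buffered.
    @Override
    public synchronized void close() {
        flush();
    }
}

class BulkWriterDemo {
    public static void main(String[] args) {
        List<Integer> batchSizes = new ArrayList<>();
        try (BulkWriter writer =
                 new BufferedBulkWriter(3, batch -> batchSizes.add(batch.size()))) {
            for (int i = 0; i < 7; i++) {
                writer.write("put-" + i);
            }
        }
        System.out.println(batchSizes); // prints [3, 3, 1]
    }
}
```

With maxBufferSize = 3, seven writes followed by close() reach the sink as batches of 3, 3 and 1, rather than as seven single-write round trips, which is the behavior the open-write-close table pattern in the description cannot provide.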