[ https://issues.apache.org/jira/browse/HBASE-12728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14262718#comment-14262718 ]
Solomon Duskis commented on HBASE-12728: ---------------------------------------- bq. 2. is a BufferedConnection really necessary? The connection isn't buffering, it's just handing back the same BufferedTable instance to each caller, correct? That makes for little/no code change for a consumer to convert to the buffered implementation, but I think that isn't idea. Better to have the decorator accept the Table instance (or TableName + Connection instance?) and allow for setting up buffers/queues/&c for async flushing. Are HTable.delete(), .batch(), coprocessorService() and others thread-safe? If not, then in the multithreaded case, we likely need to create new (or a pooled) BufferredTable per thread. The "BufferredConnection" would perform that functionality. We could implement it as a group of BufferTables that use the same underlying write buffer. > buffered writes substantially less useful after removal of HTablePool > --------------------------------------------------------------------- > > Key: HBASE-12728 > URL: https://issues.apache.org/jira/browse/HBASE-12728 > Project: HBase > Issue Type: Bug > Components: hbase > Affects Versions: 0.98.0 > Reporter: Aaron Beppu > > In previous versions of HBase, when use of HTablePool was encouraged, HTable > instances were long-lived in that pool, and for that reason, if autoFlush was > set to false, the table instance could accumulate a full buffer of writes > before a flush was triggered. Writes from the client to the cluster could > then be substantially larger and less frequent than without buffering. > However, when HTablePool was deprecated, the primary justification seems to > have been that creating HTable instances is cheap, so long as the connection > and executor service being passed to it are pre-provided. A use pattern was > encouraged where users should create a new HTable instance for every > operation, using an existing connection and executor service, and then close > the table. In this pattern, buffered writes are substantially less useful; > writes are as small and as frequent as they would have been with > autoflush=true, except the synchronous write is moved from the operation > itself to the table close call which immediately follows. > More concretely : > ``` > // Given these two helpers ... > private HTableInterface getAutoFlushTable(String tableName) throws > IOException { > // (autoflush is true by default) > return storedConnection.getTable(tableName, executorService); > } > private HTableInterface getBufferedTable(String tableName) throws IOException > { > HTableInterface table = getAutoFlushTable(tableName); > table.setAutoFlush(false); > return table; > } > // it's my contention that these two methods would behave almost identically, > // except the first will hit a synchronous flush during the put call, > and the second will > // flush during the (hidden) close call on table. > private void writeAutoFlushed(Put somePut) throws IOException { > try (HTableInterface table = getAutoFlushTable(tableName)) { > table.put(somePut); // will do synchronous flush > } > } > private void writeBuffered(Put somePut) throws IOException { > try (HTableInterface table = getBufferedTable(tableName)) { > table.put(somePut); > } // auto-close will trigger synchronous flush > } > ``` > For buffered writes to actually provide a performance benefit to users, one > of two things must happen: > - The writeBuffer itself shouldn't live, flush and die with the lifecycle of > it's HTableInstance. If the writeBuffer were managed elsewhere and had a long > lifespan, this could cease to be an issue. However, if the same writeBuffer > is appended to by multiple tables, then some additional concurrency control > will be needed around it. > - Alternatively, there should be some pattern for having long-lived HTable > instances. However, since HTable is not thread-safe, we'd need multiple > instances, and a mechanism for leasing them out safely -- which sure sounds a > lot like the old HTablePool to me. > See discussion on mailing list here : > http://mail-archives.apache.org/mod_mbox/hbase-user/201412.mbox/%3CCAPdJLkEzmUQZ_kvD%3D8mrxi4V%3DhCmUp3g9MUZsddD%2Bmon%2BAvNtg%40mail.gmail.com%3E -- This message was sent by Atlassian JIRA (v6.3.4#6332)