[ https://issues.apache.org/jira/browse/HBASE-12728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14279373#comment-14279373 ]
Nick Dimiduk commented on HBASE-12728:
--------------------------------------

I think it's better to cover the multi-threaded coordination on behalf of the user than expect them to do the synchronizing themselves. The train rolling here is a good one -- it's nice cleanup, it's consistent with previous behaviors, and it makes things more obvious for users. Accompany this with thoughtful javadoc review and a fat example that we can dump into the online book and this will be a fine resolution.

I still like better having a {{Table}},{{BufferedTable}} instead of {{Table}},{{BufferedMutator}}. I think having a drop-in buffering option will make the most sense for a usable API. I hear the argument of maybe it's not the place of our client out-of-the-box, but we have a solution to this today that some folks depend on, so I think it's irresponsible to omit it for 1.0. If [~sduskis] is truly fed up with us ( ::smile:: ) I'm happy to pick up the patch in this direction.

I also think splitting the {{Table}} concept into a reader and a writer is something worth exploring, but not for 1.0. I'm hoping by 2.0 we'll have a valid story for an async (or [reactive|http://www.reactivemanifesto.org]?) client and maybe even something that operates on top of a C/native implementation so we can close the gap for folks who aren't on the JVM.

For now, let's get 1.0 release unblocked.

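As a rough illustration of what covering the coordination on behalf of the user could look like, here is a minimal sketch of a long-lived write buffer that several threads can share. `SharedWriteBuffer`, its `flushThreshold` parameter, and the `sink` callback are hypothetical names made up for this illustration; they are not HBase classes and not the API from any of the attached patches. The sink stands in for whatever actually sends a batch to the cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical illustration only; not part of the HBase client API.
final class SharedWriteBuffer<T> {
    private final int flushThreshold;
    private final Consumer<List<T>> sink; // assumed stand-in for "send batch to cluster"
    private final Object lock = new Object();
    private List<T> pending = new ArrayList<>();

    SharedWriteBuffer(int flushThreshold, Consumer<List<T>> sink) {
        if (flushThreshold < 1) {
            throw new IllegalArgumentException("flushThreshold must be >= 1");
        }
        this.flushThreshold = flushThreshold;
        this.sink = sink;
    }

    // Safe to call from many threads; callers do no synchronizing of their own.
    void add(T item) {
        List<T> batch = null;
        synchronized (lock) {
            pending.add(item);
            if (pending.size() >= flushThreshold) {
                // Swap the full list out while holding the lock.
                batch = pending;
                pending = new ArrayList<>();
            }
        }
        // Deliver outside the lock so a slow flush does not block other writers.
        if (batch != null) {
            sink.accept(batch);
        }
    }

    // Explicit flush of whatever has accumulated; does nothing when empty.
    void flush() {
        List<T> batch;
        synchronized (lock) {
            if (pending.isEmpty()) {
                return;
            }
            batch = pending;
            pending = new ArrayList<>();
        }
        sink.accept(batch);
    }
}
```

Because the buffer outlives any single write call, batches grow to the threshold regardless of how short-lived the caller's table handle is. One consequence of delivering outside the lock is that two batches may reach the sink concurrently, so the sink itself has to tolerate that.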
> buffered writes substantially less useful after removal of HTablePool
> ---------------------------------------------------------------------
>
>                 Key: HBASE-12728
>                 URL: https://issues.apache.org/jira/browse/HBASE-12728
>             Project: HBase
>          Issue Type: Bug
>          Components: hbase
>    Affects Versions: 0.98.0
>            Reporter: Aaron Beppu
>            Priority: Blocker
>             Fix For: 1.0.0, 2.0.0, 1.1.0
>
>         Attachments: 12728.connection-owns-buffers.example.branch-1.0.patch,
>                      HBASE-12728-2.patch, HBASE-12728.patch, bulk-mutator.patch
>
>
> In previous versions of HBase, when use of HTablePool was encouraged, HTable
> instances were long-lived in that pool, and for that reason, if autoFlush was
> set to false, the table instance could accumulate a full buffer of writes
> before a flush was triggered. Writes from the client to the cluster could
> then be substantially larger and less frequent than without buffering.
>
> However, when HTablePool was deprecated, the primary justification seems to
> have been that creating HTable instances is cheap, so long as the connection
> and executor service being passed to it are pre-provided. A use pattern was
> encouraged where users should create a new HTable instance for every
> operation, using an existing connection and executor service, and then close
> the table. In this pattern, buffered writes are substantially less useful;
> writes are as small and as frequent as they would have been with
> autoflush=true, except the synchronous write is moved from the operation
> itself to the table close call which immediately follows.
>
> More concretely:
> ```
> // Given these two helpers ...
> private HTableInterface getAutoFlushTable(String tableName) throws IOException {
>   // (autoflush is true by default)
>   return storedConnection.getTable(tableName, executorService);
> }
>
> private HTableInterface getBufferedTable(String tableName) throws IOException {
>   HTableInterface table = getAutoFlushTable(tableName);
>   table.setAutoFlush(false);
>   return table;
> }
>
> // it's my contention that these two methods would behave almost identically,
> // except the first will hit a synchronous flush during the put call, and the
> // second will flush during the (hidden) close call on table.
> private void writeAutoFlushed(Put somePut) throws IOException {
>   try (HTableInterface table = getAutoFlushTable(tableName)) {
>     table.put(somePut); // will do synchronous flush
>   }
> }
>
> private void writeBuffered(Put somePut) throws IOException {
>   try (HTableInterface table = getBufferedTable(tableName)) {
>     table.put(somePut);
>   } // auto-close will trigger synchronous flush
> }
> ```
>
> For buffered writes to actually provide a performance benefit to users, one
> of two things must happen:
> - The writeBuffer itself shouldn't live, flush and die with the lifecycle of
> its HTable instance. If the writeBuffer were managed elsewhere and had a long
> lifespan, this could cease to be an issue. However, if the same writeBuffer
> is appended to by multiple tables, then some additional concurrency control
> will be needed around it.
> - Alternatively, there should be some pattern for having long-lived HTable
> instances. However, since HTable is not thread-safe, we'd need multiple
> instances, and a mechanism for leasing them out safely -- which sure sounds a
> lot like the old HTablePool to me.
>
> See discussion on mailing list here:
> http://mail-archives.apache.org/mod_mbox/hbase-user/201412.mbox/%3CCAPdJLkEzmUQZ_kvD%3D8mrxi4V%3DhCmUp3g9MUZsddD%2Bmon%2BAvNtg%40mail.gmail.com%3E

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)