[ 
https://issues.apache.org/jira/browse/HBASE-12728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277551#comment-14277551
 ] 

Nick Dimiduk commented on HBASE-12728:
--------------------------------------

The patch is looking really good. I have no major issues to add to previous 
reviewers'. I do prefer a {{BufferedTable}} interface, that extends {{Table}}, 
over separating out the buffered functionality into {{BufferedMutator}}. I 
think bringing along the Table implementation will make it easier for folks to 
sub in. However, no one else is raising the point, so maybe I'm in the minority.

For future reference, you may want to post larger patches to reviews.apache.org 
so it's easier to keep track of reviewer comments. It's no gerrit, but it's 
better than nothing.

throughout javadocs and exception strings: s/HTable/Table/, 
s/BulkMutator/BatchMutator/

nit: throughout, theres on value in empty @param or @throws javadocs. Usually 
at least param names are obvious and this javadoc can be omitted.

on the docs for BatchMutator#put(List<Put>), it should be pointed out that this 
list of edits will not necessarily be sent in the same batch (i think this is 
an omission in our current doc strings too).

re: BufferedMutatorConfig, I think you're following the TableConfiguration 
convention. This isn't common within the HBase code; we tend to use builders or 
"context" objects instead.

Looks like tabs in BufferedMutatorExceptionListener as well. If you're using 
Eclipse or IntelliJ, we have formatting config file in 
dev-support/hbase_eclipse_formatter.xml

s/prodcution/production/

formatting of ASF header in CachingConnection is mangled, I think this will set 
off the apache-rat-plugin license check at release time.

this doc on Connection#getBufferedMutator(TableName, BufferedMutatorConfig) is 
incomplete "  * provided Lock object. This object can be used for long lived 
<br>"

Better to move this default value decision from getBufferedMutator down into 
the BufferedMutatorConfig so it's contained there.

{noformat}
      Lock lock = config.isMultithreaded() ? new ReentrantLock() : new 
DoNothingLock();
{noformat}

> buffered writes substantially less useful after removal of HTablePool
> ---------------------------------------------------------------------
>
>                 Key: HBASE-12728
>                 URL: https://issues.apache.org/jira/browse/HBASE-12728
>             Project: HBase
>          Issue Type: Bug
>          Components: hbase
>    Affects Versions: 0.98.0
>            Reporter: Aaron Beppu
>            Assignee: Solomon Duskis
>            Priority: Blocker
>             Fix For: 1.0.0, 2.0.0, 1.1.0
>
>         Attachments: 12728.connection-owns-buffers.example.branch-1.0.patch, 
> HBASE-12728-2.patch, HBASE-12728.patch, bulk-mutator.patch
>
>
> In previous versions of HBase, when use of HTablePool was encouraged, HTable 
> instances were long-lived in that pool, and for that reason, if autoFlush was 
> set to false, the table instance could accumulate a full buffer of writes 
> before a flush was triggered. Writes from the client to the cluster could 
> then be substantially larger and less frequent than without buffering.
> However, when HTablePool was deprecated, the primary justification seems to 
> have been that creating HTable instances is cheap, so long as the connection 
> and executor service being passed to it are pre-provided. A use pattern was 
> encouraged where users should create a new HTable instance for every 
> operation, using an existing connection and executor service, and then close 
> the table. In this pattern, buffered writes are substantially less useful; 
> writes are as small and as frequent as they would have been with 
> autoflush=true, except the synchronous write is moved from the operation 
> itself to the table close call which immediately follows.
> More concretely :
> ```
> // Given these two helpers ...
> private HTableInterface getAutoFlushTable(String tableName) throws 
> IOException {
>   // (autoflush is true by default)
>   return storedConnection.getTable(tableName, executorService);
> }
> private HTableInterface getBufferedTable(String tableName) throws IOException 
> {
>   HTableInterface table = getAutoFlushTable(tableName);
>   table.setAutoFlush(false);
>   return table;
> }
> // it's my contention that these two methods would behave almost identically,
> // except the first will hit a synchronous flush during the put call,
> and the second will
> // flush during the (hidden) close call on table.
> private void writeAutoFlushed(Put somePut) throws IOException {
>   try (HTableInterface table = getAutoFlushTable(tableName)) {
>     table.put(somePut); // will do synchronous flush
>   }
> }
> private void writeBuffered(Put somePut) throws IOException {
>   try (HTableInterface table = getBufferedTable(tableName)) {
>     table.put(somePut);
>   } // auto-close will trigger synchronous flush
> }
> ```
> For buffered writes to actually provide a performance benefit to users, one 
> of two things must happen:
> - The writeBuffer itself shouldn't live, flush and die with the lifecycle of 
> it's HTableInstance. If the writeBuffer were managed elsewhere and had a long 
> lifespan, this could cease to be an issue. However, if the same writeBuffer 
> is appended to by multiple tables, then some additional concurrency control 
> will be needed around it.
> - Alternatively, there should be some pattern for having long-lived HTable 
> instances. However, since HTable is not thread-safe, we'd need multiple 
> instances, and a mechanism for leasing them out safely -- which sure sounds a 
> lot like the old HTablePool to me.
> See discussion on mailing list here : 
> http://mail-archives.apache.org/mod_mbox/hbase-user/201412.mbox/%3CCAPdJLkEzmUQZ_kvD%3D8mrxi4V%3DhCmUp3g9MUZsddD%2Bmon%2BAvNtg%40mail.gmail.com%3E



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to