[jira] [Commented] (HBASE-12728) buffered writes substantially less useful after removal of HTablePool

Solomon Duskis (JIRA) Wed, 14 Jan 2015 11:09:13 -0800

    [ 
https://issues.apache.org/jira/browse/HBASE-12728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277474#comment-14277474
 ]


Solomon Duskis commented on HBASE-12728:
----------------------------------------

bq. This bit of doc is incomplete: ...

I'll update the comments .  I wanted to get the code out for review.  The 
documentation can definitely use more work

b1. We need to expose threadsafe as an option? Why not just threadsafe all the 
time? One less thing for the user to worry about (Minor cost when uncontended 
crossing of synchronization barrier)

I'm fine either way.  I'll make it all synchronized.

{quote}
This class is a bit of an odd bird (particularly so because all the rest of 
your changes are elegant). Let me read the rest of the patch.... OK. Back 
again. I see how it is intended to be used. Yeah, what Enis says, can we do 
Builder... and perhaps Fluent pattern (e.g. 
http://jlordiales.me/2012/12/13/the-builder-pattern-in-practice/) so instead of 
a method name thatIsMultithreaded, it'd be multhreaded and instead of withPool, 
it'd be just pool. Not a deal-breaker. Just a suggestion.
{quote}

I'll defer to the wisdom of the crowd here.  I added in a lot of options which 
could result in a proliferation of Connection.getBufferedMutator() methods.  
I'll remove the Config, since we're removing the number of optiosn.

bq. Public constructor on BufferedMutatorImpl needs to be shutdown... private.

OK.

{quote}
This is interesting:
164     // This behavior is highly non-intuitive... it does not protect us 
against
165     // 94-incompatible behavior, which is a timing issue because hasError, 
the below code
166     // and setter of hasError are not synchronized. Perhaps it should be 
removed.
I got a little lost (because it non-intuitive I suppose – smile) We are 
flushing out writes that were buffered before the error showed up?
{quote}

That bit was copied directly from HTable.  I'm impartial about the correct 
algorithm in this case, I just didn't want to rock the boat.  This issue is 
definitely worth a discussion, but I'd ask that if we want to change the 
algorithm, we do it in another JIRA issue.  

bq. Can the lock be an internal detail rather than passed on construction? (the 
less options the user has the better)

If everything is synchronized, then sure.  We can even rely on plain old 
synchronized methods instead of fancy Lock objects.

{quote}
s/CachingConnection/BufferingConnection/ to highlight the connection between 
BufferedMutator and this new Connection type?
Is user supposed to be able to create their own CachingConnection? Should this 
be package protected to force folks via the ConnectionFactory?
{quote}

I got a bit fancy for the sake of MultiTableOutputFormat which puts Tables into 
a Map<String, Table>.  I needed to add a Map<String, BufferedMutator> so I 
moved that functionality into a separate class for potential reusability.  It 
looks like it would be better to KISS and move the CachingConnection 
functionality back into MultiTableOutputFormat.

{quote}
The redo of HTable to use the new stuff and the changes in Interfaces look 
great.
Very nice work Solomon Duskis
{quote}

:)

> buffered writes substantially less useful after removal of HTablePool
> ---------------------------------------------------------------------
>
>                 Key: HBASE-12728
>                 URL: https://issues.apache.org/jira/browse/HBASE-12728
>             Project: HBase
>          Issue Type: Bug
>          Components: hbase
>    Affects Versions: 0.98.0
>            Reporter: Aaron Beppu
>            Assignee: Solomon Duskis
>            Priority: Blocker
>             Fix For: 1.0.0, 2.0.0, 1.1.0
>
>         Attachments: 12728.connection-owns-buffers.example.branch-1.0.patch, 
> HBASE-12728-2.patch, HBASE-12728.patch, bulk-mutator.patch
>
>
> In previous versions of HBase, when use of HTablePool was encouraged, HTable 
> instances were long-lived in that pool, and for that reason, if autoFlush was 
> set to false, the table instance could accumulate a full buffer of writes 
> before a flush was triggered. Writes from the client to the cluster could 
> then be substantially larger and less frequent than without buffering.
> However, when HTablePool was deprecated, the primary justification seems to 
> have been that creating HTable instances is cheap, so long as the connection 
> and executor service being passed to it are pre-provided. A use pattern was 
> encouraged where users should create a new HTable instance for every 
> operation, using an existing connection and executor service, and then close 
> the table. In this pattern, buffered writes are substantially less useful; 
> writes are as small and as frequent as they would have been with 
> autoflush=true, except the synchronous write is moved from the operation 
> itself to the table close call which immediately follows.
> More concretely :
> ```
> // Given these two helpers ...
> private HTableInterface getAutoFlushTable(String tableName) throws 
> IOException {
>   // (autoflush is true by default)
>   return storedConnection.getTable(tableName, executorService);
> }
> private HTableInterface getBufferedTable(String tableName) throws IOException 
> {
>   HTableInterface table = getAutoFlushTable(tableName);
>   table.setAutoFlush(false);
>   return table;
> }
> // it's my contention that these two methods would behave almost identically,
> // except the first will hit a synchronous flush during the put call,
> and the second will
> // flush during the (hidden) close call on table.
> private void writeAutoFlushed(Put somePut) throws IOException {
>   try (HTableInterface table = getAutoFlushTable(tableName)) {
>     table.put(somePut); // will do synchronous flush
>   }
> }
> private void writeBuffered(Put somePut) throws IOException {
>   try (HTableInterface table = getBufferedTable(tableName)) {
>     table.put(somePut);
>   } // auto-close will trigger synchronous flush
> }
> ```
> For buffered writes to actually provide a performance benefit to users, one 
> of two things must happen:
> - The writeBuffer itself shouldn't live, flush and die with the lifecycle of 
> it's HTableInstance. If the writeBuffer were managed elsewhere and had a long 
> lifespan, this could cease to be an issue. However, if the same writeBuffer 
> is appended to by multiple tables, then some additional concurrency control 
> will be needed around it.
> - Alternatively, there should be some pattern for having long-lived HTable 
> instances. However, since HTable is not thread-safe, we'd need multiple 
> instances, and a mechanism for leasing them out safely -- which sure sounds a 
> lot like the old HTablePool to me.
> See discussion on mailing list here : 
> http://mail-archives.apache.org/mod_mbox/hbase-user/201412.mbox/%3CCAPdJLkEzmUQZ_kvD%3D8mrxi4V%3DhCmUp3g9MUZsddD%2Bmon%2BAvNtg%40mail.gmail.com%3E



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (HBASE-12728) buffered writes substantially less useful after removal of HTablePool

Reply via email to