[jira] [Commented] (HBASE-24749) Direct insert HFiles and Persist in-memory HFile tracking

Zach York (Jira) Thu, 23 Jul 2020 17:04:26 -0700


    [ 
https://issues.apache.org/jira/browse/HBASE-24749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17164039#comment-17164039
 ]


Zach York commented on HBASE-24749:
-----------------------------------

[~anoop.hbase] Can you expand on how we can get into a situation where a partial 
file is written? I'm trying to see if there are any failure modes we haven't 
thought of. If the case is a complete file written to the data directory, is 
there harm in picking up the new file (even if it hasn't been successfully 
committed to the SFM)?

In any case, for all user tables, this could be covered by the new store file 
list. The only case that is tricky/of concern is the file list for the root.

> On HBASE-14090, it is old but still cool, virtuous, aiming to hit a bigger 
> target.

[~stack] We've definitely looked at it and gained some inspiration :) I think 
at this point, we want to keep this in a manageable scope to be able to deliver 
something. However, this approach should help break some of the reliance on FS 
structure and make it easier to accomplish the goal in the future.

> Direct insert HFiles and Persist in-memory HFile tracking
> ---------------------------------------------------------
>
>                 Key: HBASE-24749
>                 URL: https://issues.apache.org/jira/browse/HBASE-24749
>             Project: HBase
>          Issue Type: Umbrella
>          Components: Compaction, HFile
>    Affects Versions: 3.0.0-alpha-1
>            Reporter: Tak-Lon (Stephen) Wu
>            Assignee: Tak-Lon (Stephen) Wu
>            Priority: Major
>              Labels: design, discussion, objectstore, storeFile, storeengine
>         Attachments: 1B100m-25m25m-performance.pdf, Apache HBase - Direct 
> insert HFiles and Persist in-memory HFile tracking.pdf
>
>
> We propose a new feature (a new store engine) to remove the {{.tmp}} 
> directory used in the commit stage for common HFile operations such as flush 
> and compaction to improve the write throughput and latency on object stores. 
> Specifically for S3 filesystems, this will also mitigate read-after-write 
> inconsistencies caused by immediate HFile validation after moving the 
> HFile(s) to the data directory.
> Please see the attachments for the proposal and the initial results captured 
> with 25m (25m operations) and 1B (100m operations) YCSB workload A LOAD and 
> RUN, and workload C RUN.
> The goal of this JIRA is to discuss with the community whether the proposed 
> improvement for the object store use case makes sense and whether we missed 
> anything that should be included.
> Improvement Highlights
>  1. Lower write latency, especially the p99+
>  2. Higher write throughput on flush and compaction 
>  3. Lower MTTR on region (re)open or assignment 
>  4. Remove consistency check dependencies (e.g. DynamoDB) supported by the 
> file system implementation



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
