[ 
https://issues.apache.org/jira/browse/HBASE-20952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16604844#comment-16604844
 ] 

Josh Elser commented on HBASE-20952:
------------------------------------

{quote}Ratis doesn't have an FS? The WAL will hang on skyhooks? When the target 
is so inspecifically described, how we have a hope of knowing when we've hit it?
{quote}
Of course Ratis is storing data somewhere, but HBase, when using this 
Ratis-backed LogService, *shouldn't* know what the underlying filesystem is. In 
other words, we don't _want_ to have to know. In an ideal world, we can use a 
data structure and let the API hide the details for us.
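To illustrate what "let the API hide the details" could mean, here is a rough sketch. Every name in it ({{LogStream}}, {{InMemoryLogStream}}, {{EditWriter}}) is hypothetical and made up for this example; none of it is taken from the patch or from the Ratis LogService API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical API: the only view of the log that HBase-side code would see.
interface LogStream {
    /** Appends one record and returns its position in the log. */
    long append(byte[] record);

    /** Returns once all appended records are durable, however the backend defines that. */
    void sync();
}

// Toy in-memory backend standing in for HDFS, Ratis, etc. Nothing here is really durable.
final class InMemoryLogStream implements LogStream {
    private final List<byte[]> records = new ArrayList<>();
    private long lastSynced = -1;

    @Override
    public long append(byte[] record) {
        records.add(record.clone());
        return records.size() - 1;
    }

    @Override
    public void sync() {
        lastSynced = records.size() - 1;
    }

    long lastSynced() {
        return lastSynced;
    }
}

// Caller code depends only on LogStream and never learns where the bytes end up.
final class EditWriter {
    static long writeDurably(LogStream log, byte[] edit) {
        long position = log.append(edit);
        log.sync();
        return position;
    }
}
```

The point is only that {{EditWriter}} compiles against the interface, so swapping the backend would not touch it.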
{quote} * A sentence like, "Before making code changes, we studied Apache 
Kafka, Apache BookKeeper and Apache Ratis.", is usually followed by a summary 
of what was learned.{quote}
That's a good suggestion. I can say that a significant portion of the direction 
was strongly influenced by Apache DistributedLog. The architecture/abstraction 
they presented seems to jibe naturally with what we want from HBase.
{quote}"The refactoring of WAL related code is to decouple WAL from FileSystem 
so that other consensus protocols can be accommodated to back different WAL 
implementations." BookKeeper, for instance, a purported target, is not consensus 
based... nor Kafka.
{quote}
I'm not sure how to interpret this to make a positive change. We are all in 
agreement that we don't want to be shoe-horned into a specific WAL 
implementation with this work (per the original "design" discussion). I think 
Ted was just trying to capture this. We just need a re-wording? I hope this is 
not a surprising statement.
{quote}WALInfo represents 'Id' but is called WALInfo?
{quote}
Yeah, agreed, that's confusing. WALInfo is a unique identifier for a single WAL 
in HBase, regardless of the WAL implementation. That's it.
{quote}Why is a class that identifies a WAL called WALInfo and not 
WALIdentifier or WAL_ID?
{quote}
That is a great suggestion. I'm glad you made it. Which do you think is better?
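For illustration only, assuming we went with the name {{WALIdentifier}}, the idea could be sketched like this. The two implementing classes and the {{getName()}} method are hypothetical and are not what the current patch contains.

```java
import java.util.Objects;

// Hypothetical: an implementation-agnostic handle for exactly one WAL.
interface WALIdentifier {
    /** Opaque, unique name of the WAL. Callers must not parse it as a path. */
    String getName();
}

// Hypothetical identifier for a filesystem-backed WAL; the full path stays private.
final class FSWALIdentifier implements WALIdentifier {
    private final String path;

    FSWALIdentifier(String path) {
        this.path = Objects.requireNonNull(path);
    }

    @Override
    public String getName() {
        // Expose only the last path component, not the filesystem layout.
        return path.substring(path.lastIndexOf('/') + 1);
    }
}

// Hypothetical identifier for a log-service-backed WAL: there is no path at all.
final class LogServiceWALIdentifier implements WALIdentifier {
    private final String logName;

    LogServiceWALIdentifier(String logName) {
        this.logName = Objects.requireNonNull(logName);
    }

    @Override
    public String getName() {
        return logName;
    }
}
```

Code that only needs to tell WALs apart would take a {{WALIdentifier}} and never see whether a path or a log name sits behind it.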
{quote}And why we even have an FS version of anything? I'd think we'd have 
ClassicWAL, KafkaWAL, rather than an attempt at a generic "FSWAL"...
{quote}
Ted and Ankit were trying to consolidate some of the logic which is spread 
across FSHLog and AsyncFSWAL, given their approach of: WALProvider, 
WALMetaDataProvider, and WALInfo interfaces. Are you suggesting that there 
should be a brand-new naming convention and that we get rid of the "provider" 
notion completely? I'm struggling to peel away a suggestion from the criticism.
{quote}Where do I even go to look for the new WAL API?
{quote}
So, what would help? A Java package in which WAL API is encapsulated? A smaller 
review in which *only* the new WAL API is provided?
{quote}the only change is removal of roll writer? So now the implementation is 
responsible for rolling? Even when trouble syncing?
{quote}
I think we need to tease this apart. How much of "log rolling" is due to 
HDFS-isms and how much is how HBase intrinsically operates? We know we want to 
do size-based "rolling" (new WAL file over a certain size), but is that 
relevant for all potential implementations? (e.g. I would think that a 
Kafka-backed WAL would not have any notion of rolling).

Do you have something in mind for how this abstraction should work? Do we create 
some sort of API which gives WAL impls the ability to "tell us" when their 
implementation isn't working well? Or, should HBase be completely agnostic of 
that?
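
As a strawman for the first option (WAL impls "telling us" when they are unhealthy), something like the following could work. All names ({{WALHealthListener}}, {{ReportingWAL}}, {{SlowSyncWAL}}) and the slow-sync threshold are hypothetical, invented for this sketch; this is not an existing HBase API.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical callback through which a WAL implementation reports trouble.
interface WALHealthListener {
    void onUnhealthy(String walName, String reason);
}

// Hypothetical base class: the implementation decides what "unhealthy" means
// (and whether "rolling" is even a meaningful reaction for its backend).
abstract class ReportingWAL {
    private final List<WALHealthListener> listeners = new CopyOnWriteArrayList<>();
    private final String name;

    ReportingWAL(String name) {
        this.name = name;
    }

    void registerListener(WALHealthListener listener) {
        listeners.add(listener);
    }

    protected void reportUnhealthy(String reason) {
        for (WALHealthListener listener : listeners) {
            listener.onUnhealthy(name, reason);
        }
    }

    abstract void sync();
}

// Toy implementation that pretends every sync takes a fixed amount of time.
final class SlowSyncWAL extends ReportingWAL {
    private final long simulatedSyncMillis;
    private final long thresholdMillis;

    SlowSyncWAL(String name, long simulatedSyncMillis, long thresholdMillis) {
        super(name);
        this.simulatedSyncMillis = simulatedSyncMillis;
        this.thresholdMillis = thresholdMillis;
    }

    @Override
    void sync() {
        if (simulatedSyncMillis > thresholdMillis) {
            reportUnhealthy("slow sync: " + simulatedSyncMillis + " ms");
        }
    }
}
```

With this shape, HBase registers a listener and reacts to reports, but it never asks *why* a given backend considers itself unhealthy.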

> Re-visit the WAL API
> --------------------
>
>                 Key: HBASE-20952
>                 URL: https://issues.apache.org/jira/browse/HBASE-20952
>             Project: HBase
>          Issue Type: Sub-task
>          Components: wal
>            Reporter: Josh Elser
>            Priority: Major
>         Attachments: 20952.v1.txt
>
>
> Take a step back from the current WAL implementations and think about what an 
> HBase WAL API should look like. What are the primitive calls that we require 
> to guarantee durability of writes with a high degree of performance?
> The API needs to take the current implementations into consideration. We 
> should also have a mind for what is happening in the Ratis LogService (but 
> the LogService should not dictate what HBase's WAL API looks like; see RATIS-272).
> Other "systems" inside of HBase that use WALs are replication and 
> backup&restore. Replication has the use-case for "tail"'ing the WAL which we 
> should provide via our new API. B&R doesn't do anything fancy (IIRC). We 
> should make sure all consumers are generally going to be OK with the API we 
> create.
> The API may be "OK" (or OK in part). We also need to consider other methods 
> which were "bolted" on such as {{AbstractFSWAL}} and 
> {{WALFileLengthProvider}}. Other corners of "WAL use" (like the 
> {{WALSplitter}}) should also be looked at to use WAL-APIs only.
> We also need to make sure that adequate interface audience and stability 
> annotations are chosen.



