I completely understand why HBase wouldn't want to expose tags that it uses
for internal security purposes, like ACLs or visibility, to clients.
However, making _all_ tags off-limits seems to me to rule out quite a few
useful features.

Overloading the delete marker's value solves one particular problem, but
not the general case, because it can't be extended to Puts, which already
use their value field for real data. The motivating example in HBASE-25118
is distinguishing a bulk delete from customer operations. But there are
times we may want to distinguish an ETL or bulk write from customer
operations.

Let's say I have a batch job that does an ETL into a cluster at the same
time the cluster is taking other writes. I want to be really sure that all
my data got loaded properly, so I generate a checksum from the ETL dataset
before I load it. After the ETL, I want to generate a checksum for the
loaded data on the cluster and compare. So I need to write a Filter that
distinguishes the loaded data from any other operations going on at the
same time. (Let's assume I'm scanning raw and have major compaction
disabled so nothing gets purged, and there's nothing distinguishing about
the data itself.)
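To make the verification step concrete, here is a minimal, HBase-free sketch of the comparison. It assumes each scanned cell could carry a hypothetical source annotation: `SimpleCell` and its `source` field stand in for a Cell with a Tag and are not an HBase API. The sketch computes an order-independent checksum of the ETL dataset and of only those scanned cells marked as ETL-originated, so concurrent customer writes don't affect the comparison.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.zip.CRC32;

// Minimal sketch, not HBase API: SimpleCell and its "source" field are
// hypothetical stand-ins for a Cell carrying a tag-like source annotation.
public class EtlChecksumSketch {

    public static final class SimpleCell {
        public final String row;
        public final String value;
        public final String source; // hypothetical marker such as "etl"; null if absent

        public SimpleCell(String row, String value, String source) {
            this.row = row;
            this.value = value;
            this.source = source;
        }
    }

    // CRC32 of one row/value pair; the NUL separator keeps ("ab","c") distinct from ("a","bc").
    static long entryCrc(String row, String value) {
        CRC32 crc = new CRC32();
        crc.update((row + "\0" + value).getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    // Checksum of the ETL dataset, computed before the load.
    // Summing (with wraparound) makes the result independent of iteration order.
    public static long checksumDataset(Map<String, String> dataset) {
        long sum = 0;
        for (Map.Entry<String, String> e : dataset.entrySet()) {
            sum += entryCrc(e.getKey(), e.getValue());
        }
        return sum;
    }

    // Checksum of the scanned cells, keeping only those marked as ETL-originated.
    // The marker check plays the role of the Filter described above.
    public static long checksumLoaded(List<SimpleCell> scanned, String marker) {
        long sum = 0;
        for (SimpleCell c : scanned) {
            if (marker.equals(c.source)) {
                sum += entryCrc(c.row, c.value);
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, String> etl = Map.of("r1", "a", "r2", "b");
        List<SimpleCell> scanned = List.of(
                new SimpleCell("r2", "b", "etl"),
                new SimpleCell("r9", "x", null), // concurrent customer write, ignored
                new SimpleCell("r1", "a", "etl"));
        System.out.println(checksumDataset(etl) == checksumLoaded(scanned, "etl")); // true
    }
}
```

Summing per-entry CRC32 values rather than XORing them keeps the result order-independent without letting duplicate entries cancel each other out.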

The simplest way to do this would be to have a (hopefully tiny) Cell-level
annotation that identifies that it originally came from my ETL. That's
exactly what the Tag array field would provide. Now, I could hack something
into the Put value and change all my applications to ignore part of the
value array, but that assumes that I have full control over the value's
format (not true if I'm using, say, Phoenix). And like using the Delete
value, that's just hacking my own proprietary "Tag" capability into HBase
when a real one already exists.
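For comparison, here is roughly what the value-overloading hack looks like. This is a minimal sketch that assumes a hypothetical one-byte marker prefixed to every stored value. `ETL_MARKER`, `wrap`, and `unwrap` are illustrative names, not HBase or Phoenix APIs. The catch is on the read side: every reader has to know to strip the prefix, which isn't possible when another layer such as Phoenix owns the value encoding.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Minimal sketch of smuggling a source marker into the value itself.
// All names here are hypothetical; this is not an HBase or Phoenix API.
public class ValueMarkerSketch {
    public static final byte ETL_MARKER = 0x01; // hypothetical "written by the ETL job"
    public static final byte NO_MARKER = 0x00;  // hypothetical "ordinary write"

    // Writer side: prepend the marker byte to the real value.
    public static byte[] wrap(byte marker, byte[] value) {
        byte[] out = new byte[value.length + 1];
        out[0] = marker;
        System.arraycopy(value, 0, out, 1, value.length);
        return out;
    }

    // Reader side: every application must strip the marker before using the value.
    public static byte[] unwrap(byte[] stored) {
        if (stored.length == 0) {
            throw new IllegalArgumentException("stored value has no marker byte");
        }
        return Arrays.copyOfRange(stored, 1, stored.length);
    }

    public static boolean isFromEtl(byte[] stored) {
        return stored.length > 0 && stored[0] == ETL_MARKER;
    }

    public static void main(String[] args) {
        byte[] stored = wrap(ETL_MARKER, "42".getBytes(StandardCharsets.UTF_8));
        System.out.println(isFromEtl(stored));                                   // true
        System.out.println(new String(unwrap(stored), StandardCharsets.UTF_8)); // 42
    }
}
```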

So I'm curious: so long as HBase internal tags continue to be suppressed,
why is the Tag capability a bad thing to expose?

Geoffrey



On Fri, Oct 16, 2020 at 12:58 PM Andrew Purtell <apurt...@apache.org> wrote:

> I responded on the JIRA.
>
> You would be far better served adapting values for your proposal instead of
> tags. Tags are not a client side feature. Tags were and are designed for
> server side use only, and are stripped from client inbound and outbound
> RPCs.
>
> On Wed, Oct 14, 2020 at 9:40 AM Rushabh Shah
> <rushabh.s...@salesforce.com.invalid> wrote:
>
> > Thank you Ram for your response !
> >
> > > For your case, is there a possibility to have your new feature as a
> > > first-class feature using Tags? Just asking?
> >
> > Could you elaborate on what you mean by a first-class feature?
> >
> >
> > Rushabh Shah
> >
> >    - Software Engineering SMTS | Salesforce
> >    -
> >       - Mobile: 213 422 9052
> >
> >
> >
> > On Wed, Oct 14, 2020 at 9:35 AM ramkrishna vasudevan <
> > ramkrishna.s.vasude...@gmail.com> wrote:
> >
> > > Hi Rushabh
> > >
> > > If I remember correctly, the decision was not to expose tags to clients
> > > directly. All the tags were used internally for cell formation on the
> > > server side (e.g. ACL and Visibility labels).
> > >
> > > For your case, is there a possibility to have your new feature as a
> > > first-class feature using Tags? Just asking?
> > >
> > > Regards
> > > Ram
> > >
> > > On Wed, Oct 14, 2020 at 8:17 PM Rushabh Shah
> > > <rushabh.s...@salesforce.com.invalid> wrote:
> > >
> > > > Hi Everyone,
> > > > I want to understand how to use the HBase Cell Tags feature. We have a
> > > > use case to identify the source of deletes (not the same as the
> > > > authenticated Kerberos user). I have added more details about my use
> > > > case in HBASE-25118 <https://issues.apache.org/jira/browse/HBASE-25118>.
> > > > At my day job we use Phoenix to interact with HBase, and we are passing
> > > > this information via Phoenix ConnectionProperties. We are exploring the
> > > > Cell Tags feature to add this metadata to HBase Cells (only to Delete
> > > > Markers as of now).
> > > >
> > > > Via HBASE-18995 <https://issues.apache.org/jira/browse/HBASE-18995>,
> > > > we have moved all the createCell methods that take Tag(s) as an
> > > > argument to the PrivateCellUtil class and made the InterfaceAudience of
> > > > that class Private. I saw some discussion on that JIRA
> > > > <https://issues.apache.org/jira/browse/HBASE-18995?focusedCommentId=16219960&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16219960>
> > > > about exposing some methods as LimitedPrivate so they are accessible to
> > > > coprocessors (CP), but it was decided to do that later. We only expose
> > > > CellBuilderFactory
> > > > <https://github.com/apache/hbase/blob/master/hbase-common/src/main/java/org/apache/hadoop/hbase/CellBuilderFactory.java>,
> > > > which returns an instance of CellBuilder
> > > > <https://github.com/apache/hbase/blob/master/hbase-common/src/main/java/org/apache/hadoop/hbase/CellBuilder.java>,
> > > > which doesn't have a setTags method. Also, the code is vastly different
> > > > in branch-1.
> > > >
> > > > Could someone please educate me on how to populate tags from the client
> > > > side (i.e. Phoenix) while creating a Delete object?
> > > > Thank you!
> > > >
> > >
> >
>
>
> --
> Best regards,
> Andrew
>
> Words like orphans lost among the crosstalk, meaning torn from truth's
> decrepit hands
>    - A23, Crosstalk
>
