Re: [DISCUSS] Move Type out of KeyValue

Chia-Ping Tsai Sat, 30 Sep 2017 21:29:37 -0700

A "custom cell type" never existed in this story. (Sorry for misleading you.)


Here is the story. I add some custom cells (to save memory) to a Put via 
Put#add(Cell). The pseudocode of the custom cell is shown below.

{code}
class MyObject {
  Cell toCell() {
    return CellBuilderFactory.newBuilder(SHALLOW_COPY)
        .setRow(sharedBuffer, myRowOffset, myRowLength)
        // We call the IA.Private class to get the valid code of Put
        .setType(KeyValue.Type.Put.getCode())
        // set other fields
        .build();
  }
}

put.add(myObject.toCell());
{code}
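
To make the memory argument concrete, here is a minimal sketch of the shallow-copy idea with no HBase dependency. RowSlice and RowSliceDemo are hypothetical names used only for illustration; they are not HBase classes and not our production code. Each row is a view (array, offset, length) into one shared buffer, so creating a cell copies no bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical stand-in for a cell's row: a view into a buffer shared by many cells. */
final class RowSlice {
  private final byte[] buffer; // shared between slices, never copied on construction
  private final int offset;
  private final int length;

  RowSlice(byte[] buffer, int offset, int length) {
    // Written to avoid integer overflow in offset + length.
    if (offset < 0 || length < 0 || offset > buffer.length - length) {
      throw new IllegalArgumentException("slice out of bounds");
    }
    this.buffer = buffer;
    this.offset = offset;
    this.length = length;
  }

  byte[] array() { return buffer; }
  int offset() { return offset; }
  int length() { return length; }

  /** Copies only on demand, e.g. for printing or debugging. */
  byte[] copy() { return Arrays.copyOfRange(buffer, offset, offset + length); }
}

final class RowSliceDemo {
  public static void main(String[] args) {
    byte[] shared = "row-1row-2".getBytes(StandardCharsets.US_ASCII);
    RowSlice first = new RowSlice(shared, 0, 5);
    RowSlice second = new RowSlice(shared, 5, 5);
    // Both slices reference the same backing array.
    System.out.println(first.array() == second.array());
    System.out.println(new String(second.copy(), StandardCharsets.US_ASCII));
  }
}
```

A plain byte[] attribute cannot express this, because it carries no offset and length; that is the reason the (array, offset, length) triple matters here.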

And then I noticed that Put#add is not optimized for our heavy table (a large 
chunk of cells in a single row), so I also extended Put with some #add methods 
that avoid resizing the collection.
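
The resizing point can be sketched without HBase as well. BulkRow below is a hypothetical model of a per-row mutation, not the extended Put itself: it sizes its cell list once from the expected cell count and reserves room for a whole chunk before adding it, instead of letting the list grow step by step.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, HBase-free model of a single-row mutation that holds many cells. */
final class BulkRow<C> {
  private final ArrayList<C> cells;

  /** Sizes the backing list once, from the caller's estimate of the cell count. */
  BulkRow(int expectedCells) {
    this.cells = new ArrayList<>(expectedCells);
  }

  BulkRow<C> add(C cell) {
    cells.add(cell);
    return this;
  }

  /** Reserves room for the whole chunk up front, then appends it. */
  BulkRow<C> addAll(List<? extends C> chunk) {
    cells.ensureCapacity(cells.size() + chunk.size());
    cells.addAll(chunk);
    return this;
  }

  int size() {
    return cells.size();
  }
}
```

How much this saves depends on the chunk size and on how the real Put stores its cells, so treat it as an illustration of the intent only.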

That was the story: I am trying to reduce the cost of converting our objects to 
Put/Cell. Another story I mentioned is building a custom write path via an 
Endpoint, but that is unrelated to this topic. 

All the classes we use are shown below:
1) Cell -> IA.Public
2) CellBuilder -> IA.Public
3) CellBuilderFactory -> IA.Public
4) Put -> IA.Public
5) Put#add(Cell) -> IA.Public
6) KeyValue#Type -> IA.Private

That is why I want to make KeyValue#Type IA.Public.
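
For illustration only, a public type could be as small as the enum below. PublicCellType is a hypothetical name, not an existing HBase class, and the byte values are assumptions chosen to resemble what KeyValue#Type uses; check KeyValue#Type for the real codes. The point is that client code would get a valid code without touching an IA.Private class.

```java
/** Hypothetical public-facing cell type; a sketch, not the HBase API. */
enum PublicCellType {
  // Byte values are illustrative assumptions, not taken from this thread.
  PUT((byte) 4),
  DELETE((byte) 8),
  DELETE_FAMILY_VERSION((byte) 10),
  DELETE_COLUMN((byte) 12),
  DELETE_FAMILY((byte) 14);

  private final byte code;

  PublicCellType(byte code) {
    this.code = code;
  }

  /** Returns the wire/storage code for this type. */
  byte getCode() {
    return code;
  }

  /** Maps a code back to a type, rejecting codes that are not client-visible. */
  static PublicCellType fromCode(byte code) {
    for (PublicCellType type : values()) {
      if (type.code == code) {
        return type;
      }
    }
    throw new IllegalArgumentException("unknown type code: " + code);
  }
}
```

With something like this, the builder call in the pseudocode above would pass PublicCellType.PUT.getCode() instead of KeyValue.Type.Put.getCode().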

--
Chia-Ping

On 2017-10-01 00:34, Andrew Purtell <[email protected]> wrote: 
> Thanks for sharing these details. They are intriguing. If possible could you 
> explain why the custom type is needed? 
> 
> Something has to be deployed on the server or the custom cell type isn't 
> guaranteed to be handled correctly. It may work now by accident. I'm a 
> little surprised a custom cell type doesn't cause an abort. Did you patch 
> the code to handle it?
> 
> 
> > On Sep 30, 2017, at 1:06 AM, Chia-Ping Tsai <[email protected]> wrote:
> > 
> > Thanks for the nice suggestions, Andrew. Sorry for the delayed response. 
> > Busy today.
> > 
> > The root reason we must build our own Cell on the client side is that the 
> > data are located in shared memory, similar to MSLAB.
> > 
> > You are right. We can use an attribute to carry our data, but byte[] is not 
> > acceptable because we can't assign the offset and length. In fact, the 
> > endpoint is a better way for our case because our object can be directly 
> > converted to a PB object. Also, it is easy to apply shared memory to manage 
> > our object. However, it will be easier and more readable to follow the 
> > regular Put operation. All we have to do is build our own cell and extend 
> > Put. Nothing has to be deployed on the server.
> > 
> > I agree the custom cell is a low-level thing, and it should be used by 
> > advanced users. What concerns me is that the classes related to custom 
> > Cell have different IA declarations. I'm fine with making them IA.Private, 
> > but building a custom cell may be a common case.
> > 
> > --
> > Chia-Ping
> > 
> >> On 2017-09-30 06:05, Andrew Purtell <[email protected]> wrote: 
> >> Construct a normal put or delete or batch mutation, add whatever extra
> >> state you need in one or more operation attributes, and use a
> >> regionobserver to extend normal processing to handle the extra state. I'm
> >> curious what dispatching to extension code because of a custom cell type
> >> buys you over dispatching to extension code because of the presence of an
> >> attribute (or cell tag). For example, in security coprocessors we take
> >> attribute data and attach it to the cell using cell tags. Later we check
> >> for cell tag(s) to determine if we have to take special action when the
> >> cell is accessed by a scanner, or during some operations (e.g. appends or
> >> increments have to do extra handling for cell security tags).
> >> 
> >> 
> >> On Fri, Sep 29, 2017 at 2:43 PM, Chia-Ping Tsai <[email protected]> 
> >> wrote:
> >> 
> >>>> Instead of a custom cell, could you use a regular cell with a custom
> >>>> operation attribute (see OperationWithAttributes).
> >>> Pardon me, I didn't get what you said.
> >>> 
> >>> 
> >>> 
> >>>> On 2017-09-30 04:31, Andrew Purtell <[email protected]> wrote:
> >>>> Instead of a custom cell, could you use a regular cell with a custom
> >>>> operation attribute (see OperationWithAttributes).
> >>>> 
> >>>> On Fri, Sep 29, 2017 at 1:28 PM, Chia-Ping Tsai <[email protected]>
> >>> wrote:
> >>>> 
> >>>>> The custom cell helps us reduce memory consumption. We don't have our own
> >>>>> serialization/deserialization mechanism, hence transferring data from
> >>>>> client to server needs many conversion phases (user data -> Put/Cell ->
> >>> pb
> >>>>> object). The cost of conversion is large in transferring bulk data. In
> >>>>> fact, we also have custom mutation to manage the memory usage of inner
> >>> cell
> >>>>> collection.
> >>>>> 
> >>>>>> On 2017-09-30 02:43, Andrew Purtell <[email protected]> wrote:
> >>>>>> What are the use cases for a custom cell? It seems a dangerously low
> >>>>> level
> >>>>>> thing to attempt and perhaps we should unwind support for it. But
> >>> perhaps
> >>>>>> there is a compelling justification.
> >>>>>> 
> >>>>>> 
> >>>>>> On Thu, Sep 28, 2017 at 10:20 PM, Chia-Ping Tsai <
> >>> [email protected]>
> >>>>>> wrote:
> >>>>>> 
> >>>>>>> Thanks for all the comments.
> >>>>>>> 
> >>>>>>> The problem i want to resolve is the valid code should be exposed
> >>> as
> >>>>>>> IA.Public. Otherwise, end user have to access the IA.Private class
> >>> to
> >>>>> build
> >>>>>>> the custom cell.
> >>>>>>> 
> >>>>>>> For example, I have a use case which plays a streaming role in our
> >>>>>>> application. It
> >>>>>>> applies the CellBuilder(HBASE-18519) to build custom cells. These
> >>> cells
> >>>>>>> have many same fields so they are put in shared-memory for
> >>> avoiding GC
> >>>>>>> pause. Everything is wonderful. However, we have to access the
> >>>>> IA.Private
> >>>>>>> class - KeyValue#Type - to get the valid code of Put.
> >>>>>>> 
> >>>>>>> I believe there are many use cases of custom cell, and
> >>> consequently it
> >>>>> is
> >>>>>>> worth adding a way to get the valid type via IA.Public class.
> >>>>> Otherwise, it
> >>>>>>> may imply that the custom cell is based on an unstable way, because
> >>> the
> >>>>>>> related code can be changed at any time.
> >>>>>>> --
> >>>>>>> Chia-Ping
> >>>>>>> 
> >>>>>>>> On 2017-09-29 00:49, Andrew Purtell <[email protected]> wrote:
> >>>>>>>> I agree with Stack. Was typing up a reply to Anoop but let me
> >>> move it
> >>>>>>> down
> >>>>>>>> here.
> >>>>>>>> 
> >>>>>>>> The type code exposes some low level details of how our current
> >>>>> stores
> >>>>>>> are
> >>>>>>>> architected. But what if in the future you could swap out HStore
> >>>>>>> implements
> >>>>>>>> Store with PStore implements Store, where HStore is backed by
> >>> HFiles
> >>>>> and
> >>>>>>>> PStore is backed by Parquet? Just as a hypothetical example. I
> >>> know
> >>>>> there
> >>>>>>>> would be larger issues if this were actually attempted. Bear with
> >>>>> me. You
> >>>>>>>> can imagine some different new Store implementation that has some
> >>>>>>>> advantages but is not a design derived from the log structured
> >>> merge
> >>>>> tree
> >>>>>>>> if you like. Most values from a new Cell.Type based on
> >>> KeyValue.Type
> >>>>>>>> wouldn't apply to cells from such a thing because they are
> >>>>> particular to
> >>>>>>>> how LSMs work. I'm sure such a project if attempted would make a
> >>>>> number
> >>>>>>> of
> >>>>>>>> changes requiring a major version increment and low level details
> >>>>> could
> >>>>>>> be
> >>>>>>>> unwound from Cell then, but if we could avoid doing it in the
> >>> first
> >>>>>>> place,
> >>>>>>>> I think it would be better for maintainability.
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>>> On Thu, Sep 28, 2017 at 9:39 AM, Stack <[email protected]> wrote:
> >>>>>>>>> 
> >>>>>>>>> On Thu, Sep 28, 2017 at 2:25 AM, Chia-Ping Tsai <
> >>>>> [email protected]>
> >>>>>>>>> wrote:
> >>>>>>>>> 
> >>>>>>>>>> hi folks,
> >>>>>>>>>> 
> >>>>>>>>>> User is allowed to create custom cell but the valid code of
> >>> type
> >>>>> -
> >>>>>>>>>> KeyValue#Type - is declared as IA.Private. As i see it, we
> >>> should
> >>>>>>> expose
> >>>>>>>>>> KeyValue#Type as Public Client. Three possible ways are shown
> >>>>> below:
> >>>>>>>>>> 1) Change declaration of KeyValue#Type from IA.Private to
> >>>>> IA.Public
> >>>>>>>>>> 2) Move KeyValue#Type into Cell.
> >>>>>>>>>> 3) Move KeyValue#Type to upper level
> >>>>>>>>>> 
> >>>>>>>>>> Any suggestions?
> >>>>>>>>>> 
> >>>>>>>>>> 
> >>>>>>>>> What is the problem that we are trying to solve Chia-Ping? You
> >>>>> want to
> >>>>>>> make
> >>>>>>>>> Cells of a new Type?
> >>>>>>>>> 
> >>>>>>>>> My first reaction is that KV#Type is particular to the KV
> >>>>>>> implementation.
> >>>>>>>>> Any new Cell implementation should not have to adopt the
> >>> KeyValue
> >>>>>>> typing
> >>>>>>>>> mechanism.
> >>>>>>>>> 
> >>>>>>>>> S
> >>>>>>>>> 
> >>>>>>>>> 
> >>>>>>>>> 
> >>>>>>>>> 
> >>>>>>>>>> --
> >>>>>>>>>> Chia-Ping
> >>>>>>>>>> 
> >>>>>>>>>> 
> >>>>>>>>> 
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> --
> >>>>>>>> Best regards,
> >>>>>>>> Andrew
> >>>>>>>> 
> >>>>>>>> Words like orphans lost among the crosstalk, meaning torn from
> >>>>> truth's
> >>>>>>>> decrepit hands
> >>>>>>>>   - A23, Crosstalk
> >>>>>>>> 
> >>>>>>> 
> >>>>>> 
> >>>>>> 
> >>>>>> 
> >>>>>> --
> >>>>>> Best regards,
> >>>>>> Andrew
> >>>>>> 
> >>>>>> Words like orphans lost among the crosstalk, meaning torn from
> >>> truth's
> >>>>>> decrepit hands
> >>>>>>   - A23, Crosstalk
> >>>>>> 
> >>>>> 
> >>>> 
> >>>> 
> >>>> 
> >>>> --
> >>>> Best regards,
> >>>> Andrew
> >>>> 
> >>>> Words like orphans lost among the crosstalk, meaning torn from truth's
> >>>> decrepit hands
> >>>>   - A23, Crosstalk
> >>>> 
> >>> 
> >> 
> >> 
> >> 
> >> -- 
> >> Best regards,
> >> Andrew
> >> 
> >> Words like orphans lost among the crosstalk, meaning torn from truth's
> >> decrepit hands
> >>   - A23, Crosstalk
> >> 
>
