[jira] [Commented] (HBASE-7320) Remove KeyValue.getBuffer()

2014-02-19 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905864#comment-13905864
 ] 

Andrew Purtell commented on HBASE-7320:
---

bq. Another tricky part is serialization and deserialization. We do not want to 
change the HFile storage format (I think).

Maybe. I think an ideal state is where we are operating in place on encoded 
cells from disk to socket and vice versa. (For example, 
http://kentonv.github.io/capnproto/encoding.html)

> Remove KeyValue.getBuffer()
> ---
>
> Key: HBASE-7320
> URL: https://issues.apache.org/jira/browse/HBASE-7320
> Project: HBase
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>Assignee: stack
> Fix For: 0.99.0
>
> Attachments: 7320-simple.txt
>
>
> In many places this is simple task of just replacing the method name.
> There, however, quite a few places where we assume that:
> # the entire KV is backed by a single byte array
> # the KVs key portion is backed by a single byte array
> Some of those can easily be fixed, others will need their own jiras.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-7320) Remove KeyValue.getBuffer()

2014-02-16 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13903019#comment-13903019
 ] 

ramkrishna.s.vasudevan commented on HBASE-7320:
---

bq. I think maybe changing every occurrence to Cell is going too far
Yes, I too think it's OK to have KeyValue in the memstore.  But on the server 
side, in the block caches, we may have to have a different type of Cell.  So 
for all KVs we can accept a KeyValue but return it as a Cell.  Having heapSize 
in a util method is also fine with me.




[jira] [Commented] (HBASE-7320) Remove KeyValue.getBuffer()

2014-02-14 Thread Matt Corgan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13902066#comment-13902066
 ] 

Matt Corgan commented on HBASE-7320:


{quote}I am just trying to make a patch where in every place possible we will 
refer as Cell rather than KV{quote}
I think maybe changing *every* occurrence to Cell is going too far.  There are 
places where we know it is a KeyValue, like 
the memstore, so a method that gets a KeyValue from the memstore should have a 
return type of KeyValue.  This return type will be accepted by callers who want 
a Cell, but it's better because it contains more information.

Because of the above, you can rely on the KeyValue.heapSize() method from the 
memstore, but anywhere you get a Cell, you couldn't rely on heapSize.  If you 
are dealing with Cells, then heapSize should be calculated on a more granular 
basis (the size of the block of encoded bytes that contains the cells).  So I'm 
basically proposing that Cell should not implement heapSize().
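
A minimal sketch of the return-type point above, assuming simplified stand-in types (SimpleCell, ArrayKeyValue, CellSource and MemStoreSketch are hypothetical names, not the real HBase classes, and the overhead constant is made up). The memstore narrows its return type to the concrete class, so callers that want a Cell still get one, while callers that know they hold a KeyValue can ask for heapSize():

```java
/** Hypothetical, simplified stand-in for a Cell; note there is no heapSize() here. */
interface SimpleCell {
    byte[] getRowArray();
    int getRowOffset();
    int getRowLength();
}

/** A cell known to be backed by one on-heap byte[], so it can report a heap size. */
final class ArrayKeyValue implements SimpleCell {
    // Illustrative constant only; not a measured object overhead.
    static final long ASSUMED_OVERHEAD = 32;

    private final byte[] bytes;

    ArrayKeyValue(byte[] bytes) { this.bytes = bytes; }

    @Override public byte[] getRowArray() { return bytes; }
    @Override public int getRowOffset() { return 0; }
    @Override public int getRowLength() { return bytes.length; }

    /** Rough estimate: assumed fixed overhead plus the backing array length. */
    long heapSize() { return ASSUMED_OVERHEAD + bytes.length; }
}

/** A generic source only promises cells. */
interface CellSource {
    SimpleCell first();
}

/** The memstore knows it holds KeyValues, so it narrows the return type. */
final class MemStoreSketch implements CellSource {
    private final ArrayKeyValue only;

    MemStoreSketch(ArrayKeyValue only) { this.only = only; }

    @Override public ArrayKeyValue first() { return only; } // covariant return
}
```

Code holding a MemStoreSketch can call first().heapSize(); code holding only a CellSource sees a SimpleCell and has to account for size some other way, for example per block.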

I'm not sure if that helps with every situation Ram, just trying to illustrate 
some general thoughts.



[jira] [Commented] (HBASE-7320) Remove KeyValue.getBuffer()

2014-02-14 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13901805#comment-13901805
 ] 

Nick Dimiduk commented on HBASE-7320:
-

For the heapsize question, there's further discussion over on HBASE-9383.



[jira] [Commented] (HBASE-7320) Remove KeyValue.getBuffer()

2014-02-13 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13901158#comment-13901158
 ] 

stack commented on HBASE-7320:
--

bq. Fine with moving to CellUtil then.

It might not work but worth giving it a go I'd say.  You'll have a better 
argument for why Cell needs to implement HeapSize after trying (smile).

bq. So CellUtil would be talking with a factory that knows how the cell was 
created right?

I hope not.  Should just ask the Cell.  This might be an argument for Cell 
implementing HeapSize (it knows how much space it has occupied, and it knows if 
its data is compressed on heap, so it will return the Cell overhead + compressed 
sizes, whereas a KV will return the size of the backing array).

bq.  So we are agreeing to change the KeyValue format as mentioned in this 
blog...

That is a nice old Matteo blog.  It suggests one way of packing keys, a method 
we should pursue, but this will not be the only one.  Thinking about this, 
I'd suggest keeping in mind Cells that could be formatted as our current 
KeyValue is, as Matteo describes in his blog, and then a third format, the 
PrefixTree encoding that is a module here in hbase; i.e. the content stays 
encoded even as we traverse it.

bq. ...the way we form this combination may vary.

yes

bq. Sorry if am asking too many questions and going off track

Smile.  Thanks for digging in here [~ram_krish]



[jira] [Commented] (HBASE-7320) Remove KeyValue.getBuffer()

2014-02-13 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13901147#comment-13901147
 ] 

ramkrishna.s.vasudevan commented on HBASE-7320:
---

bq.What about when the Cell is a facade over an onheap block? Here again you 
could return the Cell overhead and hope the block is being accounted some other 
way. What if the Cell row, family, qualifier, type, and ts – i.e. the 'key' – 
are onheap and the data offheap? And so on
I like your explanation.  Fine with moving to CellUtil then.  So CellUtil would 
be talking with a factory that knows how the cell was created, right?  That 
would internally know what the heapsize was?
OK, one basic question :), may sound silly.  So we are agreeing to change the 
KeyValue format as described in this blog, for example, 
http://blog.cloudera.com/blog/2012/06/hbase-io-hfile-input-output/
and in various sources that talk about our KeyValue format?  Or to say that the 
KeyValue format may be different?
bq.What if the Cell row, family, qualifier, type, and ts – i.e. the 'key' 
Though we may call this combination the key, the way we form this combination 
may vary.
Sorry if I am asking too many questions and going off track.  I am just trying 
to ensure that we are all on the same page, or at least that I am.



[jira] [Commented] (HBASE-7320) Remove KeyValue.getBuffer()

2014-02-13 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13901141#comment-13901141
 ] 

stack commented on HBASE-7320:
--

bq. So can we make Cell of type HeapSize and Cloneable?

Cloneable makes sense.

HeapSize is how we do our accounting now so makes sense to me that we'd 
implement it.  But what to do when a Cell implementation is but a facade on 
data that is elsewhere, not 'owned' by the Cell?  For instance, say the Cell is 
a facade on direct byte buffers.  What will you return when I call heapSize?  
(I suppose it would be the overhead the Cell consumes on heap, not the sizes of 
data which is offheap?).  What about when the Cell is a facade over an onheap 
block?  Here again you could return the Cell overhead and hope the block is 
being accounted some other way.  What if the Cell row, family, qualifier, type, 
and ts -- i.e. the 'key' -- are onheap and the data offheap?  And so on.
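
To make the questions above concrete, here is a sketch of one possible policy, not a settled design. BufferBackedCell, its ownsData flag and the FACADE_OVERHEAD constant are all hypothetical. The facade reports only its own overhead unless it both owns the data and the data sits on the Java heap:

```java
import java.nio.ByteBuffer;

/** Hypothetical facade: the value bytes live in a buffer the cell may not own. */
final class BufferBackedCell {
    // Assumed per-object cost of the facade itself; an illustrative number only.
    static final long FACADE_OVERHEAD = 40;

    private final ByteBuffer buffer;
    private final int offset;
    private final int length;
    private final boolean ownsData; // false when a block cache accounts for the bytes

    BufferBackedCell(ByteBuffer buffer, int offset, int length, boolean ownsData) {
        this.buffer = buffer;
        this.offset = offset;
        this.length = length;
        this.ownsData = ownsData;
    }

    /** Copies the value out using absolute gets, so the buffer position is untouched. */
    byte[] copyValue() {
        byte[] out = new byte[length];
        for (int i = 0; i < length; i++) {
            out[i] = buffer.get(offset + i);
        }
        return out;
    }

    /**
     * Off-heap data, or on-heap data owned by someone else, contributes nothing;
     * only data that is both on heap and owned by this cell is counted.
     */
    long heapSize() {
        boolean countData = ownsData && !buffer.isDirect();
        return countData ? FACADE_OVERHEAD + length : FACADE_OVERHEAD;
    }
}
```

The mixed case raised above (key on heap, value off heap) would need one such decision per component, which is part of why a single heapSize() answer is awkward for facades.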

HeapSize is probably unavoidable.  A bunch of basic mechanisms in hbase count 
on it returning a decent answer.

IIRC, I suggested Cell implementing HeapSize in the past and [~mcorgan] asked 
why not have size accounting done by a CellUtil method... This would probably 
be awkward to call in lots of contexts but would that work?




[jira] [Commented] (HBASE-7320) Remove KeyValue.getBuffer()

2014-02-13 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13901138#comment-13901138
 ] 

ramkrishna.s.vasudevan commented on HBASE-7320:
---

One more thing: if the KV object changes, or the way we create the KV object 
changes, then its heap size will also change, and hence the number of KVs the 
memstore can hold will also change.




[jira] [Commented] (HBASE-7320) Remove KeyValue.getBuffer()

2014-02-13 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13901134#comment-13901134
 ] 

ramkrishna.s.vasudevan commented on HBASE-7320:
---

bq.We're seeking to something identified by its coordinates, which in our case 
are row-key, family, column identifier, and timestamp. Passing the "key" was 
just a convenient way to pass all of these together.
Yes. So either the comparator would change or the way we build this key should 
change. 
So can we make Cell of type HeapSize and Cloneable?



[jira] [Commented] (HBASE-7320) Remove KeyValue.getBuffer()

2014-02-13 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13901126#comment-13901126
 ] 

ramkrishna.s.vasudevan commented on HBASE-7320:
---

bq.How you think we fix these Lars? Would the best approach now be to try and 
implement a new Cell type altogether? That would shake out any reliance on KV?
Stack, you asked this question in one of the above comments.
I think your point is valid.
So my thinking was that maybe we could implement a different KeyValue 
altogether, perhaps with all the components of the KV in individual byte arrays. 
bq.The server just iterates the Codec and reconstitutes Cells as it thinks best.
So this is what I mean by saying use a new codec but again tightly couple it 
with the current KeyValue.  Hope you got what I am trying to say here. 
bq.But you know, I don't have this worked out end to end. The Codecs and their 
API made sense doing the RPC. Might not be best fit for in and out of hfiles 
(though I like the idea of the hfile block being able to come up into block 
cache and our being able to keep blocks around and have Cell iterators over 
them that rehydrate Cells only when necessary (and not if it can be avoided)).
I think at least in cases where we need to strip the tags from the KVs, we 
surely need a format that lets us do that easily, without having to reconstruct 
the KV as we are bound to do now.  I am just trying to make a patch where in 
every place possible we will refer as Cell rather than KV.  This would mean 
that even if we change the format of the KV or use a new type of Cell, our 
internal code does not change. 




[jira] [Commented] (HBASE-7320) Remove KeyValue.getBuffer()

2014-02-13 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13901117#comment-13901117
 ] 

stack commented on HBASE-7320:
--

bq. Also I am still not getting the notion of using the same KeyValue.java 
class but use a different codec? 

What are you referring to here [~ram_krish]?

bq. I think somewhere the codec type and the cell type used should be matched 
up? 

At the moment we have KeyValueCodec or CellCodec where the Codec is type 
particular in the first case but not so in the second.  For first case, the 
decoder would return KeyValues.  KeyValue implements Cell so it should pass 
through a Cell-based server fine.

We'd more want codecs like the second above where the type of Cell is not 
dictated by the Codec.  The server just iterates the Codec and reconstitutes 
Cells as it thinks best.

But you know, I don't have this worked out end to end.  The Codecs and their 
API made sense doing the RPC. Might not be best fit for in and out of hfiles 
(though I like the idea of the hfile block being able to come up into block 
cache and our being able to keep blocks around and have Cell iterators over 
them that rehydrate Cells only when necessary (and not if it can be avoided)).
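
The "rehydrate only when necessary" idea can be sketched as below. This is an illustration under an assumed block layout of repeated [4-byte length][payload] records, which is not the HFile format. CellView and BlockCellIterator are hypothetical names. The iterator hands out views onto the shared block and copies bytes only when a caller asks for them:

```java
import java.nio.ByteBuffer;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** A view onto one record inside a shared block; nothing is copied until asked. */
final class CellView {
    final ByteBuffer block;
    final int offset;
    final int length;

    CellView(ByteBuffer block, int offset, int length) {
        this.block = block;
        this.offset = offset;
        this.length = length;
    }

    /** Copies the record into its own array; callers that only skip never pay this. */
    byte[] materialize() {
        byte[] out = new byte[length];
        ByteBuffer dup = block.duplicate(); // independent position, shared content
        dup.position(offset);
        dup.get(out);
        return out;
    }
}

/** Walks a block laid out as repeated [4-byte length][payload] records (assumed). */
final class BlockCellIterator implements Iterator<CellView> {
    private final ByteBuffer block;
    private int pos;

    BlockCellIterator(ByteBuffer block) {
        this.block = block;
        this.pos = block.position();
    }

    @Override public boolean hasNext() {
        return pos + Integer.BYTES <= block.limit();
    }

    @Override public CellView next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        int len = block.getInt(pos); // absolute read; the block's position is untouched
        CellView view = new CellView(block, pos + Integer.BYTES, len);
        pos += Integer.BYTES + len;
        return view;
    }
}
```

The sketch does no bounds validation on the length prefix; a real reader would have to.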

bq. Some where in the code we are using KeyValue.getRow(), which currently 
returns the entire byte[]. But that should be ideally keyvalue.getRowArray().

Yeah, that seems wrong.



[jira] [Commented] (HBASE-7320) Remove KeyValue.getBuffer()

2014-02-13 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13901110#comment-13901110
 ] 

ramkrishna.s.vasudevan commented on HBASE-7320:
---

bq.Could you implement a NoopDataBlockEncoder that writes the same format as 
unencoded blocks? 
This is the way that I started off when we wanted Tags in the hfiles.
bq.On HFile, it needs a redo especially when you come up through compressors 
and codecs
Yes.  I think we cannot keep the stream the way it is currently held.  Maybe 
we could keep the serialization, but the deserialization should not stay as it 
is.  It should be based on the codec format.
I agree with Stack here.  
Also I am still not getting the notion of using the same KeyValue.java class 
but use a different codec?  Should we have a factory that reads the type of 
Cell we would be using and then creates those instances of Cell?  
I think somewhere the codec type and the cell type used should be matched up? 
Some where in the code we are using KeyValue.getRow(), which currently returns 
the entire byte[].  But that should be ideally keyvalue.getRowArray().  



[jira] [Commented] (HBASE-7320) Remove KeyValue.getBuffer()

2014-02-13 Thread Matt Corgan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13901025#comment-13901025
 ] 

Matt Corgan commented on HBASE-7320:


{quote}Another tricky part is serialization and deserialization. We do not want 
to change the HFile storage format (I think).{quote}
Could you implement a 
NoopDataBlockEncoder that writes the same format as unencoded blocks?  (might 
need a trick to get the block header correct).  This may allow you to delete 
the original unencoded serialization path and treat everything as encoded.  It 
might delete a lot of the code that otherwise would have to be converted to 
cells.
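
A rough sketch of the single-path idea, assuming a much smaller encoder contract than HBase actually has. SketchBlockEncoder, NoopSketchEncoder and SketchBlockReader are hypothetical names, and the block-header trick mentioned above is not modelled:

```java
/** Hypothetical encoder contract; the real HBase interface is richer than this. */
interface SketchBlockEncoder {
    byte[] encode(byte[] unencodedBlock);
    byte[] decode(byte[] storedBlock);
}

/** Stores blocks exactly as given, so the stored bytes match the unencoded format. */
final class NoopSketchEncoder implements SketchBlockEncoder {
    @Override public byte[] encode(byte[] unencodedBlock) {
        return unencodedBlock.clone();
    }

    @Override public byte[] decode(byte[] storedBlock) {
        return storedBlock.clone();
    }
}

/** A reader with one path: every block goes through an encoder, even plain ones. */
final class SketchBlockReader {
    private final SketchBlockEncoder encoder;

    SketchBlockReader(SketchBlockEncoder encoder) {
        this.encoder = encoder;
    }

    byte[] read(byte[] storedBlock) {
        return encoder.decode(storedBlock);
    }
}
```

With this shape the reader has no separate "unencoded" branch, which is the code-deletion benefit described above.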



[jira] [Commented] (HBASE-7320) Remove KeyValue.getBuffer()

2014-02-13 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13900665#comment-13900665
 ] 

stack commented on HBASE-7320:
--

I read through your note [~ram_krish].  What [~lhofhansl] says.

On HFile, it needs a redo especially when you come up through compressors and 
codecs.  HFile APIs have KV hardwiring.  Start over might be appropriate, v4.  
Unless there is a better idea, for serializations, let's do cellblocks, 
recording the codec used in the hfile metadata.



[jira] [Commented] (HBASE-7320) Remove KeyValue.getBuffer()

2014-02-13 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13900599#comment-13900599
 ] 

Lars Hofhansl commented on HBASE-7320:
--

bq. Currently everywhere in the code we are passing KeyValue in the args and 
also returning keyvalue everywhere.
I think that is not a problem per se, as long as nobody calls getBuffer, 
getKey, etc.
I have not quite wrapped my head around Cell vs. KeyValue. 

bq. KVComparator some of the apis still take KeyValue as the argument. I think 
we could change that.
We could and probably should. But we do not need to as long as the comparator 
does not assume anything beyond a contiguous layout of row-key, family, column.
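
As an illustration of a comparator that assumes less, the sketch below compares cells component by component through (array, offset, length) triples, so each coordinate may live in a different array. Slice, CoordCell and CoordComparator are hypothetical names. The ordering shown (row, family, qualifier ascending as unsigned bytes, then newer timestamps first) leaves out the type byte for brevity:

```java
import java.nio.charset.StandardCharsets;
import java.util.Comparator;

/** An (array, offset, length) triple; hypothetical helper, not an HBase class. */
final class Slice {
    final byte[] array;
    final int offset;
    final int length;

    Slice(byte[] array, int offset, int length) {
        this.array = array;
        this.offset = offset;
        this.length = length;
    }

    static Slice of(String s) {
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        return new Slice(b, 0, b.length);
    }
}

/** Hypothetical cell whose coordinates are each backed by their own array. */
final class CoordCell {
    final Slice row;
    final Slice family;
    final Slice qualifier;
    final long timestamp;

    CoordCell(String row, String family, String qualifier, long timestamp) {
        this.row = Slice.of(row);
        this.family = Slice.of(family);
        this.qualifier = Slice.of(qualifier);
        this.timestamp = timestamp;
    }
}

final class CoordComparator implements Comparator<CoordCell> {
    /** Lexicographic comparison of unsigned bytes; shorter prefix sorts first. */
    static int compareSlices(Slice a, Slice b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a.array[a.offset + i] & 0xff;
            int y = b.array[b.offset + i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    @Override public int compare(CoordCell l, CoordCell r) {
        int c = compareSlices(l.row, r.row);
        if (c != 0) return c;
        c = compareSlices(l.family, r.family);
        if (c != 0) return c;
        c = compareSlices(l.qualifier, r.qualifier);
        if (c != 0) return c;
        return Long.compare(r.timestamp, l.timestamp); // newer timestamps sort first
    }
}
```

Nothing here reads a combined key buffer, so the same comparator would work whether the components share one array or not.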

bq. In StoreFileScanner we do reseekTo and seekTo that seeks to a key ( this is 
not the rowkey). So can we change this to the rowkey?
We're seeking to something identified by its coordinates, which in our case are 
row-key, family, column identifier, and timestamp. Passing the "key" was just a 
convenient way to pass all of these together.

Another tricky part is serialization and deserialization. We do not want to 
change the HFile storage format (I think). It is not incorrect to deserialize a 
KeyValue into a single byte[] (as long as nobody relies on it). So we could 
keep the current storage format and have serialization code that generates it 
from the individual byte[]'s (i.e. we calculate the length and keylength from 
the individual parts and then write them to the HFile just like it would have 
been written now). Upon read we always deserialize into the current format.
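
The "calculate the length and keylength from the individual parts" step might look like the sketch below. It assumes the commonly documented KeyValue layout without tags: 4-byte key length, 4-byte value length, 2-byte row length, row, 1-byte family length, family, qualifier, 8-byte timestamp, 1-byte type, then value. KeyValueLayoutSketch is a hypothetical helper, and it does no validation of component sizes:

```java
import java.nio.ByteBuffer;

final class KeyValueLayoutSketch {
    private KeyValueLayoutSketch() {}

    /**
     * Writes separately held components into one contiguous array in the assumed
     * layout. ByteBuffer is big-endian by default, matching that layout.
     */
    static byte[] serialize(byte[] row, byte[] family, byte[] qualifier,
                            long timestamp, byte type, byte[] value) {
        // Key = rowLength(2) + row + familyLength(1) + family + qualifier + ts(8) + type(1)
        int keyLength = Short.BYTES + row.length
                + Byte.BYTES + family.length
                + qualifier.length
                + Long.BYTES + Byte.BYTES;

        ByteBuffer out = ByteBuffer.allocate(2 * Integer.BYTES + keyLength + value.length);
        out.putInt(keyLength);
        out.putInt(value.length);
        out.putShort((short) row.length);   // assumes the row fits in a short
        out.put(row);
        out.put((byte) family.length);      // assumes the family fits in a byte
        out.put(family);
        out.put(qualifier);
        out.putLong(timestamp);
        out.put(type);
        out.put(value);
        return out.array();
    }
}
```

The reverse direction, reading this array back into one byte[] per cell, is what the comment above means by always deserializing into the current format.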



[jira] [Commented] (HBASE-7320) Remove KeyValue.getBuffer()

2014-02-13 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13900263#comment-13900263
 ] 

ramkrishna.s.vasudevan commented on HBASE-7320:
---

Replacing KeyValues with cells
==
I am trying to work on this.  As a first step I am looking at how to replace 
the references to KeyValue with Cell, and at how to make a new KeyValue format 
fit into this code base.
Currently everywhere in the code we are passing KeyValue in the args and also 
returning keyvalue everywhere.
There are some places where KeyValue-specific methods are used, such as 
isDeleteColumnOrFamily in TimeRangeTracker.  CellUtil.java already has some of 
them.
I think we can move these to some helper class.

When we try to apply a different KeyValue format, say one with individual byte 
arrays for rows, families, qualifiers, values and tags (if present), like 
CellCodec, we need to handle these cases.  
As in the comment 
https://issues.apache.org/jira/browse/HBASE-7320?focusedCommentId=13882073&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13882073
we may have to change only the existing KeyValue so that it supports a new 
format; that would help us avoid these problems.

Also, if we implement a new Cell type, then everywhere we may need to 
instantiate that new class but reference it as a Cell, and change the args and 
return types everywhere to Cell.
In turn this would require changes throughout the code.
KVComparator some of the apis still take KeyValue as the argument.  I think we 
could change that.
Should the Cell interface itself extend HeapSize and Cloneable?  I think this 
would be needed (I can file JIRAs for these two points).  
Memstore.maybeCloneWithAllocator is one place where we may have to use the 
clone method of the Cell's implementation.

In StoreFileScanner we do reseekTo and seekTo that seeks to a key ( this is not 
the rowkey).  So can we change this to the rowkey?
Also I would suggest that all the StoreScanner, KeyValueScanner etc. interfaces 
be changed to work with Cell, e.g. seek, reseek etc.  Let me know what you 
guys feel so that I can raise individual subtasks for them.
