Re: In AbstractRocksDBState, why write a byte 42 between key and namespace?

Maximilian Michels Wed, 20 Jul 2016 07:08:51 -0700

Is there a JIRA issue for this?


On Mon, Jul 18, 2016 at 12:15 PM, Aljoscha Krettek <aljos...@apache.org> wrote:
> Ah I see, Stephan and I had a quick chat and it's for cases where there are
> 42s around the edges of the key/namespace.
>
> On Mon, 18 Jul 2016 at 11:51 Aljoscha Krettek <aljos...@apache.org> wrote:
>
>> In which cases is it not solved? Because then we should make sure to solve
>> it.
>>
>> On Mon, 18 Jul 2016 at 10:33 Stephan Ewen <se...@apache.org> wrote:
>>
>>> Got it. But the ambiguity is not really solved by that, just lessened.
>>>
>>> On Sun, Jul 17, 2016 at 2:10 PM, Aljoscha Krettek <aljos...@apache.org>
>>> wrote:
>>>
>>> > @Stephan It's not about the serializers not being able to read the key. The
>>> > key/namespace are never read again. It's just about the serialized form
>>> > possibly being ambiguous since we don't control the TypeSerializers and
>>> > there might be wonky var-length encoding schemes and whatnot.
>>> >
>>> > On Fri, 15 Jul 2016 at 19:20 Timothy Farkas <
>>> timothytiborfar...@gmail.com>
>>> > wrote:
>>> >
>>> > > I've faced a similar issue when serializing data to a key-value store. Not
>>> > > sure how helpful it is for this case, but two possible solutions I've used
>>> > > for persisting keys and values under different namespaces to the same
>>> > > key-value store are:
>>> > >
>>> > > - Have all namespaces be the same number of bytes and prefix each key with
>>> > >   its namespace.
>>> > > - Include the number of bytes in the namespace and key. So the bytes would
>>> > >   look like this:
>>> > >
>>> > > [namespace num bytes] [namespace] [key num bytes] [key]
>>> > >
>>> > > Thanks,
>>> > > Tim
>>> > >
>>> > > On Fri, Jul 15, 2016 at 9:45 AM, Stephan Ewen <se...@apache.org>
>>> wrote:
>>> > >
>>> > > > Every serializer should know how many bytes to consume. The key serializer
>>> > > > should not need to look for 42 to know where to terminate.
>>> > > >
>>> > > > Otherwise this would be a problem case:
>>> > > > key[42, 42] - 42 - namespace [42, 42, 42]
>>> > > > key[42, 42, 42] - 42 - namespace [42, 42]
>>> > > >
>>> > > >
>>> > > >
>>> > > > On Fri, Jul 15, 2016 at 5:38 PM, Aljoscha Krettek <
>>> aljos...@apache.org
>>> > >
>>> > > > wrote:
>>> > > >
>>> > > > > I left that in on purpose to protect against cases where the combination
>>> > > > > of key and namespace can be ambiguous. For example, these two combinations
>>> > > > > of key and namespace have the same written representation:
>>> > > > > key [0 1 2] namespace [3 4 5] (values in brackets are byte arrays)
>>> > > > > key [0 1] namespace [2 3 4 5]
>>> > > > >
>>> > > > > Having the "magic number" in there protects against such cases.
>>> > > > >
>>> > > > > On Fri, 15 Jul 2016 at 16:31 Stephan Ewen <se...@apache.org>
>>> wrote:
>>> > > > >
>>> > > > >> My assumption is that this was a sanity check that actually just stuck in
>>> > > > >> the code.
>>> > > > >>
>>> > > > >> It can probably be removed.
>>> > > > >>
>>> > > > >> PS: Moving this to the dev@flink.apache.org list...
>>> > > > >>
>>> > > > >>
>>> > > > >>
>>> > > > >> On Fri, Jul 15, 2016 at 11:05 AM, 刘彪 <mmyy1...@gmail.com> wrote:
>>> > > > >>
>>> > > > >> > In AbstractRocksDBState.writeKeyAndNamespace():
>>> > > > >> >
>>> > > > >> > protected void writeKeyAndNamespace(DataOutputView out) throws IOException {
>>> > > > >> >     backend.keySerializer().serialize(backend.currentKey(), out);
>>> > > > >> >     out.writeByte(42);
>>> > > > >> >     namespaceSerializer.serialize(currentNamespace, out);
>>> > > > >> > }
>>> > > > >> >
>>> > > > >> > Why write a byte 42 between key and namespace? The keySerializer and
>>> > > > >> > namespaceSerializer know their lengths. It seems we don't need this byte.
>>> > > > >> >
>>> > > > >> > Could anybody tell me what it is for? Is there any situation in which we
>>> > > > >> > must have this separator?
>>> > > > >> >
>>> > > > >>
>>> > > > >
>>> > > >
>>> > >
>>> >
>>>
>>
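
The three layouts discussed in this thread can be compared with a small, self-contained sketch. It is not Flink code: the class and method names (KeyNamespaceEncoding, concat, withSeparator, lengthPrefixed) are hypothetical, raw byte arrays stand in for whatever the TypeSerializers would write, and the 4-byte length prefix is an assumption about how the "[num bytes]" fields in Tim's layout might be encoded.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical helper, for illustration only; not part of Flink.
final class KeyNamespaceEncoding {

    // Plain concatenation: key bytes followed directly by namespace bytes.
    public static byte[] concat(byte[] key, byte[] namespace) {
        return ByteBuffer.allocate(key.length + namespace.length)
                .put(key)
                .put(namespace)
                .array();
    }

    // Key, then the separator byte 42, then the namespace
    // (the layout written by writeKeyAndNamespace in the original question).
    public static byte[] withSeparator(byte[] key, byte[] namespace) {
        return ByteBuffer.allocate(key.length + 1 + namespace.length)
                .put(key)
                .put((byte) 42)
                .put(namespace)
                .array();
    }

    // [namespace num bytes] [namespace] [key num bytes] [key],
    // assuming each length is written as a 4-byte big-endian int.
    public static byte[] lengthPrefixed(byte[] key, byte[] namespace) {
        return ByteBuffer.allocate(4 + namespace.length + 4 + key.length)
                .putInt(namespace.length)
                .put(namespace)
                .putInt(key.length)
                .put(key)
                .array();
    }

    public static void main(String[] args) {
        // Aljoscha's example: plain concatenation collides, the separator helps.
        byte[] k1 = {0, 1, 2}, n1 = {3, 4, 5};
        byte[] k2 = {0, 1},    n2 = {2, 3, 4, 5};
        System.out.println(Arrays.equals(concat(k1, n1), concat(k2, n2)));
        System.out.println(Arrays.equals(withSeparator(k1, n1), withSeparator(k2, n2)));

        // Stephan's example: 42s at the edges still collide with the separator,
        // but not with explicit length prefixes.
        byte[] k3 = {42, 42},     n3 = {42, 42, 42};
        byte[] k4 = {42, 42, 42}, n4 = {42, 42};
        System.out.println(Arrays.equals(withSeparator(k3, n3), withSeparator(k4, n4)));
        System.out.println(Arrays.equals(lengthPrefixed(k3, n3), lengthPrefixed(k4, n4)));
    }
}
```

Running main prints true, false, true, false: the separator removes the first collision but not the second, which matches the observation that the ambiguity is lessened rather than solved, while the length-prefixed layout keeps both pairs distinct at the cost of extra bytes per entry.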
