Ah I see. Stephan and I had a quick chat: the ambiguity remains in cases where there are 42s around the edges of the key/namespace.
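To make the edge case concrete, here is a minimal sketch using Stephan's example from the quoted thread. The class `SeparatorAmbiguity` and its `encode` helper are hypothetical stand-ins that write raw bytes; they are not the actual Flink code, which goes through the TypeSerializers. It assumes the serialized key and namespace are just the byte arrays themselves.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

class SeparatorAmbiguity {
    // Hypothetical stand-in for writeKeyAndNamespace():
    // raw key bytes, then the separator byte 42, then raw namespace bytes.
    static byte[] encode(byte[] key, byte[] namespace) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(key, 0, key.length);
        out.write(42);
        out.write(namespace, 0, namespace.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // No 42s at the edges: the separator keeps the two pairs apart.
        byte[] a = encode(new byte[] {0, 1, 2}, new byte[] {3, 4, 5});
        byte[] b = encode(new byte[] {0, 1}, new byte[] {2, 3, 4, 5});
        System.out.println(Arrays.equals(a, b)); // false

        // 42s at the edges: both pairs serialize to six 42 bytes.
        byte[] c = encode(new byte[] {42, 42}, new byte[] {42, 42, 42});
        byte[] d = encode(new byte[] {42, 42, 42}, new byte[] {42, 42});
        System.out.println(Arrays.equals(c, d)); // true
    }
}
```

So the separator lessens the ambiguity but does not remove it whenever the serialized key can end with 42 or the serialized namespace can start with 42.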
On Mon, 18 Jul 2016 at 11:51 Aljoscha Krettek <aljos...@apache.org> wrote:

> In which cases is it not solved? Because then we should make sure to solve
> it.
>
> On Mon, 18 Jul 2016 at 10:33 Stephan Ewen <se...@apache.org> wrote:
>
>> Got it. But the ambiguity is not really solved by that, just lessened.
>>
>> On Sun, Jul 17, 2016 at 2:10 PM, Aljoscha Krettek <aljos...@apache.org>
>> wrote:
>>
>> > @Stephan It's not about the serializers not being able to read the key.
>> > The key/namespace are never read again. It's just about the serialized
>> > form possibly being ambiguous since we don't control the TypeSerializers
>> > and there might be wonky var-length encoding schemes and what not.
>> >
>> > On Fri, 15 Jul 2016 at 19:20 Timothy Farkas
>> > <timothytiborfar...@gmail.com> wrote:
>> >
>> > > I've faced a similar issue when serializing data to a key value store.
>> > > Not sure how helpful it is for this case but two possible solutions
>> > > I've used for persisting keys and values under different namespaces to
>> > > the same key value store are:
>> > >
>> > > - have all namespaces be the same number of bytes and prefix each key
>> > >   with its namespace.
>> > > - Include the number of bytes in the name space and key. So the bytes
>> > >   would look like this:
>> > >
>> > >   [name space num bytes] [name space] [key num bytes] [key]
>> > >
>> > > Thanks,
>> > > Tim
>> > >
>> > > On Fri, Jul 15, 2016 at 9:45 AM, Stephan Ewen <se...@apache.org> wrote:
>> > >
>> > > > Every serializer should know how many bytes to consume. The key
>> > > > serializer should not need to look for 42 to know where to terminate.
>> > > >
>> > > > Otherwise this would be a problem case:
>> > > > key[42, 42] - 42 - namespace [42, 42, 42]
>> > > > key[42, 42, 42] - 42 - namespace [42, 42]
>> > > >
>> > > > On Fri, Jul 15, 2016 at 5:38 PM, Aljoscha Krettek
>> > > > <aljos...@apache.org> wrote:
>> > > >
>> > > > > I left that in on purpose to protect against cases where the
>> > > > > combination of key and namespace can be ambiguous. For example,
>> > > > > these two combinations of key and namespace have the same written
>> > > > > representation:
>> > > > > key [0 1 2] namespace [3 4 5] (values in brackets are byte arrays)
>> > > > > key [0 1] namespace [2 3 4 5]
>> > > > >
>> > > > > having the "magic number" in there protects against such cases.
>> > > > >
>> > > > > On Fri, 15 Jul 2016 at 16:31 Stephan Ewen <se...@apache.org> wrote:
>> > > > >
>> > > > >> My assumption is that this was a sanity check that actually just
>> > > > >> stuck in the code.
>> > > > >>
>> > > > >> It can probably be removed.
>> > > > >>
>> > > > >> PS: Moving this to the dev@flink.apache.org list...
>> > > > >>
>> > > > >> On Fri, Jul 15, 2016 at 11:05 AM, 刘彪 <mmyy1...@gmail.com> wrote:
>> > > > >>
>> > > > >> > In AbstractRocksDBState.writeKeyAndNamespace():
>> > > > >> >
>> > > > >> > protected void writeKeyAndNamespace(DataOutputView out) throws IOException {
>> > > > >> >   backend.keySerializer().serialize(backend.currentKey(), out);
>> > > > >> >   out.writeByte(42);
>> > > > >> >   namespaceSerializer.serialize(currentNamespace, out);
>> > > > >> > }
>> > > > >> >
>> > > > >> > Why write a byte 42 between key and namespace? The keySerializer
>> > > > >> > and namespaceSerializer know their lengths. It seems we don't
>> > > > >> > need this byte.
>> > > > >> >
>> > > > >> > Could anybody tell me what it is for? Is there any situation
>> > > > >> > that we must have this separator?
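
For comparison, the length-prefixed layout Tim describes in the thread above could be sketched roughly as follows. This is only an illustration under stated assumptions: the class `LengthPrefixedKey`, its `encode` helper, and the choice of a fixed 4-byte big-endian length are all hypothetical, and nothing in the thread says Flink adopted this scheme.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

class LengthPrefixedKey {
    // Hypothetical layout, following Tim's description:
    // [namespace num bytes] [namespace] [key num bytes] [key]
    // Lengths are written as 4-byte big-endian ints (an assumption of this sketch).
    static byte[] encode(byte[] namespace, byte[] key) {
        ByteBuffer buf = ByteBuffer.allocate(4 + namespace.length + 4 + key.length);
        buf.putInt(namespace.length);
        buf.put(namespace);
        buf.putInt(key.length);
        buf.put(key);
        return buf.array();
    }

    public static void main(String[] args) {
        // The pair that collides with a plain 42 separator stays distinct here,
        // because the length fields differ.
        byte[] c = encode(new byte[] {42, 42, 42}, new byte[] {42, 42});
        byte[] d = encode(new byte[] {42, 42}, new byte[] {42, 42, 42});
        System.out.println(Arrays.equals(c, d)); // false
    }
}
```

Because every field announces its own length, no byte value inside the key or namespace can be mistaken for a boundary. The cost is a few extra bytes per entry.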