Re: ANN: Gloss, a byte-format DSL

pepijn (aka fliebel) Mon, 10 Jan 2011 01:52:35 -0800

The latter. The string in my example is \n-terminated, so it should
read past the \0, and then the outer list should terminate on the last
\0.


My point is that I need to parse recursive structures, which of course
contain the same terminator as the outer one.

What if I wanted to parse null-terminated lists of null-terminated
lists of ... of bytes? Like so:

abc\0def\0\0
def\0ghi\0jkl\0\0
\0

Your implementation would just make the outer list read to the first
\0 and call it a day. What I need is for it to check for a terminator
on the outer list only after reading a full inner list, and to keep
doing that until it finds a terminator right after an inner list.
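
To make that concrete, here is a rough sketch of the behaviour I mean,
in plain Clojure rather than Gloss. The names read-item and example are
made up for illustration, and this says nothing about how Gloss does or
should implement it internally.

```clojure
;; Sketch only: plain Clojure, no Gloss. read-item and example are made-up
;; names for illustration, not part of any library.

(defn read-item
  "Reads one item of nesting level `depth` from `bs`, a seq of byte values.
  Level 0 is a run of bytes ended by a 0. Level n is a list of level n-1
  items, ended by a 0 found where the next item would otherwise start.
  Returns [item remaining-bytes]."
  [depth bs]
  (if (zero? depth)
    (let [[run more] (split-with (complement zero?) bs)]
      (when (empty? more)
        (throw (Exception. "Input ended inside a string")))
      ;; Drop the terminating 0 and hand back the rest.
      [(apply str (map char run)) (rest more)])
    (loop [items [] bs bs]
      (cond
        (empty? bs)
        (throw (Exception. "Input ended before the list terminator"))

        ;; Only look for the terminator between items, never inside one.
        (zero? (first bs))
        [items (rest bs)]

        :else
        (let [[item more] (read-item (dec depth) bs)]
          (recur (conj items item) more))))))

;; The example from above with the line breaks removed. \u0000 spells the
;; 0 byte.
(def example
  (map int "abc\u0000def\u0000\u0000def\u0000ghi\u0000jkl\u0000\u0000\u0000"))

(first (read-item 2 example))
;; => [["abc" "def"] ["def" "ghi" "jkl"]]
```

One consequence of checking for the terminator only where an item would
start is that an empty inner item cannot be represented, since its lone
\0 is read as the end of the enclosing list.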

Pepijn

On Jan 6, 7:17 am, Zach Tellman <[email protected]> wrote:
> I'm confused by what you expect the decoded value for that string to
> look like.  Is it:
>
> [["blabla" "hi hi" "jg"] [" grrrr"]]
>
> or
>
> [["blabla" "hi hi" "jg\0 grrrr"]]
>
> Zach
>
> On Jan 5, 3:36 am, "pepijn (aka fliebel)" <[email protected]>
> wrote:
>
> > Nothing :(
>
> > $ lein repl
> > "REPL started; server listening on localhost:32399."
> > user=> (use '[gloss core io])
> > nil
> > user=> (defcodec t (repeated (string :utf-8 :delimiters
> > ["\n"]) :delimiters ["\0"]))
> > #'user/t
> > user=> (decode t (.getBytes "blabla\nhihi\njg\0 grrrr\n\0"))
> > java.lang.Exception: Cannot evenly divide bytes into sequence of
> > frames. (NO_SOURCE_FILE:0)
>
> > What I think is happening is that repeated reads up to the first \0
> > and then tries to fit the subframes inside of that. What I think it
> > *should* do is, check the next byte for a delimiter, if not, read a
> > subframe, rinse and repeat.
>
> > Pepijn
>
> > On Jan 2, 7:38 pm, Zach Tellman <[email protected]> wrote:
>
> > > There was a bug in repeated, which is fixed now.  Pull the latest from
> > > github or clojars and let me know how it goes.
>
> > > Zach
>
> > > On Jan 2, 3:29 am, "pepijn (aka fliebel)" <[email protected]>
> > > wrote:
>
> > > > > Okay, clicked send too early. The header also contains 0-bytes, so the
> > > > > repeated stops 'mid-sentence' and tries to balance things. Is there
> > > > > any way Gloss can handle nested structures like this?
>
> > > > On Jan 2, 12:20 pm, "pepijn (aka fliebel)" <[email protected]>
> > > > wrote:
>
> > > > > Hi,
>
> > > > > Thanks for helping out. After using codecs rather than frames, I get
> > > > > even weirder errors.
>
> > > > > My code now gives me: java.lang.Exception: Cannot evenly divide bytes
> > > > > into sequence of frames.
>
> > > > > Hard coding the header followed by a terminating :byte works as it
> > > > > should:
>
> > > > > > (decode (compile-frame [tag tstring
> > > > > >                         (header tag
> > > > > >                                 (memoize #(compile-frame [(name %) tstring (get-tag %)]))
> > > > > >                                 (comp symbol first))
> > > > > >                         :byte])
> > > > > >         data)
> > > > > [:compound "hello world" ["string" "name" "Bananrama"] 0]
>
> > > > > So the problem seems to be with the repeated. Investigating that, I
> > > > > copied an example from the introduction page, and this is the result:
> > > > > (defcodec t (repeated (string :utf-8 :delimiters ["\n"]) :delimiters
> > > > > ["\0"]))
> > > > > (encode t ["foo" "bar" "baz"])
> > > > > java.lang.IllegalArgumentException: Don't know how to create ISeq
> > > > > from: java.nio.HeapByteBuffer
> > > > > ((#<HeapByteBuffer java.nio.HeapByteBuffer[pos=0 lim=3 cap=3]>
> > > > > #<HeapByteBuffer java.nio.HeapByteBuffer[pos=0 lim=1 cap=1]>)
> > > > > (#<HeapByteBuffer java.nio.HeapByteBuffer[pos=0 lim=3 cap=3]>
> > > > > #<HeapByteBuffer java.nio.HeapByteBuffer[pos=0 lim=1 cap=1]>)
>
> > > > > But on the other hand:
> > > > > (decode t (.getBytes "blabla\nhihi\ngrrrr\n\0"))
> > > > > ["blabla" "hihi" "grrrr"]
>
> > > > > This gives me the same error as my code, but since the header in my
> > > > > code seems correct, I don't see why it has leftover bytes.
> > > > > (decode t (.getBytes "blabla\nhihi\ngrrrr\0"))
> > > > > java.lang.Exception: Cannot evenly divide bytes into sequence of
> > > > > frames.
>
> > > > > > By the way, is there an easy way to get something readable out of
> > > > > > encode? Like Unix hexdump, or even just a seq of integers. Debugging
> > > > > > is a weak point in Gloss so far, if you ask me.
>
> > > > > Thanks!
>
> > > > > Pepijn de Vos
>
> > > > > On Jan 1, 10:47 pm, Zach Tellman <[email protected]> wrote:
>
> > > > > > > The header->body function in (header ...) must return a codec, so
> > > > > > > you need to call compile-frame on the vector you're generating.
> > > > > > > Since you don't want to call compile-frame every time you decode a
> > > > > > > frame, you can memoize the function.  A version that does both can
> > > > > > > be found at https://gist.github.com/762031.
>
> > > > > > > I agree that the way the enumeration and types are blurred in your
> > > > > > > code is a little confusing.  You could create a stronger distinction
> > > > > > > by calling your enumerated types :tag-byte, :tag-int32, etc., and
> > > > > > > then defining a map from those tags onto :byte, :int32, and so on.
>
> > > > > > Zach
>
> > > > > > On Jan 1, 1:01 pm, "pepijn (aka fliebel)" <[email protected]>
> > > > > > wrote:
>
> > > > > > > Hey,
>
> > > > > > > > I am trying Gloss for reading NBT [1] files.
>
> > > > > > > First thing I did like is that it seems to make things real easy.
> > > > > > > First thing I did not like is the weak separation between types
> > > > > > > like :byte and extra data like :foo.
>
> > > > > > > I think I'm nearly done with the NBT reader [2], but I ran into a
> > > > > > > problem. Whatever I put in the header form, I get exceptions like
> > > > > > > this:
>
> > > > > > > > java.lang.IllegalArgumentException: No implementation of
> > > > > > > > method: :sizeof of protocol: #'gloss.core.protocols/Writer found
> > > > > > > > for class: clojure.lang.PersistentVector
>
> > > > > > > > The only thing the stack trace [3] mentions is methods on a reify,
> > > > > > > > which call the same method again or, in the most recent case,
> > > > > > > > just return nil.
>
> > > > > > > > [1] http://www.minecraft.net/docs/NBT.txt
> > > > > > > > [2] https://gist.github.com/761997
> > > > > > > > [3] http://pastebin.com/AqrsbjuS
>
> > > > > > > On Nov 28 2010, 8:14 pm, Zach Tellman <[email protected]> wrote:
>
> > > > > > > > > You're right, that's an omission from the frame syntax.  I'll add
> > > > > > > > > the ability for all or part of the frame to be scoped as
> > > > > > > > > (little-endian ...) and (big-endian ...), with big-endian as the
> > > > > > > > > default.
>
> > > > > > > > > Just as a side-note, though, Calx [1] is already handling
> > > > > > > > > little-endian data by using encode-to-buffer, where it's writing
> > > > > > > > > to a buffer whose endianness has been preset.  This obviously
> > > > > > > > > isn't a general solution, but just thought I'd point it out.
>
> > > > > > > > Zach
>
> > > > > > > > > [1] https://github.com/ztellman/calx
>
> > > > > > > > On Nov 28, 8:50 am, zoka <[email protected]> wrote:
>
> > > > > > > > > > If Gloss is to decode an incoming packet (byte array) in
> > > > > > > > > > little-endian format, it is straightforward: wrap the byte array
> > > > > > > > > > in a ByteBuffer b, invoke b.order(LITTLE_ENDIAN), and pass b to
> > > > > > > > > > the decode function, which will return a Clojure map of decoded
> > > > > > > > > > values.
>
> > > > > > > > > > However, when an outgoing packet byte array is to be produced
> > > > > > > > > > from a map of values, the encode function will always return a
> > > > > > > > > > ByteBuffer in the default big-endian format, so the resulting
> > > > > > > > > > byte array extracted from the ByteBuffer using the get() method
> > > > > > > > > > will be incorrect.
>
> > > > > > > > > > If Gloss is to support little-endian frames, it seems that
> > > > > > > > > > endianness needs to be part of the frame definition. In that
> > > > > > > > > > case the Gloss decode function would refuse to accept
> > > > > > > > > > ByteBuffers with the wrong order(), and the encode function
> > > > > > > > > > would always generate the correct result.
>
> > > > > > > > > Zoka
>
> > > > > > > > > On Nov 25, 3:00 am, Zach Tellman <[email protected]> wrote:
>
> > > > > > > > > > > ByteBuffers have an order() method which allows you to toggle
> > > > > > > > > > > the endianness.  I haven't tested this, but since everything is
> > > > > > > > > > > built on top of Java's ByteBuffer functionality it should be
> > > > > > > > > > > fine as long as the ByteBuffers are correctly set and correctly
> > > > > > > > > > > ordered with respect to each other.
>
> > > > > > > > > > Zach
>
> > > > > > > > > > On Nov 23, 2:52 pm, zoka <[email protected]> wrote:
>
> > > > > > > > > > > > The JVM stores numbers in big-endian format. Is there a way to
> > > > > > > > > > > > process a binary stream containing little-endian numbers?
>
> > > > > > > > > > > Zoka
>
> > > > > > > > > > > > On Nov 24, 7:24 am, Zach Tellman <[email protected]> wrote:
>
> > > > > > > > > > > > > Good question.  The solution didn't make the cut for my
> > > > > > > > > > > > > initial release, but will be added soon.  My plan is to have
> > > > > > > > > > > > > an (ordered-map ...) frame which encodes and decodes the keys
> > > > > > > > > > > > > in the given order.  So for C interop, the frame would be
>
> > > > > > > > > > > > (ordered-map :a :int16, :b :float32)
>
> > > > > > > > > > > > > An alternative would be to just turn any vector which is
> > > > > > > > > > > > > alternating keys and types into an ordered-map, but that
> > > > > > > > > > > > > seems a bit too magical.
>
> > > > > > > > > > > > Zach
>
> > > > > > > > > > > > > On Nov 23, 12:12 pm, Chris Perkins <[email protected]> wrote:
>
> > > > > > > > > > > > > > On Nov 23, 12:03 pm, Zach Tellman <[email protected]> wrote:
>
> > > > > > > > > > > > > > > When writing Calx [1], I discovered it was a huge pain to
> > > > > > > > > > > > > > > deal with mixed C datatypes in Java.  When writing Aleph
> > > > > > > > > > > > > > > [2], I discovered the problem increases by a factor of ten
> > > > > > > > > > > > > > > when dealing with streams of bytes.  In an attempt to
> > > > > > > > > > > > > > > alleviate my own pain, and hopefully help a few other
> > > > > > > > > > > > > > > people out, I've written Gloss, which can transform a
> > > > > > > > > > > > > > > simple byte-format specification into an encoder and
> > > > > > > > > > > > > > > streaming decoder.
>
> > > > > > > > > > > > > > > A full writeup can be found at
> > > > > > > > > > > > > > > https://github.com/ztellman/gloss/wiki.
>
> > > > > > > > > > > > > > > A few people have already asked me how this differs from
> > > > > > > > > > > > > > > protocol buffers, so I'll preemptively answer that
> > > > > > > > > > > > > > > protocol buffers are a fixed format that cannot be used to
> > > > > > > > > > > > > > > interface with external systems.  Gloss is less performant
> > > > > > > > > > > > > > > than protocol buffers, but is also much less picky about
> > > > > > > > > > > > > > > formats.
>
> > > > > > > > > > > > > > If anyone has any questions, I'd be happy to answer 
> > > > > > > > > > > > > > them....
>
