Hello,

I've been playing around with tnetstrings over Unix pipes in a few
different programs.  Thanks for publishing a nice simple web page
describing them.

The main one is a web server, which originally used JSON to
communicate between the server and apps.  I recently changed it to use
tnetstrings, which are a good replacement for JSON here because they're
simpler, faster, and can have byte strings as fields.  So I went
through a design evolution similar to Mongrel2's.

The original use case was writing web apps in R, but now it supports
multiple languages.  In particular, I'm adding protocol buffer support
for statically typed languages, so I'm storing encoded protobufs
inside a tnetstrings payload, which is one of the reasons why byte
strings are important.

I have two points of feedback.  The first is a modification I've made
in my implementation:

1. $ is a tag for a UTF-8 encoded string.

Why do this instead of just having a schema?  One reason is that I
want the protocol to be usable without a schema (otherwise you have the
hassles of schema transmission, etc).  Another is that protocol
buffers, which are a part of my project, distinguish between bytes and
strings.  Protocol buffers originally had byte strings only, but
Unicode was deemed important enough to add, so I'm taking that as a
cue.  Finally, when coding in Python/Java/C++ it's just natural to
keep the string and bytes types separate.  Programming languages make
the distinction, so to reduce friction I think the serialization
format should as well.
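
As a rough illustration of the bytes/string split described above, here
is a minimal Python sketch.  It is not the actual tnet code: the names
dumps and loads are only modeled on json.dumps/json.loads, it handles
just the two string types, and the $ tag is the proposed extension,
not part of the published tnetstrings page.

```python
def dumps(value):
    """Encode bytes or str as a tnetstring-style frame (returns bytes)."""
    if isinstance(value, bytes):
        payload, tag = value, b","                   # standard byte-string tag
    elif isinstance(value, str):
        payload, tag = value.encode("utf-8"), b"$"   # proposed tag (assumption)
    else:
        raise TypeError("this sketch only handles bytes and str")
    # The length prefix counts payload bytes, not characters.
    return str(len(payload)).encode("ascii") + b":" + payload + tag


def loads(data):
    """Decode one frame produced by dumps() above."""
    length, sep, rest = data.partition(b":")
    if not sep:
        raise ValueError("missing ':' after length")
    n = int(length)
    if len(rest) < n + 1:
        raise ValueError("frame is shorter than its declared length")
    payload, tag = rest[:n], rest[n:n + 1]
    if tag == b",":
        return payload                    # stays a byte string
    if tag == b"$":
        return payload.decode("utf-8")    # becomes a text string
    raise ValueError("unsupported tag in this sketch: %r" % tag)
```

With this split, an encoded protobuf would go through the bytes branch
untouched, while text round-trips as a native string without consulting
a schema.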

2. Another thing I noticed:

0:~ is null, but 4:true! is True.

It seems more symmetrical if one of these is true:

4:null~ is null and 4:true! is True.

0:~ is null and 1:T? is True (and 1:F? is False).

This is mostly a cosmetic issue, but the one-character form is also
smaller (1:T? is 4 bytes, 4:true! is 7) and could be faster to parse,
which may matter for data containing many booleans.
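
The size difference can be checked with a few lines of Python.  This is
only a sketch: frame is a hypothetical helper that applies the
length:payload+tag layout, and the ? tag with T/F payloads is the
proposed encoding, not something existing parsers accept.

```python
def frame(payload: bytes, tag: bytes) -> bytes:
    """Wrap a payload as <length>:<payload><tag>."""
    return str(len(payload)).encode("ascii") + b":" + payload + tag


# Current encoding of booleans.
current_true = frame(b"true", b"!")      # b"4:true!"
current_false = frame(b"false", b"!")    # b"5:false!"

# Proposed one-character encoding (assumption: '?' tag, T/F payloads).
proposed_true = frame(b"T", b"?")        # b"1:T?"
proposed_false = frame(b"F", b"?")       # b"1:F?"


def bool_list(values, encode_true, encode_false):
    """Encode a list of booleans as a tnetstring list (']' tag)."""
    items = b"".join(encode_true if v else encode_false for v in values)
    return frame(items, b"]")


flags = [True] * 1000
current_size = len(bool_list(flags, current_true, current_false))
proposed_size = len(bool_list(flags, proposed_true, proposed_false))
print(current_size, proposed_size)
```

The per-value saving is 3 bytes for true and 4 for false, so the gain
only becomes noticeable for payloads dominated by booleans.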


Before I add a $ type and a ? type, I want to check here to see
which approach is preferred:

1) I could just add them to my implementation.  This could be
confusing since it would emit data that other parsers wouldn't handle.
2) I could add them and rename the format to something other than
"tnetstrings".  Since you described it first I don't want to
bastardize it under the same name.
3) You could add them to the tnetstrings spec, if you're convinced.
4) ?

I have actually been calling it "tnet" in my head because it's short
to type and sort of "rhymes" with json -- i.e. the API I chose is
tnet.dumps and tnet.loads, like json.dumps and json.loads from the
Python standard library.  I could call it "pnet" or something if
"tnet" is too close.

Here is the R version (the Python version is based on the web page;
the Go version is an experiment and not close to done):

http://code.google.com/p/tnet/source/browse/R/tnet.R

-----

Also, I haven't fully thought about floats (they're important for R,
but I might end up tunneling R float vectors through byte strings for
parsing speed), but this article:

http://research.swtch.com/ftoa

may be relevant to this:

http://aaronblohowiak.com/tnetstrings (asking why the float type isn't
the same as JSON's number type)

From:

http://librelist.com/browser//mongrel2/2011/4/6/some-tnetstring-feedback/#7f5c0a271df4efa98394c58f3b175070

"Actually, that's why I took floats out of mine.  We have to think about
the statement that we're sending a float since they don't translate
reliably between platforms.  If we can make it *very* clear that, no
insane math dude, you do not get the exact same number in Haskell as you
do in Javascript, then it should be fine to do another type for floats."

On second thought: why is this an issue for tnetstrings but not for
JSON?  Various JSON libraries may not produce the exact same number,
but no one seems to be cursing JSON because of this.  I think that if
people wanted to guarantee exactness they would use byte strings.
Having an "approximate" double seems reasonable.  Every language has a
JSON parser, so the code to parse and emit floats could just be copied
from there.
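
A minimal Python sketch of that idea follows.  The ^ tag and the
function names dumps_float/loads_float are assumptions made for
illustration.  It relies on Python's repr() of a float, which yields a
short decimal string that float() parses back to the same double; other
languages may print or parse differently, which is exactly the
"approximate" caveat.

```python
import json


def dumps_float(x: float) -> bytes:
    """Encode a float as <length>:<decimal text>^ (tag is an assumption)."""
    text = repr(x).encode("ascii")    # same text json.dumps emits for finite floats
    return str(len(text)).encode("ascii") + b":" + text + b"^"


def loads_float(data: bytes) -> float:
    """Decode a frame produced by dumps_float()."""
    length, sep, rest = data.partition(b":")
    if not sep:
        raise ValueError("missing ':' after length")
    n = int(length)
    if rest[n:n + 1] != b"^":
        raise ValueError("not a float frame")
    return float(rest[:n])


# Within Python, both the JSON text and the sketch above round-trip.
for x in (0.1, 1.0 / 3.0, 1e300, 5e-324):
    assert json.loads(json.dumps(x)) == x
    assert loads_float(dumps_float(x)) == x
```

Anyone who needs bit-for-bit exactness across languages could instead
ship the raw 8 bytes of the double in a byte string.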

thanks,
Andy
