Hello, I've been playing around with tnetstrings over Unix pipes in a few different programs. Thanks for publishing a nice simple web page describing them.
The main one is a web server, which originally used JSON to communicate between the server and apps. I recently switched it to tnetstrings, which are a perfect replacement for JSON: they're simpler, faster, and can carry byte strings as fields. So I went through a design evolution similar to Mongrel2's. The original use case was writing web apps in R, but it now supports multiple languages. In particular, I'm adding protocol buffer support for statically typed languages, so I'm storing encoded protobufs inside tnetstring payloads, which is one of the reasons byte strings are important.

One modification I've made in my implementation:

1. $ is a tag for a UTF-8 encoded string. Why do this instead of just having a schema? One reason is that I want the protocol to be usable without a schema (with a schema you have the hassles of schema transmission, etc.). Another is that protocol buffers, which are part of my project, distinguish between bytes and strings. Protocol buffers originally had byte strings only, but Unicode was deemed important enough to add, so I'm taking that as a cue. Also, when coding in Python/Java/C++, it's just natural to keep the string and bytes types separate. Programming languages make the distinction, so to reduce friction I think the serialization format should as well.

2. Another thing I noticed: 0:~ is null, but 4:true! is True. It would be more symmetrical if one of these held:

   a) 4:null~ is null and 4:true! is True, or
   b) 0:~ is null and 1:T? is True (and 1:F? is False).

   This is mostly a cosmetic issue, but option (b) could also be significantly faster and smaller for certain types of data.

If I want to add a $ type and a ? type, I'd like to check here to see what is preferred:

1) I could just add them to my implementation. This could be confusing, since it would emit data that other parsers wouldn't handle.
2) I could add them and rename the format to something other than "tnetstrings".
   Since you described it first, I don't want to bastardize it under the same name.
3) Add them to the tnetstrings spec, if you're convinced?
4) Something else?

I have actually been calling it "tnet" in my head, because it's short to type and sort of "rhymes" with json; i.e., the API I chose is tnet.dumps and tnet.loads, like json.dumps and json.loads from the Python standard library. I could call it "pnet" or something if "tnet" is too close. For example, here is the R version (the Python version is based on the web page; the Go version is an experiment and not close to done):

http://code.google.com/p/tnet/source/browse/R/tnet.R

-----

Also, I haven't fully thought through floats (they're important for R, but I might end up tunneling R float vectors through byte strings for parsing speed), but this article:

http://research.swtch.com/ftoa

may be relevant to this page, which asks why the float type isn't the same as JSON's number type:

http://aaronblohowiak.com/tnetstrings

From: http://librelist.com/browser//mongrel2/2011/4/6/some-tnetstring-feedback/#7f5c0a271df4efa98394c58f3b175070

"Actually, that's why I took floats out of mine. We have to think about the statement that we're sending a float since they don't translate reliably between platforms. If we can make it *very* clear that, no insane math dude, you do not get the exact same number in Haskell as you do in Javascript, then it should be fine to do another type for floats."

On second thought, I don't see why this is an issue for tnetstrings but not for JSON. Various JSON libraries may not produce exactly the same number, but no one seems to be cursing JSON because of this. If people wanted to guarantee exactness, I think they would use byte strings. But having an "approximate" double seems reasonable. Every language has a JSON parser, so the code to parse and emit floats could just be copied from there.

thanks,
Andy
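For illustration only, here is a minimal, hypothetical Python sketch of the proposed tags. It is not the tnet code linked above; dumps/loads just mirror the tnet.dumps/tnet.loads naming. It assumes $ marks UTF-8 text and ? marks a 1-byte boolean (1:T? / 1:F?), as proposed in this message. Neither tag is part of the published tnetstrings spec. Null stays 0:~, floats use the ^ tag with Python's repr() (the shortest decimal string that round-trips to the same double), and the decoder also accepts the standard 4:true! / 5:false! booleans for compatibility.

```python
# Hypothetical sketch: tnetstrings plus the proposed '$' (UTF-8 text) and
# '?' (1-byte boolean) tags. '$' and '?' are NOT in the published spec.
# Other tags: ',' bytes, '#' int, '^' float, '~' null, ']' list, '}' dict.


def dumps(value):
    """Encode a Python value as a tnetstring (returns bytes)."""
    if value is None:
        payload, tag = b"", b"~"
    elif isinstance(value, bool):  # check before int: bool subclasses int
        payload, tag = (b"T" if value else b"F"), b"?"  # proposed boolean
    elif isinstance(value, int):
        payload, tag = str(value).encode("ascii"), b"#"
    elif isinstance(value, float):
        # repr() is the shortest decimal string that round-trips to this double
        payload, tag = repr(value).encode("ascii"), b"^"
    elif isinstance(value, bytes):
        payload, tag = value, b","
    elif isinstance(value, str):
        payload, tag = value.encode("utf-8"), b"$"  # proposed UTF-8 text tag
    elif isinstance(value, list):
        payload, tag = b"".join(dumps(v) for v in value), b"]"
    elif isinstance(value, dict):
        payload = b"".join(dumps(k) + dumps(v) for k, v in value.items())
        tag = b"}"
    else:
        raise TypeError("cannot encode %s" % type(value).__name__)
    return str(len(payload)).encode("ascii") + b":" + payload + tag


def _parse(data, pos=0):
    """Parse one tnetstring at data[pos:]; return (value, index after it)."""
    colon = data.index(b":", pos)  # ValueError if there is no length prefix
    length = int(data[pos:colon].decode("ascii"))
    end = colon + 1 + length  # index of the type tag
    if length < 0 or end >= len(data):
        raise ValueError("bad length or truncated tnetstring")
    payload, tag, nxt = data[colon + 1:end], data[end:end + 1], end + 1
    if tag == b",":
        return payload, nxt
    if tag == b"$":
        return payload.decode("utf-8"), nxt
    if tag == b"#":
        return int(payload.decode("ascii")), nxt
    if tag == b"^":
        return float(payload.decode("ascii")), nxt
    if tag == b"~" and payload == b"":
        return None, nxt
    if tag == b"?" and payload in (b"T", b"F"):
        return payload == b"T", nxt
    if tag == b"!" and payload in (b"true", b"false"):  # standard boolean form
        return payload == b"true", nxt
    if tag == b"]":
        items, i = [], 0
        while i < len(payload):
            item, i = _parse(payload, i)
            items.append(item)
        return items, nxt
    if tag == b"}":
        result, i = {}, 0
        while i < len(payload):
            key, i = _parse(payload, i)
            val, i = _parse(payload, i)
            result[key] = val
        return result, nxt
    raise ValueError("bad payload or unknown tag %r" % tag)


def loads(data):
    """Decode exactly one tnetstring from a bytes object."""
    value, end = _parse(data)
    if end != len(data):
        raise ValueError("trailing data after tnetstring")
    return value
```

For example, dumps("héllo") gives b'6:h\xc3\xa9llo$' while dumps(b"hi") gives b'2:hi,', so text and bytes come back from loads as distinct Python types without any schema.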
