Re: [Open Babel] Open Babel in the browser

2013-06-07 Thread Dimitri Maziuk
On 2013-06-06 22:13, Geoffrey Hutchison wrote:
 Although I'm starting to think that json is such a simple format that
 it could do without a strict chemical specification. Getting json out of
 an OBMol is 5 lines of code

 My concern is the opposite. It's always easy to write to an arbitrary
 format from an OBMol. Parsing a pile of different formats is a pain,
 which is why it'd be better to have a somewhat standardized, extensible
 style.


I'd argue that chemdoodle json, cml json, whatever json should be added 
to input/output formats. Openbabel's own json format would obviously 
be OBMols serialized to json. Neither requires making up yet another 
data model.

Dimitri



--
How ServiceNow helps IT people transform IT departments:
1. A cloud service to automate IT design, transition and operations
2. Dashboards that offer high-level views of enterprise services
3. A single system of record for all IT processes
http://p.sf.net/sfu/servicenow-d2d-j
___
OpenBabel-discuss mailing list
OpenBabel-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openbabel-discuss


Re: [Open Babel] Open Babel in the browser

2013-06-07 Thread Craig James
Regarding using JSON as a new file format...

This discussion has focussed on the syntax of JSON, but completely
overlooks the real problem with ALL chemical file formats: how do you
handle all of the cases where a simple connection-table (ball and stick)
doesn't capture reality?  Things like aromaticity, tautomers,
organo-metallic bonds, boron-hydrogen cages, distributed bonds (ferrocenes
and the like) ... these are the problems.

If we could solve these problems, it wouldn't much matter which file format
we picked ... they'd all be equivalent and sufficient.  Without solving
these problems, a new file format doesn't really matter very much.  All it
does is make another parser with yet-another-interpretation of these hard
problems.

If JSON is a need, I suggest that you embed an existing chemical format
(see my previous note that uses SMILES) into a JSON object.

Craig


Re: [Open Babel] Open Babel in the browser

2013-06-07 Thread Craig James
On Thu, Jun 6, 2013 at 2:11 PM, Patrick Fuller patrickful...@gmail.com wrote:

  Tim,

 I think Dimitri's point is that all the references are implicitly defined
 by list indices, rather than explicit keys. For example, something like

 {
   "atoms": {
     "C1": { "element": "C", "location": [ 0.230811, 0.380820, -0.610968 ] },
     "C2": { "element": "C", "location": [ -0.230811, -0.380820, 0.610968 ] }
   },
   "bonds": [
     { "atoms": [ "C1", "C2" ], "order": 1 }
   ]
 }

 will result in generally cleaner code. That is,
 molecule["atoms"]["C1"]["location"] is easier to understand than
 molecule["elements"]["coords"]["3d"][0]. In that regard, I completely
 agree with him.


If you're going to rely on positions within arrays, why not just do it the
simple way?

{ "smiles": ["CCO"],
  "2D": [1,1,2,2,3,3],
  "3D": [1,1,1,2,2,2,3,3,3]
}

The atoms are indexed left-to-right in the SMILES. That's it.  Everything
else keys to that.
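
A minimal sketch of how a reader might consume such a record. It assumes the coordinate lists are flat (x, y for "2D"; x, y, z for "3D") and ordered by atom position in the SMILES. The function name atom_coordinates is hypothetical, and no SMILES parsing is attempted here.

```python
import json

def atom_coordinates(record, dims=3):
    """Group a flat coordinate list into one tuple per atom.

    Atom i (0-based, left to right in the SMILES) owns the entries
    flat[i*dims : (i+1)*dims].
    """
    key = "3D" if dims == 3 else "2D"
    flat = record[key]
    if len(flat) % dims != 0:
        raise ValueError(f"{key} length {len(flat)} is not a multiple of {dims}")
    return [tuple(flat[i:i + dims]) for i in range(0, len(flat), dims)]

record = json.loads(
    '{"smiles": ["CCO"], "2D": [1,1,2,2,3,3], "3D": [1,1,1,2,2,2,3,3,3]}'
)
coords3d = atom_coordinates(record, dims=3)
coords2d = atom_coordinates(record, dims=2)
```

Everything hangs on the atom order of the SMILES string, so a writer that reorders atoms must reorder the coordinate lists in the same step.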

A SMILES contains exactly the same information as the atom/bond lists in a
much more compact form.  If you want to avoid the aromaticity problem, just
use Kekule form, which makes it virtually identical to any other connection
table format, but in about 10x to 20x fewer bytes.  SMILES are very easy to
parse, and there are dozens of parsers around.

If we're going to invent yet-another-file-format, can't we at least move
past 1970s atom/bond table technology?

Craig


Re: [Open Babel] Open Babel in the browser

2013-06-07 Thread Patrick Fuller
If you're going to rely on positions within arrays, why not just do it the
simple way?

{ "smiles": ["CCO"],
  "2D": [1,1,2,2,3,3],
  "3D": [1,1,1,2,2,2,3,3,3]
}

Smiles are a great representation of molecules (especially with
smarts/smirks regex), and, in cases where they can be used, I think they're
the best thing out there. However, they don't cover everything. I work with
metal-organic frameworks, which are large crystals that require more
extensibility than smiles offers (I still use _-separated smiles of the mof
constituents to hash the cif / json files, however). Also, my point in that
previous email is that referencing by index is bad, not good. It's less
direct than explicitly referencing items, which makes the format more
difficult to understand for new users + more prone to user error.
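
To make the index-versus-key point concrete, here is a rough sketch comparing the two layouts. The index-based layout is a guess at what the path molecule["elements"]["coords"]["3d"][0] implies; the "symbols" list and both helper functions are hypothetical.

```python
import copy

# Keyed layout: atoms are named, and bonds refer to those names.
keyed = {
    "atoms": {
        "C1": {"element": "C", "location": [0.230811, 0.380820, -0.610968]},
        "C2": {"element": "C", "location": [-0.230811, -0.380820, 0.610968]},
    },
    "bonds": [{"atoms": ["C1", "C2"], "order": 1}],
}

# Index-based layout (assumed): parallel lists tied together by position.
indexed = {
    "elements": {
        "symbols": ["C", "C"],
        "coords": {"3d": [[0.230811, 0.380820, -0.610968],
                          [-0.230811, -0.380820, 0.610968]]},
    },
    "bonds": [{"atoms": [0, 1], "order": 1}],
}

first_keyed = keyed["atoms"]["C1"]["location"]
first_indexed = indexed["elements"]["coords"]["3d"][0]

def drop_first_atom(mol):
    """Remove atom 0 from the parallel lists without touching the bonds."""
    out = copy.deepcopy(mol)
    out["elements"]["symbols"].pop(0)
    out["elements"]["coords"]["3d"].pop(0)
    return out

def dangling_bonds(mol):
    """Bonds whose indices no longer point at an existing atom."""
    n = len(mol["elements"]["symbols"])
    return [b for b in mol["bonds"] if any(i >= n for i in b["atoms"])]

broken = drop_first_atom(indexed)
```

After the deletion, index 0 silently refers to what used to be the second atom and index 1 points past the end of the list. A stale key such as "C1" would at least fail with a lookup error.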

A SMILES contains exactly the same information as the atom/bond lists in a
much more compact form. If you want to avoid the aromaticity problem, just
use Kekule form, which makes it virtually identical to any other connection
table format, but in about 10x to 20x fewer bytes. SMILES are very easy to
parse, and there are dozens of parsers around.

What I truly like about smiles is that it's human readable + hashable,
which I see as the real goal. The shorter length is just a corollary of
that. Prove me wrong, but I think people make too big a deal about size of
molecule formats. I just bought a 2 TB hard disk drive for $70. With MongoDB
+ its json serialization, I estimated that I can put 200 million
verbose json mof structures on that drive. I only have a few thousand, so I
have some room to spare.
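
As a back-of-the-envelope check of that estimate, assuming decimal terabytes and ignoring index and database overhead, the implied budget per structure works out as follows:

```python
DISK_BYTES = 2 * 10**12     # 2 TB, decimal, as drives are usually marketed
STRUCTURES = 200 * 10**6    # 200 million documents

# Average space available to each stored structure, in bytes.
bytes_per_structure = DISK_BYTES // STRUCTURES
```

That is roughly 10 kB per document, which is the figure a verbose json structure would have to stay under for the estimate to hold.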

This discussion has focussed on the syntax of JSON, but completely
overlooks the real problem with ALL chemical file formats: how do you
handle all of the cases where a simple connection-table (ball and stick)
doesn't capture reality? Things like aromaticity, tautomers,
organo-metallic bonds, boron-hydrogen cages, distributed bonds (ferrocenes
and the like) ... these are the problems.

The point of json (and xml) is that they are *extensible*- that's why json
has exploded in the developer community. If you need handles for
aromaticity and metallic bonding, just add new properties to the json/xml.
Because of the extensibility, adding new properties will not break any
existing code. That's the advantage over all of the older table formats,
which weren't built to be extensible. And you see the repercussions in
scientific code all the time. (I was recently handed a project where
someone used heavy metals in molfiles to encode rotational data. That kind
of hack is exactly what json/xml fixes.)
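
A small sketch of what "will not break any existing code" means in practice, assuming the reader only looks up the keys it knows. The "aromatic" and "lattice" properties are invented for illustration.

```python
import json

def read_bonds(text):
    """Return (atom_a, atom_b, order) tuples, reading only the known keys."""
    doc = json.loads(text)
    return [(b["atoms"][0], b["atoms"][1], b["order"]) for b in doc["bonds"]]

original = '{"bonds": [{"atoms": ["C1", "C2"], "order": 1}]}'

# Same molecule, with extra hypothetical properties added by a newer writer.
extended = (
    '{"bonds": [{"atoms": ["C1", "C2"], "order": 1, "aromatic": false}],'
    ' "lattice": {"a": 25.8}}'
)

bonds_old = read_bonds(original)
bonds_new = read_bonds(extended)
```

The old reader keeps working, but it also silently drops the new information, so this only helps if ignoring a property is safe.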

There's also the advantage that many languages don't need a third-party
library to parse a json file. Or, if you do, it's *heavily* supported (e.g.
Gson for Java).

Geoff - Outside of some fairly minor issues, xml translates easily to json.
Could the chemical xml specification just be translated to json?




Re: [Open Babel] Open Babel in the browser

2013-06-07 Thread Dimitri Maziuk
On 06/07/2013 12:25 PM, Patrick Fuller wrote:

 Geoff - Outside of some fairly minor issues, xml translates easily to json.
 Could the chemical xml specification just be translated to json?

If you gloss over things like #[P]CDATA, (elt+), (#CDATA|(foo,bar,baz)),
it's trivial. Except for attributes: if you have bad xml, like <atom
idx="1" id="C"/> instead of <atom><idx>1</idx><id>C</id></atom>, then it
isn't.
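
One way to see the attribute problem is with a naive converter that folds attributes and child elements into a single dict. This is only a sketch using Python's standard xml.etree.ElementTree; element_to_dict is a hypothetical helper, not part of any CML tooling.

```python
import xml.etree.ElementTree as ET

def element_to_dict(elem):
    """Merge attributes and text-only child elements into one flat dict."""
    out = dict(elem.attrib)
    for child in elem:
        out[child.tag] = child.text
    return out

as_attributes = element_to_dict(ET.fromstring('<atom idx="1" id="C"/>'))
as_children = element_to_dict(
    ET.fromstring('<atom><idx>1</idx><id>C</id></atom>')
)

# A name used as both attribute and child collides; the child wins here.
collision = element_to_dict(ET.fromstring('<atom id="a1"><id>C</id></atom>'))
```

Both spellings collapse to the same json object, so the conversion cannot be reversed without extra conventions, and a name used in both roles loses data.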

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [Open Babel] Open Babel in the browser

2013-06-07 Thread Patrick Fuller
I don't think we need to worry about the naming conventions of corner cases
just yet. Taking something basic, ethane in cml

<molecule>
 <atomArray>
  <atom id="a1" elementType="C" x3="0.229656" y3="0.720147" z3="-0.015085"/>
  <atom id="a2" elementType="C" x3="-0.229656" y3="-0.720147" z3="0.015085"/>
 </atomArray>
 <bondArray>
  <bond atomRefs2="a1 a2" order="1"/>
 </bondArray>
</molecule>

and my translation to json

{
  "atoms": {
    "a1": { "element type": "C", "x3": 0.229656, "y3": 0.720147, "z3": -0.015085 },
    "a2": { "element type": "C", "x3": -0.229656, "y3": -0.720147, "z3": 0.015085 }
  },
  "bonds": [
    { "atom refs": ["a1", "a2"], "order": 1 }
  ]
}

it could use some cleaning up, but that's the idea.
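
The translation above could be automated along these lines. This is a rough sketch for this one ethane fragment only: it assumes no XML namespace, numeric bond orders, and 3D coordinates on every atom, none of which holds for CML in general. The function name cml_to_dict is hypothetical.

```python
import json
import xml.etree.ElementTree as ET

CML = """<molecule>
 <atomArray>
  <atom id="a1" elementType="C" x3="0.229656" y3="0.720147" z3="-0.015085"/>
  <atom id="a2" elementType="C" x3="-0.229656" y3="-0.720147" z3="0.015085"/>
 </atomArray>
 <bondArray>
  <bond atomRefs2="a1 a2" order="1"/>
 </bondArray>
</molecule>"""

def cml_to_dict(cml_text):
    """Convert the atom and bond arrays of a simple CML molecule to a dict."""
    root = ET.fromstring(cml_text)
    atoms = {}
    for atom in root.iter("atom"):
        atoms[atom.get("id")] = {
            "element type": atom.get("elementType"),
            "x3": float(atom.get("x3")),
            "y3": float(atom.get("y3")),
            "z3": float(atom.get("z3")),
        }
    bonds = [
        {"atom refs": bond.get("atomRefs2").split(), "order": int(bond.get("order"))}
        for bond in root.iter("bond")
    ]
    return {"atoms": atoms, "bonds": bonds}

molecule = cml_to_dict(CML)
as_json = json.dumps(molecule, indent=2)
```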




Re: [Open Babel] Open Babel in the browser

2013-06-07 Thread Craig James
On Fri, Jun 7, 2013 at 10:25 AM, Patrick Fuller patrickful...@gmail.com wrote:

 A SMILES contains exactly the same information as the atom/bond lists in a
 much more compact form. If you want to avoid the aromaticity problem, just
 use Kekule form, which makes it virtually identical to any other connection
 table format, but in about 10x to 20x fewer bytes. SMILES are very easy to
 parse, and there are dozens of parsers around.

 What I truly like about smiles is that it's human readable + hashable,
 which I see as the real goal. The shorter length is just a corollary of
 that. Prove me wrong, but I think people make too big a deal about size of
 molecule formats. I just bought a 2 TB hard disk drive for $70. With MongoDB
 + its json serialization, I estimated that I can put 200 million
 verbose json mof structures on that drive. I only have a few thousand, so I
 have some room to spare.

I have a database of 10 million compounds. The SDF version, even
compressed, is difficult to transfer over the internet.  It's not about
disks, it's about file transfers and database performance.  It's not a
matter of a few bytes here or there (I agree that people worry about file
size too much).  It's about a factor of ten or twenty.  Connection-table
lists of atoms and bonds are just a dumb way to represent atoms and bonds.

  This discussion has focussed on the syntax of JSON, but completely
 overlooks the real problem with ALL chemical file formats: how do you
 handle all of the cases where a simple connection-table (ball and stick)
 doesn't capture reality? Things like aromaticity, tautomers,
 organo-metallic bonds, boron-hydrogen cages, distributed bonds (ferrocenes
 and the like) ... these are the problems.

 The point of json (and xml) is that they are *extensible*- that's why
 json has exploded in the developer community.

This isn't necessarily a good thing.  One of the biggest problems in
cheminformatics and molecular modeling is that people have altered existing
formats to suit their own needs ... and that has led to disaster.  There is
no such thing as the PDB format -- rather, you mostly have to know the
origin of a particular PDB file in order to interpret it.  Each project
effectively has its own PDB format.

JSON may be extensible, but that is useless unless there is a widely
recognized authority on the meaning of each extension, along with
open-source software that illustrates a practical application of the
standard.

Never forget the old joke, "The great thing about standards is that there
are so many to choose from!"  JSON essentially gives you a stronger rope
when you are in the process of hanging yourself.

 If you need handles for aromaticity and metallic bonding, just add new
 properties to the json/xml. Because of the extensibility, adding new
 properties will not break any existing code.

Then why have a standard at all? What is the use of new properties if
nobody knows what they mean?  What happens when five projects all introduce
their own syntax and semantics for representing aromaticity and metallic
bonding?  Chaos.

 That's the advantage over all of the older table formats, which weren't
 built to be extensible. And you see the repercussions in scientific code
 all the time.

The real problem had nothing to do with being built to be extensible, but
rather that the table format definitions were controlled by commercial
companies that had no interest in data exchange or in participation by the
chemistry community.

When I created the OpenSMILES.org web page, I more-or-less did it by
stealing the leadership from Daylight, the company that invented SMILES.  I
invited their participation but, while they didn't object to our project,
they also elected to stay out of it.  SMILES now has a future that's in the
hands of the community.  If the community decides to add features, we can
... and we'll all be able to agree on those features.

It might seem as if I'm trying to discourage JSON, but nothing could be
farther from the truth.  A modern, object-oriented, extensible and well
documented format is long overdue.  The CML project is one such (you might
want to look at it for ideas), but it never got traction.  Maybe JSON, with
its widespread use and readily-available software, is just the thing.

If you really want to make JSON a standard, the JSON syntax itself is a
trivial part of the problem. The real problem is establishing standards for
how each datatype is to be interpreted, followed by clear, published
standards for each datatype.  If you let people just add their own
datatypes on an as-you-please basis, you'll just have another Tower of
Babel ... and that's where the name OpenBabel came from in the first place.

Craig

Re: [Open Babel] Open Babel in the browser

2013-06-07 Thread Patrick Fuller
Wow, that was a very insightful email. Thank you for writing it.

Getting back to something actionable, what do you think about the idea of
just translating the CML standard to json? Outside of some nuances, XML and
JSON generally accomplish the same thing, so I would think that the
chemical XML standard would be easily translatable to chemical JSON.



Re: [Open Babel] Open Babel in the browser

2013-06-07 Thread Dimitri Maziuk
On 06/07/2013 01:45 PM, Craig James wrote:

 ...  The CML project is one such (you might
 want to look at it for ideas), but it never got traction.

XML is bad at tabular data. A table of x, y, z coordinates in properly
formatted xml is at least twice as many bytes (<x>123.456</x> uses as
many bytes for markup as for the value).
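
The byte counts are easy to check. The sketch below measures the one-element example, then compares one coordinate row written as element-per-value xml against a bare json array; the row values are borrowed from the ethane example earlier in the thread, and the compact json separators are my own choice.

```python
import json

value = "123.456"
element = f"<x>{value}</x>"

# Bytes spent on tags alone: total length minus the value itself.
markup_bytes = len(element.encode("utf-8")) - len(value.encode("utf-8"))

# One coordinate row: element-per-value xml versus a bare json array.
row = [0.229656, 0.720147, -0.015085]
xml_row = "".join(f"<{tag}>{v}</{tag}>" for tag, v in zip("xyz", row))
json_row = json.dumps(row, separators=(",", ":"))
```

Longer tag names only make the ratio worse for xml, since the tag is paid for twice per value.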

So projects like cml try to get around that by encoding values in
attributes -- about #1 on the "how not to design your dtd" list. The
problem is that it only scales to a few dozen rows. Once you get to 10^6
molecules of 10^3 atoms, it doesn't scale either.

So it doesn't get widely adopted. Instead others do one worse and create
xml where tables are stuffed into #CDATA. Which means a bunch of bytes
whose meaning and structure were known only to the postdoc who went back
to China three years ago.

  Maybe JSON, with
 its widespread use and readily-available software, is just the thing.

JSON comes with less markup overhead, that's one of the reasons it's
seeing more use. The downside is exactly as you said -- too many (read:
no) standards. The advantage of xml is the dtd: a valid xml document
tells you what the elements mean. All json tells you is array,
associative array, string, number, boolean.
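
A short illustration of that last point, using Python's standard json module. The parser can report the kind of each value and nothing more; any chemical meaning has to come from application code such as the hypothetical looks_like_atom check. KNOWN_ELEMENTS is a deliberately tiny stand-in, not a real periodic table.

```python
import json

def json_kind(value):
    """Name the JSON type of a decoded value; the parser knows nothing else."""
    if isinstance(value, bool):          # bool first: it is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "associative array"
    return "null"

atom = json.loads(
    '{"element": "C", "aromatic": false, "location": [0.23, 0.38, -0.61]}'
)
kinds = {key: json_kind(val) for key, val in atom.items()}

# Meaning is supplied by the application, not by the format.
KNOWN_ELEMENTS = {"H", "C", "N", "O"}

def looks_like_atom(obj):
    """Application-level check that an object is plausibly an atom record."""
    return (
        obj.get("element") in KNOWN_ELEMENTS
        and json_kind(obj.get("location")) == "array"
        and len(obj["location"]) in (2, 3)
    )
```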

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu





Re: [Open Babel] Open Babel in the browser

2013-06-07 Thread Matt Swain
Hi,

I think CML is definitely a useful starting point, however I think it would be 
a mistake to just translate everything across in a literal way. In particular, 
I think it's definitely worth thinking carefully about the different strengths 
of the XML and JSON formats (in both syntax and philosophy), and also about the 
reasons why CML has struggled a bit to gain traction.

The idea of keeping things as simple as possible is a major aspect of the JSON 
syntax and philosophy, so I think it's worth keeping that in mind as much as 
possible, especially considering the perceived complexity and verbosity of CML 
seems to put a lot of people off. In practice this might mean defining things 
implicitly where possible (like array indices as ids, and 2D/3D defined by the 
absence of any z coordinates, rather than having x2, y2, x3, y3, z3), and 
purposefully avoiding certain features in the core specification (multiple 
conformers, distributed bonds?).
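
As a sketch of what "defining things implicitly" could look like, assume atoms are stored as an array of objects, bonds refer to atoms by array index, and a molecule counts as 3D only when every atom has a z coordinate. The key names and both helper functions are hypothetical.

```python
def dimensionality(molecule):
    """Return 3 if every atom carries a z coordinate, otherwise 2."""
    return 3 if all("z" in atom for atom in molecule["atoms"]) else 2

def bonded_elements(molecule):
    """Resolve each bond's atom indices to element symbols."""
    atoms = molecule["atoms"]
    return [
        tuple(atoms[i]["element"] for i in bond["atoms"])
        for bond in molecule["bonds"]
    ]

ethane_3d = {
    "atoms": [
        {"element": "C", "x": 0.229656, "y": 0.720147, "z": -0.015085},
        {"element": "C", "x": -0.229656, "y": -0.720147, "z": 0.015085},
    ],
    "bonds": [{"atoms": [0, 1], "order": 1}],
}
ethane_2d = {
    "atoms": [
        {"element": "C", "x": 0.0, "y": 0.0},
        {"element": "C", "x": 1.5, "y": 0.0},
    ],
    "bonds": [{"atoms": [0, 1], "order": 1}],
}
```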

Part of the reason behind the current increased interest in JSON is that it 
plays so nicely with many modern technologies that are becoming more and more 
widespread - i.e. web applications, REST APIs, document-oriented NoSQL 
databases etc. I think these use-cases definitely need to be taken into account 
when designing the format. For example, shorter key names are helpful in NoSQL 
databases and when embedding data in web pages, but there is a tradeoff there 
with readability. It's also worth thinking about how people might want to query 
and index these documents in NoSQL databases - for example having atoms as an 
array of objects allows $elemMatch style queries in MongoDB, which could be 
useful.
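
To show why the array-of-objects layout matters for querying, here is a sketch of the two query shapes and a pure-Python emulation of their matching rules for simple equality conditions. Only the $elemMatch operator name comes from MongoDB; the "charge" field, the sample document, and both helper functions are invented for illustration.

```python
# Query documents as they might be passed to a MongoDB find() call.
same_atom_query = {"atoms": {"$elemMatch": {"element": "Zn", "charge": 2}}}
any_atom_query = {"atoms.element": "Zn", "atoms.charge": 2}

def elem_match(doc, field, conditions):
    """True if one single array element satisfies every condition."""
    return any(
        all(item.get(k) == v for k, v in conditions.items())
        for item in doc.get(field, [])
    )

def each_condition_somewhere(doc, field, conditions):
    """True if each condition is met by some element, not necessarily the same."""
    return all(
        any(item.get(k) == v for item in doc.get(field, []))
        for k, v in conditions.items()
    )

doc = {"atoms": [{"element": "Zn", "charge": 0}, {"element": "Cu", "charge": 2}]}
wanted = {"element": "Zn", "charge": 2}

strict = elem_match(doc, "atoms", wanted)
loose = each_condition_somewhere(doc, "atoms", wanted)
```

The loose form matches this document even though no single atom is a doubly charged zinc, which is the case $elemMatch is meant to rule out.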

As others have said, the downside to JSON's simplicity is that extensibility is 
not standardised like it is in CML - it is essentially a free-for-all, which as 
Craig points out will likely lead to chaos in the long run. There are projects 
like JSON-LD (http://json-ld.org) which would allow proper decentralised 
extensibility, but it's not widely supported and sacrifices simplicity, meaning 
you lose some of JSON's biggest strengths over XML anyway. Maybe something like 
namespaced keys (org.openbabel.fp2: ...) would be simpler, along with some 
kind of ongoing community project to define equivalent keys in a 
machine-readable dictionary. Or maybe we just accept an inevitable free-for-all 
and just aim to define a sensible common core.
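
A rough sketch of the namespaced-key idea, assuming keys without a dot belong to the common core and that a community-maintained dictionary maps equivalent extension keys to one shared meaning. Only "org.openbabel.fp2" comes from this thread; every other name below is invented.

```python
# Hypothetical machine-readable dictionary of equivalent extension keys.
EQUIVALENTS = {
    "org.openbabel.fp2": "path-fingerprint",
    "org.example.pathfp": "path-fingerprint",
}

def split_document(doc):
    """Separate core keys, recognised extensions and unknown extensions."""
    core, known, unknown = {}, {}, {}
    for key, value in doc.items():
        if "." not in key:
            core[key] = value                  # part of the common core
        elif key in EQUIVALENTS:
            known[EQUIVALENTS[key]] = value    # mapped to its shared meaning
        else:
            unknown[key] = value               # kept, but not interpreted
    return core, known, unknown

doc = {
    "atoms": [],
    "bonds": [],
    "org.openbabel.fp2": "0a1b",
    "com.vendor.secret": 42,
}
core, known, unknown = split_document(doc)
```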

Basically, I just think it's worth being cautious not to just repeat the work 
people have done with CML, and it would be great if we could create something 
that really plays to JSON's strengths.

Matt
