It’s been so long since I wrote this that I’ve just had a quick look to refresh
my memory.

It looks to me like my code always assumes the element to be provided as a
string containing the element symbol. But you are right, the PubChem REST
API is clearly now returning the element as an integer atomic number, so
the code actually currently fails completely. The fact that no one noticed
shows how widely used this format is :)

I checked some old files that I have and it definitely used to be provided
as a string (all lowercase element symbol, with some additional special
cases). I doubt anyone else will have old files like this, so it’s probably
safe to switch completely to integers and remove the string code? The
writer will need updating also, to write integers instead of strings.
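For illustration, a minimal Python sketch of what a reader could do if it wanted to keep accepting old files while treating integers as the normal case. This is not the Open Babel code: the function name, the `accept_legacy` flag and the abbreviated lookup table are all hypothetical, and the special-case names are borrowed from the ASN spec on the assumption that the old string form used the same ones.

```python
# Hypothetical, abbreviated table of legacy lowercase symbols.
# A real table would cover every element; the special cases (a, d, r, lp)
# are assumed to match the names used in the PubChem ASN spec.
LEGACY_SYMBOLS = {
    "h": 1, "c": 6, "n": 7, "o": 8,
    "a": 255, "d": 254, "r": 253, "lp": 252,
}


def element_to_atomic_number(value, accept_legacy=True):
    """Return the integer code for an element value read from JSON.

    Integers are returned unchanged. Strings are looked up in the legacy
    table only when accept_legacy is True.
    """
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool):
        raise TypeError("element must be an integer or a string")
    if isinstance(value, int):
        return value
    if accept_legacy and isinstance(value, str):
        try:
            return LEGACY_SYMBOLS[value.lower()]
        except KeyError:
            raise ValueError("unknown element symbol: %r" % value)
    raise TypeError("unsupported element value: %r" % (value,))


# Mixed input: current integer form plus one legacy string.
codes = [element_to_atomic_number(v) for v in [6, 8, "lp"]]
```

Dropping the string code entirely would amount to calling this with `accept_legacy=False`, so that any string raises an error instead of being silently converted.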

By the way, I suspect the PubChem ASN.1 spec is the closest thing to a spec
for the JSON format:
ftp://ftp.ncbi.nih.gov//pubchem/specifications/pubchem.asn
Here’s the element section:

PC-Element::= INTEGER {
    -- Illegal Atom Numbers that may be Interpreted to be something else
    a  (255),                                    -- Unspecified Atom (Asterick)
    d  (254),                                    -- Dummy Atom
    r  (253),                                    -- Rgroup Label
    lp (252),                                    -- Lone Pair

    -- Elements
    h  (1), he (2), li (3), be (4), b  (5),
    c  (6), n  (7), o  (8), f  (9), ne(10),
    na(11), mg(12), al(13), si(14), p (15),
    s (16), cl(17), ar(18), k (19), ca(20),
    sc(21), ti(22), v (23), cr(24), mn(25),
    fe(26), co(27), ni(28), cu(29), zn(30),
    ga(31), ge(32), as(33), se(34), br(35),
    kr(36), rb(37), sr(38), y (39), zr(40),
    nb(41), mo(42), tc(43), ru(44), rh(45),
    pd(46), ag(47), cd(48), in(49), sn(50),
    sb(51), te(52), i (53), xe(54), cs(55),
    ba(56), la(57), ce(58), pr(59), nd(60),
    pm(61), sm(62), eu(63), gd(64), tb(65),
    dy(66), ho(67), er(68), tm(69), yb(70),
    lu(71), hf(72), ta(73), w (74), re(75),
    os(76), ir(77), pt(78), au(79), hg(80),
    tl(81), pb(82), bi(83), po(84), at(85),
    rn(86), fr(87), ra(88), ac(89), th(90),
    pa(91), u(92),  np(93), pu(94), am(95),
    cm(96), bk(97), cf(98), es(99), fm(100),
    md(101), no(102), lr(103), rf(104), db(105),
    sg(106), bh(107), hs(108), mt(109), ds(110),
    rg(111)
}
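As a rough sketch, the name/number pairs in a definition like the one above can be pulled into a lookup table with a small regular expression, rather than typing the table by hand. The function name and the shortened excerpt below are illustrative assumptions; the full spec text would be handled the same way.

```python
import re

# Shortened excerpt of the PC-Element definition, for illustration only.
PC_ELEMENT_EXCERPT = """
PC-Element::= INTEGER {
    a  (255),   -- Unspecified Atom
    d  (254),   -- Dummy Atom
    r  (253),   -- Rgroup Label
    lp (252),   -- Lone Pair
    h  (1), he (2), li (3), be (4), b  (5),
    c  (6), n  (7), o  (8), f  (9), ne(10)
}
"""


def parse_pc_element(asn_text):
    """Return {lowercase name: integer code} for each `name (number)` pair."""
    table = {}
    for line in asn_text.splitlines():
        # Drop ASN.1 line comments, which start with "--".
        code = line.split("--", 1)[0]
        for name, number in re.findall(r"\b([a-z]+)\s*\((\d+)\)", code):
            table[name] = int(number)
    return table


symbol_to_number = parse_pc_element(PC_ELEMENT_EXCERPT)
# Inverse table, as a writer emitting symbols from integers might need.
number_to_symbol = {num: name for name, num in symbol_to_number.items()}
```

Stripping the comments before matching keeps parenthesised comment text from being mistaken for an entry.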


Matt


On 29 June 2017 at 08:41:31, Noel O'Boyle (baoille...@gmail.com) wrote:

Hi Matt,

I'm in the middle of
https://github.com/openbabel/enhancement-proposals/pull/4 and have
come to the JSON formats.

When parsing the PubChem JSON, you first check whether the element is an
integer and then later whether it's a string. I think it's always an integer
and plan to remove the string code - is this okay? I assume that this is a
copy+paste of logic from the ChemDoodle JSON parsing where
(presumably) this can occur.

- Noel
_______________________________________________
OpenBabel-Devel mailing list
OpenBabel-Devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openbabel-devel
