Has anyone compiled a set of Unicode code points (characters) that are
essential for chemistry, specifically in Anglophone countries, and aimed
at machine processing?

[Note: I am not including the glyphs in this mail in case of corruption.]

For example, chemistry uses over half of the printing characters in the
range 32-127, probably most of the Greek characters (they can be used for
locants), some of the ISO Latin-1 characters, plus-minus (for racemates),
and middot (e.g. Et2O.BF3) [http://en.wikipedia.org/wiki/Interpunct].
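To make the idea concrete, here is a minimal sketch of what such a set
might look like in code. The exact ranges are assumptions drawn only from
the examples above (printable ASCII, lower-case Greek, plus-minus, middot),
and the function names are hypothetical; it is not a proposed standard.

```python
import unicodedata


def candidate_chemistry_chars():
    """Build an illustrative (assumed, incomplete) set of allowed characters."""
    chars = {chr(c) for c in range(0x20, 0x7F)}        # printable ASCII 32..126 (127 is DEL)
    chars |= {chr(c) for c in range(0x03B1, 0x03CA)}   # Greek small alpha..omega
    chars |= {"\u00B1", "\u00B7"}                      # PLUS-MINUS SIGN, MIDDLE DOT
    return chars


def unexpected_chars(text, allowed):
    """Return the sorted distinct characters of text that are not in allowed."""
    return sorted({ch for ch in text if ch not in allowed})


def describe(chars):
    """Pair each character with its code point and Unicode name for reporting."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")) for ch in chars]
```

A tool could run unexpected_chars() over incoming names and report the
result of describe() so that a human can decide whether each stray
character should be added to the set or normalized away.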

I would exclude personal names (e.g. Hueckel) and units (e.g. Angstrom) as
they are used elsewhere.

Where possible it would be valuable to have a normalized value. Thus IMO
machine-processable chemical names should use '-' (char #45, hyphen-minus,
http://en.wikipedia.org/wiki/Hyphen) rather than a true minus or dashes.
Similarly, the minus sign should also be represented by this character.
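As a rough illustration of this normalization, the sketch below folds a
few dash-like code points onto hyphen-minus. Which code points belong in
the table is my assumption, and the names DASH_LIKE and normalize_dashes
are hypothetical. Escapes are used instead of glyphs, for the same reason
as above.

```python
# Assumed list of dash-like characters to fold onto hyphen-minus (U+002D).
DASH_LIKE = {
    "\u2010": "-",  # HYPHEN
    "\u2011": "-",  # NON-BREAKING HYPHEN
    "\u2012": "-",  # FIGURE DASH
    "\u2013": "-",  # EN DASH
    "\u2014": "-",  # EM DASH
    "\u2212": "-",  # MINUS SIGN
}

# Translation table built once and reused for every call.
_DASH_TABLE = str.maketrans(DASH_LIKE)


def normalize_dashes(name: str) -> str:
    """Replace every dash-like character in name with hyphen-minus."""
    return name.translate(_DASH_TABLE)
```

For instance, normalize_dashes("2\u2010chloropropane") gives
"2-chloropropane", and text that already uses only char #45 is returned
unchanged.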

-- 
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069
_______________________________________________
Blueobelisk-discuss mailing list
Blueobelisk-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/blueobelisk-discuss
