Wow. I love seqcis! Rule 3 was trivial.

Robert M. Hanson
St. Olaf College Chemistry 

-----Original Message-----
From: "Robert Hanson" <hans...@stolaf.edu>
Sent: 4/11/2017 7:30 AM
To: "BlueObelisk-Discuss" <blueobelisk-discuss@lists.sourceforge.net>
Subject: Fwd: [BlueObelisk-discuss] Fwd: Cahn-Ingold-Prelog rules into Jmol

[sorry - forgot that this list requires "reply-all"]

---------- Forwarded message ----------
From: Robert Hanson <hans...@stolaf.edu>
Date: Tue, Apr 11, 2017 at 7:29 AM
Subject: Re: [BlueObelisk-discuss] Fwd: Cahn-Ingold-Prelog rules into Jmol
To: John Mayfield <john.wilkinson...@gmail.com>







On Tue, Apr 11, 2017 at 2:37 AM, John Mayfield <john.wilkinson...@gmail.com> 
wrote:



On 11 April 2017 at 04:37, Robert Hanson <hans...@stolaf.edu> wrote:

2) What did you get for the other test case? That one checks that you have the 
ranking by atomic mass right.
CC[C@@](CO)([H])[14CH2]C


R. 
 
There you go, that should also be S. The ordering is *CO, *[14CH2]C, *CC, *[H]; see
https://nextmovesoftware.com/blog/2015/01/21/r-or-s-lets-vote/




John, what basis in the IUPAC rules leads you to this reading? It suggests that 
atoms in the nth sphere cannot be ranked until atoms in the (n+1)th sphere are 
checked after application of Rule 1, even if they could be distinguished by 
Rule 2. Are you suggesting that after each rule is checked (Rule 1a, Rule 1b, 
Rule 2 -- or is it Rule 1(a and b), Rule 2,...?) one must expand to the next 
sphere before making a decision? That seems to me (a) unsupportable by the 
IUPAC rules and (b) just asking for extremely complex code and a whole lot of 
unnecessary checks.


My understanding is that all rules are applied exhaustively within a sphere 
first, and then the process is repeated at the next sphere. What I read is 
this: 

The ranking of each atom in the nth sphere depends in the first place on the
ranking of atoms of the same branch in (n − 1)th sphere, and then by the
application of the Sequence Rules to it; the smaller the number, the higher the
relative ranking. (Ranking Rule 2).
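Read literally, the quoted Ranking Rule 2 describes a two-part sort key for the atoms of sphere n: first the rank of the parent atom in the (n − 1)th sphere, then whatever the Sequence Rules say about the atom itself. The following is a minimal Python sketch under that reading. It assumes that atomic number alone stands in for "the Sequence Rules". The names (`rank_sphere`, `prev_rank`) are hypothetical and come from no existing toolkit.

```python
def rank_sphere(prev_rank, atoms):
    """Rank the atoms of sphere n.

    prev_rank -- dict: parent atom id -> rank in sphere n-1 (1 = highest)
    atoms     -- list of (atom_id, parent_id, atomic_number) for sphere n
    Returns a dict atom_id -> rank (1 = highest); tied atoms share a rank.
    """
    def key(atom):
        _, parent, z = atom
        # The parent's rank decides first; atomic number (a stand-in for the
        # Sequence Rules) only orders atoms whose parents are equally ranked.
        return (prev_rank[parent], -z)

    ranks, last_key, current = {}, None, 0
    for position, atom in enumerate(sorted(atoms, key=key), start=1):
        if key(atom) != last_key:
            current, last_key = position, key(atom)
        ranks[atom[0]] = current
    return ranks


# Cl (Z = 17) hangs off the lower-ranked parent "b", so it ranks below both
# atoms attached to "a", even though it has the highest atomic number.
print(rank_sphere({"a": 1, "b": 2},
                  [("a_H", "a", 1), ("a_O", "a", 8), ("b_Cl", "b", 17)]))
# {'a_O': 1, 'a_H': 2, 'b_Cl': 3}
```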


This is certainly my understanding from all the reading I have done. You have 
three atoms connected to an atom. You rank those three atoms based on the 
rules. Atoms that are still tied are carried to the next sphere, but not until 
that process is complete. 

 

To me that is pretty clear: We apply all rules to rank all atoms in a single 
sphere. Nothing here says, "Atoms in a sphere are compared pairwise, and if 
they are identical, then the comparison of this pair is continued to the next 
sphere. Once this depth-first relative ranking is determined, the procedure is 
repeated with all pairs of the sphere." I can certainly see where that reading 
could drive one mad.
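The disagreement can be made concrete with the [14CH2] test case. The Python sketch below ranks the four branches of CC[C@@](CO)([H])[14CH2]C under both readings discussed above. It is a deliberate simplification, not a CIP implementation:

- atoms within a sphere are pooled and sorted rather than arranged as a full hierarchical digraph;
- unlabeled carbon is assumed to have mass number 12;
- only Rule 1a (atomic number) and Rule 2 (mass) are modeled;
- all names are hypothetical.

```python
# Branches from the stereocentre of CC[C@@](CO)([H])[14CH2]C, written as
# spheres of (atomic_number, mass_number) pairs. Simplification: atoms in a
# sphere are pooled, not grouped under their parent as a digraph would be.
H, C, C14, O = (1, 1), (6, 12), (6, 14), (8, 16)

BRANCHES = {
    "CC":       [[C], [C, H, H], [H, H, H]],
    "CO":       [[C], [O, H, H], [H]],
    "[H]":      [[H]],
    "[14CH2]C": [[C14], [C, H, H], [H, H, H]],
}


def spheres(branch):
    """Sort each sphere highest-first so spheres compare element by element."""
    return [sorted(sphere, reverse=True) for sphere in branch]


def sphere_first_key(branch):
    # Reading A: (Z, mass) pairs are compared together at each sphere, so
    # Rule 2 can decide at sphere n before Rule 1a looks at sphere n+1.
    return spheres(branch)


def rule_first_key(branch):
    # Reading B: Rule 1a (Z) is applied through every sphere first; Rule 2
    # (mass) is consulted only for branches still tied after that.
    s = spheres(branch)
    z = [[atom[0] for atom in sphere] for sphere in s]
    mass = [[atom[1] for atom in sphere] for sphere in s]
    return (z, mass)


def rank(key):
    """Branch names from highest to lowest priority under the given key."""
    return sorted(BRANCHES, key=lambda name: key(BRANCHES[name]), reverse=True)


print(rank(sphere_first_key))  # ['[14CH2]C', 'CO', 'CC', '[H]']
print(rank(rule_first_key))    # ['CO', '[14CH2]C', 'CC', '[H]']
```

The rule-first key reproduces the ordering John gives (*CO, *[14CH2]C, *CC, *[H]). The sphere-first key lets the 14C decide at sphere 1, before the oxygen in sphere 2 is ever examined.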





 
Q: Is there software that does a nice job of producing digraphs from SMILES?


I think I added a utility for that in Centres. However, I've barely looked at 
the code in 5 years, though I'm planning to brush it off and clean it up now. 
BTW, if you look closely, Centres is abstract and wraps around existing 
toolkits. I only wrapped it around CDK, but in theory you could do the same 
with Jmol.
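For readers new to the term: a CIP hierarchical digraph is the molecule unrolled into a tree rooted at the stereocentre, in which each ring closure ends in a duplicate of an atom already on the path. The following is a minimal Python sketch. It assumes the molecule is already available as an atom list plus a bond list, with any hydrogens that should appear listed explicitly. It is not the Centres utility: it skips SMILES parsing and omits the duplicate atoms that multiple bonds also require. `Node`, `build_digraph` and `show` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    atom: int                 # index of the atom in the molecule
    symbol: str
    duplicate: bool = False   # True for ring-closure duplicates (no children)
    children: list = field(default_factory=list)


def build_digraph(symbols, bonds, root, max_depth=6):
    """Unroll the molecular graph into a tree rooted at `root`."""
    adj = {i: [] for i in range(len(symbols))}
    for a, b in bonds:
        adj[a].append(b)
        adj[b].append(a)

    def expand(atom, parent, path, depth):
        node = Node(atom, symbols[atom])
        if depth == max_depth:
            return node
        for nbr in adj[atom]:
            if nbr == parent:
                continue  # never walk straight back along the incoming bond
            if nbr in path:
                # Ring closure: stop with a duplicate of the atom on the path.
                node.children.append(Node(nbr, symbols[nbr], duplicate=True))
            else:
                node.children.append(expand(nbr, atom, path | {nbr}, depth + 1))
        return node

    return expand(root, None, frozenset([root]), 0)


def show(node, depth=0):
    """Print the tree, one atom per line, indented by sphere."""
    tag = " (dup)" if node.duplicate else ""
    print("  " * depth + f"{node.symbol}{node.atom}{tag}")
    for child in node.children:
        show(child, depth + 1)


# Cyclopropane carbons only (hydrogens left out to keep the output short).
show(build_digraph(["C", "C", "C"], [(0, 1), (1, 2), (2, 0)], root=0))
```

Every path from the root ends either at a terminal atom or at a duplicate, so branches can be compared sphere by sphere even when the molecule contains rings.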


Q: Do these all implement Rule 1b and the rest of the rules? Have they been 
validated in some systematic, common way, so we know they don't have any bugs?


I don't think so. IIRC 1b was introduced to fix this case: 
O[C@H](C(CCC1CC1)(CCC1CC1)CCC1CC1)C12CCC(CC1)CC2. If you run that molecule 
through an implementation, you can tell whether or not it implements that rule. 
Without Rule 1b it should not be possible to label it. In Centres you can change 
the ranking rules in CDKPerceptor.java.
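For reference, here is a paraphrase of Rule 1b in the 2013 recommendations. Of two duplicate atom nodes tied under Rule 1a, the one whose corresponding non-duplicated atom is the root or lies closer to it ranks higher. The following is a minimal Python sketch of just that comparison. `DigraphAtom`, `root_distance` and `compare_rule_1b` are hypothetical names. As a simplification, any pair that is not two duplicates is left tied.

```python
from dataclasses import dataclass
from functools import cmp_to_key
from typing import Optional


@dataclass(frozen=True)
class DigraphAtom:
    label: str
    z: int                                # atomic number (Rule 1a)
    root_distance: Optional[int] = None   # duplicates only: distance from the
                                          # root to the atom this one duplicates


def compare_rule_1b(a, b):
    """>0 if a outranks b, <0 if b outranks a, 0 if Rule 1b cannot decide.

    Intended only for atoms already tied under Rule 1a.
    """
    if a.root_distance is None or b.root_distance is None:
        return 0  # simplification: only duplicate-vs-duplicate is decided here
    return b.root_distance - a.root_distance  # closer to the root wins


dups = [DigraphAtom("d2", 6, 2), DigraphAtom("root", 6, 0), DigraphAtom("d1", 6, 1)]
ranked = sorted(dups, key=cmp_to_key(compare_rule_1b), reverse=True)
print([d.root_distance for d in ranked])  # [0, 1, 2]
```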


Yes, that's one of the models Mikko sent me. I used it for checking Rule 1b.
 



Q: Doesn't this argue against the "Why bother doing this -- it's been done 
seven times already" argument? Which one is IUPAC-2013-standard?


It wasn't me who said that; I'd only say don't do it because the implementation 
will drive you mad :-). A "blessed" version would let everyone check against 
it, as your original question asks: if you want to test yours, it would be much 
simpler just to point to a complete one and leave it there. However, from my 
previous testing I don't know whether a complete one exists anywhere (maybe the 
LHASA one: http://pubs.acs.org/doi/abs/10.1021/ci00019a004, though of course it 
may no longer exist; I'll ask them).


Supposedly ACD/Labs has a compliant CIP-determining algorithm. 
http://bulletin.acscinf.org/PDFs/247nm44.pdf

Is ACD/Labs represented on this list?




I guess this would matter if you had 1,000,000 compounds to check; the 100-line 
algorithm (Rules 1 and 2) I wrote seems quite straightforward and suitable for 
my purposes. It's hard to believe any molecule of interest would push the limits 
of such an approach. 


CHEBI:51439. Whether that's of interest or not is, of course, subjective.

That's a nice test model.
 

Bob




-- 

Robert M. Hanson
Larson-Anderson Professor of Chemistry
St. Olaf College
Northfield, MN
http://www.stolaf.edu/people/hansonr


If nature does not answer first what we want,
it is better to take what answer we get. 

-- Josiah Willard Gibbs, Lecture XXX, Monday, February 5, 1900
_______________________________________________
Blueobelisk-discuss mailing list
Blueobelisk-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/blueobelisk-discuss
