Hi Greg:

James wrote:
>> So I guess the simple question is - should polymers, etc be handled by the
>> parser (maybe if not fully, just partially - eg by deleting the * atoms if
>> the S-Group data are found)?

Greg wrote:
> I'm reluctant to do this since I don't understand the semantics of
> Sgroups well enough to be able to tell if this modification only makes
> sense in this one case or if it's general. In the cases of polymers I
> would tend to say that the correct thing to do is to reject the
> molecule completely since the RDKit is incapable of correctly storing
> what the user intended with the mol block.

> I will try to find the time to grok the CTFile documentation for
> Sgroups, but I would be happy to get input on this from others.


I found some time to look into this a bit more myself, and I am inclined to 
agree that the best thing to do would be to reject polymers.  From my reading 
of the CTfile spec and the (extremely useful) Gushurst paper (J. Chem. Inf. 
Comput. Sci. 1991, 31, 447-454), I would suggest rejecting any molecule with 
"polymer" or "components, mixtures, and formulations" Sgroup data in the 
molblock, and ignoring or handling the "drawing and display shortcuts" Sgroup 
data (provided there are no Sgroup data in the previous two categories).

I have copied below my understanding of the categories:

Sgroup types for "polymers":

SRU - structural repeating unit (for structure-based representation)
MON - monomer type (for source-based representation)
COP - copolymer
CRO - cross-link across two polymers
GRA - graft (e.g. terminally-attached) polymer2 on repeat unit of polymer1
MOD - for representing incomplete(?) modifications
MER - used when monomer repeat is 1, i.e. alternating copolymers
ANY - (query) for posing more general polymer search queries


Sgroup types for "components, mixtures, and formulations":

COM - components (members of mixtures/formulations)
MIX - mixtures (order is not important)
FOR - formulations (order is important)


Sgroup types for "drawing and display shortcuts":

SUP - superatoms (can be contracted/expanded for representation purposes)
MUL - multiple groups (like a repeating superatom, but can only have 0 or 2 
crossing bonds)
GEN - generic bracketing (does not affect structure)


Rejecting based on the first two categories should be straightforward(?), and 
equally applicable to V2000 and V3000.  Ignoring the SUP and MUL types will 
(I think...) only cause issues in 2D layout, so 'handling' them could perhaps 
mean forcing the expansion of these groups, then discarding the Sgroup data 
and regenerating coordinates?
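
To make the suggestion concrete, here is a rough sketch in plain Python (no 
RDKit calls) of how the check might look.  It only scans the molblock text for 
Sgroup type declarations ("M  STY" lines in V2000, the SGROUP block in V3000) 
and applies the three categories listed above.  The names sgroup_types and 
classify are hypothetical and not part of any existing API, and passing through 
types that are not in my lists (e.g. DAT) is purely my assumption.

```python
# Sketch only: classify a molblock by the Sgroup types it declares.
# All names here are hypothetical; this is not RDKit code.

POLYMER = {"SRU", "MON", "COP", "CRO", "GRA", "MOD", "MER", "ANY"}
MIXTURE = {"COM", "MIX", "FOR"}
DISPLAY = {"SUP", "MUL", "GEN"}


def sgroup_types(molblock):
    """Return the Sgroup type codes declared in a V2000 or V3000 molblock."""
    types = []
    in_v3000_sgroups = False
    continued = False  # True if the previous V3000 line ended with "-"
    for line in molblock.splitlines():
        if line.startswith("M  STY"):
            # V2000: "M  STYnn8 sss ttt ..." is a count followed by
            # (Sgroup index, type) pairs.
            fields = line[6:].split()
            types.extend(fields[2::2])
        elif line.startswith("M  V30 "):
            body = line[7:].strip()
            if body == "BEGIN SGROUP":
                in_v3000_sgroups = True
            elif body == "END SGROUP":
                in_v3000_sgroups = False
            elif in_v3000_sgroups:
                fields = body.split()
                # Entries look like "index type extindex ...".  Skip
                # continuation lines and the optional DEFAULT line.
                if not continued and len(fields) >= 2 and fields[0].isdigit():
                    types.append(fields[1])
                continued = body.endswith("-")
    return types


def classify(molblock):
    """Return 'reject', 'handle' or 'accept' for a molblock."""
    found = set(sgroup_types(molblock))
    if found & (POLYMER | MIXTURE):
        return "reject"   # polymer or mixture/formulation data present
    if found & DISPLAY:
        return "handle"   # only display shortcuts: expand or ignore
    return "accept"       # no Sgroups, or types outside the lists (assumption)
```

A parser could call classify() on the raw molblock before building the 
molecule and refuse anything that comes back as "reject".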

Kind regards

James


_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss