James,

On Wed, Oct 19, 2011 at 1:58 PM, James Davidson <j.david...@vernalis.com> wrote:
>
> I just wanted to raise an observation about the behaviour of the molblock
> parser.  I was running some SMARTS-based substructure queries in KNIME, and
> happened to be looking for aromatic N-oxides - the query was just "nO" -
> which should maybe be the answer as well!  : )

:-)

> Anyway, I was actually searching DrugBank (via the SDF -
> http://www.drugbank.ca/system/downloads/current/structures/small_molecule.sdf.zip)
> and found Heparin was a hit for my query - which I thought was a bit funny
> as there are no aromatic nitrogens.  It seems, however, that the match is
> due to the * atoms in the molblock (see below) that are representing the
> polymer repeat points (leading to *-O, which is matching n-O).

Yes, a * in a mol block, without additional information, is a query
atom that matches anything.
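To illustrate why the SMARTS "nO" hits "*-O", here is a toy model. It is a hypothetical simplification and not RDKit's matcher. It assumes that a "*" atom in the target behaves as "match anything," and that every other atom must agree on element symbol and aromaticity. The names Atom, atom_matches and has_bonded_pair are illustrative only.

```python
# Toy model (not RDKit's matcher) of why the SMARTS "nO" hits "*-O":
# a "*" atom read from a mol block is treated as "match anything".
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    symbol: str            # element symbol, or "*" for a wildcard atom
    aromatic: bool = False


def atom_matches(pattern: Atom, target: Atom) -> bool:
    """Hypothetical rule: a '*' target matches any pattern atom;
    otherwise symbol and aromaticity must both agree."""
    if target.symbol == "*":
        return True
    return pattern.symbol == target.symbol and pattern.aromatic == target.aromatic


def has_bonded_pair(pattern: tuple, atoms: list, bonds: list) -> bool:
    """Return True if some bond (i, j) matches the two-atom pattern in either direction."""
    first, second = pattern
    for i, j in bonds:
        for a, b in ((i, j), (j, i)):
            if atom_matches(first, atoms[a]) and atom_matches(second, atoms[b]):
                return True
    return False


# "nO": an aromatic nitrogen bonded to an aliphatic oxygen.
n_oxide = (Atom("N", aromatic=True), Atom("O"))

# A polymer end point: "*" bonded to an oxygen, with no aromatic N anywhere.
polymer_fragment = [Atom("*"), Atom("O"), Atom("C")]
polymer_bonds = [(0, 1), (1, 2)]

# The same fragment with the "*" replaced by an ordinary carbon.
plain_fragment = [Atom("C"), Atom("O"), Atom("C")]

print(has_bonded_pair(n_oxide, polymer_fragment, polymer_bonds))  # True
print(has_bonded_pair(n_oxide, plain_fragment, polymer_bonds))    # False
```

Once the wildcard is swapped for a real element, the spurious hit goes away. This is the effect James describes for Heparin.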

> As I
> understand it, the rest of the info about the polymer is stored as S-Group
> data - and I am presuming that RDKit is not currently interpreting this(?)

Correct. The RDKit does not do anything with Sgroup data from mol blocks.

> So I guess the simple question is - should polymers, etc. be handled by the
> parser (maybe if not fully, just partially - e.g. by deleting the * atoms if
> the S-Group data are found)?

I'm reluctant to do this since I don't understand the semantics of
Sgroups well enough to tell whether this modification only makes
sense in this one case or whether it applies in general. In the case of
polymers I would tend to say that the correct thing to do is to reject the
molecule completely, since the RDKit is incapable of correctly storing
what the user intended with the mol block.
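The "reject it" option could look something like the following minimal sketch in plain Python, not RDKit code. It makes several assumptions. The input is a V2000 mol block. Sgroup types are read from "M  STY" lines. The set of types counted as polymer-related is an assumption taken from the CTFile Sgroup type list. The helper names (sgroup_types, star_atom_indices, should_reject, make_molblock) are hypothetical, and make_molblock only builds a trimmed demo block that lacks the SAL/SBL/SMT lines a real SRU Sgroup would carry.

```python
# Sketch: flag a V2000 mol block for rejection when it has "*" atoms
# AND declares a polymer Sgroup, instead of treating "*" as a wildcard.

# Assumed set of polymer-related Sgroup types found on "M  STY" lines.
POLYMER_SGROUP_TYPES = {"SRU", "MON", "MER", "COP", "CRO", "GRA", "MOD", "ANY"}


def sgroup_types(molblock: str) -> list:
    """Collect Sgroup type codes from 'M  STY' property lines."""
    types = []
    for line in molblock.splitlines():
        if line.startswith("M  STY"):
            # Whitespace split gives ["M", "STY", count, sss1, ttt1, sss2, ttt2, ...]
            types.extend(line.split()[4::2])
    return types


def star_atom_indices(molblock: str) -> list:
    """Return 1-based indices of atoms whose symbol is '*' in the atom block."""
    lines = molblock.splitlines()
    n_atoms = int(lines[3][0:3])  # counts line: first 3 columns hold the atom count
    stars = []
    for idx, line in enumerate(lines[4:4 + n_atoms], start=1):
        if line[31:34].strip() == "*":  # atom symbol sits in columns 32-34
            stars.append(idx)
    return stars


def should_reject(molblock: str) -> bool:
    """Reject only when '*' atoms coexist with a polymer Sgroup."""
    return bool(star_atom_indices(molblock)) and any(
        t in POLYMER_SGROUP_TYPES for t in sgroup_types(molblock))


def make_molblock(symbols, bonds, props=()):
    """Build a minimal, hypothetical V2000 mol block (all coordinates zero)."""
    lines = ["demo", "  sketch", ""]
    lines.append(f"{len(symbols):3d}{len(bonds):3d}  0  0  0  0  0  0  0  0999 V2000")
    for sym in symbols:
        lines.append(f"{0.0:10.4f}{0.0:10.4f}{0.0:10.4f} {sym:<3} 0  0  0  0  0  0  0  0  0  0  0  0")
    for a, b in bonds:
        lines.append(f"{a:3d}{b:3d}  1  0")
    lines.extend(props)
    lines.append("M  END")
    return "\n".join(lines)


polymer = make_molblock(["*", "O", "C"], [(1, 2), (2, 3)], ["M  STY  1   1 SRU"])
plain = make_molblock(["*", "O", "C"], [(1, 2), (2, 3)])

print(star_atom_indices(polymer))  # [1]
print(sgroup_types(polymer))       # ['SRU']
print(should_reject(polymer))      # True
print(should_reject(plain))        # False
```

In this sketch, a bare "*" with no Sgroup data is left alone, so it keeps its match-anything meaning. Only a "*" accompanied by a polymer Sgroup causes the whole molecule to be flagged. Stripping the "*" atoms would instead drop the information about the repeat unit.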

I will try to find the time to grok the CTFile documentation for
Sgroups, but I would be happy to get input on this from others.

-greg

_______________________________________________
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
