Hi all,

I am currently working on a program which needs to process libraries of
large SDF files. One requirement is to always produce a valid output
including the molecule title/name or a specified property for referencing.

Passing sanitize=False to ForwardSDMolSupplier and calling
Chem.SanitizeMol afterwards, with appropriate exception handling, helps in
most cases: the SD file properties stay accessible and errors in the
molecules are still detected, so no rubbish gets imported.
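
To make the pattern concrete, here is a minimal sketch of it. The helper name
check_molecules is made up for illustration, and the supplier and the
sanitize function are passed in as arguments so the snippet itself has no
RDKit import. In real use they would presumably be
Chem.ForwardSDMolSupplier(fh, sanitize=False) and Chem.SanitizeMol.

```python
def check_molecules(supplier, sanitize):
    """Yield (index, mol, error) for every entry coming from the supplier.

    `supplier` is any iterable of molecules (or None for entries that could
    not be parsed at all); `sanitize` is a callable that raises on a bad
    molecule. Both are assumptions standing in for the RDKit objects.
    """
    for index, mol in enumerate(supplier):
        if mol is None:
            # Parsing failed outright: nothing to sanitize, no properties left.
            yield index, None, "could not be parsed"
            continue
        try:
            sanitize(mol)
        except Exception as exc:
            # The molecule object survives, so its SD properties are readable.
            yield index, mol, str(exc)
        else:
            yield index, mol, None


# Intended use with RDKit (not executed here):
#   from rdkit import Chem
#   with open("library.sdf", "rb") as fh:
#       supplier = Chem.ForwardSDMolSupplier(fh, sanitize=False)
#       for index, mol, error in check_molecules(supplier, Chem.SanitizeMol):
#           ...
```

With this, a sanitization failure still leaves a molecule whose properties
can be read, while a parse failure leaves only the index.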

However, in some cases this does not help. For example, when an unknown atom
(most of the time this is X) is found in the MolBlock, the import fails with
a Post-condition Violation and None is yielded. This is fine for detecting
the problem, but it makes it impossible to get any information about the
molecule that failed.

My question is whether there is a way to get at the data even in those cases.
The files tend to be very big, so re-parsing a file line by line in Python
to find the name of a specific molecule number (found by enumerating the
supplier) is not really an option.

In my opinion, a good solution would be to return an empty molecule carrying
all SD properties, including _Name, in case of an error instead of None. The
actual error could then also be passed to Python via an '_Error' property.
Unlike raising an exception, this would still allow processing of the file
to continue in a for loop, and it is easy to check whether the molecule is
empty.
Maybe this behaviour could be activated via an option, with returning None
remaining the default so that existing code does not break.
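
As a rough illustration of the behaviour described above, the following
sketch approximates it in user code with a single streaming pass over the
file. It is not an RDKit feature: the function names and the returned dict
layout are made up, the property parsing is deliberately simplistic, and the
molblock parser is passed in as a callable (for example
lambda mb: Chem.MolFromMolBlock(mb, sanitize=False)).

```python
def iter_sd_records(lines):
    """Yield each SD record as a list of lines, split on the '$$$$' delimiter."""
    record = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.strip() == "$$$$":
            yield record
            record = []
        else:
            record.append(line)
    if any(l.strip() for l in record):
        yield record  # trailing record without a closing '$$$$'


def split_record(record):
    """Return (name, molblock, props) using plain text handling only."""
    name = record[0].strip() if record else ""
    split = len(record)
    for i, line in enumerate(record):
        if line.startswith("M  END"):
            split = i + 1
            break
    props, key = {}, None
    for line in record[split:]:
        lt = line.find("<")
        gt = line.find(">", lt) if lt != -1 else -1
        if line.startswith(">") and gt != -1:
            key = line[lt + 1:gt]  # data header such as '>  <ID>'
            props[key] = []
        elif key is not None:
            if line.strip():
                props[key].append(line)
            else:
                key = None  # a blank line ends the data item
    props = {k: "\n".join(v) for k, v in props.items()}
    return name, "\n".join(record[:split]), props


def iter_molecules(lines, parse):
    """Yield one dict per record; name and props are kept even if parsing fails."""
    for index, record in enumerate(iter_sd_records(lines)):
        name, molblock, props = split_record(record)
        error = None
        try:
            mol = parse(molblock)
        except Exception as exc:
            mol, error = None, str(exc)
        if mol is None and error is None:
            error = "parser returned None"
        yield {"index": index, "name": name, "props": props,
               "mol": mol, "error": error}
```

Each record is read only once, so the name and properties of a failing
molecule are available without going back into the file. The error text is
only a placeholder here, since the actual parser message is not captured.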

I am very keen on getting your view on this issue.

Best regards,
Michael