Hi Michael,

What you request is certainly possible, but it is a pretty fundamental
change in the way the supplier (and mol file parser) works, so it would
need some thought.

One concern that immediately occurs to me is that you would not be able to
tell which molecules were actually empty in the input file and which are
empty only because there was a problem parsing them.

A possible alternative, more general and somewhat lighter weight, would be
to ensure that you can always get the text of the last item parsed from a
ForwardSDMolSupplier (a method like: suppl.GetLastItemText()). This would
allow you to do whatever special error handling you are interested in.
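
To make the idea concrete, here is a rough sketch in plain Python of what
"keep the raw text next to the parse result" could look like. It does not
use ForwardSDMolSupplier at all, and every name in it (iter_sd_records,
record_title, record_props, parse_records, fake_parser) is made up for
illustration. It assumes records are separated by a line containing only
$$$$ and that the title is the first line of each record. The parser is
passed in as a callable, so nothing here depends on a particular toolkit.

```python
import io


def iter_sd_records(handle):
    """Yield the raw text of each SD record in a single streaming pass."""
    lines = []
    for line in handle:
        if line.rstrip("\r\n") == "$$$$":
            yield "".join(lines)
            lines = []
        else:
            lines.append(line)
    # A trailing record without a closing $$$$ line is still yielded.
    if any(l.strip() for l in lines):
        yield "".join(lines)


def record_title(text):
    """Return the first line of the record (the molecule title)."""
    return text.split("\n", 1)[0].strip()


def record_props(text):
    """Collect data fields of the form '> <NAME>' followed by value lines."""
    props = {}
    name = None
    for line in text.splitlines():
        if line.startswith(">") and "<" in line and ">" in line[1:]:
            name = line[line.index("<") + 1:line.rindex(">")]
            props[name] = []
        elif name is not None:
            if line.strip() == "":
                name = None  # a blank line ends the current data field
            else:
                props[name].append(line)
    return {key: "\n".join(value) for key, value in props.items()}


def parse_records(handle, parser):
    """Yield (index, mol, title, props, text); mol is None if parsing fails."""
    for index, text in enumerate(iter_sd_records(handle)):
        try:
            mol = parser(text)
        except Exception:
            mol = None
        yield index, mol, record_title(text), record_props(text), text


# Hypothetical stand-in for a real parser: rejects any record with an X atom.
def fake_parser(text):
    return None if " X " in text else object()


SAMPLE = (
    "good\n  sketch\n\n  1  0  0  0  0  0  0  0  0  0999 V2000\n"
    "    0.0000    0.0000    0.0000 C   0  0\nM  END\n> <ID>\nA1\n\n$$$$\n"
    "bad\n  sketch\n\n  1  0  0  0  0  0  0  0  0  0999 V2000\n"
    "    0.0000    0.0000    0.0000 X   0  0\nM  END\n> <ID>\nB2\n\n$$$$\n"
)

results = list(parse_records(io.StringIO(SAMPLE), fake_parser))
failed = [(i, title, props) for i, mol, title, props, _ in results if mol is None]
```

With RDKit, the parser argument could be something along the lines of
lambda text: Chem.MolFromMolBlock(text, sanitize=False). As far as I know
that call does not attach the SD data fields to the molecule, which is why
the sketch reads them from the raw text itself.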

-greg


On Fri, May 1, 2015 at 12:01 AM, Michael Reutlinger <rd...@mulchi.de> wrote:

> Hi all,
>
> I am currently working on a program which needs to process libraries of
> large SDF files. One requirement is to always produce a valid output
> including the molecule title/name or a specified property for referencing.
>
> Specifying sanitize=False with ForwardSDMolSupplier and calling
> Chem.SanitizeMol afterwards with appropriate exception handling helps in
> most cases to get the SD file properties and still detect errors in the
> molecules to avoid importing rubbish.
>
> However, in some cases this does not help. E.g. when an unknown atom (most
> of the time this is X) is found in the MolBlock, the import fails with a
> Post-condition Violation and None is yielded. This is fine for detecting
> the problem, BUT it is impossible to get any information about the
> molecule which failed.
>
> My question is whether there is a way to get at the data even in those
> cases. The files tend to be very big, so re-parsing them line-by-line in
> Python to get the name for a specific molecule number (found by
> enumerating the supplier) is not really an option.
>
> In my opinion, a good solution would be to create an empty molecule with
> all SD properties, including _Name, in case of an error instead of
> returning None. The actual error could then also be communicated to Python
> via an '_Error' property. With this it would still be possible to continue
> processing the file in a for loop, in contrast to raising an Exception,
> and it would be easy to check whether the molecule is empty.
> Maybe this behaviour could be activated via an option, with the default
> still being to return None, so as not to break any existing code.
>
> I am very keen on getting your view on this issue.
>
> Best regards,
> Michael
>
>
> ------------------------------------------------------------------------------
> One dashboard for servers and applications across Physical-Virtual-Cloud
> Widest out-of-the-box monitoring support with 50+ applications
> Performance metrics, stats and reports that give you Actionable Insights
> Deep dive visibility with transaction tracing using APM Insight.
> http://ad.doubleclick.net/ddm/clk/290420510;117567292;y
> _______________________________________________
> Rdkit-discuss mailing list
> Rdkit-discuss@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
>