Hi Greg,

thanks for your answer. I agree that the lighter-weight solution is
certainly also a possibility and would clearly solve my (and possibly
others') problem. Maybe a suppl.GetLastItemError() would then also be handy
to get the error messages that are usually only visible in the log.
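
Below is a minimal sketch of what such GetLastItemText()/GetLastItemError() accessors could look like, written as a pure-Python wrapper so it does not depend on RDKit internals. Everything in it is an assumption: `LastItemSupplier`, `last_item_text` and `last_item_error` are hypothetical names (not existing RDKit API), records are assumed to be terminated by `$$$$` lines, and `parse` is any callable that takes the record text and returns a molecule or None (for example, something built on `Chem.MolFromMolBlock`).

```python
from typing import Any, Callable, Iterable, Iterator, List, Optional


class LastItemSupplier:
    """Hypothetical wrapper (not RDKit API): yields one parse result per SD
    record and remembers the raw text and any error of the current record."""

    def __init__(self, lines: Iterable[str], parse: Callable[[str], Any]):
        # lines are expected to keep their line endings, as from a file object
        self._lines = lines
        self._parse = parse
        self.last_item_text: Optional[str] = None
        self.last_item_error: Optional[str] = None

    def __iter__(self) -> Iterator[Any]:
        record: List[str] = []
        for line in self._lines:
            record.append(line)
            # "$$$$" terminates each record in an SD file
            if line.rstrip("\r\n") == "$$$$":
                yield self._parse_record("".join(record))
                record = []
        # handle a final record that lacks the terminating "$$$$" line
        tail = "".join(record)
        if tail.strip():
            yield self._parse_record(tail)

    def _parse_record(self, text: str) -> Any:
        self.last_item_text = text
        self.last_item_error = None
        try:
            result = self._parse(text)
        except Exception as exc:  # keep iterating, but remember the reason
            result = None
            self.last_item_error = str(exc)
        if result is None and self.last_item_error is None:
            self.last_item_error = "parser returned None"
        return result
```

Reading the file this way is still a single pass: whenever the loop gets None, the title of the failed record is the first line of `last_item_text`, so nothing has to be re-read from a very large file.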

But maybe something like an ErrorMol (as described in more detail by Andrew
Dalke) could be more versatile. If an ErrorMol class inherited from Mol, it
could be processed in the standard way, but one could still clearly
distinguish this "vehicle" from an empty molecule. By having different
handlers, it would also be possible to raise exceptions in the future, if
people prefer that behaviour :-)
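
To make the "vehicle" idea concrete without depending on RDKit, here is a pure-Python sketch under loud assumptions: `ErrorRecord` is a hypothetical stand-in for the proposed ErrorMol, SD data items are assumed to follow the `M  END` line in the usual `> <FIELD>` / value / blank-line layout, and `handle` with its `policy` argument is a hypothetical illustration of pluggable handlers, including one that raises.

```python
import re
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

# SD data header lines look like ">  <FIELD>" (optionally with extra text)
_FIELD_HEADER = re.compile(r"^>.*<([^>]+)>")


@dataclass
class ErrorRecord:
    """Hypothetical stand-in for the proposed ErrorMol: no atoms, but it keeps
    the title (_Name), the SD data fields and the parse error (_Error)."""
    name: str
    error: str
    props: Dict[str, str] = field(default_factory=dict)


def error_record_from_text(text: str, error: str) -> ErrorRecord:
    """Build an ErrorRecord from the raw text of an SD record that failed."""
    lines = text.splitlines()
    name = lines[0].strip() if lines else ""
    # data items follow "M  END"; if that line is missing, scan after the title
    start = next((i + 1 for i, ln in enumerate(lines) if ln.startswith("M  END")), 1)
    props: Dict[str, str] = {}
    key: Optional[str] = None
    values: List[str] = []
    for line in lines[start:]:
        header = _FIELD_HEADER.match(line)
        if header:
            key, values = header.group(1), []
        elif key is not None:
            if line.strip() in ("", "$$$$"):  # blank line ends a value
                props[key] = "\n".join(values)
                key = None
            else:
                values.append(line)
    if key is not None:  # value ran to the end of the text
        props[key] = "\n".join(values)
    return ErrorRecord(name=name, error=error, props=props)


def handle(item: Any, policy: str = "record") -> Any:
    """Hypothetical pluggable error policy: 'record', 'skip' or 'raise'."""
    if not isinstance(item, ErrorRecord):
        return item  # a normal molecule passes through unchanged
    if policy == "raise":
        raise ValueError(f"{item.name}: {item.error}")
    if policy == "skip":
        return None
    return item  # 'record': keep it so the name and error can be reported
```

In RDKit itself, an ErrorMol would presumably be an empty Mol carrying these values as properties (e.g. via `SetProp`), so code that expects a Mol keeps working while an `isinstance` check separates parse failures from molecules that were genuinely empty in the input, which was Greg's concern.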

However, either implementation would be a big improvement and would help to
avoid dealing with special cases elsewhere in the workflow, leading to
more robust workflows and, in the end, fewer errors.

Have a nice weekend,
Michael




On Sat, May 2, 2015 at 2:25 PM, Greg Landrum <greg.land...@gmail.com> wrote:

> Hi Michael,
>
> What you request is certainly possible, but it is a pretty fundamental
> change in the way the supplier (and mol file parser) works, so it would
> need some thought.
>
> One concern that immediately occurs to me is that you will not be able to
> tell which molecules from the input file were actually empty in the input
> and which were just empty because there was a problem parsing an input
> molecule.
>
> A possible alternative, more general and somewhat lighter weight, would be
> to ensure that you can always get the text of the last item parsed from a
> ForwardSDMolSupplier (a method like: suppl.GetLastItemText()); this would
> allow you to do whatever special error handling you are interested in doing.
>
> -greg
>
>
> On Fri, May 1, 2015 at 12:01 AM, Michael Reutlinger <rd...@mulchi.de>
> wrote:
>
>> Hi all,
>>
>> I am currently working on a program which needs to process libraries of
>> large SDF files. One requirement is to always produce a valid output
>> including the molecule title/name or a specified property for referencing.
>>
>> Specifying sanitize=False with ForwardSDMolSupplier and calling
>> Chem.SanitizeMol afterwards, with appropriate exception handling, helps in
>> most cases to get the SD file properties and still detect errors in the
>> molecules to avoid importing rubbish.
>>
>> However, in some cases this does not help. E.g. when an unknown atom
>> (most of the time this is X) is found in the MolBlock, the import fails with
>> a Post-condition Violation and None is yielded. This is fine for detecting
>> the problem, BUT it is impossible to get any information about the molecule
>> that failed.
>>
>> My question is whether there is a way to get to the data even in those
>> cases. The files tend to be very big, so re-parsing them line-by-line in
>> Python to get the name for a specific molecule number (found by enumerating
>> the supplier) is not really an option.
>>
>> What would be a good solution in my opinion is to create an empty
>> molecule with all sd properties, including _Name, in case of an error
>> instead of None. The actual error could then also be communicated to
>> Python via an '_Error' property. With this it would still be possible to
>> continue processing of the file in a for loop, in contrast to raising an
>> Exception, and it is easy to check if the molecule is empty.
>> Maybe this behaviour could be activated via an option and the default
>> would be to return None, to not break any existing code.
>>
>> I am very keen on getting your view on this issue.
>>
>> Best regards,
>> Michael
>>
>>
>> ------------------------------------------------------------------------------
>> One dashboard for servers and applications across Physical-Virtual-Cloud
>> Widest out-of-the-box monitoring support with 50+ applications
>> Performance metrics, stats and reports that give you Actionable Insights
>> Deep dive visibility with transaction tracing using APM Insight.
>> http://ad.doubleclick.net/ddm/clk/290420510;117567292;y
>> _______________________________________________
>> Rdkit-discuss mailing list
>> Rdkit-discuss@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>
>>
>