Hi Dimitri,

> *If you have the line numbers* it's something like "head | tail" or a
> 2-line for loop w/ line counter.


That means scanning the file again, which is very inefficient for large
files, especially if you have already parsed them once.
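
To illustrate the point about avoiding a second pass, here is a minimal sketch, assuming a plain SD file in which every record ends with a `$$$$` line. The function names `index_records` and `read_record` are made up for this example and are not part of the RDKit. The idea is to remember the byte offset of each record during the first parse, so that a failed record can be fetched later with a single seek instead of a "head | tail" over the whole file.

```python
import io


def index_records(stream):
    """Return the starting byte offset of every `$$$$`-terminated record.

    `stream` must be a seekable binary file object. A trailing record
    without a closing `$$$$` line is ignored in this sketch.
    """
    offsets = []
    start = stream.tell()
    while True:
        line = stream.readline()
        if not line:
            break
        if line.strip() == b"$$$$":
            offsets.append(start)
            start = stream.tell()
    return offsets


def read_record(stream, offsets, i):
    """Return the raw bytes of record `i` using the precomputed offsets."""
    stream.seek(offsets[i])
    if i + 1 < len(offsets):
        return stream.read(offsets[i + 1] - offsets[i])
    # Last indexed record: read up to and including its `$$$$` line.
    chunk = []
    while True:
        line = stream.readline()
        if not line:
            break
        chunk.append(line)
        if line.strip() == b"$$$$":
            break
    return b"".join(chunk)
```

With such an index, pulling out the records that failed to parse costs one seek per record, independent of the file size.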


> If it's not a one-off and your upstream keeps generating junk, the
> proper solution is to "have a talk" with them.

Sadly, if you are dealing with data outside of your own
department/group/company it is not that easy and you have to live with what
you get.


> The worst possible solution is to happily generate a garbage molecule
> that will blow up user's entire downstream pipeline. *If they're lucky*
> -- most likely it'll be garbage in - garbage out and crap happily flows
> on to the next stage. If ErrorMolecule "is a" Molecule that will happen.

It is up to the software engineer to write non-garbage code around
this error handler. I also doubt that returning an EMPTY molecule, which is
clearly distinct from a normal Molecule, could be considered garbage.
The current state is much worse: errors are reported only in the
logfile, without a molecule ID etc., which does not foster feedback to the
original molecule creator.
I'm also not a big fan of letting corrupt molecules pass, as e.g.
Pipeline Pilot does, and it is a strong feature of the RDKit to be strict,
which my proposal does not affect at all.
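
As a rough sketch of the kind of error reporting discussed here, assuming a hypothetical `parse` callable that raises `ValueError` on bad input: nothing below is RDKit API, and `ParseFailure` and `parse_all` are illustrative names only. A failed record yields an object of a separate type that carries the record index, its title line, and the error message, so it cannot be mistaken for a molecule downstream and can be reported back to whoever created the file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParseFailure:
    """Marker for a record that could not be parsed (not a molecule)."""
    index: int       # position of the record in the input
    record_id: str   # first line of the record, used here as its ID
    message: str     # error text from the parser


def parse_all(records, parse):
    """Yield a parsed object or a ParseFailure for every input record.

    `parse` is a hypothetical strict parser that raises ValueError on
    invalid input; strictness itself is left untouched.
    """
    for index, text in enumerate(records):
        lines = text.splitlines()
        record_id = lines[0].strip() if lines else ""
        try:
            yield parse(text)
        except ValueError as exc:
            yield ParseFailure(index, record_id, str(exc))
```

Downstream code would then check `isinstance(result, ParseFailure)` explicitly and route those entries to a report rather than into the pipeline.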


> I most emphatically do not want to take any drug developed using that
> kind of software quality assurance and error control procedures. Or have
> any new material developed like that anywhere near my bike, car, or diving
> gear. And so on.

Well... I think my proposal would enable us to put stricter, more robust QC
in place, but I guess you are missing this point.

Best,
Michael