On Dec 5, 2016, at 11:35 AM, Alexis Parenty wrote:
> I have tested my script on:
> • 7900 unique SMILES for “drug-like molecules”
> • Alice’s adventure in wonderland (I never read the book but I assumed
> there is no SMILES!)
> • A shuffled mixture of Alice’s in wonderland and 7900 unique SMILES
Is that representative of your expected corpus?
I would have thought that something like the Wikipedia page for SMILES or
Daylight's theory.smiles page, after using an HTML -> text converter to strip
out some of the markup and any possible escaped characters, would be a more
realistic test case. That's also small enough to identify the actual SMILES
manually.
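As a rough illustration of the HTML -> text step, here is a minimal sketch
using only Python's standard library. The names TextExtractor and
html_to_text are made up for this example, and the list of block-level tags
is an assumption; a real page would likely need more care (tables,
navigation links, footnotes) before the output is a clean test corpus.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents.

    Hypothetical helper for illustration only.
    """

    SKIP = {"script", "style"}
    # Assumed set of tags that should separate words in the output.
    BLOCK = {"p", "div", "br", "li", "tr", "td", "h1", "h2", "h3", "pre"}

    def __init__(self):
        # convert_charrefs=True turns escapes such as &amp; into plain characters.
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag in self.BLOCK:
            # Keep text from adjacent blocks from running together.
            self.chunks.append(" ")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.chunks.append(data)


def html_to_text(html):
    """Return the visible text of an HTML string with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.chunks).split())


sample = (
    "<p>Ethanol is <code>CCO</code> &amp; benzene is c1ccccc1</p>"
    "<script>var x = 1;</script><p>Done</p>"
)
print(html_to_text(sample))
```

Inline tags such as <code> are dropped without adding a space, so a SMILES
string split across inline markup stays in one piece.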
Andrew