On Dec 5, 2016, at 11:35 AM, Alexis Parenty wrote:
> I have tested my script on:
> •     7900 unique SMILES for “drug-like molecules”
> •     Alice’s adventure in wonderland (I never read the book but I assumed 
> there is no SMILES!)
> •     A shuffled mixture of Alice’s in wonderland and 7900 unique SMILES

Is that representative of your expected corpus?

I would have thought that something like the Wikipedia page for SMILES or 
Daylight's theory.smiles page, after using an HTML -> text converter to strip 
out some of the markup and any possible escaped characters, would be a more 
realistic test case. That's also small enough to identify the actual SMILES 
manually.
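
As a rough illustration of the HTML -> text step, here is a minimal sketch 
using only Python's standard library. The names TextExtractor and 
html_to_text are made up for this example, and it assumes that dropping 
tags, skipping script/style bodies, and decoding character references is 
enough. A real converter would likely treat whitespace and block elements 
more carefully.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collect text, skipping <script>/<style> bodies."""

    def __init__(self):
        # convert_charrefs=True decodes references such as &gt; or &#61;
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a script or style element.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        return "".join(self._chunks)


def html_to_text(html):
    """Return the text content of an HTML string with markup removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return parser.text()


sample = "<p>Ethanol is <code>CCO</code>; a double bond is <code>C&#61;C</code>.</p>"
print(html_to_text(sample))
```

The resulting plain text could then be fed to the SMILES-finding script and 
the hits compared against a hand-made list.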

                                Andrew
                                [email protected]



------------------------------------------------------------------------------
_______________________________________________
Rdkit-discuss mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
