Re: [Rdkit-discuss] reading tag data from string, not file
Hi TJ,

On Thu, Dec 30, 2010 at 2:27 AM, TJ O'Donnell wrote:
> I can see how to read an sd file using SDMolSupplier and using mol.GetProp()
> to get the tag data from the file.
> But, I have each molblock (chunk of lines between $$$$ in an sdf file)
> in a separate string. I don't see a way to get properties from that
> molblock string or
> even better from the mol=Chem.MolFromMolBlock(molblock)
> E.g. mol.GetPropNames() returns a null array (or just the private and
> computed props if mol.GetPropNames(True,True))
> Can you give me some hints on how I might get the property tag data
> from a string molblock?

The easiest way I can think of to do it is by constructing an
SDMolSupplier and using its SetData method.

To have some sample data, I start by getting a block of data from an SDF:

In [2]: suppl = Chem.SDMolSupplier('activity_classes.sdf')

In [3]: mb = suppl.GetItemText(12)

Set up a new SDMolSupplier using that:

In [5]: nsuppl = Chem.SDMolSupplier()

In [6]: nsuppl.SetData(mb)

And then grab the molecule:

In [7]: mol = nsuppl.next()
[05:29:05] deprecated group abbreviation ignored
[05:29:05] deprecated group abbreviation ignored

In [8]: list(mol.GetPropNames())
Out[8]: ['ACTIV_CLASS', 'ACTIV_INDEX', 'EXTREG', 'MOLREGNO']

You can no doubt do the same thing with Chem.MolFromMolBlock to build
the molecule, a regular expression to extract the data from the SD
block, and a series of SetProp() calls, but the above somehow seems
easier. :-)

Best Regards,
-greg

--
Learn how Oracle Real Application Clusters (RAC) One Node allows customers
to consolidate database storage, standardize their database environment, and,
should the need arise, upgrade to a full multi-node Oracle RAC database
without downtime or disruption
http://p.sf.net/sfu/oracle-sfdevnl
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
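For readers on a current RDKit with Python 3, the two routes Greg describes might be sketched roughly as below. This is a minimal sketch, not code from the thread: the function names (props_via_supplier, parse_sd_tags, mol_with_tags) and the SAMPLE_RECORD data are made up for illustration. The parser assumes the usual SD layout (a header line starting with ">" that carries the tag name in angle brackets, then value lines, then a blank line), and mol_with_tags assumes Chem.MolFromMolBlock ignores the text after "M  END". RDKit is imported inside the functions that need it, so the pure-Python parser can be tried on its own.

```python
import re

# Header line of an SD data item, e.g. ">  <MOLREGNO>"; captures the tag name.
_TAG_HEADER = re.compile(r"^>.*?<([^>]+)>")


def props_via_supplier(record):
    """Greg's route: hand one SD record (a string) to an SDMolSupplier."""
    from rdkit import Chem  # lazy import; this function needs RDKit

    suppl = Chem.SDMolSupplier()
    suppl.SetData(record)
    mol = next(suppl)  # Python 3 spelling of nsuppl.next()
    if mol is None:
        raise ValueError("RDKit could not parse the record")
    return {name: mol.GetProp(name) for name in mol.GetPropNames()}


def parse_sd_tags(record):
    """Alternative route: pull the tag data out of the record text directly."""
    tags = {}
    name = None          # tag currently being read, if any
    value_lines = []
    in_data = False      # becomes True once the mol block has ended
    for line in record.splitlines():
        if not in_data:
            in_data = line.startswith("M  END")
            continue
        if line.startswith("$$$$"):   # end of the SD record
            break
        if name is None:
            header = _TAG_HEADER.match(line)
            if header:
                name, value_lines = header.group(1), []
        elif line.strip() == "":      # a blank line closes the data item
            tags[name] = "\n".join(value_lines)
            name = None
        else:
            value_lines.append(line)
    if name is not None:              # record ended without a closing blank line
        tags[name] = "\n".join(value_lines)
    return tags


def mol_with_tags(record):
    """Build the molecule from the mol block, then copy the tags with SetProp()."""
    from rdkit import Chem  # lazy import; this function needs RDKit

    mol = Chem.MolFromMolBlock(record)
    if mol is None:
        raise ValueError("RDKit could not parse the mol block")
    for tag, value in parse_sd_tags(record).items():
        mol.SetProp(tag, value)
    return mol


# Made-up record for illustration only (a single carbon atom, two invented tags).
SAMPLE_RECORD = "\n".join([
    "example",
    "  hypothetical",
    "",
    "  1  0  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "M  END",
    ">  <EXAMPLE_ID>",
    "42",
    "",
    ">  <EXAMPLE_NOTE>",
    "two",
    "lines",
    "",
    "$$$$",
])
```

With the made-up record, parse_sd_tags(SAMPLE_RECORD) returns {'EXAMPLE_ID': '42', 'EXAMPLE_NOTE': 'two\nlines'}. The supplier route is shorter and lets RDKit handle the format details, which is presumably why it "seems easier"; the text-parsing route only helps if you want the tags without building a molecule.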
[Rdkit-discuss] reading tag data from string, not file
I can see how to read an sd file using SDMolSupplier and using mol.GetProp()
to get the tag data from the file.
But, I have each molblock (chunk of lines between $$$$ in an sdf file)
in a separate string. I don't see a way to get properties from that
molblock string or
even better from the mol=Chem.MolFromMolBlock(molblock)
E.g. mol.GetPropNames() returns a null array (or just the private and
computed props if mol.GetPropNames(True,True))
Can you give me some hints on how I might get the property tag data
from a string molblock?

TJ O'Donnell
Re: [Rdkit-discuss] Question: modifying default parameters for the RDKit fingerprint?
Hi Greg-

No objection here. I've been using 1024 with 2 bits here.
Are you still using 2048 for the default size?

TJ O'Donnell

On Tue, Dec 28, 2010 at 11:33 PM, Greg Landrum wrote:
> Dear all,
>
> As I mentioned in an earlier message
> (http://www.mail-archive.com/rdkit-discuss@lists.sourceforge.net/msg01430.html),
> the default parameters for the RDKit fingerprint end up setting far
> too many bits for drug-like molecules. The result of this is
> similarity values that are in general too high and more frequent
> occurrences of molecules that are similar to each other only due to
> bit collisions.
>
> The easy solution to this problem is to decrease the number of bits
> set per path found (the nBitsPerHash parameter) from 4 to 2. I propose
> doing this for the Q4 2010 release of the RDKit. The downside is that
> the fingerprints generated with that release will not be compatible
> with fingerprints from earlier releases unless you specify
> nBitsPerHash=4 on your own. The upside is a much more useful
> similarity fingerprint.
>
> Any objections to me making this change?
>
> -greg
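To make the bits-per-path trade-off concrete, here is a rough sketch. The toy_fingerprint function is a made-up model, not RDKit's algorithm: each "path" is just a string that seeds a private random generator, which then picks n_bits_per_hash positions. The PATHS list is invented data. The rdkit_fingerprint helper shows how the real parameters might be passed, reading "1024 with 2 bits" as fpSize=1024 and nBitsPerHash=2, which is an interpretation; RDKit is imported lazily so the toy model runs without it.

```python
import random


def toy_fingerprint(features, fp_size=1024, n_bits_per_hash=2):
    """Return the set of on-bit positions of a toy hashed fingerprint.

    Not RDKit's algorithm: each feature string seeds its own generator,
    which then draws n_bits_per_hash positions in [0, fp_size).
    """
    on_bits = set()
    for feature in features:
        rng = random.Random(feature)  # string seeds give repeatable draws
        for _ in range(n_bits_per_hash):
            on_bits.add(rng.randrange(fp_size))
    return on_bits


def density(on_bits, fp_size):
    """Fraction of the fingerprint's bits that are set."""
    return len(on_bits) / fp_size


def rdkit_fingerprint(smiles, fp_size=1024, n_bits_per_hash=2):
    """The real thing, with both parameters given explicitly (needs RDKit)."""
    from rdkit import Chem  # lazy import

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError("RDKit could not parse the SMILES")
    return Chem.RDKFingerprint(mol, fpSize=fp_size, nBitsPerHash=n_bits_per_hash)


# Invented stand-ins for the paths found in one molecule.
PATHS = ["path-%d" % i for i in range(300)]
FP2 = toy_fingerprint(PATHS, fp_size=1024, n_bits_per_hash=2)
FP4 = toy_fingerprint(PATHS, fp_size=1024, n_bits_per_hash=4)

if __name__ == "__main__":
    print("density with 2 bits per path:", density(FP2, 1024))
    print("density with 4 bits per path:", density(FP4, 1024))
```

In this toy model the bits set with 4 per path always include the bits set with 2 per path, so the fingerprint can only get denser. A denser fingerprint leaves less room for unrelated molecules to differ, which is the effect on similarity values that Greg describes. Passing nBitsPerHash explicitly, as in rdkit_fingerprint, also keeps fingerprints comparable across releases regardless of the default.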