Re: [Rdkit-discuss] reading tag data from string, not file

2010-12-29 Thread Greg Landrum
Hi TJ,

On Thu, Dec 30, 2010 at 2:27 AM, TJ O'Donnell wrote:
> I can see how to read an SD file using SDMolSupplier and how to use
> mol.GetProp() to get the tag data from the file.
> But I have each molblock (the chunk of lines between $$$$ delimiters in an
> SDF file) in a separate string. I don't see a way to get properties from that
> molblock string or, even better, from mol = Chem.MolFromMolBlock(molblock).
> E.g. mol.GetPropNames() returns an empty list (or just the private and
> computed props with mol.GetPropNames(True, True)).
> Can you give me some hints on how I might get the property tag data
> from a molblock string?

The easiest way I can think of is to construct an
SDMolSupplier and use its SetData method.

To have some sample data, I start by getting a block of data from an SDF:
In [1]: from rdkit import Chem
In [2]: suppl = Chem.SDMolSupplier('activity_classes.sdf')
In [3]: mb = suppl.GetItemText(12)

Set up a new SDMolSupplier using that:
In [5]: nsuppl = Chem.SDMolSupplier()
In [6]: nsuppl.SetData(mb)

And then grab the molecule:
In [7]: mol = next(nsuppl)
[05:29:05]  deprecated group abbreviation ignored
[05:29:05]  deprecated group abbreviation ignored
In [8]: list(mol.GetPropNames())
Out[8]: ['ACTIV_CLASS', 'ACTIV_INDEX', 'EXTREG', 'MOLREGNO']

You can no doubt do the same thing with Chem.MolFromMolBlock to build
the molecule, a regular expression to extract the data from the SD
block, and a series of SetProp() calls, but the above somehow seems
easier. :-)
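
The regular-expression route mentioned above could look roughly like the
following. This is a minimal sketch in plain Python, not RDKit code:
parse_sd_tags and the sample record (including its tag values) are
hypothetical, and it assumes that the data items follow the "M  END" line,
that each value ends at a blank line, and that no value line starts with ">".

```python
import re

# Matches an SD data header such as ">  <MOLREGNO>" and captures the tag name.
_TAG_HEADER = re.compile(r"^>.*?<([^>]+)>")

# Hypothetical single-record SD text; the tag values are made up.
SAMPLE = """\
methane
  hypothetical

  1  0  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
M  END
>  <MOLREGNO>
12345

>  <ACTIV_CLASS>
7

$$$$
"""


def parse_sd_tags(sd_record):
    """Return {tag: value} for the data items after 'M  END' in one SD record."""
    tags = {}
    name, value_lines = None, []
    in_data = False
    for line in sd_record.splitlines():
        if not in_data:
            # Skip the connection table; data items start after 'M  END'.
            in_data = line.startswith("M  END")
            continue
        if line.startswith("$$$$"):
            break  # end of record
        header = _TAG_HEADER.match(line)
        if header:
            if name is not None:
                # Previous item had no terminating blank line; store it anyway.
                tags[name] = "\n".join(value_lines)
            name, value_lines = header.group(1), []
        elif name is not None:
            if line.strip():
                value_lines.append(line)
            else:
                # A blank line terminates the current data item.
                tags[name] = "\n".join(value_lines)
                name = None
    if name is not None:
        tags[name] = "\n".join(value_lines)
    return tags
```

Each (name, value) pair in the returned dictionary could then be attached to
the molecule built by Chem.MolFromMolBlock with mol.SetProp(name, value).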

Best Regards,
-greg

--
Learn how Oracle Real Application Clusters (RAC) One Node allows customers
to consolidate database storage, standardize their database environment, and, 
should the need arise, upgrade to a full multi-node Oracle RAC database 
without downtime or disruption
http://p.sf.net/sfu/oracle-sfdevnl
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


[Rdkit-discuss] reading tag data from string, not file

2010-12-29 Thread TJ O'Donnell
I can see how to read an SD file using SDMolSupplier and how to use
mol.GetProp() to get the tag data from the file.
But I have each molblock (the chunk of lines between $$$$ delimiters in an
SDF file) in a separate string. I don't see a way to get properties from that
molblock string or, even better, from mol = Chem.MolFromMolBlock(molblock).
E.g. mol.GetPropNames() returns an empty list (or just the private and
computed props with mol.GetPropNames(True, True)).
Can you give me some hints on how I might get the property tag data
from a molblock string?

TJ O'Donnell



Re: [Rdkit-discuss] Question: modifying default parameters for the RDKit fingerprint?

2010-12-29 Thread TJ O'Donnell
Hi Greg-

No objection here. I've been using a fingerprint size of 1024 with 2 bits
per hash. Are you still using 2048 for the default size?
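
As a rough illustration of how fingerprint size and bits per hash interact,
the sketch below estimates the expected fraction of bits set. It is a
back-of-the-envelope model, not how RDKit computes anything: it assumes each
path sets its bits independently and uniformly at random, and both the
function name expected_bit_density and the path count of 300 are made up for
the example.

```python
def expected_bit_density(n_paths, fp_size, bits_per_hash):
    """Expected fraction of bits set, assuming every path sets
    bits_per_hash independent, uniformly random bits (a simplification)."""
    n_draws = n_paths * bits_per_hash
    # Probability that a given bit is hit at least once in n_draws draws.
    return 1.0 - (1.0 - 1.0 / fp_size) ** n_draws


# n_paths = 300 is a hypothetical path count, used only for illustration.
for fp_size, bits_per_hash in [(2048, 4), (2048, 2), (1024, 2)]:
    density = expected_bit_density(300, fp_size, bits_per_hash)
    print(f"size={fp_size} bits_per_hash={bits_per_hash}: {density:.2f}")
```

Under this model, halving the bits per hash at a fixed size lowers the
density, which is the effect the proposed change is after. In RDKit the
corresponding settings are, as far as I know, the fpSize and nBitsPerHash
arguments of Chem.RDKFingerprint.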

TJ O'Donnell


On Tue, Dec 28, 2010 at 11:33 PM, Greg Landrum wrote:
> Dear all,
>
> As I mentioned in an earlier message
> (http://www.mail-archive.com/rdkit-discuss@lists.sourceforge.net/msg01430.html),
> the default parameters for the RDKit fingerprint end up setting far
> too many bits for drug-like molecules. The result of this is
> similarity values that are in general too high and more frequent
> occurrences of molecules that are similar to each other only due to
> bit collisions.
>
> The easy solution to this problem is to decrease the number of bits
> set per path found (the nBitsPerHash parameter) from 4 to 2. I propose
> doing this for the Q4 2010 release of the RDKit. The downside is that
> the fingerprints generated with that release will not be compatible
> with fingerprints from earlier releases unless you specify
> nBitsPerHash=4 on your own. The upside is a much more useful
> similarity fingerprint.
>
> Any objections to me making this change?
>
> -greg
>
