Re: [Rdkit-discuss] Warning as error

2019-01-21 Thread Stephen O'hagan
I've had similar problems; none of the claimed methods to switch off RDKit 
logging of warnings has worked for me.

I ended up just re-directing stderr when running the script like this:

python myfile.py  2> myErrorLog.txt
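
On the original question of acting on the problem molecules: as far as I know, an RDKit supplier yields None for a record it cannot parse, while a record that only triggers a warning still comes back as a molecule. A minimal sketch under that assumption (split_parsed is a made-up helper name, not an RDKit function):

```python
def split_parsed(supplier):
    """Separate parsed molecules from records that failed to parse.

    Works on any iterable that yields a molecule or None per record,
    which is how Chem.SDMolSupplier is assumed to behave here.
    """
    good, failed = [], []
    for index, mol in enumerate(supplier):
        if mol is None:
            failed.append(index)   # remember which record was unreadable
        else:
            good.append(mol)
    return good, failed


# With RDKit installed, usage would look like:
#     from rdkit import Chem
#     good, failed = split_parsed(Chem.SDMolSupplier('my_file.sdf'))
```

The indices in failed are zero-based positions in the SD file, so the offending records can be inspected afterwards. No try/except is needed for this, because a parse failure is reported through the None value rather than an exception.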


Dr. Steve O'Hagan,
 

-Original Message-
From: Jean-Marc Nuzillard [mailto:jm.nuzill...@univ-reims.fr] 
Sent: 21 January 2019 12:33
To: RDKit Discuss 
Subject: [Rdkit-discuss] Warning as error

Dear all,

The minimalist python code:
     reader = Chem.SDMolSupplier('my_file.sdf')
     for mol in reader:
         pass

gives me warning messages when run on a particular SD file.
How can I simply run a specific action for the molecules that cause problems, 
possibly using try/except statements?
Best,

Jean-Marc


--
Jean-Marc Nuzillard
Directeur de Recherches au CNRS

Institut de Chimie Moléculaire de Reims
CNRS UMR 7312
Moulin de la Housse
CPCBAI, Bâtiment 18
BP 1039
51687 REIMS Cedex 2
France

Tel : 03 26 91 82 10
Fax : 03 26 91 31 66
http://www.univ-reims.fr/ICMR
http://eos.univ-reims.fr/LSD/CSNteam.html

http://www.univ-reims.fr/LSD/
http://www.univ-reims.fr/LSD/JmnSoft/





___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss



[Rdkit-discuss] smarts substructure query match = FALSE?

2018-09-18 Thread Stephen O'hagan
Hi folks,

This looks as if HasSubstructMatch should return TRUE, so why is it FALSE? 
[Python 3.6, RDKit 2017.09.3]

from rdkit import Chem
from rdkit.Chem import Draw

patt = Chem.MolFromSmarts("[*,#1]-[#7]-1-[#6]-[#6]-[#7](-[#6]-[#6]-1)-[#6](\[*,#1])=[#7]\[#6]-1=[#6]-[#6](-[*,#1])=[#6](-[*,#1])-[#6]=[#6]1-[*,#1]")

mol = Chem.MolFromSmiles("O=C(N3CCN(c2nc1cc(OC)c(OC)cc1c(n2)N)CC3)c4occc4")

fig = Draw.MolToMPL(patt)

fig2 = Draw.MolToMPL(mol)

mol.HasSubstructMatch(patt)
#why is this FALSE ?

Cheers,
Steve



Re: [Rdkit-discuss] comparing two or more tables of molecules

2016-12-01 Thread Stephen O'hagan
Thanks for the interesting links.

MolVS looks good, but failed on ‘NC(CC(=O)O)C(=O)[O-].O.O.[Na+]’ which isn’t 
that extraordinary…

Couldn’t get Standardise to work at all, even on the example given; API not 
intuitive or docs wrong or out of date.

I will have a look at the info in the UniChem paper, though not inclined to use 
a web service for what I want to do.

Cheers,
Steve.

From: George Papadatos [mailto:gpapada...@gmail.com]
Sent: 01 December 2016 14:26
To: Greg Landrum <greg.land...@gmail.com>
Cc: Stephen O'hagan <soha...@manchester.ac.uk>; 
rdkit-discuss@lists.sourceforge.net; Francis Atkinson <fran...@ebi.ac.uk>
Subject: Re: [Rdkit-discuss] comparing two or more tables of molecules

HI Stephen,

Further to Greg's excellent reply, see this paper on how InChI strings and keys 
can be used in practice to map together tautomer (ones covered by InChI at 
least), isotope, stereo and parent-salt variants.
http://rd.springer.com/article/10.1186/s13321-014-0043-5

Francis (cc'ed) has a nice notebook somewhere illustrating these nice InChI 
splits to find these variants.

For educational purposes, there have been other approaches like the NCI's 
identifiers - discussion here:
http://acscinf.org/docs/meetings/237nm/presentations/237nm17.pdf

For pure structure standardization using RDKit see here:
https://github.com/flatkinson/standardiser
and
https://github.com/mcs07/MolVS


Cheers,

George




On 29 November 2016 at 17:02, Greg Landrum <greg.land...@gmail.com> wrote:
Wow, this is a great question and quite a fun thread.

It's hard to really make much of a contribution here without writing a 
book/review article (something that I'm really not willing to do!), but I have 
a few thoughts. Most of this is repeating/rephrasing things others have already 
said.

I'm going to propose some things as facts. I think that these won't be 
controversial:
fact 1: if the structures are coming from different sources, they need to be 
standardized/normalized before you compare them. This is true regardless of how 
you want to compare them. The details of the standardization process are not 
incredibly important, but it does need to take care of the things you care 
about when comparing molecules. For example, if you don't care about 
differences between salts, it should strip salts. If you don't care about 
differences between tautomers, it should normalize tautomers.
fact 2: The InChI algorithm includes a standardization step that normalizes 
some tautomers, but does not remove salts.
fact 3: The InChI representation contains a number of layers defining the 
structure in increasing detail (this isn't strictly true, because some of the 
choices about how layers are ordered are arbitrary, but it's close).
fact 4: canonicalization, the way I define it, produces a canonical atom 
numbering for a given structure, but it does *not* standardize
fact 5: the RDKit has essentially no well-documented standardization code

fact X: we don't have any standard, broadly accepted approach for 
standardization, canonicalization or representation that is fool-proof or that 
works for even all of organic chemistry, never mind organometallics. InChI, 
useful as it is for some things, completely fails to handle things like 
atropisomers (they are working on this kind of thing, but it's not out yet).

Given all of this, if I wanted to have flexible duplicate checking *right* now, 
I think I would use the AvalonTools struchk functionality that the RDKit 
provides (the new pure-RDKit version still needs a bit more testing) to handle 
basic standardization and salt stripping and then produce a table that includes 
the InChI in a couple of different forms. I'd want to be able to recognize 
molecules that differ only by stereochemistry, molecules that differ only by 
location of tautomeric Hs, and molecules that differ only by the location of 
isotopic labels. You can do this with various clever splits of the InChI (how 
to do it is left as an exercise for the reader and/or a future RDKit blog post).
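
One possible way to do such a split, as a rough sketch rather than anything taken from the thread: treat the InChI as '/'-separated layers and drop the ones whose prefix letter marks stereochemistry (b, t, m, s). The helper name strip_inchi_layers and the two alanine strings are illustrative assumptions, and the sketch does not treat the sublayers that follow /f or /r specially.

```python
def strip_inchi_layers(inchi, prefixes=("b", "t", "m", "s")):
    """Drop every layer whose first letter is in prefixes.

    parts[0] is the version prefix and parts[1] the formula layer;
    both are always kept untouched.
    """
    parts = inchi.split("/")
    kept = parts[:2] + [p for p in parts[2:] if not p.startswith(tuple(prefixes))]
    return "/".join(kept)


# Illustrative strings for the two enantiomers of alanine.
l_ala = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1"
d_ala = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m1/s1"

# After stripping the stereo layers both strings compare equal.
print(strip_inchi_layers(l_ala) == strip_inchi_layers(d_ala))
```

Storing the full InChI next to one or more stripped forms in the same table would then let a plain string comparison find molecules that differ only in the dropped layers. Passing prefixes=("i",) is one way to try the same thing for the isotopic layer, with the caveat that stereo layers belonging to the isotopic part would remain.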

I think there's something fun to be done here with SMILES variants, borrowing 
heavily from some of the things that Roger has written about:
https://nextmovesoftware.com/blog/2013/04/25/finding-all-types-of-every-mer/
here's a more recent application of that from Noel: 
https://nextmovesoftware.com/blog/2016/06/22/fishing-for-matched-series-in-a-sea-of-structure-representations/

If I didn't really care about details and just wanted something that I could 
explain easily to others, I'd skip all the complication and just use InChIs (or 
InChI keys) to recognize duplicates. There would be times when that would be 
the wrong answer, but it would be a broadly accepted kind of wrong.[1]

Regardless of the approach, I would not, under most any circumstances, discard 
the original input structures that I had. It's really good to be able to figure 
out what the original data looked like lat

[Rdkit-discuss] comparing two or more tables of molecules

2016-11-28 Thread Stephen O'hagan
Has anyone come up with fool-proof way of matching structurally equivalent 
molecules?

Unique Smiles or InChI String comparisons don't appear to work, presumably 
because there are different but equivalent structures, e.g. explicit vs 
non-explicit H's, Kekule vs Aromatic, isomeric forms vs non-isomeric form, 
tautomers etc.

I also expect that comparing InChI strings might need something more than just 
a simple string comparison, such as masking off stereo information when you 
don't care about stereo isomers.

I assume there are suitable tools within RDKit that can do this?

N.B. I need to collate tables from several sources that have a mix of smiles / 
InChI / sdf molecular representations.

I usually use RDKit via Python and/or Knime.

Cheers,
Steve.



Re: [Rdkit-discuss] MolWt of substructure hit?

2016-09-07 Thread Stephen O'hagan
Hi, 

Thanks for this, the clue that I needed was that there's a method:

" matches = mol.GetSubstructMatches(pat) "

This should work fine for what I need.

Cheers,
Steve.

-Original Message-
From: Andrew Dalke [mailto:da...@dalkescientific.com] 
Sent: 07 September 2016 12:10
To: Stephen O'hagan <soha...@manchester.ac.uk>
Cc: rdkit-discuss@lists.sourceforge.net
Subject: Re: [Rdkit-discuss] MolWt of substructure hit?

On Sep 7, 2016, at 11:53 AM, Stephen O'hagan wrote:
> How would I find the molecular weight (fraction) of that substructure within 
> a compounds expressed as a SMILES string, e.g.:

I don't know of a built-in function that does this, but it's possible to write 
one. Here's a function which will compute the molecular weight given the 
molecule and the atom indices for the fragment.

def get_fragment_molwt(mol, atom_indices):
    assert len(atom_indices) == len(set(atom_indices))  # quick duplicate check
    molwt = 0.0
    for atom_index in atom_indices:
        atom = mol.GetAtomWithIdx(atom_index)
        molwt += atom.GetMass()  # mass of this atom only; implicit hydrogens are not counted
    return molwt

If you want to include the hydrogen mass, then use this variant:

from rdkit import Chem

_H_mass = Chem.Atom(1).GetMass()
def get_fragment_molwt(mol, atom_indices):
    assert len(atom_indices) == len(set(atom_indices))  # quick duplicate check
    molwt = 0.0
    for atom_index in atom_indices:
        atom = mol.GetAtomWithIdx(atom_index)
        # add the mass of the hydrogens attached to this atom
        molwt += atom.GetMass() + atom.GetTotalNumHs() * _H_mass
    return molwt


Here's an example of how to use the function:

#==
from rdkit import Chem
from rdkit.Chem.Descriptors import MolWt

# get_fragment_molwt(mol, atom_indices) is defined as above

smiles = "CC(=O)O[C@H]1CC[C@@]2(C)C(=CCC3C4CC=C(c5cccnc5)[C@@]4(C)CCC32)C1"
smarts = "[#6](:,-[#6]:,-[#6](-[#6]):,-[#6]-[#6](:[#6]:[#7]):[#6]:[#6]):,-[#6]:,-[#6]"

mol = Chem.MolFromSmiles(smiles)
assert mol is not None, smiles

pat = Chem.MolFromSmarts(smarts)
assert pat is not None, smarts

matches = mol.GetSubstructMatches(pat)

molwt = MolWt(mol)
for match_no, match in enumerate(matches, 1):
    fragment_molwt = get_fragment_molwt(mol, match)
    print("#{}: {:.2%}".format(match_no, fragment_molwt/molwt))
#==


If I don't include the hydrogens in the fragment weight calculation then I get:

#1: 37.32%
#2: 37.32%
#3: 37.32%
   ...

If I include the hydrogens, then I get:

#1: 40.15%
#2: 39.64%
#3: 40.15%
   ...
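
If several hits overlap and the atoms should only be counted once, one option is to merge the matches before summing. This is only a sketch and not part of the code above: merge_matches and fraction_covered are made-up names, and atom_masses is assumed to be a plain list of per-atom masses, for example built with [a.GetMass() for a in mol.GetAtoms()].

```python
def merge_matches(matches):
    """Return the sorted atom indices covered by at least one match."""
    covered = set()
    for match in matches:
        covered.update(match)
    return sorted(covered)


def fraction_covered(atom_masses, matches):
    """Mass fraction of the atoms that appear in any of the matches."""
    total = sum(atom_masses)
    covered = sum(atom_masses[i] for i in merge_matches(matches))
    return covered / total


# Toy numbers: four atoms, two overlapping matches sharing atom 1.
masses = [12.0, 12.0, 16.0, 14.0]
print(fraction_covered(masses, ((0, 1), (1, 2))))
```

Here atom 1 is in both matches but contributes its mass once, so the covered mass is 40 out of 54. Like the first variant of get_fragment_molwt, this ignores implicit hydrogens unless their mass is folded into atom_masses.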

Cheers,

Andrew
da...@dalkescientific.com





[Rdkit-discuss] MolWt of substructure hit?

2016-09-07 Thread Stephen O'hagan
Hi,

Supposing I have identified a substructure as a SMARTS string, e.g.

[#6](:,-[#6]:,-[#6](-[#6]):,-[#6]-[#6](:[#6]:[#7]):[#6]:[#6]):,-[#6]:,-[#6]

- In general, this may have wild card atoms.

How would I find the molecular weight (fraction) of that substructure within a 
compounds expressed as a SMILES string, e.g.:

CC(=O)O[C@H]1CC[C@@]2(C)C(=CCC3C4CC=C(c5cccnc5)[C@@]4(C)CCC32)C1

I may or may not wish to count multiple hits in one target.

Cheers,
Steve.


Re: [Rdkit-discuss] RDKit compile is successful, but python does see RDKit?

2015-02-18 Thread Stephen O'hagan
Hi,

I have it working now. Two problems were causing the errors.


1)  I hadn’t fully purged the RDKit libraries from an earlier apt-get 
install.

2)  I had assumed that LD_LIBRARY_PATH being set in the usual places (such 
as ~/.bashrc) would work.

It seems Ubuntu has a “feature” whereby LD_LIBRARY_PATH is automatically reset. 
To get RDKit to work, one needs to add an entry to /etc/ld.so.conf.d/ and do 
‘sudo ldconfig’.
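
For anyone hitting the same thing, a small diagnostic sketch (diagnose_import is a made-up helper, not part of RDKit): the text of the ImportError usually tells the two problems apart. "No module named ..." points at the Python search path, while a message about a shared object that cannot be opened points at the dynamic loader configuration described above.

```python
import importlib
import os


def diagnose_import(module_name):
    """Try to import a module and return (ok, message)."""
    try:
        importlib.import_module(module_name)
    except ImportError as exc:
        # Covers both a missing module and a failing shared-library load.
        return False, str(exc)
    return True, "ok"


ok, message = diagnose_import("rdkit")
print(ok, message)
# Show what the loader path looks like from inside this Python process.
print("LD_LIBRARY_PATH =", os.environ.get("LD_LIBRARY_PATH", "<not set>"))
```

Running this from the same shell or service that fails is the point of the exercise, since the environment seen by Python may differ from the one set up in ~/.bashrc.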

Cheers,
Steve.

From: JP [mailto:jeanpaul.ebe...@inhibox.com]
Sent: 17 February 2015 20:43
To: rdkit-discuss@lists.sourceforge.net
Subject: Re: [Rdkit-discuss] RDKit compile is successful, but python does see 
RDKit?

Hi Stephen,

As Christos pointed out, it is almost always the environment variables which 
get you.  What is the error message you are getting?

Some installation instructions specific to Ubuntu (working and tested up to 
version 14.04) may be found at:
http://www.blopig.com/blog/2013/02/how-to-install-rdkit-on-ubuntu-12-04/

Take Care,
JP


-
Jean-Paul Ebejer
Early Stage Researcher

On 17 February 2015 at 17:20, Stephen O'hagan <soha...@manchester.ac.uk> wrote:
Hi,

On one of our Ubuntu machines, I’ve installed RDKit (compiled from source to 
get the latest version); ctest passed all tests.

CMake seemed to detect the correct python version and boost libs.

However, python does not see the RDKit module(s).

Any ideas what might be going wrong?


Dr. Steve O'Hagan,
Computer Officer,
Bioanalytical Sciences Group,
School of Chemistry,
Manchester Institute of Biotechnology,
University of Manchester,
131, Princess St,
MANCHESTER M1 7DN.

Email: soha...@manchester.ac.uk
Phone: 0161 306 4562




Re: [Rdkit-discuss] conda-rdkit fails to install Win7.

2014-11-19 Thread Stephen O'hagan
Hi Riccardo,

Thanks for the quick reply.

Closer scrutiny revealed that CMake was not finding the correct version of MS 
compiler, so compilations were therefore failing.

Tried “conda build rdkit” again using the VS x64 command Prompt (2010), and it 
now appears to have worked [mostly].

Some tests failed, but info has since scrolled off into hyperspace… is there an 
easy method to re-run the RDKit test suite?

BTW, ‘LastTestsFailed.log’ contains:

7:pyDiscreteValueVect
8:pySparseIntVect
34:testMolSupplier
50:pyPartialCharges
71:pyGraphMolWrap
77:pyRanker
79:pyFeatures
80:pythonTestDbCLI
81:pythonTestDirML
86:pythonTestDirChem
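
A possible way to re-run just these tests, sketched under the assumption that each line of the log has the number:name form shown above. parse_failed_tests and ctest_name_regex are hypothetical helper names; the resulting pattern is meant to be passed to ctest -R from the build directory, and recent CTest versions also offer --rerun-failed for the same purpose.

```python
import re


def parse_failed_tests(log_text):
    """Parse 'number:name' lines into (number, name) tuples."""
    failed = []
    for line in log_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        number, _, name = line.partition(":")
        failed.append((int(number), name))
    return failed


def ctest_name_regex(failed):
    """Build an anchored regex that matches exactly the given test names."""
    names = "|".join(re.escape(name) for _, name in failed)
    return "^({})$".format(names)


log = "7:pyDiscreteValueVect\n8:pySparseIntVect\n34:testMolSupplier\n"
failed_tests = parse_failed_tests(log)
print(ctest_name_regex(failed_tests))
```

Adding --output-on-failure to the ctest command line keeps the failure details on screen instead of letting them scroll away.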

Cheers,
Steve.

From: Riccardo Vianello [mailto:riccardo.viane...@gmail.com]
Sent: 18 November 2014 19:46
To: Stephen O'hagan
Cc: rdkit-discuss@lists.sourceforge.net
Subject: Re: [Rdkit-discuss] conda-rdkit fails to install Win7.

Hi Steve,

On Tue, Nov 18, 2014 at 2:33 PM, Stephen O'hagan <soha...@manchester.ac.uk> wrote:
Trying to install conda-rdkit on win7 64-bit as per instructions 
https://github.com/rdkit/conda-rdkit

‘conda build boost’ appears to work.

Yes, the conda recipe for boost on windows currently simply performs a 
repackaging of the official boost binary distribution, so it should work in 
almost all cases.

‘conda build rdkit’ appears to download and re-install boost during 
installation.

This is probably expected, during the build process boost is installed together 
with the other build-time dependencies into a temporary environment which is 
automatically created by conda. A message saying that some packages are being 
downloaded is most likely to refer to packages that are not already available 
from the local conda cache. These packages are most usually downloaded from a 
remote distribution channel, but the list may also include packages that are 
copied from the local build directory (which I think was probably the case for 
boost).

It then fails with cmake unable to find boost, and subsequently ‘nmake error 
U1073’.

And this is quite unexpected, but also difficult to interpret with the provided 
amount of information. Could you please send a copy of the actual cmake 
command line that was issued, and/or the CMakeCache.txt file that should have 
been created inside the top-level RDKit source distribution directory at 
<path to your anaconda installation>\conda-bld\work?

Finally, I don't know if it may be of help, but some windows 64-bit packages 
for the latest RDKit release should now also be available from the binstar 
rdkit channel (in order to fetch them, you would just need to add '-c rdkit' to 
the conda create/install/update command line). The build for these packages 
passed all tests but the two related to the avalon tools.

Best,
Riccardo



[Rdkit-discuss] conda-rdkit fails to install Win7.

2014-11-18 Thread Stephen O'hagan
Trying to install conda-rdkit on win7 64-bit as per instructions 
https://github.com/rdkit/conda-rdkit

'conda build boost' appears to work.

'conda build rdkit' appears to download and re-install boost during 
installation.

It then fails with cmake unable to find boost, and subsequently 'nmake error 
U1073'.

Any ideas what might be wrong?

Cheers,
Steve.



Re: [Rdkit-discuss] remove redundant bits from bitvector fingerprints

2014-06-13 Thread Stephen O'hagan
OK, thanks for this – I’ll have a go and see if it works for me.

Cheers,
Steve.

From: Greg Landrum [mailto:greg.land...@gmail.com]
Sent: 13 June 2014 13:23
To: Stephen O'hagan
Cc: rdkit-discuss@lists.sourceforge.net
Subject: Re: [Rdkit-discuss] remove redundant bits from bitvector fingerprints

hmm, this got lost in my mailbox. Sorry.

You can do what I think you want to do using the information theory machinery 
that the rdkit has available. Here's a short snippet that finds the bits that 
are not redundant in a data set (redundancy here calculated using information 
entropy):

In [48]: ms = [Chem.MolFromSmiles(x.split()[1]) for x in 
file('./Target_no_107_58879.txt')]

In [49]: nbits = 2048

In [50]: fps = [rdMolDescriptors.GetHashedAtomPairFingerprintAsBitVect(x,nbits) 
for x in ms]

In [58]: entropies = []

In [59]: for i in range(nbits):
   ....:     arr = numpy.array([x[i] for x in fps])
   ....:     e = InfoTheory.InfoEntropy(arr)
   ....:     entropies.append(e)
   ....:

In [60]: entropies = numpy.array(entropies)

In [61]: goodbits = numpy.array(range(nbits))[entropies>0.0]

In [62]: len(goodbits)
Out[62]: 891

your case is pretty big, so this may take a bit, but it shouldn't be too slow.
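
For readers without the RDKit information theory module to hand, the same idea can be sketched in plain Python. This is an assumption-laden stand-in, not the InfoTheory code: each bit position is treated as a two-class (0/1) distribution, and bit_entropy and informative_bits are made-up names.

```python
import math


def bit_entropy(column):
    """Shannon entropy, in bits, of a sequence of 0/1 values."""
    n = len(column)
    ones = sum(column)
    entropy = 0.0
    for count in (ones, n - ones):
        if count:  # a class that never occurs contributes nothing
            p = count / n
            entropy -= p * math.log2(p)
    return entropy


def informative_bits(fingerprints):
    """Indices of the bit positions whose entropy across the set is non-zero."""
    nbits = len(fingerprints[0])
    return [i for i in range(nbits)
            if bit_entropy([fp[i] for fp in fingerprints]) > 0.0]


# Toy data: bit 0 is always off, bit 1 always on, bit 2 varies.
fps_demo = [[0, 1, 1], [0, 1, 0], [0, 1, 1], [0, 1, 0]]
print(informative_bits(fps_demo))
```

A position that is 0 in every fingerprint, or 1 in every fingerprint, has zero entropy under this definition and is dropped; only positions that vary across the set survive.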

-greg





Re: [Rdkit-discuss] remove redundant bits from bitvector fingerprints

2014-06-04 Thread Stephen O'hagan
Hi,

I have a set of say 1000 generated fingerprints each of length 39972; across 
all 1000 fingerprints many bits are the same – they contain no information 
about the differences between the 1000 molecules.

e.g. for list

01011
010110100
010101110
010100010

The first four bits are redundant, I could just record them as:

1
10100
01110
00010

In reality, the redundant bits are distributed through the bit string, so I 
need a method to determine which bits are redundant, and then remove them from 
each fingerprint.
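
To make the request concrete, here is a minimal sketch of what is meant, working directly on the strings returned by ToBitString(). The helper names are invented for illustration, and all strings are assumed to have the same length.

```python
def variable_positions(bitstrings):
    """Indices at which at least two of the equal-length bit strings differ."""
    return [i for i, column in enumerate(zip(*bitstrings))
            if len(set(column)) > 1]


def drop_constant_bits(bitstrings):
    """Remove the positions that hold the same value in every string."""
    keep = variable_positions(bitstrings)
    reduced = ["".join(s[i] for i in keep) for s in bitstrings]
    return reduced, keep


rows = ["010110100", "010101110", "010100010"]
reduced, keep = drop_constant_bits(rows)
print(keep)
print(reduced)
```

Returning keep alongside the shortened strings matters in practice: the same list of positions has to be applied to any new molecule's fingerprint if it is to be compared against the reduced set.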

Cheers,
Steve.



From: Greg Landrum [mailto:greg.land...@gmail.com]
Sent: 04 June 2014 04:40
To: Stephen O'hagan
Cc: rdkit-discuss@lists.sourceforge.net
Subject: Re: [Rdkit-discuss] remove redundant bits from bitvector fingerprints

Hi Steve,

On Tue, Jun 3, 2014 at 2:08 PM, Stephen O'hagan <soha...@manchester.ac.uk> wrote:
I have a fragment of code generating fingerprints for a long  list of molecules 
(length ~ 1000)

for index in range(0,len(smi)):
    smiles=smi[index]
    mol=Chem.MolFromSmiles(smiles)
    AllChem.EmbedMolecule(mol)
    AllChem.UFFOptimizeMolecule(mol)
    dm = Chem.Get3DDistanceMatrix(mol)
    fp = Generate.Gen2DFingerprint(mol,factory, dMat=dm)
    fp = fp.ToBitString()
    bs[index]=fp

The length of  each bitvectors generated is 39972, and the list has a lot of 
redundant ‘1’s and ‘0’s.

Is there an easy method to filter out these redundant bits?

What do you mean by redundant bits?

The length of the bit vectors is determined by the parameters you provide for 
building the pharmacophore fingerprints (number of points, number of features, 
and number of distance bins). The length of the strings that you get from 
fp.ToBitString() should be equal to this number of bits.

-greg



[Rdkit-discuss] remove redundant bits from bitvector fingerprints

2014-06-03 Thread Stephen O'hagan
I have a fragment of code generating fingerprints for a long  list of molecules 
(length ~ 1000)

for index in range(0,len(smi)):
    smiles=smi[index]
    mol=Chem.MolFromSmiles(smiles)
    AllChem.EmbedMolecule(mol)
    AllChem.UFFOptimizeMolecule(mol)
    dm = Chem.Get3DDistanceMatrix(mol)
    fp = Generate.Gen2DFingerprint(mol,factory, dMat=dm)
    fp = fp.ToBitString()
    bs[index]=fp

The length of  each bitvectors generated is 39972, and the list has a lot of 
redundant '1's and '0's.

Is there an easy method to filter out these redundant bits?

Cheers,
Steve.



Re: [Rdkit-discuss] 3D-Pharmacophore fingerprints ?

2014-05-08 Thread Stephen O'hagan
It appears that Eclipse PyDev code completion and syntax colouring was fooling 
me!

Get3DDistanceMatrix is  flagged as “undefined”, but code runs just fine!?

Cheers,
Steve.

From: Greg Landrum [mailto:greg.land...@gmail.com]
Sent: 08 May 2014 02:52
To: Stephen O'hagan
Cc: rdkit-discuss@lists.sourceforge.net
Subject: Re: [Rdkit-discuss] 3D-Pharmacophore fingerprints ?

Hmm, it is definitely there.
If you built from source and are using the new build it should be available as: 
Chem.Get3DDistanceMatrix()

-greg






Re: [Rdkit-discuss] 3D-Pharmacophore fingerprints ?

2014-05-07 Thread Stephen O'hagan
I still don’t see it in the beta of the Q1 2014 release?

From: Greg Landrum [mailto:greg.land...@gmail.com]
Sent: 02 May 2014 15:00
To: Stephen O'hagan
Cc: rdkit-discuss@lists.sourceforge.net
Subject: Re: [Rdkit-discuss] 3D-Pharmacophore fingerprints ?

I can find no Get3DDistanceMatrix defined?

It is, unfortunately, a new feature. It's in the github version of the rdkit 
and will be in the next release (available next week).




Re: [Rdkit-discuss] 3D-Pharmacophore fingerprints ?

2014-05-02 Thread Stephen O'hagan
Hi Greg,

Should that be :

dm = Chem.GetDistanceMatrix(mol)

I can find no Get3DDistanceMatrix defined?

For a list of molecules, do we recalculate the dm for each one, or do we use 
one molecule’s dm as a ‘reference’?

Without trawling through the source code, I’m not clear what’s actually being 
done here as the documentation is a bit Spartan.

Is there any reference to a journal article?

Cheers,
Steve.

From: Greg Landrum [mailto:greg.land...@gmail.com]
Sent: 01 May 2014 14:57
To: Stephen O'hagan
Cc: rdkit-discuss@lists.sourceforge.net
Subject: Re: [Rdkit-discuss] 3D-Pharmacophore fingerprints ?

Steve,

On Thu, May 1, 2014 at 12:23 PM, Stephen O'hagan <soha...@manchester.ac.uk> wrote:
Would it be possible to generate 3D-pharmacophore fingerprints similar to the 
existing 2D ones?


Yes. The function Generate.Gen2DFingerprint() takes an optional argument dMat 
which can be used to provide the distance matrix. If you pass this a 3D 
distance matrix, you get a 3D pharmacophore fingerprint.

Here's a crude example:

In [34]: m = Chem.MolFromSmiles('OCN')

In [35]: AllChem.EmbedMolecule(m)
Out[35]: 0

In [36]: dm = Chem.Get3DDistanceMatrix(m)

In [37]: from rdkit.Chem.Pharm2D import Gobbi_Pharm2D,Generate

In [38]: factory = Gobbi_Pharm2D.factory

In [39]: sig1 = Generate.Gen2DFingerprint(m,factory)

In [40]: sig2 = Generate.Gen2DFingerprint(m,factory,dMat=dm)

In [41]: sig1==sig2
Out[41]: False

In [42]: sig1.GetOnBits()[0]
Out[42]: 116

In [43]: sig2.GetOnBits()[0]
Out[43]: 115

In [44]: factory.GetBitDescription(115)
Out[44]: 'BG HA |0 3|3 0|'

In [45]: factory.GetBitDescription(116)
Out[45]: 'BG HA |0 4|4 0|'
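
To illustrate what kind of matrix is being passed as dMat, here is a plain-Python sketch of a 3D distance matrix built from atom coordinates. It is not the RDKit implementation; distance_matrix_3d and the coordinates are invented for the example. The topological matrix from Chem.GetDistanceMatrix counts bonds between atoms, whereas this kind of matrix holds straight-line distances in the conformer.

```python
import math


def distance_matrix_3d(coords):
    """Pairwise Euclidean distances between (x, y, z) points."""
    return [[math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))
             for b in coords]
            for a in coords]


# Three made-up atom positions.
coords = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 2.0)]
dm_demo = distance_matrix_3d(coords)
print(dm_demo[0][1], dm_demo[0][2])
```

Because the matrix is N x N for the N atoms of one molecule and depends on that molecule's coordinates, it seems each molecule needs its own matrix, computed from its own conformer, rather than a shared reference.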

I hope this helps,
-greg



[Rdkit-discuss] 3D-Pharmacophore fingerprints ?

2014-05-01 Thread Stephen O'hagan
Would it be possible to generate 3D-pharmacophore fingerprints similar to the 
existing 2D ones?


Dr. Steve O'Hagan,
Computer Officer,
Bioanalytical Sciences Group,
School of Chemistry,
Manchester Institute of Biotechnology,
University of Manchester,
131, Princess St,
MANCHESTER M1 7DN.

Email: soha...@manchester.ac.uk
Phone: 0161 306 4562



Re: [Rdkit-discuss] RDKit pharmacophore features

2014-04-24 Thread Stephen O'hagan
OK,

Adding:

AllChem.EmbedMolecule(m1)
AllChem.UFFOptimizeMolecule(m1)

Fixed the problem. Now to work out what it all means!

From: Greg Landrum [mailto:greg.land...@gmail.com]
Sent: 24 April 2014 04:39
To: Stephen O'hagan
Cc: rdkit-discuss@lists.sourceforge.net
Subject: Re: [Rdkit-discuss] RDKit pharmacophore features

Hi Steve,

On Wed, Apr 23, 2014 at 5:41 PM, Stephen O'hagan <soha...@manchester.ac.uk> wrote:
I’m trying to understand how the RDKit pharmacophore features work; tried this 
fragment from a previous post:
import os
from rdkit import Chem
from rdkit.Chem import ChemicalFeatures
from rdkit import Geometry
from rdkit import RDConfig
from rdkit.Chem import AllChem
from rdkit.Chem.Pharm3D import Pharmacophore, EmbedLib
m1 = Chem.MolFromSmiles('Cc1c1')
FEATURE_DEF_FILE = os.path.join(RDConfig.RDDataDir,'BaseFeatures.fdef')
feat_factory = ChemicalFeatures.BuildFeatureFactory(FEATURE_DEF_FILE)
feats = feat_factory.GetFeaturesForMol(m1)
pcophore = Pharmacophore.Pharmacophore(feats)
 I get an immediate CTD on the call to Pharmacophore.Pharmacophore(feats)

 There are two bugs here:
1) in your code the molecule has no conformations generated, so trying to 
create a pharmacophore from the features associated with that molecule should 
not work
2) instead of an error message or exception you get a crash (seg fault on linux 
or the mac). The RDKit should never do that...

If you add coordinates (2D or 3D) to your molecule before constructing the 
pharmacophore, your code should work.
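
Until the crash is fixed, a small guard along these lines could turn it into an ordinary Python error. It is a sketch only: require_coordinates is a made-up helper, and it relies on the molecule object exposing GetNumConformers(), as RDKit molecules do.

```python
def require_coordinates(mol):
    """Return mol unchanged, or raise ValueError if it has no conformer."""
    if mol.GetNumConformers() == 0:
        raise ValueError(
            "molecule has no conformers; embed it or compute 2D "
            "coordinates before building a pharmacophore"
        )
    return mol


# Intended use, with RDKit installed (not run here):
#     feats = feat_factory.GetFeaturesForMol(require_coordinates(m1))
```

Calling it right before GetFeaturesForMol keeps the check next to the code that needs the coordinates.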

Best,
-greg



[Rdkit-discuss] RDKit pharmacophore features

2014-04-23 Thread Stephen O'hagan
I'm trying to understand how the RDKit pharmacophore features work; tried this 
fragment from a previous post:


import os
from rdkit import Chem
from rdkit.Chem import ChemicalFeatures
from rdkit import Geometry
from rdkit import RDConfig
from rdkit.Chem import AllChem
from rdkit.Chem.Pharm3D import Pharmacophore, EmbedLib

m1 = Chem.MolFromSmiles('Cc1c1')


FEATURE_DEF_FILE = os.path.join(RDConfig.RDDataDir,'BaseFeatures.fdef')
feat_factory = ChemicalFeatures.BuildFeatureFactory(FEATURE_DEF_FILE)
feats = feat_factory.GetFeaturesForMol(m1)
pcophore = Pharmacophore.Pharmacophore(feats)

I get an immediate CTD on the call to Pharmacophore.Pharmacophore(feats)

32-bit Python 2.7; windows 7; RDKit binaries from RDKit_2013_09_2.win32.py27

Any ideas?

Cheers,
Steve.
