[Rdkit-discuss] calculation of spherical harmonics coefficients for 3D molecules?

2020-08-03 Thread James T. Metz via Rdkit-discuss
RDkit Discussion Group,
    Has anyone implemented code to calculate spherical harmonics coefficients 
for 3D conformations of molecules?  The approach I have in mind is described in 
Morris, R.J. et al., Real Spherical Harmonic expansion coefficients for protein 
binding pocket and ligand comparisons, Bioinformatics, 21 (2005) 2347 - 2355.
    Regards,
    Jim Metz

___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


[Rdkit-discuss] molecular dynamics using RDkit only

2019-04-13 Thread James T. Metz via Rdkit-discuss
RDkit Discussion Group,
    I am aware of RDkit scripts that use the MMFF force field to minimize 
small molecules.  Has anyone written RDkit code to perform molecular dynamics 
(MD) of small molecules or protein-ligand complexes using only RDkit and 
existing RDkit force fields?  I am aware of a number of other programs that 
perform MD, but at present I am specifically interested in RDkit/Python-only 
code, i.e., no other dependencies.  If anyone has any code, even if 
preliminary, that calculates the potential energies, forces, velocities, 
accelerations, etc. to propagate the motions of the atoms, and is willing to 
share it, that would be much appreciated.
    Regards,
    Jim Metz
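
The propagation step asked about above (energies, forces, velocities, accelerations) can be sketched with a velocity Verlet integrator.  This is a minimal sketch under stated assumptions, not existing RDkit functionality: the names velocity_verlet, harmonic_force and total_energy are hypothetical, and the harmonic force is only a placeholder for the negative gradient that one would presumably obtain from an RDkit force field object (to my understanding its CalcGrad() method returns the gradient).

```python
def harmonic_force(positions, k=1.0):
    """Placeholder force F = -k * x per coordinate (hypothetical stand-in
    for the negative gradient of a real force field)."""
    return [-k * x for x in positions]


def velocity_verlet(positions, velocities, masses, force_fn, dt, n_steps):
    """Propagate coordinates and velocities with the velocity Verlet scheme."""
    x = list(positions)
    v = list(velocities)
    f = force_fn(x)
    for _ in range(n_steps):
        # half-step velocity update using the current forces
        v = [vi + 0.5 * dt * fi / m for vi, fi, m in zip(v, f, masses)]
        # full-step position update
        x = [xi + dt * vi for xi, vi in zip(x, v)]
        # new forces at the new positions, then second half-step for velocities
        f = force_fn(x)
        v = [vi + 0.5 * dt * fi / m for vi, fi, m in zip(v, f, masses)]
    return x, v


def total_energy(x, v, masses, k=1.0):
    """Kinetic plus harmonic potential energy for the toy system."""
    kinetic = 0.5 * sum(m * vi * vi for m, vi in zip(masses, v))
    potential = 0.5 * k * sum(xi * xi for xi in x)
    return kinetic + potential


# Toy system: one coordinate, unit mass, released from x = 1 at rest
x0, v0, m = [1.0], [0.0], [1.0]
x1, v1 = velocity_verlet(x0, v0, m, harmonic_force, dt=0.01, n_steps=1000)
e0 = total_energy(x0, v0, m)
e1 = total_energy(x1, v1, m)
```

With a small time step the total energy should stay close to its starting value, which is a quick sanity check before swapping in real force-field gradients.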



[Rdkit-discuss] Structure-Based Drug Design

2019-03-16 Thread James T. Metz via Rdkit-discuss
RDkit Discussion Group,
    RDkit has quite a number of useful tools and algorithms for ligand-based
drug design (LBDD).  However, what about structure-based drug design (SBDD)?  
Perhaps a few questions to motivate the discussion.
    1) Since RDkit supposedly includes the force field MMFF, does this mean 
that one can read in reasonably prepared proteins (from a PDB file) and 
ligands (from a MOL file) and compute energies of the complex, proteins, and 
ligands and presumably interaction energies, etc.?
    2) Visualization is clearly important in SBDD.  Has anyone developed a tool
that nicely integrates macromolecular editing and visualization with RDkit?
    3) Given the visualization capabilities of Jupyter, has anyone developed
Jupyter/RDkit scripts for #1 and #2?
    I welcome thoughts and comments especially from those who have been thinking
about or are wrestling with SBDD and RDkit integration.  Thank you.
    Regards,
    Jim Metz






[Rdkit-discuss] AM1-BCC charges for small molecules

2019-03-11 Thread James T. Metz via Rdkit-discuss
RDkit Discussion Group,
    I am interested in generating and assigning AM1-BCC charges to small 
molecules, preferably in batch mode.  I understand this topic has been discussed 
previously, but has there been RDkit code written to do this?  Since this relies 
on the results of AM1 calculations, has anyone perhaps written RDkit code to 
calculate and assign the charges if I have already generated a MOPAC output 
file by some other means? 
    I greatly appreciate all the capabilities of RDkit, and not to be 
off-topic, but if someone is aware of a non-RDkit way to generate AM1-BCC 
charges, that might work for me.  Please let me know.  Thank you. 
    Regards,
    Jim Metz




Re: [Rdkit-discuss] atomic contributions to molecular polarizability

2019-01-11 Thread James T. Metz via Rdkit-discuss
Greg,
    Thank you for your response and for your GitHub site where you are 
generously providing snippets of code to help others.  I agree that the atomic 
number to polarizability approach is crude.  However, experimental 
polarizabilities for small organics are available in the CRC Handbook and 
other sources.  Hence, the error of this approach can be readily estimated.  
Thanks again for your help - greatly appreciated.  
    Regards,
    Jim Metz



-Original Message-
From: Greg Landrum 
To: James T. Metz 
Cc: RDKit Discuss 
Sent: Fri, Jan 11, 2019 1:51 am
Subject: Re: [Rdkit-discuss] atomic contributions to molecular polarizability

Hi Jim,
On Thu, Jan 10, 2019 at 11:59 PM James T. Metz via Rdkit-discuss 
 wrote:


    I would like to calculate and be able to visualize the atomic 
contributions to the total molecular polarizability of small organic molecules.  
Apparently there is a molecular descriptor, apol, that is the sum total from the 
atomic contributions to polarizability available in some programs e.g., CDK.

The CDK documentation doesn't include a citation for this (aside from a link to 
a web page that no longer exists), but since the code is there and this 
descriptor is super simple, it's easy to re-implement:
https://gist.github.com/greglandrum/7936fcf631bfdae0041e298421554bec


    Does anyone have code that will compute and write out the atomic 
contributions to molecular polarizability to either the b factor column 
or perhaps the charge column of PDB or MOL2 files, respectively?  I could then 
use other programs to visualize the structures with those numbers.  Thank you.

The gist linked above shows how to store values in the temperature factors 
that end up in PDB output.

It's worth pointing out that this descriptor is pretty crude: the atomic 
contribution is determined solely by the atomic number, not by atom 
environment.  The RDKit includes a more complicated (presumably more accurate?) 
polarizability descriptor: the Molar Refractivity (MR) values. The gist also 
shows how to get the atomic contributions to this.

I hope this helps,
-greg


Re: [Rdkit-discuss] atomic contributions to molecular polarizability

2019-01-10 Thread James T. Metz via Rdkit-discuss
Carlos,
    Thank you for your suggestion.  However, rather than wrestle with the 
technical details of obtaining polarizabilities from new quantum mechanics 
schemes, I am simply interested in extracting the atomic contributions that are 
apparently already available for a descriptor called apol, which is apparently 
in the CDK descriptor set.  I am not sure, but I think there are some overlaps 
between CDK and RDkit descriptors.  I am hoping, perhaps, that someone has coded 
the atomic contributions for polarizability and hence, with minor modifications 
to the code, can expose the atomic contributions.  At this point, I am willing 
to accept the approximate values from the apol descriptor.
    Regards,
    Jim Metz



[Rdkit-discuss] atomic contributions to molecular polarizability

2019-01-10 Thread James T. Metz via Rdkit-discuss
RDkit Discussion Group,
    I would like to calculate and be able to visualize the atomic 
contributions to the total molecular polarizability of small organic molecules.  
Apparently there is a molecular descriptor, apol, that is the sum total from the 
atomic contributions to polarizability available in some programs e.g., CDK.
    Does anyone have code that will compute and write out the atomic 
contributions to molecular polarizability to either the b factor column 
or perhaps the charge column of PDB or MOL2 files, respectively?  I could then 
use other programs to visualize the structures with those numbers.  Thank you.
    Regards,
    Jim Metz




[Rdkit-discuss] How to encode atomic contributions to logP (hydrophobicity) in MOL2 formatted charge slot?

2018-11-13 Thread James T. Metz via Rdkit-discuss
RDkit Discussion Group,
    Given a set of small molecules as an SDF file, I would like to generate a 
MOL2 file where the atomic contributions to logP (hydrophobicity) from each atom, 
including hydrogens, have been calculated and are now encoded in the partial 
atomic charge "slot" in a MOL2 file.  Is this possible using RDkit?
    I have found code that calculates the atomic contributions for the Crippen
logP model and then generates a colorized 2D plot:

>>> from rdkit.Chem import rdMolDescriptors
>>> from rdkit.Chem.Draw import SimilarityMaps
>>> contribs = rdMolDescriptors._CalcCrippenContribs(mol)
>>> fig = SimilarityMaps.GetSimilarityMapFromWeights(mol, [x for x, y in contribs],
...                                                  colorMap='jet', contourLines=10)

    However, I would like to encode the atomic contributions as partial atomic 
charges so that this information can be written out in a MOL2 file for each atom.
    Does anyone have PYTHON/RDkit code to do this?  Thank you.

    Regards,
    Jim Metz
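
As far as I know RDkit does not provide a MOL2 writer, so one option is to format the atom records by hand.  The following is a minimal sketch, assuming the per-atom values (for example the logP half of each pair from _CalcCrippenContribs, computed after adding hydrogens) are already in a list; mol2_atom_lines is a hypothetical helper, the numbers shown are illustrative rather than computed, and only the @<TRIPOS>ATOM section is produced, not a complete MOL2 file.

```python
def mol2_atom_lines(atoms, values, subst_id=1, subst_name="LIG"):
    """Format @<TRIPOS>ATOM records with one value per atom in the charge column.

    atoms  -- list of (name, x, y, z, sybyl_type) tuples
    values -- per-atom numbers to place in the charge column
    """
    if len(atoms) != len(values):
        raise ValueError("need exactly one value per atom")
    lines = ["@<TRIPOS>ATOM"]
    for i, ((name, x, y, z, atype), val) in enumerate(zip(atoms, values), start=1):
        # columns: id, name, x, y, z, type, substructure id, substructure name, charge
        lines.append(
            f"{i:>7d} {name:<8s}{x:>10.4f}{y:>10.4f}{z:>10.4f} "
            f"{atype:<6s}{subst_id:>4d} {subst_name:<8s}{val:>10.4f}"
        )
    return lines


# Illustrative input only: two atoms with made-up contribution values
atoms = [("C1", 0.0, 0.0, 0.0, "C.3"), ("O1", 1.43, 0.0, 0.0, "O.3")]
contribs = [0.1441, -0.2893]
lines = mol2_atom_lines(atoms, contribs)
```

A viewer that colors by partial charge would then display these values; the Sybyl atom types and coordinates still have to come from elsewhere.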



    




[Rdkit-discuss] Butina clustering with additional output

2018-09-20 Thread James T. Metz via Rdkit-discuss
RDkit Discussion Group,
    I note that RDkit can perform Butina clustering.  Given an SDF of small 
molecules, I would like to cluster the ligands, but obtain additional information 
from the clustering algorithm.  In particular, I would like to obtain the 
cluster number and Tanimoto distance from the centroid for every ligand in the 
SDF.  The centroid would obviously have a distance of 0.00.
    Has anyone written additional RDkit code to extract this additional 
information?  Thank you.
    Regards,
    Jim Metz
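
The cluster number and distance to centroid requested above can be recovered by post-processing the clustering output.  This is a minimal sketch, assuming the clusters come from rdkit.ML.Cluster.Butina.ClusterData (which returns index tuples with each cluster's centroid listed first) and that the distances are kept in the same condensed lower-triangle list passed to that function.  The helper names condensed_index and annotate_clusters are hypothetical, and the toy numbers are made up.

```python
def condensed_index(i, j):
    """Position of the (i, j) distance in a lower-triangle list built as
    for i in range(1, n): for j in range(i): dists.append(d(i, j))."""
    if i == j:
        raise ValueError("no entry is stored for the diagonal")
    if i < j:
        i, j = j, i
    return i * (i - 1) // 2 + j


def annotate_clusters(clusters, dists):
    """Map molecule index -> (cluster number, distance to that cluster's centroid)."""
    result = {}
    for cluster_id, members in enumerate(clusters):
        centroid = members[0]  # assumption: the centroid is the first index
        for idx in members:
            d = 0.0 if idx == centroid else dists[condensed_index(idx, centroid)]
            result[idx] = (cluster_id, d)
    return result


# Toy data for 4 molecules; distances ordered (1,0), (2,0), (2,1), (3,0), (3,1), (3,2)
dists = [0.2, 0.9, 0.8, 0.85, 0.9, 0.1]
clusters = ((0, 1), (2, 3))  # same shape as the Butina output
info = annotate_clusters(clusters, dists)
```

Each entry of info could then be written back to the corresponding molecule as SD properties before saving the SDF.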



[Rdkit-discuss] descriptors beyond rotatable bond count and possible correlations with entropy

2018-08-27 Thread James T. Metz via Rdkit-discuss

RDkit users,


Is there an RDkit descriptor (or code) to determine the largest number of 
contiguous rotatable bonds in a small molecule?


Hmmm... it seems likely that ligand conformational flexibility might be
somehow related to the entropy component of ligand binding.  Has anyone made
a plot of the experimental TdS term from calorimetry vs. any number of 
computational measures of ligand flexibility?  Any correlation?


Regards,
Jim Metz
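
For the first question, one possible reading of "largest number of contiguous rotatable bonds" is the longest path made up only of rotatable bonds.  The sketch below works under that assumption and takes the rotatable bonds as pairs of atom indices, for instance the substructure matches of a rotatable-bond SMARTS pattern.  The function name longest_contiguous_chain is hypothetical, and the exhaustive search is only intended for small molecules.

```python
from collections import defaultdict


def longest_contiguous_chain(rotatable_bonds):
    """Length, in bonds, of the longest simple path that uses only the given bonds."""
    neighbors = defaultdict(set)
    for a, b in rotatable_bonds:
        neighbors[a].add(b)
        neighbors[b].add(a)

    def longest_from(atom, visited):
        # depth-first search that never revisits an atom on the current path
        best = 0
        for nxt in neighbors[atom]:
            if nxt not in visited:
                best = max(best, 1 + longest_from(nxt, visited | {nxt}))
        return best

    return max((longest_from(a, {a}) for a in list(neighbors)), default=0)


# Toy input: a chain of three bonds plus one isolated rotatable bond
chain_len = longest_contiguous_chain([(1, 2), (2, 3), (3, 4), (6, 7)])
# Toy input: three bonds meeting at atom 2, so the longest path has two bonds
branch_len = longest_contiguous_chain([(1, 2), (2, 3), (2, 5)])
```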






--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot


[Rdkit-discuss] How to get more CPU utilization for RDkit calculations

2018-07-06 Thread James T. Metz via Rdkit-discuss

RDkit Discussion Group,


I am running the RDkit diverse max min picker code on my WINDOWS 7
64 bit 4 dual core I7 computer.  I have tested the code on a small set of 
compounds
and the results are OK.


I am now running diverse subset picking on an SDF containing about 40k 
compounds.  The CPU usage across all 8 CPUs is about 14 - 15 %.  Are there
any flags or parameters that can be set in the PYTHON or RDkit code to get
more CPU utilization?  Thank you.


Jim Metz

Metz Research LLC






[Rdkit-discuss] Interest in a RDkit UGM in the USA midwest?

2018-04-10 Thread James T. Metz via Rdkit-discuss
RDkit Discussion Group,


Is there interest in having an RDkit UGM in the USA Midwest, perhaps somewhere
in the Chicago area?  I would recommend a similar format to the European UGM and
I would insist that presenters deposit their slides and code on the RDkit 
GitHub site.


Regards,

Jim Metz







[Rdkit-discuss] appropriate literature reference for RDkit 2017.03.1

2018-03-27 Thread James T. Metz via Rdkit-discuss
RDkit Discussion Group,


What is an appropriate literature reference for RDkit 2017.03.1?


Thank you.



Regards,

Jim Metz

Northwestern University



[Rdkit-discuss] suggestions or ideas for SMILES for an electron?

2018-02-28 Thread James T. Metz via Rdkit-discuss
RDkit Discussion Group,


I am trying to get RDkit to recognize reactants and products in RedOx 
reactions, including reactions which include explicit electrons, using SMILES 
as patterns.  Does anyone have any suggestions or ideas for how to deal with 
an explicit electron?


I have tried:



pattern = '[-1]'
match = mol_object.GetSubstructMatches(pattern)



but this does not work.


I am hesitant to use [H-1], which is read in and is processed, but it 
technically refers to hydride, which is a legitimate molecular species and is 
obviously not an electron.


I can think of (possibly) another hack involving a large atomic number 
(beyond typical organics), but I am wondering if someone has a more intelligent 
solution, or perhaps there is some unusual or special SMILES notation that I am 
not aware of.


Ideas or suggestions are welcome.  Thank you.



Regards,
Jim Metz

Northwestern University







[Rdkit-discuss] RDkit ionization state prediction at a given pH

2018-02-02 Thread James T. Metz via Rdkit-discuss
RDkit Discussion Group,


Can RDkit predict the ionization state of a molecule at a given pH by
calculating the pKa, predominant species at that pH, etc?


I have already checked the RDkit discussion email archive, and per
my searching there was a response from Greg Landrum about 10-18-2017
stating that this was attempted, but was not successfully completed due
to technical difficulties (and was abandoned?).


Hence, I am inquiring if anyone has done further work in this area.



Thank you.



Regards,

Jim Metz







[Rdkit-discuss] automatic generation of reaction SMARTS in RDkit

2018-02-01 Thread James T. Metz via Rdkit-discuss
RDkit discussion group,


Is it possible to automatically generate reaction SMARTS in
RDkit from reactant(s) and product(s)?  If so, are there various
"trims" that can be performed to focus on the reaction center,
going out a certain number of atoms, which could be used to specify
reaction specificity?


For example, suppose I have



reactant_smiles = 'CCO'
product_smiles = 'CC=O'


I am looking for PYTHON/RDkit code that would automatically
generate reaction SMARTS such as:


[*:1]-[C]-[O]>>[*:1]-[C]=[O]



I would be grateful for any suggestions, code, literature examples, etc.

Thank you.




Regards,

Jim Metz







[Rdkit-discuss] suggestions for comprehensive searchable database of natural products

2017-11-27 Thread James T. Metz via Rdkit-discuss
RDkit Discussion Group,


My apologies in advance if my request is not appropriate for this 
discussion group.


Given a small molecule that might have some resemblance to natural products,
can someone suggest a free, comprehensive, PYTHON/RDkit searchable database
of natural products that might be suitable for similarity and substructure 
searching?


I am aware of a few websites that permit searching on the website.  If possible,
I would like to programmatically search by running a PYTHON/RDkit script on my
local machine and then return the structures of related molecules to my local 
script.


I would prefer not having to download and store a huge database.


Also, if possible, it would be important to return the organism(s) that create
the natural product.  Pathway information would also be very, very helpful.


I greatly welcome comments and suggestions.



Thank you.



Regards,

Jim Metz

Northwestern University











Re: [Rdkit-discuss] transformation of an atom or group of atoms by atom indices

2017-11-09 Thread James T. Metz via Rdkit-discuss
Jason,


Exactly what I need to do.  Thank you for your example code.  Much 
appreciated.


Regards,
Jim Metz





-Original Message-
From: Jason Biggs <jasondbi...@gmail.com>
To: James T. Metz <jamestm...@aol.com>
Cc: RDKit Discuss <rdkit-discuss@lists.sourceforge.net>
Sent: Thu, Nov 9, 2017 3:28 pm
Subject: Re: [Rdkit-discuss] transformation of an atom or group of atoms by 
atom indices



Something along these lines?


from rdkit import Chem

mol = Chem.MolFromSmiles('')
emol = Chem.EditableMol(mol)
emol.ReplaceAtom(3, Chem.Atom(1))  # swap the atom at index 3 for hydrogen (atomic number 1)
mol = emol.GetMol()


In [12]: Chem.MolToSmiles(mol)
Out[12]: '[H]CCC'

In [13]: Chem.MolToSmiles(Chem.RemoveHs(mol))
Out[13]: 'CCC'


Jason Biggs






[Rdkit-discuss] transformation of an atom or group of atoms by atom indices

2017-11-09 Thread James T. Metz via Rdkit-discuss
RDKit Discussion Group,


Suppose I have a molecule



smiles1 = ''


The carbon atoms will be assigned indices 0, 1, 2, and 3.



Suppose I want to specifically change carbon 3 to a hydrogen.
Is this possible using RDkit?


I am aware of using SMARTS to match a pattern and then change
that group of atoms.  In my example, I would like to be able to
change an atom or group of atoms based on the atom indices, not
a SMARTS pattern, which might be problematic for molecules with
local symmetry.


Regards,

Jim Metz







Re: [Rdkit-discuss] Python code to merge tuples from a SMARTS match

2017-11-08 Thread James T. Metz via Rdkit-discuss


Brian, Greg, and David,


Thank you for your suggestions.  I will try to respond to your questions 
and comments:


I am trying to reproduce results from a literature paper that used non-PYTHON
and non-RDkit code to identify certain patterns in molecules as part of a group
contribution scheme resulting in the prediction of thermodynamic quantities.  I
have a training set of molecules and the results of calculations for that
training set (individual counts of groups of atoms and resulting energies).
Hence, my first goal is to reproduce the results reported for that training set,
but using PYTHON and RDkit.  Since my goal is to reproduce literature results as
closely as possible, I am not in a position to debate the logic of the original
authors in their assignments of SMARTS/SMILES matching and counts.


After this initial goal is met, I might consider alternative pattern matching
and counting schemes and compare those results to the literature results.  In
fact, that would be good science.


As I mentioned in my first email on this topic, I do think I have come up with
a "rule" that will give me the correct answer (I have tried it for 8 cases using
pencil and paper); my challenge is to code up the "rule" in PYTHON.  I am a
beginner at PYTHON, so I am struggling to get this idea into functional, bug-free
code.  Peter Shenkin's idea/code is getting close to what needs to be done, but
doesn't handle all the cases.


Regards,

Jim Metz




-Original Message-
From: Brian Cole <col...@gmail.com>
To: James T. Metz <jamestm...@aol.com>
Cc: RDKit Discuss <rdkit-discuss@lists.sourceforge.net>
Sent: Tue, Nov 7, 2017 7:23 pm
Subject: Re: [Rdkit-discuss] Python code to merge tuples from a SMARTS match



You can use Chem.CanonicalRankAtoms to de-duplicate the SMARTS matches based 
upon the atom symmetry like this: 



from rdkit import Chem

def count_unique_substructures(smiles, smarts):
    mol = Chem.MolFromSmiles(smiles)
    # with breakTies=False, symmetry-equivalent atoms share the same rank
    ranks = list(Chem.CanonicalRankAtoms(mol, breakTies=False))
    pattern = Chem.MolFromSmarts(smarts)

    unique_sets_of_atoms = set()
    for match in mol.GetSubstructMatches(pattern):
        # describe each match by the ranks of its atoms, ignoring order
        match_ranks = frozenset([ranks[idx] for idx in match])
        unique_sets_of_atoms.add(match_ranks)

    return len(unique_sets_of_atoms)



However, this returns 1 for each of your cases. It's not clear to me why you 
would want your 2nd case to return 2 as all paths from a chlorine to a chlorine 
through 2 carbons are symmetric. 



>>> SMARTS = '[Cl]-[C,c]-,=,:[C,c]-[Cl]'
>>> smiles1 = 'ClC(Cl)CCl'
>>> smiles2 = 'ClC(Cl)C(Cl)(Cl)(Cl)'

>>> count_unique_substructures(smiles1, SMARTS)

1
>>> count_unique_substructures(smiles2, SMARTS)
1


-Brian








Re: [Rdkit-discuss] Python code to merge tuples from a SMARTS match

2017-11-08 Thread James T. Metz via Rdkit-discuss
Peter,


Thank you for your suggestions and accompanying code.


I have modified your code slightly and have created 3 tuples
for testing.  Your code works for tuples, match1 and match2, but
does not work for match3.  The code should return a 2 for match3,
because there are 2 sets of 3 tuples each containing 4 atom indices.
Using my "rule" that, "if 3 indices are the same, they are in one group 
and one must form the groups of the largest possible size", one arrives
at 2 groups.  The merge function should then select one tuple from
each group, resulting in a count of 2 (for the final number of groups).


Keep in mind that I will not know how many groups of tuples will be
created for any given molecule.  Hence, I cannot use hard-coded array
indices.


Any ideas how to modify the code below to obtain the desired result
for tuple, match3, and how to deal with tuples of various sizes?


Regards,

Jim Metz








def merge2(matches):
    if len(matches) > 1:
        d = {}
        for match in matches:
            # note: this indexes `matches`, not `match`, so the key t is the
            # same on every pass and d ends up with a single entry
            t = (matches[0], matches[1])
            if (matches[0] < matches[1]):
                t = (matches[0], matches[1])
            else:
                t = (matches[1], matches[0])
            d[t] = match
        merged_match = (d[t],)
    else:
        merged_match = matches

    count = len(merged_match)
    return(count)


match1 = ((0,2,3,4),)
match2 = ((0,2,3,4), (1,2,3,4))
match3 = ((0,2,4,5), (1,2,5,6), (2,3,4,5), (2,3,5,6), (0,2,5,6), (1,2,4,5))
matches = match2   # Change the number to test different tuples


output = merge2(matches)
print("Output is   ", output)
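
One way to read the rule quoted above ("if 3 indices are the same, they are in one group, and one must form the groups of the largest possible size") is as a greedy grouping.  The following is a sketch under that assumption, not code from the thread: count_merged_groups is a hypothetical name, it works for tuples of any length by grouping on all but one index, and a greedy choice is a heuristic that is not guaranteed to give the fewest groups in every case.

```python
from itertools import combinations


def count_merged_groups(matches):
    """Greedily merge tuples that share all but one index; return the group count."""
    remaining = {tuple(sorted(m)) for m in matches}
    count = 0
    while remaining:
        # bucket the remaining tuples under every "all but one index" key
        groups = {}
        for m in remaining:
            for key in combinations(m, len(m) - 1):
                groups.setdefault(key, set()).add(m)
        # take the largest bucket as one merged group and remove its members
        largest = max(groups.values(), key=len)
        remaining -= largest
        count += 1
    return count


match1 = ((0, 2, 3, 4),)
match2 = ((0, 2, 3, 4), (1, 2, 3, 4))
match3 = ((0, 2, 4, 5), (1, 2, 5, 6), (2, 3, 4, 5), (2, 3, 5, 6), (0, 2, 5, 6), (1, 2, 4, 5))
counts = [count_merged_groups(m) for m in (match1, match2, match3)]
```

For the three test tuples this gives 1, 1 and 2 groups, matching the counts described in the message.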










-Original Message-
From: Peter S. Shenkin <shen...@gmail.com>
To: James T. Metz <jamestm...@aol.com>
Cc: RDKit Discuss <rdkit-discuss@lists.sourceforge.net>
Sent: Tue, Nov 7, 2017 7:05 pm
Subject: Re: [Rdkit-discuss] Python code to merge tuples from a SMARTS match



I think you probably used a slightly different SMILES than the one you showed. 
The one you showed should have given ((0,1,3,4),(2,1,3,4)).


The proper merge rule would then be to consider all matches equivalent if the 
2nd and 3rd atom in the match agree, in any order; i.e, the two carbons, 
indices 1 and 3 in this case. 


So to do this, for each molecule, do something like this:


d = {}
for match in matches:
    # key on the two central atoms, smaller index first
    if match[1] < match[2]:
        t = (match[1], match[2])
    else:
        t = (match[2], match[1])
    d[t] = match


You will wind up with as many dictionary elements as there are unique matches.


-P.
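
The merge rule described above can be wrapped in a small function and checked on the two matches mentioned in the message.  This is a minimal sketch; count_by_central_pair is a hypothetical name, and it assumes positions 1 and 2 of every match are the two carbons, as they are for the four-atom SMARTS in this thread.

```python
def count_by_central_pair(matches):
    """Count matches that differ in their two central atoms (positions 1 and 2)."""
    unique = {}
    for match in matches:
        # order-independent key for the central pair
        key = tuple(sorted((match[1], match[2])))
        unique[key] = match  # a later match on the same pair replaces the earlier one
    return len(unique)


matches = ((0, 1, 3, 4), (2, 1, 3, 4))
n_unique = count_by_central_pair(matches)
# the same carbon pair traversed in the opposite direction is also merged
n_reversed = count_by_central_pair(((0, 1, 3, 4), (4, 3, 1, 0)))
```

Both matches share the carbon pair (1, 3), so they collapse to a single entry.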
 




[Rdkit-discuss] Python code to merge tuples from a SMARTS match

2017-11-07 Thread James T. Metz via Rdkit-discuss
RDkit Discussion Group,




I have written a SMARTS to detect vicinal chlorine groups
using RDkit.  There are 4 atoms involved in a vicinal chlorine group.


SMARTS = '[Cl]-[C,c]-,=,:[C,c]-[Cl]'


I am trying to count the number of ("unique") occurrences of this
pattern.


For some molecules with symmetry, this results in
over-counting.


For the molecule, smiles1 below, I want to obtain
a count of 1 i.e., 1 tuple of 4 atoms.


smiles1 = 'ClC(Cl)CCl'


However, using the SMARTS above, I obtain 2 tuples of 4 atoms.  
Beginning with a MOL file representation of smiles1, I get


((1,2,4,3), (0,2,4,3))


One possible solution is to somehow merge the two tuples according 
to a "rule."  One rule that works is "if 3 of the atom indices are the same, 
then combine into one tuple."


However, the rule needs a bit of modification for more complicated
cases (higher symmetry).


Consider


smiles2 = 'ClC(Cl)CCl(Cl)(Cl)'


My goal is to get 2 tuples of 4 atoms for smiles2.


smiles2 is somewhat tricky because there are either
2 groups of 3 (4 atom) tuples, or 3 groups of 2 (4 atom)
tuples depending on how you choose your 3 atom indices.


Again, if my goal is to get 2 tuples, then I need to somehow
pick the largest group, i.e., 2 groups of 3 tuples to do the merge 
operation which will give me 2 remaining groups (desired).


I have already checked stackoverflow and a few other places
for PYTHON code to do the necessary merging, but I could not
find anything specific and appropriate.


I would be most grateful if anyone has ideas how to do this.  I
suspect the answer is a few lines of well-written PYTHON code, 
and not modifying the SMARTS (I could be mistaken!).


Thank you.



Regards,

Jim Metz






[Rdkit-discuss] ctest

2017-11-05 Thread James T. Metz via Rdkit-discuss
RDkit Experts,


What is ctest?  Is it fully documented somewhere?



I am running PYTHON 3.5.2/RDkit 2017.03.1 on WINDOWS 7 via
Pycharm 2017.2.3


Is it possible to run ctest in this environment?  ctest seems to be a good way
to exercise and test many of the capabilities of RDkit, some of which I am
not familiar with.  Thank you.


Regards,

Jim Metz







[Rdkit-discuss] 2017.09.1 RDKit release

2017-10-09 Thread James T. Metz via Rdkit-discuss
Many thanks to all for this new RDKit release.


I have used commercial software in my past, and I am
more and more impressed by the capabilities of RDKit and
other open source software.


Please keep up the good work!


Regards,
Jim Metz







-Original Message-
From: rdkit-discuss-request 
To: rdkit-discuss 
Sent: Sun, Oct 8, 2017 7:33 am
Subject: Rdkit-discuss Digest, Vol 120, Issue 22


Today's Topics:

   1. 2017.09.1 RDKit release (Greg Landrum)


--

Message: 1
Date: Sun, 8 Oct 2017 14:32:54 +0200
From: Greg Landrum 
To: RDKit Discuss ,
rdkit-annou...@lists.sourceforge.net,  RDKit Developers List

Subject: [Rdkit-discuss] 2017.09.1 RDKit release
Message-ID:

Content-Type: text/plain; charset="utf-8"

I'm pleased to announce that the next version of the RDKit -- 2017.09 -- is
released. The release notes are below.

The release files are on the github release page:
https://github.com/rdkit/rdkit/releases/tag/Release_2017_09_1

Binaries have been uploaded to anaconda.org (https://anaconda.org/rdkit).
The available conda binaries for this release are:
Linux 64bit: python 2.7, 3.5, 3.6
Mac OS 64bit: python 2.7, 3.5, 3.6
Windows 64bit: python 2.7, 3.5, 3.6

I left out the Win32 python2.7 build this time. If that's important to
someone, let me know and I'll see if I can get it working again.

Some notes on the conda builds:
- These builds are tested with conda v4.3.25 and, as of the release date,
are very unlikely to work with anything newer than that. This thread has
more information on that:
https://www.mail-archive.com/rdkit-discuss@lists.sourceforge.net/msg07315.html
- The conda builds now depend on numpy 1.13 instead of 1.11.
- The Mac and Linux builds now use v1.63 of boost. The rdkit conda channel
has the appropriate binaries.

Some things that will be finished over the next couple of days:
- The conda build scripts will be updated to reflect the new version and
new conda builds will be available in the RDKit channel at anaconda.org (
https://anaconda.org/rdkit).
- The homebrew script
- The online version of the documentation at rdkit.org

Thanks to everyone who submitted bug reports and suggestions for this
release!

Please let me know if you find any problems with the release or have
suggestions for the next one, which is scheduled for March 2018.

Best Regards,
-greg

# Release_2017.09.1
(Changes relative to Release_2017.03.1)

## Important
- The fix for bug #1567 changes the way fragment SMILES are canonicalized.
  MolFragmentToSmiles() and canonicalizeFragment() will now often return
  different results.
- The fix for bug #1604 changes the behavior of QueryAtom::setQuery(), which
  now deletes the current query before setting the new value. If you are using
  QueryAtom::setQuery() from C++ (or possibly Java), be sure that you are not
  also deleting that memory.

## Acknowledgements:
Brian Cole, Peter Gedeck, Guillaume Godin, Jan Halborg Jensen, Malitha Kabir,
Tuomo Kalliokoski, Brian Kelley, Noel O'Boyle, Matthew O'Meara, Pavel
Polishchuk, Cameron Pye, Christian Ribeaud, Stephen Roughley, Patrick Savery,
Roger Sayle, Nadine Schneider, Gregor Simm, Matt Swain, Paolo Tosco, Alain
Vaucher, Sam Webb, 'phenethyl', 'xiaotaw'

## Highlights:
- The new R-Group decomposition code provides a flexible and powerful tool for
  building R-group tables or datasets. Look in $RDBASE/Docs/Notebooks for
  example notebooks showing how to use this.
- Drawing of chemical reactions has been greatly improved and is now done using
  the C++ rendering code.
- The MaxMinPicker is dramatically faster.
- New descriptors: the QED descriptor has been added, as have a large collection
  of new 3D descriptors and implementations of the USR and USRCAT fingerprints.

## New Features and Enhancements:
  - Bring back USR and USRCAT descriptors
 (github pull #1417 from greglandrum)
  - Generate a warning for conflicting bond directions
 (github issue #1423 from greglandrum)
  - expose and test GetDrawCoords()
 (github pull #1427 from greglandrum)
  - Improvement suggestions for SaltRemover
 (github issue #1431 from ribeaud)
  - Remove obsolete scripts from 

Re: [Rdkit-discuss] need SMARTS query with a specific exclusion

2017-09-24 Thread James T. Metz via Rdkit-discuss
Chris,


Wow! Your recursive SMARTS expression works as needed!


Hmmm... Help me understand this better ... it looks like you "walk around" the
ring of the substructure we want to exclude and employ a slightly different 
recursive SMARTS beginning at that atom.  Is that correct?


Also, since my situation is likely to get more complicated with respect to
exclusions, suppose I still wanted to utilize the general aromatic expression
for a 6-membered ring, i.e., [a]1:[a]:[a]:[a]:[a]:[a]1, and I wanted to exclude
the structures we have been discussing, and I also wanted to exclude
pyridine, i.e., [n]1:[c]:[c]:[c]:[c]:[c]1.


Is there a SMARTS expression that would capture 2 exclusions?


Perhaps this is getting too clumsy!  It might be better to have one or more
inclusion SMARTS and one or more exclusion SMARTS, and write code
to remove those groups of atoms that are coming from the exclusion SMARTS.


Any ideas for PYTHON/RDkit code?  Something like


test_smiles = 'c1ccccc1'
inclusion_pattern = '[a]1:[a]:[a]:[a]:[a]:[a]1'
exclusion_pattern = '[n]1:[c]:[c]:[c]:[c]:[c]1'
etc...
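
The include-then-exclude idea outlined above might look like the following
sketch. It works on plain tuples of atom indices, so it assumes the matches
have already been obtained, for example with mol.GetSubstructMatches() for
each inclusion and exclusion pattern. The function name remove_excluded and
the example tuples are hypothetical.

```python
def remove_excluded(inclusion_matches, exclusion_matches):
    """Drop inclusion matches that cover the same atoms as an exclusion match.

    Atom order inside a match is ignored, because the same ring can be
    reported starting from a different atom by a different pattern.
    """
    excluded = {frozenset(m) for m in exclusion_matches}
    return [tuple(m) for m in inclusion_matches if frozenset(m) not in excluded]


# Hand-written stand-ins: atoms 0-5 form one ring, atoms 6-11 another.
inclusion = [(0, 1, 2, 3, 4, 5), (6, 7, 8, 9, 10, 11)]
# The exclusion pattern found the second ring, starting from another atom.
exclusion = [(8, 9, 10, 11, 6, 7)]

kept = remove_excluded(inclusion, exclusion)
```

Several exclusion patterns can be handled by concatenating their match lists
before calling the function.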


Hmmm... any other ideas, suggestions, comments?


Thanks again.


Regards,
Jim Metz








-Original Message-
From: Chris Earnshaw <cgearns...@gmail.com>
To: James T. Metz <jamestm...@aol.com>
Cc: Rdkit-discuss@lists.sourceforge.net <rdkit-discuss@lists.sourceforge.net>
Sent: Sun, Sep 24, 2017 4:01 am
Subject: Re: [Rdkit-discuss] need SMARTS query with a specific exclusion

Hi Jim

It can be done with recursive SMARTS, though the syntax is a bit
painful. This may do what you want -
[$(a);!$(n1(C)ccc(=O)nc1=O);!$(c1cc(=O)nc(=O)n1C);!$(c1c(=O)nc(=O)n(C)c1);!$(c(=O)1nc(=O)n(C)cc1);!$(n1c(=O)n(C)ccc1=O);!$(c(=O)1n(C)ccc(=O)n1)]:1:a:a:a:a:a:1

It's basically the general 6-ring aromatic pattern a:1:a:a:a:a:a:1,
with recursive SMARTS applied to the first atom to ensure that this
can't match any of the 6 ring atoms in your undesired system.
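
Because a pattern of this shape is tedious to type, it could be assembled from
a list of exclusion environments, as in the sketch below. This is only string
building; the function name ring_smarts_with_exclusions is hypothetical, and
the caller still has to supply one environment per ring atom of the system to
be excluded, as in the pattern above.

```python
def ring_smarts_with_exclusions(exclusions, ring_size=6):
    """Build an aromatic-ring SMARTS whose first atom carries !$() exclusions.

    exclusions: SMARTS environments, each written starting from the ring
    atom that the first atom of the pattern must not be.
    """
    first_atom = "[$(a)" + "".join(";!$({})".format(env) for env in exclusions) + "]"
    # Ring opening on the first atom, then the remaining aromatic atoms.
    return first_atom + ":1" + ":a" * (ring_size - 1) + ":1"


# The six environments used in the pattern above.
uracil_like = [
    "n1(C)ccc(=O)nc1=O",
    "c1cc(=O)nc(=O)n1C",
    "c1c(=O)nc(=O)n(C)c1",
    "c(=O)1nc(=O)n(C)cc1",
    "n1c(=O)n(C)ccc1=O",
    "c(=O)1n(C)ccc(=O)n1",
]
pattern = ring_smarts_with_exclusions(uracil_like)
```

A second excluded system would be handled by appending its per-atom
environments to the same list.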

Regards,
Chris Earnshaw

On 24 September 2017 at 05:04, James T. Metz via Rdkit-discuss
<rdkit-discuss@lists.sourceforge.net> wrote:
> Hello,
>
> Suppose I have the following molecule
>
> m = 'CN1C=CC(=O)NC1=O'
>
> I would like to be able to use a SMARTS pattern
>
> pattern = '[a]1:[a]:[a]:[a]:[a]:[a]1'
>
> to recognize the 6 atoms in a typical aromatic ring, but
> I do not want to recognize the 6 atoms in the molecule,
> m, as aromatic.  In other words, I am trying to write
> a specific exclusion.
>
> Is it possible to modify the SMARTS pattern to
> exclude the above molecule?  I have tried using
> recursive SMARTS, but I can't get the syntax to
> work.
>
> Any ideas?  Thank you.
>
> Regards,
> Jim Metz
>
>
>
>




Re: [Rdkit-discuss] SMARTS for heteroaromatic rings?

2017-09-21 Thread James T. Metz via Rdkit-discuss
Greg,


You are suggesting some interesting ideas.  Probably matching of atoms
in 5 and 6-membered aromatic rings will be sufficient for now.


I was initially stumped trying to figure out an elegant way to deal with
aromatic N's, O's, and S's in various combinations.  The usage of "a" in
SMARTS is powerful in this regard.


Thanks again.



Regards,
Jim Metz





-Original Message-
From: Greg Landrum <greg.land...@gmail.com>
To: James T. Metz <jamestm...@aol.com>
Cc: Jason Biggs <jasondbi...@gmail.com>; RDKit Discuss 
<rdkit-discuss@lists.sourceforge.net>
Sent: Thu, Sep 21, 2017 12:32 am
Subject: Re: [Rdkit-discuss] SMARTS for heteroaromatic rings?



My approach to this would depend on what you're trying to accomplish in the end.


If you just want all the aromatic atoms you can just use "[a]". Unless you do 
some extra work when you read in the molecules, any aromatic atom will be in a 
ring. If you want to be really sure, you can do "[a;r]"
If you want all the aromatic bonds, it's "[a]:[a]"


If you want the rings themselves and you want to just use SMARTS, you have to 
enumerate. Python makes getting the patterns pretty easy:





In [8]: patts = ["[a]:1"+":[a]"*i+":[a]:1" for i in range(3,22)] # 24 is the max aromatic ring size

In [9]: patts[:3]
Out[9]: 
['[a]:1:[a]:[a]:[a]:[a]:1',
 '[a]:1:[a]:[a]:[a]:[a]:[a]:1',
 '[a]:1:[a]:[a]:[a]:[a]:[a]:[a]:1']



The rest is just some calls to MolFromSmarts() and then 
mol.GetSubstructMatches() for the molecules you want to test.
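
The same enumeration can be written with the ring size as the explicit
parameter, which may make the range easier to check. The sketch below only
builds the pattern strings; the function name aromatic_ring_smarts is
hypothetical, and the sizes 5 to 7 are just an example range.

```python
def aromatic_ring_smarts(ring_size):
    """Return a SMARTS for a ring of ring_size aromatic atoms."""
    if ring_size < 3:
        raise ValueError("a ring needs at least 3 atoms")
    # First atom opens the ring, last atom closes it, the rest sit between.
    return "[a]:1" + ":[a]" * (ring_size - 2) + ":[a]:1"


# One pattern per ring size, keyed by the number of ring atoms.
patterns_by_size = {n: aromatic_ring_smarts(n) for n in range(5, 8)}
```

Each string would then go through MolFromSmarts() and GetSubstructMatches()
as described above.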


-greg







On Thu, Sep 21, 2017 at 3:56 AM, James T. Metz via Rdkit-discuss 
<rdkit-discuss@lists.sourceforge.net> wrote:

Jason,


Thanks!  I just thought of that for a 6-membered ring.  A 5-membered
ring would be [a]1[a][a][a][a]1.


Hmmm... I was thinking of using "r" to specify a ring, but I don't think

that would be necessary.  Correct?


Regards,

Jim Metz








-Original Message-
From: Jason Biggs <jasondbi...@gmail.com>
To: James T. Metz <jamestm...@aol.com>
Cc: RDKit Discuss <rdkit-discuss@lists.sourceforge.net>
Sent: Wed, Sep 20, 2017 8:36 pm
Subject: Re: [Rdkit-discuss] SMARTS for heteroaromatic rings?



if you don't care what type of atom it is, just that it's aromatic, you should 
use [a],


so [a]1[a][a][a][a][a]1 would match any 6-membered aromatic ring



Jason Biggs




On Wed, Sep 20, 2017 at 7:57 PM, James T. Metz via Rdkit-discuss 
<rdkit-discuss@lists.sourceforge.net> wrote:

Hello,


I would like to write a SMARTS that will match all of the individual atoms

in all possible heteroaromatic rings.  Does anyone know of an elegant, 
compact way to do this?


If one SMARTS will not work, I can concatenate SMARTS using 

a vertical pipe, "|", as I proposed in an earlier message in this forum.


I am (perhaps) expecting SMARTS something like

[c]1[c][n][c][c]1
etc
[c]1[c][c][c][c][c]1
[c]1[c][n][c][c][c]1
etc.


Perhaps there is a very elegant way to specify the possible

patterns.  I can't think of a way to do it, other than exhaustive
enumeration.  


Any ideas?



Regards,

Jim Metz


























Re: [Rdkit-discuss] SMARTS for heteroaromatic rings?

2017-09-20 Thread James T. Metz via Rdkit-discuss
Jason,


Thanks!  I just thought of that for a 6-membered ring.  A 5-membered
ring would be [a]1[a][a][a][a]1.


Hmmm... I was thinking of using "r" to specify a ring, but I don't think

that would be necessary.  Correct?


Regards,

Jim Metz







-Original Message-
From: Jason Biggs <jasondbi...@gmail.com>
To: James T. Metz <jamestm...@aol.com>
Cc: RDKit Discuss <rdkit-discuss@lists.sourceforge.net>
Sent: Wed, Sep 20, 2017 8:36 pm
Subject: Re: [Rdkit-discuss] SMARTS for heteroaromatic rings?



if you don't care what type of atom it is, just that it's aromatic, you should 
use [a],


so [a]1[a][a][a][a][a]1 would match any 6-membered aromatic ring



Jason Biggs




On Wed, Sep 20, 2017 at 7:57 PM, James T. Metz via Rdkit-discuss 
<rdkit-discuss@lists.sourceforge.net> wrote:

Hello,


I would like to write a SMARTS that will match all of the individual atoms

in all possible heteroaromatic rings.  Does anyone know of an elegant, 
compact way to do this?


If one SMARTS will not work, I can concatenate SMARTS using 

a vertical pipe, "|", as I proposed in an earlier message in this forum.


I am (perhaps) expecting SMARTS something like

[c]1[c][n][c][c]1
etc
[c]1[c][c][c][c][c]1
[c]1[c][n][c][c][c]1
etc.


Perhaps there is a very elegant way to specify the possible

patterns.  I can't think of a way to do it, other than exhaustive
enumeration.  


Any ideas?



Regards,

Jim Metz


















[Rdkit-discuss] SMARTS for heteroaromatic rings?

2017-09-20 Thread James T. Metz via Rdkit-discuss
Hello,


I would like to write a SMARTS that will match all of the individual atoms

in all possible heteroaromatic rings.  Does anyone know of an elegant, 
compact way to do this?


If one SMARTS will not work, I can concatenate SMARTS using 

a vertical pipe, "|", as I proposed in an earlier message in this forum.


I am (perhaps) expecting SMARTS something like

[c]1[c][n][c][c]1
etc
[c]1[c][c][c][c][c]1
[c]1[c][n][c][c][c]1
etc.


Perhaps there is a very elegant way to specify the possible

patterns.  I can't think of a way to do it, other than exhaustive
enumeration.  


Any ideas?



Regards,

Jim Metz











Re: [Rdkit-discuss] single SMARTS for two patterns with Boolean OR

2017-09-19 Thread James T. Metz via Rdkit-discuss
Chris,


Thank you for your interesting suggestion, but it is not quite what I need.


For example, consider the molecule


m = Chem.MolFromSmiles("CCNN")


I am looking for one SMARTS that, using the SMARTS pattern matching
capability in RDkit, would return 2 groups, each group containing the two
atoms corresponding to  CC  and  NN.


Your suggested recursive SMARTS  and code below



pattern = Chem.MolFromSmarts('[$(C-C),$(N-N)]')
match = m.GetSubstructMatches(pattern)
match


returns



((0,), (1,), (2,), (3,))


The output I am trying to achieve, instead, is



((0,1), (2,3))


Is there a single SMARTS that will do that?
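
If a single SMARTS turns out not to be possible, the grouped output can still
be obtained by matching each pattern separately and combining the results.
The sketch below assumes the per-pattern matches come from
m.GetSubstructMatches(); the function name combine_matches is hypothetical,
and the example tuples are hand-written stand-ins for the [C]-[C] and [N]-[N]
matches on CCNN.

```python
def combine_matches(*match_lists):
    """Concatenate match tuples from several patterns, dropping duplicates.

    Two matches count as identical when they cover the same set of atoms,
    regardless of the order in which the atoms are listed.
    """
    seen = set()
    combined = []
    for matches in match_lists:
        for match in matches:
            key = frozenset(match)
            if key not in seen:
                seen.add(key)
                combined.append(tuple(match))
    return tuple(combined)


cc_matches = ((0, 1),)   # stand-in for the [C]-[C] matches
nn_matches = ((2, 3),)   # stand-in for the [N]-[N] matches
groups = combine_matches(cc_matches, nn_matches)
```

The cross pairs such as C-N never appear, because neither pattern can produce
them.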



Regards,

Jim Metz










-Original Message-
From: Chris Earnshaw <cgearns...@gmail.com>
To: James T. Metz <jamestm...@aol.com>
Cc: Rdkit-discuss@lists.sourceforge.net <rdkit-discuss@lists.sourceforge.net>
Sent: Tue, Sep 19, 2017 10:13 am
Subject: Re: [Rdkit-discuss] single SMARTS for two patterns with Boolean OR




Hi


Will the recursive SMARTS [$(C-C),$(N-N)] not do the job?

I'd parse this in English as 'an atom which is EITHER an aliphatic carbon 
singly bonded to an aliphatic carbon OR an aliphatic nitrogen singly bonded to 
an aliphatic nitrogen'.


Regards,

Chris Earnshaw



On 19 September 2017 at 15:01, James T. Metz via Rdkit-discuss 
<rdkit-discuss@lists.sourceforge.net> wrote:

Dante,


Yes.  In principle, if one can figure out all of the possible undesired
cross matches.


Since my goal is to do this in RDkit and generate groups of atoms

that match, perhaps one approach is to simply use multiple RDkit pattern
matching statements (with multiple SMARTS), generate the groups of atoms, 
then combine the lists, removing identical groups.



Hmmm... Is there a more straightforward (elegant) solution?



Regards,

Jim Metz






-Original Message-
From: Dante <dante.esgrimi...@gmail.com>
To: James T. Metz <jamestm...@aol.com>
Cc: RDKit Discuss <rdkit-discuss@lists.sourceforge.net>
Sent: Tue, Sep 19, 2017 8:45 am
Subject: Re: [Rdkit-discuss] single SMARTS for two patterns with Boolean OR



Hi Jim,


Could you use the 'NOT' logical operator (!) in combination with recursive 
SMARTS to eliminate the cross-matches?


Cheers,


Dante


On Tue, Sep 19, 2017 at 9:13 AM, James T. Metz via Rdkit-discuss 
<rdkit-discuss@lists.sourceforge.net> wrote:

Hello,


Is it possible to write a single SMARTS for two separate patterns involving
a Boolean OR?


For example,  I want to write a single SMARTS that can match the
patterns of 


[C]-[C]  


or 


[N]-[N]


I realize that I could write something like


[C,N]-[C,N]


but that would also match "cross" patterns such as


CN and NC which I don't want.


I have tried to write


([C]-[C]), ([N]-[N])   but I have not been able to get that syntax or related
expressions (variations of parentheses, brackets, etc) to work.


Hence, if someone knows how to combine separate SMARTS expressions into
a single expression with a Boolean OR, I would be grateful.  Thank you.


Regards,
Jim Metz





















Re: [Rdkit-discuss] single SMARTS for two patterns with Boolean OR

2017-09-19 Thread James T. Metz via Rdkit-discuss
Dante,


Yes.  In principle, if one can figure out all of the possible undesired
cross matches.


Since my goal is to do this in RDkit and generate groups of atoms

that match, perhaps one approach is to simply use multiple RDkit pattern
matching statements (with multiple SMARTS), generate the groups of atoms, 
then combine the lists, removing identical groups.



Hmmm... Is there a more straightforward (elegant) solution?



Regards,

Jim Metz





-Original Message-
From: Dante <dante.esgrimi...@gmail.com>
To: James T. Metz <jamestm...@aol.com>
Cc: RDKit Discuss <rdkit-discuss@lists.sourceforge.net>
Sent: Tue, Sep 19, 2017 8:45 am
Subject: Re: [Rdkit-discuss] single SMARTS for two patterns with Boolean OR



Hi Jim,


Could you use the 'NOT' logical operator (!) in combination with recursive 
SMARTS to eliminate the cross-matches?


Cheers,


Dante


On Tue, Sep 19, 2017 at 9:13 AM, James T. Metz via Rdkit-discuss 
<rdkit-discuss@lists.sourceforge.net> wrote:

Hello,


Is it possible to write a single SMARTS for two separate patterns involving
a Boolean OR?


For example,  I want to write a single SMARTS that can match the
patterns of 


[C]-[C]  


or 


[N]-[N]


I realize that I could write something like


[C,N]-[C,N]


but that would also match "cross" patterns such as


CN and NC which I don't want.


I have tried to write


([C]-[C]), ([N]-[N])   but I have not been able to get that syntax or related
expressions (variations of parentheses, brackets, etc) to work.


Hence, if someone knows how to combine separate SMARTS expressions into
a single expression with a Boolean OR, I would be grateful.  Thank you.


Regards,
Jim Metz













[Rdkit-discuss] single SMARTS for two patterns with Boolean OR

2017-09-19 Thread James T. Metz via Rdkit-discuss
Hello,


Is it possible to write a single SMARTS for two separate patterns involving
a Boolean OR?


For example,  I want to write a single SMARTS that can match the
patterns of 


[C]-[C]  


or 


[N]-[N]


I realize that I could write something like


[C,N]-[C,N]


but that would also match "cross" patterns such as


CN and NC which I don't want.


I have tried to write


([C]-[C]), ([N]-[N])   but I have not been able to get that syntax or related
expressions (variations of parentheses, brackets, etc) to work.


Hence, if someone knows how to combine separate SMARTS expressions into
a single expression with a Boolean OR, I would be grateful.  Thank you.


Regards,
Jim Metz






Re: [Rdkit-discuss] how to output multiple Kekule structures

2017-09-11 Thread James T. Metz via Rdkit-discuss
Paolo,


Exactly what I was looking for.  Very helpful.  Thank you.

Regards,
Jim Metz





-Original Message-
From: Paolo Tosco <paolo.to...@unito.it>
To: James T. Metz <jamestm...@aol.com>; greg.landrum <greg.land...@gmail.com>; 
rdkit-discuss <rdkit-discuss@lists.sourceforge.net>
Sent: Mon, Sep 11, 2017 2:53 pm
Subject: Re: [Rdkit-discuss] how to output multiple Kekule structures


Hi Jim,

you can indeed enumerate all Kekulé structures for a molecule within the
RDKit using Chem.ResonanceMolSupplier():

from rdkit import Chem

mol = Chem.MolFromSmiles('c1ccccc1')

suppl = Chem.ResonanceMolSupplier(mol, Chem.KEKULE_ALL)

len(suppl)

2

for i in range(len(suppl)):
    print(Chem.MolToSmiles(suppl[i], kekuleSmiles=True))

C1C=CC=CC=1
C1=CC=CC=C1

Best,
Paolo


On 09/11/2017 05:22 PM, James T. Metz via Rdkit-discuss wrote:

Greg,

Thanks!  Yes, very helpful.  I will need to digest the detailed information
you have provided.  I am somewhat familiar with recursive SMARTS.  Thanks
again.

Regards,
Jim Metz

-Original Message-
From: Greg Landrum <greg.land...@gmail.com>
To: James T. Metz <jamestm...@aol.com>
Cc: RDKit Discuss <rdkit-discuss@lists.sourceforge.net>
Sent: Mon, Sep 11, 2017 11:15 am
Subject: Re: [Rdkit-discuss] how to output multiple Kekule structures

On Mon, Sep 11, 2017 at 5:55 PM, James T. Metz <jamestm...@aol.com> wrote:

Greg,

I need to be able to use SMARTS patterns to identify substructures in molecules
that can be aromatic, and I need to be able to handle cases where there can be
differences in the way that the molecule was entered or drawn by a user.

That particular problem is a big part of the reason that we tend to use the
aromatic representation of things.

For example, consider the following alkenyl-substituted pyridine, there
are two possible Kekule structures

m1 = 'C=CC1=NC=CC=C1'
m2 = 'C=CC1N=CC=CC1'

Fixing what I assume is a typo for m2, I can do the following:

In [11]: m1 = Chem.MolFromSmiles('C=CC1=NC=CC=C1')

In [12]: m2 = Chem.MolFromSmiles('C=CC1N=CC=CC=1')

In [13]: q1 = Chem.MolFromSmarts('cccc')

Re: [Rdkit-discuss] how to output multiple Kekule structures

2017-09-11 Thread James T. Metz via Rdkit-discuss
Greg,


Thanks!  Yes, very helpful.  I will need to digest the detailed information
you have provided.  I am somewhat familiar with recursive SMARTS.  Thanks
again.


Regards,
Jim Metz





-Original Message-
From: Greg Landrum <greg.land...@gmail.com>
To: James T. Metz <jamestm...@aol.com>
Cc: RDKit Discuss <rdkit-discuss@lists.sourceforge.net>
Sent: Mon, Sep 11, 2017 11:15 am
Subject: Re: [Rdkit-discuss] how to output multiple Kekule structures






On Mon, Sep 11, 2017 at 5:55 PM, James T. Metz <jamestm...@aol.com> wrote:

Greg,


I need to be able to use SMARTS patterns to identify substructures in molecules
that can be aromatic, and I need to be able to handle cases where there can be
differences in the way that the molecule was entered or drawn by a user.




That particular problem is a big part of the reason that we tend to use the 
aromatic representation of things.
 



For example, consider the following alkenyl-substituted pyridine, there
are two possible Kekule structures


m1 = 'C=CC1=NC=CC=C1'
m2 = 'C=CC1N=CC=CC1'



Fixing what I assume is a typo for m2, I can do the following:


In [11]: m1 = Chem.MolFromSmiles('C=CC1=NC=CC=C1')


In [12]: m2 = Chem.MolFromSmiles('C=CC1N=CC=CC=1')


In [13]: q1 = Chem.MolFromSmarts('cccc')


In [14]: q2 = Chem.MolFromSmarts('cccn')


In [15]: list(m1.GetSubstructMatch(q1))
Out[15]: [2, 7, 6, 5]


In [16]: list(m1.GetSubstructMatch(q2))
Out[16]: [6, 5, 4, 3]


In [17]: list(m2.GetSubstructMatch(q1))
Out[17]: [2, 7, 6, 5]


In [18]: list(m2.GetSubstructMatch(q2))
Out[18]: [6, 5, 4, 3]
 


Those particular queries were going for the aromatic species and will only 
match inside the ring, but if you want to be more generic you could tune your 
queries like this:




In [28]: q3 = Chem.MolFromSmarts('[#6;$([#6]=,:[*])]-,=,:[#6;$([#6]=,:[*])]-,=,:[#6;$([#6]=,:[*])]-,=,:[#6;$([#6]-=,:[*])]')


In [29]: q4 = Chem.MolFromSmarts('[#6;$([#6]=,:[*])]-,=,:[#6;$([#6]=,:[*])]-,=,:[#6;$([#6]=,:[*])]-,=,:[#7;$([#7]-=,:[*])]')


In [30]: list(m1.GetSubstructMatch(q3))
Out[30]: [0, 1, 2, 7]


In [31]: list(m1.GetSubstructMatch(q4))
Out[31]: [0, 1, 2, 3]


In [32]: list(m2.GetSubstructMatch(q3))
Out[32]: [0, 1, 2, 7]


In [33]: list(m2.GetSubstructMatch(q4))
Out[33]: [0, 1, 2, 3]



If you aren't familiar with recursive SMARTS, this construct: 
"[#6;$([#6]=,:[*])]" means "a carbon that has either a double bond or an 
aromatic bond to another atom".  So you can interpret q3 as "four carbons that 
each have either a double or aromatic bond and that are connected to each other 
by single, double, or aromatic bonds".
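
Queries of this shape are repetitive, so they could be generated from a list
of atomic numbers, as in the sketch below. It applies the construct explained
above to every atom, so the strings it produces may differ in detail from q3
and q4 as posted; the function name unsaturated_chain_smarts is hypothetical.

```python
def unsaturated_chain_smarts(atomic_numbers):
    """Chain of atoms that each have a double or aromatic bond to some atom.

    Neighbouring atoms in the chain may be joined by a single, double,
    or aromatic bond.
    """
    atoms = ["[#{0};$([#{0}]=,:[*])]".format(z) for z in atomic_numbers]
    return "-,=,:".join(atoms)


four_carbons = unsaturated_chain_smarts([6, 6, 6, 6])
carbons_then_nitrogen = unsaturated_chain_smarts([6, 6, 6, 7])
```

The resulting strings would be passed to Chem.MolFromSmarts() in the same way
as the queries above.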


Is this starting to approximate what you're looking for?
-greg










Now consider two SMARTS


pattern1 = '[C]=[C]-[C]=[C]'

pattern2 = '[C]=[C]-[C]=[N]'



I need to be able to detect the existence of each pattern in the molecule



If m1 is the only available generated Kekule structure, then pattern2 will
be recognized.

If m2 is the only available generated Kekule structure, then pattern1 will
be recognized.


Hence, I am getting different answers for the same input molecule just because
it was drawn in different Kekule structures.


Regards,

Jim Metz










-Original Message-
From: Greg Landrum <greg.land...@gmail.com>
To: James T. Metz <jamestm...@aol.com>
Cc: RDKit Discuss <rdkit-discuss@lists.sourceforge.net>
Sent: Mon, Sep 11, 2017 10:31 am
Subject: Re: [Rdkit-discuss] how to output multiple Kekule structures



Hi Jim,


The code currently has no way to enumerate Kekule structures. I don't recall 
this coming up in the past and, to be honest, it doesn't seem all that 
generally useful. 


Perhaps there's an alternate way to solve the problem; what are you trying to 
do?


-greg





On Mon, Sep 11, 2017 at 5:04 PM, James T. Metz via Rdkit-discuss 
<rdkit-discuss@lists.sourceforge.net> wrote:

Hello,


Suppose I read in an aromatic SMILES e.g., for benzene


c1ccccc1


I would like to generate the major canonical resonance forms
and save the results as two separate molecules.  Essentially
I am trying to generate


m1 = 'C1=CC=CC=C1'
m2 = 'C1C=CC=CC=1'


Can this be done in RDkit?  I have found a KEKULE_ALL
option in the detailed documentation which seems to be what I
am trying to do, but I don't understand how this option is to be used,
or the proper syntax.


If it is necessary to somehow renumber the atoms and re-generate
Kekule structures, that is OK.  Thank you.


Regards,

Jim Metz















Re: [Rdkit-discuss] how to output multiple Kekule structures

2017-09-11 Thread James T. Metz via Rdkit-discuss
Greg,


I need to be able to use SMARTS patterns to identify substructures in molecules
that can be aromatic, and I need to be able to handle cases where there can be
differences in the way that the molecule was entered or drawn by a user.


For example, consider the following alkenyl-substituted pyridine, there
are two possible Kekule structures


m1 = 'C=CC1=NC=CC=C1'
m2 = 'C=CC1N=CC=CC1'


Now consider two SMARTS


pattern1 = '[C]=[C]-[C]=[C]'

pattern2 = '[C]=[C]-[C]=[N]'



I need to be able to detect the existence of each pattern in the molecule



If m1 is the only available generated Kekule structure, then pattern2 will
be recognized.

If m2 is the only available generated Kekule structure, then pattern1 will
be recognized.


Hence, I am getting different answers for the same input molecule just because
it was drawn in different Kekule structures.


Regards,

Jim Metz









-Original Message-
From: Greg Landrum <greg.land...@gmail.com>
To: James T. Metz <jamestm...@aol.com>
Cc: RDKit Discuss <rdkit-discuss@lists.sourceforge.net>
Sent: Mon, Sep 11, 2017 10:31 am
Subject: Re: [Rdkit-discuss] how to output multiple Kekule structures



Hi Jim,


The code currently has no way to enumerate Kekule structures. I don't recall 
this coming up in the past and, to be honest, it doesn't seem all that 
generally useful. 


Perhaps there's an alternate way to solve the problem; what are you trying to 
do?


-greg





On Mon, Sep 11, 2017 at 5:04 PM, James T. Metz via Rdkit-discuss 
<rdkit-discuss@lists.sourceforge.net> wrote:

Hello,


Suppose I read in an aromatic SMILES e.g., for benzene


c1ccccc1


I would like to generate the major canonical resonance forms
and save the results as two separate molecules.  Essentially
I am trying to generate


m1 = 'C1=CC=CC=C1'
m2 = 'C1C=CC=CC=1'


Can this be done in RDkit?  I have found a KEKULE_ALL
option in the detailed documentation which seems to be what I
am trying to do, but I don't understand how this option is to be used,
or the proper syntax.


If it is necessary to somehow renumber the atoms and re-generate
Kekule structures, that is OK.  Thank you.


Regards,

Jim Metz
















[Rdkit-discuss] how to output multiple Kekule structures

2017-09-11 Thread James T. Metz via Rdkit-discuss
Hello,


Suppose I read in an aromatic SMILES e.g., for benzene



c1ccccc1



I would like to generate the major canonical resonance forms

and save the results as two separate molecules.  Essentially
I am trying to generate


m1 = 'C1=CC=CC=C1'

m2 = 'C1C=CC=CC=1'



Can this be done in RDkit?  I have found a KEKULE_ALL 

option in the detailed documentation which seems to be what I
am trying to do, but I don't understand how this option is to be used,
or the proper syntax.


If it is necessary to somehow renumber the atoms and re-generate

Kekule structures, that is OK.  Thank you.


Regards,

Jim Metz
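
On the KEKULE_ALL syntax question: a minimal sketch, assuming an RDKit build in which Chem.ResonanceMolSupplier accepts the Chem.KEKULE_ALL flag as its second argument. Nothing in this thread confirms that behaviour, and older releases may differ, so treat it as something to try rather than a definitive answer. The variable names are illustrative.

```python
from rdkit import Chem

mol = Chem.MolFromSmiles('c1ccccc1')

# KEKULE_ALL asks the supplier to keep degenerate Kekule structures
# instead of collapsing them into a single representative.
supplier = Chem.ResonanceMolSupplier(mol, Chem.KEKULE_ALL)

# Copy each structure into its own Mol so it can be stored separately.
kekule_forms = [Chem.Mol(res) for res in supplier if res is not None]

for form in kekule_forms:
    print(Chem.MolToSmiles(form, kekuleSmiles=True))
```

Each entry in kekule_forms is an independent molecule, so the forms can be written out or searched one at a time.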















[Rdkit-discuss] SMARTS pattern matching of canonical forms of aromatic molecules

2017-09-08 Thread James T. Metz via Rdkit-discuss
Hello,


Suppose I read in the SMILES of an aromatic molecule e.g., for

benzene


c1ccccc1



I then want to convert the molecule to a Kekule representation and

then perform various SMARTS pattern recognition e.g.


[C]=[C]-[C]



I have tried various Kekule commands in RDkit, but I cannot figure
out how (or whether it is possible) to recognize a SMARTS pattern for
a portion of a molecule which is aromatic, but is currently being
stored as a Kekule structure.


Also, is it possible to generate and store more than one Kekule

form in RDkit?


Thank you.


Regards,

Jim Metz
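
A minimal sketch of one way to do this, assuming the pattern should be matched against a single Kekule form: Chem.Kekulize with clearAromaticFlags=True converts the aromatic bonds to explicit single and double bonds and removes the aromatic flags, after which an aliphatic query such as [C]=[C]-[C] can match. Only the one Kekule form that RDKit happens to pick is searched. The variable names are illustrative.

```python
from rdkit import Chem

benzene = Chem.MolFromSmiles('c1ccccc1')
pattern = Chem.MolFromSmarts('[C]=[C]-[C]')

# As parsed, every ring atom is flagged aromatic, and [C] only matches
# aliphatic carbon, so the query finds nothing.
aromatic_hit = benzene.HasSubstructMatch(pattern)

# Work on a copy: Kekulize modifies the molecule in place.
kekulized = Chem.Mol(benzene)
Chem.Kekulize(kekulized, clearAromaticFlags=True)
kekule_hit = kekulized.HasSubstructMatch(pattern)

print(aromatic_hit, kekule_hit)
```

Keeping the original aromatic molecule alongside the kekulized copy lets the same input be searched with either kind of query.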







Re: [Rdkit-discuss] Rdkit-discuss Digest, Vol 119, Issue 6

2017-09-07 Thread James T. Metz via Rdkit-discuss
TJ,


Your suggestion solved my problem.  Thanks!  I need to pay closer attention
to the SMARTS documentation!


Regards,
Jim Metz




-Original Message-
From: rdkit-discuss-request <rdkit-discuss-requ...@lists.sourceforge.net>
To: rdkit-discuss <rdkit-discuss@lists.sourceforge.net>
Sent: Wed, Sep 6, 2017 9:17 pm
Subject: Rdkit-discuss Digest, Vol 119, Issue 6


Today's Topics:

   1. Fwd: Need SMARTS to distinguish 6-ring vs macrocyclic ether
  oxygens (James T. Metz)
   2. Re: Fwd: Need SMARTS to distinguish 6-ring vs macrocyclic
  ether oxygens (TJ O'Donnell)
   3. Re: Fwd: Need SMARTS to distinguish 6-ring vs macrocyclic
  ether oxygens (TJ O'Donnell)


--

Message: 1
Date: Wed, 6 Sep 2017 19:34:02 -0400
From: "James T. Metz" <jamestm...@aol.com>
To: RDkit-discuss@lists.sourceforge.net, jamestm...@aol.com
Subject: [Rdkit-discuss] Fwd: Need SMARTS to distinguish 6-ring vs
macrocyclic ether oxygens
Message-ID: <15e598b1b33-c09-48...@webjas-vab043.srv.aolmail.net>
Content-Type: text/plain; charset="utf-8"



Hello,




Given the following SMILES for a macrocyclic hexaose




OCC1OC2OC3C(CO)OC(OC4C(CO)OC(OC5C(CO)OC(OC6C(CO)OC(OC7C(CO)OC(OC1C(O)C2O)C(O)C7O)C(O)C6O)C(O)C5O)C(O)C4O)C(O)C3O



 can anyone suggest a SMARTS pattern that will distinguish ether oxygens

in the smaller 6-membered rings versus the ethers in the larger macrocyclic
structure?


For example, using RDkit, I have tried (e.g., pattern = 
Chem.MolFromSmarts('[O;H0;D2]') )



[O;H0;D2]  ===>  gives 12 matches (all ether oxygens)



[O;H0;D2;R]  ===>  gives 12 matches (all ether oxygens)



[O;H0;D2;!R]  ===>  gives 0 matches



[O;H0;D2;R6]  ===>  gives 0 matches





I am stumped.  Any ideas?



If it is necessary to write more complicated PYTHON/RDkit/SMARTS code, I am 
certainly willing to try that.



Thanks!



Regards,

Jim Metz

Northwestern University




--

Message: 2
Date: Wed, 6 Sep 2017 18:04:01 -0700
From: "TJ O'Donnell" <t...@acm.org>
To: "James T. Metz" <jamestm...@aol.com>
Cc: RDKit Discuss <RDkit-discuss@lists.sourceforge.net>
Subject: Re: [Rdkit-discuss] Fwd: Need SMARTS to distinguish 6-ring vs
macrocyclic ether oxygens
Message-ID:
<CADqA_h-+KDW=wbtEdGPMvM8vz=1rzskxkkpqzo5iqh1cb33...@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Try using [O;H0;D2;r6], with a lower-case r.  Sorry, I'm not at a computer to
check this.
R6 means the atom is in 6 rings.
r6 means the atom's smallest ring has size 6.

http://www.daylight.com/dayhtml/doc/theory/theory.smarts.html

TJ O'Donnell
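
A short sketch of how the lower-case r suggestion could be checked in RDKit, using the hexaose SMILES from the question. The helper count and the variable names are illustrative; the query with !r6 is an added assumption about how to select the remaining (macrocycle-only) ether oxygens.

```python
from rdkit import Chem

smiles = ('OCC1OC2OC3C(CO)OC(OC4C(CO)OC(OC5C(CO)OC(OC6C(CO)OC(OC7C(CO)'
          'OC(OC1C(O)C2O)C(O)C7O)C(O)C6O)C(O)C5O)C(O)C4O)C(O)C3O')
mol = Chem.MolFromSmiles(smiles)


def count(smarts):
    """Number of atoms in mol matched by a one-atom SMARTS query."""
    return len(mol.GetSubstructMatches(Chem.MolFromSmarts(smarts)))


all_ethers = count('[O;H0;D2]')
six_ring = count('[O;H0;D2;r6]')      # smallest ring containing the O has 6 atoms
macro_only = count('[O;H0;D2;!r6]')   # ether O whose smallest ring is not 6-membered

print(all_ethers, six_ring, macro_only)
```

The two ring-size queries together should account for every ether oxygen found by the unrestricted query.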

On Wed, Sep 6, 2017 at 4:34 PM, James T. Metz via Rdkit-discuss <
rdkit-discuss@lists.sourceforge.net> wrote:

> Hello,
>
> Given the following SMILES for a macrocyclic hexaose
>
>OCC1OC2OC3C(CO)OC(OC4C(CO)OC(OC5C(CO)OC(OC6C(CO)OC(OC7C(CO)
> OC(OC1C(O)C2O)C(O)C7O)C(O)C6O)C(O)C5O)C(O)C4O)C(O)C3O
>
> can anyone suggest a SMARTS pattern that will distinguish ether oxygens
> in the smaller 6-membered rings versus the ethers in the larger macrocyclic
> structure?
>
> For example, using RDkit, I have tried (e.g., pattern =
> Chem.MolFromSmarts('[O;H0;D2]') )
>
> [O;H0;D2]  ===>  gives 12 matches (all ether oxygens)
>
> [O;H0;D2;R]  ===>  gives 12 matches (all ether oxygens)
>
> [O;H0;D2;!R]  ===>  gives 0 matches
>
> [O;H0;D2;R6]  ===>  gives 0 matches
>
>
> I am stumped.  Any ideas?
>
> If it is necessary to write more complicated PYTHON/RDkit/SMARTS code,
> I am certainly willing to try that.
>
> Thanks!
>
> Regards,
> Jim Metz
> Northwestern University
>

[Rdkit-discuss] Fwd: Need SMARTS to distinguish 6-ring vs macrocyclic ether oxygens

2017-09-06 Thread James T. Metz via Rdkit-discuss


Hello,




Given the following SMILES for a macrocyclic hexaose




OCC1OC2OC3C(CO)OC(OC4C(CO)OC(OC5C(CO)OC(OC6C(CO)OC(OC7C(CO)OC(OC1C(O)C2O)C(O)C7O)C(O)C6O)C(O)C5O)C(O)C4O)C(O)C3O



can anyone suggest a SMARTS pattern that will distinguish ether oxygens

in the smaller 6-membered rings versus the ethers in the larger macrocyclic
structure?


For example, using RDkit, I have tried (e.g., pattern = 
Chem.MolFromSmarts('[O;H0;D2]') )



[O;H0;D2]  ===>  gives 12 matches (all ether oxygens)



[O;H0;D2;R]  ===>  gives 12 matches (all ether oxygens)



[O;H0;D2;!R]  ===>  gives 0 matches



[O;H0;D2;R6]  ===>  gives 0 matches





I am stumped.  Any ideas?



If it is necessary to write more complicated PYTHON/RDkit/SMARTS code, I am 
certainly willing to try that.



Thanks!



Regards,

Jim Metz

Northwestern University



