Re: [Rdkit-discuss] Using Chem.WrapLogs()

2017-09-11 Thread Greg Landrum
Yeah, please go ahead and file a bug for the Windows problems.
No guarantees that we can fix it, but it's worth at least capturing the
problem.

-greg
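
A minimal sketch of why the assert in the quoted code fails, and of the two
fixes suggested further down the thread. It uses only the standard library: a
plain write to sys.stderr stands in for RDKit's error message. The stand-in
message, the Python 3 io.StringIO import and the helper name capture_stderr
are assumptions for illustration, not part of the original code.

```python
import sys
from io import StringIO  # Python 3; the quoted code uses Python 2's StringIO module


def capture_stderr(emit):
    """Call emit() with sys.stderr redirected to a StringIO and return that buffer."""
    old_stderr = sys.stderr
    buf = sys.stderr = StringIO()
    try:
        emit()
    finally:
        sys.stderr = old_stderr  # always restore the real stream
    return buf


# Stand-in for the message RDKit would log; this is not real RDKit output.
sio = capture_stderr(lambda: sys.stderr.write("stand-in parse error\n"))

at_end = sio.read()      # the position is at the end after writing, so this is empty
whole = sio.getvalue()   # fix 1: getvalue() ignores the current position
sio.seek(0)              # fix 2: rewind before reading
rewound = sio.read()

print(repr(at_end), repr(whole), repr(rewound))
```

Whether the RDKit message reaches sys.stderr at all still depends on
Chem.WrapLogs() working on the platform in question, which is the part that
reportedly fails on Windows.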


On Fri, Sep 8, 2017 at 4:50 PM, Noel O'Boyle  wrote:

> Thanks Maciek,
>
> Both of those solutions work on Linux, which is fine for my purposes.
> Neither works on Windows (let me know if you want me to file a bug).
>
> Regards,
> - Noel
>
> On 8 September 2017 at 15:05, Maciek Wójcikowski 
> wrote:
>
>> Hi Noel,
>>
>> Call sio.seek(0) before the assert, or use sio.getvalue() instead of read().
>>
>> 
>> Pozdrawiam,  |  Best regards,
>> Maciek Wójcikowski
>> mac...@wojcikowski.pl
>>
>> 2017-09-08 15:51 GMT+02:00 Noel O'Boyle :
>>
>>> Hi all,
>>>
>>> I'd like to capture error messages during SMILES parsing, but am having
>>> trouble getting this to work.
>>>
>>> The following code raises an AssertionError, for example. Is there
>>> something here I'm missing? I'm using this from a Windows 7 conda
>>> environment, Python 2.7 64-bit, RDKit 2017.03.3, but a similar conda
>>> environment is also failing for me on Linux.
>>>
>>> import sys
>>> from rdkit import Chem
>>> Chem.WrapLogs()
>>> from StringIO import StringIO
>>>
>>> old_stderr = sys.stderr
>>> sio = sys.stderr = StringIO()
>>>
>>> mol = Chem.MolFromSmiles("c1c")
>>> sys.stderr = old_stderr
>>>
>>> assert sio.read() != ""
>>>
>>> Regards,
>>> - Noel
>>>
>>> 
>>> --
>>> Check out the vibrant tech community on one of the world's most
>>> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
>>> ___
>>> Rdkit-discuss mailing list
>>> Rdkit-discuss@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>>
>>>
>>
>


Re: [Rdkit-discuss] how to output multiple Kekule structures

2017-09-11 Thread Jason Biggs
But keep in mind that the kekulized mols you create with the resonance
supplier will not match the SMARTS patterns given.

Chem.MolToSmiles(mol2, kekuleSmiles = True)

>'C1C=CC=CC=1'


mol2.HasSubstructMatch(Chem.MolFromSmarts('[C]=[C]-[C]'))

> False

mol2.HasSubstructMatch(Chem.MolFromSmarts('[c]=[c]-[c]'))

> True

So at the very least, you need to change the SMARTS strings to use [#6]
instead of [C].
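
A small sketch of that change, assuming benzene as the input and two
illustrative three-atom patterns (the variable names are hypothetical). It
only prints what each pattern reports for each Kekulé structure from the
supplier, so the [C] and [#6] versions can be compared on whatever RDKit
version is installed:

```python
from rdkit import Chem

mol = Chem.MolFromSmiles('c1ccccc1')
suppl = Chem.ResonanceMolSupplier(mol, Chem.KEKULE_ALL)

# [C] is an aliphatic carbon; [#6] is any carbon, aromatic or not.
aliphatic_patt = Chem.MolFromSmarts('[C]=[C]-[C]')
any_carbon_patt = Chem.MolFromSmarts('[#6]=[#6]-[#6]')

results = []
for i in range(len(suppl)):
    kek = suppl[i]
    results.append((kek.HasSubstructMatch(aliphatic_patt),
                    kek.HasSubstructMatch(any_carbon_patt)))

for i, (aliphatic_hit, any_carbon_hit) in enumerate(results):
    print(i, aliphatic_hit, any_carbon_hit)
```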



Jason Biggs


On Mon, Sep 11, 2017 at 2:53 PM, Paolo Tosco  wrote:

> Hi Jim,
>
> you can indeed enumerate all Kekulé structures for a molecule within the
> RDKit using Chem.ResonanceMolSupplier():
>
> from rdkit import Chem
>
> mol = Chem.MolFromSmiles('c1ccccc1')
>
> suppl = Chem.ResonanceMolSupplier(mol, Chem.KEKULE_ALL)
>
> len(suppl)
>
> 2
>
> for i in range(len(suppl)):
>     print (Chem.MolToSmiles(suppl[i], kekuleSmiles=True))
>
> C1C=CC=CC=1
> C1=CC=CC=C1
>
>   Best,
> Paolo
>
>
> On 09/11/2017 05:22 PM, James T. Metz via Rdkit-discuss wrote:
>
> Greg,
>
> Thanks!  Yes, very helpful.  I will need to digest the detailed
> information
> you have provided.  I am somewhat familiar with recursive SMARTS.  Thanks
> again.
>
> Regards,
> Jim Metz
>
>
>
>
> -Original Message-
> From: Greg Landrum  
> To: James T. Metz  
> Cc: RDKit Discuss 
> 
> Sent: Mon, Sep 11, 2017 11:15 am
> Subject: Re: [Rdkit-discuss] how to output multiple Kekule structures
>
>
> On Mon, Sep 11, 2017 at 5:55 PM, James T. Metz < 
> jamestm...@aol.com> wrote:
>
> Greg,
>
> I need to be able to use SMARTS patterns to identify substructures in
> molecules
> that can be aromatic, and I need to be able to handle cases where there
> can be
> differences in the way that the molecule was entered or drawn by a user.
>
>
> That particular problem is a big part of the reason that we tend to use
> the aromatic representation of things.
>
>
> For example, consider the following alkenyl-substituted pyridine, there
> are two possible Kekule structures
>
> m1 = 'C=CC1=NC=CC=C1'
> m2 = 'C=CC1N=CC=CC1'
>
>
> Fixing what I assume is a typo for m2, I can do the following:
>
> In [11]: m1 = Chem.MolFromSmiles('C=CC1=NC=CC=C1')
>
> In [12]: m2 = Chem.MolFromSmiles('C=CC1N=CC=CC=1')
>
> In [13]: q1 = Chem.MolFromSmarts('cccc')
>
> In [14]: q2 = Chem.MolFromSmarts('cccn')
>
> In [15]: list(m1.GetSubstructMatch(q1))
> Out[15]: [2, 7, 6, 5]
>
> In [16]: list(m1.GetSubstructMatch(q2))
> Out[16]: [6, 5, 4, 3]
>
> In [17]: list(m2.GetSubstructMatch(q1))
> Out[17]: [2, 7, 6, 5]
>
> In [18]: list(m2.GetSubstructMatch(q2))
> Out[18]: [6, 5, 4, 3]
>
>
> Those particular queries were going for the aromatic species and will only
> match inside the ring, but if you want to be more generic you could tune
> your queries like this:
>
> In [28]: q3 = Chem.MolFromSmarts('[#6;$([#6]=,:[*])]-,=,:[#6;$([#6]=,:[*])]-,=,:[#6;$([#6]=,:[*])]-,=,:[#6;$([#6]=,:[*])]')
>
> In [29]: q4 = Chem.MolFromSmarts('[#6;$([#6]=,:[*])]-,=,:[#6;$([#6]=,:[*])]-,=,:[#6;$([#6]=,:[*])]-,=,:[#7;$([#7]=,:[*])]')
>
> In [30]: list(m1.GetSubstructMatch(q3))
> Out[30]: [0, 1, 2, 7]
>
> In [31]: list(m1.GetSubstructMatch(q4))
> Out[31]: [0, 1, 2, 3]
>
> In [32]: list(m2.GetSubstructMatch(q3))
> Out[32]: [0, 1, 2, 7]
>
> In [33]: list(m2.GetSubstructMatch(q4))
> Out[33]: [0, 1, 2, 3]
>
> If you aren't familiar with recursive SMARTS, this construct:
> "[#6;$([#6]=,:[*])]" means "a carbon that has either a double bond or an
> aromatic bond to another atom".  So you can interpret q3 as "four carbons
> that each have either a double or aromatic bond and that are connected to
> each other by single, double, or aromatic bonds".
>
> Is this starting to approximate what you're looking for?
> -greg
>
>
>
>
> Now consider two SMARTS
>
> pattern1 = '[C]=[C]-[C]=[C]'
> pattern2 = '[C]=[C]-[C]=[N]'
>
> I need to be able to detect the existence of each pattern in the
> molecule
>
> If m1 is the only available generated Kekule structure, then pattern2
> will be recognized.
> If m2 is the only available generated Kekule  structure, then pattern1
> will be recognized.
>
> Hence, I am getting different answers for the same input molecule just
> because
> it was drawn in different Kekule structures.
>
> Regards,
> Jim Metz
>
>
>
>
>
> -Original Message-
> From: Greg Landrum < greg.land...@gmail.com>
> To: James T. Metz < jamestm...@aol.com>
> Cc: RDKit Discuss 
> Sent: Mon, Sep 11, 2017 10:31 am
> Subject: Re: [Rdkit-discuss] how to output multiple Kekule structures
>
> Hi Jim,
>
> The code currently has no way to enumerate Kekule structures. I don't
> recall this coming up in the past and, to be honest, it doesn't seem all
> that generally useful.
>
> 

Re: [Rdkit-discuss] how to output multiple Kekule structures

2017-09-11 Thread James T. Metz via Rdkit-discuss
Paolo,


Exactly what I was looking for.  Very helpful.  Thank you.

Regards,
Jim Metz





-Original Message-
From: Paolo Tosco 
To: James T. Metz ; greg.landrum ; 
rdkit-discuss 
Sent: Mon, Sep 11, 2017 2:53 pm
Subject: Re: [Rdkit-discuss] how to output multiple Kekule structures


Hi Jim,

you can indeed enumerate all Kekulé structures for a molecule within the
RDKit using Chem.ResonanceMolSupplier():

from rdkit import Chem

mol = Chem.MolFromSmiles('c1ccccc1')

suppl = Chem.ResonanceMolSupplier(mol, Chem.KEKULE_ALL)

len(suppl)

2

for i in range(len(suppl)):
    print (Chem.MolToSmiles(suppl[i], kekuleSmiles=True))

C1C=CC=CC=1
C1=CC=CC=C1

Best,
Paolo


On 09/11/2017 05:22 PM, James T. Metz via Rdkit-discuss wrote:

Greg,

Thanks!  Yes, very helpful.  I will need to digest the detailed information
you have provided.  I am somewhat familiar with recursive SMARTS.  Thanks
again.

Regards,
Jim Metz

-Original Message-
From: Greg Landrum
To: James T. Metz
Cc: RDKit Discuss
Sent: Mon, Sep 11, 2017 11:15 am
Subject: Re: [Rdkit-discuss] how to output multiple Kekule structures

On Mon, Sep 11, 2017 at 5:55 PM, James T. Metz wrote:

Greg,

I need to be able to use SMARTS patterns to identify substructures in molecules
that can be aromatic, and I need to be able to handle cases where there can be
differences in the way that the molecule was entered or drawn by a user.

That particular problem is a big part of the reason that we tend to use the
aromatic representation of things.

For example, consider the following alkenyl-substituted pyridine, there
are two possible Kekule structures

m1 = 'C=CC1=NC=CC=C1'
m2 = 'C=CC1N=CC=CC1'

Fixing what I assume is a typo for m2, I can do the following:

In [11]: m1 = Chem.MolFromSmiles('C=CC1=NC=CC=C1')

In [12]: m2 = Chem.MolFromSmiles('C=CC1N=CC=CC=1')

In [13]: q1 = Chem.MolFromSmarts('cccc')

In [14]: q2 = Chem.MolFromSmarts('cccn')

Re: [Rdkit-discuss] how to output multiple Kekule structures

2017-09-11 Thread Paolo Tosco

Hi Jim,

you can indeed enumerate all Kekulé structures for a molecule within the
RDKit using Chem.ResonanceMolSupplier():


from rdkit import Chem

mol = Chem.MolFromSmiles('c1ccccc1')

suppl = Chem.ResonanceMolSupplier(mol, Chem.KEKULE_ALL)

len(suppl)

2

for i in range(len(suppl)):
    print (Chem.MolToSmiles(suppl[i], kekuleSmiles=True))

C1C=CC=CC=1
C1=CC=CC=C1

 


Best,
Paolo

On 09/11/2017 05:22 PM, James T. Metz via Rdkit-discuss wrote:

Greg,

Thanks!  Yes, very helpful.  I will need to digest the detailed 
information

you have provided.  I am somewhat familiar with recursive SMARTS.  Thanks
again.

Regards,
Jim Metz




-Original Message-
From: Greg Landrum 
To: James T. Metz 
Cc: RDKit Discuss 
Sent: Mon, Sep 11, 2017 11:15 am
Subject: Re: [Rdkit-discuss] how to output multiple Kekule structures


On Mon, Sep 11, 2017 at 5:55 PM, James T. Metz wrote:


Greg,

I need to be able to use SMARTS patterns to identify
substructures in molecules
that can be aromatic, and I need to be able to handle cases where
there can be
differences in the way that the molecule was entered or drawn by a
user.


That particular problem is a big part of the reason that we tend to 
use the aromatic representation of things.


For example, consider the following alkenyl-substituted
pyridine, there
are two possible Kekule structures

m1 = 'C=CC1=NC=CC=C1'
m2 = 'C=CC1N=CC=CC1'


Fixing what I assume is a typo for m2, I can do the following:

In [11]: m1 = Chem.MolFromSmiles('C=CC1=NC=CC=C1')

In [12]: m2 = Chem.MolFromSmiles('C=CC1N=CC=CC=1')

In [13]: q1 = Chem.MolFromSmarts('cccc')

In [14]: q2 = Chem.MolFromSmarts('cccn')

In [15]: list(m1.GetSubstructMatch(q1))
Out[15]: [2, 7, 6, 5]

In [16]: list(m1.GetSubstructMatch(q2))
Out[16]: [6, 5, 4, 3]

In [17]: list(m2.GetSubstructMatch(q1))
Out[17]: [2, 7, 6, 5]

In [18]: list(m2.GetSubstructMatch(q2))
Out[18]: [6, 5, 4, 3]

Those particular queries were going for the aromatic species and will 
only match inside the ring, but if you want to be more generic you 
could tune your queries like this:


In [28]: q3 = Chem.MolFromSmarts('[#6;$([#6]=,:[*])]-,=,:[#6;$([#6]=,:[*])]-,=,:[#6;$([#6]=,:[*])]-,=,:[#6;$([#6]=,:[*])]')

In [29]: q4 = Chem.MolFromSmarts('[#6;$([#6]=,:[*])]-,=,:[#6;$([#6]=,:[*])]-,=,:[#6;$([#6]=,:[*])]-,=,:[#7;$([#7]=,:[*])]')


In [30]: list(m1.GetSubstructMatch(q3))
Out[30]: [0, 1, 2, 7]

In [31]: list(m1.GetSubstructMatch(q4))
Out[31]: [0, 1, 2, 3]

In [32]: list(m2.GetSubstructMatch(q3))
Out[32]: [0, 1, 2, 7]

In [33]: list(m2.GetSubstructMatch(q4))
Out[33]: [0, 1, 2, 3]

If you aren't familiar with recursive SMARTS, this construct: 
"[#6;$([#6]=,:[*])]" means "a carbon that has either a double bond or 
an aromatic bond to another atom".  So you can interpret q3 as "four 
carbons that each have either a double or aromatic bond and that are 
connected to each other by single, double, or aromatic bonds".


Is this starting to approximate what you're looking for?
-greg




Now consider two SMARTS

pattern1 = '[C]=[C]-[C]=[C]'
pattern2 = '[C]=[C]-[C]=[N]'

I need to be able to detect the existence of each pattern in
the molecule

If m1 is the only available generated Kekule structure, then
pattern2 will be recognized.
If m2 is the only available generated Kekule  structure, then
pattern1 will be recognized.

Hence, I am getting different answers for the same input
molecule just because
it was drawn in different Kekule structures.

Regards,
Jim Metz





-Original Message-
From: Greg Landrum
To: James T. Metz
Cc: RDKit Discuss
Sent: Mon, Sep 11, 2017 10:31 am
Subject: Re: [Rdkit-discuss] how to output multiple Kekule structures

Hi Jim,

The code currently has no way to enumerate Kekule structures. I
don't recall this coming up in the past and, to be honest, it
doesn't seem all that generally useful.

Perhaps there's an alternate way to solve the problem; what are
you trying to do?

-greg


On Mon, Sep 11, 2017 at 5:04 PM, James T. Metz via Rdkit-discuss
 wrote:

Hello,

Suppose I read in an aromatic SMILES e.g., for benzene

c1ccccc1

I would like to generate the major canonical resonance forms
and save the results as two separate molecules.  Essentially
I am trying to generate

m1 = 'C1=CC=CC-C1'
m2 = 'C1C=CC=CC1'

Can this be done in RDkit?  I 

Re: [Rdkit-discuss] how to output multiple Kekule structures

2017-09-11 Thread James T. Metz via Rdkit-discuss
Greg,


Thanks!  Yes, very helpful.  I will need to digest the detailed information
you have provided.  I am somewhat familiar with recursive SMARTS.  Thanks
again.


Regards,
Jim Metz





-Original Message-
From: Greg Landrum 
To: James T. Metz 
Cc: RDKit Discuss 
Sent: Mon, Sep 11, 2017 11:15 am
Subject: Re: [Rdkit-discuss] how to output multiple Kekule structures






On Mon, Sep 11, 2017 at 5:55 PM, James T. Metz  wrote:

Greg,


I need to be able to use SMARTS patterns to identify substructures in 
molecules
that can be aromatic, and I need to be able to handle cases where there can be
differences in the way that the molecule was entered or drawn by a user.




That particular problem is a big part of the reason that we tend to use the 
aromatic representation of things.
 



For example, consider the following alkenyl-substituted pyridine, there
are two possible Kekule structures


m1 = 'C=CC1=NC=CC=C1'
m2 = 'C=CC1N=CC=CC1'



Fixing what I assume is a typo for m2, I can do the following:


In [11]: m1 = Chem.MolFromSmiles('C=CC1=NC=CC=C1')


In [12]: m2 = Chem.MolFromSmiles('C=CC1N=CC=CC=1')


In [13]: q1 = Chem.MolFromSmarts('cccc')


In [14]: q2 = Chem.MolFromSmarts('cccn')


In [15]: list(m1.GetSubstructMatch(q1))
Out[15]: [2, 7, 6, 5]


In [16]: list(m1.GetSubstructMatch(q2))
Out[16]: [6, 5, 4, 3]


In [17]: list(m2.GetSubstructMatch(q1))
Out[17]: [2, 7, 6, 5]


In [18]: list(m2.GetSubstructMatch(q2))
Out[18]: [6, 5, 4, 3]
 


Those particular queries were going for the aromatic species and will only 
match inside the ring, but if you want to be more generic you could tune your 
queries like this:




In [28]: q3 = Chem.MolFromSmarts('[#6;$([#6]=,:[*])]-,=,:[#6;$([#6]=,:[*])]-,=,:[#6;$([#6]=,:[*])]-,=,:[#6;$([#6]=,:[*])]')

In [29]: q4 = Chem.MolFromSmarts('[#6;$([#6]=,:[*])]-,=,:[#6;$([#6]=,:[*])]-,=,:[#6;$([#6]=,:[*])]-,=,:[#7;$([#7]=,:[*])]')


In [30]: list(m1.GetSubstructMatch(q3))
Out[30]: [0, 1, 2, 7]


In [31]: list(m1.GetSubstructMatch(q4))
Out[31]: [0, 1, 2, 3]


In [32]: list(m2.GetSubstructMatch(q3))
Out[32]: [0, 1, 2, 7]


In [33]: list(m2.GetSubstructMatch(q4))
Out[33]: [0, 1, 2, 3]



If you aren't familiar with recursive SMARTS, this construct: 
"[#6;$([#6]=,:[*])]" means "a carbon that has either a double bond or an 
aromatic bond to another atom".  So you can interpret q3 as "four carbons that 
each have either a double or aromatic bond and that are connected to each other 
by single, double, or aromatic bonds".


Is this starting to approximate what you're looking for?
-greg










Now consider two SMARTS


pattern1 = '[C]=[C]-[C]=[C]'

pattern2 = '[C]=[C]-[C]=[N]'



I need to be able to detect the existence of each pattern in the molecule



If m1 is the only available generated Kekule structure, then pattern2 will 
be recognized.

If m2 is the only available generated Kekule  structure, then pattern1 will 
be recognized.



Hence, I am getting different answers for the same input molecule just 
because

it was drawn in different Kekule structures.


Regards,

Jim Metz










-Original Message-
From: Greg Landrum 
To: James T. Metz 
Cc: RDKit Discuss 
Sent: Mon, Sep 11, 2017 10:31 am
Subject: Re: [Rdkit-discuss] how to output multiple Kekule structures



Hi Jim,


The code currently has no way to enumerate Kekule structures. I don't recall 
this coming up in the past and, to be honest, it doesn't seem all that 
generally useful. 


Perhaps there's an alternate way to solve the problem; what are you trying to 
do?


-greg





On Mon, Sep 11, 2017 at 5:04 PM, James T. Metz via Rdkit-discuss 
 wrote:

Hello,


Suppose I read in an aromatic SMILES e.g., for benzene



c1ccccc1



I would like to generate the major canonical resonance forms

and save the results as two separate molecules.  Essentially
I am trying to generate


m1 = 'C1=CC=CC-C1'

m2 = 'C1C=CC=CC1'



Can this be done in RDkit?  I have found a KEKULE_ALL 

option in the detailed documentation which seems to be what I
am trying to do, but I don't understand how this option is to be used,
or the proper syntax.


If it is necessary to somehow renumber the atoms and re-generate

Kekule structures, that is OK.  Thank you.


Regards,

Jim Metz



























Re: [Rdkit-discuss] how to output multiple Kekule structures

2017-09-11 Thread Greg Landrum
On Mon, Sep 11, 2017 at 5:55 PM, James T. Metz  wrote:

> Greg,
>
> I need to be able to use SMARTS patterns to identify substructures in
> molecules
> that can be aromatic, and I need to be able to handle cases where there
> can be
> differences in the way that the molecule was entered or drawn by a user.
>

That particular problem is a big part of the reason that we tend to use the
aromatic representation of things.
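
A quick sketch of what the aromatic representation buys here, using the two
Kekulé SMILES for the vinylpyridine (with the ring closure in the second one
corrected). The variable names are only for illustration. Once both inputs
have been parsed and sanitized they should be the same aromatic molecule, so
their canonical SMILES can be compared directly:

```python
from rdkit import Chem

# Two Kekule drawings of the same vinylpyridine.
kekule_a = Chem.MolFromSmiles('C=CC1=NC=CC=C1')
kekule_b = Chem.MolFromSmiles('C=CC1N=CC=CC=1')

# MolFromSmiles sanitizes by default, which includes aromaticity perception.
smiles_a = Chem.MolToSmiles(kekule_a)
smiles_b = Chem.MolToSmiles(kekule_b)

same_molecule = (smiles_a == smiles_b)
# Atom 3 is the ring nitrogen in both inputs.
ring_n_aromatic = [m.GetAtomWithIdx(3).GetIsAromatic() for m in (kekule_a, kekule_b)]
print(smiles_a, smiles_b, same_molecule, ring_n_aromatic)
```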


> For example, consider the following alkenyl-substituted pyridine, there
> are two possible Kekule structures
>
> m1 = 'C=CC1=NC=CC=C1'
> m2 = 'C=CC1N=CC=CC1'
>

Fixing what I assume is a typo for m2, I can do the following:

In [11]: m1 = Chem.MolFromSmiles('C=CC1=NC=CC=C1')

In [12]: m2 = Chem.MolFromSmiles('C=CC1N=CC=CC=1')

In [13]: q1 = Chem.MolFromSmarts('cccc')

In [14]: q2 = Chem.MolFromSmarts('cccn')

In [15]: list(m1.GetSubstructMatch(q1))
Out[15]: [2, 7, 6, 5]

In [16]: list(m1.GetSubstructMatch(q2))
Out[16]: [6, 5, 4, 3]

In [17]: list(m2.GetSubstructMatch(q1))
Out[17]: [2, 7, 6, 5]

In [18]: list(m2.GetSubstructMatch(q2))
Out[18]: [6, 5, 4, 3]


Those particular queries were going for the aromatic species and will only
match inside the ring, but if you want to be more generic you could tune
your queries like this:

In [28]: q3 = Chem.MolFromSmarts('[#6;$([#6]=,:[*])]-,=,:[#6;$([#6]=,:[*])]-,=,:[#6;$([#6]=,:[*])]-,=,:[#6;$([#6]=,:[*])]')

In [29]: q4 = Chem.MolFromSmarts('[#6;$([#6]=,:[*])]-,=,:[#6;$([#6]=,:[*])]-,=,:[#6;$([#6]=,:[*])]-,=,:[#7;$([#7]=,:[*])]')

In [30]: list(m1.GetSubstructMatch(q3))
Out[30]: [0, 1, 2, 7]

In [31]: list(m1.GetSubstructMatch(q4))
Out[31]: [0, 1, 2, 3]

In [32]: list(m2.GetSubstructMatch(q3))
Out[32]: [0, 1, 2, 7]

In [33]: list(m2.GetSubstructMatch(q4))
Out[33]: [0, 1, 2, 3]

If you aren't familiar with recursive SMARTS, this construct:
"[#6;$([#6]=,:[*])]" means "a carbon that has either a double bond or an
aromatic bond to another atom".  So you can interpret q3 as "four carbons
that each have either a double or aromatic bond and that are connected to
each other by single, double, or aromatic bonds".
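
A self-contained sketch of the same idea, with the recursive piece factored
out into a string so the query is easier to read. The names unsat_c and
chain4 are hypothetical, and the bond test is written as "=,:" (double or
aromatic) on all four atoms, following the description above:

```python
from rdkit import Chem

m1 = Chem.MolFromSmiles('C=CC1=NC=CC=C1')
m2 = Chem.MolFromSmiles('C=CC1N=CC=CC=1')

# A carbon with a double or aromatic bond to any neighbor.
unsat_c = '[#6;$([#6]=,:[*])]'
# Four such carbons joined by single, double, or aromatic bonds.
chain4 = Chem.MolFromSmarts('-,=,:'.join([unsat_c] * 4))

match1 = list(m1.GetSubstructMatch(chain4))
match2 = list(m2.GetSubstructMatch(chain4))
print(match1, match2)
```

Because both inputs end up as the same aromatic molecule with the same atom
order, the two matches should be identical regardless of which Kekulé form
was drawn.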

Is this starting to approximate what you're looking for?
-greg




> Now consider two SMARTS
>
> pattern1 = '[C]=[C]-[C]=[C]'
> pattern2 = '[C]=[C]-[C]=[N]'
>
> I need to be able to detect the existence of each pattern in the
> molecule
>
> If m1 is the only available generated Kekule structure, then pattern2
> will be recognized.
> If m2 is the only available generated Kekule  structure, then pattern1
> will be recognized.
>
> Hence, I am getting different answers for the same input molecule just
> because
> it was drawn in different Kekule structures.
>
> Regards,
> Jim Metz
>
>
>
>
>
> -Original Message-
> From: Greg Landrum 
> To: James T. Metz 
> Cc: RDKit Discuss 
> Sent: Mon, Sep 11, 2017 10:31 am
> Subject: Re: [Rdkit-discuss] how to output multiple Kekule structures
>
> Hi Jim,
>
> The code currently has no way to enumerate Kekule structures. I don't
> recall this coming up in the past and, to be honest, it doesn't seem all
> that generally useful.
>
> Perhaps there's an alternate way to solve the problem; what are you trying
> to do?
>
> -greg
>
>
> On Mon, Sep 11, 2017 at 5:04 PM, James T. Metz via Rdkit-discuss <
> rdkit-discuss@lists.sourceforge.net> wrote:
>
> Hello,
>
> Suppose I read in an aromatic SMILES e.g., for benzene
>
> c1ccccc1
>
> I would like to generate the major canonical resonance forms
> and save the results as two separate molecules.  Essentially
> I am trying to generate
>
> m1 = 'C1=CC=CC-C1'
> m2 = 'C1C=CC=CC1'
>
> Can this be done in RDkit?  I have found a KEKULE_ALL
> option in the detailed documentation which seems to be what I
> am trying to do, but I don't understand how this option is to be used,
> or the proper syntax.
>
> If it is necessary to somehow renumber the atoms and re-generate
> Kekule structures, that is OK.  Thank you.
>
> Regards,
> Jim Metz
>
>
>
>
>
>
>


Re: [Rdkit-discuss] how to output multiple Kekule structures

2017-09-11 Thread James T. Metz via Rdkit-discuss
Greg,


I need to be able to use SMARTS patterns to identify substructures in 
molecules
that can be aromatic, and I need to be able to handle cases where there can be
differences in the way that the molecule was entered or drawn by a user.


For example, consider the following alkenyl-substituted pyridine, there
are two possible Kekule structures


m1 = 'C=CC1=NC=CC=C1'
m2 = 'C=CC1N=CC=CC1'


Now consider two SMARTS


pattern1 = '[C]=[C]-[C]=[C]'

pattern2 = '[C]=[C]-[C]=[N]'



I need to be able to detect the existence of each pattern in the molecule



If m1 is the only available generated Kekule structure, then pattern2 will 
be recognized.

If m2 is the only available generated Kekule  structure, then pattern1 will 
be recognized.



Hence, I am getting different answers for the same input molecule just 
because

it was drawn in different Kekule structures.


Regards,

Jim Metz









-Original Message-
From: Greg Landrum 
To: James T. Metz 
Cc: RDKit Discuss 
Sent: Mon, Sep 11, 2017 10:31 am
Subject: Re: [Rdkit-discuss] how to output multiple Kekule structures



Hi Jim,


The code currently has no way to enumerate Kekule structures. I don't recall 
this coming up in the past and, to be honest, it doesn't seem all that 
generally useful. 


Perhaps there's an alternate way to solve the problem; what are you trying to 
do?


-greg





On Mon, Sep 11, 2017 at 5:04 PM, James T. Metz via Rdkit-discuss 
 wrote:

Hello,


Suppose I read in an aromatic SMILES e.g., for benzene



c1ccccc1



I would like to generate the major canonical resonance forms

and save the results as two separate molecules.  Essentially
I am trying to generate


m1 = 'C1=CC=CC-C1'

m2 = 'C1C=CC=CC1'



Can this be done in RDkit?  I have found a KEKULE_ALL 

option in the detailed documentation which seems to be what I
am trying to do, but I don't understand how this option is to be used,
or the proper syntax.


If it is necessary to somehow renumber the atoms and re-generate

Kekule structures, that is OK.  Thank you.


Regards,

Jim Metz
















Re: [Rdkit-discuss] how to output multiple Kekule structures

2017-09-11 Thread Greg Landrum
Hi Jim,

The code currently has no way to enumerate Kekule structures. I don't
recall this coming up in the past and, to be honest, it doesn't seem all
that generally useful.

Perhaps there's an alternate way to solve the problem; what are you trying
to do?

-greg


On Mon, Sep 11, 2017 at 5:04 PM, James T. Metz via Rdkit-discuss <
rdkit-discuss@lists.sourceforge.net> wrote:

> Hello,
>
> Suppose I read in an aromatic SMILES e.g., for benzene
>
> c1ccccc1
>
> I would like to generate the major canonical resonance forms
> and save the results as two separate molecules.  Essentially
> I am trying to generate
>
> m1 = 'C1=CC=CC-C1'
> m2 = 'C1C=CC=CC1'
>
> Can this be done in RDkit?  I have found a KEKULE_ALL
> option in the detailed documentation which seems to be what I
> am trying to do, but I don't understand how this option is to be used,
> or the proper syntax.
>
> If it is necessary to somehow renumber the atoms and re-generate
> Kekule structures, that is OK.  Thank you.
>
> Regards,
> Jim Metz
>
>
>
>
>
>
>


[Rdkit-discuss] how to output multiple Kekule structures

2017-09-11 Thread James T. Metz via Rdkit-discuss
Hello,


Suppose I read in an aromatic SMILES e.g., for benzene



c1ccccc1



I would like to generate the major canonical resonance forms

and save the results as two separate molecules.  Essentially
I am trying to generate


m1 = 'C1=CC=CC=C1'

m2 = 'C1C=CC=CC=1'



Can this be done in RDkit?  I have found a KEKULE_ALL 

option in the detailed documentation which seems to be what I
am trying to do, but I don't understand how this option is to be used,
or the proper syntax.


If it is necessary to somehow renumber the atoms and re-generate

Kekule structures, that is OK.  Thank you.


Regards,

Jim Metz















Re: [Rdkit-discuss] GetConformerRMS() vs GetBestRMS()

2017-09-11 Thread Udvarhelyi, Aniko
Thanks Greg for this detailed answer.
Yes, indeed this helps.

From: Greg Landrum [mailto:greg.land...@gmail.com]
Sent: Friday, 8 September 2017 12:10
To: Udvarhelyi, Aniko
Cc: rdkit-discuss@lists.sourceforge.net
Subject: Re: [Rdkit-discuss] GetConformerRMS() vs GetBestRMS()

Hi Anikó,

Both functions do an alignment. The big difference here is coming because 
GetBestRMS() looks at all 2D-identical alignments of the molecules to each 
other while GetConformerRMS() only does the alignment once: using the atom 
numbers.

Practically speaking what does that mean for your molecule?

Here's a 2D sketch without the Hs:
[Inline image 1]

By 2D symmetry atoms 8 and 9 are equivalent as are atoms 4 and 5.

So there are four possible 2D isomorphisms between those molecules:
8->8, 9->9, 4->4, 5->5  (all others the same)
8->9, 9->8, 4->4, 5->5  (all others the same)
8->8, 9->9, 4->5, 5->4  (all others the same)
8->9, 9->8, 4->5, 5->4  (all others the same)
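
One way to list such mappings for a molecule (a sketch, not a description of
what GetBestRMS() does internally) is to match the molecule onto itself with
uniquify=False. The isobutylbenzene SMILES below is only a stand-in with the
same kind of symmetry, two equivalent methyls and a ring that can flip; it is
not the molecule from the attachment, so the atom numbers differ:

```python
from rdkit import Chem

# Stand-in molecule: two equivalent methyls and a flippable phenyl ring.
mol = Chem.MolFromSmiles('CC(C)Cc1ccccc1')

# uniquify=False keeps every atom-to-atom mapping instead of one per atom set.
mappings = mol.GetSubstructMatches(mol, uniquify=False)

for mapping in mappings:
    # Only show the atoms that move; everything else maps onto itself.
    moved = ['%d->%d' % (i, j) for i, j in enumerate(mapping) if i != j]
    print(moved or 'identity')
```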

GetBestRMS() does alignments for all of these and takes the one that provides 
the lowest RMS value.
GetConformerRMS() only does the first alignment and uses that RMS.

In general you want to always use GetBestRMS() for symmetric molecules.
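
A minimal sketch comparing the two calls on heavy atoms only. The molecule,
the random seed and the variable names are placeholders, and no output is
quoted because the numbers depend on the embedding. The only thing to expect
is that GetBestRMS() is never worse than GetConformerRMS(), since the
index-by-index pairing is one of the mappings it tries:

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# Placeholder molecule with symmetry-equivalent atoms.
mol = Chem.AddHs(Chem.MolFromSmiles('CC(C)Cc1ccccc1'))
conf_ids = list(AllChem.EmbedMultipleConfs(mol, numConfs=2, randomSeed=42))
heavy = Chem.RemoveHs(mol)  # conformers are kept, hydrogens are dropped

ref_id, prb_id = conf_ids[0], conf_ids[1]

# One alignment, atoms paired by index. Work on a copy because the
# alignment moves the second conformer in place.
conf_rms = AllChem.GetConformerRMS(Chem.Mol(heavy), ref_id, prb_id)

# All symmetry-equivalent pairings; the lowest RMS is returned.
# Arguments are probe molecule, reference molecule, probe conf id, reference conf id.
best_rms = AllChem.GetBestRMS(Chem.Mol(heavy), heavy, prb_id, ref_id)

print(conf_rms, best_rms)
```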

Does that help?
-greg
p.s. Adding the Hs leads to additional mappings which just makes the overall 
problem worse.




On Fri, Sep 8, 2017 at 9:26 AM, Udvarhelyi, Aniko wrote:
Dear All,

I would like to compute RMS values between conformers of the same molecule that 
are not aligned. Unfortunately, I can't get along very well with the 
GetConformerRMS() function, it gives far too high RMS values even for 
conformers that are clearly (near-)identical as judged by visual inspection 
after alignment. I attach one example of 2 conformers of a molecule, that are 
near-identical.
GetConformerRMS() returns an RMS value of 1.32 (with Hydrogens) and 0.70 
(disregarding Hydrogens).
GetBestRMS() returns an RMS value of 0.03 (with Hydrogens) and 0.02 
(disregarding Hydrogens).

Clearly, the GetBestRMS() result is the one I'd expect (I am interested in the 
all-atom RMSDs with Hydrogens). I guess GetConformerRMS() cannot align the two 
conformers properly hence the high RMS value. My question is why not? The atom 
ordering and all bonds are exactly the same in both conformers. Why do I need 
the GetBestRMS() alignment of all possible permutations of matching atom orders 
in both conformers to get the alignment correct? I would like to avoid using 
GetBestRMS() as it is far too slow for my purposes (processing many molecules 
with many conformers).

Many thanks for any hints,
Anikó
