Re: [Rdkit-discuss] Nitrogen Valence

2017-05-11 Thread Peter S. Shenkin
Hi,

If the compound is neutral overall and there is a single H where you drew
it, then a valid RDKit SMILES for the nitrogen-containing terminal group is
C[N+](C)(C)[NH-], which is one of the forms I gave earlier.

It is not a zwitterion. Rather, it represents a dative bond. (I am not sure
that all [X+][Y-] bonds are dative bonds, but my guess is that they are.)

Attached are SMILES for some well-known nitrogen compounds with adjacent +
and - charges, including nitromethane (lower left). All have single-bonded
"ion pairs", but none are zwitterions. Sorry the drawing (from Slack) is so
small.

Carbon monoxide is [C-]#[O+]. The version of RDKit now hooked up to Slack
can't draw it, but I believe that's due to a known bug that also keeps it
from drawing ethane, CC.
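
A minimal sketch of the point above, assuming a Python environment with
RDKit installed: the neutral form N=N(C)(C)C fails the default valence
check, while the charge-separated forms parse and carry no net charge. The
variable names are illustrative, and the nitromethane SMILES is an assumed
standard form, not copied from the attachment.

```python
from rdkit import Chem

# Neutral form from the thread: the central N would need a valence of 5,
# so default sanitization rejects it and MolFromSmiles returns None.
neutral = Chem.MolFromSmiles('N=N(C)(C)C')

# Charge-separated forms that satisfy the default valence rules.
charged_smiles = {
    'terminal group': 'C[N+](C)(C)[NH-]',
    'nitromethane': 'C[N+](=O)[O-]',  # assumed SMILES, not from the attachment
    'carbon monoxide': '[C-]#[O+]',
}

net_charges = {}
for name, smi in charged_smiles.items():
    mol = Chem.MolFromSmiles(smi)
    # Sum the formal charges; each of these molecules is neutral overall.
    net_charges[name] = sum(a.GetFormalCharge() for a in mol.GetAtoms())

print('neutral form parsed:', neutral is not None)
print(net_charges)
```

RDKit also logs a valence error for the neutral form; that message comes
from the library and is expected here.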

Best,
-P.


On Thu, May 11, 2017 at 1:45 PM, Yuran Wang wrote:

> Hi Peter,
> Thank you for your reply. I did not quite understand what you meant by 'But
> this makes no sense'.
> Also, the SMILES you tested are in zwitterionic form. In this link
> http://www.rdkit.org/docs/RDKit_Book.html#molecular-sanitization, the
> zwitterionic form seems suitable for N=O, N#N, not for N=N. But I may just
> have a very limited knowledge of RDKit.
>
> This is what it looks like in ChemDraw:
> [image: Inline image 1]
>
>
> Thanks,
> Yuran
>
> On Thu, May 11, 2017 at 1:33 PM, Peter S. Shenkin 
> wrote:
>
>> The problematic part is just the beginning of your would-be SMILES:
>> N=N(C)(C)C. The rest is correctly parsed. But this makes no sense. Perhaps
>> you mean one of the substructures illustrated in the attached (which at
>> least satisfy normal valence rules). If not, perhaps you could attach a
>> structural diagram of what you do mean.
>>
>> -P.
>>
>>
>> On Thu, May 11, 2017 at 11:02 AM, Yuran Wang 
>> wrote:
>>
>>> Dear Greg,
>>> Thank you very much for the suggestions. It works for me!
>>> Here is the SMILES of one molecule that I am looking
>>> at: N=N(C)(C)CC(CN1N=CN=C1)(O)C2=C(C=C(C=C2)F)F
>>> Any better alternative will be appreciated.
>>>
>>> Thanks,
>>> Yuran
>>>
>>> On Thu, May 11, 2017 at 10:49 AM, Greg Landrum 
>>> wrote:
>>>


>>>> On Thu, May 11, 2017 at 4:24 PM, Yuran Wang
>>>> wrote:
>>>>
>>>>> I have a question regarding the available valence of Nitrogen. It
>>>>> seems only 3 is available in the default setting (atomic_data.cpp). Why is
>>>>> it kept to only 3, and not extended to include 4 and 5? If I change it
>>>>> locally to include 4 and 5, will it cause any problems?
>>>>>
>>>>
>>>> Aside from generating molecules that don't make any chemical sense?
>>>> Probably not, but the lack of chemical sense may cause some unexpected
>>>> behavior.
>>>>
>>>>
>>>>> I am aware that I could turn off the sanitization to get a mol object,
>>>>> however, it cannot be further processed to get fingerprints, which is what
>>>>> I need.
>>>>>
>>>>
>>>> Well, you could turn off the sanitization on molecule construction and
>>>> then manually sanitize with the valence check turned off. Here's a simple
>>>> example of that:
>>>>
>>>> In [11]: m = Chem.MolFromSmiles('CN(C)(C)(C)C',sanitize=False)
>>>>
>>>> In [12]: m.UpdatePropertyCache(strict=False)
>>>>
>>>> In [13]: Chem.SanitizeMol(m,Chem.SANITIZE_SYMMRINGS|Chem.SANITIZE_SETCONJUGATION|Chem.SANITIZE_SETHYBRIDIZATION)
>>>> Out[13]: rdkit.Chem.rdmolops.SanitizeFlags.SANITIZE_NONE
>>>>
>>>> In [14]: rdMolDescriptors.GetMorganFingerprint(m,2)
>>>> Out[14]: <rdkit.DataStructs.cDataStructs.UIntSparseIntVect at 0x10b0ab350>
>>>>
>>>>
>>>> But, again, the RDKit's valence rules tend to reflect real chemistry.
>>>> What are you trying to represent that you need 5 coordinate neutral
>>>> nitrogen atoms? There may be a better way.
>>>>
>>>> -greg


>>>
>>>
>>>
>>> --
>>> Best,
>>> Yuran Wang
>>>
>>> 
>>> --
>>> Check out the vibrant tech community on one of the world's most
>>> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
>>> ___
>>> Rdkit-discuss mailing list
>>> Rdkit-discuss@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>>
>>>
>>
>
>
> --
> Best,
> Yuran Wang
>


Re: [Rdkit-discuss] Nitrogen Valence

2017-05-11 Thread Yuran Wang
Hi Peter,
Thank you for your reply. I did not quite understand what you meant by 'But
this makes no sense'.
Also, the SMILES you tested are in zwitterionic form. In this link
http://www.rdkit.org/docs/RDKit_Book.html#molecular-sanitization, the
zwitterionic form seems suitable for N=O, N#N, not for N=N. But I may just
have a very limited knowledge of RDKit.

This is what it looks like in ChemDraw:
[image: Inline image 1]


Thanks,
Yuran

On Thu, May 11, 2017 at 1:33 PM, Peter S. Shenkin wrote:

> The problematic part is just the beginning of your would-be SMILES:
> N=N(C)(C)C. The rest is correctly parsed. But this makes no sense. Perhaps
> you mean one of the substructures illustrated in the attached (which at
> least satisfy normal valence rules). If not, perhaps you could attach a
> structural diagram of what you do mean.
>
> -P.
>
>
> On Thu, May 11, 2017 at 11:02 AM, Yuran Wang 
> wrote:
>
>> Dear Greg,
>> Thank you very much for the suggestions. It works for me!
>> Here is the SMILES of one molecule that I am looking
>> at: N=N(C)(C)CC(CN1N=CN=C1)(O)C2=C(C=C(C=C2)F)F
>> Any better alternative will be appreciated.
>>
>> Thanks,
>> Yuran
>>
>> On Thu, May 11, 2017 at 10:49 AM, Greg Landrum 
>> wrote:
>>
>>>
>>>
>>> On Thu, May 11, 2017 at 4:24 PM, Yuran Wang 
>>> wrote:
>>>
>>>> I have a question regarding the available valence of Nitrogen. It seems
>>>> only 3 is available in the default setting (atomic_data.cpp). Why is it
>>>> kept to only 3, and not extended to include 4 and 5? If I change it locally
>>>> to include 4 and 5, will it cause any problems?
>>>>
>>>
>>> Aside from generating molecules that don't make any chemical sense?
>>> Probably not, but the lack of chemical sense may cause some unexpected
>>> behavior.
>>>
>>>
>>>> I am aware that I could turn off the sanitization to get a mol object,
>>>> however, it cannot be further processed to get fingerprints, which is what
>>>> I need.
>>>>
>>>
>>> Well, you could turn off the sanitization on molecule construction and
>>> then manually sanitize with the valence check turned off. Here's a simple
>>> example of that:
>>>
>>> In [11]: m = Chem.MolFromSmiles('CN(C)(C)(C)C',sanitize=False)
>>>
>>> In [12]: m.UpdatePropertyCache(strict=False)
>>>
>>> In [13]: Chem.SanitizeMol(m,Chem.SANITIZE_SYMMRINGS|Chem.SANITIZE_SETCONJUGATION|Chem.SANITIZE_SETHYBRIDIZATION)
>>> Out[13]: rdkit.Chem.rdmolops.SanitizeFlags.SANITIZE_NONE
>>>
>>> In [14]: rdMolDescriptors.GetMorganFingerprint(m,2)
>>> Out[14]: <rdkit.DataStructs.cDataStructs.UIntSparseIntVect at 0x10b0ab350>
>>>
>>>
>>> But, again, the RDKit's valence rules tend to reflect real chemistry.
>>> What are you trying to represent that you need 5 coordinate neutral
>>> nitrogen atoms? There may be a better way.
>>>
>>> -greg
>>>
>>>
>>
>>
>>
>> --
>> Best,
>> Yuran Wang
>>
>> 
>>
>>
>


-- 
Best,
Yuran Wang


Re: [Rdkit-discuss] Nitrogen Valence

2017-05-11 Thread Peter S. Shenkin
The problematic part is just the beginning of your would-be SMILES:
N=N(C)(C)C. The rest is correctly parsed. But this makes no sense. Perhaps
you mean one of the substructures illustrated in the attached (which at
least satisfy normal valence rules). If not, perhaps you could attach a
structural diagram of what you do mean.

-P.


On Thu, May 11, 2017 at 11:02 AM, Yuran Wang wrote:

> Dear Greg,
> Thank you very much for the suggestions. It works for me!
> Here is the SMILES of one molecule that I am looking
> at: N=N(C)(C)CC(CN1N=CN=C1)(O)C2=C(C=C(C=C2)F)F
> Any better alternative will be appreciated.
>
> Thanks,
> Yuran
>
> On Thu, May 11, 2017 at 10:49 AM, Greg Landrum 
> wrote:
>
>>
>>
>> On Thu, May 11, 2017 at 4:24 PM, Yuran Wang 
>> wrote:
>>
>>> I have a question regarding the available valence of Nitrogen. It seems
>>> only 3 is available in the default setting (atomic_data.cpp). Why is it
>>> kept to only 3, and not extended to include 4 and 5? If I change it locally
>>> to include 4 and 5, will it cause any problems?
>>>
>>
>> Aside from generating molecules that don't make any chemical sense?
>> Probably not, but the lack of chemical sense may cause some unexpected
>> behavior.
>>
>>
>>> I am aware that I could turn off the sanitization to get a mol object,
>>> however, it cannot be further processed to get fingerprints, which is what
>>> I need.
>>>
>>
>> Well, you could turn off the sanitization on molecule construction and
>> then manually sanitize with the valence check turned off. Here's a simple
>> example of that:
>>
>> In [11]: m = Chem.MolFromSmiles('CN(C)(C)(C)C',sanitize=False)
>>
>> In [12]: m.UpdatePropertyCache(strict=False)
>>
>> In [13]: Chem.SanitizeMol(m,Chem.SANITIZE_SYMMRINGS|Chem.SANITIZE_SETCONJUGATION|Chem.SANITIZE_SETHYBRIDIZATION)
>> Out[13]: rdkit.Chem.rdmolops.SanitizeFlags.SANITIZE_NONE
>>
>> In [14]: rdMolDescriptors.GetMorganFingerprint(m,2)
>> Out[14]: <rdkit.DataStructs.cDataStructs.UIntSparseIntVect at 0x10b0ab350>
>>
>>
>> But, again, the RDKit's valence rules tend to reflect real chemistry.
>> What are you trying to represent that you need 5 coordinate neutral
>> nitrogen atoms? There may be a better way.
>>
>> -greg
>>
>>
>
>
>
> --
> Best,
> Yuran Wang
>
> 
>
>


Re: [Rdkit-discuss] Nitrogen Valence

2017-05-11 Thread Yuran Wang
Dear Greg,
Thank you very much for the suggestions. It works for me!
Here is the SMILES of one molecule that I am looking
at: N=N(C)(C)CC(CN1N=CN=C1)(O)C2=C(C=C(C=C2)F)F
Any better alternative will be appreciated.

Thanks,
Yuran

On Thu, May 11, 2017 at 10:49 AM, Greg Landrum 
wrote:

>
>
> On Thu, May 11, 2017 at 4:24 PM, Yuran Wang 
> wrote:
>
>> I have a question regarding the available valence of Nitrogen. It seems
>> only 3 is available in the default setting (atomic_data.cpp). Why is it
>> kept to only 3, and not extended to include 4 and 5? If I change it locally
>> to include 4 and 5, will it cause any problems?
>>
>
> Aside from generating molecules that don't make any chemical sense?
> Probably not, but the lack of chemical sense may cause some unexpected
> behavior.
>
>
>> I am aware that I could turn off the sanitization to get a mol object,
>> however, it cannot be further processed to get fingerprints, which is what
>> I need.
>>
>
> Well, you could turn off the sanitization on molecule construction and
> then manually sanitize with the valence check turned off. Here's a simple
> example of that:
>
> In [11]: m = Chem.MolFromSmiles('CN(C)(C)(C)C',sanitize=False)
>
> In [12]: m.UpdatePropertyCache(strict=False)
>
> In [13]: Chem.SanitizeMol(m,Chem.SANITIZE_SYMMRINGS|Chem.SANITIZE_SETCONJUGATION|Chem.SANITIZE_SETHYBRIDIZATION)
> Out[13]: rdkit.Chem.rdmolops.SanitizeFlags.SANITIZE_NONE
>
> In [14]: rdMolDescriptors.GetMorganFingerprint(m,2)
> Out[14]: <rdkit.DataStructs.cDataStructs.UIntSparseIntVect at 0x10b0ab350>
>
>
> But, again, the RDKit's valence rules tend to reflect real chemistry. What
> are you trying to represent that you need 5 coordinate neutral nitrogen
> atoms? There may be a better way.
>
> -greg
>
>



-- 
Best,
Yuran Wang


Re: [Rdkit-discuss] Nitrogen Valence

2017-05-11 Thread Greg Landrum
On Thu, May 11, 2017 at 4:24 PM, Yuran Wang wrote:

> I have a question regarding the available valence of Nitrogen. It seems
> only 3 is available in the default setting (atomic_data.cpp). Why is it
> kept to only 3, and not extended to include 4 and 5? If I change it locally
> to include 4 and 5, will it cause any problems?
>

Aside from generating molecules that don't make any chemical sense?
Probably not, but the lack of chemical sense may cause some unexpected
behavior.


> I am aware that I could turn off the sanitization to get a mol object,
> however, it cannot be further processed to get fingerprints, which is what
> I need.
>

Well, you could turn off the sanitization on molecule construction and then
manually sanitize with the valence check turned off. Here's a simple
example of that:

In [11]: m = Chem.MolFromSmiles('CN(C)(C)(C)C',sanitize=False)

In [12]: m.UpdatePropertyCache(strict=False)

In [13]: Chem.SanitizeMol(m,Chem.SANITIZE_SYMMRINGS|Chem.SANITIZE_SETCONJUGATION|Chem.SANITIZE_SETHYBRIDIZATION)
Out[13]: rdkit.Chem.rdmolops.SanitizeFlags.SANITIZE_NONE

In [14]: rdMolDescriptors.GetMorganFingerprint(m,2)
Out[14]: <rdkit.DataStructs.cDataStructs.UIntSparseIntVect at 0x10b0ab350>


But, again, the RDKit's valence rules tend to reflect real chemistry. What
are you trying to represent that you need 5 coordinate neutral nitrogen
atoms? There may be a better way.
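
For reference, the session above can be sketched as a standalone script,
assuming RDKit is installed. It skips the valence check, so the resulting
fingerprint describes a molecule RDKit would normally reject. The variable
names are illustrative, and newer RDKit releases may emit a deprecation
warning for GetMorganFingerprint.

```python
from rdkit import Chem
from rdkit.Chem import rdMolDescriptors

# Parse without sanitization so the 5-coordinate neutral N is not rejected.
mol = Chem.MolFromSmiles('CN(C)(C)(C)C', sanitize=False)

# Compute implicit/explicit valences without raising on the bad nitrogen.
mol.UpdatePropertyCache(strict=False)

# Run only the sanitization steps the fingerprinter relies on
# (ring perception, conjugation, hybridization); the valence check is left out.
ops = (Chem.SANITIZE_SYMMRINGS
       | Chem.SANITIZE_SETCONJUGATION
       | Chem.SANITIZE_SETHYBRIDIZATION)
Chem.SanitizeMol(mol, ops)

# Morgan fingerprint with radius 2, as in the session above.
fp = rdMolDescriptors.GetMorganFingerprint(mol, 2)
n_features = len(fp.GetNonzeroElements())
n_degree = mol.GetAtomWithIdx(1).GetDegree()

print('nitrogen degree:', n_degree)
print('distinct fingerprint features:', n_features)
```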

-greg


[Rdkit-discuss] Nitrogen Valence

2017-05-11 Thread Yuran Wang
Hey,
I have a question regarding the available valence of Nitrogen. It seems
only 3 is available in the default setting (atomic_data.cpp). Why is it
kept to only 3, and not extended to include 4 and 5? If I change it locally
to include 4 and 5, will it cause any problems?
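
One way to inspect these defaults without reading atomic_data.cpp is to
query RDKit's periodic table from Python. This is a small sketch, assuming
RDKit is installed; it only reads the built-in tables and does not change
them, and the variable names are illustrative.

```python
from rdkit import Chem

pt = Chem.GetPeriodicTable()

# Default valence and the list of allowed valences for neutral nitrogen (Z = 7).
n_default = pt.GetDefaultValence(7)
n_allowed = list(pt.GetValenceList(7))

print('N default valence:', n_default)
print('N allowed valences:', n_allowed)
```

Charged atoms are handled differently during sanitization: an N+ is checked
as if it were carbon, which is why four-coordinate [N+] is accepted.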

I am aware that I could turn off the sanitization to get a mol object,
however, it cannot be further processed to get fingerprints, which is what
I need.

Thanks,
Yuran