Re: [Rdkit-discuss] Nitrogen Valence
Hi,

If the compound is neutral overall and there is a single H where you drew it, then a valid RDKit SMILES for the nitrogen-containing terminal group is C[N+](C)(C)[NH-], which is one of the forms I gave earlier. It is not a zwitterion; rather, it represents a dative bond. (I am not sure that all [X+][Y-] bonds are dative bonds, but my guess is that they are.)

Attached are SMILES for some well-known nitrogen compounds with adjacent + and - charges, including nitromethane (lower left). All have single-bonded "ion pairs", but none are zwitterions. Sorry the drawing (from Slack) is so small.

Carbon monoxide is another example: [C-]#[O+]. The version of RDKit now hooked up to Slack can't draw it, but I believe that's due to a known bug that also keeps it from drawing ethane, CC.

Best,
-P.

On Thu, May 11, 2017 at 1:45 PM, Yuran Wang wrote:
> Hi Peter,
> Thank you for your reply. I did not quite understand what you mean by
> 'But this makes no sense'.
> Also, the SMILES you tested are in zwitterionic form. In this link
> http://www.rdkit.org/docs/RDKit_Book.html#molecular-sanitization, the
> zwitterionic form seems suitable for N=O, N#N, not for N=N. But I may just
> have a very limited knowledge of RDKit.
>
> This is how it looks in ChemDraw:
> [image: Inline image 1]
>
> Thanks,
> Yuran

--
Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
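The charge-separated forms discussed in this message can be checked directly. The following is a minimal sketch, assuming a reasonably recent RDKit installation; the variable names are illustrative and do not come from the thread. It parses the three charge-separated SMILES with default sanitization, records each net formal charge, and then tries the neutral pentavalent fragment N=N(C)(C)C for comparison.

```python
from rdkit import Chem, RDLogger

# Silence the parse error that the pentavalent nitrogen is expected to trigger.
RDLogger.DisableLog("rdApp.*")

# Charge-separated forms mentioned in the thread: the terminal N-N group,
# nitromethane, and carbon monoxide.
charge_separated = ["C[N+](C)(C)[NH-]", "C[N+](=O)[O-]", "[C-]#[O+]"]

net_charges = {}
for smi in charge_separated:
    mol = Chem.MolFromSmiles(smi)  # default sanitization, valence check included
    net_charges[smi] = Chem.GetFormalCharge(mol)

# The neutral fragment with five bonds to nitrogen fails the valence check,
# so MolFromSmiles returns None instead of a molecule.
pentavalent = Chem.MolFromSmiles("N=N(C)(C)C")

for smi, charge in net_charges.items():
    print(smi, "net charge:", charge)
print("N=N(C)(C)C parsed:", pentavalent is not None)
```

Each charge-separated form keeps the molecule neutral overall while giving every atom a valence that RDKit accepts, which is why those SMILES parse and the pentavalent one does not.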
Re: [Rdkit-discuss] Nitrogen Valence
Hi Peter,

Thank you for your reply. I did not quite understand what you mean by 'But this makes no sense'.

Also, the SMILES you tested are in zwitterionic form. In this link http://www.rdkit.org/docs/RDKit_Book.html#molecular-sanitization, the zwitterionic form seems suitable for N=O, N#N, not for N=N. But I may just have a very limited knowledge of RDKit.

This is how it looks in ChemDraw:
[image: Inline image 1]

Thanks,
Yuran

On Thu, May 11, 2017 at 1:33 PM, Peter S. Shenkin wrote:
> The problematic part is just the beginning of your would-be SMILES:
> N=N(C)(C)C. The rest is correctly parsed. But this makes no sense. Perhaps
> you mean one of the substructures illustrated in the attached (which at
> least satisfy normal valence rules). If not, perhaps you could attach a
> structural diagram of what you do mean.
>
> -P.

--
Best,
Yuran Wang
Re: [Rdkit-discuss] Nitrogen Valence
The problematic part is just the beginning of your would-be SMILES: N=N(C)(C)C. The rest is correctly parsed. But this makes no sense. Perhaps you mean one of the substructures illustrated in the attached (which at least satisfy normal valence rules). If not, perhaps you could attach a structural diagram of what you do mean.

-P.

On Thu, May 11, 2017 at 11:02 AM, Yuran Wang wrote:
> Dear Greg,
> Thank you very much for the suggestions. It works for me!
> Here is the SMILES of one molecule that I am looking
> at: N=N(C)(C)CC(CN1N=CN=C1)(O)C2=C(C=C(C=C2)F)F
> Any better alternative will be appreciated.
>
> Thanks,
> Yuran
Re: [Rdkit-discuss] Nitrogen Valence
Dear Greg,

Thank you very much for the suggestions. It works for me!

Here is the SMILES of one molecule that I am looking at:
N=N(C)(C)CC(CN1N=CN=C1)(O)C2=C(C=C(C=C2)F)F

Any better alternative will be appreciated.

Thanks,
Yuran

On Thu, May 11, 2017 at 10:49 AM, Greg Landrum wrote:
> On Thu, May 11, 2017 at 4:24 PM, Yuran Wang wrote:
>
>> I have a question regarding the available valence of Nitrogen. It seems
>> only 3 is available in the default setting (atomic_data.cpp). Why is it
>> kept to only 3, and not extended to include 4 and 5? If I change it locally
>> to include 4 and 5, will it cause any problems?
>
> Aside from generating molecules that don't make any chemical sense?
> Probably not, but the lack of chemical sense may cause some unexpected
> behavior.
>
>> I am aware that I could turn off the sanitization to get a mol object,
>> however, it cannot be further processed to get fingerprints, which is what
>> I need.
>
> Well, you could turn off the sanitization on molecule construction and
> then manually sanitize with the valence check turned off. Here's a simple
> example of that:
>
> In [11]: m = Chem.MolFromSmiles('CN(C)(C)(C)C',sanitize=False)
>
> In [12]: m.UpdatePropertyCache(strict=False)
>
> In [13]: Chem.SanitizeMol(m,Chem.SANITIZE_SYMMRINGS|Chem.SANITIZE_SETCONJUGATION|Chem.SANITIZE_SETHYBRIDIZATION)
> Out[13]: rdkit.Chem.rdmolops.SanitizeFlags.SANITIZE_NONE
>
> In [14]: rdMolDescriptors.GetMorganFingerprint(m,2)
> Out[14]:
>
> But, again, the RDKit's valence rules tend to reflect real chemistry. What
> are you trying to represent that you need 5 coordinate neutral nitrogen
> atoms? There may be a better way.
>
> -greg

--
Best,
Yuran Wang
Re: [Rdkit-discuss] Nitrogen Valence
On Thu, May 11, 2017 at 4:24 PM, Yuran Wang wrote:

> I have a question regarding the available valence of Nitrogen. It seems
> only 3 is available in the default setting (atomic_data.cpp). Why is it
> kept to only 3, and not extended to include 4 and 5? If I change it locally
> to include 4 and 5, will it cause any problems?

Aside from generating molecules that don't make any chemical sense? Probably not, but the lack of chemical sense may cause some unexpected behavior.

> I am aware that I could turn off the sanitization to get a mol object,
> however, it cannot be further processed to get fingerprints, which is what
> I need.

Well, you could turn off the sanitization on molecule construction and then manually sanitize with the valence check turned off. Here's a simple example of that:

In [11]: m = Chem.MolFromSmiles('CN(C)(C)(C)C',sanitize=False)

In [12]: m.UpdatePropertyCache(strict=False)

In [13]: Chem.SanitizeMol(m,Chem.SANITIZE_SYMMRINGS|Chem.SANITIZE_SETCONJUGATION|Chem.SANITIZE_SETHYBRIDIZATION)
Out[13]: rdkit.Chem.rdmolops.SanitizeFlags.SANITIZE_NONE

In [14]: rdMolDescriptors.GetMorganFingerprint(m,2)
Out[14]:

But, again, the RDKit's valence rules tend to reflect real chemistry. What are you trying to represent that you need 5-coordinate neutral nitrogen atoms? There may be a better way.

-greg
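For readers who want to run the workaround outside an interactive session, here is a minimal standalone sketch of the same steps with the imports spelled out. One substitution is made on purpose: instead of rdMolDescriptors.GetMorganFingerprint, which the session above uses, it calls the rdFingerprintGenerator module found in more recent RDKit releases. That swap, and the variable names, are assumptions of this sketch rather than part of the original reply.

```python
from rdkit import Chem
from rdkit.Chem import rdFingerprintGenerator

# Build the molecule without sanitization so the five-bonded neutral
# nitrogen is not rejected by the valence check.
m = Chem.MolFromSmiles("CN(C)(C)(C)C", sanitize=False)

# Compute implicit valences and H counts without raising on bad valences.
m.UpdatePropertyCache(strict=False)

# Run only the sanitization steps the fingerprint needs: ring perception,
# conjugation, and hybridization. The valence (properties) check is skipped.
Chem.SanitizeMol(
    m,
    Chem.SANITIZE_SYMMRINGS
    | Chem.SANITIZE_SETCONJUGATION
    | Chem.SANITIZE_SETHYBRIDIZATION,
)

# Morgan fingerprint with radius 2, as a sparse count vector.
generator = rdFingerprintGenerator.GetMorganGenerator(radius=2)
fp = generator.GetSparseCountFingerprint(m)
nonzero = fp.GetNonzeroElements()  # dict of {bit id: count}

print("atoms:", m.GetNumAtoms(), "distinct environments:", len(nonzero))
```

The fingerprint is computed for a structure that RDKit would normally reject, so any downstream similarity or model results inherit whatever is chemically wrong with the input.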
[Rdkit-discuss] Nitrogen Valence
Hey,

I have a question regarding the available valence of nitrogen. It seems only 3 is available in the default setting (atomic_data.cpp). Why is it kept to only 3, and not extended to include 4 and 5? If I change it locally to include 4 and 5, will it cause any problems?

I am aware that I could turn off sanitization to get a mol object; however, that object cannot be further processed to get fingerprints, which is what I need.

Thanks,
Yuran
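The default valence mentioned in the question can be inspected from Python without opening atomic_data.cpp. This is a minimal sketch, assuming the periodic-table accessors exposed by RDKit's Python wrappers; the variable names are illustrative.

```python
from rdkit import Chem

pt = Chem.GetPeriodicTable()

# Allowed valences for neutral nitrogen (atomic number 7), as a plain list.
n_valences = list(pt.GetValenceList(7))

# The default valence RDKit uses when filling in implicit hydrogens.
n_default = pt.GetDefaultValence(7)

print("allowed valences for N:", n_valences)
print("default valence for N:", n_default)
```

These values apply to the neutral atom. Charged nitrogen is handled separately by the valence check, which is why [N+] with four bonds is accepted without editing the table.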