The problem with the first molecule is that it's not actually a molecule:
it includes R-group information.
Here's what it looks like if you paste it into Marvin Sketch:
[image: image.png]
The RDKit does not (yet) know how to parse this.
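
If you want to find records like this before handing the file to the RDKit,
one option is to scan the SDF text for R-group markers. This is only a rough
sketch in plain Python. The function names are made up for illustration, and
it assumes V3000 records in which R-group atoms have the type R# or the record
contains a BEGIN RGROUP block, as in the record quoted below:

```python
def split_records(sdf_text):
    """Yield each SDF record as a list of lines (records end with '$$$$')."""
    record = []
    for line in sdf_text.splitlines():
        if line.strip() == "$$$$":
            yield record
            record = []
        else:
            record.append(line)
    # Keep a trailing record that is missing its '$$$$' terminator
    if any(line.strip() for line in record):
        yield record


def has_rgroup_data(record_lines):
    """Return True if a V3000 record has an R# atom or a BEGIN RGROUP block."""
    for line in record_lines:
        fields = line.split()
        # V3000 lines start with "M  V30", which split() turns into ["M", "V30"]
        if fields[:2] != ["M", "V30"]:
            continue
        if fields[2:4] == ["BEGIN", "RGROUP"]:
            return True
        # Atom lines look like: M  V30 <index> <type> <x> <y> <z> ...
        if len(fields) > 3 and fields[3] == "R#":
            return True
    return False


def find_rgroup_records(sdf_text):
    """Return (record_index, title_line) for every record with R-group data."""
    return [
        (index, record[0].strip() if record else "")
        for index, record in enumerate(split_records(sdf_text))
        if has_rgroup_data(record)
    ]
```

Calling find_rgroup_records(open('1.sdf').read()) would then list the records
to set aside or clean up by hand.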
Here's what the second molecule looks like:
[image: image.png]
Marvin Sketch has done a nice job of highlighting the four problematic
atoms: each of those atoms has an incorrect valence since the molecule
isn't drawn correctly. I think this is probably what is intended (note the
coordinate bonds to the Zn):
[image: image.png]
When the RDKit reports errors like these, it's usually because there's a
problem with the chemistry in the molecule. When you see one, it's almost
always a good idea to paste the structure into a chemical sketcher and look
at it to see whether you can figure out what the problem is and whether you
can fix it.
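
For a large SDF it can help to first find out which records have problems and
what kind of problems they are. Here's a minimal sketch, assuming an RDKit
version that provides Chem.DetectChemistryProblems. The helper names
report_chemistry_problems and check_sdf are hypothetical, not part of the
RDKit. Reading with sanitize=False means that records with questionable
chemistry still come back as molecules, so they can be inspected instead of
just becoming None:

```python
from rdkit import Chem


def report_chemistry_problems(mol):
    """Return (problem_type, message) pairs for an unsanitized molecule."""
    return [(p.GetType(), p.Message()) for p in Chem.DetectChemistryProblems(mol)]


def check_sdf(path):
    """Map record index -> list of problems for every problematic record."""
    # sanitize=False: do not reject molecules with chemistry problems on read
    suppl = Chem.SDMolSupplier(path, sanitize=False)
    report = {}
    for idx, mol in enumerate(suppl):
        if mol is None:
            # The record itself could not be parsed (for example R-group data)
            report[idx] = [("ParseError", "record could not be parsed")]
            continue
        problems = report_chemistry_problems(mol)
        if problems:
            report[idx] = problems
    return report
```

The keys of the result are record positions in the file (starting at 0), so
you can pull out just the failing records and paste them into a sketcher.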

-greg

On Thu, Aug 6, 2020 at 2:16 PM Pitanti Chalowa <ch1...@gmail.com> wrote:

> Thank you Sir for your reply.
>
> The RDKit version I am using is 2020.03.4.
>
> I have included each SDF section with the associated error I am receiving.
>
>
> *ERROR: Problems encountered parsing Mol data, M END missing around line
> 16739  *
>
> >  <DSSTox_Compound_id>
> DTXCID701169
>
> >  <DSSTox_Substance_id>
> DTXSID6021169
>
> >  <CASRN>
> 61477-94-9
>
> >  <QC_Level>
> DSSTox_High
>
> >  <Preferred_name>
> Pirmenol hydrochloride
>
> >  <Mol_Weight>
> 374.9500000000
>
> >  <Mol_Formula>
> C22H31ClN2O
>
> >  <Monoisotopic_Mass>
> 374.2124913000
>
> >  <Dashboard_URL>
> https://comptox.epa.gov/dashboard/DTXSID6021169
>
> $$$$
> DTXCID601285170
>   Mrv1805 05101813452D
>
>   0  0  0     0  0            999 V3000
> M  V30 BEGIN CTAB
> M  V30 COUNTS 22 23 0 0 0
> M  V30 BEGIN ATOM
> M  V30 1 C 3.5184 1.3335 0 0
> M  V30 2 C 5.0584 1.3335 0 0
> M  V30 3 C 5.8282 0 0 0
> M  V30 4 C 5.0584 -1.3335 0 0
> M  V30 5 C 3.5184 -1.3335 0 0
> M  V30 6 C 2.7484 0 0 0
> M  V30 7 C 1.2084 0 0 0
> M  V30 8 C 0.4386 -1.3335 0 0
> M  V30 9 C -1.1014 -1.3335 0 0
> M  V30 10 C -1.8714 0 0 0
> M  V30 11 C -1.1014 1.3335 0 0
> M  V30 12 C 0.4386 1.3335 0 0
> M  V30 13 R# -1.8714 2.6671 0 0 RGROUPS=(1 1)
> M  V30 14 R# -3.4114 0 0 0 RGROUPS=(1 1)
> M  V30 15 R# -1.8712 -2.6671 0 0 RGROUPS=(1 1)
> M  V30 16 R# 1.2084 -2.6671 0 0 RGROUPS=(1 1)
> M  V30 17 R# 2.7486 -2.6671 0 0 RGROUPS=(1 1)
> M  V30 18 R# 1.2086 2.6671 0 0 RGROUPS=(1 1)
> M  V30 19 R# 2.7484 2.6671 0 0 RGROUPS=(1 1)
> M  V30 20 R# 5.8284 2.6671 0 0 RGROUPS=(1 1)
> M  V30 21 R# 7.3682 0 0 0 RGROUPS=(1 1)
> M  V30 22 R# 5.8282 -2.6671 0 0 RGROUPS=(1 1)
> M  V30 END ATOM
> M  V30 BEGIN BOND
> M  V30 1 2 1 2
> M  V30 2 1 2 3
> M  V30 3 2 3 4
> M  V30 4 1 4 5
> M  V30 5 2 5 6
> M  V30 6 1 6 1
> M  V30 7 1 6 7
> M  V30 8 1 8 9
> M  V30 9 2 9 10
> M  V30 10 1 10 11
> M  V30 11 2 11 12
> M  V30 12 2 7 8
> M  V30 13 1 12 7
> M  V30 14 1 9 15
> M  V30 15 1 8 16
> M  V30 16 1 5 17
> M  V30 17 1 4 22
> M  V30 18 1 3 21
> M  V30 19 1 2 20
> M  V30 20 1 1 19
> M  V30 21 1 12 18
> M  V30 22 1 11 13
> M  V30 23 1 10 14
> M  V30 END BOND
> M  V30 END CTAB
> M  V30 BEGIN RGROUP 1
> M  V30 RLOGIC 0 1 >0
> M  V30 BEGIN CTAB
> M  V30 COUNTS 1 0 0 0 0
> M  V30 BEGIN ATOM
> M  V30 1 Br -7.3682 -1.2499 0 0 ATTCHPT=1
> M  V30 END ATOM
> M  V30 END CTAB
> M  V30 END RGROUP
> M  END
>
> *ERROR: Could not sanitize molecule ending on line 78558  *
>
> >  <DSSTox_Compound_id>
> DTXCID501446
>
> >  <DSSTox_Substance_id>
> DTXSID6026298
>
> >  <CASRN>
> 108-38-3
>
> >  <QC_Level>
> DSSTox_High
>
> >  <Preferred_name>
> m-Xylene
>
> >  <Mol_Weight>
> 106.1680000000
>
> >  <Mol_Formula>
> C8H10
>
> >  <Monoisotopic_Mass>
> 106.0782503220
>
> >  <Dashboard_URL>
> https://comptox.epa.gov/dashboard/DTXSID6026298
>
> $$$$
> DTXCID90820451
>   Mrv1611104121614362D
>
>   0  0  0     0  0            999 V3000
> M  V30 BEGIN CTAB
> M  V30 COUNTS 17 20 0 0 0
> M  V30 BEGIN ATOM
> M  V30 1 O -0.7801 -1.2459 0 0 CHG=-1
> M  V30 2 C -2.2448 0.77 0 0
> M  V30 3 N -2.2448 -0.77 0 0
> M  V30 4 C -3.5784 1.54 0 0
> M  V30 5 C -3.5784 -1.54 0 0
> M  V30 6 C -4.9121 0.77 0 0
> M  V30 7 C -4.9121 -0.77 0 0
> M  V30 8 S -0.7801 1.2459 0 0
> M  V30 9 Zn 0.1251 0 0 0 CHG=2
> M  V30 10 O 0.7801 1.2459 0 0 CHG=-1
> M  V30 11 C 2.2448 -0.77 0 0
> M  V30 12 N 2.2448 0.77 0 0
> M  V30 13 C 3.5784 -1.54 0 0
> M  V30 14 C 3.5784 1.54 0 0
> M  V30 15 C 4.9121 -0.77 0 0
> M  V30 16 C 4.9121 0.77 0 0
> M  V30 17 S 0.7801 -1.2459 0 0
> M  V30 END ATOM
> M  V30 BEGIN BOND
> M  V30 1 1 3 1
> M  V30 2 1 3 2
> M  V30 3 1 4 2
> M  V30 4 2 8 2
> M  V30 5 1 5 3
> M  V30 6 2 6 4
> M  V30 7 2 7 5
> M  V30 8 1 7 6
> M  V30 9 1 9 8
> M  V30 10 1 17 9
> M  V30 11 1 12 10
> M  V30 12 1 12 11
> M  V30 13 1 13 11
> M  V30 14 2 17 11
> M  V30 15 1 14 12
> M  V30 16 2 15 13
> M  V30 17 2 16 14
> M  V30 18 1 16 15
> M  V30 19 1 9 1
> M  V30 20 1 9 10
> M  V30 END BOND
> M  V30 END CTAB
> M  END
>
>
> On Thu, Aug 6, 2020 at 3:51 AM Greg Landrum <greg.land...@gmail.com>
> wrote:
>
>> Hi,
>>
>> Without seeing the SDF itself it's hard to be specific, but here's what
>> the error messages are telling you, in general:
>>
>> The first one normally indicates a badly formed record in the SDF. If you
>> look around that line in the file you will, hopefully, see a malformed
>> record.
>> The next one, "Explicit valence", indicates that the molecule has an atom
>> (in this case an "O") that has the equivalent of three bonds to it. That's
>> not chemically reasonable, so the software complains.
>> The error about "Alkyl" is self-explanatory: there's a molecule in the
>> SDF which has an atom with symbol "Alkyl".
>> The rest are warnings.
>>
>> In order to provide more specific help, we'll need to see the SDF you're
>> using (or at least the SDF for the molecules that are failing) as well as
>> information about which version of the RDKit you're using.
>>
>> -greg
>>
>>
>>
>> On Wed, Aug 5, 2020 at 11:43 PM Pitanti Chalowa <ch1...@gmail.com> wrote:
>>
>>> Respected Altruistic Researcher,
>>> While converting an SDF file to fingerprints, I am running into several errors.
>>>
>>> My code
>>>
>>> from rdkit import Chem
>>>
>>> suppl = Chem.SDMolSupplier('1.sdf')
>>> # Records that fail to parse or sanitize come back as None; skip them
>>> mols = [mol for mol in suppl if mol is not None]
>>>
>>> fps = [Chem.RDKFingerprint(x) for x in mols]
>>>
>>> I am facing many errors
>>>
>>> ERROR: Problems encountered parsing Mol data, M  END missing around line 16739...
>>> ERROR: Explicit valence for atom # 0 O, 3, is greater than permitted...
>>> ERROR: Could not sanitize molecule ending on line 78558...
>>> ERROR: Post-condition Violation
>>> RDKit ERROR: Element 'Alkyl' not found
>>> RDKit ERROR: Violation occurred on line 91 in file /home/conda/feedstock_root/build_artifacts/rdkit_1593788763912/work/Code/GraphMol/PeriodicTable.h
>>> RDKit ERROR: Failed Expression: anum > -1
>>> ...
>>> WARNING: not removing hydrogen atom without neighbors
>>>
>>> RDKit WARNING: atom 0 has specified valence (4) smaller than the drawn valence 6.
>>>
>>> Please direct me to the references. How can I correct them?
>>> _______________________________________________
>>> Rdkit-discuss mailing list
>>> Rdkit-discuss@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>>
