The problem with the first molecule is that it's not actually a molecule: it includes R-group information (the R# atoms and the RGROUP block at the end of the record). Here's what it looks like if you paste it into Marvin Sketch:

[image: image.png]

The RDKit does not (yet) know how to parse this.

Here's what the second molecule looks like:

[image: image.png]

Marvin Sketch has done a nice job of highlighting the four problematic atoms: each of those atoms has an incorrect valence because the molecule isn't drawn correctly. I think this is probably what is intended (note the coordinate bonds to the Zn):

[image: image.png]

When the RDKit reports errors like these, it's usually because there's a problem with the chemistry in the molecule. When you see one, it's almost always a good idea to paste the structure into a chemical sketcher and look at it to see whether you can figure out what the problem is and whether you can fix it.
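To make the valence problem in the second molecule (the Zn record quoted below) more concrete, here is a rough sketch in plain Python (standard library only) that sums bond orders per atom in a V3000 CTAB and reports atoms whose total exceeds a simplified maximum valence. The valence table, the crude charge rule, and the choice to let V3000 bond type 9 (coordination) contribute nothing are assumptions made for illustration; this is not how the RDKit or Marvin compute valences. Run on the Zn record, it should report the two O- atoms (IDs 1 and 10), each of which is bonded to both an N and the Zn; if the four bonds to the Zn are redrawn as coordination bonds, nothing is reported. It only catches "greater than permitted" cases, so the two three-connected S atoms are not flagged.

```python
import re
from collections import defaultdict

# Simplified, hypothetical valence table for a few main-group elements.
# This is NOT the RDKit's valence model.
BASE_VALENCES = {"C": (4,), "N": (3,), "O": (2,), "S": (2, 4, 6),
                 "Cl": (1,), "Br": (1,)}

# V3000 bond types: 1 single, 2 double, 3 triple, 4 aromatic, 9 coordination.
# Assumption for this sketch: a coordination bond adds nothing to either atom.
BOND_ORDER = {1: 1.0, 2: 2.0, 3: 3.0, 4: 1.5, 9: 0.0}

V30_LINE = re.compile(r"^M\s+V30\s+(.*)$")


def parse_v3000(molblock):
    """Return ({atom_id: (symbol, charge)}, [(bond_type, atom1, atom2)])
    for the first CTAB in a V3000 mol block."""
    atoms, bonds, section = {}, [], None
    for line in molblock.splitlines():
        m = V30_LINE.match(line.strip())
        if not m:
            continue
        fields = m.group(1).split()
        if fields[:2] == ["END", "CTAB"]:
            break  # ignore any RGROUP definitions that follow the core
        if fields[0] == "BEGIN":
            section = fields[1]
        elif fields[0] == "END":
            section = None
        elif section == "ATOM":
            charge = 0
            for f in fields[2:]:
                if f.startswith("CHG="):
                    charge = int(f[4:])
            atoms[int(fields[0])] = (fields[1], charge)
        elif section == "BOND":
            bonds.append((int(fields[1]), int(fields[2]), int(fields[3])))
    return atoms, bonds


def valence_problems(molblock):
    """Return (atom_id, symbol, charge, bond_order_sum) for every checked atom
    whose summed bond orders exceed its largest allowed valence."""
    atoms, bonds = parse_v3000(molblock)
    totals = defaultdict(float)
    for btype, a1, a2 in bonds:
        order = BOND_ORDER.get(btype, 1.0)  # other query bond types: count as 1
        totals[a1] += order
        totals[a2] += order
    problems = []
    for idx, (symbol, charge) in sorted(atoms.items()):
        base = BASE_VALENCES.get(symbol)
        if base is None:
            continue  # metals, R#, pseudo-atoms: not checked here
        # Crude charge rule: O- is treated like F, N+ like C, and a charged
        # carbon loses one valence per unit of charge.
        shift = -abs(charge) if symbol == "C" else charge
        if totals[idx] > max(v + shift for v in base):
            problems.append((idx, symbol, charge, totals[idx]))
    return problems


DEMO = """\
M  V30 BEGIN CTAB
M  V30 COUNTS 3 2 0 0 0
M  V30 BEGIN ATOM
M  V30 1 C 0 0 0 0
M  V30 2 O 1 0 0 0 CHG=-1
M  V30 3 C 2 0 0 0
M  V30 END ATOM
M  V30 BEGIN BOND
M  V30 1 1 1 2
M  V30 2 1 2 3
M  V30 END BOND
M  V30 END CTAB
M  END
"""

if __name__ == "__main__":
    # An O- with two single bonds, like atoms 1 and 10 in the Zn record.
    print(valence_problems(DEMO))  # [(2, 'O', -1, 2.0)]
```

Whatever such a pre-check reports, the actual fix is still the one suggested above: redraw the structure so the bonds to the metal are coordinate bonds.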
-greg

On Thu, Aug 6, 2020 at 2:16 PM Pitanti Chalowa <ch1...@gmail.com> wrote:

> Thank you, Sir, for your reply.
>
> The RDKit version I am using is 2020.03.4.
>
> I have included each SDF section along with the error I am receiving for it.
>
>
> *ERROR: Problems encountered parsing Mol data, M END missing around line 16739*
>
> > <DSSTox_Compound_id>
> DTXCID701169
>
> > <DSSTox_Substance_id>
> DTXSID6021169
>
> > <CASRN>
> 61477-94-9
>
> > <QC_Level>
> DSSTox_High
>
> > <Preferred_name>
> Pirmenol hydrochloride
>
> > <Mol_Weight>
> 374.9500000000
>
> > <Mol_Formula>
> C22H31ClN2O
>
> > <Monoisotopic_Mass>
> 374.2124913000
>
> > <Dashboard_URL>
> https://comptox.epa.gov/dashboard/DTXSID6021169
>
> $$$$
> DTXCID601285170
> Mrv1805 05101813452D
>
> 0 0 0 0 0 999 V3000
> M V30 BEGIN CTAB
> M V30 COUNTS 22 23 0 0 0
> M V30 BEGIN ATOM
> M V30 1 C 3.5184 1.3335 0 0
> M V30 2 C 5.0584 1.3335 0 0
> M V30 3 C 5.8282 0 0 0
> M V30 4 C 5.0584 -1.3335 0 0
> M V30 5 C 3.5184 -1.3335 0 0
> M V30 6 C 2.7484 0 0 0
> M V30 7 C 1.2084 0 0 0
> M V30 8 C 0.4386 -1.3335 0 0
> M V30 9 C -1.1014 -1.3335 0 0
> M V30 10 C -1.8714 0 0 0
> M V30 11 C -1.1014 1.3335 0 0
> M V30 12 C 0.4386 1.3335 0 0
> M V30 13 R# -1.8714 2.6671 0 0 RGROUPS=(1 1)
> M V30 14 R# -3.4114 0 0 0 RGROUPS=(1 1)
> M V30 15 R# -1.8712 -2.6671 0 0 RGROUPS=(1 1)
> M V30 16 R# 1.2084 -2.6671 0 0 RGROUPS=(1 1)
> M V30 17 R# 2.7486 -2.6671 0 0 RGROUPS=(1 1)
> M V30 18 R# 1.2086 2.6671 0 0 RGROUPS=(1 1)
> M V30 19 R# 2.7484 2.6671 0 0 RGROUPS=(1 1)
> M V30 20 R# 5.8284 2.6671 0 0 RGROUPS=(1 1)
> M V30 21 R# 7.3682 0 0 0 RGROUPS=(1 1)
> M V30 22 R# 5.8282 -2.6671 0 0 RGROUPS=(1 1)
> M V30 END ATOM
> M V30 BEGIN BOND
> M V30 1 2 1 2
> M V30 2 1 2 3
> M V30 3 2 3 4
> M V30 4 1 4 5
> M V30 5 2 5 6
> M V30 6 1 6 1
> M V30 7 1 6 7
> M V30 8 1 8 9
> M V30 9 2 9 10
> M V30 10 1 10 11
> M V30 11 2 11 12
> M V30 12 2 7 8
> M V30 13 1 12 7
> M V30 14 1 9 15
> M V30 15 1 8 16
> M V30 16 1 5 17
> M V30 17 1 4 22
> M V30 18 1 3 21
> M V30 19 1 2 20
> M V30 20 1 1 19
> M V30 21 1 12 18
> M V30 22 1 11 13
> M V30 23 1 10 14
> M V30 END BOND
> M V30 END CTAB
> M V30 BEGIN RGROUP 1
> M V30 RLOGIC 0 1 >0
> M V30 BEGIN CTAB
> M V30 COUNTS 1 0 0 0 0
> M V30 BEGIN ATOM
> M V30 1 Br -7.3682 -1.2499 0 0 ATTCHPT=1
> M V30 END ATOM
> M V30 END CTAB
> M V30 END RGROUP
> M END
>
> *ERROR: Could not sanitize molecule ending on line 78558*
>
> > <DSSTox_Compound_id>
> DTXCID501446
>
> > <DSSTox_Substance_id>
> DTXSID6026298
>
> > <CASRN>
> 108-38-3
>
> > <QC_Level>
> DSSTox_High
>
> > <Preferred_name>
> m-Xylene
>
> > <Mol_Weight>
> 106.1680000000
>
> > <Mol_Formula>
> C8H10
>
> > <Monoisotopic_Mass>
> 106.0782503220
>
> > <Dashboard_URL>
> https://comptox.epa.gov/dashboard/DTXSID6026298
>
> $$$$
> DTXCID90820451
> Mrv1611104121614362D
>
> 0 0 0 0 0 999 V3000
> M V30 BEGIN CTAB
> M V30 COUNTS 17 20 0 0 0
> M V30 BEGIN ATOM
> M V30 1 O -0.7801 -1.2459 0 0 CHG=-1
> M V30 2 C -2.2448 0.77 0 0
> M V30 3 N -2.2448 -0.77 0 0
> M V30 4 C -3.5784 1.54 0 0
> M V30 5 C -3.5784 -1.54 0 0
> M V30 6 C -4.9121 0.77 0 0
> M V30 7 C -4.9121 -0.77 0 0
> M V30 8 S -0.7801 1.2459 0 0
> M V30 9 Zn 0.1251 0 0 0 CHG=2
> M V30 10 O 0.7801 1.2459 0 0 CHG=-1
> M V30 11 C 2.2448 -0.77 0 0
> M V30 12 N 2.2448 0.77 0 0
> M V30 13 C 3.5784 -1.54 0 0
> M V30 14 C 3.5784 1.54 0 0
> M V30 15 C 4.9121 -0.77 0 0
> M V30 16 C 4.9121 0.77 0 0
> M V30 17 S 0.7801 -1.2459 0 0
> M V30 END ATOM
> M V30 BEGIN BOND
> M V30 1 1 3 1
> M V30 2 1 3 2
> M V30 3 1 4 2
> M V30 4 2 8 2
> M V30 5 1 5 3
> M V30 6 2 6 4
> M V30 7 2 7 5
> M V30 8 1 7 6
> M V30 9 1 9 8
> M V30 10 1 17 9
> M V30 11 1 12 10
> M V30 12 1 12 11
> M V30 13 1 13 11
> M V30 14 2 17 11
> M V30 15 1 14 12
> M V30 16 2 15 13
> M V30 17 2 16 14
> M V30 18 1 16 15
> M V30 19 1 9 1
> M V30 20 1 9 10
> M V30 END BOND
> M V30 END CTAB
> M END
>
> On Thu, Aug 6, 2020 at 3:51 AM Greg Landrum <greg.land...@gmail.com>
> wrote:
>
>> Hi,
>>
>> Without seeing the SDF itself it's hard to be specific, but here's what
>> the error messages are telling you, in general:
>>
>> The first one normally indicates a badly formed record in the SDF. If you
>> look around that line in the file you will, hopefully, see a malformed
>> record.
>> The next one, "Explicit valence", indicates that the molecule has an atom
>> (in this case an "O") that has the equivalent of three bonds to it. That's
>> not chemically reasonable, so the software complains.
>> The error about "Alkyl" is self-explanatory: there's a molecule in the
>> SDF which has an atom with the symbol "Alkyl".
>> The rest are warnings.
>>
>> In order to provide more specific help, we'll need to see the SDF you're
>> using (or at least the SDF for the molecules that are failing) as well as
>> information about which version of the RDKit you're using.
>>
>> -greg
>>
>> On Wed, Aug 5, 2020 at 11:43 PM Pitanti Chalowa <ch1...@gmail.com> wrote:
>>
>>> Respected Altruistic Researcher,
>>> While converting an SDF file to fingerprints, I am facing several errors.
>>>
>>> My code
>>>
>>> from rdkit import Chem
>>>
>>> suppl = Chem.SDMolSupplier('1.sdf')
>>> fps = []
>>> for mol in suppl:
>>>     if mol is None:  # records that failed to parse come back as None
>>>         continue
>>>     # print(mol.GetNumAtoms())
>>>     fps.append(Chem.RDKFingerprint(mol))
>>>
>>> I am getting many errors:
>>>
>>> ERROR: Problems encountered parsing Mol data, M END missing around line 16739...
>>> ERROR: Explicit valence for atom # 0 O, 3, is greater than permitted...
>>> ERROR: Could not sanitize molecule ending on line 78558...
>>> ERROR: Post-condition Violation
>>> RDKit ERROR: Element 'Alkyl' not found
>>> RDKit ERROR: Violation occurred on line 91 in file /home/conda/feedstock_root/build_artifacts/rdkit_1593788763912/work/Code/GraphMol/PeriodicTable.h
>>> RDKit ERROR: Failed Expression: anum > -1
>>> ...
>>> WARNING: not removing hydrogen atom without neighbors
>>>
>>> RDKit WARNING: atom 0 has specified valence (4) smaller than the drawn valence 6.
>>>
>>> Please direct me to the references. How can I correct them?
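Greg's advice to look around the reported line in the file can be partly automated. The following is a minimal sketch in plain Python (standard library only) that splits an SDF on `$$$$` lines, remembers the line number where each record starts, and reports records with no `M END` line, records that contain a V3000 RGROUP block, and V3000 atoms whose symbol is not an element (such as `Alkyl` or `R#`). It assumes V3000 atom blocks like the ones in the quoted records (V2000 atom lines are not inspected), and `check_record`, `scan_sdf` and `report` are hypothetical helper names, not RDKit functions. These text checks only point you at suspicious records; they are not a substitute for the RDKit's own parser.

```python
import re
import sys

# All 118 element symbols; anything else in an atom block (e.g. "Alkyl",
# "R#") is reported as a non-element symbol.
ELEMENTS = set("""
H He Li Be B C N O F Ne Na Mg Al Si P S Cl Ar K Ca Sc Ti V Cr Mn Fe Co Ni
Cu Zn Ga Ge As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag Cd In Sn Sb Te I Xe
Cs Ba La Ce Pr Nd Pm Sm Eu Gd Tb Dy Ho Er Tm Yb Lu Hf Ta W Re Os Ir Pt Au Hg
Tl Pb Bi Po At Rn Fr Ra Ac Th Pa U Np Pu Am Cm Bk Cf Es Fm Md No Lr Rf Db Sg
Bh Hs Mt Ds Rg Cn Nh Fl Mc Lv Ts Og
""".split())

M_END = re.compile(r"^M\s+END\s*$")
V30 = re.compile(r"^M\s+V30\s+(.*)$")


def check_record(lines, start):
    """Run the simplified text checks on one record (a list of lines)."""
    problems, odd_symbols = [], set()
    has_m_end = in_atoms = False
    for line in lines:
        if line.startswith(">"):
            break  # SD data fields start here; stop reading the mol block
        text = line.strip()
        if M_END.match(text):
            has_m_end = True
            continue
        m = V30.match(text)
        if not m:
            continue
        fields = m.group(1).split()
        if fields[:2] == ["BEGIN", "ATOM"]:
            in_atoms = True
        elif fields[:2] == ["END", "ATOM"]:
            in_atoms = False
        elif fields[:2] == ["BEGIN", "RGROUP"]:
            problems.append("contains a V3000 RGROUP block")
        elif in_atoms and len(fields) > 1 and fields[1] not in ELEMENTS:
            odd_symbols.add(fields[1])
    if not has_m_end:
        problems.append("no M END line")
    if odd_symbols:
        problems.append("non-element atom symbols: " + ", ".join(sorted(odd_symbols)))
    return {"start_line": start, "name": lines[0].strip() if lines else "",
            "problems": problems}


def scan_sdf(text):
    """Yield one result dict per record, numbering file lines from 1."""
    record, start = [], 1
    for lineno, line in enumerate(text.splitlines(), start=1):
        if line.strip() == "$$$$":
            yield check_record(record, start)
            record, start = [], lineno + 1
        else:
            record.append(line)
    if any(line.strip() for line in record):  # last record lacks a $$$$
        yield check_record(record, start)


def report(text):
    """Print one line per record that has at least one problem."""
    for rec in scan_sdf(text):
        if rec["problems"]:
            print(f"line {rec['start_line']} ({rec['name']}): "
                  + "; ".join(rec["problems"]))


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as fh:
        report(fh.read())
```

Saved as, say, `scan_sdf.py` (a hypothetical file name), `python scan_sdf.py 1.sdf` would print one line per suspicious record with the line where that record starts, which you can then open in a text editor or paste into a sketcher.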
>>> _______________________________________________
>>> Rdkit-discuss mailing list
>>> Rdkit-discuss@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss