Re: [Rdkit-discuss] Saving mol file

GALLY Jose Manuel Wed, 26 Sep 2018 12:10:57 -0700

Dear Colin,
this is a specific problem I stumbled upon some time ago.[1]

I also mentioned it to the rDock mailing list.[2]


Maybe there is a better work-around, but in the meantime I wrote the
attached function.

It takes a Mol Block (a string) as input; in my case the Mol Blocks are stored in a dataframe.

Hope that helps!

Cheers,
Jose Manuel

Refs:
[1] https://sourceforge.net/p/rdkit/mailman/message/34740124/
[2] https://sourceforge.net/p/rdock/mailman/message/34741112/

2018-09-25 17:27 GMT+02:00 Colin Bournez <colin.bour...@univ-orleans.fr>:

> Well yes, I do have this line; I left out the rest of the file for
> clarity. The thing is that tools such as MOE and PyMOL read it without
> problems, but rDock, for example, can't read it properly and returns a
> neutral N, which is wrong. And if I open it with PyMOL and save it back
> in mol format, the 3 appears on the N line and rDock has no trouble anymore...
> I was just wondering if there was a trick in RDKit to also save it this
> way.
>
>
> On 25/09/18 17:18, Greg Landrum wrote:
>
> Hi Colin,
> The RDKit outputs charge information to mol blocks using the CHG line:
>
> In [3]: m = Chem.MolFromSmiles('C[NH3+]')
>
> In [4]: print(Chem.MolToMolBlock(m))
>
>      RDKit          2D
>
>   2  1  0  0  0  0  0  0  0  0999 V2000
>     0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
>     1.2990    0.7500    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
>   1  2  1  0
> M  CHG  1   2   1
> M  END
>
>
> I expect that you will find one of those in your mol file and that it
> should be properly read in by other tools.
> Is this not the case for you?
>
> Best,
> -greg
>
>
>
> On Tue, Sep 25, 2018 at 4:39 PM Colin Bournez <
> colin.bour...@univ-orleans.fr> wrote:
>
>> Hey everyone,
>>
>> I have a question concerning the Chem.MolToMolFile() function.
>> When I open this file containing an N+ (here is the corresponding line in
>> the mol file):
>>
>>    11.3700    3.4360  -11.8300 N   0  3  0  0  0  0  0  0  0  0  0  0
>>
>> and just save it back without any modification, the line becomes:
>>
>>      11.3700    3.4360  -11.8300 N   0  0  0  0  0  0  0  0  0  0  0  0
>>
>> The problem is that this mol file causes trouble for some software, and
>> the N+ is transformed into an N with 4 bonds.
>> I tried several tricks but was not able to save it with the original
>> line. Does anyone have a suggestion?
>>
>> Thanks,
>>
>> --
>> *Colin Bournez*
>> PhD Student, Structural Bioinformatics & Chemoinformatics
>> Institut de Chimie Organique et Analytique (ICOA), UMR CNRS-Université
>> d'Orléans 7311
>> Rue de Chartres, 45067 Orléans, France
>> T. +33 238 494 577
>> _______________________________________________
>> Rdkit-discuss mailing list
>> Rdkit-discuss@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>
>
>


-- 
José-Manuel Gally
PhD Student
Structural Bioinformatics & Chemoinformatics
Institut de Chimie Organique et Analytique (ICOA)
UMR CNRS-Université d'Orléans 7311
Université d'Orléans
Rue de Chartres
F-45067 Orléans
phone: +33 238 494 577

def UpdateChargeFlagInAtomBlock(mb):
    """
    Take a V2000 Mol Block (a string) and return a copy in which the charge
    of every atom listed in an "M  CHG" line is also written as a charge flag
    in the atom block.
    The block is expected to start with an empty title line, as RDKit writes
    it for unnamed molecules (or to have no title line at all); an empty
    title line is put back on return.
    """
    # V2000 atom line: x, y, z on 10 chars each, a space, the symbol left-aligned
    # on 3 chars, the mass difference on 2 chars, then 11 fields of 3 chars each
    f = "{:>10s}"*3 + " {:<3s}{:>2s}" + "{:>3s}"*11
    # charge flag to write in the atom block for each formal charge
    flags = {3: '1', 2: '2', 1: '3', -1: '5', -2: '6', -3: '7'}
    chgs = []    # list of (atom idx, charge) tuples
    lines = mb.split("\n")
    if lines[0] == '':
        del lines[0]    # drop the empty title line so that the counts line is lines[2]
    CTAB = lines[2]
    atomCount = int(CTAB[0:3])    # fixed-width field: split() fails when the counts run together
    # parse mb line per line
    for l in lines:
        # look for M  CHG properties; there can be several such lines (8 charges at most per line)
        if l[0:6] == "M  CHG":
            records = l.split()[3:]    # M  CHG X is not needed for parsing, the info we want comes afterwards
            # record each charge into the list
            for i in range(0, len(records), 2):
                chgs.append((int(records[i]), int(records[i+1])))

    # now that we have a list for the current molblock, attribute each charge
    for idx, chg in chgs:
        if chg not in flags:
            print("ERROR! " + str(lines[0]) + " unknown charge flag: " + str(chg))
            continue    # go to next chg
        if 1 <= idx <= atomCount:
            i = idx + 2    # +3 for the header lines, -1 because idx begins at 1 and not 0
            fields = lines[i].split()
            fields[5] = flags[chg]    # x, y, z, symbol, mass difference, then the charge flag
            # update the atom block line
            lines[i] = f.format(*fields[:16])
    if lines[-1] == '':
        del lines[-1]    # remove empty element left because last character before $$$$ is \n
    upmb = "\n" + "\n".join(lines)
    return upmb
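
For comparison, here is a more compact sketch of the same idea. Instead of splitting and re-formatting each atom line, it overwrites only the three columns that hold the charge flag. It assumes a V2000 block whose atom lines follow the fixed-column layout (charge flag in columns 37-39), and it finds the counts line by its "V2000" suffix rather than by position. The names set_charge_flags and CHARGE_TO_FLAG are hypothetical and not part of RDKit; the example block is the one from Greg's message.

```python
# Hypothetical helper, not part of RDKit.
# Formal charge -> V2000 atom-block charge flag (0 would mean "uncharged").
CHARGE_TO_FLAG = {3: 1, 2: 2, 1: 3, -1: 5, -2: 6, -3: 7}


def set_charge_flags(mol_block):
    """Copy charges from 'M  CHG' lines into the atom-block charge column."""
    lines = mol_block.split("\n")
    # Locate the counts line instead of assuming a fixed header length.
    counts_idx = next(
        i for i, line in enumerate(lines) if line.rstrip().endswith("V2000")
    )
    n_atoms = int(lines[counts_idx][0:3])  # atom count is a 3-char fixed field
    for line in list(lines):
        if not line.startswith("M  CHG"):
            continue
        fields = line.split()[3:]  # skip "M", "CHG" and the entry count
        for idx, chg in zip(fields[0::2], fields[1::2]):
            idx, chg = int(idx), int(chg)
            flag = CHARGE_TO_FLAG.get(chg)
            if flag is None or not 1 <= idx <= n_atoms:
                continue  # no atom-block flag for this charge; leave line as-is
            pos = counts_idx + idx  # atom lines directly follow the counts line
            atom = lines[pos]
            # Characters 36..38 (0-based) hold the right-aligned charge flag.
            lines[pos] = atom[:36] + "{:>3d}".format(flag) + atom[39:]
    return "\n".join(lines)


block = """
     RDKit          2D

  2  1  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.2990    0.7500    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0
M  CHG  1   2   1
M  END
"""
# Print the nitrogen atom line after patching.
print(set_charge_flags(block).split("\n")[5])
```

The "M  CHG" line is left in place, so readers that rely on it still see the charge; only the atom-block flag is added for tools that look there.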

