Re: [Rdkit-discuss] Catching errors in SMILES files

2019-06-06 Thread Greg Landrum
For what it's worth, I think I fixed this (and cleared up some other
problems) in this PR:
https://github.com/rdkit/rdkit/pull/2482

On Tue, Jun 4, 2019 at 1:44 PM Paolo Tosco wrote:

> Hi David,
>
> I think I already have a fix for this bug; I'll submit a PR later. If you
> can create a GitHub issue, that would be great, so I can link my PR to the bug.
>
> Thanks, cheers
> p.
>
> On 06/04/19 12:10, David Cosgrove wrote:
>
> Hi Paolo,
> Many thanks for the speedy reply.  I'll do as you suggest for now.  Do you
> want me to file an issue on github, or even, maybe, see if I can fix it
> myself?
> Cheers,
> Dave
>
>
> On Mon, Jun 3, 2019 at 5:32 PM Paolo Tosco wrote:
>
>> Hi David,
>>
>> a workaround could be adding a final check after the for loop:
>>
>> #!/usr/bin/env python
>>
>> from rdkit import Chem
>>
>> suppl1 = Chem.SmilesMolSupplier('test1.smi', titleLine=False,
>>                                 nameColumn=1)
>> rec_num = 0
>> print("len(suppl1) = {0:d}".format(len(suppl1)))
>> for mol in suppl1:
>>     rec_num += 1
>>     if not mol:
>>         print('Record {} not read.'.format(rec_num))
>>     else:
>>         print('Record {} read ok.'.format(rec_num))
>> if (rec_num == len(suppl1) - 1):
>>     rec_num += 1
>>     print('Record {} not read.'.format(rec_num))
>>
>>
>> suppl2 = Chem.SmilesMolSupplier('test2.smi', titleLine=False,
>>                                 nameColumn=1)
>> rec_num = 0
>> print("len(suppl2) = {0:d}".format(len(suppl2)))
>> for mol in suppl2:
>>     rec_num += 1
>>     if not mol:
>>         print('Record {} not read.'.format(rec_num))
>>     else:
>>         print('Record {} read ok.'.format(rec_num))
>> if (rec_num == len(suppl2) - 1):
>>     rec_num += 1
>>     print('Record {} not read.'.format(rec_num))
>>
>> This should work until what seems to be an issue in the SmilesSupplier is
>> fixed.
>>
>> Cheers,
>> p.
>>
>> On 06/03/19 16:49, David Cosgrove wrote:
>>
>> Hi,
>>
>> I'm trying to catch the line numbers of lines in a SMILES file that
>> aren't parsed by the SmilesMolSupplier.  Example code is attached, along
>> with 2 SMILES files.  When there is a bad SMILES string on the last line,
>> the error is not reported, as in test2.smi.  I've tried iterating through
>> the file in a loop using next(suppl1) and catching the StopIteration
>> exception, but I have the same issue.  Is there a way to spot a last bad
>> record in a file?
>>
>> Thanks,
>> Dave
>>
>> --
>> David Cosgrove
>> Freelance computational chemistry and chemoinformatics developer
>> http://cozchemix.co.uk
>>
>>
>>
>>
>>
>> ___
>> Rdkit-discuss mailing list
>> Rdkit-discuss@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>>
>>
>>
>
> --
> David Cosgrove
> Freelance computational chemistry and chemoinformatics developer
> http://cozchemix.co.uk
>
>
>


Re: [Rdkit-discuss] Catching errors in SMILES files

2019-06-04 Thread Paolo Tosco

Hi David,

I think I already have a fix for this bug; I'll submit a PR later. If
you can create a GitHub issue, that would be great, so I can link my PR to
the bug.


Thanks, cheers
p.


On 06/04/19 12:10, David Cosgrove wrote:

Hi Paolo,
Many thanks for the speedy reply.  I'll do as you suggest for now.  Do 
you want me to file an issue on github, or even, maybe, see if I can 
fix it myself?

Cheers,
Dave


On Mon, Jun 3, 2019 at 5:32 PM Paolo Tosco wrote:


Hi David,

a workaround could be adding a final check after the for loop:

#!/usr/bin/env python

from rdkit import Chem

suppl1 = Chem.SmilesMolSupplier('test1.smi', titleLine=False,
                                nameColumn=1)
rec_num = 0
print("len(suppl1) = {0:d}".format(len(suppl1)))
for mol in suppl1:
    rec_num += 1
    if not mol:
        print('Record {} not read.'.format(rec_num))
    else:
        print('Record {} read ok.'.format(rec_num))
if (rec_num == len(suppl1) - 1):
    rec_num += 1
    print('Record {} not read.'.format(rec_num))


suppl2 = Chem.SmilesMolSupplier('test2.smi', titleLine=False,
                                nameColumn=1)
rec_num = 0
print("len(suppl2) = {0:d}".format(len(suppl2)))
for mol in suppl2:
    rec_num += 1
    if not mol:
        print('Record {} not read.'.format(rec_num))
    else:
        print('Record {} read ok.'.format(rec_num))
if (rec_num == len(suppl2) - 1):
    rec_num += 1
    print('Record {} not read.'.format(rec_num))

This should work until what seems to be an issue in the
SmilesSupplier is fixed.

Cheers,
p.

On 06/03/19 16:49, David Cosgrove wrote:

Hi,

I'm trying to catch the line numbers of lines in a SMILES file
that aren't parsed by the SmilesMolSupplier.  Example code is
attached, along with 2 SMILES files.  When there is a bad SMILES
string on the last line, the error is not reported, as in
test2.smi.  I've tried iterating through the file in a loop using
next(suppl1) and catching the StopIteration exception, but I have
the same issue. Is there a way to spot a last bad record in a file?

Thanks,
Dave

-- 
David Cosgrove

Freelance computational chemistry and chemoinformatics developer
http://cozchemix.co.uk









--
David Cosgrove
Freelance computational chemistry and chemoinformatics developer
http://cozchemix.co.uk





Re: [Rdkit-discuss] Catching errors in SMILES files

2019-06-04 Thread David Cosgrove
Hi Paolo,
Many thanks for the speedy reply.  I'll do as you suggest for now.  Do you
want me to file an issue on github, or even, maybe, see if I can fix it
myself?
Cheers,
Dave


On Mon, Jun 3, 2019 at 5:32 PM Paolo Tosco wrote:

> Hi David,
>
> a workaround could be adding a final check after the for loop:
>
> #!/usr/bin/env python
>
> from rdkit import Chem
>
> suppl1 = Chem.SmilesMolSupplier('test1.smi', titleLine=False, nameColumn=1)
> rec_num = 0
> print("len(suppl1) = {0:d}".format(len(suppl1)))
> for mol in suppl1:
>     rec_num += 1
>     if not mol:
>         print('Record {} not read.'.format(rec_num))
>     else:
>         print('Record {} read ok.'.format(rec_num))
> if (rec_num == len(suppl1) - 1):
>     rec_num += 1
>     print('Record {} not read.'.format(rec_num))
>
>
> suppl2 = Chem.SmilesMolSupplier('test2.smi', titleLine=False, nameColumn=1)
> rec_num = 0
> print("len(suppl2) = {0:d}".format(len(suppl2)))
> for mol in suppl2:
>     rec_num += 1
>     if not mol:
>         print('Record {} not read.'.format(rec_num))
>     else:
>         print('Record {} read ok.'.format(rec_num))
> if (rec_num == len(suppl2) - 1):
>     rec_num += 1
>     print('Record {} not read.'.format(rec_num))
>
> This should work until what seems to be an issue in the SmilesSupplier is
> fixed.
>
> Cheers,
> p.
>
> On 06/03/19 16:49, David Cosgrove wrote:
>
> Hi,
>
> I'm trying to catch the line numbers of lines in a SMILES file that aren't
> parsed by the SmilesMolSupplier.  Example code is attached, along with 2
> SMILES files.  When there is a bad SMILES string on the last line, the
> error is not reported, as in test2.smi.  I've tried iterating through the
> file in a loop using next(suppl1) and catching the StopIteration exception,
> but I have the same issue.  Is there a way to spot a last bad record in a
> file?
>
> Thanks,
> Dave
>
> --
> David Cosgrove
> Freelance computational chemistry and chemoinformatics developer
> http://cozchemix.co.uk
>
>
>
>
>
>
>
>

-- 
David Cosgrove
Freelance computational chemistry and chemoinformatics developer
http://cozchemix.co.uk


Re: [Rdkit-discuss] Catching errors in SMILES files

2019-06-03 Thread Paolo Tosco

Hi David,

a workaround could be adding a final check after the for loop:

#!/usr/bin/env python

from rdkit import Chem

suppl1 = Chem.SmilesMolSupplier('test1.smi', titleLine=False, nameColumn=1)
rec_num = 0
print("len(suppl1) = {0:d}".format(len(suppl1)))
for mol in suppl1:
    rec_num += 1
    if not mol:
        print('Record {} not read.'.format(rec_num))
    else:
        print('Record {} read ok.'.format(rec_num))
if (rec_num == len(suppl1) - 1):
    rec_num += 1
    print('Record {} not read.'.format(rec_num))


suppl2 = Chem.SmilesMolSupplier('test2.smi', titleLine=False, nameColumn=1)
rec_num = 0
print("len(suppl2) = {0:d}".format(len(suppl2)))
for mol in suppl2:
    rec_num += 1
    if not mol:
        print('Record {} not read.'.format(rec_num))
    else:
        print('Record {} read ok.'.format(rec_num))
if (rec_num == len(suppl2) - 1):
    rec_num += 1
    print('Record {} not read.'.format(rec_num))

This should work until what seems to be an issue in the SmilesSupplier 
is fixed.
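
The final check in the workaround above could be folded into a small helper. This is only a sketch: it assumes, as the workaround does, that len() on the supplier counts every record in the file, including a bad last record that the iterator never yields. The names failed_records and FakeSupplier are made up for illustration, and FakeSupplier merely imitates the behaviour reported in this thread so that the example runs without RDKit.

```python
def failed_records(supplier):
    """Return the 1-based numbers of records that could not be read."""
    # Assumption taken from the workaround: len() counts every record.
    expected = len(supplier)
    failed = []
    seen = 0
    for mol in supplier:
        seen += 1
        if mol is None:
            failed.append(seen)
    # Records the iterator never yielded (e.g. a bad last line) are
    # reported as failed too.
    failed.extend(range(seen + 1, expected + 1))
    return failed


class FakeSupplier:
    """Hypothetical stand-in: iteration stops before a bad last record."""

    def __init__(self, mols, n_records):
        self._mols = mols
        self._n_records = n_records

    def __len__(self):
        return self._n_records

    def __iter__(self):
        return iter(self._mols)


# Four records in the "file"; record 2 fails and record 4 is never yielded.
print(failed_records(FakeSupplier(["mol1", None, "mol3"], 4)))  # [2, 4]
```

With RDKit the same call would take the real supplier, e.g. failed_records(Chem.SmilesMolSupplier('test2.smi', titleLine=False, nameColumn=1)). Using a range rather than the single "== len(suppl) - 1" test also covers the case of more than one missing trailing record, should that ever happen.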


Cheers,
p.

On 06/03/19 16:49, David Cosgrove wrote:

Hi,

I'm trying to catch the line numbers of lines in a SMILES file that 
aren't parsed by the SmilesMolSupplier.  Example code is attached, 
along with 2 SMILES files.  When there is a bad SMILES string on the 
last line, the error is not reported, as in test2.smi.  I've tried 
iterating through the file in a loop using next(suppl1) and catching 
the StopIteration exception, but I have the same issue.  Is there a 
way to spot a last bad record in a file?


Thanks,
Dave

--
David Cosgrove
Freelance computational chemistry and chemoinformatics developer
http://cozchemix.co.uk







[Rdkit-discuss] Catching errors in SMILES files

2019-06-03 Thread David Cosgrove
Hi,

I'm trying to catch the line numbers of lines in a SMILES file that aren't
parsed by the SmilesMolSupplier.  Example code is attached, along with 2
SMILES files.  When there is a bad SMILES string on the last line, the
error is not reported, as in test2.smi.  I've tried iterating through the
file in a loop using next(suppl1) and catching the StopIteration exception,
but I have the same issue.  Is there a way to spot a last bad record in a
file?

Thanks,
Dave

-- 
David Cosgrove
Freelance computational chemistry and chemoinformatics developer
http://cozchemix.co.uk


test1.smi
Description: application/diskcopy


test2.smi
Description: application/diskcopy
#!/usr/bin/env python

from rdkit import Chem

suppl1 = Chem.SmilesMolSupplier('test1.smi', titleLine=False, nameColumn=1)
rec_num = 0
for mol in suppl1:
    rec_num += 1
    if not mol:
        print('Record {} not read.'.format(rec_num))
    else:
        print('Record {} read ok.'.format(rec_num))


suppl2 = Chem.SmilesMolSupplier('test2.smi', titleLine=False, nameColumn=1)
rec_num = 0
for mol in suppl2:
    rec_num += 1
    if not mol:
        print('Record {} not read.'.format(rec_num))
    else:
        print('Record {} read ok.'.format(rec_num))
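
A different approach from the supplier loop above, and not one proposed in this thread, is to read the file directly and parse each line with Chem.MolFromSmiles, which returns None for a SMILES it cannot parse. That gives true file line numbers regardless of how the supplier handles the last record. The sketch below assumes the SMILES is the first whitespace-separated column and that there is no title line (matching titleLine=False); find_bad_lines and rdkit_parse are hypothetical names, and the demo uses a stub parser so it runs without RDKit.

```python
def find_bad_lines(lines, parse):
    """Return (line_number, text) for each line whose SMILES fails to parse.

    `parse` is any callable that returns None for an unparsable SMILES.
    """
    bad = []
    for line_no, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # blank line: nothing to parse
        if parse(fields[0]) is None:
            bad.append((line_no, line.rstrip("\n")))
    return bad


def rdkit_parse(smiles):
    # Imported lazily so find_bad_lines can be tried without RDKit installed.
    from rdkit import Chem
    return Chem.MolFromSmiles(smiles)


def stub_parse(smiles):
    # Demo-only stand-in for a parser: rejects the literal string "bad".
    return None if smiles == "bad" else object()


demo_lines = ["CCO ethanol\n", "bad broken1\n", "c1ccccc1 benzene\n", "bad broken2"]
for line_no, text in find_bad_lines(demo_lines, stub_parse):
    print("Line {} not read: {}".format(line_no, text))
```

With RDKit installed, passing an open file handle for test2.smi together with rdkit_parse would check the real file. The cost is that each record is parsed outside the supplier, so supplier-specific options such as nameColumn are not applied.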

