Greg,
After running through the process with exception handling in place I was
able to isolate 10 structures that were being problematic. All of them had
at least one bond designated as 0 order in the SD file - much as you found
for some of the other structures previously. I assume that these passed the
initial import step but are failing upon descriptor generation for obvious
reasons.
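One way to catch such records before they reach descriptor generation is to scan the SD file for bond lines whose type field is 0. The following is only a rough sketch, not part of CreateDB.py or the RDKit: the function names are hypothetical, and it assumes V2000 molblocks with the standard fixed-width layout (counts on the fourth line, bond type in columns 7-9 of each bond line).

```python
def iter_sd_records(sd_text):
    """Yield each SD record as a list of lines, split on the '$$$$' delimiter."""
    record = []
    for line in sd_text.splitlines():
        if line.strip() == "$$$$":
            yield record
            record = []
        else:
            record.append(line)
    if any(line.strip() for line in record):
        yield record  # trailing record without a final delimiter


def has_zero_order_bond(record):
    """Return True if a V2000 bond block contains a bond of type 0."""
    if len(record) < 4:
        return False
    counts = record[3]  # line 4 of a V2000 molblock is the counts line
    try:
        n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])
    except ValueError:
        return False  # not a parseable V2000 counts line
    first_bond = 4 + n_atoms
    for line in record[first_bond:first_bond + n_bonds]:
        try:
            bond_type = int(line[6:9])  # columns 7-9 hold the bond type
        except ValueError:
            continue  # skip malformed bond lines in this sketch
        if bond_type == 0:
            return True
    return False


def find_zero_order_records(sd_text):
    """Return (index, name) for every record with a zero-order bond."""
    return [
        (index, record[0].strip())
        for index, record in enumerate(iter_sd_records(sd_text))
        if has_zero_order_bond(record)
    ]
```

Calling find_zero_order_records(open(path).read()) on the input file would list the zero-based record index and title line of each suspect structure, so they can be inspected or removed before loading.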
I suppose the only request that I have is for more graceful error handling.
I've attached my (admittedly sloppy) version of CreateDB.py showing what I
did to isolate the errors.
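For readers without the attachment, the general idea can be sketched as below. This is not the attached CreateDB.py: process_all and process_record are hypothetical names, and process_record stands in for whatever per-molecule work (unpickling, fingerprints, descriptors) is being done.

```python
import logging

logger = logging.getLogger("createdb_sketch")


def process_all(records, process_record):
    """Apply process_record to each record, collecting failures instead of stopping."""
    results = []
    failures = []
    for index, record in enumerate(records):
        try:
            results.append(process_record(record))
        except Exception as exc:
            # Deliberately broad: the goal is to find every offending record,
            # not to handle one specific error type.
            logger.warning("record %d failed: %s", index, exc)
            failures.append((index, record, exc))
    return results, failures
```

The failures list keeps the index and the original record together with the exception, which makes it straightforward to write the offending structures out to a separate file afterwards.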
-Kirk
On Thu, Nov 20, 2008 at 1:33 PM, rkdeli...@gmail.com wrote:
Indeed I can. Luckily I had a console window open with the error in place
just as I saw your message:
[13:21:16] INFO: Done: 54500
Traceback (most recent call last):
  File "C:\RDKit_Q32008_1\Projects\dbcli\CreateDB.py", line 222, in <module>
    mol = Chem.Mol(str(pkl))
RuntimeError: Unknown exception
I've just wrapped this one in a try/except block as well.
On Nov 20, 2008 1:17pm, Greg Landrum greg.land...@gmail.com wrote:
Can you send me the console output without disclosing things you
oughtn't to disclose?
FYI: the deprecation warnings ought not to be causing the problem.
There ought to be a bug report filed against this already, but it
looks like I forgot to submit it. grn.
-greg
On Thu, Nov 20, 2008 at 9:06 PM, wrote:
Greg,
Thanks for the quick response.
In reading my original question I realize I didn't explain myself well.
Sorry about that. 8^)
I'm trying to set up a database of ~100,000 structures which will be
queried by very few structures at a time. While running CreateDB.py I get
to the step that gives an output of:
'Generating fingerprints and descriptors:'
In reading the output more closely I see that there are some deprecation
warnings that mention a distance matrix - that's where my original
question regarding a pairwise computation step came from. Regardless,
after around 50,000 structures, I get a 'Runtime: unexpected exception'
message and Python stops. Having done a bit more research I see that each
molecule is passed through Atom Pair, Fingerprint, and Descriptor
generation. I assume it is failing somewhere within those steps, but I
haven't yet identified where or why. I have just wrapped all of those
procedures in try/except blocks in hopes of finding the offending
structure. Once I have it, I'll do some tests on it and send it your way.
-Kirk
On Nov 20, 2008 12:41pm, Greg Landrum wrote:
[moving a general-interest question to the mailing list]
Hi Kirk,
On Thu, Nov 20, 2008 at 6:03 PM, wrote:
I have another question on DbCLI. After getting rid of problematic
structures, I was able to get DbCLI to the pairwise comparison step, but my
I'm not sure what the pairwise comparison step is with the DbCLI stuff.
Step one is loading the database with CreateDb.py, step 2 is doing
searches with SearchDb.py. What are you asking about?
dataset has on the order of 100,000 structures. After about 50,000
structures Python issued an Unexpected error response and stopped. Is this
likely due to the enormous size of a pairwise distance table for this
dataset? Have you had problems with very large datasets in the past or has
this typically worked smoothly?
I must admit that I've never queried with that number of structures.
My typical use case is to have a large database (10^5-10^6 compounds)
and query that with a few (~10) structures. The code hasn't really
been written to deal with giant query sets. That is doable, but it
would require some reworking. Probably the best bet would be to
support loading the queries from a database as well; that way you
wouldn't have to reprocess the queries every time and could pretty
easily handle the "only loading a few at a time" problem.
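As a rough illustration of the "few at a time" idea (not how SearchDb.py currently works), queries could be pulled from any iterable source in fixed-size batches. The in_batches helper below is a hypothetical name, and the source of the queries (a database cursor, a file reader) is left as an assumption.

```python
from itertools import islice


def in_batches(items, batch_size):
    """Yield successive lists of at most batch_size items from any iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return  # source exhausted
        yield batch
```

Because the helper only ever holds one batch in memory, the full query set never needs to be loaded at once.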
It's an interesting thing to think about.
-greg
# $Id: CreateDb.py 665 2008-05-15 04:33:40Z glandrum $
#
# Copyright (c) 2007, Novartis Institutes for BioMedical Research Inc.
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met:
#
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above
# copyright notice, this list of conditions and the following
# disclaimer in the documentation and/or