Re: [Rdkit-discuss] Substructure search

2009-11-07 Thread Greg Landrum
Combining two answers into one:

On Fri, Nov 6, 2009 at 7:59 AM, Evgueni Kolossov ekolos...@gmail.com wrote:
 Hi Greg,

 Yes, this is the solution I have been thinking about as well, but there are 2 problems:
 1. It will slow down the mapping process, which is slow already
 2. What atom to use for replacement?

I'm not sure I understand what you mean about slowing down the mapping
process. If you replace the dummies in your fragments with query
atoms, as I proposed in the sample code in my earlier message, the
substructure search should not be substantially slower. The
replacement itself also won't take that long, unless you really have a
*lot* of fragments.


On Fri, Nov 6, 2009 at 9:03 AM, Evgueni Kolossov ekolos...@gmail.com wrote:

 I think you should distinguish between dummy atoms and connection points -
 for fragments it is connection points we are talking about.

The code doesn't understand anything about connection points... it
just has atoms. Dummy atoms are atoms with atomic number zero. The
substructure matching code applied to normal Atoms (i.e. not
QueryAtoms) compares two atoms by checking to see if their atomic
numbers match, so dummies match dummies. Additionally, when isotopes
are specified, it checks that the specified isotopes match.
QueryAtoms, on the other hand, allow client code to specify the
function that's used for matching. The example I provided showed how
to use a function that matches any atom, which I think is what you are
looking for.
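
The matching rules described above can be illustrated with a small toy model. This is only a sketch in plain Python, not the RDKit implementation or its API: the `Atom` and `QueryAtom` classes, the `predicate` argument, and the choice that only the pattern atom's isotope is checked are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Atom:
    """Toy atom (hypothetical class); atomic number 0 means a dummy atom."""
    atomic_num: int
    isotope: int = 0  # 0 stands for "no isotope specified" in this sketch

    def matches(self, other: "Atom") -> bool:
        # Plain atoms compare atomic numbers, so a dummy only matches a dummy.
        if self.atomic_num != other.atomic_num:
            return False
        # Assumption: the isotope is checked only when this (pattern) atom has one.
        return self.isotope == 0 or self.isotope == other.isotope


@dataclass
class QueryAtom(Atom):
    """Toy query atom (hypothetical class); the caller supplies the match function."""
    predicate: Optional[Callable[[Atom], bool]] = None

    def matches(self, other: Atom) -> bool:
        if self.predicate is None:
            return super().matches(other)
        return self.predicate(other)


dummy = Atom(atomic_num=0)
carbon = Atom(atomic_num=6)
# A query atom whose function accepts any atom at all.
any_atom = QueryAtom(atomic_num=0, predicate=lambda atom: True)

print(dummy.matches(dummy))      # True: both have atomic number 0
print(dummy.matches(carbon))     # False: 0 != 6
print(any_atom.matches(carbon))  # True: the predicate accepts everything
```

In this model, replacing a dummy with a match-anything query atom changes only the comparison function, which is why the replacement by itself would not be expected to make each atom comparison much more expensive.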

 So it is supposed
 to ignore this atom (but not the bond!) during the matching process. Maybe just add
 another bool flag to allow the user to select different behavior?

The substructure matching uses atoms and bonds, and returns the
results as lists of atom indices; how (and why) would you propose to
ignore an atom but not a bond?

-greg

--
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day 
trial. Simplify your report design, integration and deployment - and focus on 
what you do best, core application coding. Discover what's new with
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
___
Rdkit-discuss mailing list
Rdkit-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rdkit-discuss


Re: [Rdkit-discuss] Substructure search

2009-11-07 Thread Evgueni Kolossov
Thanks Greg,

By my calculation, this replacement slows things down by about 30%,
which is significant for big datasets.

 The substructure matching uses atoms and bonds, and returns the
 results as lists of atom indices; how (and why) would you propose to
 ignore an atom but not a bond?

I mean: take the bond into account as it is, but use match-any for the dummy atom.

Regards,
Evgueni





Re: [Rdkit-discuss] Substructure search

2009-11-07 Thread Greg Landrum
On Sat, Nov 7, 2009 at 12:35 PM, Evgueni Kolossov ekolos...@gmail.com wrote:

 By my calculation, this replacement slows things down by about 30%,
 which is significant for big datasets.

Agreed, that's a huge difference. How does it come about? Where is the
time being spent?

-greg



Re: [Rdkit-discuss] Substructure search

2009-11-07 Thread Greg Landrum
On Sat, Nov 7, 2009 at 3:44 PM, Evgueni Kolossov ekolos...@gmail.com wrote:

 I have not done full profiling; this just came from the difference in run
 time with and without Replace Dummy.

Are you doing the dummy replacement for each fragment every time before
you do a search, or do you do it just once?

I would guess that replacing the dummy atoms shouldn't take very long
at all, and then doing the searches should also be reasonably quick.
One complication might be that having the query atoms will return a
lot more matches than the non-query dummies; this will naturally take
longer.
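
The point about extra matches can be seen with a deliberately naive toy matcher. This is a plain-Python sketch, not RDKit code: `count_matches`, the two comparison functions, and the encoding of molecules as lists of atomic numbers plus bond pairs are hypothetical, and the count is of raw atom mappings rather than the uniquified results a real toolkit would typically return.

```python
from itertools import permutations


def count_matches(pattern_atoms, pattern_bonds, target_atoms, target_bonds, atom_ok):
    """Count injective atom mappings that preserve every pattern bond (brute force)."""
    target_edges = {frozenset(bond) for bond in target_bonds}
    count = 0
    for mapping in permutations(range(len(target_atoms)), len(pattern_atoms)):
        atoms_fit = all(
            atom_ok(pattern_atoms[i], target_atoms[j]) for i, j in enumerate(mapping)
        )
        bonds_fit = all(
            frozenset((mapping[a], mapping[b])) in target_edges
            for a, b in pattern_bonds
        )
        if atoms_fit and bonds_fit:
            count += 1
    return count


def literal(pattern_num, target_num):
    # Non-query behaviour: atomic numbers must be equal, so a dummy needs a dummy.
    return pattern_num == target_num


def dummy_as_wildcard(pattern_num, target_num):
    # Query behaviour in this sketch: a dummy (0) in the pattern accepts any atom.
    return pattern_num == 0 or pattern_num == target_num


# Fragment: dummy bonded to carbon. Target: C-C-O (atoms 0, 1, 2).
fragment_atoms, fragment_bonds = [0, 6], [(0, 1)]
target_atoms, target_bonds = [6, 6, 8], [(0, 1), (1, 2)]

print(count_matches(fragment_atoms, fragment_bonds, target_atoms, target_bonds, literal))
# 0: the target contains no dummy atom
print(count_matches(fragment_atoms, fragment_bonds, target_atoms, target_bonds, dummy_as_wildcard))
# 3: the dummy can sit on any neighbour of either carbon
```

If the application only needs to know whether a fragment is present at all, stopping at the first match would avoid paying for the full enumeration.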

-greg



Re: [Rdkit-discuss] Substructure search

2009-11-07 Thread Greg Landrum
On Sat, Nov 7, 2009 at 5:43 PM, Evgueni Kolossov ekolos...@gmail.com wrote:
  Are you doing the replace dummy for each fragment every time before
  you do a search or do you do it just once?

 I am iterating through all the structures and all the fragments:
 for each structure do
     for each fragment do (need to replace the dummy here)

 I can probably do it another way:
 for each fragment do
     for each structure do

 In this case I will need to do it only once for each fragment.

Yes, I imagine that will help a lot.

Or:
for each fragment do: replace dummy atom
for each structure do
    for each fragment do something
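
The difference between the two loop orders can be sketched as follows. This is plain Python with placeholder functions, not RDKit code: `replace_dummies` is a hypothetical stand-in for whatever performs the dummy replacement, the `calls` counter exists only to show how often it runs, and the actual substructure search is left as a placeholder.

```python
from typing import Dict, List, Tuple

# Counter used only to show how often the replacement step runs.
calls: Dict[str, int] = {"replace": 0}


def replace_dummies(fragment: str) -> str:
    """Hypothetical stand-in for replacing dummy atoms with query atoms."""
    calls["replace"] += 1
    return "query:" + fragment


def search_naive(structures: List[str], fragments: List[str]) -> List[Tuple[str, str]]:
    """Replacement inside the inner loop: runs once per (structure, fragment) pair."""
    pairs = []
    for structure in structures:
        for fragment in fragments:
            query = replace_dummies(fragment)
            pairs.append((structure, query))  # placeholder for the real search
    return pairs


def search_prepared(structures: List[str], fragments: List[str]) -> List[Tuple[str, str]]:
    """Replacement hoisted out of the loops: runs once per fragment."""
    queries = [replace_dummies(fragment) for fragment in fragments]
    pairs = []
    for structure in structures:
        for query in queries:
            pairs.append((structure, query))  # placeholder for the real search
    return pairs


structures = ["s1", "s2", "s3"]
fragments = ["f1", "f2"]

calls["replace"] = 0
search_naive(structures, fragments)
print(calls["replace"])  # 6: three structures times two fragments

calls["replace"] = 0
search_prepared(structures, fragments)
print(calls["replace"])  # 2: once per fragment
```

Both versions visit the same (structure, fragment) pairs in the same order, so only the number of replacement calls changes.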

-greg
