Re: [Rdkit-discuss] Substructure search
Combining two answers into one:

On Fri, Nov 6, 2009 at 7:59 AM, Evgueni Kolossov ekolos...@gmail.com wrote:
> Hi Greg,
> Yes, this is the solution I have been thinking about as well, but there are two problems:
> 1. It will slow down the mapping process, which is slow already.
> 2. What atom should be used for the replacement?

I'm not sure I understand what you mean about slowing down the mapping process. If you replace the dummies in your fragments with query atoms, as I proposed in the sample code in my earlier message, the substructure search should not be substantially slower. The replacement itself also won't take that long, unless you really have a *lot* of fragments.

On Fri, Nov 6, 2009 at 9:03 AM, Evgueni Kolossov ekolos...@gmail.com wrote:
> I think you should distinguish between dummy atoms and connection points; for fragments, it is connection points we are talking about.

The code doesn't understand anything about connection points... it just has atoms. Dummy atoms are atoms with atomic number zero. The substructure matching code applied to normal Atoms (i.e. not QueryAtoms) compares two atoms by checking whether their atomic numbers match, so dummies match dummies. Additionally, when isotopes are specified, it checks that the specified isotopes match. QueryAtoms, on the other hand, allow client code to specify the function that's used for matching. The example I provided showed how to use a function that matches any atom, which I think is what you are looking for.

> So, it is supposed to ignore this atom (but not the bond!) during the matching process. Maybe just add another bool flag to let the user select the different behavior?

The substructure matching uses atoms and bonds, and returns the results as lists of atom indices; how (and why) would you propose to ignore an atom but not a bond?

-greg
--
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day trial. Simplify your report design, integration and deployment - and focus on what you do best, core application coding.
Discover what's new with Crystal Reports now. http://p.sf.net/sfu/bobj-july ___ Rdkit-discuss mailing list Rdkit-discuss@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
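To make the matching rules Greg describes concrete, here is a toy Python model. It is not RDKit code: the `Atom` class, `atoms_match`, `match_any`, and `make_dummy_query` are hypothetical names invented for illustration. A plain atom matches on atomic number (so a dummy only matches a dummy), plus isotope when one is set; a query atom carries its own match function, so replacing a dummy with a "match any" query lets it stand in for any atom. The isotope rule below (only an isotope on the fragment atom is enforced) is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Atom:
    """Toy atom: atomic_num 0 is a dummy; isotope 0 means 'not specified'."""
    atomic_num: int
    isotope: int = 0
    # Optional custom predicate; when set, this atom behaves like a query atom.
    query: Optional[Callable[["Atom"], bool]] = None


def atoms_match(query_atom: Atom, target_atom: Atom) -> bool:
    """Decide whether a fragment atom matches an atom of the searched structure."""
    if query_atom.query is not None:
        # Query atom: the client-supplied function decides.
        return query_atom.query(target_atom)
    # Plain atom: atomic numbers must agree, so dummies only match dummies.
    if query_atom.atomic_num != target_atom.atomic_num:
        return False
    # Assumption: an isotope set on the fragment atom must be matched exactly.
    if query_atom.isotope and query_atom.isotope != target_atom.isotope:
        return False
    return True


def match_any(_target: Atom) -> bool:
    """Query function that accepts every atom."""
    return True


def make_dummy_query(atom: Atom) -> Atom:
    """Turn a dummy (atomic number 0) into a match-any query atom; leave others alone."""
    if atom.atomic_num == 0:
        return Atom(atomic_num=0, isotope=atom.isotope, query=match_any)
    return atom
```

With this model, `atoms_match(Atom(0), Atom(6))` is `False` (a plain dummy does not match carbon), while `atoms_match(make_dummy_query(Atom(0)), Atom(6))` is `True`. Bonds are untouched by the replacement, which is the "use match-any for the dummy atom but keep the bond" behavior asked about later in the thread.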
Re: [Rdkit-discuss] Substructure search
Thanks Greg,

I have calculated that this replacement will slow things down by about 30%, which is significant for big datasets.

2009/11/7 Greg Landrum greg.land...@gmail.com
> The substructure matching uses atoms and bonds, and returns the results as lists of atom indices; how (and why) would you propose to ignore an atom but not a bond?

I mean: take the bond into account as it is, but use match-any for the dummy atom.

Regards,
Evgueni
Re: [Rdkit-discuss] Substructure search
On Sat, Nov 7, 2009 at 12:35 PM, Evgueni Kolossov ekolos...@gmail.com wrote:
> I have calculated that this replacement will slow things down by about 30%, which is significant for big datasets.

Agreed, that's a huge difference. How does it come about? Where is the time being spent?

-greg
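One way to answer "where is the time being spent?" without a full profiler is to time the dummy-replacement step and the search step separately, rather than comparing total run times with and without the replacement. A minimal Python sketch: `PhaseTimer` is a hypothetical helper, and `replace_dummies` / `has_match` in the usage note are placeholders for whatever functions actually do the replacement and the substructure match.

```python
import time
from collections import defaultdict
from typing import Any, Callable, Dict


class PhaseTimer:
    """Accumulate wall-clock time and call counts per named phase."""

    def __init__(self) -> None:
        self.seconds: Dict[str, float] = defaultdict(float)
        self.calls: Dict[str, int] = defaultdict(int)

    def wrap(self, name: str, fn: Callable[..., Any]) -> Callable[..., Any]:
        """Return a version of fn that records its run time under `name`."""
        def timed(*args: Any, **kwargs: Any) -> Any:
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                self.seconds[name] += time.perf_counter() - start
                self.calls[name] += 1
        return timed

    def report(self) -> str:
        """One line per phase: call count and total seconds."""
        return "\n".join(
            f"{name}: {self.calls[name]} calls, {secs:.3f} s"
            for name, secs in sorted(self.seconds.items())
        )
```

Usage, with hypothetical function names: `replace_dummies = timer.wrap("replace", replace_dummies)` and `has_match = timer.wrap("search", has_match)`, then run the existing loops unchanged and print `timer.report()`. The call counts alone show whether the replacement is being repeated for every structure. For a per-function breakdown, Python's standard-library `cProfile` module is the usual next step.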
Re: [Rdkit-discuss] Substructure search
On Sat, Nov 7, 2009 at 3:44 PM, Evgueni Kolossov ekolos...@gmail.com wrote:
> I have not done full profiling; this came just from the difference between the time with and without Replace Dummy.

Are you doing the dummy replacement for each fragment every time before you do a search, or do you do it just once? I would guess that replacing the dummy atoms shouldn't take very long at all, and then doing the searches should also be reasonably quick. One complication might be that the query atoms will return a lot more matches than the non-query dummies; this will naturally take longer.

-greg
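Greg's last point, that match-any query atoms can return many more matches than literal dummies, is easy to see with a toy matcher. This is a hypothetical, self-contained Python sketch, not RDKit's matcher: molecules are lists of atomic numbers plus undirected bonds, bond orders are ignored, and a plain backtracking search counts every bond-preserving mapping of fragment atoms onto distinct target atoms. The `dummies_match_any` flag switches between "dummy matches only dummy" and "dummy matches anything".

```python
from typing import Dict, List, Set, Tuple

# (atomic numbers, undirected bonds as index pairs); 0 is a dummy atom
Mol = Tuple[List[int], Set[Tuple[int, int]]]


def adjacency(bonds: Set[Tuple[int, int]], n_atoms: int) -> List[Set[int]]:
    """Neighbor sets for each atom index."""
    adj: List[Set[int]] = [set() for _ in range(n_atoms)]
    for a, b in bonds:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def count_matches(query: Mol, target: Mol, dummies_match_any: bool) -> int:
    """Count all injective, bond-preserving mappings of query atoms onto target atoms."""
    q_atoms, q_bonds = query
    t_atoms, t_bonds = target
    q_adj = adjacency(q_bonds, len(q_atoms))
    t_adj = adjacency(t_bonds, len(t_atoms))

    def atom_ok(qi: int, ti: int) -> bool:
        if q_atoms[qi] == 0 and dummies_match_any:
            return True  # query-atom behavior: the dummy matches anything
        return q_atoms[qi] == t_atoms[ti]  # plain-atom behavior

    def extend(mapping: Dict[int, int]) -> int:
        qi = len(mapping)  # query atoms are mapped in index order
        if qi == len(q_atoms):
            return 1
        total = 0
        used = set(mapping.values())
        for ti in range(len(t_atoms)):
            if ti in used or not atom_ok(qi, ti):
                continue
            # Every already-mapped neighbor of qi must be bonded to ti in the target.
            if all(mapping[qj] in t_adj[ti] for qj in q_adj[qi] if qj in mapping):
                mapping[qi] = ti
                total += extend(mapping)
                del mapping[qi]
        return total

    return extend({})
```

For a fragment `*C` (`([0, 6], {(0, 1)})`) searched against the heavy atoms of ethanol (`([6, 6, 8], {(0, 1), (1, 2)})`), the literal dummy gives 0 matches while the match-any dummy gives 3, so each structure that is hit also costs more enumeration work.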
Re: [Rdkit-discuss] Substructure search
On Sat, Nov 7, 2009 at 5:43 PM, Evgueni Kolossov ekolos...@gmail.com wrote:
>> Are you doing the dummy replacement for each fragment every time before you do a search, or do you do it just once?
>
> I am iterating through all the structures and all the fragments:
>
>   for each structure do
>     for each fragment do
>       (need to replace dummy here)
>
> I can probably do it another way:
>
>   for each fragment do
>     for each structure do
>
> In this case I will need to do it only once for each fragment.

Yes, I imagine that will help a lot. Or:

  for each fragment do
    replace dummy atom
  for each structure do
    for each fragment do
      something

-greg
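The loop arrangements discussed in this message differ only in how often the dummy replacement runs. A minimal Python sketch, where `replace_dummies` and `has_match` are hypothetical placeholders for the real replacement and substructure-match calls, contrasts the original nesting with Greg's "replace once, then search" variant:

```python
from typing import Any, Callable, List, Sequence, Tuple

Hit = Tuple[int, int]  # (structure index, fragment index)


def search_naive(structures: Sequence[Any], fragments: Sequence[Any],
                 replace_dummies: Callable[[Any], Any],
                 has_match: Callable[[Any, Any], bool]) -> List[Hit]:
    """Original order: the replacement is redone for every (structure, fragment) pair."""
    hits: List[Hit] = []
    for si, mol in enumerate(structures):
        for fi, frag in enumerate(fragments):
            query = replace_dummies(frag)  # runs len(structures) * len(fragments) times
            if has_match(mol, query):
                hits.append((si, fi))
    return hits


def search_precomputed(structures: Sequence[Any], fragments: Sequence[Any],
                       replace_dummies: Callable[[Any], Any],
                       has_match: Callable[[Any, Any], bool]) -> List[Hit]:
    """Replace dummies once per fragment up front, then keep the same loop nesting."""
    queries = [replace_dummies(frag) for frag in fragments]  # runs len(fragments) times
    hits: List[Hit] = []
    for si, mol in enumerate(structures):
        for fi, query in enumerate(queries):
            if has_match(mol, query):
                hits.append((si, fi))
    return hits
```

Both functions return the same hits; only the number of replacement calls changes. Precomputing keeps all prepared fragment queries in memory at once, which is usually cheap compared with the structures. Evgueni's fragment-outer ordering achieves the same single replacement per fragment without storing the list.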