Re: [OpenBabel-Devel] Convenience functions

Noel O'Boyle Mon, 06 Mar 2017 07:09:41 -0800

The previous discussion was about having such a function in the public API.
But anyway, let's say you do use smarts, you can still combine it with a
switch statement on element and maybe on something else so that you don't
end up testing the oxygen patterns against a nitrogen (this will already
make it three times faster or so). Regarding your specific examples I think
you just need to rewrite so that the first atom is the query atom. Also
avoid using an explicit hydrogen - this won't work like you expect.


I'm hoping to do away with dependencies on text files (as this complicates
having an all in one binary) at some point so putting your patterns in a
const char array would be welcome. Doing the perception once and then
storing it may indeed be a good idea but the user can do this without us
getting involved. Maintaining perceived data is a source of errors and
inefficiencies in the current codebase.

Noel

On Thursday, 2 March 2017, Stefano Forli <[email protected]> wrote:

> Noel,
>
> I'm not sure how to describe all the cases with a switch statement.
> The original idea of using SMARTS came from your suggestion:
>
> The thing is, if I was matching a functional group, I would just use a
>> SMARTS pattern. This is a general solution for 99% of cases, rather
>> than having a function for each one. We have a set of SMARTS patterns
>> for functional groups in the .txt file for FP3. The added benefit of a
>> SMARTS pattern is that you have the match of each atom to each virtual
>> atom of the pattern. For a function of OBAtom, it's just boolean true
>> or false.
>>
>
> and you convinced me that's an excellent idea.
>
> Hydrogen bond types can be: 0 (none), 1 (acceptor), 2 (donor),
> acceptor/donor (3), and there are several conditions that need to be
> checked.
> For that, I'm creating SMARTS patterns which would reproduce the behavior
> of the current IsHBondAcceptor() which are stored in text datafile. As an
> example, this is a subset of the conditions in which oxygen is not
> considered an hbonding oxygen:
>
>  # nitro-oxygen
>  [#8]~N~[#8]           0 0
>
>  # aromatic-bound oxygen
>  [a]-[#8]-[a]                1 0
>
>  # ester sp3 oxygen
>  [*]-[#8]-[#6]=[#8]     1 0
>
>  # sulfone
>  [#6][#16;X4](=[#8])(=[#8])[#6] 2 0
>
>  # hydroxyl
>  [#8]-[#1]      0 3
>
> The line format is (pattern, patternIdx, hbType). Basically, the typer
> checks the condition OBAtom.GetIdx() == pattern[patternIdx], then return
> type "hbType".
> Between N, O, S and F, there are about 20 or so patterns to be tested, so
> I though doing the perception once and storing it would have been a good
> idea. On the other hand, I don't think performance is an issue for this
> kind of functions.
>
> I have no idea how to implement this with a switch statement without
> having to do the bond/atom walking manually, as I did in the functions I
> contributed already.
>
> Hope this helps,
>
> S
>
>
>
>
> On 03/02/2017 01:28 PM, Noel O'Boyle wrote:
>
>> The presence of a class or not is an "implementation detail", as they
>> say. That is, the user doesn't need to know anything about that.
>> However, as I am currently replacing the use of SMARTS patterns for
>> atom typers with switch statements, I'd recommend you to avoid using
>> SMARTS for this in the first place and thus avoid any need for a class
>> or a file to read things from. Maybe you should describe the exact
>> problem you are solving with some examples and I can explain better.
>>
>> - Noel
>>
>> On 2 March 2017 at 19:51, Stefano Forli <[email protected]> wrote:
>>
>>> Noel,
>>> thanks for the clarification, the only reason why I was looking at the
>>> lazy
>>> mechanism was because of previous code.
>>>
>>> I'm OK with the simple function, although I think there's still a need
>>> for a
>>> dedicated class behind which gets called to parse the different SMARTS
>>> patterns from a data file and match them with the requested atom.
>>>
>>> S
>>>
>>>
>>>
>>>
>>> On 03/02/2017 02:22 AM, Noel O'Boyle wrote:
>>>
>>>>
>>>> Hi Stefano,
>>>>
>>>> Sounds good. But the guidelines are not unfortunately the existing
>>>> guide. I'm currently in the process of rewriting/removing as much of
>>>> the croft as possible and the Lazy Evaluation mechanism is in my
>>>> sights. It's a legacy from the original codebase. It'll be difficult
>>>> to change this now, but at least we can avoid adding anything. I won't
>>>> go too much into why this is, but suffice to say that OB developers
>>>> spend some time working around the lazy evaluation or if they don't
>>>> they triggeri it multiple times unneccessarily.
>>>>
>>>> In short, see if you can write a function that just takes an atom (or
>>>> pair of atoms, or whatever) and returns an answer., e.g.
>>>> OBAtomGetHBondType(OBAtom*). The simpler solution really is the best
>>>> one here.
>>>>
>>>> - Noel
>>>>
>>>> On 1 March 2017 at 23:17, Stefano Forli <[email protected]> wrote:
>>>>
>>>>>
>>>>> Noel,
>>>>> quite the contrary, I'm far from being pissed at you, by all means.
>>>>> I like your suggestion, but I don't know if I can do it right away,
>>>>> there
>>>>> are still a few things about the facade programming paradigm that
>>>>> escape
>>>>> my
>>>>> hobbist programming training.
>>>>>
>>>>> Following up on the discussion about the hydrogen bond, I had a quick
>>>>> chat
>>>>> with one of my students which is starting to write code based on
>>>>> OpenBabel.
>>>>> We took a shot at designing an OBHBondTyper class which should behave
>>>>> similarly to OBAromaticTyper, and my idea was to store the information
>>>>> in
>>>>> a
>>>>> vector.
>>>>>
>>>>> If I'm not mistaken, though, the aromatic typer works in a lazy way
>>>>> that
>>>>> looks similar to what you're describing for the OBResidueFacade,
>>>>> storing
>>>>> the
>>>>> information in a flag instead of a vector, is that correct? Or is the
>>>>> flag
>>>>> is just in vector? I tried looking for the definition of HasFlag(),
>>>>> but I
>>>>> couldn't find it.
>>>>>
>>>>> Either way, I was thinking to start by writing this HB class (which I
>>>>> probably understand better), try implementing the ob-standard lazy
>>>>> evaluation mechanism, and integrate it the [Begin|End]Modify() process.
>>>>>
>>>>> We can do a git pull and then fix and adapt it according to the
>>>>> feedback
>>>>> other developers suggestions.
>>>>>
>>>>> This would be a great chance for us to understand how to contribute
>>>>> code
>>>>> that would integrate better and match the guidelines you guys follow.
>>>>>
>>>>> Thanks (for the patience),
>>>>>
>>>>> S
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 02/26/2017 02:05 AM, Noel O'Boyle wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> Thanks Stefano for not getting (too!) pissed off with me. :-) We still
>>>>>> don't have the clear API guidelines you asked for last time, but I
>>>>>> think that these discussions are clarifying things for me at least,
>>>>>> and we could probably start writing something up.
>>>>>>
>>>>>> I was thinking that your idea is similar to the rationale behind
>>>>>> OBStereoFacade. Well, simply put, we could have an OBResidueFacade
>>>>>> class which you initialise with a molecule and behind the scenes it
>>>>>> then stores the Atom->PDBAtomId correspondance in a map or vector by
>>>>>> iterating over the residues. Then you would have the same method you
>>>>>> described, but it wouldn't do any iteration, but just look up the Id
>>>>>> in the std::map or vector. So, it's a convenience class, separate from
>>>>>> the core API, and it's efficient. If no-one disagrees and this makes
>>>>>> sense to you, do you want to have a go at writing it?
>>>>>>
>>>>>> - Noel
>>>>>>
>>>>>> On 25 February 2017 at 23:03, Geoffrey Hutchison
>>>>>> <[email protected]> wrote:
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> About the PDB atom name, unfortunately I don't fully understand the
>>>>>>>> performance issue implied in my suggestion, but from an interface
>>>>>>>> point of
>>>>>>>> view, it seems more intuitive to access an atom property from OBAtom
>>>>>>>> instead
>>>>>>>> of going back to the OBResidue (and pass the OBAtom).
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> The OBAtom should already have a pointer to the OBResidue. If it can
>>>>>>> be
>>>>>>> done with generic data in the OBAtom, that's fine, but I don't think
>>>>>>> I
>>>>>>> want
>>>>>>> to add data to each OBAtom. If I translate a nanotube or nanoparticle
>>>>>>> (or
>>>>>>> any other non-biomolecule) I'd have to store in memory a bunch of
>>>>>>> residue
>>>>>>> pointers that would never get used.
>>>>>>>
>>>>>>> In this case, the best approach I can think of is to write a
>>>>>>>> OBHBondTyper class to perceive the hbond character (similarly to
>>>>>>>> what
>>>>>>>> OBAromaticTyper does), then each atom should have a simple IsHBond()
>>>>>>>> method
>>>>>>>> that would return 0, 1 (donor), 2 (acceptor), 3(donor/acceptor),
>>>>>>>> 4(...?).
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Yes, this is the way to go to add "convenience functions." Noel's
>>>>>>> suggestion is to keep the core API restrained, but that doesn't mean
>>>>>>> there
>>>>>>> can't be convenience classes or static methods. I think HBondTyper
>>>>>>> would be
>>>>>>> a good example, since there can (and should) be multiple models of
>>>>>>> "what is
>>>>>>> a hydrogen bond."
>>>>>>>
>>>>>>> At the same time, I support Maciek's idea: useful functions that can
>>>>>>>> streamline the use of scripting languages such as Python should be
>>>>>>>> grouped
>>>>>>>> and conserved.
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> I'm not sure why these can't be put into wrappers like Pybel. As is,
>>>>>>> there is considerable "helper code" in Pybel, etc.
>>>>>>>
>>>>>>> -Geoff
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>>  Stefano Forli, PhD
>>>>>
>>>>>  Assistant Professor of ISCB
>>>>>  Molecular Graphics Laboratory
>>>>>
>>>>>  Dept. of Integrative Structural
>>>>>  and Computational Biology, MB-112A
>>>>>  The Scripps Research Institute
>>>>>  10550  North Torrey Pines Road
>>>>>  La Jolla,  CA 92037-1000,  USA.
>>>>>
>>>>>     tel: +1 (858)784-2055
>>>>>     fax: +1 (858)784-2860
>>>>>     email: [email protected]
>>>>>     http://www.scripps.edu/~forli/
>>>>>
>>>>
>>>
>>> --
>>>
>>>  Stefano Forli, PhD
>>>
>>>  Assistant Professor of ISCB
>>>  Molecular Graphics Laboratory
>>>
>>>  Dept. of Integrative Structural
>>>  and Computational Biology, MB-112A
>>>  The Scripps Research Institute
>>>  10550  North Torrey Pines Road
>>>  La Jolla,  CA 92037-1000,  USA.
>>>
>>>     tel: +1 (858)784-2055
>>>     fax: +1 (858)784-2860
>>>     email: [email protected]
>>>     http://www.scripps.edu/~forli/
>>>
>>
> --
>
>  Stefano Forli, PhD
>
>  Assistant Professor of ISCB
>  Molecular Graphics Laboratory
>
>  Dept. of Integrative Structural
>  and Computational Biology, MB-112A
>  The Scripps Research Institute
>  10550  North Torrey Pines Road
>  La Jolla,  CA 92037-1000,  USA.
>
>     tel: +1 (858)784-2055
>     fax: +1 (858)784-2860
>     email: [email protected]
>     http://www.scripps.edu/~forli/
>

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, SlashDot.org! http://sdm.link/slashdot

_______________________________________________
OpenBabel-Devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/openbabel-devel

Re: [OpenBabel-Devel] Convenience functions

Reply via email to