Re: [OpenBabel-Devel] Convenience functions

Stefano Forli Mon, 06 Mar 2017 11:32:42 -0800

Hi Noel,
I think I understand your idea, and it's totally compatible with the current 
design we're 
following. Indeed, the API is going to be for the atom.


In fact, in the actual syntax of the datafile I sent before has an extra column 
(which I 
omitted) that specifies the element of the pattern, which is used as a key for 
the hash 
storing the patterns.
The advice on using the first atom is clever, although, I think the element 
still needs to 
be specified (unless you think there's an unambiguous way to get that from the 
pattern).

I agree that it's a good idea to move away from text files, although there's a 
certain 
degree of convenience in storing and managing large data sets (e.g., force 
field, atom 
type parameters) outside the actual source code

One possible compromise may be to keep the text files as part of the source 
'package' and 
generate .h files on the fly at compilation time. I've seen this pattern in 
several 
programs, and I think it may be convenient.
On the other hand, I don't know how much work it is to write and maintain such 
on-the-fly 
scripts/programs, so feel free to ignore the suggestion if that's the case.

I hope to come back with a working prototype soon.
Thanks for the guidance,

S


On 03/06/2017 07:08 AM, Noel O'Boyle wrote:
> The previous discussion was about having such a function in the public API. 
> But anyway,
> let's say you do use smarts, you can still combine it with a switch statement 
> on element
> and maybe on something else so that you don't end up testing the oxygen 
> patterns against a
> nitrogen (this will already make it three times faster or so). Regarding your 
> specific
> examples I think you just need to rewrite so that the first atom is the query 
> atom. Also
> avoid using an explicit hydrogen - this won't work like you expect.
>
> I'm hoping to do away with dependencies on text files (as this complicates 
> having an all
> in one binary) at some point so putting your patterns in a const char array 
> would be
> welcome. Doing the perception once and then storing it may indeed be a good 
> idea but the
> user can do this without us getting involved. Maintaining perceived data is a 
> source of
> errors and inefficiencies in the current codebase.
>
> Noel
>
> On Thursday, 2 March 2017, Stefano Forli <fo...@scripps.edu 
> <mailto:fo...@scripps.edu>> wrote:
>
>     Noel,
>
>     I'm not sure how to describe all the cases with a switch statement.
>     The original idea of using SMARTS came from your suggestion:
>
>         The thing is, if I was matching a functional group, I would just use a
>         SMARTS pattern. This is a general solution for 99% of cases, rather
>         than having a function for each one. We have a set of SMARTS patterns
>         for functional groups in the .txt file for FP3. The added benefit of a
>         SMARTS pattern is that you have the match of each atom to each virtual
>         atom of the pattern. For a function of OBAtom, it's just boolean true
>         or false.
>
>
>     and you convinced me that's an excellent idea.
>
>     Hydrogen bond types can be: 0 (none), 1 (acceptor), 2 (donor), 
> acceptor/donor (3), and
>     there are several conditions that need to be checked.
>     For that, I'm creating SMARTS patterns which would reproduce the behavior 
> of the
>     current IsHBondAcceptor() which are stored in text datafile. As an 
> example, this is a
>     subset of the conditions in which oxygen is not considered an hbonding 
> oxygen:
>
>      # nitro-oxygen
>      [#8]~N~[#8]           0 0
>
>      # aromatic-bound oxygen
>      [a]-[#8]-[a]                1 0
>
>      # ester sp3 oxygen
>      [*]-[#8]-[#6]=[#8]     1 0
>
>      # sulfone
>      [#6][#16;X4](=[#8])(=[#8])[#6] 2 0
>
>      # hydroxyl
>      [#8]-[#1]      0 3
>
>     The line format is (pattern, patternIdx, hbType). Basically, the typer 
> checks the
>     condition OBAtom.GetIdx() == pattern[patternIdx], then return type 
> "hbType".
>     Between N, O, S and F, there are about 20 or so patterns to be tested, so 
> I though
>     doing the perception once and storing it would have been a good idea. On 
> the other
>     hand, I don't think performance is an issue for this kind of functions.
>
>     I have no idea how to implement this with a switch statement without 
> having to do the
>     bond/atom walking manually, as I did in the functions I contributed 
> already.
>
>     Hope this helps,
>
>     S
>
>
>
>
>     On 03/02/2017 01:28 PM, Noel O'Boyle wrote:
>
>         The presence of a class or not is an "implementation detail", as they
>         say. That is, the user doesn't need to know anything about that.
>         However, as I am currently replacing the use of SMARTS patterns for
>         atom typers with switch statements, I'd recommend you to avoid using
>         SMARTS for this in the first place and thus avoid any need for a class
>         or a file to read things from. Maybe you should describe the exact
>         problem you are solving with some examples and I can explain better.
>
>         - Noel
>
>         On 2 March 2017 at 19:51, Stefano Forli <fo...@scripps.edu> wrote:
>
>             Noel,
>             thanks for the clarification, the only reason why I was looking 
> at the lazy
>             mechanism was because of previous code.
>
>             I'm OK with the simple function, although I think there's still a 
> need for a
>             dedicated class behind which gets called to parse the different 
> SMARTS
>             patterns from a data file and match them with the requested atom.
>
>             S
>
>
>
>
>             On 03/02/2017 02:22 AM, Noel O'Boyle wrote:
>
>
>                 Hi Stefano,
>
>                 Sounds good. But the guidelines are not unfortunately the 
> existing
>                 guide. I'm currently in the process of rewriting/removing as 
> much of
>                 the croft as possible and the Lazy Evaluation mechanism is in 
> my
>                 sights. It's a legacy from the original codebase. It'll be 
> difficult
>                 to change this now, but at least we can avoid adding 
> anything. I won't
>                 go too much into why this is, but suffice to say that OB 
> developers
>                 spend some time working around the lazy evaluation or if they 
> don't
>                 they triggeri it multiple times unneccessarily.
>
>                 In short, see if you can write a function that just takes an 
> atom (or
>                 pair of atoms, or whatever) and returns an answer., e.g.
>                 OBAtomGetHBondType(OBAtom*). The simpler solution really is 
> the best
>                 one here.
>
>                 - Noel
>
>                 On 1 March 2017 at 23:17, Stefano Forli <fo...@scripps.edu> 
> wrote:
>
>
>                     Noel,
>                     quite the contrary, I'm far from being pissed at you, by 
> all means.
>                     I like your suggestion, but I don't know if I can do it 
> right away, there
>                     are still a few things about the facade programming 
> paradigm that escape
>                     my
>                     hobbist programming training.
>
>                     Following up on the discussion about the hydrogen bond, I 
> had a quick
>                     chat
>                     with one of my students which is starting to write code 
> based on
>                     OpenBabel.
>                     We took a shot at designing an OBHBondTyper class which 
> should behave
>                     similarly to OBAromaticTyper, and my idea was to store 
> the information in
>                     a
>                     vector.
>
>                     If I'm not mistaken, though, the aromatic typer works in 
> a lazy way that
>                     looks similar to what you're describing for the 
> OBResidueFacade, storing
>                     the
>                     information in a flag instead of a vector, is that 
> correct? Or is the
>                     flag
>                     is just in vector? I tried looking for the definition of 
> HasFlag(), but I
>                     couldn't find it.
>
>                     Either way, I was thinking to start by writing this HB 
> class (which I
>                     probably understand better), try implementing the 
> ob-standard lazy
>                     evaluation mechanism, and integrate it the 
> [Begin|End]Modify() process.
>
>                     We can do a git pull and then fix and adapt it according 
> to the feedback
>                     other developers suggestions.
>
>                     This would be a great chance for us to understand how to 
> contribute code
>                     that would integrate better and match the guidelines you 
> guys follow.
>
>                     Thanks (for the patience),
>
>                     S
>
>
>
>
>
>                     On 02/26/2017 02:05 AM, Noel O'Boyle wrote:
>
>
>
>                         Thanks Stefano for not getting (too!) pissed off with 
> me. :-) We still
>                         don't have the clear API guidelines you asked for 
> last time, but I
>                         think that these discussions are clarifying things 
> for me at least,
>                         and we could probably start writing something up.
>
>                         I was thinking that your idea is similar to the 
> rationale behind
>                         OBStereoFacade. Well, simply put, we could have an 
> OBResidueFacade
>                         class which you initialise with a molecule and behind 
> the scenes it
>                         then stores the Atom->PDBAtomId correspondance in a 
> map or vector by
>                         iterating over the residues. Then you would have the 
> same method you
>                         described, but it wouldn't do any iteration, but just 
> look up the Id
>                         in the std::map or vector. So, it's a convenience 
> class, separate from
>                         the core API, and it's efficient. If no-one disagrees 
> and this makes
>                         sense to you, do you want to have a go at writing it?
>
>                         - Noel
>
>                         On 25 February 2017 at 23:03, Geoffrey Hutchison
>                         <geoff.hutchi...@gmail.com> wrote:
>
>
>
>                                 About the PDB atom name, unfortunately I 
> don't fully
>                                 understand the
>                                 performance issue implied in my suggestion, 
> but from an
>                                 interface
>                                 point of
>                                 view, it seems more intuitive to access an 
> atom property
>                                 from OBAtom
>                                 instead
>                                 of going back to the OBResidue (and pass the 
> OBAtom).
>
>
>
>
>                             The OBAtom should already have a pointer to the 
> OBResidue. If
>                             it can be
>                             done with generic data in the OBAtom, that's 
> fine, but I don't
>                             think I
>                             want
>                             to add data to each OBAtom. If I translate a 
> nanotube or
>                             nanoparticle
>                             (or
>                             any other non-biomolecule) I'd have to store in 
> memory a bunch of
>                             residue
>                             pointers that would never get used.
>
>                                 In this case, the best approach I can think 
> of is to write a
>                                 OBHBondTyper class to perceive the hbond 
> character
>                                 (similarly to what
>                                 OBAromaticTyper does), then each atom should 
> have a simple
>                                 IsHBond()
>                                 method
>                                 that would return 0, 1 (donor), 2 (acceptor),
>                                 3(donor/acceptor),
>                                 4(...?).
>
>
>
>
>                             Yes, this is the way to go to add "convenience 
> functions." Noel's
>                             suggestion is to keep the core API restrained, 
> but that
>                             doesn't mean
>                             there
>                             can't be convenience classes or static methods. I 
> think HBondTyper
>                             would be
>                             a good example, since there can (and should) be 
> multiple models of
>                             "what is
>                             a hydrogen bond."
>
>                                 At the same time, I support Maciek's idea: 
> useful
>                                 functions that can
>                                 streamline the use of scripting languages 
> such as Python
>                                 should be
>                                 grouped
>                                 and conserved.
>
>
>
>
>                             I'm not sure why these can't be put into wrappers 
> like Pybel.
>                             As is,
>                             there is considerable "helper code" in Pybel, etc.
>
>                             -Geoff
>
>
>
>
>                     --
>
>                      Stefano Forli, PhD
>
>                      Assistant Professor of ISCB
>                      Molecular Graphics Laboratory
>
>                      Dept. of Integrative Structural
>                      and Computational Biology, MB-112A
>                      The Scripps Research Institute
>                      10550  North Torrey Pines Road
>                      La Jolla,  CA 92037-1000,  USA.
>
>                         tel: +1 (858)784-2055
>                         fax: +1 (858)784-2860
>                         email: fo...@scripps.edu
>                         http://www.scripps.edu/~forli/
>
>
>
>             --
>
>              Stefano Forli, PhD
>
>              Assistant Professor of ISCB
>              Molecular Graphics Laboratory
>
>              Dept. of Integrative Structural
>              and Computational Biology, MB-112A
>              The Scripps Research Institute
>              10550  North Torrey Pines Road
>              La Jolla,  CA 92037-1000,  USA.
>
>                 tel: +1 (858)784-2055
>                 fax: +1 (858)784-2860
>                 email: fo...@scripps.edu
>                 http://www.scripps.edu/~forli/
>
>
>     --
>
>      Stefano Forli, PhD
>
>      Assistant Professor of ISCB
>      Molecular Graphics Laboratory
>
>      Dept. of Integrative Structural
>      and Computational Biology, MB-112A
>      The Scripps Research Institute
>      10550  North Torrey Pines Road
>      La Jolla,  CA 92037-1000,  USA.
>
>         tel: +1 (858)784-2055
>         fax: +1 (858)784-2860
>         email: fo...@scripps.edu
>         http://www.scripps.edu/~forli/
>

-- 

  Stefano Forli, PhD

  Assistant Professor of ISCB
  Molecular Graphics Laboratory

  Dept. of Integrative Structural
  and Computational Biology, MB-112A
  The Scripps Research Institute
  10550  North Torrey Pines Road
  La Jolla,  CA 92037-1000,  USA.

     tel: +1 (858)784-2055
     fax: +1 (858)784-2860
     email: fo...@scripps.edu
     http://www.scripps.edu/~forli/

------------------------------------------------------------------------------
Announcing the Oxford Dictionaries API! The API offers world-renowned
dictionary content that is easy and intuitive to access. Sign up for an
account today to start using our lexical data to power your apps and
projects. Get started today and enter our developer competition.
http://sdm.link/oxford
_______________________________________________
OpenBabel-Devel mailing list
OpenBabel-Devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openbabel-devel

Re: [OpenBabel-Devel] Convenience functions

Reply via email to