On 20 October 2010 05:58, Geoffrey Hutchison <ge...@geoffhutchison.net> wrote:
> > and there's a comment from ghutchis last year saying he should have some
> > code that does this fairly soon, so I guess this question is directed at
> > Geoff: Is this something that's stayed out because it's low priority or
> > because it turns out to be more difficult than anticipated?
> >
> > If someone can point me in the right direction (and it doesn't involve
> > major re-writing of the smarts parser) I'd be happy to try and get this
> > working...
>
> It's a lot more work than anticipated. (I also opened my big mouth recently
> about fixing cis/trans matching for SMARTS.) For one, the code for *parsing*
> the SMARTS is tricky to modify. These projects have also been lower
> priority than fixing stereo errors, crashes, etc.
>
> For parsing, there's currently a block on '.' but once you remove that,
> you'd want to handle it something like an AND operator. (Sorry, it's late,
> so I can't remember operator precedence on '.' versus the ANDs in SMARTS.)
>
> Once you add '.' as another type of operator, my approach was something
> like this:
> 1) Ensure that each component matched the OBMol as a SMARTS itself (i.e.,
> throw away easy rejects)
> 2) Ensure that there are at least as many contiguous fragments in the OBMol
> as there are components in the SMARTS (more rejects).
> 3) For each component SMARTS, go through the matches and determine which
> contiguous fragment it's in.
> 4) Now the hard work -- iterating through the components and their matches:
>    * OK, now you have a list of component SMARTS, each with a list of
>      matching contiguous fragments:
>    * for component one, take a matching fragment.
>    * check component two, see if there's a match that's *not* the same as
>      component one
>      * If no, reject early
>      * If yes, continue down the line to make sure we can find unique
>        fragments for each SMARTS component
>
> I'm sure that's the naive, brute-force method, but it should work. I'd be
> more than happy to give further pointers and/or hand-holding.
>
Hi Geoff,
I've looked through the code in parsmart.cpp - and yes, doing this by
editing the parser/matcher looks like it's quite tricky... I guess the
parser needs to do 2 things when it hits a ".":
1) Start a new "part" (AtomExpr->part seems to be in place for this)
2) Keep track of whether this part must be in the same component as another
part, in a different component, or whether it doesn't matter: (C.C) vs
(C).(C) vs C.C. I'm not sure where the best place to represent this is
(mostly because I haven't got my head round how the matcher interacts with
the parsed Pattern).
Then the matcher needs to take this into account, and yup, it looks like a
lot of work..! :)
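
To make the three cases concrete, here is a rough sketch of the top-level
splitting in plain Python. It is not Open Babel code: split_components and
the group numbering are made up for illustration. It tags each dot-separated
part with a group id when the part sits inside zero-level parentheses and
with None otherwise, and it leaves validation of each part to the real
SMARTS parser:

```python
def split_components(smarts):
    """Split a SMARTS string on top-level dots (hypothetical helper).

    Returns a list of (group, part) tuples. `group` is an int shared by all
    parts written inside the same zero-level parentheses, or None for parts
    outside any such parentheses.
    """
    parts = []
    current = []          # characters of the part being collected
    group = None          # id of the enclosing zero-level group, if any
    next_group = 0
    depth = 0             # nesting depth of ordinary branch parentheses
    in_bracket = False    # True while inside a [...] atom expression
    closed_group = False  # True right after a zero-level ")"

    for ch in smarts:
        if closed_group and ch != ".":
            raise ValueError("expected '.' after a zero-level ')'")
        if in_bracket:
            current.append(ch)
            in_bracket = ch != "]"
        elif ch == "[":
            in_bracket = True
            current.append(ch)
        elif ch == "(" and not current and depth == 0 and group is None:
            # A "(" at the very start of a part opens a component group.
            group = next_group
            next_group += 1
        elif ch == "(":
            depth += 1
            current.append(ch)
        elif ch == ")" and depth > 0:
            depth -= 1
            current.append(ch)
        elif ch == ")":
            # Closes a zero-level group and ends the current part.
            if group is None or not current:
                raise ValueError("unbalanced ')' in %r" % smarts)
            parts.append((group, "".join(current)))
            current = []
            group = None
            closed_group = True
        elif ch == "." and depth == 0:
            if current:
                parts.append((group, "".join(current)))
                current = []
            elif not closed_group:
                raise ValueError("empty part in %r" % smarts)
            closed_group = False
        else:
            current.append(ch)

    if in_bracket or depth or group is not None:
        raise ValueError("unbalanced brackets or parentheses in %r" % smarts)
    if current:
        parts.append((group, "".join(current)))
    elif not closed_group:
        raise ValueError("empty part in %r" % smarts)
    return parts


print(split_components("(C).(C)"))
print(split_components("(C.C)"))
print(split_components("C.C"))
```

Parts that share a group id have to land in one component, parts with
different ids in different components, and None means it doesn't matter.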
A quick solution would be to create an OBDisconnectedSmartsPattern class that
parses the top-level dots/parentheses in the expression and creates several
OBSmartsPattern instances, performing the steps you outlined above to yield
overall matches. There are two problems with this that I can see:
1) We assume that there's nothing higher precedence in the pattern than the
dot and associated parentheses - but these are referred to as "zero-level
parentheses" in the Daylight docs, so it might be fair to make them highest
level. (Certainly, the >> reaction symbol is higher precedence, but I don't
think this is handled by Open Babel at all at the moment?)
2) Smarts parsing would occur in two different places, which makes the code
harder to follow.
For what I'm doing I can live with both of those, so I think I'll try doing
it this way first. If it works, and you think disconnected matching is worth
the extra layer of parsing and the possible matching inefficiency, it should
be possible to alter OBSmartsPattern so that it instantiates several sub
OBSmartsPatterns internally. That's not a very beautiful way of doing it, but
anyway, I'll try and get it working without touching the core parsmart code
first.
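
For the matching side, the brute-force search over per-part matches outlined
above could look roughly like the sketch below, again in plain Python with
made-up names (assign_matches, groups, matches, fragment_of). It assumes
each part has already been matched on its own, so that matches[i] lists the
atom-index tuples for part i, and that fragment_of maps every atom index to
the id of its contiguous fragment. Group labels are an int for parts inside
zero-level parentheses and None otherwise:

```python
def assign_matches(groups, matches, fragment_of):
    """Pick one match per SMARTS part, or return None if impossible.

    groups      -- group label per part (int or None); hypothetical input
    matches     -- matches[i] is a list of atom-index tuples for part i
    fragment_of -- dict mapping atom index -> contiguous-fragment id
    """
    chosen = []          # one atom tuple per part assigned so far
    used_atoms = set()   # atoms already claimed by earlier parts
    group_fragment = {}  # group label -> fragment that group is pinned to

    def search(i):
        if i == len(groups):
            return True
        g = groups[i]
        for match in matches[i]:
            if used_atoms.intersection(match):
                continue  # parts may not share atoms
            frag = fragment_of[match[0]]
            pinned = False
            if g is not None:
                if g in group_fragment:
                    # Same group: must stay in the same fragment.
                    if group_fragment[g] != frag:
                        continue
                elif frag in group_fragment.values():
                    # Different groups: must use different fragments.
                    continue
                else:
                    group_fragment[g] = frag
                    pinned = True
            chosen.append(match)
            used_atoms.update(match)
            if search(i + 1):
                return True
            # Dead end: undo this choice and try the next match.
            chosen.pop()
            used_atoms.difference_update(match)
            if pinned:
                del group_fragment[g]
        return False

    return list(chosen) if search(0) else None


# Toy molecule: atoms 0 and 1 in fragment 0, atom 2 in fragment 1.
fragment_of = {0: 0, 1: 0, 2: 1}
carbon = [(0,), (1,), (2,)]  # pretend "C" matches every atom
print(assign_matches([0, 1], [carbon, carbon], fragment_of))   # (C).(C)
print(assign_matches([0, 0], [carbon, carbon], fragment_of))   # (C.C)
```

A part with no matches at all makes the search fail straight away, which
covers the easy rejects. The cost is exponential in the worst case, so this
is only meant to show the bookkeeping, not to be efficient.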
Cheers,
Fred
> Best regards,
> -Geoff
_______________________________________________
OpenBabel-Devel mailing list
OpenBabel-Devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openbabel-devel