[opencog-dev] Re: Pattern Matching performance

Linas Vepstas Mon, 12 Mar 2018 22:08:42 -0700

I hope you can read the below. It took me a while to type it in.
I think maybe I'm the victim of some kind of google A/B testing
today, because for me, my own emails are totally unreadable.
Let me know if you suspect you didn't receive everything I sent ...


--linas

On Tue, Mar 13, 2018 at 12:05 AM, Linas Vepstas <linasveps...@gmail.com> wrote:
> OK, trying again. Maybe the third time it will work right.
>
> -- Linas
>
> On Tue, Mar 13, 2018 at 12:04 AM, Linas Vepstas <linasveps...@gmail.com> 
> wrote:
>> Resending. My email agent is formatting and indenting bizarrely.
>> I don't understand why its 2018 and something as simple as email
>> has ugly and unreliable formating.  Is this what the heat death of
>> the universe looks like?
>>
>> --linas
>>
>> On Mon, Mar 12, 2018 at 11:55 PM, Linas Vepstas <linasveps...@gmail.com> 
>> wrote:
>>> Hi,
>>>
>>> To repeat my earlier remarks, and add a few more:
>>>
>>> On Mon, Mar 12, 2018 at 3:42 PM, Alexey Potapov <pas.a...@gmail.com> wrote:
>>>>
>>>> Yes. But Pattern Matcher compares all pairs of bounding boxes making it
>>>> quadratic.
>>
>>> Yes, there is supposed to be a space-server that is supposed to be optimized
>>> for these kinds of searches. There is one, but its super-minimal, and not 
>>> at all
>>> integrated with the pattern matcher, or anything else for that matter.  
>>> Smart
>>> searches on bounding boxes is completely green-field development, for the
>>> atomspace. We've kind of got very nearly nothing for this.
>>
>>
>>>>> Yes, but keep in mind that the PM is also Turing complete because you can
>>>>> call any function within a query (including the PM itself).
>>>>
>>>>
>>>> This is quite problematic. Basic processes should not execute anything
>>>> dangerous that can take too much time or loop forever and cannot be
>>>> interrupted. Thus, we should either not treat PM as a basic process, or
>>>> should restrict its capabilities (and shift responsibility for evaluating
>>>> arbitrary code to other processes).
>>
>>
>>> This is misleading or a mis-understanding. Its not the correct way to think
>>> about it.  The current pattern matcher is 2 or three or 4 things matched 
>>> into
>>> one:
>>>
>>> A) A generic subgraph isomorphism solver. Since this is an NP complete
>>> problem, it's certainly possible to create pathologically slow queries.
>>>
>>> B) A way of combining subgraphs using a crisp-logic boolean algebra
>>> (actually a Heyting algebra) which we have very vague intentions of
>>> promoting into something probabilistic.  Which, of course, if this was done,
>>> would layer on an additional combinatoric explosion.  It would be fruitful 
>>> to
>>> discuss the wisdom or stupidity of this particular task. Or alternative 
>>> designs
>>> for it.
>>>
>>> C) The ability to perform subgraph isomorphism with so-called "axiom 
>>> schemata".
>>> An "axiom schemata" is roughly an infinite collection of relations, for 
>>> example
>>> "less than" over the integers or rationals or reals.   This means that the 
>>> pattern
>>> matcher is kind-of-like-ish a "satisfiability modulo theories" (SMT) 
>>> solver.  The
>>> API for specifying a theory at this point is rather very simplistic. The 
>>> "virtual link"
>>> is  that API. It says, basically "implement your model theory here, as C++ 
>>> code,
>>> and we will automatically do the satisfiability modulo your theory for you"
>>>
>>> Currently supported theories are the equational theory (EqualLink) and 
>>> numeric
>>> inequaltiy (GreaterThanLink)  An example of a nice-to-have theory would be
>>> linear algebra - done right, this could  solve your space-time bounding box 
>>> problem
>>> for starters, and linear programming type problems if anyone cared about 
>>> that.
>>> Maybe matroids. Whatever. Dunno.  Another nice-to-have would be naive set 
>>> theory
>>> which could help lay a cornerstone of probability done right (TM) but this 
>>> would be
>>> a long and difficult but veryinteresting discussion.
>>>
>>> D) Once a matching subgraph is found, you can launch arbitrary 
>>> C++/scheme/python
>>> code to do something with that subgraph. So that's unbounded.
>>
>>
>>>>> So, the first thing you should do is build a good benchmark tool, then we,
>>>>> you and the rest of opencog community, can supply it with a collection of
>>>>> critical tests.
>>>>
>>>> How do you see such tool?
>>
>>> Unclear. I am interested in a tool that tells me if performance got worse 
>>> after
>>> a particular code change or bug fix.  Some of our fixes accidentally slow 
>>> things
>>> down (by a lot) and no one noticies for months or half a year.
>>
>>>> We have some datasets of varying sizes and
>>>> queries, and simply run PM on these queries and measure the time. What can
>>>> be unified/automated?
>>
>>> Yes!?
>>
>>>> Running all tests and writing log files with
>>>> computation times? Anything else?
>>
>>> I don't care about log files.
>>
>>>> I agree that we need several (many?) tests to be sure that some changes
>>>> didn't affect any types of queries, but isn't it just a script? Or do you
>>>> have something much more complicated in mind?
>>
>>> Take a look at 3D graphics performance: there are two types of benchmark:
>>> triangles per second, lines per second, texture maps per second.
>>
>>> The other type is "for game XYZ, frames per second".
>>
>>> We need both types of benchmarks.  Probably the first more than the second,
>>> because it helps developers more.  The second kind just tells you how 
>>> screwed
>>> up the system is today, without telling you why, or where to look.
>>
>>
>>>> In our case, the query processing time becomes unrealistically large for a
>>>> one-minute video.  If we consider the problem of search in the entire
>>>> episodic memory, it should be even not linear, but logarithmic. Low degree
>>>> polynomial complexity is ok for the task dimensions ~1000, not millions or
>>>> billions...
>>
>>> No clue what you are searching here.  Logarithmic searches mean binary tree,
>>> quad-tree, octree.  The space-time sever is an octree, but its not 
>>> integrated with
>>> the pattern matcher, and has a super-naive API.
>>
>>
>>> We have binary-trees and hash tables for atoms-by-name, by-type, but zero
>>> sophistication for numeric values. See above comments about "satisfiability
>>> modulo theories".
>>
>>
>>> Linas
>>>
>>>
>>> --
>>> cassette tapes - analog TV - film cameras - you
>>
>>
>>
>> --
>> cassette tapes - analog TV - film cameras - you
>
>
>
> --
> cassette tapes - analog TV - film cameras - you



-- 
cassette tapes - analog TV - film cameras - you

-- 
You received this message because you are subscribed to the Google Groups 
"opencog" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to opencog+unsubscr...@googlegroups.com.
To post to this group, send email to opencog@googlegroups.com.
Visit this group at https://groups.google.com/group/opencog.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/opencog/CAHrUA379JFcMy5OwYOcya-rBxYagFuk6OzdMjFYGbt0ch%3D7mzQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

[opencog-dev] Re: Pattern Matching performance

Reply via email to