[opencog-dev] Re: Pattern Matching performance

2018-03-12 Thread Linas Vepstas
Hi,

To repeat my earlier remarks, and add a few more:


On Mon, Mar 12, 2018 at 3:42 PM, Alexey Potapov  wrote:
>
> Yes. But Pattern Matcher compares all pairs of bounding boxes making it
> quadratic.


Yes, there is supposed to be a space-server that is supposed to be optimized
for these kinds of searches. There is one, but its super-minimal, and not
at all
integrated with the pattern matcher, or anything else for that matter.
Smart
searches on bounding boxes is completely green-field development, for the
atomspace. We've kind of got very nearly nothing for this.


>> Yes, but keep in mind that the PM is also Turing complete because you can
>> call any function within a query (including the PM itself).
>
> This is quite problematic. Basic processes should not execute anything
> dangerous that can take too much time or loop forever and cannot be
> interrupted. Thus, we should either not treat PM as a basic process, or
> should restrict its capabilities (and shift responsibility for evaluating
> arbitrary code to other processes).


This is misleading or a mis-understanding. Its not the correct way to think
about it.  The current pattern matcher is 2 or three or 4 things matched
into
one:

A) A generic subgraph isomorphism solver. Since this is an NP complete
problem, it's certainly possible to create pathologically slow queries.

B) A way of combining subgraphs using a crisp-logic boolean algebra
(actually a Heyting algebra) which we have very vague intentions of
promoting into something probabilistic.  Which, of course, if this was done,
would layer on an additional combinatoric explosion.  It would be fruitful
to
discuss the wisdom or stupidity of this particular task. Or alternative
designs
for it.

C) The ability to perform subgraph isomorphism with so-called "axiom
schemata".
An "axiom schemata" is roughly an infinite collection of relations, for
example
"less than" over the integers or rationals or reals.   This means that the
pattern
matcher is kind-of-like-ish a "satisfiability modulo theories" (SMT)
solver.  The
API for specifying a theory at this point is rather very simplistic.  The
"virtual link"
is  that API. It says, basically "implement your model theory here, as C++
code,
and we will automatically do the satisfiability modulo your theory for you"

Currently supported theories are the equational theory (EqualLink) and
numeric
inequaltiy (GreaterThanLink)  An example of a nice-to-have theory would be
linear algebra - done right, this could  solve your space-time bounding box
problem
for starters, and linear programming type problems if anyone cared about
that.
Maybe matroids. Whatever. Dunno.  Another nice-to-have would be naive set
theory
which could help lay a cornerstone of probability done right (TM) but this
would be
a long and difficult but veryinteresting discussion.

D) Once a matching subgraph is found, you can launch arbitrary
C++/scheme/python
code to do something with that subgraph. So that's unbounded.


>> So, the first thing you should do is build a good benchmark tool, then
we,
>> you and the rest of opencog community, can supply it with a collection of
>> critical tests.
>
> How do you see such tool?


Unclear. I am interested in a tool that tells me if performance got worse
after
a particular code change or bug fix.  Some of our fixes accidentally slow
things
down (by a lot) and no one noticies for months or half a year.


> We have some datasets of varying sizes and
> queries, and simply run PM on these queries and measure the time. What can
> be unified/automated?


Yes!?


> Running all tests and writing log files with
> computation times? Anything else?


I don't care about log files.


> I agree that we need several (many?) tests to be sure that some changes
> didn't affect any types of queries, but isn't it just a script? Or do you
> have something much more complicated in mind?


Take a look at 3D graphics performance: there are two types of benchmark:
triangles per second, lines per second, texture maps per second.


The other type is "for game XYZ, frames per second".


We need both types of benchmarks.  Probably the first more than the second,
because it helps developers more.  The second kind just tells you how
screwed
up the system is today, without telling you why, or where to look.


> In our case, the query processing time becomes unrealistically large for a
> one-minute video.  If we consider the problem of search in the entire
> episodic memory, it should be even not linear, but logarithmic. Low degree
> polynomial complexity is ok for the task dimensions ~1000, not millions or
> billions...


No clue what you are searching here.  Logarithmic searches mean binary tree,
quad-tree, octree. The space-time sever is an octree, but its not
integrated with
the pattern matcher, and has a super-naive API.


We have binary-trees and hash tables for atoms-by-name, by-type, but zero
sophistication for numeric values. See above comments about 

[opencog-dev] Re: Pattern Matching performance

2018-03-12 Thread Linas Vepstas
I hope you can read the below. It took me a while to type it in.
I think maybe I'm the victim of some kind of google A/B testing
today, because for me, my own emails are totally unreadable.
Let me know if you suspect you didn't receive everything I sent ...

--linas

On Tue, Mar 13, 2018 at 12:05 AM, Linas Vepstas  wrote:
> OK, trying again. Maybe the third time it will work right.
>
> -- Linas
>
> On Tue, Mar 13, 2018 at 12:04 AM, Linas Vepstas  
> wrote:
>> Resending. My email agent is formatting and indenting bizarrely.
>> I don't understand why its 2018 and something as simple as email
>> has ugly and unreliable formating.  Is this what the heat death of
>> the universe looks like?
>>
>> --linas
>>
>> On Mon, Mar 12, 2018 at 11:55 PM, Linas Vepstas  
>> wrote:
>>> Hi,
>>>
>>> To repeat my earlier remarks, and add a few more:
>>>
>>> On Mon, Mar 12, 2018 at 3:42 PM, Alexey Potapov  wrote:

 Yes. But Pattern Matcher compares all pairs of bounding boxes making it
 quadratic.
>>
>>> Yes, there is supposed to be a space-server that is supposed to be optimized
>>> for these kinds of searches. There is one, but its super-minimal, and not 
>>> at all
>>> integrated with the pattern matcher, or anything else for that matter.  
>>> Smart
>>> searches on bounding boxes is completely green-field development, for the
>>> atomspace. We've kind of got very nearly nothing for this.
>>
>>
> Yes, but keep in mind that the PM is also Turing complete because you can
> call any function within a query (including the PM itself).


 This is quite problematic. Basic processes should not execute anything
 dangerous that can take too much time or loop forever and cannot be
 interrupted. Thus, we should either not treat PM as a basic process, or
 should restrict its capabilities (and shift responsibility for evaluating
 arbitrary code to other processes).
>>
>>
>>> This is misleading or a mis-understanding. Its not the correct way to think
>>> about it.  The current pattern matcher is 2 or three or 4 things matched 
>>> into
>>> one:
>>>
>>> A) A generic subgraph isomorphism solver. Since this is an NP complete
>>> problem, it's certainly possible to create pathologically slow queries.
>>>
>>> B) A way of combining subgraphs using a crisp-logic boolean algebra
>>> (actually a Heyting algebra) which we have very vague intentions of
>>> promoting into something probabilistic.  Which, of course, if this was done,
>>> would layer on an additional combinatoric explosion.  It would be fruitful 
>>> to
>>> discuss the wisdom or stupidity of this particular task. Or alternative 
>>> designs
>>> for it.
>>>
>>> C) The ability to perform subgraph isomorphism with so-called "axiom 
>>> schemata".
>>> An "axiom schemata" is roughly an infinite collection of relations, for 
>>> example
>>> "less than" over the integers or rationals or reals.   This means that the 
>>> pattern
>>> matcher is kind-of-like-ish a "satisfiability modulo theories" (SMT) 
>>> solver.  The
>>> API for specifying a theory at this point is rather very simplistic. The 
>>> "virtual link"
>>> is  that API. It says, basically "implement your model theory here, as C++ 
>>> code,
>>> and we will automatically do the satisfiability modulo your theory for you"
>>>
>>> Currently supported theories are the equational theory (EqualLink) and 
>>> numeric
>>> inequaltiy (GreaterThanLink)  An example of a nice-to-have theory would be
>>> linear algebra - done right, this could  solve your space-time bounding box 
>>> problem
>>> for starters, and linear programming type problems if anyone cared about 
>>> that.
>>> Maybe matroids. Whatever. Dunno.  Another nice-to-have would be naive set 
>>> theory
>>> which could help lay a cornerstone of probability done right (TM) but this 
>>> would be
>>> a long and difficult but veryinteresting discussion.
>>>
>>> D) Once a matching subgraph is found, you can launch arbitrary 
>>> C++/scheme/python
>>> code to do something with that subgraph. So that's unbounded.
>>
>>
> So, the first thing you should do is build a good benchmark tool, then we,
> you and the rest of opencog community, can supply it with a collection of
> critical tests.

 How do you see such tool?
>>
>>> Unclear. I am interested in a tool that tells me if performance got worse 
>>> after
>>> a particular code change or bug fix.  Some of our fixes accidentally slow 
>>> things
>>> down (by a lot) and no one noticies for months or half a year.
>>
 We have some datasets of varying sizes and
 queries, and simply run PM on these queries and measure the time. What can
 be unified/automated?
>>
>>> Yes!?
>>
 Running all tests and writing log files with
 computation times? Anything else?
>>
>>> I don't care about log files.
>>
 I agree that we need several (many?) tests to be sure that some changes

[opencog-dev] Re: Pattern Matching performance

2018-03-12 Thread Linas Vepstas
OK, trying again. Maybe the third time it will work right.

-- Linas

On Tue, Mar 13, 2018 at 12:04 AM, Linas Vepstas  wrote:
> Resending. My email agent is formatting and indenting bizarrely.
> I don't understand why its 2018 and something as simple as email
> has ugly and unreliable formating.  Is this what the heat death of
> the universe looks like?
>
> --linas
>
> On Mon, Mar 12, 2018 at 11:55 PM, Linas Vepstas  
> wrote:
>> Hi,
>>
>> To repeat my earlier remarks, and add a few more:
>>
>> On Mon, Mar 12, 2018 at 3:42 PM, Alexey Potapov  wrote:
>>>
>>> Yes. But Pattern Matcher compares all pairs of bounding boxes making it
>>> quadratic.
>
>> Yes, there is supposed to be a space-server that is supposed to be optimized
>> for these kinds of searches. There is one, but its super-minimal, and not at 
>> all
>> integrated with the pattern matcher, or anything else for that matter.  Smart
>> searches on bounding boxes is completely green-field development, for the
>> atomspace. We've kind of got very nearly nothing for this.
>
>
 Yes, but keep in mind that the PM is also Turing complete because you can
 call any function within a query (including the PM itself).
>>>
>>>
>>> This is quite problematic. Basic processes should not execute anything
>>> dangerous that can take too much time or loop forever and cannot be
>>> interrupted. Thus, we should either not treat PM as a basic process, or
>>> should restrict its capabilities (and shift responsibility for evaluating
>>> arbitrary code to other processes).
>
>
>> This is misleading or a mis-understanding. Its not the correct way to think
>> about it.  The current pattern matcher is 2 or three or 4 things matched into
>> one:
>>
>> A) A generic subgraph isomorphism solver. Since this is an NP complete
>> problem, it's certainly possible to create pathologically slow queries.
>>
>> B) A way of combining subgraphs using a crisp-logic boolean algebra
>> (actually a Heyting algebra) which we have very vague intentions of
>> promoting into something probabilistic.  Which, of course, if this was done,
>> would layer on an additional combinatoric explosion.  It would be fruitful to
>> discuss the wisdom or stupidity of this particular task. Or alternative 
>> designs
>> for it.
>>
>> C) The ability to perform subgraph isomorphism with so-called "axiom 
>> schemata".
>> An "axiom schemata" is roughly an infinite collection of relations, for 
>> example
>> "less than" over the integers or rationals or reals.   This means that the 
>> pattern
>> matcher is kind-of-like-ish a "satisfiability modulo theories" (SMT) solver. 
>>  The
>> API for specifying a theory at this point is rather very simplistic. The 
>> "virtual link"
>> is  that API. It says, basically "implement your model theory here, as C++ 
>> code,
>> and we will automatically do the satisfiability modulo your theory for you"
>>
>> Currently supported theories are the equational theory (EqualLink) and 
>> numeric
>> inequaltiy (GreaterThanLink)  An example of a nice-to-have theory would be
>> linear algebra - done right, this could  solve your space-time bounding box 
>> problem
>> for starters, and linear programming type problems if anyone cared about 
>> that.
>> Maybe matroids. Whatever. Dunno.  Another nice-to-have would be naive set 
>> theory
>> which could help lay a cornerstone of probability done right (TM) but this 
>> would be
>> a long and difficult but veryinteresting discussion.
>>
>> D) Once a matching subgraph is found, you can launch arbitrary 
>> C++/scheme/python
>> code to do something with that subgraph. So that's unbounded.
>
>
 So, the first thing you should do is build a good benchmark tool, then we,
 you and the rest of opencog community, can supply it with a collection of
 critical tests.
>>>
>>> How do you see such tool?
>
>> Unclear. I am interested in a tool that tells me if performance got worse 
>> after
>> a particular code change or bug fix.  Some of our fixes accidentally slow 
>> things
>> down (by a lot) and no one noticies for months or half a year.
>
>>> We have some datasets of varying sizes and
>>> queries, and simply run PM on these queries and measure the time. What can
>>> be unified/automated?
>
>> Yes!?
>
>>> Running all tests and writing log files with
>>> computation times? Anything else?
>
>> I don't care about log files.
>
>>> I agree that we need several (many?) tests to be sure that some changes
>>> didn't affect any types of queries, but isn't it just a script? Or do you
>>> have something much more complicated in mind?
>
>> Take a look at 3D graphics performance: there are two types of benchmark:
>> triangles per second, lines per second, texture maps per second.
>
>> The other type is "for game XYZ, frames per second".
>
>> We need both types of benchmarks.  Probably the first more than the second,
>> because it helps developers more.  The second kind just tells you 

[opencog-dev] Re: Pattern Matching performance

2018-03-12 Thread Linas Vepstas
Resending. My email agent is formatting and indenting bizarrely.
I don't understand why its 2018 and something as simple as email
has ugly and unreliable formating.  Is this what the heat death of
the universe looks like?

--linas

On Mon, Mar 12, 2018 at 11:55 PM, Linas Vepstas  wrote:
> Hi,
>
> To repeat my earlier remarks, and add a few more:
>
> On Mon, Mar 12, 2018 at 3:42 PM, Alexey Potapov  wrote:
>>
>> Yes. But Pattern Matcher compares all pairs of bounding boxes making it
>> quadratic.

> Yes, there is supposed to be a space-server that is supposed to be optimized
> for these kinds of searches. There is one, but its super-minimal, and not at 
> all
> integrated with the pattern matcher, or anything else for that matter.  Smart
> searches on bounding boxes is completely green-field development, for the
> atomspace. We've kind of got very nearly nothing for this.


>>> Yes, but keep in mind that the PM is also Turing complete because you can
>>> call any function within a query (including the PM itself).
>>
>>
>> This is quite problematic. Basic processes should not execute anything
>> dangerous that can take too much time or loop forever and cannot be
>> interrupted. Thus, we should either not treat PM as a basic process, or
>> should restrict its capabilities (and shift responsibility for evaluating
>> arbitrary code to other processes).


> This is misleading or a mis-understanding. Its not the correct way to think
> about it.  The current pattern matcher is 2 or three or 4 things matched into
> one:
>
> A) A generic subgraph isomorphism solver. Since this is an NP complete
> problem, it's certainly possible to create pathologically slow queries.
>
> B) A way of combining subgraphs using a crisp-logic boolean algebra
> (actually a Heyting algebra) which we have very vague intentions of
> promoting into something probabilistic.  Which, of course, if this was done,
> would layer on an additional combinatoric explosion.  It would be fruitful to
> discuss the wisdom or stupidity of this particular task. Or alternative 
> designs
> for it.
>
> C) The ability to perform subgraph isomorphism with so-called "axiom 
> schemata".
> An "axiom schemata" is roughly an infinite collection of relations, for 
> example
> "less than" over the integers or rationals or reals.   This means that the 
> pattern
> matcher is kind-of-like-ish a "satisfiability modulo theories" (SMT) solver.  
> The
> API for specifying a theory at this point is rather very simplistic. The 
> "virtual link"
> is  that API. It says, basically "implement your model theory here, as C++ 
> code,
> and we will automatically do the satisfiability modulo your theory for you"
>
> Currently supported theories are the equational theory (EqualLink) and numeric
> inequaltiy (GreaterThanLink)  An example of a nice-to-have theory would be
> linear algebra - done right, this could  solve your space-time bounding box 
> problem
> for starters, and linear programming type problems if anyone cared about that.
> Maybe matroids. Whatever. Dunno.  Another nice-to-have would be naive set 
> theory
> which could help lay a cornerstone of probability done right (TM) but this 
> would be
> a long and difficult but veryinteresting discussion.
>
> D) Once a matching subgraph is found, you can launch arbitrary 
> C++/scheme/python
> code to do something with that subgraph. So that's unbounded.


>>> So, the first thing you should do is build a good benchmark tool, then we,
>>> you and the rest of opencog community, can supply it with a collection of
>>> critical tests.
>>
>> How do you see such tool?

> Unclear. I am interested in a tool that tells me if performance got worse 
> after
> a particular code change or bug fix.  Some of our fixes accidentally slow 
> things
> down (by a lot) and no one noticies for months or half a year.

>> We have some datasets of varying sizes and
>> queries, and simply run PM on these queries and measure the time. What can
>> be unified/automated?

> Yes!?

>> Running all tests and writing log files with
>> computation times? Anything else?

> I don't care about log files.

>> I agree that we need several (many?) tests to be sure that some changes
>> didn't affect any types of queries, but isn't it just a script? Or do you
>> have something much more complicated in mind?

> Take a look at 3D graphics performance: there are two types of benchmark:
> triangles per second, lines per second, texture maps per second.

> The other type is "for game XYZ, frames per second".

> We need both types of benchmarks.  Probably the first more than the second,
> because it helps developers more.  The second kind just tells you how screwed
> up the system is today, without telling you why, or where to look.


>> In our case, the query processing time becomes unrealistically large for a
>> one-minute video.  If we consider the problem of search in the entire
>> episodic memory, it should be even not 

[opencog-dev] Re: Pattern Matching performance

2018-03-12 Thread Linas Vepstas
Hi,

On Mon, Mar 12, 2018 at 3:24 AM, Nil Geisweiller
 wrote:
> Hi,
>
> I'm adding Linas, the author of the pattern matcher, who knows it better
> than I do. Also, generally speaking, feel free to use the opencog list, that
> way all opencog community can participate.
>
> On 03/11/2018 09:24 PM, Alexey Potapov wrote:

>>
>> It just searches for pairs of bounding boxes one of which is higher than
>> another /within the same frame/.
>> We expect coordinates of BBs to be compared only within same frame, so the
>> time should depend on the number of frames linearly.
>> Unfortunately, it grows quadratically, and becomes too large for rather
>> short videos.
>
>
> OK, so you're saying that in principle you only need to iterate through the
> frames, and fetching the corresponding coordinates should be constant, thus
> it should be linear overall.

Several remarks, some obvious, some maybe not:
1) The atomspace is optimized for long-term storage of knowledge
representations,
and for certain tasks applied to KR, e.g. subgraph isomorphism
(pattern matching)

2) The atomspace is not optimized for numerical computation or general
computation,
so that adding two numbers with NumberNodes might be thousands if not millions
of times slower than doing it "natively".  That said, there are some
interesting stunts
that could improve generic computation by factors of hundreds or
thousands, or more,
if that was desired.

3) Fleeting short-term, rapidly changing data should use Values, not Atoms (!!!)

4) There is some rudimentary support for a "space-time server" to store location
position data, but its not well integrated or refined: its just a
super-naive octree
store.  Much work is needed to make it generically useful.


>> Maybe, we could represent these data and queries in a better way, but the
>> problem (with virtual links) seems general. The problem here is that the
>> search is quadratic while it can in principle be linear.

5) The pattern matcher tries to be quite very smart about where to
start searches
and how to move forward in them.  However, no one has ever tried to digest
numeric data with it before, so failures of this kind - quadratic vs.
linear, are not
surprising.  They're certainly fixable but (probably) not in an
afternoon or two.
At this point, its a rather very complex and elaborate block of code
trying to do
lots of things, and it very much needs a full-time devoted expert to
maintain it.
It will take more than weeks - maybe months to really understand what's going
on inside of it. Viz. It needs an experienced pro, not a code-monkey
to maintain it.

>> The PM
>> documentation says that its computational time can be even exponential for
>> unordered links...

That's not the PM, that's a fundamental mathematical statement about the
isomorphism problem for unordered sets.  For N elements to be matched to
N variables, there are N-factorial alternatives.  That's not "fixable".


> One would need to look at what the pattern matcher is doing in that
> instance. I suspect it chooses a sub-optimal starting point.

Possibly. See above comments about numbers.

>> I thought that PM is a basic function in sense that it doesn't need to be
>> 'intelligent', but is one of building blocks for 'intelligent' functions
>> like the backward chaining.

Its not "intelligent" but it is complicated.  If it looks simple, then
in a certain
sense, we've succeeded. :-)  I want software that works well, is easy and
pleasant to use, and looks "simple" to the user.

> what the URE is more suitable for. Although of course the boundary is
> completely blurry, as you know, but you see what I mean.

We're trying to figure out the boundary by touch and feel.

> I think it is solvable. In particular take a look at
> https://github.com/opencog/atomspace/blob/master/opencog/query/InitiateSearchCB.h

Yes.  I did not catch the inital part of this, so cannot offer more. But that
is where the inital starting points for a search get picked.   I'm
totally swamped
so please don't send me long complicated blobs of code cause I won;t be able
to respond quickly.


>> I believe that either PM should have a guaranteed linear computational
>> time,

The subgraph isomorphism problem is an NP-complete problem, last time
I looked. When its fast, its only because the query is hanging out in the
simple side of the lambda cube.

I think that for "most" queries, the pattern matcher runs in O(SAT) time
that is O(Davis-Putnam) time, because it kind-of-ish performs some of
the similar kind of graph pruning exploration. Sort-of-ish. But only partly.
It could be made more completely Davis-Putnam by someone who has
the brain cells and the time on their hands.

> Nil
>
>>
>> Best regards,
>> Alexey.

Linas.

-- 
cassette tapes - analog TV - film cameras - you

-- 
You received this message because you are subscribed to the Google Groups 
"opencog" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to 

[opencog-dev] AS-MOSES

2018-03-12 Thread 'Nil Geisweiller' via opencog

Hi,

for those involved with MOSES port to the AtomSpace, here the first 
version of the document describing such port, according to my own views 
anyway, which surely are opened to revisions


https://github.com/opencog/as-moses/blob/master/README.md

Nil

--
You received this message because you are subscribed to the Google Groups 
"opencog" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to opencog+unsubscr...@googlegroups.com.
To post to this group, send email to opencog@googlegroups.com.
Visit this group at https://groups.google.com/group/opencog.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/opencog/6347fff8-8b00-67cf-9b25-41e27d75f0d6%40gmail.com.
For more options, visit https://groups.google.com/d/optout.