Re: AIML pattern matcher design

2009-05-09 Thread dhs827

I'm completely engulfed in all this material, but I wanted to come
back and say that I'm stunned by the enthusiasm with which you share
your knowledge here. Many thanks, again.

Dirk


--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"Clojure" group.
To post to this group, send email to clojure@googlegroups.com
To unsubscribe from this group, send email to 
clojure+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/clojure?hl=en
-~--~~~~--~~--~--~---



Re: AIML pattern matcher design

2009-05-08 Thread Parth Malwankar

On Fri, 08 May 2009 22:20:13 +0530, dhs827  wrote:

> ; First thing to learn is XML parsing with Clojure.
>
> Other comments, tips, disses?
>
> Dirk

In case you don't expect end users or other languages
to access the configuration, one option you have is
to save the configuration directly as Clojure data.

Because Clojure is a Lisp, you have access to the reader and
can read the data (maps, vectors, etc.) directly from the file.

E.g.:

user=> (def x (read-string "{:a 1 :b 2}"))
#'user/x
user=> x
{:a 1, :b 2}
user=>

See also: (doc read)
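
A minimal sketch of the same idea applied to a file rather than a string literal. The file name prefix and the keys (:bot-id, :splitters, :properties) are made up for illustration. It also swaps in clojure.edn/read-string, which is available in later Clojure releases (1.5+) and, unlike clojure.core/read-string, never evaluates code embedded in the data:

```clojure
(require '[clojure.edn :as edn])

(defn load-config
  "Reads one Clojure data structure from the file at `path`."
  [path]
  (edn/read-string (slurp path)))

;; Write a throwaway config file so the example is self-contained.
;; The keys below are hypothetical, not part of any AIML standard.
(def config-file (java.io.File/createTempFile "startup" ".edn"))

(spit config-file
      "{:bot-id \"demo\"
        :splitters [\".\" \";\" \"!\" \"?\"]
        :properties {:name \"Demo\"}}")

(def config (load-config config-file))

(:bot-id config)                 ;=> "demo"
(get-in config [:properties :name]) ;=> "Demo"
```

The result is an ordinary map, so the rest of the program can use get, get-in and destructuring on it without a separate parsing layer.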

If you decide to go ahead with xml, you can use
the xml support in clojure core:

http://clojure.org/api#toc673

Regards,
Parth





Re: AIML pattern matcher design

2009-05-08 Thread Adrian Cuthbertson

I found that after a couple of months of working with Clojure, my
whole perspective on the problem domain and its possible abstractions
changed significantly. An approach that might benefit you is to spend
a while exploring some of the key Clojure features at the REPL. I would
start with really getting to understand maps, vectors, map and reduce,
sequences and lazy-seq. A good grasp of multimethods would be helpful.
Also check out cond and condp. I also found an understanding of
first-class functions and closures invaluable (e.g. storing a function
closed over some arguments in a map entry for dynamic program
composition). There are others, but those are good to start with. In
the beginning, avoid recur (it leads to too much imperative thinking)
and macros (I found that most things I thought would need a macro
could be done easily in basic Clojure).
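
To make the "function closed over some arguments in a map entry" idea concrete, here is a small sketch. The names substituter, steps and run-steps, and the sample substitution, are invented for this example:

```clojure
(require '[clojure.string :as str])

(defn substituter
  "Returns a function that closes over `from` and `to`."
  [from to]
  (fn [input] (str/replace input from to)))

;; A map whose values are functions. One is a closure built above,
;; the other is an existing library function.
(def steps
  {:spelling (substituter "teh" "the")
   :trim     str/trim})

(defn run-steps
  "Threads `input` through the steps named by `step-keys`, in order."
  [input step-keys]
  (reduce (fn [s k] ((get steps k) s)) input step-keys))

(run-steps "  teh bot  " [:spelling :trim]) ;=> "the bot"
```

Because the pipeline is just data (a vector of keys), it can be reordered or extended at runtime without macros.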

After the basics check out the clojure.core source and the source for
the contrib libraries (xml, zip, etc) that you may be using. The
source is a great learning resource. Also Mark Volkmann's guide
(http://java.ociweb.com/mark/clojure/article.html), Stuart Halloway's
book and clojure.org are great resources for getting into Clojure.

Rgds, Adrian.


Re: AIML pattern matcher design

2009-05-08 Thread Luke VanderHart

> ; First thing to learn is XML parsing with Clojure.

This is basically done. Use the xml library in core (clojure.xml) if
you just need to load XML into a map data structure. Use the zip
library (clojure.zip) if you need to navigate it. Use the xml library
in contrib if you need to do xpath-style navigation.
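
A short sketch of the first two options, using clojure.xml/parse and clojure.zip/xml-zip from core. The sample AIML fragment and the helper name parse-xml-string are made up for the example; a real loader would pass a file or stream instead of a string:

```clojure
(require '[clojure.xml :as xml]
         '[clojure.zip :as zip])
(import '(java.io ByteArrayInputStream))

;; Hypothetical one-category AIML document.
(def sample-aiml
  (str "<aiml><category>"
       "<pattern>HELLO *</pattern>"
       "<template>Hi there!</template>"
       "</category></aiml>"))

(defn parse-xml-string
  "Parses XML held in a string into nested maps with
  :tag, :attrs and :content keys."
  [s]
  (xml/parse (ByteArrayInputStream. (.getBytes s "UTF-8"))))

(def tree (parse-xml-string sample-aiml))

(:tag tree) ;=> :aiml

;; Navigate with a zipper: root -> category -> pattern.
(def pattern-loc (-> tree zip/xml-zip zip/down zip/down))

(-> pattern-loc zip/node :content first)           ;=> "HELLO *"
(-> pattern-loc zip/right zip/node :content first) ;=> "Hi there!"
```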

For the rest of it... it looks very well thought out. I'm not familiar
enough with the problem domain to comment specifically, but obviously
you're putting a lot of thought into it, so I'm pretty sure you won't
have any problems.

-Luke

AIML pattern matcher design

2009-05-08 Thread dhs827

I'm stealing knowledge left and right (just ask me :-) to design me an
AIML pattern matcher. I've compiled a draft list of objects and
behaviors, which I would like to see reviewed for plausibility:

startup
- opens configuration file (e.g. startup.xml)
- passes configuration file to bot-loader object
bot-loader
- loads general input substitutions (spelling, person)
- loads sentence splitters (.;!?)
- looks for enabled bot, takes first one it finds
- reads bot-id; uses it as key (for saving/loading variables and
chatlogs)
- loads bot properties (global constants, e.g. name)
- passes control to aiml-loader object
aiml-loader
- loads list of AIML files to load, and for each file
- opens file
- reads AIML categories (XML) one by one as they appear in the 
file
- parses and stores the content of the match path 
(e.g."BOTID *
INPUTPATTERN * CONTEXT1 * CONTEXT2 *")
- when it reaches the end of the category - the 
template, or leaf
of this branch of the tree
- calls a method to store the elements of the 
match path, together
with the template, in the
pattern-matcher-tree

; First thing to learn is XML parsing with Clojure.

; Though it is probably the easiest thing to do, it is not necessary
for the templates to be stored along with the paths in the tree. They
might as well be left on disc or in a database.

; A function like parser/scan must advance the parse to the next part
of the document (element - element content - processing
instruction...) and tokenize it. I can then use case/switch/if (must
look at what Clojure offers) to make decisions/set variables/call
methods.
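
One possible shape for that decision step, as a sketch only. It assumes the category has already been parsed into the nested :tag/:attrs/:content maps that clojure.xml produces, and uses case (added to Clojure after this thread was written; cond or condp work the same way here). The function names are invented:

```clojure
(defn handle-element
  "Folds one child element of a category into the accumulated `state` map."
  [state {:keys [tag content]}]
  (case tag
    :pattern  (assoc state :pattern (first content))
    :that     (assoc state :that (first content))
    :template (assoc state :template (first content))
    state)) ; default branch: ignore elements we do not know

(defn read-category
  "Reduces over a category's children instead of setting variables."
  [category]
  (reduce handle-element {} (:content category)))

;; A hand-written category in the same shape a parser would return.
(def sample-category
  {:tag :category
   :attrs nil
   :content [{:tag :pattern  :attrs nil :content ["HELLO *"]}
             {:tag :template :attrs nil :content ["Hi there!"]}]})

(read-category sample-category)
;=> {:pattern "HELLO *", :template "Hi there!"}
```

The reduce replaces the "set variables as you scan" style: each element returns a new state map rather than mutating one.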

; The whole path, with all components, gets created at load time. The
loader combines all elements of the path (e.g. INPUTPATTERN * CONTEXT1
* CONTEXT2 *) into one string, separating the components using special
context-id strings (e.g. , , )
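
As an illustration of that combining step: the separator strings "<that>" and "<topic>" below are placeholders chosen for this sketch, not necessarily the context-id strings meant above, and the map keys are likewise hypothetical. A missing context defaults to the star wildcard:

```clojure
(require '[clojure.string :as str])

(defn match-path
  "Joins the input pattern and its contexts into one path string.
  The separator tokens are assumptions made for this example."
  [{:keys [pattern that topic]}]
  (str/join " " [pattern
                 "<that>"  (or that "*")
                 "<topic>" (or topic "*")]))

(match-path {:pattern "HELLO *"})
;=> "HELLO * <that> * <topic> *"

(match-path {:pattern "YES" :that "DO YOU LIKE TEA"})
;=> "YES <that> DO YOU LIKE TEA <topic> *"
```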

; The idea of the AIML graphmaster is: take this string, separate it
into words, then store these words as nodes in a tree.
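
A sketch of that storage step using nothing but nested maps and assoc-in. Storing the template under a :template key at the end of the path is a choice made for this example (words are strings, so the keyword cannot collide with them):

```clojure
(require '[clojure.string :as str])

(defn path->words
  "Splits a path string into its words on whitespace."
  [path]
  (vec (str/split (str/trim path) #"\s+")))

(defn add-path
  "Returns `tree` with one node per word of `path`, and the template
  stored under :template in the last node."
  [tree path template]
  (assoc-in tree (conj (path->words path) :template) template))

(def tree
  (-> {}
      (add-path "HELLO *" "Hi there!")
      (add-path "HELLO BOT" "Hello, human.")))

tree
;=> {"HELLO" {"*"   {:template "Hi there!"}
;             "BOT" {:template "Hello, human."}}}
```

Both paths share the "HELLO" node, which is the point of the tree: common prefixes are stored once.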

; A variation of this idea: instead of keying the nodes by their
values, key them first by context, then by value.

; Now that the bot is up and running, the user types something into
the input box and hits Enter. The

pre-processor
    - protects sentences
    - blocks common attack vectors, e.g. code injection, flooding
    - eliminates common spelling mistakes
        - for each loaded substitution
            - finds and replaces it in the input string
        - alternatively, uses a tree to search for them
    - removes redundant whitespace
    - splits input into sentences (everything that follows is for each
      sentence)
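
A rough sketch of the substitution, whitespace and sentence-splitting steps (sentence protection and attack blocking are left out). The substitution pairs are invented, and matching whole words with \b is an assumption about how substitutions should apply:

```clojure
(require '[clojure.string :as str])
(import '(java.util.regex Pattern Matcher))

;; Hypothetical substitution pairs: [wrong right].
(def substitutions [["teh" "the"] ["dont" "do not"]])

(defn apply-substitutions
  "Replaces each `from` with `to`, matching whole words only."
  [s pairs]
  (reduce (fn [acc [from to]]
            (str/replace acc
                         (re-pattern (str "\\b" (Pattern/quote from) "\\b"))
                         (Matcher/quoteReplacement to)))
          s
          pairs))

(defn squeeze-whitespace
  "Collapses runs of whitespace and trims both ends."
  [s]
  (str/trim (str/replace s #"\s+" " ")))

(defn split-sentences
  "Splits on the sentence splitters . ; ! ? and drops empty pieces."
  [s]
  (->> (str/split s #"[.;!?]+")
       (map str/trim)
       (remove str/blank?)))

(defn pre-process [s]
  (-> s
      (apply-substitutions substitutions)
      squeeze-whitespace
      split-sentences))

(pre-process "I dont  know teh answer.   Do you?")
;=> ("I do not know the answer" "Do you")
```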
pattern-matcher
    - combines INPUTPATTERN * CONTEXT1 * CONTEXT2 * into one string
    - tokenizes the "path to be matched" into the individual words
      (nodes)
    - traverses the tree from the root; first
        - tries matching underscore (_) wildcards
            - matching of wildcards is recursive
                - match one word of the current path component
                - try remainder against child node
                - if the whole remaining input matches
                - and if the last node is a leaf
                    - return the template
                - else try 2 words, then 3
                - if all words in the string are used up and the
                  current node is a leaf
                    - return the template
                - else stop matching underscores, and
        - tries matching exact words in alphabetical order
            - if there is a child node that equals the input word,
              recurse a level deeper
                - if at the next level there is a leaf, return the
                  template
                - else
        - tries matching the star (*) wildcard
    - when a complete path was matched, creates a
match-object
    - holds information about the match
        - the input (sentence)
        - the template
        - the strings matched to the wildcards
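
The traversal above can be sketched as a pair of mutually recursive functions over a nested-map tree. This is one reading of the outline, not a reference implementation: the tree shape (string keys, template under :template), the function names, and upper-casing the input are all assumptions. Exact words are looked up by key, so no alphabetical scan is needed, and the sketch assumes the input itself never contains a literal "_" or "*".

```clojure
(require '[clojure.string :as str])

(declare match-words)

(defn match-wildcard
  "Lets the wildcard child of `node` absorb 1, 2, 3, ... of `words`,
  trying the remainder against that child each time."
  [node wildcard words stars]
  (when-let [child (get node wildcard)]
    (some (fn [n]
            (match-words child
                         (drop n words)
                         (conj stars (str/join " " (take n words)))))
          (range 1 (inc (count words))))))

(defn match-words
  "Tries _ first, then the exact word, then *. Returns a map with
  :template and :stars for the first complete path, else nil."
  [node words stars]
  (if (empty? words)
    (when-let [template (:template node)]
      {:template template :stars stars})
    (or (match-wildcard node "_" words stars)
        (when-let [child (get node (first words))]
          (match-words child (rest words) stars))
        (match-wildcard node "*" words stars))))

(defn match
  "Builds the match-object: input, template and wildcard strings."
  [tree sentence]
  (let [words (str/split (str/upper-case (str/trim sentence)) #"\s+")]
    (when-let [result (match-words tree words [])]
      (assoc result :input sentence))))

;; Hypothetical tree with three stored paths.
(def demo-tree
  {"HELLO" {"*"   {:template "Hi there!"}
            "BOT" {:template "Hello, human."}}
   "_"     {"PLEASE" {:template "So polite."}}})

(:template (match demo-tree "hello bot"))         ;=> "Hello, human."
(:stars    (match demo-tree "hello dear friend")) ;=> ["DEAR FRIEND"]
(:template (match demo-tree "hello bot please"))  ;=> "So polite."
(match demo-tree "goodbye")                       ;=> nil
```

The last-but-one call shows the priority order at work: the underscore path wins over the exact "HELLO BOT" path because it is tried first and consumes the whole input.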

This first project should end there, with the template just returning
the values in the match-object. From there, the non-AIML aspects - the
new stuff - of the concept would be foregrounded.

Does this make sense to the casual observer?

Which known Clojure libraries should I be learning first?

Other comments, tips, disses?

Dirk