Yeah, so some of the Barliman systems always assemble programs in the same order. E.g., to synthesize (if (null? a) ... it might always synthesize if, then null?, then a, which means we can use a standard RNN-type approach that just summarizes if+null? when generating a. There are a couple of subtleties, including differentiating between (list list) and (list (list)), and the fact that, depending on how you write the interpreter, either "if" or "(null? a)" might be generated first, so you can't necessarily just read training data left to right. But neither of these is a showstopper, and you can definitely get a system to work with this approach.
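To make that concrete, a tiny count-based stand-in for the RNN (here in Python, with made-up traces and symbols, not real Barliman output) just learns which symbol tends to follow which during synthesis:

```python
from collections import Counter, defaultdict

# Toy stand-in for the RNN: a bigram model over synthesis order.
# Each training "trace" lists symbols in the order they are assembled,
# e.g. if -> null? -> a for (if (null? a) ...).
def train(traces):
    counts = defaultdict(Counter)
    for trace in traces:
        for prev, nxt in zip(trace, trace[1:]):
            counts[prev][nxt] += 1
    return counts

def predict(model, prev):
    # Most likely next symbol given the previous one.
    return model[prev].most_common(1)[0][0]

traces = [
    ["if", "null?", "a"],
    ["if", "null?", "b"],
    ["if", "pair?", "a"],
]
model = train(traces)
```

A real RNN would summarize the whole prefix rather than just the previous symbol, but the training signal is the same: after "if", the model learns that "null?" is likely.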
The less-than-satisfactory part of this approach is that it never looks at the actual values we have in the interpreter. This approach can learn the prior that "null?" is common inside an "if," but we would really like to extract from the "if" context the fact that we need to generate an expression that evaluates to true or false and has an empty list available at the top of the environment, which should make null? very likely. Moreover, since the standard lisp "if" is actually choosing between false and not-false, you could think about the probability 1-p(false), etc. Essentially, we want as general a way as possible to arrive at some kind of distribution over the syntax conditioned on the current environment, whether it contains ground values or constrained logic variables. I don't have a clear sense of how to do that as a learning problem right now, but I am working on modifications to the constraint system to try to get the right information to the right part of the interpreter at the right time so that I can start to think about how to learn it. Evan On Sun, Sep 3, 2017 at 1:14 PM, Timothy Washington <[email protected]> wrote: > Hey Evan, no problem. > My tool of choice is Clojure. So my first thought is how to adapt Barliman to that platform. At the moment, I'm just trying to wrap my head around the different approaches. > So finding general priors of symbol combinations is how some (incl. approaches in that blog post - combining scalar as well as "table" value types) are approaching it. And that's how you are applying RNNs to your approach? Which implies a time series, or some kind of sequence-to-sequence translation, using embeddings? > But you want to leverage the live environment at every step of synthesis, organizing your inputs and outputs in your search tree so that a learning task can be formulated. Just thinking out loud here.
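The environment-conditioned distribution described above can be caricatured in a few lines: score each candidate operator by features of the live environment (does the hole want a boolean? is there a list at the top of the environment?) and normalize with a softmax. The operators, feature names, and weights here are all invented for illustration; in practice the weights would be learned.

```python
import math

def softmax(scores):
    # Normalize raw scores into a probability distribution.
    m = max(scores.values())
    exps = {k: math.exp(v - m) for k, v in scores.items()}
    z = sum(exps.values())
    return {k: v / z for k, v in exps.items()}

def score_operators(env_features, weights):
    # env_features: e.g. {"hole-wants-boolean": 1.0, "env-has-list": 1.0}
    scores = {}
    for op, w in weights.items():
        scores[op] = sum(w.get(f, 0.0) * v for f, v in env_features.items())
    return softmax(scores)

# Hypothetical hand-set weights: null? fits a boolean hole over a list.
weights = {
    "null?": {"hole-wants-boolean": 2.0, "env-has-list": 2.0},
    "car":   {"env-has-list": 1.0},
    "+":     {},
}
dist = score_operators({"hole-wants-boolean": 1.0, "env-has-list": 1.0}, weights)
```

Under these assumed weights, null? dominates the distribution for the test position of an (if _ ...) hole when a list sits at the top of the environment, which is exactly the behavior the prior-only approach cannot express.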
But with a live environment + relational interpreter, of course we can represent formal logic. Which means we can use strong types as propositions <http://thebox.ghost.io/propositions-as-types> in a formal logic. Of course, this introduces Curry-Howard types and potentially Category Theory to the solution. > Evaluation of those programs (or terms in that formal logic) is like going in one direction. Going in the other direction, can we use Barliman for induction, generating the program (or terms and types) that yields a successful result? If we don't want to introduce strong types, we'd need to think of a way for symbols and combinations of symbols to score some kind of fitness test for a query on the program we're trying to generate. > In either approach, you'd have to make the symbol assembly differentiable, so that we can do backprop for training. The benefit I see of a live environment is the ability to inspect live systems and reconstruct them on the fly. Maybe you can leverage that in the training process (probably what you're trying to figure out now, lol). Hmm, I'm not sure. > Does this make any sense? > Tim > On Sat, Sep 2, 2017 at 5:55 PM, Apocalypse Mystic <[email protected]> wrote: >> Hi, >> Thanks for that. I'm actually in the middle of trying to revise one of the core assumptions I was making in the approach I described here initially. >> What I was doing, which is what a lot of the neural and other learning systems were/are doing (and what that one looks like it may be doing on first glance), was learning general priors on which program operations tend to go well together (if + null?, map + lambda, etc.). What bothered me, although I couldn't figure out how to use it, was that we have the live environment to work with at every step of synthesis, rather than just the general language grammar & syntax.
>> This comes from the fact that the relational interpreter is still an interpreter, and not just a program synthesizer that generates programs to be run later. You could think of it in terms of not needing to know that you were nested inside an (if _ because you know that the return value has to be true or false, which summarizes the context needed to figure out how to turn your input into a true or a false. I'm still in the not-sure-my-mental-model-is-accurate phase of testing this, but I have a preliminary implementation of something that will hopefully pull all available information about inputs and outputs as far down the search tree as possible so that a learning task can be formulated on the basis of that data, but nothing conclusive yet. >> Anyway, that might be something to think about when trying to come up with something to do with learning in this space. We may actually have more pieces on the table than a lot of other synthesis projects, so we may need to do things they don't to make use of them. >> Cheers, >> Evan >> On Sat, Sep 2, 2017 at 3:49 PM, Timothy Washington <[email protected]> wrote: >>> Hey Evan, >>> Wrt algorithms for searching the symbol space, I've been investigating RNNs, and I've come across the notion of a Neural Programmer. >>> - https://distill.pub/2016/augmented-rnns/#neural-programmer >>> Not sure if this helps your approach. But thought I'd throw it out there anyway. I'm going to try to dive into RNNs in the coming weeks. And here are some of the references in the above link, for you guys to take a look at.
>>> >>> - https://arxiv.org/pdf/1511.04834.pdf >>> - https://openreview.net/pdf?id=ry2YOrcge >>> - https://github.com/tensorflow/models/tree/master/neural_programmer >>> - https://github.com/mokemokechicken/keras_npi >>> >>> >>> Hth >>> Tim >>> >>> On Wednesday, 14 June 2017 15:19:38 UTC-4, Evan Donahue wrote: >>>> >>>> Sure, so here is the trace for the program that generates a list of two >>>> quoted values: >>>> >>>> '(eval 3 list proper 2 nonempty 1 eval ((1 quote) (2 proper 2 nonempty >>>> ((1 eval 1 quote) (2 proper 1))))) >>>> >>>> I am using human readable tags for the functions and numbers for the >>>> conde branches, I think in the same way you are. You will notice the tree >>>> has 3 leaves, one for each eval'd element of the list and one for the final >>>> nil. Because I'm feeding it to a simple rnn right now, I flatten this to a >>>> simple sequence: >>>> >>>> "((eval) (eval 3) (list) (proper) (proper 2) (nonempty) (nonempty 1) >>>> (eval) (eval 1) (quote) (nonempty 1) (eval) (eval 1) (quote) (nonempty 1) >>>> (eval) (eval 1) (quote))" >>>> >>>> Where each inner list should be taken together as an opaque token. I'm >>>> also working in part from that grammar autoencoder paper, although I am >>>> currently building these traces on the fly and without a full reference >>>> grammar, which makes following the precise method kind of tricky. I may >>>> bite the bullet and extract the grammar as a pre-step before starting the >>>> process, but we'll see. >>>> >>>> I'll be interested to hear if you think up good ways to encode the >>>> data, since I'd like to be able to feed that information to the rnn and use >>>> it to further bias the search. I haven't thought too much about it yet, but >>>> one thing that has occurred to me is that maybe it would be possible to >>>> repurpose the traces generated by the interpreter to encode the >>>> input/output data. 
Eg, given input '(1 2) and output '(2), generate the canonical program that produces this data, eg (cons (list (quote 1) (quote 2)) (list (quote 2))), and run the interpreter on it to generate a trace that can be encoded by whatever you are using to encode normal program traces. This would still leave the actual atomic elements (numbers, symbols...) unspecified, but maybe you could work them in with a vector representation that reserved say 3 bits for a one-hot encoding of the type (number, symbol, boolean) and the rest for a type-specific encoding (binary encoding for numbers, one-hot encodings for distinct symbols within a given execution of the program, etc). Depending on what one was doing, I'm also interested in some of the stuff that has gone out to the mailing list about vectorized 'unification', so if symbols are meaningful knowledge within a domain rather than just opaque gensyms for specifying a scheme function, maybe actually learning embeddings for the symbols could be part of the execution. Lots of partial thoughts on this, but nothing too concrete yet. >>>> I'm intrigued by where you're going with lookup, because that is also something I'd love to have a good answer for. Ideally it would be nice to be able to have a large library of functions in the environment without totally sacrificing our ability to use them. I've been wondering if some of the finite domain constraint stuff could be repurposed to do a more direct lookup computation, possibly along the lines you are suggesting, but I haven't made too much progress in fleshing that out so far. >>>> Cheers, >>>> Evan >>>> On Wed, Jun 14, 2017 at 9:54 AM, Smock Jonathan <[email protected]> wrote: >>>>> Do you have an example trace that you can paste? That's really interesting. My trace/path is just a sequence, not a tree (shown at the very bottom).
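The type-tagged vector encoding sketched above (3 one-hot type bits plus a type-specific payload) could look like the following; the payload width and bit layout are arbitrary choices for illustration, not a spec:

```python
# One-hot type tag (number / symbol / boolean) followed by a payload:
# binary bits for numbers, a per-execution one-hot index for symbols.
TYPES = ["number", "symbol", "boolean"]

def encode_atom(atom, symbol_table, payload_bits=8):
    vec = [0, 0, 0]
    if isinstance(atom, bool):  # check bool before int: bool is an int subtype
        vec[TYPES.index("boolean")] = 1
        payload = [1 if atom else 0] + [0] * (payload_bits - 1)
    elif isinstance(atom, int):
        vec[TYPES.index("number")] = 1
        payload = [(atom >> i) & 1 for i in range(payload_bits)]
    else:  # symbol: one-hot within this execution, as suggested above
        vec[TYPES.index("symbol")] = 1
        idx = symbol_table.setdefault(atom, len(symbol_table))
        payload = [1 if i == idx else 0 for i in range(payload_bits)]
    return vec + payload

syms = {}
v = encode_atom(5, syms)  # number 5, little-endian binary payload
```

The per-execution symbol table is what makes the symbols opaque gensyms; swapping it for learned embeddings would be the step toward symbols as domain knowledge.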
>>>>> Btw, for those interested, I've been trying to figure out how to take trees of data and turn them into a feature vector/embedding. Does anyone have good resources for this? The best I can find so far is this, which uses both a context-free grammar and an autoencoder to generate molecules: https://www.youtube.com/watch?v=ar4Fm1V65Fw . My goal was to take data structures and turn them into feature vectors and then use those to make feature vectors for functions. Hopefully having feature vectors for functions gets us out of the scalability problem for `lookup` - i.e. instead of trying all functions, we have an intuition about what kind of function we need/want and then do a lookup in all existing functions by feature vector for nearest neighbors (or use the feature vector to create a heuristic through the interpreter). I think my first step is to try and replicate what he's doing in that video, though, which will take me some time to figure out. >>>>> Anyway, here's my substitution store (Clojure): {:st/path []} >>>>> I only add to the path for disjunctions (btw, I realized yesterday that Greg said he only reorders conjunctions in Barliman, which I have to think about more). >>>>> My implementation very much follows the microKanren approach (b/c that's what I understand ...). Here's the simplified version of my disjunction: >>>>> (defn path-conj [st direction] (update st :st/path conj direction)) >>>>> (defn disjunction [& goals] (fn [st] (->> goals (map-indexed (fn [idx g] (g (path-conj st idx)))) vec (apply interleave-all)))) ;; interleave-all as in medley.core >>>>> `map-indexed` gives each goal in the disjunction a number/index, and we record that in the path as we descend into that branch.
So, when we're done, our path might look like [1 1 0], meaning we took the 1th branch, then the 1th branch of the next disjunction, and finally the 0th branch of the last disjunction. I considered labeling conde's or-branches, though, so it's neat that you're doing something like that. >>>>> My initial goal in adding the path was to be able to stop the computation at some arbitrary depth, serialize the substitution store, deserialize it somewhere else, and then resume where it left off. I realized after I did that that I could simply serialize the path alone, as long as I'm willing to do the work of rebuilding the substitution store as I navigate through to that path to resume. >>>>> On Tue, Jun 13, 2017 at 6:37 PM, William Byrd <[email protected]> wrote: >>>>>> Thanks Evan! >>>>>> Yes, Sunday's Advanced Hangout would be a great time to talk about this implementation, and for anyone interested in learning for miniKanren to share notes. >>>>>> --Will >>>>>> On Tue, Jun 13, 2017 at 4:02 PM, Apocalypse Mystic <[email protected]> wrote: >>>>>> > Hello, >>>>>> > Since it seems that there is growing activity around applying machine learning to minikanren at various levels, Will asked if I would share the approach I have been taking. To the extent that there is broader interest in mk machine learning, it probably makes sense for interested parties to be in touch. >>>>>> > Background: >>>>>> > Currently, I have what seems like a complete implementation for at least one approach to integrating neural nets with mk search. However, there remain some open questions, such as what to train it on and how to evaluate it once trained.
Moreover, it seems likely that there are entirely other >>>>>> approaches >>>>>> > that may or may not be related to this one. In any case, this >>>>>> approach >>>>>> > should be fairly easy to implement with minor changes to any >>>>>> standard >>>>>> > minikanren implementation, so if anyone is interested in using any >>>>>> part of >>>>>> > it to try to integrate it with an existing implementation of a >>>>>> given mk >>>>>> > program, such as the reference implementation of Barliman, I can >>>>>> describe it >>>>>> > in more detail. This email will just be a basic overview. >>>>>> > >>>>>> > The goal of this approach is to derive a general way to turn a >>>>>> minikanren >>>>>> > program into a machine learning problem, so that any application >>>>>> developed >>>>>> > in minikanren can also implicitly be trained. Although I had >>>>>> > Barliman/program synthesis in mind as a prime example, I did not >>>>>> use any >>>>>> > Barliman-specific knowledge, so that the system would be applicable >>>>>> to >>>>>> > Barliman as well as to other programs. For the sake of explanation, >>>>>> however, >>>>>> > I will describe the particular set up in terms of how it works in >>>>>> Barliman. >>>>>> > >>>>>> > Overview: >>>>>> > >>>>>> > As a simplification, I decided to set up the problem in three >>>>>> stages: >>>>>> > 1) Barliman would be run like normal, and each returned package >>>>>> would >>>>>> > contain, in addition to the substitution and constraint store, an >>>>>> "execution >>>>>> > trace" representing the choices made in constructing this particular >>>>>> > package. Similar programs should have similar execution traces, >>>>>> which forms >>>>>> > the basis of learning. Given a corpus of programs, this would take >>>>>> the form >>>>>> > of running Barliman once on each fully ground program, so that we >>>>>> can >>>>>> > recover the trace representing that program. 
Given no corpus, we >>>>>> can use >>>>>> > Barliman to just generate a bunch of programs and use those for >>>>>> training, >>>>>> > although in this case we need to make sure we have a way to >>>>>> distinguish good >>>>>> > and bad programs so that we can set the rewards appropriately. >>>>>> > 2) Outside of minikanren, a machine learning model (in my case, a >>>>>> simple >>>>>> > RNN) is trained on the program traces, which are tree structures, >>>>>> although I >>>>>> > flatten them to sequences for training. >>>>>> > 3) The model is fed back into subsequent runs of the minikanren >>>>>> program, and >>>>>> > its predictions are used to guide the search. >>>>>> > >>>>>> > The overall effect is that the model represents a bias over the >>>>>> search >>>>>> > space, but does not update live during the search. There are a >>>>>> number of >>>>>> > reasons for this, all of which have to do with my inability to >>>>>> arrive at a >>>>>> > satisfactory interpretation of what success and failure mean in the >>>>>> middle >>>>>> > of a given search, but I suspect that, depending on the problem, a >>>>>> model >>>>>> > that updates live during the search would probably be a valid >>>>>> approach and >>>>>> > possibly easily reachable from this implementation, although for >>>>>> the moment >>>>>> > I am not pursuing it. I will note however that this "bias" can be a >>>>>> fairly >>>>>> > flexible concept per-search. Eg, one could imagine a bias over all >>>>>> scheme >>>>>> > programs in Barliman that down-weighted lambdas full of unused >>>>>> arguments and >>>>>> > other frequently vestigial components of synthesized programs, but >>>>>> one could >>>>>> > also imagine a bias tuned to the specific input/output constraints, >>>>>> so that >>>>>> > it weighted cons more highly if the output was a list vs a number, >>>>>> etc. 
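Purely as an illustration of the three stages above, here is a toy best-first search over an invented trace "grammar": each candidate carries its trace of choice tags, a model scores the trace, and the search expands the best-scored candidate first. A dummy model that scores everything 0 reduces it to the unbiased order. None of the tags or rules correspond to the actual Barliman interpreter.

```python
import heapq

# Invented stand-in grammar: each nonterminal expands to tagged choices.
GRAMMAR = {
    "eval": [("eval-1", ["quote"]), ("eval-2", ["list"])],
    "quote": [("quote-1", [])],
    "list": [("list-1", []), ("list-2", ["eval", "list"])],
}

def guided_search(score, max_steps=50):
    # Frontier ordered by model score (higher first); the unique tie counter
    # preserves the plain search order among equally scored candidates.
    frontier = [(-score(()), 0, (), ["eval"])]
    tie = 1
    complete = []
    steps = 0
    while frontier and steps < max_steps:
        _, _, trace, goals = heapq.heappop(frontier)
        steps += 1
        if not goals:
            complete.append(trace)  # a fully elaborated trace = one program
            continue
        head, rest = goals[0], goals[1:]
        for tag, body in GRAMMAR[head]:
            t = trace + (tag,)
            heapq.heappush(frontier, (-score(t), tie, t, body + rest))
            tie += 1
    return complete

# A "model" that down-weights the recursive list branch finds short
# programs first; scoring everything 0 gives the unbiased order.
prefer_short = lambda t: -t.count("list-2")
found = guided_search(prefer_short)
```

This collapses the mplus/interleave machinery into a priority queue for brevity; the point is only that the score acts as a bias over expansion order, not that it changes which programs are reachable.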
>>>>>> > >>>>>> > Implementation: >>>>>> > >>>>>> > Since I implemented this in Smalltalk, there are probably a few >>>>>> > implementation details that will differ in the lisp/scheme >>>>>> variants. I will >>>>>> > try to give a general sketch of what I think such an implementation >>>>>> would >>>>>> > look like. Roughly, it will require two main changes to the basic mk >>>>>> > implementation: >>>>>> > >>>>>> > 1) Forms that delay the search, like fresh and conde, should have >>>>>> unique >>>>>> > global id's, and add them to the trace of the packages that they >>>>>> process. >>>>>> > >>>>>> > This may be easier in the scheme/lisp implementations, where you >>>>>> have a >>>>>> > clear macro expansion time and a module boundary that can contain a >>>>>> global >>>>>> > counter, although maybe not. The goal is that each package, as it >>>>>> passes >>>>>> > through the goals, will accumulate a history of all the goals it >>>>>> has seen. >>>>>> > Eg, an answer that represents a variable lookup on the third >>>>>> variable in the >>>>>> > environment might look something like a less human-readable >>>>>> version of: >>>>>> > '(eval-conde-2 lookup-fresh lookup-fail-conde lookup-fail-conde >>>>>> > lookup-success-conde). It is important that the same symbol be >>>>>> given by the >>>>>> > "same" goal within a recursive step, so taking the second cond in >>>>>> the main >>>>>> > eval conde should always add eval-conde-2 to the trace. >>>>>> > >>>>>> > One extra potential complication here is the handling of >>>>>> conjunctions of >>>>>> > more than one conde. This may not be necessary for a basic >>>>>> implementation. 
>>>>>> > If we think of the interpreter as defining an implicit grammar over the execution traces of the program, eg: >>>>>> > Eval => Quote >>>>>> > Quote => quoted-value >>>>>> > Eval => Lookup >>>>>> > Lookup => lookup-failure Lookup >>>>>> > Lookup => lookup-success >>>>>> > Then we can recognize that the trace may not, in general, be a list. For an interpreter with only lookup and quote, it will be, because these rules are both regular, and so a sequence of trace ids uniquely defines the "parse" of grammar rules used by the interpreter to arrive at its final package. >>>>>> > Once we add in rules like list: >>>>>> > ProperList => empty-list >>>>>> > ProperList => destructure-list Eval ProperList >>>>>> > We can see that the "grammar" may potentially be context-free. Of course, in reality, even calling it context-free is a modeling approximation. In some minikanren programs, this will never come up (either because all the "rules" are regular, or the "cfg" is unambiguous given a sequential trace anyway), and in some machine learning models, this may not really matter, if the model is only using information from the sequence and not from the full hierarchy. Still, all that is needed to capture the extra information is that fresh/conde forms that contain multiple condes should create bind forms that essentially take the sequential traces from each conde and merge them together up to the end of their common prefix to create a tree-structured trace. Eg: '(eval-conde-3 proper-list-fresh proper-list-nonempty-conde ((eval-conde-2 ...)
(proper-list-fresh ...))) >>>>>> > >>>>>> > 2) The interleaving streams of closures should be reified as >>>>>> structs so that >>>>>> > the order of the search can be controlled by the trained model. >>>>>> > >>>>>> > This is a straightforward translation of the stream logic into a >>>>>> reified >>>>>> > form of the same stream logic. Every time the package is extended >>>>>> with a new >>>>>> > trace id, the model is run on the current trace and the package is >>>>>> tagged >>>>>> > with a score. When packages are mplused together, mplus checks >>>>>> which branch >>>>>> > has the max score, and interleaves that branch first, caching the >>>>>> score in >>>>>> > the new mplus struct for further checking higher up the tree. If >>>>>> the scores >>>>>> > are the same, it interleaves like normal, so the initial run should >>>>>> use a >>>>>> > dummy model that just scores everything with a 0. It is also >>>>>> possible to >>>>>> > choose probabilistically, etc instead of using the max. >>>>>> > >>>>>> > I've definitely left out some implementation details that would be >>>>>> useful >>>>>> > for actually making this work, but I think I've said enough >>>>>> already, so I >>>>>> > can go into more detail if anyone wants to hear it. Just let me >>>>>> know. Also, >>>>>> > I think there is a plan to talk about this more in the advanced >>>>>> hangout on >>>>>> > sunday? Will correct me if I'm wrong. So that would be a good time >>>>>> to talk >>>>>> > about it if this is interesting to anyone. >>>>>> > >>>>>> > Currently, I'm refactoring the above code a bit, and then I'm going >>>>>> to work >>>>>> > on figuring out what kind of training makes sense (corpus-based, >>>>>> > self-training via unbound synthesis, etc), and how to evaluate a >>>>>> trained >>>>>> > search. >>>>>> > >>>>>> > Cheers, >>>>>> > Evan >>>>>> > >>>>>> > -- >>>>>> > You received this message because you are subscribed to the Google >>>>>> Groups >>>>>> > "Barliman Editor" group. 
>>>>>> > To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To post to this group, send email to [email protected]. To view this discussion on the web visit https://groups.google.com/d/msgid/barliman-editor/CAEE2%2B5AYrW-m2%2BvO6QZ6Uu2EJkyPOtePEXto9g2NSaxL-8g1Zg%40mail.gmail.com. For more options, visit https://groups.google.com/d/optout. -- You received this message because you are subscribed to the Google Groups "minikanren" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/minikanren. For more options, visit https://groups.google.com/d/optout.
