Re: [nupic-discuss] Fun challenge: constant memory RDSE

Chetan Surpur Tue, 18 Feb 2014 12:14:54 -0800

As long as the encoder maintains the mappings for the currently-existing
buckets, why must the new encodings be independent of the order of
presentation of the data?



On Tue, Feb 18, 2014 at 12:10 PM, Fergal Byrne
<[email protected]>wrote:

> Hi Chetan,
>
> No, but the encodings should always be independent of the order of
> presentation of the data, so it's a bug if they're not.  My code includes a
> workaround which builds buckets out in both directions from a predefined
> centre value until it encompasses each new value. This guarantees the same
> encoding regardless of which values come in when. You could easily add a
> version of this to your encoder, it's a small overhead for ensuring
> identical encodings.
>
> This is the C4 idea in action - argue with patches... Just shows you how
> useful an executable document can be when you're experimenting!
>
> Regards,
>
> Fergal
>
>
> On Tue, Feb 18, 2014 at 6:55 PM, Chetan Surpur <[email protected]> wrote:
>
>> Oh, my mistake, I misunderstood the question. I thought Fergal was asking
>> if the order has to be presented in a certain order to get correct results
>> (results that have the desired overlap properties).
>>
>> So yes, order dependence exists in the currently implemented encoder, but
>> it shouldn't affect correctness.
>> On Feb 18, 2014 10:51 AM, "Scott Purdy" <[email protected]> wrote:
>>
>>> Fergal, I believe the implementation in NuPIC is dependent on the order
>>> of data. Why do you ask? The constant-memory design I have brought up here
>>> do not exist in NuPIC.
>>>
>>> Chetan, you are right that it extends separately in each directly but I
>>> believe that the randomness is shared so the order of the data would affect
>>> it. It wouldn't be difficult to change that though. But it also doesn't
>>> really solve any problems.
>>>
>>>
>>> On Tue, Feb 18, 2014 at 10:39 AM, Chetan Surpur <[email protected]>wrote:
>>>
>>>> From reading the code, it looks to me that the generation of buckets
>>>> happens on the left and right boundaries of the currently-existing buckets,
>>>> and extends the boundaries to create buckets as necessary. Thus, it
>>>> shouldn't matter what order the data is presented. Subutai can correct me
>>>> if I'm mistaken.
>>>>
>>>>
>>>> On Tue, Feb 18, 2014 at 10:35 AM, Fergal Byrne <
>>>> [email protected]> wrote:
>>>>
>>>>> Cheers Scott.
>>>>>
>>>>> Have you checked the NuPIC implementation for dependence on the order
>>>>> of data presented? My python isn't up to that ;{
>>>>>
>>>>> Regards
>>>>>
>>>>> Fergal Byrne
>>>>>
>>>>>
>>>>> On Tue, Feb 18, 2014 at 5:37 PM, Scott Purdy <[email protected]>wrote:
>>>>>
>>>>>> Oh and Chetan's proposal is good but it still has memory and time
>>>>>> constraints that are linear with the number of buckets (but it doesn't 
>>>>>> have
>>>>>> to keep the memory in use in between invocations).
>>>>>>
>>>>>>
>>>>>> On Tue, Feb 18, 2014 at 9:36 AM, Scott Purdy <[email protected]>wrote:
>>>>>>
>>>>>>> Thanks for the details on your implementation Fergal!
>>>>>>>
>>>>>>> Just to be clear, it is possible to create a constant memory and
>>>>>>> constant time solution (assuming fixed w). The one that I came up with 
>>>>>>> does
>>>>>>> not use all nCw combinations of active bits though. Instead it uses 
>>>>>>> (n/2)Cw
>>>>>>> * (n/2)Cw.
>>>>>>>
>>>>>>> I am hoping someone can find a solution with the same time/memory
>>>>>>> bounds but with a higher entropy solution. IE one that will have random
>>>>>>> collisions less frequently.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Feb 18, 2014 at 1:01 AM, Fergal Byrne <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Scott, Chetan,
>>>>>>>>
>>>>>>>> Great thought experiment, Scott.
>>>>>>>>
>>>>>>>> Someone looking at my Clojure code [1] (a Clojure expert, not one
>>>>>>>> on NuPIC) had some issues with my using a mutable data structure (ie
>>>>>>>> memory) so much. I looked at ways of eliminating it, but there's no 
>>>>>>>> simple
>>>>>>>> way to do it unless you know that you'll never go backwards on the 
>>>>>>>> number
>>>>>>>> line. This also means that the encodings are dependent on the order in
>>>>>>>> which you present the data to the encoder.
>>>>>>>>
>>>>>>>> For example, let's say the first value encoded is 0 (without loss
>>>>>>>> of generality). If you have to encode 1 next, it will get the second 
>>>>>>>> code,
>>>>>>>> 2 will get the next one, and so on. But if -1 is provided after 1, 
>>>>>>>> it'll
>>>>>>>> get the next code (or at least some distortion of it) and thus 2 will 
>>>>>>>> be
>>>>>>>> encoded differently.
>>>>>>>>
>>>>>>>> This means that every encoder must remember its buckets in order to
>>>>>>>> give back the same encoding for previously computed values, or else
>>>>>>>> remember the entire sequence of values and rerun their computations 
>>>>>>>> each
>>>>>>>> time (which may cost more memory if many values per bucket must be 
>>>>>>>> stored).
>>>>>>>>
>>>>>>>> I've added a test/demo for this to my document.
>>>>>>>>
>>>>>>>> Update: If you decide on 0 as a centre, you can precalculate bands
>>>>>>>> of buckets out to your first data value (and repeat this for each new 
>>>>>>>> one),
>>>>>>>> which ensures the encoding is always the same:
>>>>>>>>
>>>>>>>> e.g given 22 as the first datum, generate buckets for -10...10,
>>>>>>>> -20...20, -30...30 and return encoding(22).
>>>>>>>>
>>>>>>>> You could choose a different centre if you know more about your
>>>>>>>> data. I've detailed this idea in the doc.
>>>>>>>>
>>>>>>>> [1] http://fergalbyrne.github.io/rdse.html
>>>>>>>>
>>>>>>>> Regards
>>>>>>>>
>>>>>>>> Fergal Byrne
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Feb 18, 2014 at 6:06 AM, Chetan Surpur 
>>>>>>>> <[email protected]>wrote:
>>>>>>>>
>>>>>>>>> For this problem, this looks useful:
>>>>>>>>> http://en.wikipedia.org/wiki/Linear_feedback_shift_register
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Feb 17, 2014 at 6:01 PM, Chetan Surpur <[email protected]
>>>>>>>>> > wrote:
>>>>>>>>>
>>>>>>>>>> A very simple approach would be to trade speed for memory.
>>>>>>>>>> Instead of storing a map between buckets and SDRs, we can go through 
>>>>>>>>>> the
>>>>>>>>>> bucket generation process every time we want to find the SDR for a 
>>>>>>>>>> bucket.
>>>>>>>>>> From what I understand, this bucket generation process is linear in 
>>>>>>>>>> speed
>>>>>>>>>> with the number of buckets you want to generate. So the linear memory
>>>>>>>>>> requirement would be translated into a linear speed requirement.
>>>>>>>>>>
>>>>>>>>>> In a nutshell, walk through the number line, generating buckets
>>>>>>>>>> until you hit the target bucket you want a representation for, *every
>>>>>>>>>> time* you want to get a representation, and don't store
>>>>>>>>>> anything. You'll need to use the same seed for the random number 
>>>>>>>>>> generator
>>>>>>>>>> though, to get consistent results.
>>>>>>>>>>
>>>>>>>>>> The advantage of this is that it's a simple modification to what
>>>>>>>>>> is already implemented. On the other hand, it's slightly slower when
>>>>>>>>>> outputting SDRs for previously-seen buckets.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Feb 17, 2014 at 5:15 PM, Scott Purdy 
>>>>>>>>>> <[email protected]>wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi all, I thought some of you might enjoy trying to come up with
>>>>>>>>>>> a solution for this problem. If you watch Chetan's presentation 
>>>>>>>>>>> about the
>>>>>>>>>>> random distributed scalar encoder (RDSE), you will see that we are 
>>>>>>>>>>> keeping
>>>>>>>>>>> a mapping between all buckets computed so far and the bits that 
>>>>>>>>>>> represent
>>>>>>>>>>> them. This was Subutai's implementation of Jeff's general idea for 
>>>>>>>>>>> the
>>>>>>>>>>> encoder. This design has a memory usage for the encoder that 
>>>>>>>>>>> increases
>>>>>>>>>>> linearly with the number of buckets that it has to represent.
>>>>>>>>>>>
>>>>>>>>>>> When originally discussing the design, I was trying to find a
>>>>>>>>>>> way to statically compute the mapping so that you don't have to 
>>>>>>>>>>> store
>>>>>>>>>>> anything. But it has to have the property that buckets i and j have 
>>>>>>>>>>> w-(j-i)
>>>>>>>>>>> overlapping bits if j-i<w and also that a given index is never 
>>>>>>>>>>> assigned
>>>>>>>>>>> multiple times to the same bucket. I came up with a solution but it 
>>>>>>>>>>> would
>>>>>>>>>>> likely have more random collisions than Subutai's linear-memory 
>>>>>>>>>>> solution
>>>>>>>>>>> because it was limited in the number of possibly combinations of 
>>>>>>>>>>> bits the
>>>>>>>>>>> buckets could have. Curious if someone can come up with something 
>>>>>>>>>>> better!
>>>>>>>>>>>
>>>>>>>>>>> And be sure to watch Chetan's presentation on the RDSE that
>>>>>>>>>>> Subutai designed and implemented for background.
>>>>>>>>>>>
>>>>>>>>>>> *Note: the current implementation is fine for all practical
>>>>>>>>>>> scenarios so this is just a fun exercise for those interested*
>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> nupic mailing list
>>>>>>>>>>> [email protected]
>>>>>>>>>>> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> nupic mailing list
>>>>>>>>> [email protected]
>>>>>>>>> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>>
>>>>>>>> Fergal Byrne, Brenter IT
>>>>>>>>
>>>>>>>> <http://www.examsupport.ie>http://inbits.com - Better Living
>>>>>>>> through Thoughtful Technology
>>>>>>>>
>>>>>>>> e:[email protected] t:+353 83 4214179
>>>>>>>> Formerly of Adnet [email protected] http://www.adnet.ie
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> nupic mailing list
>>>>>>>> [email protected]
>>>>>>>> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> nupic mailing list
>>>>>> [email protected]
>>>>>> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> Fergal Byrne, Brenter IT
>>>>>
>>>>> <http://www.examsupport.ie>http://inbits.com - Better Living through
>>>>> Thoughtful Technology
>>>>>
>>>>> e:[email protected] t:+353 83 4214179
>>>>> Formerly of Adnet [email protected] http://www.adnet.ie
>>>>>
>>>>> _______________________________________________
>>>>> nupic mailing list
>>>>> [email protected]
>>>>> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
>>>>>
>>>>>
>>>>
>>>> _______________________________________________
>>>> nupic mailing list
>>>> [email protected]
>>>> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
>>>>
>>>>
>>>
>>> _______________________________________________
>>> nupic mailing list
>>> [email protected]
>>> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
>>>
>>>
>> _______________________________________________
>> nupic mailing list
>> [email protected]
>> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
>>
>>
>
>
> --
>
> Fergal Byrne, Brenter IT
>
> <http://www.examsupport.ie>http://inbits.com - Better Living through
> Thoughtful Technology
>
> e:[email protected] t:+353 83 4214179
> Formerly of Adnet [email protected] http://www.adnet.ie
>
> _______________________________________________
> nupic mailing list
> [email protected]
> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
>
>

_______________________________________________
nupic mailing list
[email protected]
http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org

Re: [nupic-discuss] Fun challenge: constant memory RDSE

Reply via email to