As long as the encoder maintains the mappings for the currently-existing buckets, why must the new encodings be independent of the order of presentation of the data?
On Tue, Feb 18, 2014 at 12:10 PM, Fergal Byrne <[email protected]>wrote: > Hi Chetan, > > No, but the encodings should always be independent of the order of > presentation of the data, so it's a bug if they're not. My code includes a > workaround which builds buckets out in both directions from a predefined > centre value until it encompasses each new value. This guarantees the same > encoding regardless of which values come in when. You could easily add a > version of this to your encoder, it's a small overhead for ensuring > identical encodings. > > This is the C4 idea in action - argue with patches... Just shows you how > useful an executable document can be when you're experimenting! > > Regards, > > Fergal > > > On Tue, Feb 18, 2014 at 6:55 PM, Chetan Surpur <[email protected]> wrote: > >> Oh, my mistake, I misunderstood the question. I thought Fergal was asking >> if the order has to be presented in a certain order to get correct results >> (results that have the desired overlap properties). >> >> So yes, order dependence exists in the currently implemented encoder, but >> it shouldn't affect correctness. >> On Feb 18, 2014 10:51 AM, "Scott Purdy" <[email protected]> wrote: >> >>> Fergal, I believe the implementation in NuPIC is dependent on the order >>> of data. Why do you ask? The constant-memory design I have brought up here >>> do not exist in NuPIC. >>> >>> Chetan, you are right that it extends separately in each directly but I >>> believe that the randomness is shared so the order of the data would affect >>> it. It wouldn't be difficult to change that though. But it also doesn't >>> really solve any problems. >>> >>> >>> On Tue, Feb 18, 2014 at 10:39 AM, Chetan Surpur <[email protected]>wrote: >>> >>>> From reading the code, it looks to me that the generation of buckets >>>> happens on the left and right boundaries of the currently-existing buckets, >>>> and extends the boundaries to create buckets as necessary. Thus, it >>>> shouldn't matter what order the data is presented. Subutai can correct me >>>> if I'm mistaken. >>>> >>>> >>>> On Tue, Feb 18, 2014 at 10:35 AM, Fergal Byrne < >>>> [email protected]> wrote: >>>> >>>>> Cheers Scott. >>>>> >>>>> Have you checked the NuPIC implementation for dependence on the order >>>>> of data presented? My python isn't up to that ;{ >>>>> >>>>> Regards >>>>> >>>>> Fergal Byrne >>>>> >>>>> >>>>> On Tue, Feb 18, 2014 at 5:37 PM, Scott Purdy <[email protected]>wrote: >>>>> >>>>>> Oh and Chetan's proposal is good but it still has memory and time >>>>>> constraints that are linear with the number of buckets (but it doesn't >>>>>> have >>>>>> to keep the memory in use in between invocations). >>>>>> >>>>>> >>>>>> On Tue, Feb 18, 2014 at 9:36 AM, Scott Purdy <[email protected]>wrote: >>>>>> >>>>>>> Thanks for the details on your implementation Fergal! >>>>>>> >>>>>>> Just to be clear, it is possible to create a constant memory and >>>>>>> constant time solution (assuming fixed w). The one that I came up with >>>>>>> does >>>>>>> not use all nCw combinations of active bits though. Instead it uses >>>>>>> (n/2)Cw >>>>>>> * (n/2)Cw. >>>>>>> >>>>>>> I am hoping someone can find a solution with the same time/memory >>>>>>> bounds but with a higher entropy solution. IE one that will have random >>>>>>> collisions less frequently. >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Tue, Feb 18, 2014 at 1:01 AM, Fergal Byrne < >>>>>>> [email protected]> wrote: >>>>>>> >>>>>>>> Hi Scott, Chetan, >>>>>>>> >>>>>>>> Great thought experiment, Scott. >>>>>>>> >>>>>>>> Someone looking at my Clojure code [1] (a Clojure expert, not one >>>>>>>> on NuPIC) had some issues with my using a mutable data structure (ie >>>>>>>> memory) so much. I looked at ways of eliminating it, but there's no >>>>>>>> simple >>>>>>>> way to do it unless you know that you'll never go backwards on the >>>>>>>> number >>>>>>>> line. This also means that the encodings are dependent on the order in >>>>>>>> which you present the data to the encoder. >>>>>>>> >>>>>>>> For example, let's say the first value encoded is 0 (without loss >>>>>>>> of generality). If you have to encode 1 next, it will get the second >>>>>>>> code, >>>>>>>> 2 will get the next one, and so on. But if -1 is provided after 1, >>>>>>>> it'll >>>>>>>> get the next code (or at least some distortion of it) and thus 2 will >>>>>>>> be >>>>>>>> encoded differently. >>>>>>>> >>>>>>>> This means that every encoder must remember its buckets in order to >>>>>>>> give back the same encoding for previously computed values, or else >>>>>>>> remember the entire sequence of values and rerun their computations >>>>>>>> each >>>>>>>> time (which may cost more memory if many values per bucket must be >>>>>>>> stored). >>>>>>>> >>>>>>>> I've added a test/demo for this to my document. >>>>>>>> >>>>>>>> Update: If you decide on 0 as a centre, you can precalculate bands >>>>>>>> of buckets out to your first data value (and repeat this for each new >>>>>>>> one), >>>>>>>> which ensures the encoding is always the same: >>>>>>>> >>>>>>>> e.g given 22 as the first datum, generate buckets for -10...10, >>>>>>>> -20...20, -30...30 and return encoding(22). >>>>>>>> >>>>>>>> You could choose a different centre if you know more about your >>>>>>>> data. I've detailed this idea in the doc. >>>>>>>> >>>>>>>> [1] http://fergalbyrne.github.io/rdse.html >>>>>>>> >>>>>>>> Regards >>>>>>>> >>>>>>>> Fergal Byrne >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Tue, Feb 18, 2014 at 6:06 AM, Chetan Surpur >>>>>>>> <[email protected]>wrote: >>>>>>>> >>>>>>>>> For this problem, this looks useful: >>>>>>>>> http://en.wikipedia.org/wiki/Linear_feedback_shift_register >>>>>>>>> >>>>>>>>> >>>>>>>>> On Mon, Feb 17, 2014 at 6:01 PM, Chetan Surpur <[email protected] >>>>>>>>> > wrote: >>>>>>>>> >>>>>>>>>> A very simple approach would be to trade speed for memory. >>>>>>>>>> Instead of storing a map between buckets and SDRs, we can go through >>>>>>>>>> the >>>>>>>>>> bucket generation process every time we want to find the SDR for a >>>>>>>>>> bucket. >>>>>>>>>> From what I understand, this bucket generation process is linear in >>>>>>>>>> speed >>>>>>>>>> with the number of buckets you want to generate. So the linear memory >>>>>>>>>> requirement would be translated into a linear speed requirement. >>>>>>>>>> >>>>>>>>>> In a nutshell, walk through the number line, generating buckets >>>>>>>>>> until you hit the target bucket you want a representation for, *every >>>>>>>>>> time* you want to get a representation, and don't store >>>>>>>>>> anything. You'll need to use the same seed for the random number >>>>>>>>>> generator >>>>>>>>>> though, to get consistent results. >>>>>>>>>> >>>>>>>>>> The advantage of this is that it's a simple modification to what >>>>>>>>>> is already implemented. On the other hand, it's slightly slower when >>>>>>>>>> outputting SDRs for previously-seen buckets. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Mon, Feb 17, 2014 at 5:15 PM, Scott Purdy >>>>>>>>>> <[email protected]>wrote: >>>>>>>>>> >>>>>>>>>>> Hi all, I thought some of you might enjoy trying to come up with >>>>>>>>>>> a solution for this problem. If you watch Chetan's presentation >>>>>>>>>>> about the >>>>>>>>>>> random distributed scalar encoder (RDSE), you will see that we are >>>>>>>>>>> keeping >>>>>>>>>>> a mapping between all buckets computed so far and the bits that >>>>>>>>>>> represent >>>>>>>>>>> them. This was Subutai's implementation of Jeff's general idea for >>>>>>>>>>> the >>>>>>>>>>> encoder. This design has a memory usage for the encoder that >>>>>>>>>>> increases >>>>>>>>>>> linearly with the number of buckets that it has to represent. >>>>>>>>>>> >>>>>>>>>>> When originally discussing the design, I was trying to find a >>>>>>>>>>> way to statically compute the mapping so that you don't have to >>>>>>>>>>> store >>>>>>>>>>> anything. But it has to have the property that buckets i and j have >>>>>>>>>>> w-(j-i) >>>>>>>>>>> overlapping bits if j-i<w and also that a given index is never >>>>>>>>>>> assigned >>>>>>>>>>> multiple times to the same bucket. I came up with a solution but it >>>>>>>>>>> would >>>>>>>>>>> likely have more random collisions than Subutai's linear-memory >>>>>>>>>>> solution >>>>>>>>>>> because it was limited in the number of possibly combinations of >>>>>>>>>>> bits the >>>>>>>>>>> buckets could have. Curious if someone can come up with something >>>>>>>>>>> better! >>>>>>>>>>> >>>>>>>>>>> And be sure to watch Chetan's presentation on the RDSE that >>>>>>>>>>> Subutai designed and implemented for background. >>>>>>>>>>> >>>>>>>>>>> *Note: the current implementation is fine for all practical >>>>>>>>>>> scenarios so this is just a fun exercise for those interested* >>>>>>>>>>> >>>>>>>>>>> _______________________________________________ >>>>>>>>>>> nupic mailing list >>>>>>>>>>> [email protected] >>>>>>>>>>> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> nupic mailing list >>>>>>>>> [email protected] >>>>>>>>> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> >>>>>>>> Fergal Byrne, Brenter IT >>>>>>>> >>>>>>>> <http://www.examsupport.ie>http://inbits.com - Better Living >>>>>>>> through Thoughtful Technology >>>>>>>> >>>>>>>> e:[email protected] t:+353 83 4214179 >>>>>>>> Formerly of Adnet [email protected] http://www.adnet.ie >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> nupic mailing list >>>>>>>> [email protected] >>>>>>>> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org >>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> nupic mailing list >>>>>> [email protected] >>>>>> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org >>>>>> >>>>>> >>>>> >>>>> >>>>> -- >>>>> >>>>> Fergal Byrne, Brenter IT >>>>> >>>>> <http://www.examsupport.ie>http://inbits.com - Better Living through >>>>> Thoughtful Technology >>>>> >>>>> e:[email protected] t:+353 83 4214179 >>>>> Formerly of Adnet [email protected] http://www.adnet.ie >>>>> >>>>> _______________________________________________ >>>>> nupic mailing list >>>>> [email protected] >>>>> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org >>>>> >>>>> >>>> >>>> _______________________________________________ >>>> nupic mailing list >>>> [email protected] >>>> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org >>>> >>>> >>> >>> _______________________________________________ >>> nupic mailing list >>> [email protected] >>> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org >>> >>> >> _______________________________________________ >> nupic mailing list >> [email protected] >> http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org >> >> > > > -- > > Fergal Byrne, Brenter IT > > <http://www.examsupport.ie>http://inbits.com - Better Living through > Thoughtful Technology > > e:[email protected] t:+353 83 4214179 > Formerly of Adnet [email protected] http://www.adnet.ie > > _______________________________________________ > nupic mailing list > [email protected] > http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org > >
_______________________________________________ nupic mailing list [email protected] http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
