BTW, we have been working on a new serialization format. The old one uses Python's pickle functionality, which has several problems. The new method in NuPIC will use Cap'n Proto serialization, a very fast and efficient technique that happens on the C++ side (and through the pycapnp adapter in Python).
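For context, the pickle-based path being replaced is essentially a whole-object round-trip like this minimal sketch (the state dict here is a toy stand-in, not the actual NuPIC model):

```python
import io
import pickle

# Toy stand-in for serialized model state; a real NuPIC model is far larger
# and contains thousands of cell-to-cell connections.
model_state = {"weights": [0.1, 0.2, 0.3], "iterations": 1000}

buf = io.BytesIO()
pickle.dump(model_state, buf)   # serialize the whole Python object graph
buf.seek(0)
restored = pickle.load(buf)     # deserialize it back

assert restored == model_state
```

Pickle walks the entire Python object graph, which is one reason saving and loading large models is slow; Cap'n Proto instead serializes a fixed schema on the C++ side, avoiding that per-object overhead.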
Once we have this finished, the time it takes to save and retrieve models should decrease by about tenfold (based on Scott's initial experiments). I assume this will also bring a considerable decrease in serialized size on disk, but I have not checked. If Scott is reading this, maybe he can answer.
---------
Matt Taylor
OS Community Flag-Bearer
Numenta

On Fri, Dec 18, 2015 at 2:46 AM, David Ray <cognitionmiss...@gmail.com> wrote:
>
> Hi Karin,
>
> the network can't really grow new connections, which are not yet stored in
> the memory, right? (other than adjusting weights of the connections)
>
> The network does in fact grow new connections: Distal Dendrites are
> formed with Synapses housing new connections to other Cells. This is one of
> the most distinguishing features of HTM Neurons as opposed to "point
> neurons" (i.e., A-to-Z NNs, a.k.a. "Deep" Neural Networks).
>
> See:
> https://github.com/numenta/nupic/blob/master/src/nupic/research/temporal_memory.py#L361
>
> ...starting above from the "pickCellsToLearnOn()" method...
>
> Cheers,
> David
>
> Sent from my iPhone
>
> On Dec 18, 2015, at 4:10 AM, Karin Valisova <ka...@datapine.com> wrote:
>
> Thank you for your answers!
>
> Matthew, what do you mean by "how much data the model has seen"? I have
> noticed that the size of the network increases with the size of the data
> sample, but I can't really see a reason for that - the network can't
> really grow new connections, which are not yet stored in the memory,
> right? (other than adjusting weights of the connections) And if it's a
> matter of the model accumulating data somewhere, for calculating
> sliding-window metrics or things like these, then it could theoretically
> be cut off - if we're talking only about the network's ability to process
> data.
>
> Mark, what kind of compression do you have in mind? Any ideas what to
> try?
>
> Thank you,
> Karin
>
> On Thu, Dec 17, 2015 at 7:29 PM, Marek Otahal <markota...@gmail.com>
> wrote:
>
>> Hi Karin,
>>
>> yes, that is an issue! I've suggested using compression; it helps
>> surprisingly well in this matter (from hundreds of MB down to tens).
>> AFAIK it's not implemented yet.
>>
>> Cheers,
>> Mark
>>
>> On Thu, Dec 17, 2015 at 6:15 PM, Matthew Taylor <m...@numenta.org> wrote:
>>
>>> That's not too surprising ;). The size of a saved model depends on
>>> several things, including the number of input fields, model parameters
>>> that affect how cells connect, and how much data the model has seen.
>>> There are thousands of connections between cells that need to be
>>> persisted when a model is saved. I have seen serialized models much
>>> larger than 50 MB.
>>> ---------
>>> Matt Taylor
>>> OS Community Flag-Bearer
>>> Numenta
>>>
>>> On Thu, Dec 17, 2015 at 8:06 AM, Karin Valisova <ka...@datapine.com>
>>> wrote:
>>> > Hello!
>>> >
>>> > I've been playing around with serialization under the OPF framework
>>> > and I noticed that when using the typical model for temporal anomaly
>>> > detection
>>> >
>>> > https://github.com/numenta/nupic/blob/master/examples/opf/clients/hotgym/anomaly/one_gym/model_params/rec_center_hourly_model_params.py
>>> >
>>> > the saved file gets surprisingly large, ~50 MB. What is the reason
>>> > for this? If I understand correctly, only the states of the temporal
>>> > and spatial poolers should be enough to reload a network, right? Or
>>> > am I forgetting about some extra data stored?
>>> >
>>> > Thank you!
>>> > Karin
>>
>>
>> --
>> Marek Otahal :o)
>
>
> --
>
> datapine GmbH
> Skalitzer Straße 33
> 10999 Berlin
>
> email: ka...@datapine.com
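The compression Marek suggests can be sketched independently of NuPIC: compress the pickled bytes with zlib before writing them to disk. The state dict below is a hypothetical stand-in; sparse, repetitive model state like permanence arrays tends to compress very well, though actual ratios depend on the model.

```python
import pickle
import zlib

# Hypothetical model state: large, highly repetitive arrays, as a stand-in
# for the kind of data a serialized model contains.
state = {"permanences": [0.0] * 10000, "connected": [False] * 10000}

raw = pickle.dumps(state)               # plain pickle bytes
compressed = zlib.compress(raw, 9)      # compress before writing to disk
assert len(compressed) < len(raw)       # repetitive state shrinks a lot

# Loading reverses the two steps.
restored = pickle.loads(zlib.decompress(compressed))
assert restored == state
```

This is transparent to the rest of the code (the round-trip returns an equal object); the trade-off is extra CPU time on save and load, which is one reason a faster format like Cap'n Proto is the longer-term fix.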