Re: htm.java config questions

cogmission (David Ray) Wed, 04 Nov 2015 22:26:54 -0800

Hi Takenori,

Running a swarm is always an option. Can you give me push rights to your
repo and check in some (small) example of data so I can have something to
run, and I'll take a look? I'll see if I can get it up and running and push
it back to your repo...


Cheers,
David

On Wed, Nov 4, 2015 at 10:30 PM, Takenori Sato <[email protected]> wrote:

> Hi David, thanks for your answers!
>
> I tried a few things, such as adding a SpatialPooler and changing n/w, but no luck.
>
> Perhaps I should run swarming in python against my data,
> and study the configuration produced.
>
> - Takenori
>
> On Thu, Nov 5, 2015 at 3:44 AM, cogmission (David Ray) <
> [email protected]> wrote:
>
>> Hi Takenori,
>>
>> You might think this is weird (I know I do), but as I am basically just
>> one person writing and supporting HTM.java (with some appreciated help from
>> community members from time to time), I haven't really had the time to
>> **use** NuPIC. Therefore the scope of the questions I can faithfully answer
>> is limited to setting up and using the code, together with any Java-related
>> questions. For NuPIC configuration that affects the performance of the HTM
>> (such as DateEncoder parameters, the sizes of W and N, and the actual
>> parameter settings), anyone who has used NuPIC and struggled through that
>> learning curve can answer you.
>>
>> The default parameters used are those that were in the Python network
>> examples and settings that I have been told are "decent" when asking for
>> help myself. NuPIC parameters are not easy, and require knowledge of the
>> "rules of thumb" (typical rules for usage). For instance, W should be an
>> odd number for reasons having to do with finding the "center" of a series
>> of bits. Also, if you read the class documentation in Encoder.java or
>> base.py (the abstract base encoder for the Python version), you will
>> see some discussion of N and W and how they relate to each other.
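To make the relationship between N and W concrete, here is a minimal sketch of a non-periodic scalar encoding in plain Java. It is not the HTM.java ScalarEncoder: the class SimpleScalarEncoder and its method names are made up for illustration, and it assumes the usual convention that a value is mapped to one of n - w + 1 buckets and represented by w contiguous on-bits. An odd w gives each bucket a single center bit.

```java
import java.util.Arrays;

/** Simplified, non-periodic scalar encoding. Hypothetical; NOT the HTM.java ScalarEncoder. */
final class SimpleScalarEncoder {
    private final int n;      // total number of bits in the output
    private final int w;      // number of contiguous on-bits (odd, so there is a center bit)
    private final double min; // smallest expected input value
    private final double max; // largest expected input value

    SimpleScalarEncoder(int n, int w, double min, double max) {
        if (w <= 0 || w % 2 == 0) throw new IllegalArgumentException("w must be a positive odd number");
        if (n <= w) throw new IllegalArgumentException("n must be greater than w");
        if (max <= min) throw new IllegalArgumentException("max must be greater than min");
        this.n = n;
        this.w = w;
        this.min = min;
        this.max = max;
    }

    /** Width of the value range covered by one bucket. */
    double resolution() {
        return (max - min) / (n - w);
    }

    /** Index of the first on-bit for a value; values outside [min, max] are clipped. */
    int bucketIndex(double value) {
        double clipped = Math.max(min, Math.min(max, value));
        int idx = (int) ((clipped - min + resolution() / 2) / resolution());
        return Math.min(idx, n - w);
    }

    /** Position of the middle on-bit; well defined because w is odd. */
    int centerBit(double value) {
        return bucketIndex(value) + (w - 1) / 2;
    }

    /** Dense 0/1 array with w contiguous on-bits starting at the bucket index. */
    int[] encode(double value) {
        int[] bits = new int[n];
        int start = bucketIndex(value);
        Arrays.fill(bits, start, start + w, 1);
        return bits;
    }

    public static void main(String[] args) {
        // n, w and range taken from the question quoted further down (durations in microseconds).
        SimpleScalarEncoder e = new SimpleScalarEncoder(50, 21, 0, 10_000_000);
        System.out.println("resolution    = " + e.resolution());
        System.out.println("bucket(1 ms)  = " + e.bucketIndex(1_000));
        System.out.println("bucket(50 ms) = " + e.bucketIndex(50_000));
        System.out.println("bucket(1 s)   = " + e.bucketIndex(1_000_000));
    }
}
```

Under these assumptions, n = 50 and w = 21 over a range of 0 to 10,000,000 leave only 30 buckets, each roughly 345,000 microseconds wide, so a 1 ms and a 50 ms response both fall into bucket 0 and look identical to the rest of the network. Whether HTM.java derives its buckets in exactly this way is something to confirm against the Encoder.java documentation.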
>>
>> In general, the difference between the ScalarEncoder and the
>> RandomDistributedScalarEncoder is that the ScalarEncoder is a bit more
>> efficient but requires prior knowledge of the min and max values in your
>> expected dataset. The RDSE can be used without prior knowledge of the
>> bounds and so is a nice alternative for unknown data. Most people just use
>> the RDSE.
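The practical difference can be sketched in a few lines of plain Java. This covers bucket selection only and is not the HTM.java implementation: the class RdseBucketSketch is hypothetical, and it assumes the RDSE-style approach of choosing a bucket from a fixed resolution and an offset taken from the first value seen, rather than from a min/max range. The real encoder then maps each bucket to a pseudo-random set of bits so that neighbouring buckets overlap, which is left out here.

```java
/** Bucket selection in the style of an RDSE. Hypothetical; NOT the HTM.java implementation. */
final class RdseBucketSketch {
    private final double resolution; // width of one bucket, in input units
    private Double offset;           // first value seen; becomes the center of bucket 0

    RdseBucketSketch(double resolution) {
        if (resolution <= 0) throw new IllegalArgumentException("resolution must be positive");
        this.resolution = resolution;
    }

    /** Bucket index relative to the first value seen; can be negative, needs no min/max. */
    long bucketIndex(double value) {
        if (offset == null) {
            offset = value; // no prior knowledge of the data range is required
        }
        return Math.round((value - offset) / resolution);
    }

    public static void main(String[] args) {
        RdseBucketSketch rdse = new RdseBucketSketch(100.0); // 100 microseconds per bucket
        System.out.println(rdse.bucketIndex(500));       // first value defines the offset
        System.out.println(rdse.bucketIndex(740));       // a bit slower
        System.out.println(rdse.bucketIndex(100));       // faster than the first value
        System.out.println(rdse.bucketIndex(1_000_000)); // far outside anything seen so far
    }
}
```

Because the bucket width is fixed by the resolution, small differences between typical values stay distinguishable no matter how large the occasional outlier is.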
>>
>> Here's a video that discusses the RDSE:
>> https://www.youtube.com/watch?v=_q5W2Ov6C9E
>>
>> The DateEncoder class Javadoc, and the class file itself (together with
>> DateEncoderTest.java), have lots of documentation in them which illustrate
>> their usage. Basically, a DateEncoder is a compound encoder that has
>> ScalarEncoders inside it which handle different aspects of the date
>> mechanism being used.
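As a rough illustration of the "compound encoder" idea, the sketch below derives the two scalar values that a time-of-day and a day-of-week sub-encoder would receive from a timestamp. It uses only java.time; the class DateFieldsSketch and its methods are hypothetical and are not part of HTM.java. It assumes the NuPIC convention of hours since midnight and Monday = 0. As far as I understand the DateEncoder documentation, the Tuple given for each DATEFIELD is (width, radius), i.e. the number of on-bits and the radius in the field's natural unit (hours or days), but please verify that against the Javadoc.

```java
import java.time.LocalDateTime;

/** Scalar values a date encoder's sub-encoders could work on. Hypothetical sketch. */
final class DateFieldsSketch {

    /** Hours since midnight as a fraction, 0.0 <= result < 24.0. */
    static double timeOfDay(LocalDateTime t) {
        return t.getHour() + t.getMinute() / 60.0;
    }

    /** Monday = 0 ... Sunday = 6, plus the elapsed fraction of the current day. */
    static double dayOfWeek(LocalDateTime t) {
        int mondayBased = t.getDayOfWeek().getValue() - 1; // java.time uses Monday = 1
        return mondayBased + timeOfDay(t) / 24.0;
    }

    public static void main(String[] args) {
        LocalDateTime t = LocalDateTime.of(2015, 11, 4, 22, 30); // a Wednesday evening
        System.out.println("timeOfDay = " + timeOfDay(t));
        System.out.println("dayOfWeek = " + dayOfWeek(t));
    }
}
```

Each of these values would then go through its own scalar encoding, and the resulting bit arrays are concatenated into one output.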
>>
>> The SpatialPooler is an integral part of the HTM - you usually want that.
>> The only time it has been "skipped" is when you insert an encoding
>> scheme of your own and want to preserve the input format. But that is
>> an extreme corner case; I would advise using one in your code.
>>
>> Don't worry about multiple regions and layers. The capacity to have
>> multiple regions and layers exists for those who need extra flexibility.
>> The ability to assemble Network hierarchies is mostly a "space saver" for
>> when HTM Hierarchy code is released by Numenta in the future. The "modes"
>> shown in the HotGym Demo are just there for demonstration purposes and
>> really there is no internal concept of "Mode" inside the Network hierarchy.
>> Again, the Mode in the demo is just a switch that tells the demo to set up
>> different hierarchy styles, to show that the output is the same regardless
>> of the number of hierarchical components used to funnel data through.
>>
>> I hope this helps. You can ask Numenta engineers for rules of thumb
>> regarding the individual Parameter settings.
>>
>> Cheers,
>> David
>>
>> On Wed, Nov 4, 2015 at 9:45 AM, Takenori Sato <[email protected]> wrote:
>>
>>> Hi NuPIC community and David,
>>>
>>> I have some questions about how to configure my network with htm.java.
>>>
>>> My use case is to let HTM detect an unexpectedly high load on a server
>>> through PING response times. But so far, it produces 0.0 for almost any
>>> input. Sometimes it returns other values, but they are not reasonable at
>>> all.
>>>
>>> The biggest problem is that I am not at all sure about my
>>> configuration, so I strongly suspect it is far from correct.
>>>
>>> For your reference, you can see my code here:
>>>
>>> CloudSonar project <https://github.com/ggsato/CloudSonar>
>>> HTMAnomalyDetector
>>> <https://github.com/ggsato/CloudSonar/blob/master/src/com/cloudian/analytics/HTMAnomalyDetector.java>
>>>
>>> My network configuration is based on (or rather, copied and pasted from)
>>> NetworkDemoHarness. I modified it slightly where I believe I understood it.
>>>
>>> Here're my questions.
>>>
>>> *1. Parameters#getAllDefaultParameters*
>>>
>>> private static Network createNetwork(Sensor<ObservableSensor<String>> sensor) {
>>>     Parameters p = buildParams();
>>>     p = p.union(buildEncoderParams());
>>>     return Network.create("CloudSonar", p)
>>>         .add(Network.createRegion("Region")
>>>             .add(Network.createLayer("Layer", p)
>>>                 .alterParameter(KEY.AUTO_CLASSIFY, Boolean.TRUE)
>>>                 .add(Anomaly.create())
>>>                 .add(new TemporalMemory())
>>>                 .add(sensor)));
>>> }
>>>
>>> private static Parameters buildParams() {
>>>     return Parameters.getAllDefaultParameters(); // <== THIS ONE
>>> }
>>>
>>> NetworkDemoHarness#getParameters confused me with its many parameters, so I
>>> used only the defaults without overriding anything. Can I start
>>> like this?
>>>
>>> Also, are there any resources to learn about those parameters?
>>>
>>> *2. Encoders*
>>>
>>> My inputs are [timestamps, duration_in_micro_sec].
>>>
>>> private static String generateCSVInput(PollingJob job) {
>>>     StringBuffer sb = new StringBuffer();
>>>     sb.append(FULL_DATE_FORMAT.format(new Date())); // <== TIMESTAMP
>>>     sb.append(CSVUpdateHandler.DELIM);
>>>     sb.append(TimeUnit.MICROSECONDS.convert(job.pollingStatus.duration(),
>>>             TimeUnit.NANOSECONDS)); // <== DURATION
>>>     return sb.toString();
>>> }
>>>
>>> I borrowed the config from NetworkDemoHarness#getHotGymFieldEncodingMap
>>> and getNetworkDemoFieldEncodingMap (I noticed I had mixed them up). Then I
>>> modified the marked parts:
>>>
>>>     public static Map<String, Map<String, Object>> getNetworkFieldEncodingMap() {
>>>         Map<String, Map<String, Object>> fieldEncodings = setupMap(
>>>                 null,
>>>                 0, // n
>>>                 0, // w
>>>                 0, 0, 0, 0, null, null, null,
>>>                 "timestamp", "datetime", "DateEncoder");
>>>         fieldEncodings = setupMap(
>>>                 fieldEncodings,
>>>                 50,
>>>                 21,
>>>                 0, 10000000, 0, 0.1, null, Boolean.TRUE, null, // <== modified: 0 ~ 10 sec
>>>                 CLASSFIER_FIELD, "int", "ScalarEncoder");
>>>
>>>         fieldEncodings.get("timestamp").put(
>>>                 KEY.DATEFIELD_DOFW.getFieldName(), new Tuple(1, 1.0)); // Day of week
>>>         fieldEncodings.get("timestamp").put(
>>>                 KEY.DATEFIELD_TOFD.getFieldName(), new Tuple(5, 4.0)); // Time of day
>>>         fieldEncodings.get("timestamp").put(
>>>                 KEY.DATEFIELD_PATTERN.getFieldName(), FULL_DATE); // <== modified
>>>
>>>         return fieldEncodings;
>>>     }
>>>
>>> Why are all the params of DateEncoder 0 or null?
>>>
>>> What is the difference between ScalarEncoder
>>> and RandomDistributedScalarEncoder?
>>>
>>> I happened to use the larger n and w from
>>> getNetworkDemoFieldEncodingMap. Compared to the HotGym demo, my durations are
>>> much larger than the consumption values, so a larger n makes sense, but should
>>> I have set a lower w, like 6?
>>>
>>> I wasn't able to find information on how to set the DATEFIELD parameters.
>>> PATTERN was obvious, but the other two remain unclear. In particular, what
>>> is the Tuple, and what do those numbers mean?
>>>
>>> *3. SpatialPooler*
>>>
>>> NetworkAPIDemo uses SpatialPooler in every network. But it should be
>>> related to spatial inputs, correct? So I dropped it from my network
>>> configuration. I have read the JavaDoc, but got no clue. What is it for?
>>>
>>> *4. Multiple Regions and Layers*
>>>
>>> I wasn't able to understand the difference between those 3 modes in
>>> NetworkAPIDemo. I understand MULTILAYER uses multiple layers, and
>>> MULTIREGION uses multiple regions. But when to use which mode in practice?
>>>
>>>
>>> I asked a lot of stupid questions, but overall I was impressed by how
>>> easy the design is to understand and how easy it is to integrate htm.java
>>> into my own application!!
>>>
>>> Thanks,
>>> Takenori
>>>
>>
>>
>>
>> --
>> *With kind regards,*
>>
>> David Ray
>> Java Solutions Architect
>>
>> *Cortical.io <http://cortical.io/>*
>> Sponsor of:  HTM.java <https://github.com/numenta/htm.java>
>>
>> [email protected]
>> http://cortical.io
>>
>
>


-- 
*With kind regards,*

David Ray
Java Solutions Architect

*Cortical.io <http://cortical.io/>*
Sponsor of:  HTM.java <https://github.com/numenta/htm.java>

[email protected]
http://cortical.io
