Hi Takenori,

Running a swarm is always an option. Can you give me push rights to your repo and check in a small data sample so I have something to run? I'll take a look, see if I can get it up and running, and push it back to your repo.

Cheers,
David

On Wed, Nov 4, 2015 at 10:30 PM, Takenori Sato <[email protected]> wrote:

> Hi David, thanks for your answers!
>
> I tried a few things, like adding a SpatialPooler and changing n/w, but no luck.
>
> Perhaps I should run swarming in Python against my data
> and study the configuration it produces.
>
> - Takenori
>
> On Thu, Nov 5, 2015 at 3:44 AM, cogmission (David Ray) <
> [email protected]> wrote:
>
>> Hi Takenori,
>>
>> You might think this is weird (I know I do), but as I am basically just
>> one person writing and supporting HTM.java (with some appreciated help from
>> community members from time to time), I haven't really had the time to
>> **use** NuPIC. Therefore the questions I can faithfully answer are those
>> specific to setting up and using the code, together with any Java-related
>> questions. Questions about NuPIC configurations that affect the performance
>> of the HTM (like DateEncoder parameters, the size of W and N, and actual
>> parameter settings) can be answered by anyone who has used NuPIC and
>> struggled with that learning curve.
>>
>> The default parameters used are those that were in the Python network
>> examples, plus settings that I have been told are "decent" when asking for
>> help myself. NuPIC parameters are not easy, and require knowledge of the
>> "rules of thumb" (typical rules for usage). For instance, W should be an
>> odd number for reasons having to do with finding the "center" of a series
>> of bits. Also, if you read the class documentation in Encoder.java or
>> base.py (the abstract base encoder for the Python version), you will
>> see some discussion of N and W and how they relate to each other.
>>
>> In general, the difference between the ScalarEncoder and the
>> RandomDistributedScalarEncoder is that the ScalarEncoder is a bit more
>> efficient but requires prior knowledge of the min and max values in your
>> expected dataset.
>> The RDSE can be used without prior knowledge of the
>> bounds and so is a nice alternative for unknown data. Most people just use
>> the RDSE.
>>
>> Here's a video that discusses the RDSE:
>> https://www.youtube.com/watch?v=_q5W2Ov6C9E
>>
>> The DateEncoder class Javadoc, and the class file itself (together with
>> DateEncoderTest.java), have lots of documentation in them that illustrates
>> its usage. Basically, a DateEncoder is a compound encoder that has
>> ScalarEncoders inside it, which handle different aspects of the date
>> mechanism being used.
>>
>> The SpatialPooler is an integral part of the HTM - you usually want that.
>> The only time it has been "skipped" is when you insert an encoding
>> scheme of your own and want to preserve the input format. But that is
>> an extreme corner case; I would advise using one in your code.
>>
>> Don't worry about multiple regions and layers. The capacity to have
>> multiple regions and layers exists for those who need extra flexibility.
>> The ability to assemble Network hierarchies is mostly a "space saver" for
>> when HTM Hierarchy code is released by Numenta in the future. The "modes"
>> shown in the HotGym Demo are just there for demonstration purposes, and
>> there really is no internal concept of "Mode" inside the Network hierarchy.
>> Again, the Mode in the demo is just a switch that instructs the demo to set
>> up different hierarchy styles to show that the output is the same regardless
>> of the number of hierarchical components used to funnel data through.
>>
>> I hope this helps. You can ask Numenta engineers for rules of thumb
>> regarding the individual Parameter settings.
>>
>> Cheers,
>> David
>>
>> On Wed, Nov 4, 2015 at 9:45 AM, Takenori Sato <[email protected]> wrote:
>>
>>> Hi NuPIC community and David,
>>>
>>> I have some questions about how to configure my network with htm.java.
>>>
>>> My use case is to let HTM detect an unexpected high load on a server
>>> through PING response times.
>>> But so far, it produces 0.0 for almost any
>>> input. Sometimes it returns a value, but one that is not reasonable at
>>> all.
>>>
>>> The biggest problem is that I am not sure at all about my
>>> configurations, so I strongly suspect they are far from correct.
>>>
>>> For your reference, you can see my code here:
>>>
>>> CloudSonar project <https://github.com/ggsato/CloudSonar>
>>> HTMAnomalyDetector
>>> <https://github.com/ggsato/CloudSonar/blob/master/src/com/cloudian/analytics/HTMAnomalyDetector.java>
>>>
>>> My network configurations are based on (or rather, copied and pasted from)
>>> NetworkDemoHarness. They are modified slightly where I believe I understand.
>>>
>>> Here are my questions.
>>>
>>> *1. Parameters#getAllDefaultParameters*
>>>
>>> private static Network createNetwork(Sensor<ObservableSensor<String>> sensor) {
>>>     *Parameters p = buildParams();*
>>>     p = p.union(buildEncoderParams());
>>>     return Network.create("CloudSonar", p)
>>>         .add(Network.createRegion("Region")
>>>             .add(Network.createLayer("Layer", p)
>>>                 .alterParameter(KEY.AUTO_CLASSIFY, Boolean.TRUE)
>>>                 .add(Anomaly.create())
>>>                 .add(new TemporalMemory())
>>>                 .add(sensor)
>>>             )
>>>         );
>>> }
>>>
>>> private static Parameters buildParams() {
>>>     return *Parameters.getAllDefaultParameters(); <== THIS ONE*
>>> }
>>>
>>> NetworkDemoHarness#getParameters confused me with its many parameters. So I
>>> picked up only the default ones without overriding anything. Can I start
>>> like this?
>>>
>>> Also, are there any resources to learn about those parameters?
>>>
>>> *2. Encoders*
>>>
>>> My inputs are [timestamps, duration_in_micro_sec].
>>>
>>> private static String generateCSVInput(PollingJob job) {
>>>     StringBuffer sb = new StringBuffer();
>>>     sb.append(FULL_DATE_FORMAT.format(new Date())); *<== TIMESTAMP*
>>>     sb.append(CSVUpdateHandler.DELIM);
>>>     sb.append(TimeUnit.MICROSECONDS.convert(job.pollingStatus.duration(),
>>>         TimeUnit.NANOSECONDS)); *<== DURATION*
>>>     return sb.toString();
>>> }
>>>
>>> I borrowed the config from NetworkDemoHarness#getHotGymFieldEncodingMap
>>> and getNetworkDemoFieldEncodingMap (I noticed I mixed them up). Then I
>>> modified the highlighted parts:
>>>
>>> public static Map<String, Map<String, Object>> getNetworkFieldEncodingMap() {
>>>     Map<String, Map<String, Object>> fieldEncodings = setupMap(
>>>         null,
>>>         0, // n
>>>         0, // w
>>>         0, 0, 0, 0, null, null, null,
>>>         "timestamp", "datetime", "DateEncoder");
>>>     fieldEncodings = setupMap(
>>>         fieldEncodings,
>>>         50,
>>>         21,
>>>         0, *10000000*, 0, 0.1, null, Boolean.TRUE, null, *<== 0 ~ 10 sec*
>>>         CLASSFIER_FIELD, "int", "ScalarEncoder");
>>>
>>>     fieldEncodings.get("timestamp").put(KEY.DATEFIELD_DOFW.getFieldName(),
>>>         new Tuple(1, 1.0)); // Day of week
>>>     fieldEncodings.get("timestamp").put(KEY.DATEFIELD_TOFD.getFieldName(),
>>>         new Tuple(5, 4.0)); // Time of day
>>>     fieldEncodings.get("timestamp").put(KEY.DATEFIELD_PATTERN.getFieldName(),
>>>         *FULL_DATE*);
>>>
>>>     return fieldEncodings;
>>> }
>>>
>>> Why are all the params of DateEncoder 0 or null?
>>>
>>> What is the difference between ScalarEncoder
>>> and RandomDistributedScalarEncoder?
>>>
>>> I happened to use the larger n and w used
>>> by getNetworkDemoFieldEncodingMap. Compared to the HotGym demo, my durations
>>> are much larger than the consumption values. So a larger n makes sense, but
>>> should I have set a lower w, like 6?
>>>
>>> I wasn't able to find information on how to set those DATEFIELD parameters.
>>> PATTERN was obvious, but the other two remained unclear. In particular, what
>>> is the Tuple, and what do those numbers mean?
>>>
>>> *3. SpatialPooler*
>>>
>>> NetworkAPIDemo uses SpatialPooler in every network. But it should be
>>> related to spatial inputs, correct? So I dropped it from my network
>>> configuration. I have read the JavaDoc, but got no clue. What is it for?
>>>
>>> *4. Multiple Regions and Layers*
>>>
>>> I wasn't able to understand the difference between those 3 modes in
>>> NetworkAPIDemo. I understand MULTILAYER uses multiple layers, and
>>> MULTIREGION uses multiple regions. But when should each mode be used in
>>> practice?
>>>
>>>
>>> I asked all of these stupid questions, but overall I was impressed
>>> that the design is easy to understand and that htm.java is easy to
>>> integrate into my own application!!
>>>
>>> Thanks,
>>> Takenori
>>>
>>
>> --
>> *With kind regards,*
>>
>> David Ray
>> Java Solutions Architect
>>
>> *Cortical.io <http://cortical.io/>*
>> Sponsor of: HTM.java <https://github.com/numenta/htm.java>
>>
>> [email protected]
>> http://cortical.io
>>
>

--
*With kind regards,*

David Ray
Java Solutions Architect

*Cortical.io <http://cortical.io/>*
Sponsor of: HTM.java <https://github.com/numenta/htm.java>

[email protected]
http://cortical.io
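
On the n and w question for the duration field in the thread above, one rough way to reason about a given n, w, min and max is to work out how wide a slice of the value range ends up sharing the same bits. The sketch below is a simplified contiguous-bit scalar encoder written for illustration only; it is not HTM.java's ScalarEncoder. The class name `ScalarEncoderSketch`, the bucket width of `(max - min) / (n - w)`, the sample ping values and the 500,000 microsecond ceiling are all assumptions. Under those assumptions, with n = 50, w = 21 and a range of 0 to 10,000,000 microseconds, a 1 ms ping and a 200 ms ping produce identical encodings.

```java
import java.util.Arrays;

/**
 * Simplified contiguous-bit scalar encoder (hypothetical, for illustration).
 * It only mimics the general idea of n total bits with w adjacent active bits.
 */
final class ScalarEncoderSketch {
    private final int n;      // total number of bits in the encoding
    private final int w;      // number of contiguous active bits
    private final double min; // smallest expected input value
    private final double max; // largest expected input value

    ScalarEncoderSketch(int n, int w, double min, double max) {
        if (w <= 0 || n <= w || max <= min) {
            throw new IllegalArgumentException("need 0 < w < n and min < max");
        }
        this.n = n;
        this.w = w;
        this.min = min;
        this.max = max;
    }

    /** Assumed width of the value range that maps to one start position. */
    double bucketWidth() {
        return (max - min) / (n - w);
    }

    /** Index of the first active bit; inputs are clipped to [min, max]. */
    int firstActiveBit(double value) {
        double clipped = Math.max(min, Math.min(max, value));
        int index = (int) Math.floor((clipped - min) / bucketWidth());
        return Math.min(index, n - w); // keep the active run inside the array
    }

    /** Returns n bits with w contiguous bits set, starting at firstActiveBit. */
    boolean[] encode(double value) {
        boolean[] bits = new boolean[n];
        int start = firstActiveBit(value);
        Arrays.fill(bits, start, start + w, true);
        return bits;
    }

    public static void main(String[] args) {
        // Same n and w as in the thread; the second range is a hypothetical alternative.
        ScalarEncoderSketch wide = new ScalarEncoderSketch(50, 21, 0, 10_000_000);
        ScalarEncoderSketch narrow = new ScalarEncoderSketch(50, 21, 0, 500_000);

        double fastPing = 1_000;   // 1 ms in microseconds
        double slowPing = 200_000; // 200 ms in microseconds

        // Wide range: both pings fall into the first bucket, so the bits match.
        System.out.println(Arrays.equals(wide.encode(fastPing), wide.encode(slowPing)));     // true
        // Narrow range: the pings land in different buckets, so the bits differ.
        System.out.println(Arrays.equals(narrow.encode(fastPing), narrow.encode(slowPing))); // false
    }
}
```

If the real encoder behaves similarly, then narrowing the range to the response times actually expected, raising n, or trying the RDSE mentioned earlier in the thread would all be things to experiment with. Which values work still depends on the data.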
