Dear Subutai,
I ran the large swarm and got an error of 0.8157 with --maxWorkers=5. I also
returned to my previous medium swarm and got an error of 1.2114 with
--maxWorkers=5, compared to my previous result of 1.5709 with --maxWorkers=2,
so the result seems to depend on maxWorkers, which is strange!
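One possible mechanism, and this is purely my speculation rather than anything I have checked in the NuPIC source: if each worker derives its random stream from a shared base seed plus its worker index, then the stream a given candidate model sees depends on how the models are split across workers. A toy sketch of that effect:

```python
# Toy illustration (not NuPIC code): if model i runs on worker
# i % n_workers and its effective seed mixes the base seed, the worker
# index, and its position in that worker's queue, then the per-model
# seeds change whenever the worker count changes.
def model_seeds(base_seed, n_models, n_workers):
    return [(base_seed, i % n_workers, i // n_workers)
            for i in range(n_models)]

# Same base seed, same models, different worker counts:
print(model_seeds(42, 4, 2) == model_seeds(42, 4, 5))  # False
```

If something like this is going on, identical swarms run with different --maxWorkers values would legitimately produce different errors.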
So my best result of 0.8157 (for the large swarm) is slightly better than
what you reported on GitHub for the medium swarm (0.886). For this large
swarm run, I got the following field contributions:
Field Contributions:
{ u'metric1': 0.0,
  u'metric2': 41.345,
  u'metric3': -49.007,
  u'metric4': -160.548,
  u'metric5': -147.54}
which is rather different from the field contributions you reported, though
with the same general trend.
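To make the comparison concrete, here is a small helper (just a sketch; the two dicts below are the contributions from my run above and the ones from your README, rounded):

```python
# Field contributions from the two runs discussed in this thread.
mine = {'metric1': 0.0, 'metric2': 41.345, 'metric3': -49.007,
        'metric4': -160.548, 'metric5': -147.54}
yours = {'metric1': 0.0, 'metric2': 54.629, 'metric3': -23.712,
         'metric4': -91.682, 'metric5': -25.516}

def contribution_diffs(a, b):
    """Per-field difference (a - b), in percentage points."""
    return {f: round(a[f] - b[f], 3) for f in a if f in b}

for field, diff in sorted(contribution_diffs(mine, yours).items()):
    print(field, diff)
```

Every field except metric1 differs by tens of percentage points, which is what makes me suspect something more than run-to-run noise.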
Could you please run a large swarm using the latest NuPIC and see whether you
get the same result as above?
John.
On Tue, Oct 21, 2014 at 5:30 PM, Subutai Ahmad <[email protected]> wrote:
>
> When I ran the medium swarm experiments on Saturday I did see variation,
> but I didn't get down to the error levels I reported earlier. I think
> your point is a good one - there could be differences due to randomness
> particularly for medium swarms. I believe we made some changes to the way
> the SP is randomly initialized, so that could account for the difference.
> It would be interesting if someone tried running a number of experiments
> and just varied the random seed.
>
> Note that we are talking about pretty small differences here: 1.5% error vs.
> 0.8% error. If you are really concerned about achieving the very best
> error rate, I would definitely recommend using large swarms.
>
> My experiments were with NuPIC. (Grok is just a client of NuPIC - there
> are no algorithm differences between the two.)
>
> --Subutai
>
> On Mon, Oct 20, 2014 at 6:05 AM, John Blackburn <
> [email protected]> wrote:
>
>> Thanks, Subutai, I am running the large swarm now. However, I did run the
>> medium swarm twice in succession and got exactly the same answer, an error
>> of 1.58, so I'm not sure randomness can account for this. There is a
>> factor-of-2 difference in accuracy between my run and yours, both of which
>> were "medium" and in every other way identical. So IMO it would be worrying
>> if NuPIC is so dependent on the random seed chosen; that would suggest the
>> swarm has not really explored all the possibilities. Well, maybe this is
>> just a case where a "large" swarm is essential and smaller swarms are kind
>> of meaningless?
>>
>> Did you consider the possibility that the difference is due to NuPIC vs.
>> Grok? Did you try running your swarm file with NuPIC?
>>
>> John.
>>
>> On Sun, Oct 19, 2014 at 3:58 AM, Subutai Ahmad <[email protected]>
>> wrote:
>>
>>> Hi John,
>>>
>>> I think I figured this out. I ran the swarm a number of times. There is
>>> some randomness in the swarming process so I do see different results from
>>> run to run. Because of the way the swarm parameters are set, the variation
>>> for a medium swarm is higher than for a large swarm. I ran a couple of
>>> large swarms and got more consistent results. I ended up with a 0.5761
>>> error rate which is even better than what I had before. The main downside
>>> is that a large swarm takes longer to run (24 mins vs 6 mins in this
>>> example).
>>>
>>> Could you try running that same example with swarmSize set to "large"?
>>>
>>> Thanks,
>>>
>>> --Subutai
>>>
>>> On Wed, Oct 8, 2014 at 10:28 AM, John Blackburn <
>>> [email protected]> wrote:
>>>
>>>> Dear Subutai
>>>>
>>>> I tried to run your "multiple fields example 1" from
>>>>
>>>> https://github.com/subutai/nupic.subutai/tree/master/swarm_examples
>>>>
>>>> I ran the command
>>>>
>>>> run_swarm.py multi1_search_def.json --overwrite --maxWorkers 5
>>>>
>>>> using the supplied JSON file and "run_swarm.py" from the "scripts"
>>>> directory. I got the result:
>>>>
>>>> Field Contributions:
>>>> { u'metric1': 0.0,
>>>> u'metric2': 20.0598347434741,
>>>> u'metric3': -63.85677190034707,
>>>> u'metric4': -157.77883953004587,
>>>> u'metric5': -153.23706619032606}
>>>>
>>>> Best results on the optimization metric
>>>> multiStepBestPredictions:multiStep:errorMetric='altMAPE':steps=[1]:window=1000:field=metric1
>>>> (maximize=False):
>>>> [41] Experiment _NupicModelInfo(jobID=1062, modelID=4815,
>>>> status=completed, completionReason=eof, updateCounter=22, numRecords=1500)
>>>> (modelParams|clParams|alpha_0.055045.modelParams|tpParams|minThreshold_11.modelParams|tpParams|activationThreshold_14.modelParams|tpParams|pamLength_3.modelParams|sensorParams|encoders|metric2:n_296.modelParams|sensorParams|encoders|metric1:n_307.modelParams|spParams|synPermInactiveDec_0.055135):
>>>>
>>>> multiStepBestPredictions:multiStep:errorMetric='altMAPE':steps=[1]:window=1000:field=metric1:
>>>> 1.57090277774
>>>>
>>>> So the error was only slightly improved, to 1.57 (altMAPE), compared to
>>>> the "basic swarm with one field" example.
>>>>
>>>> Now in the readme file, you stated you got the result:
>>>>
>>>> Best results on the optimization metric
>>>> multiStepBestPredictions:multiStep:errorMetric='altMAPE':steps=[1]:window=1000:field=metric1
>>>> (maximize=False): [52] Experiment _GrokModelInfo(jobID=1161, modelID=23650,
>>>> status=completed, completionReason=eof, updateCounter=22, numRecords=1500)
>>>> (modelParams|clParams|alpha_0.0248715879513.modelParams|tpParams|minThreshold_10.modelParams|tpParams|activationThreshold_13.modelParams|tpParams|pamLength_2.modelParams|sensorParams|encoders|metric2:n_271.modelParams|sensorParams|encoders|metric1:n_392.modelParams|spParams|synPermInactiveDec_0.0727958344423):
>>>> multiStepBestPredictions:multiStep:errorMetric='altMAPE':steps=[1]:window=1000:field=metric1:
>>>> 0.886040768868
>>>>
>>>> Field Contributions:
>>>> { u'metric1': 0.0,
>>>> u'metric2': 54.62889798318686,
>>>> u'metric3': -23.71223053273957,
>>>> u'metric4': -91.68162623355796,
>>>> u'metric5': -25.51553640787998}
>>>>
>>>> which is a considerable improvement, to 0.886 (altMAPE). Note that under
>>>> "Field Contributions" you get a 54.6% improvement from metric2, while in
>>>> my run I only got a 20.06% improvement.
>>>>
>>>> Can we explain this discrepancy? I think I ran your code exactly as
>>>> given. It's important because it shows my NuPIC is not working as well
>>>> with multiple fields as yours is, which is especially relevant for the
>>>> bridge project I keep going on about! I notice your output refers to
>>>> _GrokModelInfo, while mine refers to _NupicModelInfo.
>>>>
>>>> John.
>>>>