When I ran the medium swarm experiments on Saturday I did see variation,
but I didn't get down to the error levels I reported earlier. I think
your point is a good one - there could be differences due to randomness,
particularly for medium swarms. I believe we made some changes to the way
the SP is randomly initialized, so that could account for the difference.
It would be interesting if someone ran a number of experiments, varying
only the random seed.
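As a rough sketch of how such an experiment could be summarized: collect the altMAPE from each repeated run and look at the spread. The error values below are placeholders, not real measurements - you'd substitute the numbers from your own runs.

```python
# Sketch: quantify run-to-run variation across repeated swarm runs.
# The list below is hypothetical; fill in the altMAPE from your own runs.
from statistics import mean, stdev

errors = [1.58, 1.21, 0.89, 1.44, 1.02]  # placeholder altMAPE values

avg = mean(errors)
spread = stdev(errors)
print(f"mean altMAPE = {avg:.3f}, stdev = {spread:.3f}")

# If the gap between two runs falls within roughly two standard
# deviations, it is plausibly seed-to-seed noise rather than a real
# algorithmic difference.
```

With enough repetitions this would show directly whether a 1.58 vs. 0.89 gap is within the normal medium-swarm spread.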

Note that we are talking about pretty small differences here: 1.5% error vs.
0.8% error.  If you are really concerned about achieving the very best
error rate, I would definitely recommend using large swarms.
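For reference, switching swarm sizes is a one-line change in the swarm description. This is just a sketch of that edit - every other key in multi1_search_def.json stays exactly as it is:

```json
{
  "swarmSize": "large"
}
```

The accepted values are "small", "medium", and "large"; only the top-level swarmSize key changes.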

My experiments were with NuPIC.  (Grok is just a client of NuPIC - there
are no algorithm differences between the two.)

--Subutai

On Mon, Oct 20, 2014 at 6:05 AM, John Blackburn <[email protected]>
wrote:

> Thanks, Subutai, I am running the large swarm now. However, I did run the
> medium swarm twice in succession and got exactly the same answer, an error
> of 1.58, so I'm not sure randomness can account for this... There is a
> factor of 2 difference in accuracy between my run and yours, both of which
> were "medium" and in every way identical. So IMO it would be worrying if
> NuPIC is so dependent on the random seed chosen. That would suggest the
> swarm has not really explored all possibilities. Well, maybe it's just a
> case where a "large" swarm is essential and smaller swarms are kind of
> meaningless...?
>
> Did you consider the possibility that the difference is due to NuPIC vs.
> Grok? Did you try running your swarm file with NuPIC?
>
> John.
>
> On Sun, Oct 19, 2014 at 3:58 AM, Subutai Ahmad <[email protected]>
> wrote:
>
>> Hi John,
>>
>> I think I figured this out. I ran the swarm a number of times. There is
>> some randomness in the swarming process so I do see different results from
>> run to run. Because of the way the swarm parameters are set, the variation
>> for a medium swarm is higher than for a large swarm.  I ran a couple of
>> large swarms and got more consistent results.  I ended up with a 0.5761
>> error rate which is even better than what I had before.   The main downside
>> is that a large swarm takes longer to run (24 mins vs 6 mins in this
>> example).
>>
>> Could you try running that same example with swarmSize set to "large"?
>>
>> Thanks,
>>
>> --Subutai
>>
>> On Wed, Oct 8, 2014 at 10:28 AM, John Blackburn <
>> [email protected]> wrote:
>>
>>> Dear Subutai
>>>
>>> I tried to run your "multiple fields example 1" from
>>>
>>> https://github.com/subutai/nupic.subutai/tree/master/swarm_examples
>>>
>>> I ran the command
>>>
>>> run_swarm.py multi1_search_def.json --overwrite --maxWorkers 5
>>>
>>> using the supplied JSON file and "run_swarm.py" from the "scripts"
>>> directory. I got the result:
>>>
>>> Field Contributions:
>>> {   u'metric1': 0.0,
>>>     u'metric2': 20.0598347434741,
>>>     u'metric3': -63.85677190034707,
>>>     u'metric4': -157.77883953004587,
>>>     u'metric5': -153.23706619032606}
>>>
>>> Best results on the optimization metric
>>> multiStepBestPredictions:multiStep:errorMetric='altMAPE':steps=[1]:window=1000:field=metric1
>>> (maximize=False):
>>> [41] Experiment _NupicModelInfo(jobID=1062, modelID=4815,
>>> status=completed, completionReason=eof, updateCounter=22, numRecords=1500)
>>> (modelParams|clParams|alpha_0.055045.modelParams|tpParams|minThreshold_11.modelParams|tpParams|activationThreshold_14.modelParams|tpParams|pamLength_3.modelParams|sensorParams|encoders|metric2:n_296.modelParams|sensorParams|encoders|metric1:n_307.modelParams|spParams|synPermInactiveDec_0.055135):
>>>
>>> multiStepBestPredictions:multiStep:errorMetric='altMAPE':steps=[1]:window=1000:field=metric1:
>>> 1.57090277774
>>>
>>> So the error was only slightly improved to 1.57 (altMAPE) compared to
>>> the "basic swarm with one field"
>>>
>>> Now in the readme file, you stated you got the result:
>>>
>>> Best results on the optimization metric
>>> multiStepBestPredictions:multiStep:errorMetric='altMAPE':steps=[1]:window=1000:field=metric1
>>> (maximize=False): [52] Experiment _GrokModelInfo(jobID=1161, modelID=23650,
>>> status=completed, completionReason=eof, updateCounter=22, numRecords=1500)
>>> (modelParams|clParams|alpha_0.0248715879513.modelParams|tpParams|minThreshold_10.modelParams|tpParams|activationThreshold_13.modelParams|tpParams|pamLength_2.modelParams|sensorParams|encoders|metric2:n_271.modelParams|sensorParams|encoders|metric1:n_392.modelParams|spParams|synPermInactiveDec_0.0727958344423):
>>> multiStepBestPredictions:multiStep:errorMetric='altMAPE':steps=[1]:window=1000:field=metric1:
>>> 0.886040768868
>>>
>>> Field Contributions:
>>> {   u'metric1': 0.0,
>>>     u'metric2': 54.62889798318686,
>>>     u'metric3': -23.71223053273957,
>>>     u'metric4': -91.68162623355796,
>>>     u'metric5': -25.51553640787998}
>>>
>>> This gives a considerable improvement, to 0.886 (altMAPE). Note that in
>>> your "Field Contributions" you get a 54.6% improvement from metric2,
>>> while in my run I only got a 20.05% improvement.
>>>
>>> Can we explain this discrepancy? I think I ran your code exactly. It's
>>> important because it shows my NuPIC is not working as well with multiple
>>> fields as yours, which is especially important for the bridge project I
>>> keep going on about! I notice your output refers to _GrokModelInfo,
>>> while mine refers to _NupicModelInfo.
>>>
>>> John.
>>>
