Do the different sensor value sequences that lead to the trace of ‘5’ look 
similar after the encoding process?
My experience is that:
If the sequences do not “resemble” each other, then the HTM will (of course) 
be able to learn the exact sequences, but it will have difficulty 
“extrapolating” to unseen sequences.
If the encoded sequences look similar, then the HTM will still be “perfect” 
with seen sequences and also be “good” with unseen ones.

What I am trying to show with our semantic encoding approach is that the 
actual encoding process is part of the processing algorithm. If I apply my 
hypothesis to your case, the “way” in which the sequences look similar and the 
“degree” to which they are similar (in that “way”) is proportional to their 
semantic value in the (semantic) universe of all possible sequence 
representations.
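A minimal sketch of the similarity notion described above: two encoded sequences compared by the overlap of their active bits. The SDRs are represented here simply as sets of active bit indices; the names and values are illustrative, not output of any particular encoder.

```python
def overlap(sdr_a, sdr_b):
    """Number of active bits shared by two SDRs (sets of active indices)."""
    return len(set(sdr_a) & set(sdr_b))

seen   = {3, 7, 12, 21, 34, 40}   # encoding of a seen '5' trace
unseen = {3, 7, 12, 22, 35, 40}   # encoding of a slightly different '5'
other  = {1, 9, 15, 28, 36, 44}   # encoding of an unrelated gesture

print(overlap(seen, unseen))  # 4 -> high overlap, shared semantics
print(overlap(seen, other))   # 0 -> no overlap, no shared semantics
```

The higher the overlap between encodings of semantically related inputs, the more predictive state the HTM can reuse when it meets an unseen variant.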

Sorry if this sounds a bit over-abstracted.

Francisco
 
On 29.12.2014, at 18:42, Nicholas Mitri <[email protected]> wrote:

> Generalization in the machine learning sense i.e. to leverage prior knowledge 
> to correctly recognize/classify a novel input. The encoder and SP have 
> quantization functionalities that should in theory make HTM robust to noise 
> and within-class variances. 
> Unfortunately, in the experiments I’ve run so far, whenever HTM is fed a 
> novel input, it can’t reliably classify it, as indicated by the accuracy 
> rates in my previous post. 
> 
>> On Dec 29, 2014, at 7:09 PM, Francisco Webber <[email protected]> wrote:
>> 
>> Ok, I now better understand how you encode the gestures. But I still think 
>> my argument is valid: if you want to do generalization, you need semantics 
>> to generalize on.
>> Maybe I didn’t fully understand what kind of generalization you want to 
>> achieve.
>> 
>> Francisco
>> 
>> On 29.12.2014, at 17:38, Nicholas Mitri <[email protected]> wrote:
>> 
>>> Thanks for your comments Francisco, 
>>> 
>>> I should’ve explained better. Gestures here refer to the shapes drawn by 
>>> the user’s hand as he/she moves a smartphone. The result is a flattened 3D 
>>> trace whose trajectory is estimated using motion sensors. The feature 
>>> vector of every gesture is then a sequence of directions from one control 
>>> vertex of the trace to the next. Think of it as a piecewise linear trace 
>>> represented by discretized directions, e.g. the trace of ‘5’ is 
>>> 2->3->0->3->2 if we’re using 4 directions and start at the top. 
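A toy sketch of this feature extraction, for concreteness. The vertex coordinates and the direction labeling (0 = right, 1 = up, 2 = left, 3 = down) are my own assumptions, not Nick's actual pipeline; with those assumptions a rough '5' reproduces the 2->3->0->3->2 sequence from the example.

```python
import math

def discretize(p, q, n_dirs=4):
    """Map the segment p -> q to the nearest of n_dirs direction bins."""
    angle = math.atan2(q[1] - p[1], q[0] - p[0])  # radians in (-pi, pi]
    step = 2 * math.pi / n_dirs
    return int(round(angle / step)) % n_dirs

# A rough '5', starting at the top right: left, down, right, down, left.
vertices = [(1, 2), (0, 2), (0, 1), (1, 1), (1, 0), (0, 0)]
dirs = [discretize(vertices[i], vertices[i + 1])
        for i in range(len(vertices) - 1)]
print(dirs)  # [2, 3, 0, 3, 2]
```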
>>> 
>>> That’s the data I’m working with so there’s very little semantic depth to 
>>> consider. Encoders here are needed more for their quantization/pooling 
>>> functionality than anything else. 
>>> 
>>> best,
>>> Nick
>>> 
>>>> On Dec 29, 2014, at 6:23 PM, Francisco Webber <[email protected]> wrote:
>>>> 
>>>> Hello Nick,
>>>> What you are trying to do sounds very interesting. My guess is that the 
>>>> poor generalization is due to insufficient semantics being captured during 
>>>> the encoding step. As you might know, we are working in the domain of 
>>>> language processing, where the semantic depth of the SDRs is key. 
>>>> In your case, the semantics of the system are defined by the way a (human) 
>>>> body looks and by its degrees of freedom of movement.
>>>> What you should try to achieve is to capture some of this semantic context in 
>>>> your encoding process. The SDRs representing the body positions (or 
>>>> movements) should be formed in a way that similar positions (gestures) 
>>>> have similar SDRs (many overlapping points). The better you are able to 
>>>> realize this encoding, the better the HTM will be able to generalize.
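One toy way to realize this property for direction data (not NuPIC's actual encoder; bit counts are arbitrary) is a circular encoder in which each direction activates a contiguous, wrapping run of bits, so that neighboring directions share many active bits and distant ones share none:

```python
N_BITS = 32   # total bits in the SDR
W = 8         # active bits per encoding
N_DIRS = 8    # number of discrete directions

def encode_direction(d):
    """Activate W consecutive bits starting at the direction's offset, wrapping."""
    start = d * (N_BITS // N_DIRS)
    return {(start + i) % N_BITS for i in range(W)}

a = encode_direction(0)
b = encode_direction(1)   # neighboring direction
c = encode_direction(4)   # opposite direction

print(len(a & b))  # 4 -> neighbors share half their active bits
print(len(a & c))  # 0 -> opposite directions share none
```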
>>>> In language processing, we were able to create classifiers that needed 
>>>> only 4 example sentences like:
>>>> 
>>>> “Erwin Schrödinger is a physicist.”
>>>> “Marie Curie is a physicist.”
>>>> “Niels Bohr is a physicist.”
>>>> “James Maxwell is a physicist.”
>>>> 
>>>> to give the following response: “Albert Einstein is a” PHYSICIST
>>>> 
>>>> In my experience, measurable similarity among SDRs, encoded to represent 
>>>> similar data, seems to be key for an HTM network to unfold its full power.
>>>> 
>>>> Francisco
>>>> 
>>>> On 29.12.2014, at 16:25, Nicholas Mitri <[email protected]> wrote:
>>>> 
>>>>> Hey Matt, everyone, 
>>>>> 
>>>>> I debugged the code and managed to get some sensible results. HTM is 
>>>>> doing a great job of learning sequences but performing very poorly at 
>>>>> generalization. So while it can recognize a sequence it had learned with 
>>>>> high accuracy, when it’s fed a test sequence that it’s never seen, its 
>>>>> classification accuracy plummets. To be clear, classification here is 
>>>>> performed by assigning an HTM region to each class and observing which 
>>>>> region outputs the lowest anomaly score averaged along a test sequence. 
>>>>> 
>>>>> I’ve tried tweaking the encoder parameters to quantize the input with a 
>>>>> lower resolution in the hope that similar inputs will be better pooled. 
>>>>> That didn’t pan out. Also, changing encoder output length or number of 
>>>>> columns is causing the HTM to output no predictions at times even with a 
>>>>> non-empty active column list. I have little idea why that keeps 
>>>>> happening. 
>>>>> 
>>>>> Any hints as to how to get HTM to better perform here? I’ve included HMM 
>>>>> results for comparison. SVM results are all 95+%.
>>>>> 
>>>>> Thank you,
>>>>> Nick
>>>>> 
>>>>> 
>>>>> HTM Results:
>>>>> 
>>>>> Data = sequence of directions (8 discrete directions)
>>>>> Note on accuracy: M1/M2 denotes two performance metrics. M1 is the 
>>>>> average anomaly; M2 is the sum of the normalized average anomaly and the 
>>>>> normalized prediction error.
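For concreteness, a hypothetical sketch of the two metrics over a 4-class example (the scores are made-up numbers, and "normalized" is assumed here to mean scaled by the maximum across classes):

```python
import numpy as np

avg_anomaly = np.array([0.8, 0.2, 0.6, 0.4])  # per-class average anomaly
pred_error  = np.array([0.9, 0.1, 0.5, 0.7])  # per-class prediction error

m1 = avg_anomaly                                               # M1
m2 = avg_anomaly / avg_anomaly.max() + pred_error / pred_error.max()  # M2

# The predicted class under each metric is the one with the lowest score.
print(int(np.argmin(m1)), int(np.argmin(m2)))  # 1 1
```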
>>>>> 
>>>>> Base training accuracy: 100% at 2 training passes
>>>>> User Dependent: 56.25% / 56.25%
>>>>> User Independent: N/A
>>>>> Mixed: 65.00% / 71.25%
>>>>> 
>>>>> HMM (22-states) Results:
>>>>> 
>>>>> Data = sequence of directions (16 discrete directions)
>>>>> 
>>>>> Base training accuracy: 97.5%
>>>>> User Dependent: 76.25%
>>>>> User Independent: 88.75%
>>>>> Mixed: 88.75%
>>>>> 
>>>>> 
>>>>>> On Dec 11, 2014, at 7:16 PM, Matthew Taylor <[email protected]> wrote:
>>>>>> 
>>>>>> Nicholas, can you paste a sample of the input data file?
>>>>>> 
>>>>>> ---------
>>>>>> Matt Taylor
>>>>>> OS Community Flag-Bearer
>>>>>> Numenta
>>>>>> 
>>>>>> On Thu, Dec 11, 2014 at 7:50 AM, Nicholas Mitri <[email protected]> 
>>>>>> wrote:
>>>>>> Hey all, 
>>>>>> 
>>>>>> I’m running into some trouble with using HTM for a gesture recognition 
>>>>>> application and would appreciate some help. 
>>>>>> First, the data is collected from 17 users performing 5 gestures of each 
>>>>>> of 16 different gesture classes using motion sensors. The feature vector 
>>>>>> for each sample is a sequence of discretized directions calculated using 
>>>>>> Bézier control points after curve fitting the gesture trace. 
>>>>>> 
>>>>>> For a baseline, I fed the data to 16 10-state HMMs for training and 
>>>>>> again for testing. The classification accuracy achieved is 95.7%. 
>>>>>> 
>>>>>> For HTM, I created 16 CLA models using parameters from a medium swarm. I 
>>>>>> ran the data through the models for training where each model is trained 
>>>>>> on only 1 gesture class. For testing, I fed the same data again with 
>>>>>> learning turned off and recorded the anomaly score (averaged across each 
>>>>>> sequence) for each model. Classification was done by seeking the model 
>>>>>> with the minimum anomaly score. Accuracy turned out to be a puzzling 
>>>>>> 0.0%!!
>>>>>> 
>>>>>> Below is the relevant section of the code. I would appreciate any hints. 
>>>>>> Thanks,
>>>>>> Nick
>>>>>> 
>>>>>> def run_experiment():
>>>>>>     print "Running experiment..."
>>>>>> 
>>>>>>     model = [0]*16
>>>>>>     for i in range(0, 16):
>>>>>>         model[i] = ModelFactory.create(model_params, logLevel=0)
>>>>>>         model[i].enableInference({"predictedField": FIELD_NAME})
>>>>>> 
>>>>>>     with open(FILE_PATH, "rb") as f:
>>>>>>         csv_reader = csv.reader(f)
>>>>>>         data = []
>>>>>>         labels = []
>>>>>>         for row in csv_reader:
>>>>>>             r = [int(item) for item in row[:-1]]
>>>>>>             data.append(r)
>>>>>>             labels.append(int(row[-1]))
>>>>>> 
>>>>>>         # data_train, data_test, labels_train, labels_test = cross_validation.train_test_split(data, labels, test_size=0.4, random_state=0)
>>>>>>         data_train = data
>>>>>>         data_test = data
>>>>>>         labels_train = labels
>>>>>>         labels_test = labels
>>>>>> 
>>>>>>     for passes in range(0, TRAINING_PASSES):
>>>>>>         sample = 0
>>>>>>         for (ind, row) in enumerate(data_train):
>>>>>>             for r in row:
>>>>>>                 value = int(r)
>>>>>>                 result = model[labels_train[ind]].run({FIELD_NAME: value, '_learning': True})
>>>>>>                 prediction = result.inferences["multiStepBestPredictions"][1]
>>>>>>                 anomalyScore = result.inferences["anomalyScore"]
>>>>>>             model[labels_train[ind]].resetSequenceStates()
>>>>>>             sample += 1
>>>>>>             print "Processing training sample %i" % sample
>>>>>>             if sample == 100:
>>>>>>                 break
>>>>>> 
>>>>>>     sample = 0
>>>>>>     labels_predicted = []
>>>>>>     for row in data_test:
>>>>>>         anomaly = [0]*16
>>>>>>         for i in range(0, 16):
>>>>>>             model[i].resetSequenceStates()
>>>>>>             for r in row:
>>>>>>                 value = int(r)
>>>>>>                 result = model[i].run({FIELD_NAME: value, '_learning': False})
>>>>>>                 prediction = result.inferences["multiStepBestPredictions"][1]
>>>>>>                 anomalyScore = result.inferences["anomalyScore"]
>>>>>>                 # print value, prediction, anomalyScore
>>>>>>                 if value == int(prediction) and anomalyScore == 0:
>>>>>>                     # print "No prediction made"
>>>>>>                     anomalyScore = 1
>>>>>>                 anomaly[i] += anomalyScore
>>>>>>             anomaly[i] /= len(row)
>>>>>>         sample += 1
>>>>>>         print "Processing testing sample %i" % sample
>>>>>>         labels_predicted.append(np.min(np.array(anomaly)))
>>>>>>         print anomaly, row[-1]
>>>>>>         if sample == 100:
>>>>>>             break
>>>>>> 
>>>>>>     accuracy = np.sum(np.array(labels_predicted) == np.array(labels_test)) * 100.0 / len(labels_test)
>>>>>>     print "Testing accuracy is %0.2f" % accuracy
>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
> 
