Hi! I want to run the folds of a k-fold cross-validation in parallel, using all
6 of my CPUs at once. My model is a hidden Markov model (HMM). In every fold I
want to train it on the training portion of the data, compute the anomaly score
(negative log-likelihood) of each sequence in the test portion, and evaluate
those scores with a ROC curve.
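Written out, the per-fold procedure is roughly the following. T_k and V_k are the train and test indices of fold k, and ROC analysis needs a binary label y_i for each test sequence (1 = anomalous), which is an assumption not spelled out above; P_k and N_k are the anomalous and normal test indices of fold k. The AUC line is the standard rank form of the area under the ROC curve. Nothing in fold k depends on any other fold, so the K fits can run in separate processes.

```latex
\begin{aligned}
\hat{\theta}_k &= \operatorname{fit}\bigl(\{x_i : i \in T_k\}\bigr) \\
s_i &= -\log p\bigl(x_i \mid \hat{\theta}_k\bigr), \quad i \in V_k \\
\mathrm{AUC}_k &= \frac{1}{|P_k|\,|N_k|} \sum_{i \in P_k} \sum_{j \in N_k}
  \Bigl( \mathbb{1}[s_i > s_j] + \tfrac{1}{2}\,\mathbb{1}[s_i = s_j] \Bigr)
\end{aligned}
```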

I found the function cross_validate(), which seems to support running in
parallel with n_jobs=-1.
I assume the estimator would then be my HMM.
At the moment I use pomegranate to train the model and to compute the anomaly
scores of the test sequences.
I don't understand how to call cross_validate with the right arguments for my
HMM. None of the examples I've seen use an HMM. I'm confused about where to
specify the number of hidden states if I'm not calling pomegranate's
from_samples(), which is what I've used so far.
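As far as I know, cross_validate needs a scikit-learn-style estimator rather than the pomegranate model itself: it copies the estimator for each fold with sklearn.base.clone (which relies on get_params()/set_params()), calls fit(X, y) on the training part and, with scoring=None, calls score(X, y) on the test part. Below is a minimal sketch of such a wrapper. The class name PomegranateHMM and its anomaly_score method are hypothetical, and fit() assumes the pre-1.0 pomegranate API (HiddenMarkovModel.from_samples; pomegranate 1.0 replaced it with a different HMM API) with Gaussian emissions. The number of hidden states simply becomes a constructor parameter, which is how it reaches from_samples() in every fold.

```python
from sklearn.base import BaseEstimator


class PomegranateHMM(BaseEstimator):
    """Hypothetical scikit-learn wrapper around a pomegranate (< 1.0) HMM.

    n_components (number of hidden states) is an ordinary constructor
    parameter, so cross_validate can clone the wrapper once per fold.
    """

    def __init__(self, n_components=3):
        # scikit-learn convention: store parameters unchanged, no logic here
        self.n_components = n_components

    def fit(self, X, y=None):
        # Assumption: pomegranate 0.x API. Imported lazily so the wrapper can
        # be defined (and cloned) without pomegranate being loaded.
        from pomegranate import HiddenMarkovModel, NormalDistribution

        # X: list of observation sequences; Gaussian emissions are assumed.
        # y is ignored: the HMM is trained unsupervised.
        self.model_ = HiddenMarkovModel.from_samples(
            NormalDistribution, self.n_components, X
        )
        return self

    def anomaly_score(self, X):
        # Negative log-likelihood per sequence: higher means more anomalous
        return [-self.model_.log_probability(seq) for seq in X]

    def score(self, X, y=None):
        # Used by cross_validate when scoring=None: mean log-likelihood of
        # the held-out sequences (higher is better, as scikit-learn expects)
        return -sum(self.anomaly_score(X)) / len(X)
```

With it, a call along the lines of cross_validate(PomegranateHMM(n_components=5), listToUse, cv=KFold(n_splits=10, shuffle=True), n_jobs=-1) would train one HMM per fold in parallel and report each fold's mean held-out log-likelihood under 'test_score'.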

Also, how can I extract the anomaly scores within each fold using this function?
I'm unsure what exactly happens inside cross_validate and how to control it the
way I need.
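cross_validate only returns per-fold numbers (test_score, fit_time, score_time, plus the fitted estimators with return_estimator=True), so one way to get a ROC evaluation per fold is a callable scorer. The sketch below assumes an estimator with a hypothetical anomaly_score(X) method returning the negative log-likelihood of each sequence, and binary labels y (1 = anomalous) passed to cross_validate; the function name is hypothetical too.

```python
from sklearn.metrics import roc_auc_score


def roc_auc_from_nll(estimator, X, y):
    """Callable scorer with the (estimator, X, y) signature cross_validate uses.

    Assumes estimator.anomaly_score(X) (hypothetical) returns the negative
    log-likelihood of each test sequence, and y holds binary labels with
    1 = anomalous, so a higher anomaly score should go with label 1.
    """
    return roc_auc_score(y, estimator.anomaly_score(X))
```

Passed as scoring=roc_auc_from_nll together with y=labels, cross_validate would report one AUC per fold in result['test_score']. Every test fold needs both classes for an AUC, so something like cv=StratifiedKFold(n_splits=10, shuffle=True) is safer than plain KFold, and the estimator's fit() receives y as well and should ignore it. The individual anomaly scores are not returned; return_estimator=True gives the fitted model of each fold, and in scikit-learn 1.3+ return_indices=True gives the test indices, which together let you recompute them.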

If anyone has an example, an explanation, or another idea for running the folds
in parallel, I would really appreciate it!

This is my attempt at using cross_validate, which gets stuck or never seems to
finish (although I'm quite sure I'm not using it properly):

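import pomegranate
from sklearn.model_selection import cross_validate

# listToUse: my list of observation sequences.
# Note: cross_validate copies the estimator with sklearn.base.clone, which
# expects scikit-learn's get_params()/set_params() interface.
model = pomegranate.HiddenMarkovModel()

results = cross_validate(model, listToUse, y=None, groups=None, scoring=None,
                         cv=3, n_jobs=-1, verbose=10)

print(results)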


Below is how I have set up the cross-validation manually so far:

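import time

import numpy
from sklearn.model_selection import KFold

import hmm  # my helper module: hmm.hmm() trains an HMM on a list of sequences

# Build the (train, test) sequence lists for each of the 10 folds.
# Indexing element by element works whether listToUse is a list or an array.
listExample = []
kfold = KFold(n_splits=10, shuffle=True)
for train, test in kfold.split(listToUse):
    listExample.append([[listToUse[i] for i in train],
                        [listToUse[i] for i in test]])

scoreList = []

for ex in listExample:

    hmmModel = hmm.hmm(ex[0])  # train on this fold's training sequences
    scoreListFold = []

    mid = time.time()

    for li in ex[1]:
        # anomaly score = negative log-likelihood of the test sequence
        score = -hmmModel.log_probability(li)
        scoreListFold.append(score)

    scoreList.append(numpy.mean(scoreListFold))

avg = numpy.mean(scoreList)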
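Another option that skips the estimator interface entirely is to keep this manual loop and hand each fold to joblib, the library scikit-learn itself uses for n_jobs. In the sketch below, train_model stands in for the hmm.hmm helper and is assumed to return an object with a log_probability(sequence) method, as pomegranate models have; the function names and the labels used for ROC are hypothetical.

```python
import numpy as np
from joblib import Parallel, delayed
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import KFold


def score_fold(train_model, sequences, train_idx, test_idx):
    """Train on one fold's training part; return the NLL of each test sequence."""
    model = train_model([sequences[i] for i in train_idx])
    return [-model.log_probability(sequences[i]) for i in test_idx]


def parallel_kfold_nll(train_model, sequences, n_splits=10, n_jobs=-1, seed=0):
    """Run each fold in a separate worker (n_jobs=-1 means all CPUs)."""
    kfold = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    splits = list(kfold.split(np.arange(len(sequences))))
    fold_scores = Parallel(n_jobs=n_jobs)(
        delayed(score_fold)(train_model, sequences, train, test)
        for train, test in splits
    )
    return splits, fold_scores


def per_fold_auc(splits, fold_scores, labels):
    """ROC AUC of each fold; labels[i] is 1 if sequence i is anomalous."""
    return [
        roc_auc_score([labels[i] for i in test], scores)
        for (_, test), scores in zip(splits, fold_scores)
    ]
```

A call such as splits, fold_scores = parallel_kfold_nll(hmm.hmm, listToUse, n_splits=10) would then train the 10 HMMs concurrently, and numpy.mean([numpy.mean(f) for f in fold_scores]) reproduces the average computed above. Each worker receives its own copy of the data, which is usually fine for a list of sequences.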

Thanks again!

Anni
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn