Hi Sean,

On Mon, 6 Jun 2011, Sean Liu wrote:

> Date: Mon, 06 Jun 2011 19:50:32 +0800
>
> Hi,
>
> I am not sure if you are actively working on the project any more. I
> hope you are.

Yes, I am. I just lag horribly behind in updating the status and pages, as
I am terribly busy with my PhD among other things. I do maintain, use, and
publish using MARF, however. At least sporadically. Most recently in
MARFCAT.

> When I used your sample wave files, every thing works as exactly as you
> wrote in the manual document.
>
> But when I changed to use my own set of wave files, the accuracy
> dramatically dropped, I do have a lot more data than yours though.

How much more data and what's the drop?

> Have you ever tried to use different set of data, especially more data?

Yes. I tried all kinds of stuff -- audio, text, images, binary files, and
all kinds of experiments.

However, you'd need to open up a bit more on the details of your setup.
There are many factors that can contribute to good or poor recognition
accuracy.

- sample frequency? (8 kHz, 16 kHz?)
- PCM data (2 bytes, more, less, which endian?)
- sample quality: is it relatively uniform?
- do you do noise and silence removal (-noise, -silence options)?
- how many samples are used to train per class and test per class?
  (class ~ e.g. speaker, gender, etc.)
- are you using mean clusters (the default), median clustering, or plain
  feature vectors?
- ...
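
To check the first few points quickly, something like the rough Python
sketch below may help. It is not part of MARF; the function names are made
up for illustration, and it assumes your samples are plain PCM .wav files
sitting in one directory. It only reads the WAV headers and counts how many
files share each (sample rate, bytes per sample, channels) combination.
RIFF/WAV PCM data is little-endian by definition, so endianness is not
reported.

```python
import wave
from collections import Counter
from pathlib import Path


def wav_format(path):
    """Return (sample_rate_hz, bytes_per_sample, channels) for one WAV file."""
    with wave.open(str(path), "rb") as w:
        return (w.getframerate(), w.getsampwidth(), w.getnchannels())


def summarize_formats(directory):
    """Count how many .wav files in `directory` share each header format."""
    wav_files = sorted(Path(directory).glob("*.wav"))
    return Counter(wav_format(p) for p in wav_files)


# Hypothetical usage; "training-samples" is a placeholder directory name:
#   for fmt, count in summarize_formats("training-samples").items():
#       print(count, "file(s) with (rate, bytes/sample, channels) =", fmt)
```

If this reports more than one combination, the data set is mixed and the
loader may be interpreting some of the files differently from others.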

A lot depends on correct data loading, interpretation, and preprocessing
before feature extraction and classification.

Also, if you mislabel your training data, you are in for trouble too (I am
not sure how much data you actually have and if your labeling of them is
correct).

Also, please try the most recent version of MARF from the CVS itself;
there have been bugs (or if you are using a .jar then pick the one from the
recently released app).

> And how was the performance if you did?

Variable. For images it was relatively poor. For audio (voice), text, and
binaries it was much better. Here are some works describing a variety of
experiments done at various times. You can just skip over to the result
tables and then come back for a brief read about the experiments
themselves.

http://dx.doi.org/10.1145/1370256.1370262
http://arxiv.org/abs/1010.2511
http://www.intechopen.com/download/pdf/pdfs_id/12131
http://subs.emis.de/LNI/Proceedings/Proceedings140/gi-proc-140-007.pdf
http://arxiv.org/abs/1006.3787
http://arxiv.org/abs/0912.5502

> Thanks,
>
> Sean Liu

-- 
Serguei A. Mokhov, PhD Candidate                         |  /~\     The ASCII
Computer Science and Software Engineering &              |  \ /  Ribbon Campaign
Concordia Institute for Information Systems Engineering  |   X     Against HTML
Concordia University, Montreal, Quebec, Canada           |  / \       Email!

_______________________________________________
marf-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/marf-devel