Hi Yuan, the Bayes classifier takes only binary features. So in order to turn
your User objects into a dataset, you need to create a tab-separated file with
the label as the key and space-separated features as the value. The presence of
a feature makes it true; its absence makes it false.

E.g., if you are classifying heart-attack-prone vs. healthy
individuals (assuming from your data),
take two labels: heart-attack and healthy.

You will need to map integer and double values to boolean
features, for example by splitting each value range into buckets.
Say you have boolean features like

Weight:40-50
Weight:50-60

Age:20-30
Age:30-40

For user A with age = 23, weight = 53, diabetes = false,
write the line

healthy<TAB>Age:20-30 Weight:50-60

For user B with age = 37, weight = 52, diabetes = true, write the line

heart-attack<TAB>Age:30-40 Weight:50-60 diabetes

You will have many such lines in your dataset file, one per user. Give
the file path to the classifier and it learns the model for you.
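
Since you asked about doing this from Java, here is a minimal sketch of how
the in-memory User objects could be turned into such lines and written to a
file. It uses only plain Java and no Mahout classes. The User class, the
method names, and the bucket width of 10 are all assumptions made for
illustration, and the label has to come from your own data because the User
class you described does not carry one.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

final class UserDatasetWriter {

    // Hypothetical stand-in for the User class described in the question.
    static final class User {
        final long id;
        final int age;
        final double weight;
        final boolean diabetes;

        User(long id, int age, double weight, boolean diabetes) {
            this.id = id;
            this.age = age;
            this.weight = weight;
            this.diabetes = diabetes;
        }
    }

    // Maps a number to a range feature such as "Age:20-30".
    // The lower bound is inclusive, the upper bound exclusive.
    static String bucket(String name, double value, int width) {
        int lower = (int) Math.floor(value / width) * width;
        return name + ":" + lower + "-" + (lower + width);
    }

    // Builds one dataset line: label<TAB>space-separated features.
    // A boolean feature is written only when it is true.
    static String toLine(String label, User user) {
        StringBuilder line = new StringBuilder(label).append('\t');
        line.append(bucket("Age", user.age, 10));
        line.append(' ').append(bucket("Weight", user.weight, 10));
        if (user.diabetes) {
            line.append(" diabetes");
        }
        return line.toString();
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = new ArrayList<>();
        lines.add(toLine("healthy", new User(1L, 23, 53.0, false)));
        lines.add(toLine("heart-attack", new User(2L, 37, 52.0, true)));

        // Hypothetical output file name; pass this path to the classifier.
        Path out = Paths.get("users-train.txt");
        Files.write(out, lines, StandardCharsets.UTF_8);
    }
}
```

For users A and B this produces the same two lines as written out above. A
value that falls exactly on a boundary goes into the upper bucket, so a
weight of 50 becomes Weight:50-60.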

For now, the algorithm takes its data from a file, not from an in-memory
data structure, and does not use vectors. Try the classification
example (20newsgroups) to get an idea of how the classifier can be run.

Robin

On Wed, Jan 27, 2010 at 8:56 AM, Yuan Wang <[email protected]> wrote:

> Hi all,
>
> I am learning Mahout. It seems to me most of the examples load the dataset
> from files using the command line. I know the Bayes classifier can work
> with HBase.
>
> Is there any way to build the dataset from scratch in Java Code?
>
> For example, there is a User class with four attributes: ID (data type is
> long or String), age (int), weight (double), and diabetes (boolean).
> There are 100 User objects in memory. Is there a way I can convert them
> into any type of dataset that the classifier algorithm can handle?
>
> I noticed there are a vector class and InMemoryDataStore, but I don't know
> how to use them. If someone can give any hint or write down some pseudocode,
> that would be very helpful.
>
> Thanks,
> Yuan
>
