Thanks, Albert.

I have a couple of questions.

1/ How do we distinguish between input and output attributes?
In particular, take the default single-label classification case as an
example.
I guess that is the role of Range.
However, do we have to serialize it with every instance we send?

2/ To distinguish between numeric and categorical attributes we need some
metadata, which I guess goes into InstancesHeader.
I am fine with keeping it, also for compatibility with MOA, and we might use
it when we have access to it.
However, I would prefer that algorithms not rely on it, and that the
presence of metadata be considered optional.
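A minimal Java sketch of what "metadata optional" could mean for an algorithm. AttributeTypeResolver is a hypothetical helper (not a SAMOA or MOA class); it assumes the learner receives the set of nominal attribute indices only when a header is available, and otherwise treats every attribute as numeric:

```java
import java.util.Optional;
import java.util.Set;

// Hypothetical helper, not part of SAMOA or MOA: the learner gets header
// metadata only if it is available, and never fails when it is absent.
final class AttributeTypeResolver {
    private final Optional<Set<Integer>> nominalIndices;

    AttributeTypeResolver(Optional<Set<Integer>> nominalIndices) {
        this.nominalIndices = nominalIndices;
    }

    boolean isNominal(int attributeIndex) {
        // With metadata: look the index up. Without metadata: assume numeric.
        return nominalIndices.map(s -> s.contains(attributeIndex)).orElse(false);
    }
}
```

An algorithm written this way still runs on a plain stream of values and only uses the nominal information as an improvement when a header happens to be present.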

Some other points:
- What's the difference between InstanceInformation and InstancesHeader?
- Can the AttributesInformation be modified at runtime, or is it statically
set for the whole duration of the algorithm?

Cheers,

--
Gianmarco

On 10 January 2015 at 04:26, Albert Bifet <[email protected]> wrote:

> Hi all,
>
> This is a short explanation of the new instance classes in SAMOA.
>
>
> https://github.com/abifet/moa/tree/master/moa/src/main/java/com/yahoo/labs/samoa/instances
>
> Instances will be much simpler than in the current implementation. They
> can be dense or sparse, and they contain only one array (or two for
> sparse) with all the attribute values. The current implementation
> has two arrays, one for input values and another for output values.
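To illustrate the one-array layout, here is a minimal Java sketch. DenseValues and SparseValues are hypothetical names, not the SAMOA classes, and the sparse variant assumes the index array is sorted and that attributes absent from it read as 0:

```java
import java.util.Arrays;

// Hypothetical sketch of the storage idea, not the SAMOA implementation.
// Dense: one array holding every attribute value, inputs and outputs alike.
final class DenseValues {
    private final double[] values;

    DenseValues(double[] values) {
        this.values = values.clone();
    }

    double value(int attributeIndex) {
        return values[attributeIndex];
    }
}

// Sparse: two parallel arrays, the (assumed sorted) indices of the stored
// attributes and their values. Indices that are not stored read as 0.
final class SparseValues {
    private final int[] indices;
    private final double[] values;

    SparseValues(int[] indices, double[] values) {
        if (indices.length != values.length) {
            throw new IllegalArgumentException("indices and values must have the same length");
        }
        this.indices = indices.clone();
        this.values = values.clone();
    }

    double value(int attributeIndex) {
        // Binary search works because the index array is sorted.
        int pos = Arrays.binarySearch(indices, attributeIndex);
        return pos >= 0 ? values[pos] : 0.0;
    }
}
```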
>
> There are two main changes:
>
> 1/ All instances are going to be multi-label, which means they have
> input and output attributes, and their values can be accessed with
> getInputValue(i) and getOutputValue(i).
>
> 2/ Attributes are numeric by default, so we only keep information about
> discrete attributes (their values). For example, if we have one million
> numeric attributes, we do not need to store any attribute information
> for them.
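The first change can be sketched as a single value array in which the output attributes are marked by an index range. MultiLabelSketch is a hypothetical class, and it assumes the outputs occupy one contiguous block of indices; the actual Range semantics may differ:

```java
// Hypothetical sketch, not the SAMOA implementation. Assumption: the output
// attributes occupy the contiguous indices [outStart, outStart + outCount),
// and every other index is an input attribute.
final class MultiLabelSketch {
    private final double[] values;   // one array for input and output values
    private final int outStart;
    private final int outCount;

    MultiLabelSketch(double[] values, int outStart, int outCount) {
        this.values = values.clone();
        this.outStart = outStart;
        this.outCount = outCount;
    }

    int numInputAttributes() {
        return values.length - outCount;
    }

    int numOutputAttributes() {
        return outCount;
    }

    // i-th input value: indices at or after the output block are shifted past it.
    double getInputValue(int i) {
        return values[i < outStart ? i : i + outCount];
    }

    // i-th output value: an offset into the output block.
    double getOutputValue(int i) {
        return values[outStart + i];
    }
}
```

Single-label classification is then just the case outCount == 1.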
>
> Basically, we have:
>
> - Instance: interface
> - MultiLabelInstance: interface (empty interface that extends Instance)
> - InstanceImpl implements MultiLabelInstance: implementation of Instance.
>   Contains:
>     - InstanceData
>     - InstancesHeader
> - DenseInstance extends InstanceImpl
> - SparseInstance extends InstanceImpl
>
> - Instances: a list of instances and an InstanceInformation object
> - InstancesHeader extends Instances
>
> - InstanceData: interface
> - DenseInstanceData implements InstanceData
> - SparseInstanceData implements InstanceData
>
> - InstanceInformation contains the name, the attribute information, and
> the attributes to predict.
> - AttributesInformation contains two lists of Attributes (indices and
> values) for non-numerical attributes. Numerical attributes are the
> default, so no information is stored for them.
> - Range: the attributes to predict.
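A minimal sketch of the "two lists" idea behind AttributesInformation, where any attribute index that is not listed is numeric. NominalAttribute and AttributesInfoSketch are hypothetical names, and the linear indexOf lookup is a simplification (a map or a sorted search would scale better):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical nominal attribute: a name plus its possible values.
final class NominalAttribute {
    final String name;
    final List<String> values;

    NominalAttribute(String name, List<String> values) {
        this.name = name;
        this.values = List.copyOf(values);
    }
}

// Hypothetical sketch of the two parallel lists: indices.get(k) is the
// attribute index described by attributes.get(k). Unlisted indices are numeric.
final class AttributesInfoSketch {
    private final List<Integer> indices = new ArrayList<>();
    private final List<NominalAttribute> attributes = new ArrayList<>();

    void addNominal(int attributeIndex, NominalAttribute attribute) {
        indices.add(attributeIndex);
        attributes.add(attribute);
    }

    // Returns null for numeric (unlisted) attributes.
    NominalAttribute attribute(int attributeIndex) {
        int k = indices.indexOf(attributeIndex);   // linear scan, boxed lookup
        return k >= 0 ? attributes.get(k) : null;
    }

    boolean isNumeric(int attributeIndex) {
        return attribute(attributeIndex) == null;
    }
}
```

With one million numeric attributes and a handful of nominal ones, both lists stay as short as the number of nominal attributes.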
>
> Cheers,
>
> Albert
>
