Thanks Albert. I have a couple of questions.

1/ How do we distinguish between input and output attributes? In particular, let's take as an example the default single-label classification. I guess that is the role of Range. However, do we have to serialize it with every instance we send?

2/ To distinguish between numeric and categorical we need some metadata, which I guess goes into InstancesHeader. I am fine with keeping it also for compatibility with MOA, and we might use it if we have access to it. However, I would prefer algorithms not to rely on it, and consider the presence of metadata optional.

Some other points:
- What's the difference between InstanceInformation and InstancesHeader?
- Can the AttributesInformation be modified at runtime? Or is it statically set for the whole duration of the algorithm?

Cheers,

--
Gianmarco

On 10 January 2015 at 04:26, Albert Bifet <[email protected]> wrote:
> Hi all,
>
> This is a short explanation of the new instances of SAMOA.
>
>
> https://github.com/abifet/moa/tree/master/moa/src/main/java/com/yahoo/labs/samoa/instances
>
> Instances will be much simpler than the current implementation. They
> can be dense or sparse, and they contain only one array (or two for
> sparse) with all the attribute values. In the current implementation
> we have two arrays, one for input values and another for output values
>
> The main changes are two:
>
> 1/ All instances are going to be multi-label, that means they have
> input and output attributes, and we can call their values with
> getInputValue(i) and getOutputValue(i).
>
> 2/ Attributes are numeric by default, so we only keep information of
> discrete attributes (values). For example if we have one million
> numeric attributes, we will not need to store attribute information of
> these one million numeric attributes.
>
> Basically, we have:
>
> - Instance: interface
> - MultiLabelInstance: interface (empty interface that extends Instance)
> - InstanceImpl extends MultiLabelInstance: implementation of Instance.
>   Contains
>   - InstanceData
>   - InstancesHeader
> - DenseInstance extends InstanceImpl
> - SparseInstance extends InstanceImpl
>
> -Instances: a list of instances and an InstanceInformation object
> -InstancesHeader extends Instances
>
> -InstanceData: interface
> -DenseInstanceData implements InstanceData
> -SparseInstanceData implements InstanceData
>
> - InstanceInformation contains name, attribute information and
> attributes to predict.
> - AttributesInformation contains two list of Attributes (indices and
> values) for non-numerical attributes. Numerical attributes are by
> default
> - Range: attributes to predict
>
> Cheers,
>
> Albert
>
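
To make the single-array layout and question 1/ concrete, here is a minimal Java sketch of one way the input/output split could work. It is only an illustration under stated assumptions, not the actual SAMOA or MOA code: OutputRange, HeaderSketch and DenseInstanceSketch are hypothetical names, the output attributes are assumed to form one contiguous block, and the header is assumed to be built once and shared by reference rather than copied into each instance.

```java
// Hypothetical stand-in for Range: one contiguous block of output indices.
// This is NOT the SAMOA Range class, only an illustration.
final class OutputRange {
    final int start; // first output index (inclusive)
    final int end;   // last output index (inclusive)

    OutputRange(int start, int end) {
        if (start < 0 || end < start) {
            throw new IllegalArgumentException("invalid range");
        }
        this.start = start;
        this.end = end;
    }

    int size() {
        return end - start + 1;
    }
}

// Hypothetical header: created once per stream and shared by all instances.
final class HeaderSketch {
    final int numAttributes;
    final OutputRange outputs;

    HeaderSketch(int numAttributes, OutputRange outputs) {
        if (outputs.end >= numAttributes) {
            throw new IllegalArgumentException("range exceeds attribute count");
        }
        this.numAttributes = numAttributes;
        this.outputs = outputs;
    }

    int numOutputs() {
        return outputs.size();
    }

    int numInputs() {
        return numAttributes - outputs.size();
    }
}

// Hypothetical dense instance: a single array holds inputs and outputs.
final class DenseInstanceSketch {
    private final double[] values;
    private final HeaderSketch header; // shared reference, not a per-instance copy

    DenseInstanceSketch(double[] values, HeaderSketch header) {
        if (values.length != header.numAttributes) {
            throw new IllegalArgumentException("wrong number of values");
        }
        this.values = values.clone();
        this.header = header;
    }

    // The i-th input is the i-th array slot that lies outside the output block.
    double getInputValue(int i) {
        if (i < 0 || i >= header.numInputs()) {
            throw new IndexOutOfBoundsException("input index " + i);
        }
        int idx = (i < header.outputs.start) ? i : i + header.outputs.size();
        return values[idx];
    }

    // The i-th output is offset from the start of the output block.
    double getOutputValue(int i) {
        if (i < 0 || i >= header.numOutputs()) {
            throw new IndexOutOfBoundsException("output index " + i);
        }
        return values[header.outputs.start + i];
    }
}

final class InstanceSketchDemo {
    public static void main(String[] args) {
        // Single-label classification: four inputs, class label in the last slot.
        HeaderSketch header = new HeaderSketch(5, new OutputRange(4, 4));
        DenseInstanceSketch inst =
                new DenseInstanceSketch(new double[] {5.1, 3.5, 1.4, 0.2, 2.0}, header);
        System.out.println("first input:  " + inst.getInputValue(0));
        System.out.println("first output: " + inst.getOutputValue(0));
    }
}
```

In this sketch the range lives in the header, so an instance carries only its value array plus a reference. Whether that reference can be dropped or replaced by an identifier when an instance is serialized is exactly the open point in 1/.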
