Hi Gianmarco,

I hope my reply doesn't create a separate thread. Sorry in advance
if it does; I forgot to subscribe before sending the original message.

Here's an excerpt from my dataset in sparse array ARFF format:
https://drive.google.com/file/d/0B1WaPw_KXbfkaVJ6T0lnMDFBdmc/view?usp=sharing
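
For readers unfamiliar with the sparse ARFF layout: each data row lists only the non-zero attributes as `{index value, index value, ...}` with 0-based indices. Below is a minimal sketch of expanding such a row into a dense list. The function name `parse_sparse_row` and the sample row are hypothetical, made up for illustration and not taken from the linked file, and the sketch does not handle quoted values that contain commas or spaces.

```python
def parse_sparse_row(line, num_attributes):
    """Expand one sparse ARFF data row into a dense list (illustrative sketch)."""
    body = line.strip()
    if not (body.startswith("{") and body.endswith("}")):
        raise ValueError("not a sparse ARFF row: %r" % line)
    # Attributes omitted from a sparse row are implicitly zero.
    dense = [0.0] * num_attributes
    inner = body[1:-1].strip()
    if inner:
        for pair in inner.split(","):
            index_text, value_text = pair.strip().split(None, 1)
            index = int(index_text)  # 0-based attribute index
            try:
                dense[index] = float(value_text)
            except ValueError:
                dense[index] = value_text  # keep nominal values as strings
    return dense


if __name__ == "__main__":
    # Hypothetical row: two tf-idf weights plus a nominal class label.
    print(parse_sparse_row("{0 0.5, 3 1.2, 4 pos}", 5))
```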

I am coming from an SVM classification paradigm where you first train your
model with a labelled dataset and then test it with a separate unlabelled
dataset.
How would that translate to the streaming online processing paradigm of
SAMOA?
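
For context, the usual streaming counterpart of a train/test split is prequential (test-then-train) evaluation: each labelled instance is first used to test the current model and only then to update it. The sketch below illustrates that loop in plain Python. It is not SAMOA's API; the `MajorityClass` learner and the `prequential` function are hypothetical names chosen for illustration, and whether this matches what SAMOA does internally is an assumption.

```python
from collections import Counter


class MajorityClass:
    """Toy online learner: always predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = Counter()

    def predict(self, features):
        if not self.counts:
            return None  # nothing learned yet
        return self.counts.most_common(1)[0][0]

    def learn(self, features, label):
        self.counts[label] += 1


def prequential(stream, model):
    """Test on each instance first, then train on it; return overall accuracy."""
    correct = 0
    seen = 0
    for features, label in stream:
        if model.predict(features) == label:  # test step
            correct += 1
        model.learn(features, label)  # train step
        seen += 1
    return correct / seen if seen else 0.0


if __name__ == "__main__":
    # Hypothetical labelled stream of (features, label) pairs.
    stream = [({}, "a"), ({}, "a"), ({}, "b"), ({}, "a")]
    print(prequential(stream, MajorityClass()))
```

Because every instance is tested before it is learned, a single labelled stream serves as both training and test data, so no separate unlabelled set is needed for evaluation.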

I noticed there are a lot of classification tasks available that are not
listed in the documentation. Is there a reason for that?

Kind Regards,

Ilias Bertsimas.


On 13 May 2015 at 14:03, Bertsimas Ilias <[email protected]> wrote:

> Hi all!
>
> I am in the process of running some tests for online machine learning on
> data streams from social media. I came across Apache SAMOA and it seemed like
> a very interesting framework.
> However, I could not figure out how to get it to train and test
> using a sparse array of tf-idf feature vectors. I provide the data in the
> standard WEKA ARFF format and, although it ran, the output is something
> along the lines of:
>
>> 2015-05-12 22:58:58,993 [main] INFO com.yahoo.labs.samoa.evaluation.EvaluatorProcessor (EvaluatorProcessor.java:189) - com.yahoo.labs.samoa.evaluation.EvaluatorProcessorid = 0
>> evaluation instances,classified instances,classifications correct (percent),Kappa Statistic (percent),Kappa Temporal Statistic (percent)
>> 100.0,100.0,100.0,100.0,?
>> 200.0,200.0,100.0,100.0,?
>> 300.0,300.0,100.0,100.0,?
>> 400.0,400.0,100.0,100.0,?
>> 500.0,500.0,100.0,100.0,?
>> 600.0,600.0,100.0,100.0,?
>> 700.0,700.0,100.0,100.0,?
>> 800.0,800.0,100.0,100.0,?
>> 900.0,900.0,100.0,100.0,?
>> 1000.0,1000.0,100.0,100.0,?
>> 1100.0,1100.0,100.0,100.0,?
>> 1200.0,1200.0,100.0,100.0,?
>> 1300.0,1300.0,100.0,100.0,?
>> 1400.0,1400.0,100.0,100.0,?
>> 1500.0,1500.0,100.0,100.0,?
>> 1600.0,1600.0,100.0,100.0,?
>> 1700.0,1700.0,100.0,100.0,?
>> 1800.0,1800.0,100.0,100.0,?
>> 1900.0,1900.0,100.0,100.0,?
>
>
>
> I have read the documentation on the SAMOA project page but I wasn't able
> to figure out how to get classification results per instance.
> Could you please point me in the right direction in terms of the
> formats SAMOA accepts as stream input? Does a labelled
> training set need to be included in the data?
>
> Any examples you could provide me with that are not already in the
> documentation would be most welcome!
>
>
> Kind Regards,
>
> Ilias Bertsimas.
>
