Hi Gianmarco,

I hope my reply doesn't create a separate thread. Sorry in advance if it does; I forgot to subscribe before sending the original message.

Here's an excerpt from my dataset in sparse array ARFF format:
https://drive.google.com/file/d/0B1WaPw_KXbfkaVJ6T0lnMDFBdmc/view?usp=sharing

I am coming from an SVM classification paradigm, where you first train your model with a labelled dataset and then test it with a separate unlabelled dataset. How would that translate to the streaming, online processing paradigm of SAMOA?

I also noticed that there are a lot of classification tasks available that are not listed in the documentation. Is there a reason for that?

Kind Regards,
Ilias Bertsimas.

On 13 May 2015 at 14:03, Bertsimas Ilias <[email protected]> wrote:

> Hi all!
>
> I am in the process of running some tests for online machine learning on
> data streams from social media. I came across Apache SAMOA and it seemed
> like a very interesting framework.
> However, I could not figure out how to get it to train and test
> using a sparse array of tf-idf feature vectors. I provide the data in the
> standard WEKA ARFF format and, although it ran, the output is something
> along the lines of:
>
> 2015-05-12 22:58:58,993 [main] INFO
>> com.yahoo.labs.samoa.evaluation.EvaluatorProcessor
>> (EvaluatorProcessor.java:189) -
>> com.yahoo.labs.samoa.evaluation.EvaluatorProcessorid = 0
>> evaluation instances,classified instances,classifications correct
>> (percent),Kappa Statistic (percent),Kappa Temporal Statistic (percent)
>> 100.0,100.0,100.0,100.0,?
>> 200.0,200.0,100.0,100.0,?
>> 300.0,300.0,100.0,100.0,?
>> 400.0,400.0,100.0,100.0,?
>> 500.0,500.0,100.0,100.0,?
>> 600.0,600.0,100.0,100.0,?
>> 700.0,700.0,100.0,100.0,?
>> 800.0,800.0,100.0,100.0,?
>> 900.0,900.0,100.0,100.0,?
>> 1000.0,1000.0,100.0,100.0,?
>> 1100.0,1100.0,100.0,100.0,?
>> 1200.0,1200.0,100.0,100.0,?
>> 1300.0,1300.0,100.0,100.0,?
>> 1400.0,1400.0,100.0,100.0,?
>> 1500.0,1500.0,100.0,100.0,?
>> 1600.0,1600.0,100.0,100.0,?
>> 1700.0,1700.0,100.0,100.0,?
>> 1800.0,1800.0,100.0,100.0,?
>> 1900.0,1900.0,100.0,100.0,?
>
> I have read the documentation on the SAMOA project page, but I wasn't able
> to figure out how to get classification results per instance.
> Could you please point me in the right direction regarding the
> formats SAMOA accepts as stream input? Is there a need for a labeled
> training set to be included in the data?
>
> Any examples you could provide that are not already in the
> documentation would be most welcome!
>
> Kind Regards,
>
> Ilias Bertsimas.
