Gianmarco, I would really like to take up adding JSON support to SAMOA. Can you please point me to somewhere I can start?
- Shekar

On Sun, Jul 12, 2015 at 12:20 AM, Gianmarco De Francisci Morales <[email protected]> wrote:
> Hi,
>
> The only reason is that we inherited the format from MOA.
> In practice, anything from which we can create an Instance would be
> good enough. For example, I'd like to support the VW and svmLib formats.
>
> One caveat is that some algorithms require knowledge of the dataset's
> metadata to preallocate some data structures.
> I would like to remove this dependency in the future by making the
> algorithms completely adaptable, though it's not as easy as it sounds :)
>
> Cheers,
>
> --
> Gianmarco
>
> On 11 July 2015 at 16:46, Shekar Tippur <[email protected]> wrote:
>
> > Gianmarco,
> >
> > Thanks for the response. Can you please specify the format? Can you
> > please explain the reason for keeping it in a specific format?
> > I would like to contribute to the Kafka enhancement. I will look into
> > the code base you pointed out.
> >
> > Shekar
> > On Jul 11, 2015 1:36 AM, "Gianmarco De Francisci Morales" <[email protected]> wrote:
> >
> > > Hi Shekar,
> > >
> > > At the moment we do not support JSON data.
> > > The current readers support the ARFF format, which is a CSV with a header.
> > > http://www.cs.waikato.ac.nz/ml/weka/arff.html
> > > Adding support for JSON is doable, but it should conform to a very
> > > specific format.
> > >
> > > About Kafka, we support it as a transport via Samza, but we don't have
> > > a reader for it right now.
> > > Adding it would be very valuable. If you wanted to work on it, I'd be
> > > happy to help.
> > > Have a look at org.apache.samoa.streams.fs.HDFSFileStreamSource
> > > and org.apache.samoa.streams.ArffFileStream for some examples.
> > >
> > > Cheers,
> > >
> > > --
> > > Gianmarco
> > >
> > > On 10 July 2015 at 01:18, Shekar Tippur <[email protected]> wrote:
> > >
> > > > Hello,
> > > >
> > > > I am trying to use the SAMOA/Samza combination to apply ML to a
> > > > dataset I have in JSON format.
> > > >
> > > > This is the document I am following:
> > > >
> > > > https://samoa.incubator.apache.org/documentation/Executing-SAMOA-with-Apache-Samza.html
> > > >
> > > > A couple of questions:
> > > > 1. How do I point the input events to a stream/topic in Kafka? The
> > > >    data is in JSON.
> > > > 2. If I want to use historical data stored in a file, how do I
> > > >    point the job to read from the file and serialise it as JSON?
> > > >
> > > > bin/samoa samza target/SAMOA-Samza-0.3.0-SNAPSHOT.jar
> > > > "PrequentialEvaluation -l classifiers.ensemble.Bagging -s (??)"
> > > >
> > > > - Shekar
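The point about JSON having to "conform to a very specific format", and about learners needing the dataset's metadata up front, can be illustrated with a minimal Java sketch. It assumes the JSON record has already been parsed into a `Map<String, Object>` by a JSON library of your choice. `JsonRecordMapper`, its constructor arguments and the example attribute names are hypothetical and are not part of SAMOA's API; the sketch does not build a SAMOA `Instance`. It only shows how a fixed, ARFF-like header pins down the attribute order and vector length that a learner could preallocate from.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of SAMOA: maps one already-parsed JSON object
// onto a fixed, ARFF-like header so every record yields a vector of the same length.
public class JsonRecordMapper {
    private final List<String> numericAttributes; // ordered like ARFF @attribute lines
    private final String classAttribute;          // name of the JSON field holding the label
    private final List<String> classValues;       // nominal class labels, in header order

    public JsonRecordMapper(List<String> numericAttributes, String classAttribute,
                            List<String> classValues) {
        this.numericAttributes = List.copyOf(numericAttributes);
        this.classAttribute = classAttribute;
        this.classValues = List.copyOf(classValues);
    }

    // The header fixes the vector size before any data arrives: this is the
    // kind of metadata some learners need to preallocate their structures.
    public int numAttributes() {
        return numericAttributes.size() + 1; // numeric attributes plus the class
    }

    public double[] toVector(Map<String, Object> json) {
        double[] values = new double[numAttributes()];
        for (int i = 0; i < numericAttributes.size(); i++) {
            Object v = json.get(numericAttributes.get(i));
            if (!(v instanceof Number)) {
                throw new IllegalArgumentException(
                        "missing or non-numeric field: " + numericAttributes.get(i));
            }
            values[i] = ((Number) v).doubleValue();
        }
        // The label is stored as its index among the nominal values, as ARFF does.
        int label = classValues.indexOf(String.valueOf(json.get(classAttribute)));
        if (label < 0) {
            throw new IllegalArgumentException("unknown class value: " + json.get(classAttribute));
        }
        values[values.length - 1] = label;
        return values;
    }

    public static void main(String[] args) {
        JsonRecordMapper mapper = new JsonRecordMapper(
                List.of("temperature", "humidity"), "play", List.of("yes", "no"));
        Map<String, Object> record = new HashMap<>();
        record.put("temperature", 21.5);
        record.put("humidity", 80);
        record.put("play", "no");
        System.out.println(Arrays.toString(mapper.toVector(record))); // [21.5, 80.0, 1.0]
    }
}
```

Rejecting records that do not match the header, rather than growing the vector on the fly, reflects the caveat above: until the algorithms are fully adaptable, the attribute set has to be known before the stream starts.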
