[ https://issues.apache.org/jira/browse/MADLIB-766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Frank McQuillan resolved MADLIB-766.
------------------------------------
    Resolution: Fixed

Docs are much better now; marking as resolved.

> Doc needs scenario
> ------------------
>
>                 Key: MADLIB-766
>                 URL: https://issues.apache.org/jira/browse/MADLIB-766
>             Project: Apache MADlib
>          Issue Type: Request
>         Environment: ubuntu
>            Reporter: audrey lee
>            Assignee: Rahul Iyer
>            Priority: Minor
>
> Hello World.
> I wanted to post this in the forum but the forum is acting broken.
> My first suggestion is for the project leaders to dump Google Groups.
> I'm new to MADlib.
> My experience with ML on databases is limited to running SVM on Oracle.
> Oracle does an excellent job of explaining how to set up an SVM classification 
> system.
> They also provide a decent, comprehensive example based on some sample 
> marketing data.
> The MADlib docs do a good job of explaining how to install MADlib.
> They do a poor job of explaining how I might use MADlib after I install it.
> I studied this page:
>   http://doc.madlib.net/v0.5/group__grp__kernmach.html
> The page displays this syntax:
> t(x) = if x[1] > 0 and  x[2] < 0 then 1 else -1;
> What kind of syntax is this?
> Where/How do I run it?
> Why do I want to run it?
> I tried this:
> oracle@vbox3:/pt/s/r/madlib$ psql rsinp
> psql (9.1.7, server 9.1.8)
> Type "help" for help.
> rsinp=# t(x) = if x[1] > 0 and  x[2] < 0 then 1 else -1;
> ERROR:  syntax error at or near "t"
> LINE 1: t(x) = if x[1] > 0 and  x[2] < 0 then 1 else -1;
>         ^
> At this point I'd suggest the doc-authors toss this discussion about t(x) in 
> the trash.
> I find it useless.
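
The t(x) line reads like mathematical pseudocode for a labeling rule rather than something psql can run, which would explain the syntax error. Below is a minimal sketch of the same rule in plain PostgreSQL, assuming x[1] and x[2] refer to the first two elements of the ind array. The table name t_demo and the sample points are made up for illustration.

```sql
-- Hypothetical demo table; the columns mirror the input structure from the docs.
CREATE TEMP TABLE t_demo (id INT, ind FLOAT8[], label FLOAT8);

-- Made-up sample points, not taken from the MADlib documentation.
INSERT INTO t_demo (id, ind)
VALUES (1, '{0.5, -1.0}'),
       (2, '{0.5, 2.0}'),
       (3, '{-0.3, -4.0}');

-- PostgreSQL arrays are 1-based, so ind[1] and ind[2] line up with x[1] and x[2].
UPDATE t_demo
SET label = CASE WHEN ind[1] > 0 AND ind[2] < 0 THEN 1 ELSE -1 END;

SELECT id, ind, label FROM t_demo ORDER BY id;
```

With these three points, only id 1 satisfies both conditions and gets label 1; the other two get -1.
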
> I returned my attention to 
>   http://doc.madlib.net/v0.5/group__grp__kernmach.html
> I read this:
> Examples:
> As a general first step, we need to prepare and populate an input table/view 
> with the following structure:
> TABLE/VIEW my_schema.my_input_table 
> (       
>         id    INT,       -- point ID
>         ind   FLOAT8[],  -- data point
>         label FLOAT8     -- label of data point
> );
> I wrote this:
> rsinp=# CREATE TABLE my_train_data (id INT, ind FLOAT8[], label FLOAT8);
> CREATE TABLE
> rsinp=# 
> rsinp=# INSERT INTO my_train_data VALUES (100, '{1.1, 2.2}', 10.1);
> INSERT 0 1
> rsinp=# INSERT INTO my_train_data VALUES (101, '{3.3, 4.4}', 20.1);
> INSERT 0 1
> rsinp=# select MADlib.svm_generate_reg_data('my_train_data', 2, 2);
>  svm_generate_reg_data 
> -----------------------
>  
> (1 row)
> rsinp=#
> At this point I'm convinced that I've installed MADlib and I'm happy with the 
> installation documentation.
> But now I encounter a conceptual roadblock which I can't resolve by reading 
> the content listed below:
>   http://doc.madlib.net/v0.5/index.html
> When I encounter new technology I prefer scenarios and usage examples over 
> general descriptions.
> My understanding of SVM classification can be described with this limited 
> scenario:
> I want to predict if IBM will go up or down tomorrow.
> I assign +1 to the class of 'up' and -1 to the class of 'down'.
> The data I want to learn from is a set of two-dimensional vectors.
> The first element is the 5-day moving-average slope of the IBM price.
> The second element is the temperature in NYC at 9am.
> The id of each vector is the day number after 2000-01-01.
> So a set of vectors might look something like this:
> [3, 0.001, 28.5, +1]
> [33, 0.002, 38.5, -1]
> [333, -0.001, 30.5, +1]
> Then finally the vector I want to 'predict' might look like this:
> [444, 0.0015, 40.2]
> I need help understanding how to match the scenario I describe above to the 
> API described at:
> http://doc.madlib.net/v0.5/group__grp__kernmach.html
> (section: Example usage for classification)
> For example, this syntax confuses me:
>   CREATE TABLE my_train_data (id INT, ind FLOAT8[], label FLOAT8);
> I expect to see something like this:
>   CREATE TABLE my_train_data (id INT, ind FLOAT8[], label BOOLEAN);
> Question:
>   Is the label column supposed to indicate which class the vector belongs to?
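
For the IBM scenario above, the documented table structure could be populated as follows. This is only a sketch: the table name ibm_train is hypothetical, and storing the class as +1/-1 in the FLOAT8 label column is an assumption based on the t(x) rule quoted earlier, which yields 1 or -1.

```sql
-- Hypothetical table name; the column layout follows the documented input structure.
CREATE TABLE ibm_train (
    id    INT,       -- day number after 2000-01-01
    ind   FLOAT8[],  -- {5-day moving-average slope, NYC temperature at 9am}
    label FLOAT8     -- assumed encoding: 1 for 'up', -1 for 'down'
);

-- The three example vectors from the scenario.
INSERT INTO ibm_train (id, ind, label) VALUES
    (3,   '{0.001, 28.5}',   1),
    (33,  '{0.002, 38.5}',  -1),
    (333, '{-0.001, 30.5}',  1);

-- The point to predict has an id and features but no label yet.
SELECT 444 AS id, '{0.0015, 40.2}'::FLOAT8[] AS ind;
```

Under this reading, the label column does hold the class of each vector, encoded as a number rather than a BOOLEAN. Training and prediction calls are not shown because their signatures are not given in this message.
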



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
