[ https://issues.apache.org/jira/browse/MADLIB-766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Frank McQuillan resolved MADLIB-766.
------------------------------------
    Resolution: Fixed

docs are much better now, marking as resolved

> Doc needs scenario
> ------------------
>
>                 Key: MADLIB-766
>                 URL: https://issues.apache.org/jira/browse/MADLIB-766
>             Project: Apache MADlib
>          Issue Type: Request
>         Environment: ubuntu
>            Reporter: audrey lee
>            Assignee: Rahul Iyer
>            Priority: Minor
>
> Hello World.
> I wanted to post this in the forum but the forum is acting broken.
> My first suggestion is for the project leaders to dump google groups.
> I'm new to madlib.
> My experience with ML on databases is limited to running SVM on Oracle.
> Oracle does an excellent job of explaining how to setup an SVM classification
> system.
> Also they provide a decent comprehensive example related to some sample
> marketing data.
> The Madlib docs do a good job of explaining how to install the Madlib.
> They do a poor job of explaining how I might use Madlib after I install it.
> I studied this page:
> http://doc.madlib.net/v0.5/group__grp__kernmach.html
> The page displays this syntax:
> t(x) = if x[1] > 0 and x[2] < 0 then 1 else -1;
> What kind of syntax is this?
> Where/How do I run it?
> Why do I want to run it?
> I tried this:
> oracle@vbox3:/pt/s/r/madlib$ psql rsinp
> psql (9.1.7, server 9.1.8)
> Type "help" for help.
> rsinp=# t(x) = if x[1] > 0 and x[2] < 0 then 1 else -1;
> ERROR: syntax error at or near "t"
> LINE 1: t(x) = if x[1] > 0 and x[2] < 0 then 1 else -1;
>         ^
> At this point I'd suggest the doc-authors toss this discussion about t(x) in
> the trash.
> I find it useless.
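
The t(x) expression quoted above reads as pseudocode for a labeling rule, not as a statement psql can parse, which would explain the syntax error. As a rough sketch only, the same rule can be written as a CASE expression in plain PostgreSQL. The table name demo_points and its three rows are made up for illustration; nothing here is taken from the MADlib API.

```sql
-- Hypothetical demo table; the name and the rows are illustrative only.
CREATE TEMP TABLE demo_points (id INT, ind FLOAT8[]);

INSERT INTO demo_points VALUES
    (1, '{0.5, -1.0}'),
    (2, '{0.5, 2.0}'),
    (3, '{-0.3, -1.0}');

-- PostgreSQL arrays are 1-based by default, so ind[1] and ind[2]
-- correspond to x[1] and x[2] in the pseudocode.
SELECT id,
       ind,
       CASE WHEN ind[1] > 0 AND ind[2] < 0 THEN 1 ELSE -1 END AS label
FROM demo_points
ORDER BY id;
```

With these rows, only id 1 satisfies both conditions and is labeled 1; ids 2 and 3 are labeled -1.
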
> I returned my attention to
> http://doc.madlib.net/v0.5/group__grp__kernmach.html
> I read this:
> Examples:
> As a general first step, we need to prepare and populate an input table/view
> with the following structure:
> TABLE/VIEW my_schema.my_input_table
> (
>     id INT, -- point ID
>     ind FLOAT8[], -- data point
>     label FLOAT8 -- label of data point
> );
> I wrote this:
> rsinp=# CREATE TABLE my_train_data (id INT, ind FLOAT8[], label FLOAT8);
> CREATE TABLE
> rsinp=#
> rsinp=# INSERT INTO my_train_data VALUES (100, '{1.1, 2.2}', 10.1);
> INSERT 0 1
> rsinp=# INSERT INTO my_train_data VALUES (101, '{3.3, 4.4}', 20.1);
> INSERT 0 1
> rsinp=# select MADlib.svm_generate_reg_data('my_train_data', 2, 2);
>  svm_generate_reg_data
> -----------------------
>
> (1 row)
> rsinp=#
> At this point I'm convinced that I've installed Madlib and I'm happy with the
> installation documentation.
> But now I encounter a conceptual roadblock which I cant resolve by reading
> the content listed below:
> http://doc.madlib.net/v0.5/index.html
> When I encounter new technology I prefer scenarios and usage examples over
> general descriptions.
> My understanding of SVM classification can be described with this limited
> scenario:
> I want to predict if IBM will go up or down tomorrow.
> I assign +1 to the class of 'up' and -1 to the class of 'down'.
> The data I want to learn from is a set of 2 dimensional vectors.
> The first element is the 5 day moving avg slope of IBM price.
> The 2nd element is the temperature of NYC at 9am.
> The id of each vector is the daynumber after 2000-01-01.
> So a set of vectors might look something like this:
> [3, 0.001, 28.5, +1]
> [33, 0.002, 38.5, -1]
> [333, -0.001, 30.5, +1]
> Then finally the vector I want to 'predict' might look like this:
> [444, 0.0015, 40.2]
> I need help understanding how to match the scenario I describe above to the
> API described at:
> http://doc.madlib.net/v0.5/group__grp__kernmach.html
> (section: Example usage for classification)
> For example, this syntax confuses me:
> CREATE TABLE my_train_data (id INT, ind FLOAT8[], label FLOAT8);
> I expect to see something like this:
> CREATE TABLE my_train_data (id INT, ind FLOAT8[], label BOOLEAN);
> Question:
> Is the label-column supposed to 'label' which class the vector resides in?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
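
For readers following the scenario in the quoted report, here is one possible way to lay out those vectors in the (id, ind, label) structure quoted from the docs. It is a sketch under assumptions: the table names ibm_train_data and ibm_predict_data are hypothetical, and it assumes the FLOAT8 label column carries the class as +1 or -1, which is how the reporter's own encoding reads. It uses only plain PostgreSQL and does not call any MADlib function.

```sql
-- Hypothetical table names; the column layout follows the docs quoted above.
CREATE TEMP TABLE ibm_train_data (id INT, ind FLOAT8[], label FLOAT8);

-- id    = day number after 2000-01-01
-- ind   = {5-day moving-average slope, NYC temperature at 9am}
-- label = +1 for 'up', -1 for 'down' (assumed encoding)
INSERT INTO ibm_train_data VALUES
    (3,   '{0.001, 28.5}',   1),
    (33,  '{0.002, 38.5}',  -1),
    (333, '{-0.001, 30.5}',  1);

-- The point to predict has the same shape, minus the label.
CREATE TEMP TABLE ibm_predict_data (id INT, ind FLOAT8[]);
INSERT INTO ibm_predict_data VALUES (444, '{0.0015, 40.2}');
```

Under that reading, a numeric label column can hold the two class values directly, so a BOOLEAN column is not required to express 'up' versus 'down'.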