Frank McQuillan created MADLIB-1265:
---------------------------------------

             Summary: Load data from database into PL/Python
                 Key: MADLIB-1265
                 URL: https://issues.apache.org/jira/browse/MADLIB-1265
             Project: Apache MADlib
          Issue Type: New Feature
          Components: Module: Utilities
            Reporter: Frank McQuillan


Story

`As a data scientist`
I want to easily and efficiently load data from the database into PL/Python memory
`so that`
I can use the loaded data in my PL/Python code.

Interface

```
load_to_plpythonu (
    source_table,               -- source table
    list_of_columns,            -- columns you want in GD, could be '*'
    list_of_columns_to_exclude  -- columns explicitly not to load
);
```
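
To make the intent of the interface concrete, here is a minimal Python sketch of what the load step might look like. It is an illustration under assumptions, not a proposed implementation: `load_table`, `execute`, and `store` are hypothetical names. Inside PL/Python, `execute` could be `plpy.execute` and `store` could be the global dictionary `GD`; here both are passed in as plain arguments so the sketch runs outside the database.

```python
def load_table(execute, source_table, columns, store):
    """Run a SELECT and cache the resulting rows in `store`.

    execute      -- callable taking a SQL string and returning an iterable
                    of dict-like rows (assumption: plpy.execute in PL/Python)
    source_table -- name of the table to read
    columns      -- list of column names or expressions to select
    store        -- dict that keeps the rows (assumption: GD in PL/Python)
    """
    # Note: identifiers are not quoted or validated in this sketch.
    query = "SELECT {0} FROM {1}".format(", ".join(columns), source_table)
    rows = [dict(row) for row in execute(query)]
    store[source_table] = rows
    return len(rows)


# Stand-in for the database: returns two made-up rows for any query.
def fake_execute(query):
    return [
        {"id": 1, "label": 5},
        {"id": 2, "label": 0},
    ]


store = {}
row_count = load_table(fake_execute, "mnist", ["id", "label"], store)
```

Passing `execute` and `store` in explicitly keeps the function testable without a running database; whether the real function should write to `GD` directly is left open here.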

Arguments
```
source_table
TEXT. Name of the table containing the data to load.

list_of_columns
TEXT. Comma-separated string of column names or expressions to load.
Can also be '*', implying that all columns are to be loaded (except for the
ones listed in the next argument, which specifies exclusions). The types of
the columns can be mixed.
Array columns can also be included in the list and will be loaded as is (i.e.,
not flattened). (???)

list_of_columns_to_exclude
TEXT. Comma-separated string of column names to exclude from load. Typically 
used when 'list_of_columns' is set to '*'.
```
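
The way the two column arguments interact can be sketched in Python as follows. `resolve_columns` is a hypothetical helper, and the list of all table columns is assumed to be supplied by the caller (for example from the catalog). The sketch splits on commas, so it would mishandle expressions that themselves contain commas, such as function calls with several arguments.

```python
def resolve_columns(all_columns, list_of_columns, list_of_columns_to_exclude=None):
    """Return the final list of columns to load, in order.

    all_columns                -- every column of the source table
    list_of_columns            -- comma-separated names, or '*'
    list_of_columns_to_exclude -- comma-separated names to drop (optional)
    """
    def split(text):
        # Split on commas and discard empty or whitespace-only items.
        return [item.strip() for item in (text or "").split(",") if item.strip()]

    requested = split(list_of_columns)
    if requested == ["*"]:
        requested = list(all_columns)

    excluded = set(split(list_of_columns_to_exclude))
    return [col for col in requested if col not in excluded]


table_columns = ["id", "x", "y", "label"]
all_but_excluded = resolve_columns(table_columns, "*", "id, label")
explicit = resolve_columns(table_columns, "x, label")
```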

Details

1) This function will be user-facing and will also be called internally by
other MADlib functions in the area of data-parallel models.
2) The interface above is modeled on DT/RF (decision tree / random forest). I
think it should follow the same general idea.


Open questions

1) Is the interface above the correct one?  Are there any parameters missing?
2) Can we support array columns, and is it necessary to flatten them? Leaving
them unflattened would be preferable.


Acceptance

1) Load the MNIST data set from PostgreSQL or Greenplum into PL/Python and
print out a few rows of the data.
2) Load array columns and mixed-type data into PL/Python and confirm that
types and formats are preserved.
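
One possible way to check the second criterion is to summarize which Python types show up in each loaded column. The sketch below assumes the loaded data is a list of dict-like rows and that array columns arrive as Python lists; `column_types` is a hypothetical helper, and the sample rows are made-up stand-ins rather than real MNIST data.

```python
def column_types(rows):
    """Map each column name to the set of Python type names seen in it."""
    seen = {}
    for row in rows:
        for name, value in row.items():
            seen.setdefault(name, set()).add(type(value).__name__)
    return seen


# Made-up rows with an integer, an array (list), and a text column.
sample = [
    {"id": 1, "pixels": [0.0, 0.5], "label": "five"},
    {"id": 2, "pixels": [1.0, 0.25], "label": "three"},
]
types = column_types(sample)
```

A column whose set contains more than one type name (other than `NoneType` for NULLs) would suggest that a value was converted inconsistently during the load.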



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
