Frank McQuillan created MADLIB-1265:
---------------------------------------

Summary: Load data from database into PL/Python
Key: MADLIB-1265
URL: https://issues.apache.org/jira/browse/MADLIB-1265
Project: Apache MADlib
Issue Type: New Feature
Components: Module: Utilities
Reporter: Frank McQuillan

Story

`As a data scientist`
I want to easily and efficiently load data from the database into PL/Python memory
`so that`
I can use the loaded data in my PL/Python code.

Interface

```
load_to_plpythonu (
    source_table,                -- source table
    list_of_columns,             -- columns you want in GD, could be '*'
    list_of_columns_to_exclude   -- columns explicitly not to load
);
```

Arguments

```
source_table
TEXT. Name of the table containing the data to load.

list_of_columns
TEXT. Comma-separated string of column names or expressions to load.
Can also be '*', implying that all columns are to be loaded (except for
the ones included in the next argument, which lists exclusions). The types
of the columns can be mixed. Array columns can also be included in the
list and will be loaded as is (i.e., not flattened). (???)

list_of_columns_to_exclude
TEXT. Comma-separated string of column names to exclude from the load.
Typically used when 'list_of_columns' is set to '*'.
```

Details

1) This function will be user facing and will also be called internally by other MADlib functions in the area of data parallel models.
2) The interface above is modeled on DT/RF. I think it should be the same general idea.

Open questions

1) Is the interface above the correct one? Are there any parameters missing?
2) Can we support array columns, and is it necessary to flatten them? I.e., can we leave them unflattened, since that is preferable?

Acceptance

1) Load the MNIST data set from PG or GP into PL/Python and print out a few rows of the data.
2) Load array columns and mixed-type data into PL/Python and confirm that types and formats are preserved.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
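
For discussion, here is one possible reading of how 'list_of_columns' and 'list_of_columns_to_exclude' might combine, sketched in plain Python with no database involved. The names `resolve_columns` and `table_columns` are hypothetical and not part of MADlib. The sketch also assumes that exclusions apply even when an explicit column list is given, and that entries are split on commas, which would mis-split an expression containing a comma.

```python
def resolve_columns(table_columns, list_of_columns, list_of_columns_to_exclude=None):
    """Return the columns to load, in order (hypothetical helper).

    table_columns: column names of the source table, assumed to be
        looked up elsewhere (e.g. from the catalog).
    list_of_columns: comma-separated names, or '*' for all columns.
    list_of_columns_to_exclude: comma-separated names to drop, or None.
    """
    def split_csv(text):
        # Naive split: assumes no entry contains a comma itself.
        return [item.strip() for item in (text or "").split(",") if item.strip()]

    excluded = set(split_csv(list_of_columns_to_exclude))

    if list_of_columns.strip() == "*":
        requested = list(table_columns)
    else:
        requested = split_csv(list_of_columns)

    # Assumption: exclusions are applied in both cases, not only for '*'.
    return [col for col in requested if col not in excluded]


if __name__ == "__main__":
    # Hypothetical table with an id, an array column, and a label.
    print(resolve_columns(["id", "pixels", "label"], "*", "id"))
```

Whether exclusions should be honored for an explicit list, or rejected as an error, is left open here; it may be worth adding to the open questions above.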