[ https://issues.apache.org/jira/browse/MADLIB-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Frank McQuillan updated MADLIB-1265:
------------------------------------
    Description:
Story

`As a data scientist`
I want to easily and efficiently load data from the database into PL/Python memory
`so that`
I can use the loaded data in my PL/Python code.

Interface

{code}
load_to_plpythonu (
    source_table,                 -- source table
    list_of_columns,              -- columns you want in GD, could be '*'
    list_of_columns_to_exclude    -- columns explicitly not to load
);
{code}

Arguments

{code}
source_table
TEXT. Name of the table containing the data to load.

list_of_columns
TEXT. Comma-separated string of column names or expressions to load.
Can also be '*', implying all columns are to be loaded (except for the ones
listed in the next argument, which holds exclusions). The types of the columns
can be mixed. Array columns can also be included in the list and will be loaded
as is (i.e., not flattened). (???)

list_of_columns_to_exclude
TEXT. Comma-separated string of column names to exclude from load. Typically
used when 'list_of_columns' is set to '*'.
{code}

Details

1) This function will be user facing and will also be called internally by other MADlib functions in the area of data parallel models.
2) The interface above is modeled on DT/RF. I think it should be the same general idea.

Open questions

1) Is the interface above the correct one? Are there any parameters missing?
2) Can we support array columns, and is it necessary to flatten them? i.e., can we leave them unflattened, since that is preferable?

Acceptance

1) Load the MNIST data set from PG or GP into PL/Python and print out a few rows of the data.
2) Load array columns and mixed type data into PL/Python and confirm that types and formats are preserved.

  was:
Story

`As a data scientist`
I want to easily and efficiently load data from the database into PL/Python memory
`so that`
I can use the loaded data in my PL/Python code.
Interface

```
load_to_plpythonu (
    source_table,                 -- source table
    list_of_columns,              -- columns you want in GD, could be '*'
    list_of_columns_to_exclude    -- columns explicitly not to load
);
```

Arguments

```
source_table
TEXT. Name of the table containing the data to load.

list_of_columns
TEXT. Comma-separated string of column names or expressions to load.
Can also be '*', implying all columns are to be loaded (except for the ones
listed in the next argument, which holds exclusions). The types of the columns
can be mixed. Array columns can also be included in the list and will be loaded
as is (i.e., not flattened). (???)

list_of_columns_to_exclude
TEXT. Comma-separated string of column names to exclude from load. Typically
used when 'list_of_columns' is set to '*'.
```

Details

1) This function will be user facing and will also be called internally by other MADlib functions in the area of data parallel models.
2) The interface above is modeled on DT/RF. I think it should be the same general idea.

Open questions

1) Is the interface above the correct one? Are there any parameters missing?
2) Can we support array columns, and is it necessary to flatten them? i.e., can we leave them unflattened, since that is preferable?

Acceptance

1) Load the MNIST data set from PG or GP into PL/Python and print out a few rows of the data.
2) Load array columns and mixed type data into PL/Python and confirm that types and formats are preserved.


> Load data from database into PL/Python
> --------------------------------------
>
>                 Key: MADLIB-1265
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1265
>             Project: Apache MADlib
>          Issue Type: New Feature
>          Components: Module: Utilities
>            Reporter: Frank McQuillan
>            Priority: Major
>             Fix For: v2.0
>
>
> Story
> `As a data scientist`
> I want to easily and efficiently load data from the database into PL/Python memory
> `so that`
> I can use the loaded data in my PL/Python code.
> Interface
> {code}
> load_to_plpythonu (
>     source_table,                 -- source table
>     list_of_columns,              -- columns you want in GD, could be '*'
>     list_of_columns_to_exclude    -- columns explicitly not to load
> );
> {code}
> Arguments
> {code}
> source_table
> TEXT. Name of the table containing the data to load.
> list_of_columns
> TEXT. Comma-separated string of column names or expressions to load.
> Can also be '*', implying all columns are to be loaded (except for the ones
> listed in the next argument, which holds exclusions). The types of the columns
> can be mixed. Array columns can also be included in the list and will be loaded
> as is (i.e., not flattened). (???)
> list_of_columns_to_exclude
> TEXT. Comma-separated string of column names to exclude from load. Typically
> used when 'list_of_columns' is set to '*'.
> {code}
> Details
> 1) This function will be user facing and will also be called internally by other
> MADlib functions in the area of data parallel models.
> 2) The interface above is modeled on DT/RF. I think it should be the same
> general idea.
> Open questions
> 1) Is the interface above the correct one? Are there any parameters missing?
> 2) Can we support array columns, and is it necessary to flatten them? i.e.,
> can we leave them unflattened, since that is preferable?
> Acceptance
> 1) Load the MNIST data set from PG or GP into PL/Python and print out a few
> rows of the data.
> 2) Load array columns and mixed type data into PL/Python and confirm that
> types and formats are preserved.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
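
For discussion, here is a rough sketch of how the `list_of_columns` and `list_of_columns_to_exclude` arguments described in the issue might be resolved into a concrete column list before any data is loaded. It is not MADlib code: the helper names `split_top_level` and `resolve_columns` are hypothetical, and `table_columns` stands in for the table's column names, which a real implementation would presumably look up from the database catalog. The sketch assumes exact, case-sensitive name matching and does not handle quoted identifiers or commas inside string literals.

```python
def split_top_level(text):
    """Split a comma-separated string, ignoring commas nested inside (), [] or {}.

    This keeps expressions such as 'coalesce(y, 0)' in one piece.
    """
    parts, current, depth = [], [], 0
    for ch in text:
        if ch in "([{":
            depth += 1
        elif ch in ")]}":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    # Drop empty entries produced by stray or trailing commas.
    return [p for p in parts if p]


def resolve_columns(list_of_columns, list_of_columns_to_exclude, table_columns):
    """Return the columns/expressions to load, in order, minus exclusions.

    table_columns is a hypothetical argument: the ordered column names of
    source_table, used only when list_of_columns is '*'.
    """
    excluded = set(split_top_level(list_of_columns_to_exclude or ""))
    if list_of_columns.strip() == "*":
        requested = list(table_columns)
    else:
        requested = split_top_level(list_of_columns)
    return [col for col in requested if col not in excluded]
```

For example, `resolve_columns("*", "id", ["id", "x", "y"])` yields the `x` and `y` columns, which could then be joined into a SELECT list. Whether exclusions should also apply when `list_of_columns` is an explicit list (as they do in this sketch) is one of the points the issue leaves open.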