[jira] [Updated] (MADLIB-1265) Load data from database into PL/Python

Frank McQuillan (JIRA) Tue, 07 Aug 2018 12:51:09 -0700


     [ https://issues.apache.org/jira/browse/MADLIB-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Frank McQuillan updated MADLIB-1265:
------------------------------------
    Description: 
Story

`As a data scientist`
I want to easily and efficiently load data from the database into PL/Python 
memory
`so that`
I can use the loaded data in my PL/Python code.

Interface

{code}
load_to_plpythonu (
        source_table,                -- source table
        list_of_columns,             -- columns you want in GD, could be '*'
        list_of_columns_to_exclude   -- columns explicitly not to load
        );
{code}

Arguments
{code}
source_table
TEXT. Name of the table containing the data to load.

list_of_columns
TEXT. Comma-separated string of column names or expressions to load.
Can also be '*', implying that all columns are to be loaded (except for the
ones listed in the next argument, which lists exclusions). The types of the
columns can be mixed.
Array columns can also be included in the list and will be loaded as is (i.e.,
not flattened). (???)

list_of_columns_to_exclude
TEXT. Comma-separated string of column names to exclude from load. Typically 
used when 'list_of_columns' is set to '*'.
{code}
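
As a rough illustration of how 'list_of_columns' and 'list_of_columns_to_exclude' could interact, here is a minimal pure-Python sketch. It is not MADlib code: the helper name resolve_columns and the table_columns argument (standing in for a catalog lookup) are hypothetical, and applying exclusions even when explicit columns are given is an assumption the description does not settle.

```python
def resolve_columns(table_columns, list_of_columns, list_of_columns_to_exclude=None):
    """Return the column names/expressions to load, in order.

    table_columns is a hypothetical input: the column names of the source
    table, which in the database would come from the catalog.
    """
    def split_csv(text):
        # Naive split: an expression containing a comma, e.g. "round(x, 2)",
        # would be broken apart and needs a real parser.
        return [item.strip() for item in (text or "").split(",") if item.strip()]

    excluded = set(split_csv(list_of_columns_to_exclude))
    requested = split_csv(list_of_columns)
    if requested == ["*"]:
        # '*' expands to every column of the source table.
        requested = list(table_columns)
    # Assumption: exclusions are applied whether or not '*' was used.
    return [col for col in requested if col not in excluded]
```

With table columns id, x, y, label, the call resolve_columns(cols, "*", "id, label") keeps only x and y.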

Details

1) This function will be user facing and will also be called internally by other
MADlib functions in the area of data parallel models.
2) The interface above is modeled on DT/RF.  I think it should be the same 
general idea.


Open questions

1) Is the interface above the correct one?  Are there any parameters missing?
2) Can we support array columns, and is it necessary to flatten them? i.e., can 
we leave them unflattened, since that is preferable?
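
To make open question 2 concrete, the following pure-Python sketch shows what flattening discards. It assumes an array column value arrives in Python as a (possibly nested) list; the helper names flatten and shape are hypothetical and not part of MADlib.

```python
def flatten(nested):
    """Flatten an arbitrarily nested list into a single flat list."""
    flat = []
    for item in nested:
        if isinstance(item, list):
            flat.extend(flatten(item))
        else:
            flat.append(item)
    return flat


def shape(nested):
    """Return the dimensions of a rectangular nested list as a tuple."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        # Descend into the first element; stop on an empty list.
        nested = nested[0] if nested else None
    return tuple(dims)


image = [[0, 255], [128, 64]]   # a tiny 2x2 stand-in for an image array
flat_image = flatten(image)     # values survive, dimensions do not
```

Here shape(image) is (2, 2) while shape(flat_image) is (4,), so a flattened load would need the original dimensions stored separately to be reversible.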


Acceptance

1) Load the MNIST data set from PG or GP into PL/Python and print out a few
rows of the data.
2) Load array columns and mixed-type data into PL/Python and confirm that
types and formats are preserved.
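
The acceptance checks above could be exercised along the following lines. This is a sketch under stated assumptions, not the proposed implementation: load_rows and column_types are hypothetical helpers, the query executor and the storage dictionary are passed in so the code runs outside the database, and the table name used in the usage note is made up.

```python
def load_rows(execute, store, source_table, columns, key="loaded_data"):
    """Run a SELECT through `execute` and keep the rows in `store` under `key`.

    `execute` is any callable that takes a query string and returns an
    iterable of dict-like rows; `store` is any dict.
    """
    # Sketch only: table and column names are interpolated without quoting.
    query = "SELECT {0} FROM {1}".format(", ".join(columns), source_table)
    rows = [dict(row) for row in execute(query)]
    store[key] = rows
    return rows


def column_types(rows):
    """Map each column name to the set of Python type names seen in `rows`."""
    seen = {}
    for row in rows:
        for name, value in row.items():
            seen.setdefault(name, set()).add(type(value).__name__)
    return seen
```

Inside a PL/Python function, one would presumably pass plpy.execute as the executor and GD as the store, e.g. load_rows(plpy.execute, GD, "mnist_train", ["id", "pixels", "label"]), then print rows[:3] and inspect column_types(rows) to confirm that array columns come back as lists and that mixed types are preserved.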

  was:
Story

`As a data scientist`
I want to easily and efficiently load data from the database into PL/Python 
memory
`so that`
I can use the loaded data in my PL/Python code.

Interface

```
load_to_plpythonu (
        source_table,                -- source table
        list_of_columns,             -- columns you want in GD, could be '*'
        list_of_columns_to_exclude   -- columns explicitly not to load
        );
```

Arguments
```
source_table
TEXT. Name of the table containing the data to load.

list_of_columns
TEXT. Comma-separated string of column names or expressions to load.
Can also be '*', implying that all columns are to be loaded (except for the
ones listed in the next argument, which lists exclusions). The types of the
columns can be mixed.
Array columns can also be included in the list and will be loaded as is (i.e.,
not flattened). (???)

list_of_columns_to_exclude
TEXT. Comma-separated string of column names to exclude from load. Typically 
used when 'list_of_columns' is set to '*'.
```

Details

1) This function will be user facing and will also be called internally by other
MADlib functions in the area of data parallel models.
2) The interface above is modeled on DT/RF.  I think it should be the same 
general idea.


Open questions

1) Is the interface above the correct one?  Are there any parameters missing?
2) Can we support array columns, and is it necessary to flatten them? i.e., can 
we leave them unflattened, since that is preferable?


Acceptance

1) Load the MNIST data set from PG or GP into PL/Python and print out a few
rows of the data.
2) Load array columns and mixed-type data into PL/Python and confirm that
types and formats are preserved.


> Load data from database into PL/Python
> --------------------------------------
>
>                 Key: MADLIB-1265
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1265
>             Project: Apache MADlib
>          Issue Type: New Feature
>          Components: Module: Utilities
>            Reporter: Frank McQuillan
>            Priority: Major
>             Fix For: v2.0
>
>
> Story
> `As a data scientist`
> I want to easily and efficiently load data from the database into PL/Python 
> memory
> `so that`
> I can use the loaded data in my PL/Python code.
> Interface
> {code}
> load_to_plpythonu (
>         source_table,                -- source table
>         list_of_columns,             -- columns you want in GD, could be '*'
>         list_of_columns_to_exclude   -- columns explicitly not to load
>         );
> {code}
> Arguments
> {code}
> source_table
> TEXT. Name of the table containing the data to load.
> list_of_columns
> TEXT. Comma-separated string of column names or expressions to load.
> Can also be '*', implying that all columns are to be loaded (except for the
> ones listed in the next argument, which lists exclusions). The types of the
> columns can be mixed.
> Array columns can also be included in the list and will be loaded as is
> (i.e., not flattened). (???)
> list_of_columns_to_exclude
> TEXT. Comma-separated string of column names to exclude from load. Typically 
> used when 'list_of_columns' is set to '*'.
> {code}
> Details
> 1) This function will be user facing and will also be called internally by other
> MADlib functions in the area of data parallel models.
> 2) The interface above is modeled on DT/RF.  I think it should be the same 
> general idea.
> Open questions
> 1) Is the interface above the correct one?  Are there any parameters missing?
> 2) Can we support array columns, and is it necessary to flatten them? i.e., 
> can we leave them unflattened, since that is preferable?
> Acceptance
> 1) Load the MNIST data set from PG or GP into PL/Python and print out a few
> rows of the data.
> 2) Load array columns and mixed-type data into PL/Python and confirm that
> types and formats are preserved.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
