[ 
https://issues.apache.org/jira/browse/MADLIB-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Frank McQuillan updated MADLIB-1274:
------------------------------------
    Summary: Association rules error on output schema  (was: Association rules hangs/errors out for toy example)

> Association rules error on output schema
> ----------------------------------------
>
>                 Key: MADLIB-1274
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1274
>             Project: Apache MADlib
>          Issue Type: Bug
>          Components: Module: Association Rules
>            Reporter: Frank McQuillan
>            Priority: Major
>             Fix For: v1.15.1
>
>
> Error observed on:
> * Postgres 9.6
> * Greenplum Database 5.9.0
> This is a small single-node Greenplum instance on AWS: 4 segments on a
> machine with 8 vCPUs and plenty of available memory:
> [gpadmin@ip-172-21-0-246 RetailDemo]$ cat /proc/meminfo
> MemTotal:       62711428 kB
> MemFree:        59786076 kB
> MemAvailable:   60281836 kB
> Load data
> {code}
> DROP TABLE IF EXISTS order_items;
> CREATE TABLE order_items(  itemid INTEGER,
>                            orderid INTEGER,
>                            productid INTEGER,
>                            quantity INTEGER,
>                            productname TEXT);                        
> INSERT INTO order_items VALUES
> (      5 ,    1044 ,         9 ,        3 , 'Kirby cukes'),
> (     11 ,      37 ,         2 ,        3 , 'Ooopsi Cola'),
> (     12 ,      37 ,        21 ,        3 , 'black radish'),
> (     15 ,      37 ,        49 ,        3 , 'Leg of lamb'),
> (     18 ,      37 ,        37 ,        3 , 'Uggo Waffles'),
> (     20 ,      37 ,        76 ,        3 , 'Happy Valley White Peaches'),
> (     21 ,      37 ,        29 ,        3 , 'Breakstone Whole Milk Cottage Cheese'),
> (     22 ,      37 ,        25 ,        3 , 'ugli fruit'),
> (      4 ,    1044 ,        44 ,        3 , 'ground beef'),
> (      6 ,    1044 ,        17 ,        3 , 'napa'),
> (      9 ,    1044 ,        10 ,        3 , 'dill'),
> (     13 ,      37 ,        21 ,        3 , 'black radish'),
> (     24 ,      37 ,        47 ,        3 , 'Ball Park Franks'),
> (     25 ,      37 ,        69 ,        3 , 'Ball Park Mustard'),
> (     26 ,      37 ,        64 ,        3 , 'Ballpark Hot Dog Rolls'),
> (     27 ,    1044 ,        47 ,        3 , 'Ball Park Franks'),
> (     28 ,    1044 ,        69 ,        3 , 'Ball Park Mustard'),
> (     29 ,    1044 ,        64 ,        3 , 'Ballpark Hot Dog Rolls'),
> (     30 ,    1044 ,        70 ,        3 , 'Homer''s Strawberry Jam'),
> (     31 ,    1044 ,        71 ,        3 , 'Mr Peanut Peanut Butter'),
> (     32 ,      37 ,        71 ,        3 , 'Mr Peanut Peanut Butter'),
> (     33 ,      37 ,        70 ,        3 , 'Homer''s Strawberry Jam'),
> (      1 ,    1044 ,         1 ,        3 , 'Pivotal Apple Juice'),
> (      3 ,    1044 ,        77 ,        3 , 'Pivotal Baked Beans'),
> (     14 ,      37 ,        53 ,        3 , 'Old Zurich Swiss Cheese'),
> (     17 ,      37 ,        49 ,        3 , 'Leg of lamb'),
> (     19 ,      37 ,        18 ,        3 , 'california navels'),
> (      2 ,    1044 ,        41 ,        3 , '12" Dinner Plates'),
> (      7 ,    1044 ,        32 ,        3 , 'Vermot Extra Sharp Cheddar'),
> (      8 ,    1044 ,        71 ,        3 , 'Mr Peanut Peanut Butter'),
> (     10 ,    1044 ,        39 ,        3 , 'Pivotal Soft and Smooth 24 pack'),
> (     16 ,      37 ,        22 ,        3 , 'triple wahsed spinach'),
> (     23 ,      37 ,        61 ,        3 , 'Brooklyn Bagel 6 pack');
> {code}
> (1)
> Run assoc rules:
> {code}
> SELECT * FROM madlib.assoc_rules( .25,
>                                   .5,
>                                   'orderid',
>                                   'productid',
>                                   'order_items',
>                                   NULL,
>                                   TRUE
>                                 );
> {code}
> The call does not return.
> Additional information on the failure on Greenplum:
> {code}
> The original table was distributed randomly.  If distributed by trans_id, the 
> code completes.  I get no assoc_rules, but it doesn’t run forever.
> If test_data is distributed randomly, the function returns, but there are no 
> assoc_rules.  So the behavior is different depending upon the table 
> distribution.
> There may be a tiny data set issue where there are no rules that meet the 
> support and confidence thresholds. 
> {code}
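
The distribution dependence described above could be checked with something like the following. This is a minimal sketch for Greenplum only, and it rests on assumptions: the note refers to trans_id and test_data, which do not appear in the load script, so the sketch assumes that orderid plays the role of the transaction id and that order_items is the table in question.

```sql
-- Show how many rows of the input table live on each segment (Greenplum only).
SELECT gp_segment_id, count(*) AS n_rows
FROM order_items
GROUP BY gp_segment_id
ORDER BY gp_segment_id;

-- Case A: hash-distribute on the transaction id column
-- (assumed here to be orderid), then re-run madlib.assoc_rules.
ALTER TABLE order_items SET DISTRIBUTED BY (orderid);

-- Case B: switch back to random distribution, then re-run madlib.assoc_rules.
ALTER TABLE order_items SET DISTRIBUTED RANDOMLY;
```

Running the same assoc_rules call after each ALTER TABLE would show whether the hang follows the distribution policy rather than the data itself.
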
> (2)
> Running assoc rules with an output table specified:
> {code}
> SELECT * FROM madlib.assoc_rules(.10,                  -- Support
>                                  .10,                  -- Confidence
>                                  'orderid',            -- Transaction id col
>                                  'productname',        -- Product col
>                                  'order_items',        -- Input data
>                                  'pivotalmarkets',     -- Output data
>                                  TRUE);                -- Verbose
> {code}
> results in error:
> {code}
> InternalError: (psycopg2.InternalError) plpy.Error: the output schema does not exist
> CONTEXT:  Traceback (most recent call last):
>   PL/Python function "assoc_rules", line 31, in <module>
>     'NULL'
>   PL/Python function "assoc_rules", line 107, in assoc_rules
>   PL/Python function "assoc_rules", line 21, in __assert
> PL/Python function "assoc_rules"
>  [SQL: "SELECT * FROM madlib.assoc_rules(.10,                  -- Support\n   
>                               .10,                  -- Confidence\n           
>                       'orderid',            -- Transaction id col\n           
>                       'productname',        -- Product col\n                  
>                'order_items',        -- Input data\n                          
>        'pivotalmarkets',     -- Output data\n                                 
> TRUE);                -- Verbose"]
> {code}
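
One possible reading of the error in (2), going by the MADlib documentation rather than anything confirmed in this issue: the sixth argument of madlib.assoc_rules is the name of an output schema, not an output table, and the results are written to a table named assoc_rules inside that schema. Under that assumption, a call along these lines may avoid the "output schema does not exist" error. It is a sketch, not a verified fix for this ticket.

```sql
-- Assumption: 'pivotalmarkets' is meant to be a schema. It has to exist
-- before the call; skip this statement if the schema is already there.
CREATE SCHEMA pivotalmarkets;

SELECT * FROM madlib.assoc_rules(.10,                  -- Support
                                 .10,                  -- Confidence
                                 'orderid',            -- Transaction id col
                                 'productname',        -- Product col
                                 'order_items',        -- Input data
                                 'pivotalmarkets',     -- Output schema
                                 TRUE);                -- Verbose

-- Assumption: rules land in <output schema>.assoc_rules.
SELECT * FROM pivotalmarkets.assoc_rules
ORDER BY support DESC, confidence DESC;
```

Passing NULL as the sixth argument, as in (1), should write the assoc_rules table to the current schema instead.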



