Frank McQuillan created MADLIB-1274:
---------------------------------------

             Summary: Association rules hangs/errors out for toy example
                 Key: MADLIB-1274
                 URL: https://issues.apache.org/jira/browse/MADLIB-1274
             Project: Apache MADlib
          Issue Type: Bug
          Components: Module: Association Rules
            Reporter: Frank McQuillan


Error observed on:
* Postgres 9.6

* Greenplum Database 5.9.0
This is a small single-node GP cluster on AWS: 4 segments on a machine with 8 vCPUs and
plenty of available memory:
[gpadmin@ip-172-21-0-246 RetailDemo]$ cat /proc/meminfo
MemTotal:       62711428 kB
MemFree:        59786076 kB
MemAvailable:   60281836 kB


Load data
```
DROP TABLE IF EXISTS order_items;
CREATE TABLE order_items(  itemid INTEGER,
                           orderid INTEGER,
                           productid INTEGER,
                           quantity INTEGER,
                           productname TEXT);                        
INSERT INTO order_items VALUES
(      5 ,    1044 ,         9 ,        3 , 'Kirby cukes'),
(     11 ,      37 ,         2 ,        3 , 'Ooopsi Cola'),
(     12 ,      37 ,        21 ,        3 , 'black radish'),
(     15 ,      37 ,        49 ,        3 , 'Leg of lamb'),
(     18 ,      37 ,        37 ,        3 , 'Uggo Waffles'),
(     20 ,      37 ,        76 ,        3 , 'Happy Valley White Peaches'),
(     21 ,      37 ,        29 ,        3 , 'Breakstone Whole Milk Cottage Cheese'),
(     22 ,      37 ,        25 ,        3 , 'ugli fruit'),
(      4 ,    1044 ,        44 ,        3 , 'ground beef'),
(      6 ,    1044 ,        17 ,        3 , 'napa'),
(      9 ,    1044 ,        10 ,        3 , 'dill'),
(     13 ,      37 ,        21 ,        3 , 'black radish'),
(     24 ,      37 ,        47 ,        3 , 'Ball Park Franks'),
(     25 ,      37 ,        69 ,        3 , 'Ball Park Mustard'),
(     26 ,      37 ,        64 ,        3 , 'Ballpark Hot Dog Rolls'),
(     27 ,    1044 ,        47 ,        3 , 'Ball Park Franks'),
(     28 ,    1044 ,        69 ,        3 , 'Ball Park Mustard'),
(     29 ,    1044 ,        64 ,        3 , 'Ballpark Hot Dog Rolls'),
(     30 ,    1044 ,        70 ,        3 , 'Homer''s Strawberry Jam'),
(     31 ,    1044 ,        71 ,        3 , 'Mr Peanut Peanut Butter'),
(     32 ,      37 ,        71 ,        3 , 'Mr Peanut Peanut Butter'),
(     33 ,      37 ,        70 ,        3 , 'Homer''s Strawberry Jam'),
(      1 ,    1044 ,         1 ,        3 , 'Pivotal Apple Juice'),
(      3 ,    1044 ,        77 ,        3 , 'Pivotal Baked Beans'),
(     14 ,      37 ,        53 ,        3 , 'Old Zurich Swiss Cheese'),
(     17 ,      37 ,        49 ,        3 , 'Leg of lamb'),
(     19 ,      37 ,        18 ,        3 , 'california navels'),
(      2 ,    1044 ,        41 ,        3 , '12" Dinner Plates'),
(      7 ,    1044 ,        32 ,        3 , 'Vermot Extra Sharp Cheddar'),
(      8 ,    1044 ,        71 ,        3 , 'Mr Peanut Peanut Butter'),
(     10 ,    1044 ,        39 ,        3 , 'Pivotal Soft and Smooth 24 pack'),
(     16 ,      37 ,        22 ,        3 , 'triple wahsed spinach'),
(     23 ,      37 ,        61 ,        3 , 'Brooklyn Bagel 6 pack');
```

(1)
Run assoc rules:
```
SELECT * FROM madlib.assoc_rules( .25,
                                  .5,
                                  'orderid',
                                  'productid',
                                  'order_items',
                                  NULL,
                                  TRUE
                                );
```
does not return.

(2)
Run assoc rules with output table specified:
```
SELECT * FROM madlib.assoc_rules(.10,                  -- Support
                                 .10,                  -- Confidence
                                 'orderid',            -- Transaction id col
                                 'productname',        -- Product col
                                 'order_items',        -- Input data
                                 'pivotalmarkets',     -- Output data
                                 TRUE);                -- Verbose

```
results in error:
```
InternalError: (psycopg2.InternalError) plpy.Error: the output schema does not exist
CONTEXT:  Traceback (most recent call last):
  PL/Python function "assoc_rules", line 31, in <module>
    'NULL'
  PL/Python function "assoc_rules", line 107, in assoc_rules
  PL/Python function "assoc_rules", line 21, in __assert
PL/Python function "assoc_rules"
 [SQL: "SELECT * FROM madlib.assoc_rules(.10,                  -- Support\n                                 .10,                  -- Confidence\n                                 'orderid',            -- Transaction id col\n                                 'productname',        -- Product col\n                                 'order_items',        -- Input data\n                                 'pivotalmarkets',     -- Output data\n                                 TRUE);                -- Verbose"]
```
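The error text in (2) suggests that the sixth argument is interpreted as an output schema rather than an output table, and that the schema must exist before the call. A minimal sketch of that reading follows. It assumes the schema name `pivotalmarkets` from the call above, and assumes that the results land in a table named `assoc_rules` inside that schema. It addresses only the schema error, not the hang in (1).

```sql
-- Assumption: the sixth argument names an output schema that must already exist.
-- Plain CREATE SCHEMA is used because IF NOT EXISTS is not available on older
-- PostgreSQL-based engines.
CREATE SCHEMA pivotalmarkets;

SELECT * FROM madlib.assoc_rules(.10,                  -- Support
                                 .10,                  -- Confidence
                                 'orderid',            -- Transaction id col
                                 'productname',        -- Product col
                                 'order_items',        -- Input data
                                 'pivotalmarkets',     -- Output schema (assumed)
                                 TRUE);                -- Verbose

-- Assumption: if the call completes, the rules are stored in a table called
-- assoc_rules in the output schema.
SELECT * FROM pivotalmarkets.assoc_rules;
```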

Other info on failure on GP:
```
The original table was distributed randomly.  If distributed by trans_id, the 
code completes.  I get no assoc_rules, but it doesn’t run forever.

If test_data is distributed randomly, the function returns, but there are no 
assoc_rules.  So the behavior is different depending upon the table 
distribution.

There may be a tiny data set issue where there are no rules that meet the 
support and confidence thresholds. 
```
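The remark about the tiny data set can be checked directly with plain SQL. The sketch below is not part of the original report. The table contains only two distinct `orderid` values, so any itemset that occurs in at least one order has support of at least 0.5, which is above both support thresholds tried here (.25 and .10). Since the number of candidate itemsets grows exponentially with the number of distinct items per order, the per-order item counts are worth looking at. Whether this explains the hang is an assumption, not something confirmed in this report.

```sql
-- Number of distinct items in each transaction (order).
SELECT orderid,
       count(DISTINCT productid) AS distinct_items
FROM order_items
GROUP BY orderid
ORDER BY orderid;

-- Support of each single item: orders containing it / total orders.
SELECT productid,
       count(DISTINCT orderid)::float8
         / (SELECT count(DISTINCT orderid) FROM order_items) AS support
FROM order_items
GROUP BY productid
ORDER BY support DESC, productid;
```

Comparing the single-item supports with the thresholds passed to `madlib.assoc_rules` shows whether any items are filtered out before itemsets are combined.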






