Frank McQuillan created MADLIB-1274:
---------------------------------------
Summary: Association rules hangs/errors out for toy example
Key: MADLIB-1274
URL: https://issues.apache.org/jira/browse/MADLIB-1274
Project: Apache MADlib
Issue Type: Bug
Components: Module: Association Rules
Reporter: Frank McQuillan
Error observed on:
* Postgres 9.6
* Greenplum Database 5.9.0
The Greenplum instance is a small single-node cluster on AWS: 4 segments on a
machine with 8 vCPUs and plenty of available memory:
[gpadmin@ip-172-21-0-246 RetailDemo]$ cat /proc/meminfo
MemTotal: 62711428 kB
MemFree: 59786076 kB
MemAvailable: 60281836 kB
Load data:
```
DROP TABLE IF EXISTS order_items;
CREATE TABLE order_items( itemid INTEGER,
orderid INTEGER,
productid INTEGER,
quantity INTEGER,
productname TEXT);
INSERT INTO order_items VALUES
( 5 , 1044 , 9 , 3 , 'Kirby cukes'),
( 11 , 37 , 2 , 3 , 'Ooopsi Cola'),
( 12 , 37 , 21 , 3 , 'black radish'),
( 15 , 37 , 49 , 3 , 'Leg of lamb'),
( 18 , 37 , 37 , 3 , 'Uggo Waffles'),
( 20 , 37 , 76 , 3 , 'Happy Valley White Peaches'),
( 21 , 37 , 29 , 3 , 'Breakstone Whole Milk Cottage Cheese'),
( 22 , 37 , 25 , 3 , 'ugli fruit'),
( 4 , 1044 , 44 , 3 , 'ground beef'),
( 6 , 1044 , 17 , 3 , 'napa'),
( 9 , 1044 , 10 , 3 , 'dill'),
( 13 , 37 , 21 , 3 , 'black radish'),
( 24 , 37 , 47 , 3 , 'Ball Park Franks'),
( 25 , 37 , 69 , 3 , 'Ball Park Mustard'),
( 26 , 37 , 64 , 3 , 'Ballpark Hot Dog Rolls'),
( 27 , 1044 , 47 , 3 , 'Ball Park Franks'),
( 28 , 1044 , 69 , 3 , 'Ball Park Mustard'),
( 29 , 1044 , 64 , 3 , 'Ballpark Hot Dog Rolls'),
( 30 , 1044 , 70 , 3 , 'Homer''s Strawberry Jam'),
( 31 , 1044 , 71 , 3 , 'Mr Peanut Peanut Butter'),
( 32 , 37 , 71 , 3 , 'Mr Peanut Peanut Butter'),
( 33 , 37 , 70 , 3 , 'Homer''s Strawberry Jam'),
( 1 , 1044 , 1 , 3 , 'Pivotal Apple Juice'),
( 3 , 1044 , 77 , 3 , 'Pivotal Baked Beans'),
( 14 , 37 , 53 , 3 , 'Old Zurich Swiss Cheese'),
( 17 , 37 , 49 , 3 , 'Leg of lamb'),
( 19 , 37 , 18 , 3 , 'california navels'),
( 2 , 1044 , 41 , 3 , '12" Dinner Plates'),
( 7 , 1044 , 32 , 3 , 'Vermot Extra Sharp Cheddar'),
( 8 , 1044 , 71 , 3 , 'Mr Peanut Peanut Butter'),
( 10 , 1044 , 39 , 3 , 'Pivotal Soft and Smooth 24 pack'),
( 16 , 37 , 22 , 3 , 'triple wahsed spinach'),
( 23 , 37 , 61 , 3 , 'Brooklyn Bagel 6 pack');
```
(1)
Run assoc rules:
```
SELECT * FROM madlib.assoc_rules( .25,
.5,
'orderid',
'productid',
'order_items',
NULL,
TRUE
);
```
does not return.
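One thing that may be worth checking for case (1) is the size of each transaction. The toy data contains only two orders, so any itemset that appears in either order has a support of at least 0.5 and clears the .25 threshold. The query below is a minimal sketch that uses only the `order_items` table defined above:

```sql
-- Count line items and distinct products per transaction.
SELECT orderid,
       count(*)                  AS line_items,
       count(DISTINCT productid) AS distinct_products
FROM order_items
GROUP BY orderid
ORDER BY orderid;
```

On the data above this should report 16 distinct products for order 37 and 14 for order 1044. With thresholds this low, the number of candidate itemsets could grow to the order of 2^16, which might explain a very long run time. This is a guess, not a confirmed diagnosis.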
(2)
Run assoc rules with an output table specified:
```
SELECT * FROM madlib.assoc_rules(.10, -- Support
.10, -- Confidence
'orderid', -- Transaction id col
'productname', -- Product col
'order_items', -- Input data
'pivotalmarkets', -- Output data
TRUE); -- Verbose
```
results in error:
```
InternalError: (psycopg2.InternalError) plpy.Error: the output schema does not exist
CONTEXT: Traceback (most recent call last):
PL/Python function "assoc_rules", line 31, in <module>
'NULL'
PL/Python function "assoc_rules", line 107, in assoc_rules
PL/Python function "assoc_rules", line 21, in __assert
PL/Python function "assoc_rules"
[SQL: "SELECT * FROM madlib.assoc_rules(.10, -- Support\n
.10, -- Confidence\n
'orderid', -- Transaction id col\n
'productname', -- Product col\n
'order_items', -- Input data\n
'pivotalmarkets', -- Output data\n TRUE);
-- Verbose"]
```
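The error text says "the output schema does not exist", which suggests that the sixth argument is read as a schema name rather than as an output table name. A possible workaround, sketched under that assumption, is to create the schema before the call. The name of the results table (`assoc_rules` inside that schema) is also an assumption and should be checked against the MADlib documentation for the installed version:

```sql
-- Assumption: the sixth argument names a schema that must already exist.
CREATE SCHEMA pivotalmarkets;

SELECT * FROM madlib.assoc_rules(.10,               -- Support
                                 .10,               -- Confidence
                                 'orderid',         -- Transaction id col
                                 'productname',     -- Product col
                                 'order_items',     -- Input data
                                 'pivotalmarkets',  -- Output schema (assumed)
                                 TRUE);             -- Verbose

-- Assumption: results land in a table called assoc_rules in that schema.
SELECT * FROM pivotalmarkets.assoc_rules;
```

Passing `NULL` as the sixth argument, as in case (1), avoids the schema check altogether.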
Other info on failure on GP:
```
The original table was distributed randomly. If distributed by trans_id, the
code completes. I get no assoc_rules, but it doesn’t run forever.
If test_data is distributed randomly, the function returns, but there are no
assoc_rules. So the behavior is different depending upon the table
distribution.
There may be a tiny data set issue where there are no rules that meet the
support and confidence thresholds.
```
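To compare the two distribution policies on Greenplum, something like the following could be used. It is a sketch that assumes the `trans_id` column mentioned in the note corresponds to `orderid` in the `order_items` table from this report:

```sql
-- Show how rows are currently spread across segments
-- (gp_segment_id is a Greenplum system column).
SELECT gp_segment_id, count(*) AS n_rows
FROM order_items
GROUP BY gp_segment_id
ORDER BY gp_segment_id;

-- Redistribute by the transaction id column, then rerun assoc_rules.
ALTER TABLE order_items SET DISTRIBUTED BY (orderid);

-- To go back to a random distribution afterwards:
-- ALTER TABLE order_items SET DISTRIBUTED RANDOMLY;
```

Running the same `assoc_rules` call under both policies should show whether the hang depends on how the input table is distributed.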