[ https://issues.apache.org/jira/browse/CASSANDRA-11194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15251727#comment-15251727 ]
Keith Wansbrough commented on CASSANDRA-11194: ---------------------------------------------- We would also find it very useful to be able to create a materialized view on a set. The {{explode}} syntax looks good for this: {code} CREATE TABLE customers ( id text PRIMARY KEY, data text, phones frozen<set<text>> ); CREATE MATERIALIZED VIEW customers_by_phone AS SELECT explode(phones), id FROM customers WHERE phones IS NOT NULL; {code} We have a database of customers with an ID as primary key. Each customer has zero or more phone numbers. We would like to be able to create a materialized view so we can look up by phone number. Our current schema uses a frozen set for this, but either frozen or unfrozen would be fine. > materialized views - support explode() on collections > ----------------------------------------------------- > > Key: CASSANDRA-11194 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11194 > Project: Cassandra > Issue Type: New Feature > Reporter: Jon Haddad > > I'm working on a database design to model a product catalog. Products can > belong to categories. Categories can belong to multiple sub categories > (think about Amazon's complex taxonomies). > My category table would look like this, giving me individual categories & > their parents: > {code} > CREATE TABLE category ( > category_id uuid primary key, > name text, > parents set<uuid> > ); > {code} > To get a list of all the children of a particular category, I need a table > that looks like the following: > {code} > CREATE TABLE categories_by_parent ( > parent_id uuid, > category_id uuid, > name text, > primary key (parent_id, category_id) > ); > {code} > The important thing to note here is that a single category can have multiple > parents. > I'd like to propose support for collections in materialized views via an > explode() function that would create 1 row per item in the collection. For > instance, I'll insert the following 3 rows (2 parents, 1 child) into the > category table: > {code} > insert into category (category_id, name, parents) values > (009fe0e1-5b09-4efc-a92d-c03720324a4f, 'Parent', null); > insert into category (category_id, name, parents) values > (1f2914de-0adf-4afc-b7ad-ddd8dc876ab1, 'Parent2', null); > insert into category (category_id, name, parents) values > (1f93bc07-9874-42a5-a7d1-b741dc9c509c, 'Child', > {009fe0e1-5b09-4efc-a92d-c03720324a4f, 1f2914de-0adf-4afc-b7ad-ddd8dc876ab1 > }); > cqlsh:test> select * from category; > category_id | name | parents > --------------------------------------+---------+------------------------------------------------------------------------------ > 009fe0e1-5b09-4efc-a92d-c03720324a4f | Parent | > null > 1f2914de-0adf-4afc-b7ad-ddd8dc876ab1 | Parent2 | > null > 1f93bc07-9874-42a5-a7d1-b741dc9c509c | Child | > {009fe0e1-5b09-4efc-a92d-c03720324a4f, 1f2914de-0adf-4afc-b7ad-ddd8dc876ab1} > (3 rows) > {code} > Given the following CQL to select the child category, utilizing an explode > function, I would expect to get back 2 rows, 1 for each parent: > {code} > select category_id, name, explode(parents) as parent_id from category where > category_id = 1f93bc07-9874-42a5-a7d1-b741dc9c509c; > category_id | name | parent_id > --------------------------------------+-------+-------------------------------------- > 1f93bc07-9874-42a5-a7d1-b741dc9c509c | Child | > 009fe0e1-5b09-4efc-a92d-c03720324a4f > 1f93bc07-9874-42a5-a7d1-b741dc9c509c | Child | > 1f2914de-0adf-4afc-b7ad-ddd8dc876ab1 > (2 rows) > {code} > This functionality would ideally apply to materialized views, since the > ability to control partitioning here would allow us to efficiently query our > MV for all categories belonging to a parent in a complex taxonomy. > {code} > CREATE MATERIALIZED VIEW categories_by_parent as > SELECT explode(parents) as parent_id, > category_id, name FROM category WHERE parents IS NOT NULL > {code} > The explode() function is available in Spark Dataframes and my proposed > function has the same behavior: > http://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.functions.explode -- This message was sent by Atlassian JIRA (v6.3.4#6332)