[ https://issues.apache.org/jira/browse/ASTERIXDB-3059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Glenn Justo Galvizo updated ASTERIXDB-3059: ------------------------------------------- Description: It would be nice to have an option to exclude certain fields when using SELECT *. [BigQuery|https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#select_except] gives users the option to exclude columns with 'SELECT * EXCEPT f1, f2, ...'. 'EXCEPT' seems like a nice addition to our grammar, here are a few examples on how we can extend our SELECT clause to include 'EXCEPT': Let's start off with some DDLs and some data: {code:sql} CREATE TYPE GenericType AS { _id: uuid }; CREATE DATASET Users (GenericType) PRIMARY KEY _id AUTOGENERATED; CREATE DATASET Friends (GenericType) PRIMARY KEY _id AUTOGENERATED; INSERT INTO Users [ { "user_id": 1, "name": "Sally", "address": { "zip_code": "92512", "state": "CA" } }, { "user_id": 2, "name": "Mark" }, { "user_id": 3, "name": "John" }, { "user_id": 4, "name": "Tracy", "title": "Dr" } ]; INSERT INTO Friends [ { "user_id": 1, "friend": 2 }, { "user_id": 1, "friend": 3 }, { "user_id": 2, "friend": 1 }, { "user_id": 3, "friend": 1 } ];{code} ---- Now we can get into the examples: 1. Get me all users U1 that have friends. Do not include the F dataset. In the example below, we have a 'SELECT *' followed by an 'EXCEPT ( F )'. In contrast to BigQuery, we do not specify columns / fields in our exclusion list here, instead we specify an exclusion of _variables_ that would have otherwise been returned with our 'SELECT' clause. {code:sql} FROM Users U1, Friends F WHERE U1.user_id = F.user_id SELECT * EXCEPT F;{code} Returns the following: {code:java} > { "U1": { "user_id": 1, "name": "Sally", "address": { "zip_code": "92512", "state": "CA" } } } > { "U1": { "user_id": 2, "name": "Mark" } } > { "U1": { "user_id": 3, "name": "John" } }{code} 2. Get me all users U1 that have friends. Do not include that user's address and title. The example below more closely aligns with BigQuery, as we specify columns / fields that we want to exclude from U1 in the exclusion list 'address' and 'title'. {code:sql} FROM Users U1, Friends F WHERE U1.user_id = F.user_id SELECT DISTINCT U1.* EXCEPT address, title;{code} Returns the following: {code:java} > { "user_id": 1, "name": "Sally" } > { "user_id": 2, "name": "Mark" } > { "user_id": 3, "name": "John" }{code} 3. If we want to exclude some field in a nested object, we can specify the fields within our object with the '.' syntax. Say we don't want to include the zipcode of an address object within a user that has friends. We would use the query below: {code:sql} FROM Users U1, Friends F WHERE U1.user_id = F.user_id SELECT DISTINCT U1.* EXCEPT address.zip_code;{code} Returns the following: {code:java} > { "user_id": 1, "name": "Sally", "address": { "state": "CA" } } > { "user_id": 2, "name": "Mark" } > { "user_id": 3, "name": "John" }{code} 4. Now suppose we want users and the user documents of their immediate friends, excluding addresses from both user documents. We would use the following query: {code:sql} FROM Users U1, Friends F, Users U2 WHERE U1.user_id = F.user_id AND U2.user_id = F.friend SELECT U1, U2 EXCEPT U1.address, U2.address;{code} Returns the following: {code:java} > { "U1": { "user_id": 1, "name": "Sally" }, "U2": { "user_id": 2, "name": "Mark" } } > { "U1": { "user_id": 1, "name": "Sally" }, "U2": { "user_id": 3, "name": "John" } } > { "U1": { "user_id": 2, "name": "Mark" }, "U2": { "user_id": 1, "name": "Sally" } } > { "U1": { "user_id": 3, "name": "John" }, "U2": { "user_id": 1, "name": "Sally" } }{code} ---- The exact grammar I have in mind is given below, where we simply add the 'EXCEPT' at the tail of our 'SELECT' clause. Logically, the 'EXCEPT' clause surrounds our SELECT- EXPRESSION with a call to OBJECT_REMOVE_FIELDS. Using #2 as an example, a logical equivalent would be: {code:sql} FROM ( FROM Users U1, Friends F WHERE U1.user_id = F.user_id SELECT DISTINCT U1.* ) TMP SELECT VALUE OBJECT_REMOVE_FIELDS(TMP, ["address", "title"]);{code} was: It would be nice to have an option to exclude certain fields when using SELECT *. [BigQuery|https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#select_except] gives users the option to exclude columns with 'SELECT * EXCEPT f1, f2, ...'. 'EXCEPT' seems like a nice addition to our grammar, here are a few examples on how we can extend our SELECT clause to include 'EXCEPT': Let's start off with some DDLs and some data: {code:sql} CREATE TYPE GenericType AS { _id: uuid }; CREATE DATASET Users (GenericType) PRIMARY KEY _id AUTOGENERATED; CREATE DATASET Friends (GenericType) PRIMARY KEY _id AUTOGENERATED; INSERT INTO Users [ { "user_id": 1, "name": "Sally", "address": { "zip_code": "92512", "state": "CA" } }, { "user_id": 2, "name": "Mark" }, { "user_id": 3, "name": "John" }, { "user_id": 4, "name": "Tracy", "title": "Dr" } ]; INSERT INTO Friends [ { "user_id": 1, "friend": 2 }, { "user_id": 1, "friend": 3 }, { "user_id": 2, "friend": 1 }, { "user_id": 3, "friend": 1 } ];{code} ---- Now we can get into the examples: 1. Get me all users U1 that have friends. Do not include the F dataset. In the example below, we have a 'SELECT *' followed by an 'EXCEPT ( F )'. In contrast to BigQuery, we do not specify columns / fields in our exclusion list here, instead we specify an exclusion of _variables_ that would have otherwise been returned with our 'SELECT' clause. {code:sql} FROM Users U1, Friends F WHERE U1.user_id = F.user_id SELECT * EXCEPT F;{code} Returns the following: {code:java} > { "U1": { "user_id": 1, "name": "Sally", "address": { "zip_code": "92512", "state": "CA" } } } > { "U1": { "user_id": 2, "name": "Mark" } } > { "U1": { "user_id": 3, "name": "John" } }{code} 2. Get me all users U1 that have friends. Do not include that user's address and title. The example below more closely aligns with BigQuery, as we specify columns / fields that we want to exclude from U1 in the exclusion list 'address' and 'title'. {code:sql} FROM Users U1, Friends F WHERE U1.user_id = F.user_id SELECT DISTINCT U1.* EXCEPT address, title;{code} Returns the following: {code:java} > { "user_id": 1, "name": "Sally" } > { "user_id": 2, "name": "Mark" } > { "user_id": 3, "name": "John" }{code} 3. If we want to exclude some field in a nested object, we can specify the fields within our object with the '.' syntax. Say we don't want to include the zipcode of an address object within a user that has friends. We would use the query below: {code:sql} FROM Users U1, Friends F WHERE U1.user_id = F.user_id SELECT DISTINCT U1.* EXCEPT address.zip_code;{code} Returns the following: {code:java} > { "user_id": 1, "name": "Sally", "address": { "state": "CA" } } > { "user_id": 2, "name": "Mark" } > { "user_id": 3, "name": "John" }{code} ---- The exact grammar I have in mind is given below, where we simply add the 'EXCEPT' at the tail of our 'SELECT' clause. Logically, the 'EXCEPT' clause surrounds our SELECT- EXPRESSION with a call to OBJECT_REMOVE_FIELDS. Using #2 as an example, a logical equivalent would be: {code:sql} FROM ( FROM Users U1, Friends F WHERE U1.user_id = F.user_id SELECT DISTINCT U1.* ) TMP SELECT VALUE OBJECT_REMOVE_FIELDS(TMP, ["address", "title"]);{code} > EXCEPT in SelectClause > ---------------------- > > Key: ASTERIXDB-3059 > URL: https://issues.apache.org/jira/browse/ASTERIXDB-3059 > Project: Apache AsterixDB > Issue Type: New Feature > Components: SQL - Translator SQL++ > Reporter: Glenn Justo Galvizo > Assignee: Glenn Justo Galvizo > Priority: Major > Attachments: SelectClause-1.png > > > It would be nice to have an option to exclude certain fields when using > SELECT *. > [BigQuery|https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#select_except] > gives users the option to exclude columns with 'SELECT * EXCEPT f1, f2, > ...'. 'EXCEPT' seems like a nice addition to our grammar, here are a few > examples on how we can extend our SELECT clause to include 'EXCEPT': > > Let's start off with some DDLs and some data: > {code:sql} > CREATE TYPE GenericType AS { _id: uuid }; > CREATE DATASET Users (GenericType) PRIMARY KEY _id AUTOGENERATED; > CREATE DATASET Friends (GenericType) PRIMARY KEY _id AUTOGENERATED; > INSERT INTO Users [ > { "user_id": 1, "name": "Sally", > "address": { "zip_code": "92512", "state": "CA" } }, > { "user_id": 2, "name": "Mark" }, > { "user_id": 3, "name": "John" }, > { "user_id": 4, "name": "Tracy", "title": "Dr" } > ]; > INSERT INTO Friends [ > { "user_id": 1, "friend": 2 }, > { "user_id": 1, "friend": 3 }, > { "user_id": 2, "friend": 1 }, > { "user_id": 3, "friend": 1 } > ];{code} > ---- > Now we can get into the examples: > 1. Get me all users U1 that have friends. Do not include the F dataset. In > the example below, we have a 'SELECT *' followed by an 'EXCEPT ( F )'. In > contrast to BigQuery, we do not specify columns / fields in our exclusion > list here, instead we specify an exclusion of _variables_ that would have > otherwise been returned with our 'SELECT' clause. > {code:sql} > FROM Users U1, Friends F > WHERE U1.user_id = F.user_id > SELECT * EXCEPT F;{code} > Returns the following: > {code:java} > > { "U1": { "user_id": 1, "name": "Sally", > "address": { "zip_code": "92512", "state": "CA" } } } > > { "U1": { "user_id": 2, "name": "Mark" } } > > { "U1": { "user_id": 3, "name": "John" } }{code} > 2. Get me all users U1 that have friends. Do not include that user's address > and title. The example below more closely aligns with BigQuery, as we specify > columns / fields that we want to exclude from U1 in the exclusion list > 'address' and 'title'. > {code:sql} > FROM Users U1, Friends F > WHERE U1.user_id = F.user_id > SELECT DISTINCT U1.* EXCEPT address, title;{code} > Returns the following: > {code:java} > > { "user_id": 1, "name": "Sally" } > > { "user_id": 2, "name": "Mark" } > > { "user_id": 3, "name": "John" }{code} > 3. If we want to exclude some field in a nested object, we can specify the > fields within our object with the '.' syntax. Say we don't want to include > the zipcode of an address object within a user that has friends. We would use > the query below: > {code:sql} > FROM Users U1, Friends F > WHERE U1.user_id = F.user_id > SELECT DISTINCT U1.* EXCEPT address.zip_code;{code} > Returns the following: > {code:java} > > { "user_id": 1, "name": "Sally", "address": { "state": "CA" } } > > { "user_id": 2, "name": "Mark" } > > { "user_id": 3, "name": "John" }{code} > 4. Now suppose we want users and the user documents of their immediate > friends, excluding addresses from both user documents. We would use the > following query: > {code:sql} > FROM Users U1, Friends F, Users U2 > WHERE U1.user_id = F.user_id AND > U2.user_id = F.friend > SELECT U1, U2 EXCEPT U1.address, U2.address;{code} > Returns the following: > {code:java} > > { "U1": { "user_id": 1, "name": "Sally" }, > "U2": { "user_id": 2, "name": "Mark" } } > > { "U1": { "user_id": 1, "name": "Sally" }, > "U2": { "user_id": 3, "name": "John" } } > > { "U1": { "user_id": 2, "name": "Mark" }, > "U2": { "user_id": 1, "name": "Sally" } } > > { "U1": { "user_id": 3, "name": "John" }, > "U2": { "user_id": 1, "name": "Sally" } }{code} > ---- > The exact grammar I have in mind is given below, where we simply add the > 'EXCEPT' at the tail of our 'SELECT' clause. Logically, the 'EXCEPT' clause > surrounds our SELECT- EXPRESSION with a call to OBJECT_REMOVE_FIELDS. Using > #2 as an example, a logical equivalent would be: > {code:sql} > FROM ( > FROM Users U1, Friends F > WHERE U1.user_id = F.user_id > SELECT DISTINCT U1.* > ) TMP > SELECT VALUE OBJECT_REMOVE_FIELDS(TMP, ["address", "title"]);{code} -- This message was sent by Atlassian Jira (v8.20.10#820010)