Re: [GENERAL] Help Optimizing a Summary Query

Arthur Silva Tue, 09 Dec 2014 11:46:19 -0800

On Tue, Dec 9, 2014 at 4:18 PM, Robert DiFalco <[email protected]>
wrote:


> I'm sorry, I missed a JOIN on the second variation. It is:
>
> SELECT u.id, u.name, u.imageURL, u.bio,
>    CASE
>       WHEN f.friend_id IS NOT NULL THEN 'isFriend'
>       WHEN s.to_id IS NOT NULL THEN 'hasSentRequest'
>       WHEN r.to_id IS NOT NULL THEN 'hasReceivedRequest'
>       ELSE 'none'
>    END AS 'friendStatus',
>    (SELECT COUNT(1) AS d
>       FROM friends f1
>          JOIN friends f2 ON f1.fiend_id = f2.friend_id
>       WHERE f1.user_id = 33 AND f2.user_id = u.id)
> FROM users u
> *LEFT OUTER JOIN friends f ON f.user_id = 33 AND f.friend_id = u.id
> <http://u.id>*
> LEFT OUTER JOIN friend_requests s ON s.to_id = 33 AND s.from_id = u.id
> LEFT OUTER JOIN friend_requests r ON r.to_id = u.id AND r.from_id = 33
> WHERE u.id != 33 AND u.name LIKE '%John%' ORDER BY u.name;
>
>
> On Tue, Dec 9, 2014 at 10:15 AM, Robert DiFalco <[email protected]>
> wrote:
>
>> I have users, friends, and friend_requests. I need a query that
>> essentially returns a summary containing:
>>
>>     * user (name, imageURL, bio, ...)
>>     * Friend status (relative to an active user)
>>        * Is the user a friend of the active user?
>>        * Has the user sent a friend request to the active user?
>>        * Has the user received a friend request from the active user?
>>     * # of mutualFriends
>>     * Exclude the active user from the result set.
>>
>> So I have mocked this up two ways but both have complicated query plans
>> that will be problematic with large data sets. I'm thinking that my lack of
>> deep SQL knowledge is making me miss the obvious choice.
>>
>> Here's my two query examples:
>>
>> SELECT u.id, u.name, u.imageURL, u.bio,
>>    CASE
>>       WHEN EXISTS(SELECT 1 FROM friends f WHERE f.user_id = 33 AND
>> f.friend_id = u.id)       THEN 'isFriend'
>>       WHEN EXISTS(SELECT 1 FROM friend_requests s WHERE s.to_id = 33
>> AND s.from_id = u.id) THEN 'hasSentRequest'
>>       WHEN EXISTS(SELECT 1 FROM friend_requests r WHERE r.to_id = u.id
>> AND r.from_id = 33)   THEN 'hasReceivedRequest'
>>       ELSE 'none'
>>    END AS "friendStatus",
>>    (SELECT COUNT(1)
>>       FROM friends f1
>>          JOIN friends f2 ON f1.friend_id = f2.friend_id
>>       WHERE f1.user_id = 33 AND f2.user_id = u.id) AS mutualFriends
>> FROM users u
>> WHERE u.id != 33 AND u.name LIKE 'John%' ORDER BY u.name;
>>
>> SELECT u.id, u.name, u.imageURL, u.bio,
>>    CASE
>>       WHEN f.friend_id IS NOT NULL THEN 'isFriend'
>>       WHEN s.to_id IS NOT NULL THEN 'hasSentRequest'
>>       WHEN r.to_id IS NOT NULL THEN 'hasReceivedRequest'
>>       ELSE 'none'
>>    END AS 'friendStatus',
>>    (SELECT COUNT(1) AS d
>>       FROM friends f1
>>          JOIN friends f2 ON f1.fiend_id = f2.friend_id
>>       WHERE f1.user_id = 33 AND f2.user_id = u.id)
>> FROM users u
>> LEFT OUTER JOIN friend_requests s ON s.to_id = 33 AND s.from_id = u.id
>> LEFT OUTER JOIN friend_requests r ON r.to_id = u.id AND r.from_id = 33
>> WHERE u.id != 33 AND u.name LIKE 'John%' ORDER BY u.name;
>>
>> 33 is just the id of the active user I am using for testing. The WHERE
>> clause could be anything. I'm just using "u.name" here but I'm more
>> concerned about the construction of the result set than the WHERE clause.
>> These have more or less similar query plans, nothing that would change
>> things factorially. Is this the best I can do or am I missing the obvious?
>>
>> Here are the tables:
>>
>>
>> CREATE TABLE users (
>>   id            BIGINT,
>>   name          VARCHAR,
>>   imageURL      VARCHAR
>>   created       TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
>>   phone_natl    BIGINT,       /* National Phone Number */
>>   country_e164  SMALLINT,     /* E164 country code */
>>   email         VARCHAR(255),
>>   PRIMARY KEY (id),
>>   UNIQUE (email),
>>   UNIQUE (phone_natl, country_e164)
>> );
>>
>>
>> CREATE TABLE friends (
>>   user_id  BIGINT,
>>   friend_id   BIGINT,
>>   PRIMARY KEY (user_id, user_id),
>>   FOREIGN KEY (user_id)    REFERENCES users(id) ON DELETE CASCADE,
>>   FOREIGN KEY (friend_id)  REFERENCES users(id) ON DELETE CASCADE
>> );
>> CREATE INDEX idx_friends_friend ON friends(friend_id);
>>
>> CREATE TABLE friend_requests (
>>   from_id  BIGINT,
>>   to_id    BIGINT,
>>   created  TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP,
>>   PRIMARY KEY (from_id, user_id),
>>   FOREIGN KEY (from_id)  REFERENCES users(id) ON DELETE CASCADE,
>>   FOREIGN KEY (to_id)    REFERENCES users(id) ON DELETE CASCADE
>> );
>> CREATE INDEX idx_friend_requests_to ON friend_requests(to_id);
>>
>> Let me know if you guys need anything else.
>>
>>
>

Hello Robert, none of your schemas worked for me, here's a clean version

CREATE TABLE users (
  id            BIGINT,
  name          VARCHAR,
  imageURL      VARCHAR,
  created       TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  phone_natl    BIGINT,
  country_e164  SMALLINT,
  email         VARCHAR(255),
  PRIMARY KEY (id),
  UNIQUE (email),
  UNIQUE (phone_natl, country_e164)
);


CREATE TABLE friends (
  user_id  BIGINT,
  friend_id   BIGINT,
  PRIMARY KEY (user_id, friend_id),
  FOREIGN KEY (user_id)    REFERENCES users(id) ON DELETE CASCADE,
  FOREIGN KEY (friend_id)  REFERENCES users(id) ON DELETE CASCADE
);
CREATE INDEX idx_friends_friend ON friends(friend_id);

CREATE TABLE friend_requests (
  from_id  BIGINT,
  to_id    BIGINT,
  created  TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (from_id, to_id),
  FOREIGN KEY (from_id)  REFERENCES users(id) ON DELETE CASCADE,
  FOREIGN KEY (to_id)    REFERENCES users(id) ON DELETE CASCADE
);
CREATE INDEX idx_friend_requests_to ON friend_requests(to_id);

I may look into the query later but here's some thoughts

* you want an index to speed up name searches on user.name, I suggest
pg_trgm + ILIKE as a starting point.
* you really expect more than a billion users? If not (or not in the near
future) use INT instead. This will save you a significant amount of memory
and eventually buy you time later.
* as long as you don't hit the disk for the queries you'll be fine, so make
sure you have enough memory or use read-slaves with smaller working sets.

Re: [GENERAL] Help Optimizing a Summary Query

Reply via email to