On Thu, Dec 11, 2014 at 6:52 PM, Robert DiFalco <robert.difa...@gmail.com>
wrote:

> Thanks Arthur. I don't think there is as big a different between BIGINT
> and INTEGER as you think there is. In fact with an extended filesystem you
> might not see any difference at all.
>
> As I put in the first emal I am using a GIST index on user.name.
>
> I was really more interested in the LEFT OUTER JOINs vs EXISTS queries and
> if there was a better alternative I had not considered.
>
> On Tue, Dec 9, 2014 at 11:44 AM, Arthur Silva <arthur...@gmail.com> wrote:
>
>> On Tue, Dec 9, 2014 at 4:18 PM, Robert DiFalco <robert.difa...@gmail.com>
>> wrote:
>>
>>> I'm sorry, I missed a JOIN on the second variation. It is:
>>>
>>> SELECT u.id, u.name, u.imageURL, u.bio,
>>>    CASE
>>>       WHEN f.friend_id IS NOT NULL THEN 'isFriend'
>>>       WHEN s.to_id IS NOT NULL THEN 'hasSentRequest'
>>>       WHEN r.to_id IS NOT NULL THEN 'hasReceivedRequest'
>>>       ELSE 'none'
>>>    END AS 'friendStatus',
>>>    (SELECT COUNT(1) AS d
>>>       FROM friends f1
>>>          JOIN friends f2 ON f1.fiend_id = f2.friend_id
>>>       WHERE f1.user_id = 33 AND f2.user_id = u.id)
>>> FROM users u
>>> *LEFT OUTER JOIN friends f ON f.user_id = 33 AND f.friend_id = u.id
>>> <http://u.id>*
>>> LEFT OUTER JOIN friend_requests s ON s.to_id = 33 AND s.from_id = u.id
>>> LEFT OUTER JOIN friend_requests r ON r.to_id = u.id AND r.from_id = 33
>>> WHERE u.id != 33 AND u.name LIKE '%John%' ORDER BY u.name;
>>>
>>>
>>> On Tue, Dec 9, 2014 at 10:15 AM, Robert DiFalco <
>>> robert.difa...@gmail.com> wrote:
>>>
>>>> I have users, friends, and friend_requests. I need a query that
>>>> essentially returns a summary containing:
>>>>
>>>>     * user (name, imageURL, bio, ...)
>>>>     * Friend status (relative to an active user)
>>>>        * Is the user a friend of the active user?
>>>>        * Has the user sent a friend request to the active user?
>>>>        * Has the user received a friend request from the active user?
>>>>     * # of mutualFriends
>>>>     * Exclude the active user from the result set.
>>>>
>>>> So I have mocked this up two ways but both have complicated query plans
>>>> that will be problematic with large data sets. I'm thinking that my lack of
>>>> deep SQL knowledge is making me miss the obvious choice.
>>>>
>>>> Here's my two query examples:
>>>>
>>>> SELECT u.id, u.name, u.imageURL, u.bio,
>>>>    CASE
>>>>       WHEN EXISTS(SELECT 1 FROM friends f WHERE f.user_id = 33 AND
>>>> f.friend_id = u.id)       THEN 'isFriend'
>>>>       WHEN EXISTS(SELECT 1 FROM friend_requests s WHERE s.to_id = 33
>>>> AND s.from_id = u.id) THEN 'hasSentRequest'
>>>>       WHEN EXISTS(SELECT 1 FROM friend_requests r WHERE r.to_id = u.id
>>>> AND r.from_id = 33)   THEN 'hasReceivedRequest'
>>>>       ELSE 'none'
>>>>    END AS "friendStatus",
>>>>    (SELECT COUNT(1)
>>>>       FROM friends f1
>>>>          JOIN friends f2 ON f1.friend_id = f2.friend_id
>>>>       WHERE f1.user_id = 33 AND f2.user_id = u.id) AS mutualFriends
>>>> FROM users u
>>>> WHERE u.id != 33 AND u.name LIKE 'John%' ORDER BY u.name;
>>>>
>>>> SELECT u.id, u.name, u.imageURL, u.bio,
>>>>    CASE
>>>>       WHEN f.friend_id IS NOT NULL THEN 'isFriend'
>>>>       WHEN s.to_id IS NOT NULL THEN 'hasSentRequest'
>>>>       WHEN r.to_id IS NOT NULL THEN 'hasReceivedRequest'
>>>>       ELSE 'none'
>>>>    END AS 'friendStatus',
>>>>    (SELECT COUNT(1) AS d
>>>>       FROM friends f1
>>>>          JOIN friends f2 ON f1.fiend_id = f2.friend_id
>>>>       WHERE f1.user_id = 33 AND f2.user_id = u.id)
>>>> FROM users u
>>>> LEFT OUTER JOIN friend_requests s ON s.to_id = 33 AND s.from_id = u.id
>>>> LEFT OUTER JOIN friend_requests r ON r.to_id = u.id AND r.from_id = 33
>>>> WHERE u.id != 33 AND u.name LIKE 'John%' ORDER BY u.name;
>>>>
>>>> 33 is just the id of the active user I am using for testing. The WHERE
>>>> clause could be anything. I'm just using "u.name" here but I'm more
>>>> concerned about the construction of the result set than the WHERE clause.
>>>> These have more or less similar query plans, nothing that would change
>>>> things factorially. Is this the best I can do or am I missing the obvious?
>>>>
>>>> Here are the tables:
>>>>
>>>>
>>>> CREATE TABLE users (
>>>>   id            BIGINT,
>>>>   name          VARCHAR,
>>>>   imageURL      VARCHAR
>>>>   created       TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
>>>>   phone_natl    BIGINT,       /* National Phone Number */
>>>>   country_e164  SMALLINT,     /* E164 country code */
>>>>   email         VARCHAR(255),
>>>>   PRIMARY KEY (id),
>>>>   UNIQUE (email),
>>>>   UNIQUE (phone_natl, country_e164)
>>>> );
>>>>
>>>>
>>>> CREATE TABLE friends (
>>>>   user_id  BIGINT,
>>>>   friend_id   BIGINT,
>>>>   PRIMARY KEY (user_id, user_id),
>>>>   FOREIGN KEY (user_id)    REFERENCES users(id) ON DELETE CASCADE,
>>>>   FOREIGN KEY (friend_id)  REFERENCES users(id) ON DELETE CASCADE
>>>> );
>>>> CREATE INDEX idx_friends_friend ON friends(friend_id);
>>>>
>>>> CREATE TABLE friend_requests (
>>>>   from_id  BIGINT,
>>>>   to_id    BIGINT,
>>>>   created  TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP,
>>>>   PRIMARY KEY (from_id, user_id),
>>>>   FOREIGN KEY (from_id)  REFERENCES users(id) ON DELETE CASCADE,
>>>>   FOREIGN KEY (to_id)    REFERENCES users(id) ON DELETE CASCADE
>>>> );
>>>> CREATE INDEX idx_friend_requests_to ON friend_requests(to_id);
>>>>
>>>> Let me know if you guys need anything else.
>>>>
>>>>
>>>
>>
>> Hello Robert, none of your schemas worked for me, here's a clean version
>>
>> CREATE TABLE users (
>>   id            BIGINT,
>>   name          VARCHAR,
>>   imageURL      VARCHAR,
>>   created       TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
>>   phone_natl    BIGINT,
>>   country_e164  SMALLINT,
>>   email         VARCHAR(255),
>>   PRIMARY KEY (id),
>>   UNIQUE (email),
>>   UNIQUE (phone_natl, country_e164)
>> );
>>
>>
>> CREATE TABLE friends (
>>   user_id  BIGINT,
>>   friend_id   BIGINT,
>>   PRIMARY KEY (user_id, friend_id),
>>   FOREIGN KEY (user_id)    REFERENCES users(id) ON DELETE CASCADE,
>>   FOREIGN KEY (friend_id)  REFERENCES users(id) ON DELETE CASCADE
>> );
>> CREATE INDEX idx_friends_friend ON friends(friend_id);
>>
>> CREATE TABLE friend_requests (
>>   from_id  BIGINT,
>>   to_id    BIGINT,
>>   created  TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
>>   PRIMARY KEY (from_id, to_id),
>>   FOREIGN KEY (from_id)  REFERENCES users(id) ON DELETE CASCADE,
>>   FOREIGN KEY (to_id)    REFERENCES users(id) ON DELETE CASCADE
>> );
>> CREATE INDEX idx_friend_requests_to ON friend_requests(to_id);
>>
>> I may look into the query later but here's some thoughts
>>
>> * you want an index to speed up name searches on user.name, I suggest
>> pg_trgm + ILIKE as a starting point.
>> * you really expect more than a billion users? If not (or not in the near
>> future) use INT instead. This will save you a significant amount of memory
>> and eventually buy you time later.
>> * as long as you don't hit the disk for the queries you'll be fine, so
>> make sure you have enough memory or use read-slaves with smaller working
>> sets.
>>
>>
>
Really? I double checked and there isn't any mention about GIST in your
first email.
As for the Int/BigInt, It's not a big deal (from 5% to 20% memory savings
on indexes), but it was worth mentioning.

You may try to materialize the author friend list (used in the
common-friend count) with a CTE.

Reply via email to