Re: Not sure if I should CREATE INDEX for text columns on which I plan to filter later
On Fri, 23 Feb 2018 17:10:56 +1300 David Rowleywrote: > On 23 February 2018 at 04:00, Bill Moran wrote: > > 2) The negative impact of an unused index is tiny compared to the negative > > impact of a missing index. > > I'd say that would entirely depend on the workload of the table and > the entire cluster. Disk space and I/O to write WAL and index pages to > is surely a finite resource. Not to mention the possibility of > disallowing HOT-Updates in the heap. I feel like you missed my point. You're absolutely right, but the disagreement was not on whether or not an unused index could cause problems, but on the likelihood that the OP was going to build the simulation code to actually determine whether the index is needed or not. Information from the original question led me to believe that simulation was either beyond his skill level or beyond his time allocation; so I provided a less good, but more likely to be helpful (in my opinion) answer. The pattern that almost ALL new ventures I've seen follow is: 1) Hack something together based on an idea for a product 2) If the product actually succeeds, experience tons of issues related to scaling and performance 3) Run around like a crazy person fixing all the scaling and performance issues 4) Sell the company to someone else who ultimately becomes responsible for maturing the software In theory, there's no reason this pattern _has_ to be so prevalent, yet it is. Probably becuase it appears to minimize the up front cost, which the people footing the bill just love. Given that process, "shotgun" indexing is part of step 1 or step 3. Whereas the building of load simulations and _real_ tuning of the system is relegated to step 4. Since investers tend to want to get out quick if #2 isn't going to happen, they don't want people doing work that they consider part of step #4. I'm a pragmatist. I'd love to see everyone build software in a sane, well-considered manner. I'd also love to see government without corruption. However, in the world I _actually_ see, those things aren't prevalent. > It would seem to me that anyone using the "shotgun" indexing method > may end up having to learn more about indexing the hard way. Learning > the hard way is something I like to try to avoid, personally. Probably > it all has to come down to how important it is that your app actually > can handle the load vs devs/dba experience level divided by time, both > of the latter two are also a finite resource. So, it probably all has > to be carefully balanced and quite possibly a person's opinion > strongly relates to their experience. If you were in the air traffic > control business, perhaps your opinion might not be the same!? ... > Sorry, the aeroplane crashed because the replica instance lagged and > the plane's location wasn't updated... Analysis shows that the DBA > indexed every column in the table and the WAL volume was more than the > network's bandwidth could handle over the holiday period. (Note: I > know nothing about air traffic control, but it does seem like > something you'd want to make stable systems for, games on the > internet, probably less so...). I really hope that people writing ATC software have the experience to do really good testing (including load simulation, etc) but the 3 mile island accident happened partially because of faulty sensor design, so there's no guarantee that's the case. Interesting discussion. -- Bill Moran
Re: Not sure if I should CREATE INDEX for text columns on which I plan to filter later
On 23 February 2018 at 04:00, Bill Moranwrote: > 2) The negative impact of an unused index is tiny compared to the negative > impact of a missing index. I'd say that would entirely depend on the workload of the table and the entire cluster. Disk space and I/O to write WAL and index pages to is surely a finite resource. Not to mention the possibility of disallowing HOT-Updates in the heap. It would seem to me that anyone using the "shotgun" indexing method may end up having to learn more about indexing the hard way. Learning the hard way is something I like to try to avoid, personally. Probably it all has to come down to how important it is that your app actually can handle the load vs devs/dba experience level divided by time, both of the latter two are also a finite resource. So, it probably all has to be carefully balanced and quite possibly a person's opinion strongly relates to their experience. If you were in the air traffic control business, perhaps your opinion might not be the same!? ... Sorry, the aeroplane crashed because the replica instance lagged and the plane's location wasn't updated... Analysis shows that the DBA indexed every column in the table and the WAL volume was more than the network's bandwidth could handle over the holiday period. (Note: I know nothing about air traffic control, but it does seem like something you'd want to make stable systems for, games on the internet, probably less so...). -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: Not sure if I should CREATE INDEX for text columns on which I plan to filter later
On Thu, 22 Feb 2018 03:57:34 +0100 Tomas Vondrawrote: > > On 02/21/2018 05:00 PM, Bill Moran wrote: > > On Wed, 21 Feb 2018 13:33:18 +0100 > > Alexander Farber wrote: > > > >> Hi Martin - > >> > >> On Wed, Feb 21, 2018 at 1:26 PM, Martin Moore > >> wrote: > >> > >>> I’m no expert but I’d think it unlikely an index would be considered for a > >>> table with only 100 rows in. Also I’m pretty sure only one index per table > >>> is used, so you’d want to put state1 and state2 in one index. > >> > >> I hope to have more records in the words_games table later when my game is > >> launched (currently in soft launch/beta). > > > > To elaborate+clarify Martin's comments. > > > > The index will not be used while the table is small because it's actually > > slower > > to process an index than it is to just read the entire table. However, as > > the > > table gets more rows, these timings will reverse and Postgres will start > > using > > the indexes. It's probably best to just create them even though the table is > > small. The performance improvement you'll get when the table grows will be > > well worth it, and it avoids the problem of trying to remember to create it > > later. > > > > However, Martin's other comment about only using a single index is > > incorrect. > > Postgres can use multiple indexes per query, so it's often good practace to > > put indexes on every column that might ever be used in a WHERE clause. > > I call this practice "shotgun" and generally discourage people from > using it. It seems attractive, but not every where condition can be > evaluated using an index, and every index has maintenance overhead. > > There are cases when it's somewhat reasonable (e.g. when you don't know > which columns will be referenced by WHERE conditions, and data ingestion > has lower priority than queries). But that doesn't seem to be the case > here - you know the WHERE conditions, and people are likely sending a > lot of inserts (and expecting low latency responses). Can't _really_ disagree with anything you're saying there ... but I disagree with the overall sentament for the following reasons: 1) Not everyone has easy access to experienced people like you and I. As a result, I often recommend the "least likely to be wrong" course of action instead of recommending investigation that requires expertise that the original poster might not possess ... after all, if they had the experience to do all the research, it's unlikely that they'd be asking this question in the first place. 2) The negative impact of an unused index is tiny compared to the negative impact of a missing index. 3) Dropping an unused index is _far_ less headache than creating a missing index on a busy database. 4) Without knowing _all_ the details of the app and how it works, my past experience is that problems are about a jillion times more likely to be the result of underindexing (although I _have_ seen issues due to overindexing, so it _does_ happen) I can't argue that the _best_ course of action is to set up a simulation that can exercise the system at predicted size and load levels and use that to tune things. But communicating all that to others has never been easy in my experience. As recently as this month my team was blown away that I could create a simulation that demonstrated how my code would behave under real-world like conditions. It's apparently not something that a lot of people understand or are good at or something. -- Bill Moran
Re: Not sure if I should CREATE INDEX for text columns on which I plan to filter later
Thank your for the comments
Re: Not sure if I should CREATE INDEX for text columns on which I plan to filter later
On 02/21/2018 05:00 PM, Bill Moran wrote: > On Wed, 21 Feb 2018 13:33:18 +0100 > Alexander Farberwrote: > >> Hi Martin - >> >> On Wed, Feb 21, 2018 at 1:26 PM, Martin Moore >> wrote: >> >>> I’m no expert but I’d think it unlikely an index would be considered for a >>> table with only 100 rows in. Also I’m pretty sure only one index per table >>> is used, so you’d want to put state1 and state2 in one index. >> >> I hope to have more records in the words_games table later when my game is >> launched (currently in soft launch/beta). > > To elaborate+clarify Martin's comments. > > The index will not be used while the table is small because it's actually > slower > to process an index than it is to just read the entire table. However, as the > table gets more rows, these timings will reverse and Postgres will start using > the indexes. It's probably best to just create them even though the table is > small. The performance improvement you'll get when the table grows will be > well worth it, and it avoids the problem of trying to remember to create it > later. > > However, Martin's other comment about only using a single index is incorrect. > Postgres can use multiple indexes per query, so it's often good practace to > put indexes on every column that might ever be used in a WHERE clause. > I call this practice "shotgun" and generally discourage people from using it. It seems attractive, but not every where condition can be evaluated using an index, and every index has maintenance overhead. There are cases when it's somewhat reasonable (e.g. when you don't know which columns will be referenced by WHERE conditions, and data ingestion has lower priority than queries). But that doesn't seem to be the case here - you know the WHERE conditions, and people are likely sending a lot of inserts (and expecting low latency responses). regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Not sure if I should CREATE INDEX for text columns on which I plan to filter later
My point was that your explain would be the same with indexes as without as they won’t be used. Martin. From: Alexander FarberDate: Wednesday, 21 February 2018 at 12:33 Cc: pgsql-general Subject: Re: Not sure if I should CREATE INDEX for text columns on which I plan to filter later Hi Martin - On Wed, Feb 21, 2018 at 1:26 PM, Martin Moore wrote: I’m no expert but I’d think it unlikely an index would be considered for a table with only 100 rows in. Also I’m pretty sure only one index per table is used, so you’d want to put state1 and state2 in one index. I hope to have more records in the words_games table later when my game is launched (currently in soft launch/beta). Regards Alex
Re: Not sure if I should CREATE INDEX for text columns on which I plan to filter later
Here is the current DESC of the table (I already use few joins): words=> \d words_games Table "public.words_games" Column | Type | Collation | Nullable | Default --+--+---+--+-- gid | integer | | not null | nextval('words_games_gid_seq'::regclass) created | timestamp with time zone | | not null | finished | timestamp with time zone | | | player1 | integer | | not null | player2 | integer | | | played1 | timestamp with time zone | | | played2 | timestamp with time zone | | | state1 | text | | | state2 | text | | | score1 | integer | | not null | score2 | integer | | not null | hand1| character(1)[] | | not null | hand2| character(1)[] | | not null | pile | character(1)[] | | not null | letters | character(1)[] | | not null | values | integer[]| | not null | bid | integer | | not null | reason | text | | | Indexes: "words_games_pkey" PRIMARY KEY, btree (gid) "words_games_reason_index" btree (reason) "words_games_state1_index" btree (state1) "words_games_state2_index" btree (state2) Check constraints: "words_games_check" CHECK (player1 <> player2) "words_games_score1_check" CHECK (score1 >= 0) "words_games_score2_check" CHECK (score2 >= 0) Foreign-key constraints: "words_games_bid_fkey" FOREIGN KEY (bid) REFERENCES words_boards(bid) ON DELETE CASCADE "words_games_player1_fkey" FOREIGN KEY (player1) REFERENCES words_users(uid) ON DELETE CASCADE "words_games_player2_fkey" FOREIGN KEY (player2) REFERENCES words_users(uid) ON DELETE CASCADE Referenced by: TABLE "words_chat" CONSTRAINT "words_chat_gid_fkey" FOREIGN KEY (gid) REFERENCES words_games(gid) ON DELETE CASCADE TABLE "words_moves" CONSTRAINT "words_moves_gid_fkey" FOREIGN KEY (gid) REFERENCES words_games(gid) ON DELETE CASCADE TABLE "words_scores" CONSTRAINT "words_scores_gid_fkey" FOREIGN KEY (gid) REFERENCES words_games(gid) ON DELETE CASCADE
Re: Not sure if I should CREATE INDEX for text columns on which I plan to filter later
Hi Martin - On Wed, Feb 21, 2018 at 1:26 PM, Martin Moorewrote: > I’m no expert but I’d think it unlikely an index would be considered for a > table with only 100 rows in. Also I’m pretty sure only one index per table > is used, so you’d want to put state1 and state2 in one index. > I hope to have more records in the words_games table later when my game is launched (currently in soft launch/beta). Regards Alex