Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-13 Thread Nicolas Barbier
2013/11/12 Claudio Freire klaussfre...@gmail.com:

 On Tue, Nov 12, 2013 at 6:41 PM, Nicolas Barbier
 nicolas.barb...@gmail.com wrote:

 (Note that K B-trees can be merged by simply scanning all of them
 concurrently, and merging them just like a merge sort merges runs.
 Also, all B-trees except for the first level (of size S) can be
 compacted 100% as there is no need to reserve space for further
 insertions in them.)

 Unless you can guarantee strong correlation of index-order vs
 physical-order, scanning multiple indexes in index-order will be quite
 slow (random I/O).

As all the bigger trees are written in one pass (as the result of a
merge of multiple smaller trees), I would assume that it is rather
easy to guarantee it for them.

As for the smallest trees (size S), I think it doesn’t matter much as
they “fit easily in memory.” Initially I would say that redefining it
so that K of them (rather than 1) must still fit in memory is the easy
fix.

A future optimization could alleviate the need for the redefinition
(and would also improve normal B-tree indexes): Somehow make sure that
smaller trees (that fit in memory) are typically written out
more-or-less in the right order. For that, one could for example
postpone determining the ultimate block-order to write-out time. This
is similar to what Reiser4 calls “dancing trees,” but unfortunately
requires some rejiggering of the abstraction layers in PostgreSQL (I
think). Having deferred insertion (which is probably way easier to
implement) could conceivably also improve things.

URL:https://en.wikipedia.org/wiki/Dancing_tree

Nicolas

-- 
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?




Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-13 Thread Simon Riggs
On 12 November 2013 19:54, Claudio Freire klaussfre...@gmail.com wrote:

 On Tue, Nov 12, 2013 at 7:14 PM, Simon Riggs si...@2ndquadrant.com wrote:
 I still think we need to look at this from a query perspective though.
 We need to check whether there is a class of real world queries that
 are not well optimised by minmax indexes, or cannot be made to be in
 future releases. For example, large DELETEs from a table are almost
 trivially optimised for min max.

 Only if you don't have a PK (or other index).

Right. Min max indexes are optimised for large DELETEs, btrees are not
(yet), which is what we are discussing.

I believe it remains to be shown that a btree is actually desirable on
a very big table. So far the discussion has just assumed this is the
case, without looking at specific SQL. It might be better to look at
ways of avoiding a btree altogether than to spend time optimising
btrees for this case.

Perhaps we can enforce a PK constraint without using a btree, if one
is required. This might be guaranteed by using a sequence or other
mechanism.
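
As a rough sketch of what I mean (illustrative only, not a concrete
proposal; the table and sequence names are invented):

-- The "key" comes from a sequence, so every value is unique by
-- construction and no btree is needed just to enforce uniqueness.
CREATE SEQUENCE big_history_id_seq;
CREATE TABLE big_history (
    id bigint NOT NULL DEFAULT nextval('big_history_id_seq'),
    created_at timestamptz NOT NULL
    -- ... other columns ...
);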

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-13 Thread Nicolas Barbier
2013/11/12 Simon Riggs si...@2ndquadrant.com:

 On 12 November 2013 21:41, Nicolas Barbier nicolas.barb...@gmail.com wrote:

 Look-up speed is as follows: Each look-up must look through all
 B-trees.

 That can be optimised by using a min max approach, so we need only
 look at sub-trees that may contain data.

Under the assumption that the data are really *really* inserted in
random order, I think that min max indexes would not help much: All
subtrees would contain so many different values that the boundaries
will almost never be narrow enough to exclude scanning them (unless one
is searching for outliers).

I think that min max indexes are very useful for data that is inserted
in some less-than-random way: When rather large swathes of inserted
data fall in between boundaries that a lot of other data doesn’t fall
in between. For min max index scans to be fast, the data doesn’t
exactly have to be ordered in the heap (that’s one of the advantages
vs. B-trees, where a different order in the index and the heap is very
bad for scans of any significant part of the index), but it cannot be
entirely arbitrary either.

I would say that min max indexes should be used in either of the
following cases (and probably some more that I don’t think of right
now):

* When the data is inserted close to ordered (i.e., past or future
dates that have a tendency to be correlated with “the current day,”
such as invoice dates). For this use case, a normal B-tree would also
work, but would take more space and require heavier maintenance.
* When large batches of “similar” data (fitting in between boundaries
that are more narrow than the “global boundaries”) are inserted at a
time, even when multiple of such batches arrive in random order.
* When insertion is up to entirely random, but the goal is to optimize
look-ups for (relatively rare) outliers.
* (combined with any of the above to actually make the index *useful*)
When needing a lot of indexes on the same data or having a very high
rate of insertion, so that the maintenance burden matters a lot.

 I would add that it is possible to optimise large DELETEs from a table
 if complete sub-trees of the btree can be easily removed. This for me
 would be the compelling benefit of this approach.

Idem WRT the randomness: If the data are really deleted in a
completely random fashion (e.g., Leonardo might want to delete all
phone calls in a certain year, while the index is on phone number),
whole subtrees would almost never become candidates for deletion (the
same problem applies to a normal min max index). Of course, everything
would change if the data is not really deleted randomly.

The same usecases as mentioned above (replace “insertion” with
“deletion”) seem to apply for deletion in min max indexes.

Note that B-forests as described before don’t work well for a high (or
even not-so-high) rate of deletions: During VACUUM the updates to the
bigger trees would kill all performance, similar to how a high rate of
insertion kills the performance of normal B-trees once they get big.
To remedy this, one could add stuff such as “B-trees of deleted
entries” (i.e., negative trees) that may then afterwards be merged
with other such “negative” trees + “positive” trees. Look-ups would
need to take all those trees into account.
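
To make that concrete, a crude sketch in plain SQL (ordinary tables
stand in for the trees; all names are invented for illustration):

-- Two “positive” runs plus one “negative” (deleted-entries) run.
CREATE TABLE pos_run_1 (k bigint);
CREATE TABLE pos_run_2 (k bigint);
CREATE TABLE neg_run_1 (k bigint);

-- A look-up must consult all of them: entries present in a positive
-- run but cancelled by a matching negative entry must not be returned.
CREATE VIEW forest_lookup AS
  (SELECT k FROM pos_run_1
   UNION ALL
   SELECT k FROM pos_run_2)
  EXCEPT ALL
  SELECT k FROM neg_run_1;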

Nicolas

-- 
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?




Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-13 Thread Leonardo Francalanci
Simon Riggs wrote
 On 5 November 2013 14:28, Leonardo Francalanci m_lists@ wrote:
 
 Either my sql is not correct (likely), or my understanding of the minmax
 index is
 not correct (even more likely), or the minmax index is not usable in a
 random inputs
 scenario.
 
 Please show the real world SQL you intend to run, so we can comment on
 it. Inventing a use case that breaks effectiveness of any optimisation
 is always easy, but the question is whether the use case is likely or
 even desirable.

The use case is pretty simple.
Think of it as the NSA, as that makes it easier to picture.
Collect all calls made/received by any user on a mobile network.
(In fact, it's something more than just calls, so it's really 2-10 rows per call.)
Keep the data for 20 days.
That's the insert part.

Query:
search for calls made/received by a user, using IMSI (caller id) or IMEI
(phone id). The date range is usually a few days (the past 4 days, from
10 days ago to 5 days ago...).

The result is just a very small percentage of the rows present in the
table: a single user doesn't call that much!
Searches are made by a human, so not that many requests per second.

It's not just a write-mostly scenario, it's a 99% write / 1% read scenario.

Problem? Having 4 btree indexes on random values (imsi + imei * 2,
since we have both the calling and the called party) kills insertion
performance after a while.
Solution so far? Partition every 15 minutes and create the indexes in bulk.
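
A minimal sketch of the kind of schema and queries involved (table and
column names here are purely illustrative, not the real ones):

-- Hypothetical CDR table; the four identifier columns hold
-- essentially random values, which is what hurts btree insertion.
create table cdr (
    call_timestamp timestamptz not null,
    calling_imsi   bigint,
    calling_imei   bigint,
    called_imsi    bigint,
    called_imei    bigint
    -- ... other call details ...
);
create index on cdr (calling_imsi);
create index on cdr (calling_imei);
create index on cdr (called_imsi);
create index on cdr (called_imei);

-- Typical search: one user, over a date range of a few days.
select *
from cdr
where calling_imsi = 12345
  and call_timestamp >= now() - interval '10 days'
  and call_timestamp <  now() - interval '5 days';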


Simon Riggs wrote
 If we have a query to show the most recent calls by a particular caller
 
 SELECT *
 FROM cdr
 WHERE callerid = X
 ORDER BY call_timestamp DESC
 LIMIT 100
 
 then this could potentially be optimised using a minmax index, by
 traversing the data ranges in call_timestamp order. That is not part
 of the code in this initial release, since the main use case is for
 WHERE call_timestamp = X, or WHERE primarykey = Y

I don't understand how an index on call_timestamp would help
in the query above.


Simon Riggs wrote
 I don't believe there is a credible business case for running that
 same query but without the ORDER BY and LIMIT, since it could
 potentially return gazillions of rows


Gazillions of rows??? We're talking about calls made/received by
one user here. How many calls do you make in 10 days???


Simon Riggs wrote
  so it isn't surprising at all
 that it would access a large % of the table. 

In fact, the query I use returns a fraction of the table, and only
a very small number of users ever get searched.

Simon, you keep on talking about these minmax indexes, and
I still don't see any reference to performance tests.

And, again, I think that random values insertion is the worst
use case for minmax indexes.







Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-13 Thread Simon Riggs
On 13 November 2013 06:07, Leonardo Francalanci m_li...@yahoo.it wrote:

 The use case is pretty simple.
 Think it as the NSA, as it would be much easier.
 Collect all calls made/received by any user on a mobile network.
 (in fact, it's something more than calls, so in fact is 2-10 rows per call).
 Keep the data for 20 days.
 That's the insert part.

 Query:
 search calls made/received by the user using IMSI (caller id) or IMEI
 (phone id). Date range is usually days (past 4 days, from 10 days ago
 to 5 days ago...)

 The result is just a very small percentage of the rows present in the
 table: a single user doesn't call that much!
 Searches are made by a human, so no that many request per second.

 It's not a write mostly scenario, it's a 99% write 1% read scenario.

 Problem? having 4 btree indexes on random values (imsi+imei * 2,
 since we have calling and caller) kills the performance in insertion
 after a while.
 Solution so far? partition every 15 minutes, create the indexes in bulk.

So in the use case you describe, the min max index would require a
scan of only 25% of the table (a 5-day range out of the 20 days kept),
not the 80% described earlier for random inserts. In my experience,
people wish to keep data for much
longer periods and so the percentage of scan required would drop lower
than 25%, possibly to 5% or less for many applications.

The plan would use sequential I/O so could still work quickly; given
the low read rate, longer query times could be acceptable.

Minmax indexes are simply a way to make this use case happen
automatically, without the need for manual partitioning of the table.
They are not the answer to every prayer, but with respect they are
better than you had claimed they would be (25%, not 80%, in your use
case). I saw this was likely to be the case, and this is why I
challenged you to describe it in more detail. Thank you.

 Simon Riggs wrote
 If we have a query to show the most recent calls by a particular caller

 SELECT *
 FROM cdr
 WHERE callerid = X
 ORDER BY call_timestamp DESC
 LIMIT 100

 then this could potentially be optimised using a minmax index, by
 traversing the data ranges in call_timestamp order. That is not part
 of the code in this initial release, since the main use case is for
 WHERE call_timestamp = X, or WHERE primarykey = Y

 I don't understand how a index on call_timestamp would help
 in the query above.

The min max index would cover call_timestamp and would be used to
optimise the query. That is not in the current version, it is a later
optimisation that I think is possible after considering your use case
and similar ones. This is a similar optimisation to the Merge Append
case for partitioned tables.

 Simon Riggs wrote
 I don't believe there is a credible business case for running that
 same query but without the ORDER BY and LIMIT, since it could
 potentially return gazillions of rows


 Gazillion of rows??? We're talking about calls made/received by
 one user here. How many calls do you make in 10 days???


 Simon Riggs wrote
  so it isn't surprising at all
 that it would access a large % of the table.

 In fact, the query I use return a fraction of the table, and only
 a very small amount of users get searched.

 Simon, you keep on talking about these minmax indexes, and
 I still don't see any reference to some performance tests.

Performance tests are only possible with a clear use case. It would be
helpful if you could benchmark your own use case and I request others
do the same. I have and will challenge people that simply assert this
new type of index is not useful based on a generic argument, with no
use case and no tests. That is nothing personal, it is simply that I
do not wish misunderstandings to block the adoption of new features.

Please see that Alvaro and I have gone out of our way to provide a new
facility to help you and others, and that it requires changing how we
think about the solution. I accept it may not provide for every case
but it requires careful analysis before deciding that is so.

 And, again, I think that random values insertion is the worst
 use case for minmax indexes.

I agree, it is. What we have disagreed on is the extent to which that
scenario exists for use cases on very large tables, which are
typically append-mostly with most queries accessing a subset of the
table, e.g. date range.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-13 Thread Jeremy Harris

On 13/11/13 09:07, Leonardo Francalanci wrote:

Problem? having 4 btree indexes on random values (imsi+imei * 2,
since we have calling and caller) kills the performance in insertion
after a while.


Surely there's good correlation between IMSI & IMEI, so have a separate
table to translate one to (a group of) the others, and
halve the indexes on your main table?
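
As a rough sketch of that idea (names are hypothetical, reusing the
illustrative cdr layout from earlier in the thread):

-- Small mapping table, indexed on both identifiers.
CREATE TABLE imsi_imei_map (
    imsi bigint,
    imei bigint,
    PRIMARY KEY (imsi, imei)
);
CREATE INDEX ON imsi_imei_map (imei);

-- The main table then only needs indexes on the IMSI columns; a search
-- by IMEI first resolves it to its IMSI(s) via the small mapping table
-- (and similarly for the called side).
SELECT c.*
FROM imsi_imei_map m
JOIN cdr c ON c.calling_imsi = m.imsi
WHERE m.imei = 987654;
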
--
Jeremy





Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-13 Thread Leonardo Francalanci
Simon Riggs wrote
 So in the use case you describe, the min max index would require a
 scan of only 25% of the table, not the 80% described earlier for
 random inserts. In my experience, people wish to keep data for much
 longer periods and so the percentage of scan required would drop lower
 than 25%, possibly to 5% or less for many applications.
 
 The plan would use sequential I/O so could still work quickly; given
 the low read rate, longer query times could be acceptable.


Quickly??? A seq scan of 25% of a multi-TB table is not quick.



Simon Riggs wrote
 Minmax indexes are simply a way to make this use case happen
 automatically, without the need for manual partitioning of the table.

Your logic assumes that we don't index anything but the call_timestamp.
That would lead to huge query response times.
Plus: a btree doesn't have a big problem keeping up in a sequential-insertion
scenario (such as the call_timestamp). So I still don't see the gain
in using the minmax indexes: again, could you point me to some
performance tests of *any* use case?


Simon Riggs wrote
 They are not the answer to every prayer, but with respect they are
 better than you had claimed they would be. (25% not 80%, in your use
 case). I saw this was likely to be the case and this is why I
 challenged you to describe in more detail. Thank you.

I claimed they would scan 80% of the table because I assumed I
had to use them on the random fields, not on the call_timestamp field.
I don't need a better index on call_timestamp, because it's sequential:
I don't have problems with that. But it's useless anyway: I don't want to
seq scan 25% of a multi-TB table.



Simon Riggs wrote
 Performance tests are only possible with a clear use case. 

Well, so I could add my weird_index patch to the PostgreSQL source code,
and it would be committed right away??? I assumed you had
to somehow prove that the new index was better than what
is already available, at least for some cases.

Or, in other words: what are you going to write in the minmax index
documentation, "try and see if they work better for you"?


Simon Riggs wrote
 Please see that Alvaro and I have gone out of our way to provide a new
 facility to help you and others, and that it requires changing how we
 think about the solution. I accept it may not provide for every case
 but it requires careful analysis before deciding that is so.

If I came across as too rough, I apologize. I always appreciate people
taking their time to help someone else for free.

Plus, I'm *very* interested in the minmax index, especially for
call_timestamp (some queries are date-range only, such as "give me
all the calls in this particular 2-second range") or the id column I have.
But, at the same time, I don't see any evidence that they work
better than btrees (except for the size of the index). 
I would like to see some numbers.

I worked a little on the bitmap index implementation, and I stopped because,
on the large tables these indexes are supposed to be used on, the heap
lookup took so much time that the (slightly) faster index access didn't
really help, because it was only a fraction of the query time... I'm afraid
it would be the same with minmax indexes, and that's why I wanted to
see some numbers...


Simon Riggs wrote
 And, again, I think that random values insertion is the worst
 use case for minmax indexes.
 
 I agree, it is. What we have disagreed on is the extent to which that
 scenario exists for use cases on very large tables, which are
 typically append-mostly with most queries accessing a subset of the
 table, e.g. date range.

Mhh... maybe this is the point where we don't understand each other?

I query the table by userid + date range.
The date range is *not* selective enough (it's, as you said, 25%
of the multi-TB table). The userid, on the other hand, is *very* selective.

I'm looking for a better index for the userid column(s).

The new indexes I mentioned in the OP claim to be better
in this scenario (but I don't blindly believe them).

I don't see how indexing call_timestamp alone could be
of any interest, since it would require seq-scanning 25%
of a huge table for every query.








Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-13 Thread Leonardo Francalanci
Jeremy Harris wrote
 Surely there's good correlation between IMSI & IMEI, so have a separate
 table to translate one to (a group of) the others, and
 halve the indexes on your main table?

Yes; unfortunately both are not always available, but it's something
we are thinking about (it requires logic in the inserting application
that doesn't exist at the moment, but it is something that we'll
have to add sooner or later).
But in the end, yes: trying to use fewer indexed fields is a good path.







Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-13 Thread Simon Riggs
On 13 November 2013 09:31, Leonardo Francalanci m_li...@yahoo.it wrote:

 Or, in other words: what are you going to write in the minmax index
 documentation, try and see if they work better for you?

That seems like good advice to me. Bacon > Aristotle.

There is very little written about suitability of any index type for
Postgres. I'm sure the docs could be improved there.


 Plus, I'm *very* interested in the minmax index

Good, thanks.

 But, at the same time, I don't see any evidence that they work
 better than btrees (except for the size of the index).

Min max indexes are not designed to be a better btree; they are
designed to be roughly the same as automatic partitioning. They offer
considerably improved time to build, significantly reduced index size
and significantly improved insert performance. Min max will be clearly
slower than btrees for small numbers of records, though for large
numbers of records we may expect min max to perform the same as btrees,
though that requires better testing to get a more accurate picture.
There may yet be optimisations of the patch also.

Based on what we have discussed here, we've come up with two new
optimisations that can be used with MinMax, namely the bulk DELETE
case and the Merge Append case. Thank you for highlighting additional
cases and requirements.

From our discussions here, IMHO there is a strong case for avoiding
btrees completely for larger historical data tables. That isn't
something I had even considered as desirable before this conversation
but ISTM now that taking that approach will be more fruitful than
attempting to implement LSM trees.

Having said that, I am also in favour of declarative partitioning,
just that there is no funding available to work on that at present.

Further work on bitmap indexes is expected. They are already designed
with good insert performance in mind and this discussion re-emphasises
that requirement.

 I would like to see some numbers.

Alvaro has given me some results for his patch. The figures I have are
for a 2GB table.

Index Build Time
MinMax 11 s
Btree   96s

Index Size
MinMax 2 pages + metapage
Btree approx 200,000 pages + metapage

Load time
MinMax no overhead, same as raw COPY
BTree - considerably slower

Index SELECT
Slower for small groups of rows
Broadly same for large requests (more results required on that assessment)

I expect to publish results against TPC-H cases in next few weeks.

Additional tests are welcome for other use cases.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-13 Thread Leonardo Francalanci
Simon Riggs wrote
 From our discussions here, IMHO there is a strong case for avoiding
 btrees completely for larger historical data tables. That isn't
 something I had even considered as desirable before this conversation
 but ISTM now that taking that approach will be more fruitful than
 attempting to implement LSM trees.

Eh? I don't understand this point. How can I avoid btrees and still
search by caller_id? I don't get it...


Simon Riggs wrote
 Alvaro has given me some results for his patch. The figures I have are
 for a 2GB table.
 
 Index Build Time
 MinMax 11 s
 Btree   96s
 
 Index Size
 MinMax 2 pages + metapage
 Btree approx 200,000 pages + metapage
 
 Load time
 MinMax no overhead, same as raw COPY
 BTree - considerably slower

Great!!! This looks very promising. Were the indexed values
sequential?








Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-13 Thread Merlin Moncure
On Wed, Nov 13, 2013 at 7:33 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On 13 November 2013 09:31, Leonardo Francalanci m_li...@yahoo.it wrote:
 I would like to see some numbers.

 Alvaro has given me some results for his patch. The figures I have are
 for a 2GB table.

 Index Build Time
 MinMax 11 s
 Btree   96s

 Index Size
 MinMax 2 pages + metapage
 Btree approx 200,000 pages + metapage

 Load time
 MinMax no overhead, same as raw COPY
 BTree - considerably slower

 Index SELECT
 Slower for small groups of rows
 Broadly same for large requests (more results required on that assessment)

 I expect to publish results against TPC-H cases in next few weeks.

 Additional tests are welcome for other use cases.

Those are pretty exciting numbers.  These days for analytics work I'm
using mostly covering index type approaches.  I bet the tiny index
would more than offset the extra heap accesses.   Can you CLUSTER
against a minmax index?

merlin




Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-13 Thread Simon Riggs
On 13 November 2013 11:54, Merlin Moncure mmonc...@gmail.com wrote:
 On Wed, Nov 13, 2013 at 7:33 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On 13 November 2013 09:31, Leonardo Francalanci m_li...@yahoo.it wrote:
 I would like to see some numbers.

 Alvaro has given me some results for his patch. The figures I have are
 for a 2GB table.

 Index Build Time
 MinMax 11 s
 Btree   96s

 Index Size
 MinMax 2 pages + metapage
 Btree approx 200,000 pages + metapage

 Load time
 MinMax no overhead, same as raw COPY
 BTree - considerably slower

 Index SELECT
 Slower for small groups of rows
 Broadly same for large requests (more results required on that assessment)

 I expect to publish results against TPC-H cases in next few weeks.

 Additional tests are welcome for other use cases.

 Those are pretty exciting numbers.  These days for analytics work I'm
 using mostly covering index type approaches.

If you're using index only scans then this will work for you as well,
hopefully better. Same principle wrt all visible page ranges.

 I bet the tiny index
 would more than offset the extra heap accesses.

That's the trade-off, yes. I was hoping that would lead to cases where
the min max is better than a btree, but not there yet.

 Can you CLUSTER
 against a minmax index?

Not in this release, at least in my understanding. It's not yet
possible to do an ordered fetch, so the cluster scan probably won't
work.

I was hoping to include some special Freespace Map modes that would help there.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-13 Thread Simon Riggs
On 13 November 2013 11:54, Merlin Moncure mmonc...@gmail.com wrote:

 Load time
 MinMax no overhead, same as raw COPY
 BTree - considerably slower

And just as a general comment, the min max index does not slow down
COPY as the table gets larger, whereas the btree gets slower as the
table gets larger. Which is the reason Leonardo requires partitioned
tables.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-13 Thread Leonardo Francalanci
Simon Riggs wrote
 Can you CLUSTER
 against a minmax index?
 
 Not in this release, at least in my understanding. It's not yet
 possible to do an ordered fetch, so the cluster scan probably won't
 work.

As per the patch I helped write, CLUSTER should use the
sequential heap scan+sort when it makes sense.
So I think that if the index is not able to do an ordered fetch, 
CLUSTER should fall back to scan+sort automatically (which is
what you want in a large table anyway).

Obviously, that should be tested.








Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-12 Thread Simon Riggs
On 5 November 2013 14:28, Leonardo Francalanci m_li...@yahoo.it wrote:

 Either my sql is not correct (likely), or my understanding of the minmax
 index is
 not correct (even more likely), or the minmax index is not usable in a
 random inputs
 scenario.

Please show the real world SQL you intend to run, so we can comment on
it. Inventing a use case that breaks effectiveness of any optimisation
is always easy, but the question is whether the use case is likely or
even desirable.

If we have a query to show the most recent calls by a particular caller

SELECT *
FROM cdr
WHERE callerid = X
ORDER BY call_timestamp DESC
LIMIT 100

then this could potentially be optimised using a minmax index, by
traversing the data ranges in call_timestamp order. That is not part
of the code in this initial release, since the main use case is for
WHERE call_timestamp = X, or WHERE primarykey = Y

I don't believe there is a credible business case for running that
same query but without the ORDER BY and LIMIT, since it could
potentially return gazillions of rows, so it isn't surprising at all
that it would access a large % of the table. Saying "but I really do
want to run it" isn't an argument in favour of it being a sensible
query to run - we are only interested in optimising sensible real
world queries.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-12 Thread Nicolas Barbier
2013/11/2 Simon Riggs si...@2ndquadrant.com:

 On 29 October 2013 16:10, Peter Geoghegan p...@heroku.com wrote:

 Presumably someone will get around to implementing a btree index
 insertion buffer one day. I think that would be a particularly
 compelling optimization for us, because we could avoid ever inserting
 index tuples that are already dead when the deferred insertion
 actually occurs.

 That's pretty much what the LSM-tree is.

[ Disclaimer: I have only skimmed the LSM-trees paper rather than read
it thoroughly. ]

LSM-trees seem to hit a wall when the total amount of data gets big
enough, unless one uses “multi-component LSM-trees” (as described in
the paper). Having a B-tree with deferred insertion would be similar
to having an LSM-tree without the multi-component property.

The underlying problem with fast random insertions into a B-tree is
that each write of a whole block writes only a small amount of “useful
changes” (once the tree gets large enough relative to memory). The
“multi-component” concept fixes that. I think that simply combining
that concept with normal B-trees is a rather elegant way of (at least
theoretically) solving the problem:

Definitions:

* Define a rather small integer K (with K ≥ 2), that will influence
maximum insertion speed (higher K = higher insertion speed), but also
look-up time (higher K = slower look-up).
* Define some size S “that easily fits in memory.”
* F is the fan-out of the B-tree. (If F < K, the algorithm results in
slower amortized insertion speed than simply using one single big
B-tree if only the amount of blocks read/written are taken into
account; it may still be better because of seek times.)
* n is the total number of entries in the index.

Algorithm:

* Start inserting into a small B-tree until it gets size S, then start
inserting into a new B-tree until that fills up to size S, etc.
* When K such B-trees (of size S) exist, merge them together into one
(S × K)-sized B-tree (delete the old ones).
* Do this recursively: Once K B-trees of size (K × S) exist, merge
them together into one (S × K^2)-sized B-tree, etc.
* Let each look-up walk all trees, and return the union of all results.

(Note that K B-trees can be merged by simply scanning all of them
concurrently, and merging them just like a merge sort merges runs.
Also, all B-trees except for the first level (of size S) can be
compacted 100% as there is no need to reserve space for further
insertions in them.)
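
To make the mechanics concrete, here is a very rough sketch in plain
SQL (ordinary tables stand in for the individual trees; names are
invented for illustration):

-- Level-0 runs: K small, separately indexed batches.
CREATE TABLE run_a (k bigint);
CREATE INDEX ON run_a (k);
CREATE TABLE run_b (k bigint);
CREATE INDEX ON run_b (k);

-- “Merging” runs into the next level: one ordered pass over both
-- inputs yields a fully packed result whose index is built in bulk.
CREATE TABLE level_1 AS
  SELECT k FROM run_a
  UNION ALL
  SELECT k FROM run_b
  ORDER BY k;
CREATE INDEX ON level_1 (k);

-- A look-up must walk every tree that exists at the time and union
-- the results.
SELECT k FROM run_a WHERE k = 42
UNION ALL
SELECT k FROM run_b WHERE k = 42
UNION ALL
SELECT k FROM level_1 WHERE k = 42;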

Insertion speed can be calculated as follows (I disregard seek times
to make this easy; it also happens to be the right assumption for
SSDs): Assume that a small B-tree (size S) can simply be written out
without having to write each block multiple times. Each entry has to
be written log_K(n × log_F(n)) times. All blocks written are 100%
“useful changes.” Therefore, insertion speed is log_K(n × log_F(n))
times slower than raw disk speed. (Note that I also disregard the time
needed for reading; this may make everything about twice as slow.)

Example: For K = 10, F = 100 (i.e., 80B per entry), blocksize = 8kB,
and n = 10^9 (i.e., 80GB of entries), the insertion speed is
log_10(10^9 × log_100(10^9)) = log_10(10^9 × 4.5) = ~9.5 times slower
than writing the 80GB of index entries sequentially. For the “one
single big B-tree” scenario, the insertion speed would be ~F = ~100
times slower than raw disk speed (assuming that all writes but the
ones to the leaf-level can be combined).

Look-up speed is as follows: Each look-up must look through all
B-trees. Assume for simplicity that all trees have the same height as
the single B-tree in the “one single big B-tree” case (which is
rather wrong (most are less tall), but seems good enough as an
estimate), there are K trees of each size (idem), and there are
log_K(n) different sizes. Then, the number of trees to walk is K ×
log_K(n), and therefore each look-up is K × log_K(n) times slower than
in the “one single big B-tree” case.

Example: (using the same parameters as above) Look-up speed is 10 ×
log_10(10^9) = 90 times slower (i.e., because there are ~90 trees).

Index size: I think (didn’t calculate) that the combined size of the
B-trees will be about the same as (a little bit more than) the size of
a single big B-tree containing the same entries.

I have no idea whether someone already gave this kind of “forest of
B-trees” structure a name. Otherwise, I suggest “B-forest” :-).

In conclusion, use a “B-forest” when:

* The index entries are small (large fan-out).
* The insertion throughput is high.
* It’s OK for look-ups to be slow.
* Extra points when the storage medium has high seek times.

Major missing piece in PostgreSQL (I think):

* Functionality to merge K indexes into one (bigger) combined index.

Nicolas

-- 
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?



Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-12 Thread Nicolas Barbier
2013/11/12 Nicolas Barbier nicolas.barb...@gmail.com:

 In conclusion, use a “B-forest” when:

 * The index entries are small (large fan-out).
 * The insertion throughput is high.
 * It’s OK for look-ups to be slow.
 * Extra points when the storage medium has high seek times.

Oops, forgot the most important ones:

* Insertions are random.
* The total amount of data is very high.

Nicolas

-- 
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?




Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-12 Thread Simon Riggs
On 12 November 2013 21:41, Nicolas Barbier nicolas.barb...@gmail.com wrote:

 Look-up speed is as follows: Each look-up must look through all
 B-trees.

That can be optimised by using a min max approach, so we need only
look at sub-trees that may contain data.

 Index size: I think (didn’t calculate) that the combined size of the
 B-trees will be about the same as (a little bit more than) the size of
 a single big B-tree containing the same entries.

Agreed

 Major missing piece in PostgreSQL (I think):

 * Functionality to merge K indexes into one (bigger) combined index.

Good analysis.

I would add that it is possible to optimise large DELETEs from a table
if complete sub-trees of the btree can be easily removed. This for me
would be the compelling benefit of this approach.

I still think we need to look at this from a query perspective though.
We need to check whether there is a class of real world queries that
are not well optimised by minmax indexes, or cannot be made to be in
future releases. For example, large DELETEs from a table are almost
trivially optimised for min max.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-12 Thread Claudio Freire
On Tue, Nov 12, 2013 at 6:41 PM, Nicolas Barbier
nicolas.barb...@gmail.com wrote:
 (Note that K B-trees can be merged by simply scanning all of them
 concurrently, and merging them just like a merge sort merges runs.
 Also, all B-trees except for the first level (of size S) can be
 compacted 100% as there is no need to reserve space for further
 insertions in them.)

Unless you can guarantee strong correlation of index-order vs
physical-order, scanning multiple indexes in index-order will be quite
slow (random I/O).

On Tue, Nov 12, 2013 at 7:14 PM, Simon Riggs si...@2ndquadrant.com wrote:
 I still think we need to look at this from a query perspective though.
 We need to check whether there is a class of real world queries that
 are not well optimised by minmax indexes, or cannot be made to be in
 future releases. For example, large DELETEs from a table are almost
 trivially optimised for min max.

Only if you don't have a PK (or other index).




Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-11 Thread Jeff Janes
On Tue, Nov 5, 2013 at 9:52 AM, Leonardo Francalanci m_li...@yahoo.it wrote:

 Jeff Janes wrote
  Some experiments I did a few years ago showed that applying sorts to the
  data to be inserted could be helpful even when the sort batch size was as
  small as one tuple per 5 pages of existing index.  Maybe even less.

 Cool!!! Do you have any idea/hint on how I could try and replicate that?
 Do you remember how you did it?


I can't find my notes but I remember more or less how I did it.

Since we don't yet have an insertion buffer that allows the rows to be
sorted in different order for different indexes, I had to simulate it just
by using a table with a single index and hoping that that would extrapolate.

create table foo (x bigint);

To speed things up, you may want to prepopulate this with random data so
that the size of the index-to-be will exceed shared_buffers, or physical
RAM, before making the index.  Also, the effectiveness might depend on how
much the index has grown since its creation, since leaf pages are
initially correlated between physical order and logical order, but that
decreases over time.  So you may want to try different initial seed sizes.

create index on foo (x);

Then I use perl to make run-sorted data with different run sizes, and load
that via \copy.  I put all the data points in memory up front rather than
generating it per-run on the fly, so that perl consumes about the same
amount of memory regardless of the run size.  You would want to use more
than 1..1e6 if you are on a very large RAM machine.


Something like:

# run_size = number of rows sorted together before being fed to \copy
for run_size in 1 10 100 1000 10000 100000; do
  perl -le 'my @x; push @x, int(rand()*1e8) foreach 1..1e6; while (@x)
{print foreach sort {$a<=>$b} splice @x,0,'$run_size'; }' | time psql -c
'\copy foo from stdin';
done

But you probably want another inner loop so that the \copy gets executed
multiple times per run_size, so that each run_size executes for at least a
couple checkpoint cycles.

Cheers,

Jeff


Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-11 Thread Jeff Janes
On Thu, Oct 31, 2013 at 12:43 AM, Leonardo Francalanci m_li...@yahoo.it wrote:

 Jeff Janes wrote
  True, but that is also true of indexes created in bulk.  It all has to
  reach disk eventually--
  [...]
  If the checkpoint interval is as long as the partitioning period, then
  hopefully the active index buffers get re-dirtied while protected in
  shared_buffers, and only get written to disk once.

 Honestly, I made a lot of tests in the past, and I don't remember if I
 tried
 15-minute checkpoints + high shared_buffers. That might work. I'm going to
 try it and see what happens.


You might want to go even beyond 15 minutes.




 Jeff Janes wrote
  If the buffers get read, dirtied, and evicted from a small shared_buffers
  over and over again
  then you are almost guaranteed that will get written to disk multiple
  times

 (as I understand, but I might be wrong):
 high shared_buffers don't help because with such random index writing, lots
 and lots of pages get dirtied, even if the change in the page was minimal.
 So, in the 15-minute period, you write the same pages over and over
 again.


But you write them only if you need to due to a checkpoint, needing new
buffers to read in something else that is not already in shared_buffers, or
because you are using a buffer-access-strategy that uses a ring.  If you
make checkpoints long, it will cut down on the first.  If shared_buffers
is large enough to contain the active part of the indexes being updated,
that should cut down on the second.  I don't know if the third is a problem
or not--I think copy might try to use a ring-buffer, but I don't know if it
does that for indexed tables.



 Even if you have high shared_buffers, the same page will get sync-ed to
 disk
 multiple times (at every checkpoint).


If the active part of the indexes is much larger than you could
possibly set shared_buffers to, then there is probably little point in
increasing shared_buffers from, say, 1% of the active index size to 8% of
it.  It only makes sense to increase it if you can do so large enough to
cover ~100% of the needed space.
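
To make that concrete, here is a sketch of the kind of settings being
discussed, with purely illustrative values (the right numbers depend
entirely on the hardware and on how big the active part of the indexes
is):

# postgresql.conf sketch -- example values, not recommendations
shared_buffers = 32GB               # big enough to hold the hot part of the indexes
checkpoint_timeout = 30min          # stretch checkpoints so dirtied index pages
checkpoint_segments = 256           # get re-dirtied in memory rather than
checkpoint_completion_target = 0.9  # written out over and over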



 The idea of those other indexes is to avoid the random writing,
 maximizing
 the writing in sequence, even if that means writing more bytes. In other
 words: writing a full 8KB is no different from writing 20 bytes in a page, as
 we'll have to sync the whole page anyway...


True, but that is the idea here as well.  If you can delay writing the page
until 20 bytes of it have been dirtied on 400 different occasions...

I'm not saying we shouldn't think about some kind of insert buffer, but I
really doubt that that is going to happen in 9.4 while increasing
shared_buffers can be done today, if it works and if you can live with the
consequences.

Cheers,

Jeff


Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-05 Thread Leonardo Francalanci
Andres Freund-3 wrote
 On 2013-11-04 11:27:33 -0500, Robert Haas wrote:
 On Mon, Nov 4, 2013 at 11:24 AM, Claudio Freire klaussfreire@ wrote:
  Such a thing would help COPY, so maybe it's worth a look
 
 I have little doubt that a deferred insertion buffer of some kind
 could help performance on some workloads, though I suspect the buffer
 would have to be pretty big to make it worthwhile on a big COPY that
 generates mostly-random insertions.
 
 Even for random data presorting the to-be-inserted data appropriately
 could result in much better io patterns.

Mmh, I'm afraid that the buffer would have to be huge to get some real advantage.
You have to buffer enough values to avoid touching entire pages, which is
not that easy. The LSM-tree is much more complicated than a simple memory-buffer
+ delayed inserts. The other index types that are supposed to help in the
random-insertion workload rely on sequential insertions (at the expense of
more writing, and slower reads) rather than re-using the btree pattern.








Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-05 Thread Leonardo Francalanci
Simon Riggs wrote
 Everybody on this thread is advised to look closely at Min Max indexes
 before starting any further work.
 
 MinMax will give us access to many new kinds of plan, plus they are
 about as close to perfectly efficient, by which I mean almost zero
 overhead, with regard to inserts as it is possible to get.

Simon, I don't understand how minmax indexes would help in a random-inserts
scenario.
While I would love to use minmax for other columns (since we also partition
and search based on a timestamp, which is usually well clustered), I thought
minmax index would be perfect in a mostly-incremental values scenario. 







Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-05 Thread Simon Riggs
On 5 November 2013 08:32, Leonardo Francalanci m_li...@yahoo.it wrote:
 Simon Riggs wrote
 Everybody on this thread is advised to look closely at Min Max indexes
 before starting any further work.

 MinMax will give us access to many new kinds of plan, plus they are
 about as close to perfectly efficient, by which I mean almost zero
 overhead, with regard to inserts as it is possible to get.

 Simon, I don't understand how minmax indexes would help in a random-inserts
 scenario.
 While I would love to use minmax for other columns (since we also partition
 and search based on a timestamp, which is usually well clustered), I thought
 minmax index would be perfect in a mostly-incremental values scenario.

Minmax indexes seem to surprise many people, so broad generalisations
aren't likely to be useful.

I think the best thing to do is to publish some SQL requests that
demonstrate in detail what you are trying to achieve and test them
against minmax indexes. That way we can discuss what does work and
what doesn't work well enough yet.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-05 Thread Leonardo Francalanci
Simon Riggs wrote
 Minmax indexes seem to surprise many people, so broad generalisations
 aren't likely to be useful.
 
 I think the best thing to do is to publish some SQL requests that
 demonstrate in detail what you are trying to achieve and test them
 against minmax indexes. That way we can discuss what does work and
 what doesn't work well enough yet.

While I do believe in testing (since "in theory there is no difference
between theory and practice; in practice there is"), I would like to know
the properties of the minmax index before trying it.
What is it supposed to be good at? What are the pros/cons? We can't ask all
the users to just try the index and see if it works for them.
As I said, my understanding is that it is very efficient (both for insertion
and for searching) when data is somehow ordered in the table. But maybe I got
it wrong...

Anyway, the sql scenario is simple: a table with 4 bigint indexes; data in
the fields is a random bigint in the range 1-1000. Insertion is 5-10K
rows/second. One search every 1-5 seconds, by one of the indexed fields
(only equality, no range). There's also an index on a timestamp field, but
that's not random so it doesn't give that many problems (it's actually the
one where I wanted to try the minmax).








Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-05 Thread Simon Riggs
On 5 November 2013 09:57, Leonardo Francalanci m_li...@yahoo.it wrote:
 Simon Riggs wrote
 Minmax indexes seem to surprise many people, so broad generalisations
 aren't likely to be useful.

 I think the best thing to do is to publish some SQL requests that
 demonstrate in detail what you are trying to achieve and test them
 against minmax indexes. That way we can discuss what does work and
 what doesn't work well enough yet.

 While I do believe in testing (since In theory there is no difference
 between theory and practice. In practice there is), I would like to know
 the properties of the minmax index before trying it.
 What is it supposed to be good at? What are the pros/cons? We can't ask all
 the users to just try the index and see if it works for them.

No, but then all users aren't suggesting we need a new index type, are they?

I think it's reasonable for you to spend time checking whether what you
require will be met, and if not, which precise queries aren't helped,
so we can better design any future new index.

 As I said, my understanding is that is very efficient (both in insertion and
 in searching) when data is somehow ordered in the table. But maybe I got it
 wrong...

 Anyway, the sql scenario is simple: a table with 4 bigint indexes; data in
 the fields is a random bigint in the range 1-1000. Insertion is 5-10K
 rows/second. One search every 1-5 seconds, by one of the indexed fields
 (only equality, no range). There's also an index on a timestamp field, but
 that's not random so it doesn't give that many problems (it's actually the
 one where I wanted to try the minmax).

Without meaning to pick on you, imprecise analysis yields unhelpful
new features. The clearer we are about what we are trying to solve, the
more likely we are to solve it well. 30 seconds of analysis on what is
needed is not sufficient to justify an additional man-year of
development, especially if a man-year of work has already been done
and the testing of the latest feature is now at hand.

The requests on the indexed fields, are they ordered? Are they
limited? Or do you really want ALL calls? What is the tolerance on those?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-05 Thread Leonardo Francalanci
Simon Riggs wrote
 On 5 November 2013 09:57, Leonardo Francalanci m_lists@ wrote:
 While I do believe in testing (since In theory there is no difference
 between theory and practice. In practice there is), I would like to know
 the properties of the minmax index before trying it.
 What is it supposed to be good at? What are the pros/cons? We can't ask
 all
 the users to just try the index and see if it works for them.
 
 No, but then all users aren't suggesting we need a new index type are
 they?
 
 I think its reasonable for you to spend time checking whether what you
 require will be met, and if not, what precise query doesn't it help,
 so we can better design any future new-index.

I don't understand the parallel between "We can't ask all the users to just try
the index and see if it works for them" and "all users aren't suggesting we
need a new index type". Anyway:

I'm not suggesting we need a new index type. Please read my first post: I'm
asking for info, fearing that there's just a lot of marketing/hype around those
indexes.

What do you mean by "spend time checking whether what you require will be
met"? Met by what, minmax indexes?


Simon Riggs wrote
 As I said, my understanding is that is very efficient (both in insertion
 and
 in searching) when data is somehow ordered in the table. But maybe I got
 it
 wrong...
 
 Anyway, the sql scenario is simple: a table with 4 bigint indexes; data
 in
 the fields is a random bigint in the range 1-1000. Insertion is 5-10K
 rows/second. One search every 1-5 seconds, by one of the indexed fields
 (only equality, no range). There's also an index on a timestamp field,
 but
 that's not random so it doesn't give that many problems (it's actually
 the
 one where I wanted to try the minmax).
 
 Without meaning to pick on you, imprecise analysis yields unhelpful
 new features. The clearer we are about what we are trying to solve the
 more likely we are to solve it well. 30 seconds analysis on what is
 needed is not sufficient to justify an additional man year of
 development, especially if a man year of work has already been done
 and the testing of the latest feature is now at hand.

I've never said that my analysis justifies a man-year of work. As I said,
I'm actually not confident at all that even if we had those cool indexes
they would work for my scenario (marketing aside, there's not that much
data/tests out there on those indexes). I just wanted to know if the matter
had been discussed in the past, and to get more info.

At the same time, I'm reluctant to try a new index hoping it will work in my
case just because it's new and a man-year of work has already been done.
Again: what's this minmax index supposed to be good at?
If it's good at indexing data in mostly-sequential order, it won't help me.
From what I got (maybe I got it wrong?) the index stores min/max values for
sequences of pages. In my case I guess that those min/max values would be
close to 1 (min) / 1000 (max), because I insert data in random order. So
any query will scan the entire table anyway. Am I wrong?


Simon Riggs wrote
 The requests from the indexes field, are they ordered? 

Mmmh, I don't think I understand the question... an operator searches for
calls made by a user, so he searches in random order... 


Simon Riggs wrote
 are they
 limited? Or you really want ALL calls? What is the tolerance on those?

I want all the calls made by the user. But there won't be many calls for one
user.
And most users will never be queried (as I said, one "calls by person"
search every 1-5 seconds, so a very small percentage of calls will ever be
queried/retrieved).





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-05 Thread Claudio Freire
On Tue, Nov 5, 2013 at 6:57 AM, Leonardo Francalanci m_li...@yahoo.it wrote:
 Simon Riggs wrote
 Minmax indexes seem to surprise many people, so broad generalisations
 aren't likely to be useful.

 I think the best thing to do is to publish some SQL requests that
 demonstrate in detail what you are trying to achieve and test them
 against minmax indexes. That way we can discuss what does work and
 what doesn't work well enough yet.

 While I do believe in testing (since In theory there is no difference
 between theory and practice. In practice there is), I would like to know
 the properties of the minmax index before trying it.
 What is it supposed to be good at? What are the pros/cons? We can't ask all
 the users to just try the index and see if it works for them.
 As I said, my understanding is that it is very efficient (both in insertion and
 in searching) when data is somehow ordered in the table. But maybe I got it
 wrong...

Well, for one, random inserts (with random data) on a min-max index
have a roughly 1/N chance of requiring a write to disk, and (N-1)/N
chance of being completely free (or maybe a read to verify a write
isn't needed, but that'll probably hit shared buffers), where N is the
number of tuples per page. Per index page that is.

Of course, non-random workloads are a different matter.

Min-max indexes always require a sequential scan of the min-max index
itself when querying. That works when you intend to query enough
tuples to make up the cost (that is, more tuples than M * N *
random_cost / seq_cost), where M is the number of pages in the index.
Well, actually, since they result in better io patterns as well, the
tradeoff is probably a little bit more tricky than that, in favor of
min-max indexes.

Min-max indexes tend to be very compact, so M is usually low.
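
For intuition only, plugging made-up numbers into that formula (say M = 5000
minmax index pages, N = 200 entries per index page, and the default
random_page_cost = 4 / seq_page_cost = 1):

select 5000 * 200 * 4.0 / 1.0 as breakeven_tuples;   -- 4 million

So, under those invented figures, a minmax scan only starts to pay off for
queries that fetch on the order of millions of tuples.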


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-05 Thread Leonardo Francalanci
Claudio Freire wrote
 Min-max indexes always require a sequential scan of the min-max index
 itself when querying. 

I'm worried about the number of heap pages that will be scanned.
My understanding is that given the random input, the index will
not be selective enough, and will end up requiring a lot of page 
scanning to get the relevant rows.

That is: the selectivity of the min/max values will be very low, given
the high cardinality and randomness of the input; I'm afraid that, in
most page ranges, the min will be lower than the queried ones,
and the max higher, resulting in lots of heap page scans.

Quick test:
a table with 1M rows, with random values from 1 to 1000:
create table t as select generate_series(1, 1000000) as i, trunc(random() *
1000) as n;

suppose a page range contains 100 rows (but my understanding is that minmax
index will use a much bigger row count...), let's find how many page ranges
should be scanned to find a series of 50 values (again, random in the
1-1000 range):

with cte as
 (select min(n) as minn, max(n) as maxn, i/100 from t group by i/100),
 inp as (select generate_series(1, 50) iinp, trunc(random() * 1000) as s)
 select count(*) from inp
   left outer join cte on inp.s between minn and maxn group by iinp


I get > 9000 pages for 49 values out of 50... which means scanning 90% of
the table.

Either my sql is not correct (likely), or my understanding of the minmax
index is not correct (even more likely), or the minmax index is not usable
in a random-inputs scenario.







-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-05 Thread Claudio Freire
On Tue, Nov 5, 2013 at 11:28 AM, Leonardo Francalanci m_li...@yahoo.it wrote:
 I get > 9000 pages for 49 values out of 50... which means scanning 90% of
 the table.

 Either my sql is not correct (likely), or my understanding of the minmax
 index is
 not correct (even more likely), or the minmax index is not usable in a
 random inputs
 scenario.


Yep, you're correct. That's the cost for querying random values.

But real data isn't truly random, and you also haven't really
analyzed update cost, which is what we were talking about in that last
post.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-05 Thread Leonardo Francalanci
Claudio Freire wrote
 real data isn't truly random

Well, let's try normal_rand???

create table t1 as select trunc(normal_rand(1000000, 50, 3)) as n,
generate_series(1, 1000000) as i;

with cte as
 (select min(n) as minn, max(n) as maxn, i/100 from t1 group by i/100),
 inp as (select generate_series(1, 100) iinp, trunc(normal_rand(100, 50, 3)) as s)
 select count(*), iinp from inp
   left outer join cte on inp.s between minn and maxn group by iinp;

Not that much different in my run... 



Claudio Freire wrote
 you haven't really
 analyzed update cost, which is what we were talking about in that last
 post.

I don't care for a better update cost if the cost to query is a table scan.
Otherwise, I'll just claim that no index at all is even better than minmax:
0 update cost, pretty much same query time.

Maybe there's value in minmax indexes for sequential data, but not for
random data, which is the topic of this thread.

BTW I would like to see some performance tests on the minmax indexes
vs btree for the sequential inputs... is the gain worth it? I couldn't find
any mention of performance tests in the minmax threads.






-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-05 Thread Jeff Janes
On Tue, Nov 5, 2013 at 12:25 AM, Leonardo Francalanci m_li...@yahoo.it wrote:

 Andres Freund-3 wrote
  On 2013-11-04 11:27:33 -0500, Robert Haas wrote:
  On Mon, Nov 4, 2013 at 11:24 AM, Claudio Freire klaussfreire@... wrote:
   Such a thing would help COPY, so maybe it's worth a look
 
  I have little doubt that a deferred insertion buffer of some kind
  could help performance on some workloads, though I suspect the buffer
  would have to be pretty big to make it worthwhile on a big COPY that
  generates mostly-random insertions.
 
  Even for random data presorting the to-be-inserted data appropriately
  could result in much better io patterns.

 Mmh, I'm afraid that the buffer should be huge to get some real advantage.
 You have to buffer enough values to avoid touching entire pages, which is
 not that easy.


Some experiments I did a few years ago showed that applying sorts to the
data to be inserted could be helpful even when the sort batch size was as
small as one tuple per 5 pages of existing index.  Maybe even less.

Cheers,

Jeff


Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-05 Thread Claudio Freire
On Tue, Nov 5, 2013 at 12:22 PM, Leonardo Francalanci m_li...@yahoo.it wrote:
 Claudio Freire wrote
 you haven't really
 analyzed update cost, which is what we were talking about in that last
 post.

 I don't care for a better update cost if the cost to query is a table scan.
 Otherwise, I'll just claim that no index at all is even better than minmax:
 0 update cost, pretty much same query time.

 Maybe there's value in minmax indexes for sequential data, but not for
 random data, which is the topic of this thread.


Well, of course, they're not magic pixie dust.

But is your data really random? (or normal?)

That's the thing...


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-05 Thread Leonardo Francalanci
Claudio Freire wrote
 Well, of course, they're not magic pixie dust. 

Of course they aren't. I think they can make a difference in a sequential
input scenario. But I'm not the one who said that they are fit to
solve the problems that I and other people are talking about in this thread.


Claudio Freire wrote
 But is your data really random? (or normal?)
 That's the thing...

I still don't understand.

Even if the data was normal, it wouldn't work. You can try and
change the 3rd parameter in normal_rand and get a narrower
distribution, but at the same time the query values should be
narrower... so you'll get the same results. 

The problem is: if the range you get between min and max is 
too big in each page range, you'll end up scanning a lot of heap
pages.

To me, the thing is: has anyone made performance tests of these
minmax indexes with an input that is not sequential?

(BTW I'd like to see tests for the sequential input scenario too...)







-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-05 Thread Leonardo Francalanci
Jeff Janes wrote
 Some experiments I did a few years ago showed that applying sorts to the
 data to be inserted could be helpful even when the sort batch size was as
 small as one tuple per 5 pages of existing index.  Maybe even less.

Cool!!! Do you have any idea/hint on how I could try and replicate that?
Do you remember how you did it?






-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-05 Thread Claudio Freire
On Tue, Nov 5, 2013 at 2:52 PM, Leonardo Francalanci m_li...@yahoo.it wrote:
 Jeff Janes wrote
 Some experiments I did a few years ago showed that applying sorts to the
 data to be inserted could be helpful even when the sort batch size was as
 small as one tuple per 5 pages of existing index.  Maybe even less.

 Cool!!! Do you have any idea/hint on how I could try and replicate that?
 Do you remember how you did it?


I do it regularly by sorting tuples before inserting/updating. It
helps quite significantly for batches of ~1000 tuples (well, in my
case).
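
A minimal sketch of that, with invented table and column names (with a simple
plan like this the rows generally reach each btree in key order, although
PostgreSQL doesn't strictly guarantee insertion order):

insert into calls
select * from staging_batch
order by called_no;   -- sort by the key whose index suffers most from random I/O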


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-05 Thread Greg Stark
On Tue, Nov 5, 2013 at 5:30 PM, Claudio Freire klaussfre...@gmail.com wrote:

  Maybe there's value in minmax indexes for sequential data, but not for
  random data, which is the topic of this thread.


 Well, of course, they're not magic pixie dust.

 But is your data really random? (or normal?)


I think minmax indexes are more akin to bitmap indexes. They will be very
effective for columns with low-cardinality, especially for columns that are
very clustered. In the extreme if all the values in some regions of the
table are the same then minmax indexes would be optimal. I wouldn't expect
them to be very effective for a highly selective column that isn't well
clustered.

It really sounds like you're describing a particular workload that btrees
could just be more optimized for. Buffering all inserts in memory and
merging them into the btree lazily is actually something Heikki has
proposed in the past. I'm not clear if that gets you all the benefits of
the indexes you described or not but it seems to target the particular
problem you're having.

-- 
greg


Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-05 Thread Alvaro Herrera
Greg Stark wrote:

 I think minmax indexes are more akin to bitmap indexes. They will be very
 effective for columns with low-cardinality, especially for columns that are
 very clustered. In the extreme if all the values in some regions of the
 table are the same then minmax indexes would be optimal. I wouldn't expect
 them to be very effective for a highly selective column that isn't well
 clustered.

I think clustering is more important than cardinality.  I mean you can
have a very useful minmax index on a float column, on which maybe there
are no two identical values.

I certainly don't think minmax indexes will solve all indexing problems.
In the end, they're just one more tool in your DBA toolbox.

-- 
Álvaro Herrera   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-04 Thread Robert Haas
On Sat, Nov 2, 2013 at 6:07 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On 29 October 2013 16:10, Peter Geoghegan p...@heroku.com wrote:
 On Tue, Oct 29, 2013 at 7:53 AM, Leonardo Francalanci m_li...@yahoo.it 
 wrote:
 I don't see much interest in insert-efficient indexes.

 Presumably someone will get around to implementing a btree index
 insertion buffer one day. I think that would be a particularly
 compelling optimization for us, because we could avoid ever inserting
 index tuples that are already dead when the deferred insertion
 actually occurs.

 That's pretty much what the LSM-tree is.

What is pretty cool about this sort of thing is that there's no
intrinsic reason the insertion buffer needs to be block-structured or
disk-backed.  In theory, you can structure the in-memory portion of
the tree any way you like, using pointers and arbitrary-size memory
allocations and all that fun stuff.  You need to log that there's a
deferred insert (or commit to flushing the insertion buffer before
every commit, which would seem to miss the point) so that recovery can
reconstruct the in-memory data structure and flush it, but that's it:
the WAL format need not know any other details of the in-memory
portion of the tree.  I think that, plus the ability to use pointers
and so forth, might lead to significant performance gains.

In practice, the topology of our shared memory segment makes this a
bit tricky.  The problem isn't so much that it's fixed size as that it
lacks a real allocator, and that all the space used for shared_buffers
is nailed down and can't be borrowed for other purposes.  I'm very
interested in solving the problem of getting a real allocator for
shared memory because I think I need it for parallel sort, and even if
there's a way to avoid needing it there I have a strong feeling that
we'll want it for other applications of dynamic shared memory.  But it
should be possible to write the allocator in such a way that you just
give it a chunk of memory to manage, and it does, remaining agnostic
about whether the memory is from the main shared memory segment or a
dynamic one.

Of course, it's possible that even we do get a shared memory
allocator, a hypothetical person working on this project might prefer
to make the data block-structured anyway and steal storage from
shared_buffers.  So my aspirations in this area may not even be
relevant.  But I wanted to mention them, just in case anyone else is
thinking about similar things, so that we can potentially coordinate.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-04 Thread Claudio Freire
On Mon, Nov 4, 2013 at 1:09 PM, Robert Haas robertmh...@gmail.com wrote:
 On Sat, Nov 2, 2013 at 6:07 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On 29 October 2013 16:10, Peter Geoghegan p...@heroku.com wrote:
 On Tue, Oct 29, 2013 at 7:53 AM, Leonardo Francalanci m_li...@yahoo.it 
 wrote:
 I don't see much interest in insert-efficient indexes.

 Presumably someone will get around to implementing a btree index
 insertion buffer one day. I think that would be a particularly
 compelling optimization for us, because we could avoid ever inserting
 index tuples that are already dead when the deferred insertion
 actually occurs.

 That's pretty much what the LSM-tree is.

 What is pretty cool about this sort of thing is that there's no
 intrinsic reason the insertion buffer needs to be block-structured or
 disk-backed.  In theory, you can structure the in-memory portion of
 the tree any way you like, using pointers and arbitrary-size memory
 allocations and all that fun stuff.  You need to log that there's a
 deferred insert (or commit to flushing the insertion buffer before
 every commit, which would seem to miss the point)


Such a thing would help COPY, so maybe it's worth a look


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-04 Thread Robert Haas
On Mon, Nov 4, 2013 at 11:24 AM, Claudio Freire klaussfre...@gmail.com wrote:
 Such a thing would help COPY, so maybe it's worth a look

I have little doubt that a deferred insertion buffer of some kind
could help performance on some workloads, though I suspect the buffer
would have to be pretty big to make it worthwhile on a big COPY that
generates mostly-random insertions.  I think the question is not so
much whether it's worth doing but where anyone's going to find the
time to do it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-04 Thread Claudio Freire
On Mon, Nov 4, 2013 at 1:27 PM, Robert Haas robertmh...@gmail.com wrote:
 On Mon, Nov 4, 2013 at 11:24 AM, Claudio Freire klaussfre...@gmail.com 
 wrote:
 Such a thing would help COPY, so maybe it's worth a look

 I have little doubt that a deferred insertion buffer of some kind
 could help performance on some workloads, though I suspect the buffer
 would have to be pretty big to make it worthwhile on a big COPY that
 generates mostly-random insertions.  I think the question is not so
 much whether it's worth doing but where anyone's going to find the
 time to do it.


However, since an admin can increase work_mem for that COPY, using
work_mem for this would be reasonable, don't you agree?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-04 Thread Robert Haas
On Mon, Nov 4, 2013 at 11:31 AM, Claudio Freire klaussfre...@gmail.com wrote:
 On Mon, Nov 4, 2013 at 1:27 PM, Robert Haas robertmh...@gmail.com wrote:
 On Mon, Nov 4, 2013 at 11:24 AM, Claudio Freire klaussfre...@gmail.com 
 wrote:
 Such a thing would help COPY, so maybe it's worth a look

 I have little doubt that a deferred insertion buffer of some kind
 could help performance on some workloads, though I suspect the buffer
 would have to be pretty big to make it worthwhile on a big COPY that
 generates mostly-random insertions.  I think the question is not so
 much whether it's worth doing but where anyone's going to find the
 time to do it.


 However, since an admin can increase work_mem for that COPY, using
 work_mem for this would be reasonable, don't you agree?

Without implementing it and benchmarking the result, it's pretty hard
to know.  But if I were a betting man, I'd bet that's not nearly big
enough.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-04 Thread Andres Freund
On 2013-11-04 11:27:33 -0500, Robert Haas wrote:
 On Mon, Nov 4, 2013 at 11:24 AM, Claudio Freire klaussfre...@gmail.com 
 wrote:
  Such a thing would help COPY, so maybe it's worth a look
 
 I have little doubt that a deferred insertion buffer of some kind
 could help performance on some workloads, though I suspect the buffer
 would have to be pretty big to make it worthwhile on a big COPY that
 generates mostly-random insertions.

Even for random data presorting the to-be-inserted data appropriately
could result in much better io patterns.

 I think the question is not so
 much whether it's worth doing but where anyone's going to find the
 time to do it.

Yea :(

I think doing this outside of s_b will make stuff rather hard for
physical replication and crash recovery since we either will need to
flush the whole buffer at checkpoints - which is hard since the
checkpointer doesn't work inside individual databases - or we need to
persist the in-memory buffer across restart which also sucks.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-04 Thread Robert Haas
On Mon, Nov 4, 2013 at 11:32 AM, Andres Freund and...@2ndquadrant.com wrote:
 I think doing this outside of s_b will make stuff rather hard for
 physical replication and crash recovery since we either will need to
 flush the whole buffer at checkpoints - which is hard since the
 checkpointer doesn't work inside individual databases - or we need to
 persist the in-memory buffer across restart which also sucks.

You might be right, but I think part of the value of LSM-trees is that
the in-memory portion of the data structure is supposed to be able to
be optimized for in-memory storage rather than on disk storage.  It
may be that block-structuring that data bleeds away much of the
performance benefit.  Of course, I'm talking out of my rear end here:
I don't really have a clue how these algorithms are supposed to work.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-04 Thread Gavin Flower

On 05/11/13 05:35, Robert Haas wrote:

On Mon, Nov 4, 2013 at 11:32 AM, Andres Freund and...@2ndquadrant.com wrote:

I think doing this outside of s_b will make stuff rather hard for
physical replication and crash recovery since we either will need to
flush the whole buffer at checkpoints - which is hard since the
checkpointer doesn't work inside individual databases - or we need to
persist the in-memory buffer across restart which also sucks.

You might be right, but I think part of the value of LSM-trees is that
the in-memory portion of the data structure is supposed to be able to
be optimized for in-memory storage rather than on disk storage.  It
may be that block-structuring that data bleeds away much of the
performance benefit.  Of course, I'm talking out of my rear end here:
I don't really have a clue how these algorithms are supposed to work.

How about having a 'TRANSIENT INDEX' that only exists in memory, so 
there is no requirement to write it to disk or to replicate directly? 
This type of index would be very fast and easier to implement.  Recovery 
would involve rebuilding the index, and sharing would involve recreating 
on a slave.  Probably not appropriate for a primary index, but may be 
okay for secondary indexes used to speed specific queries.


This could be useful in some situations now, and allow time to get 
experience in how best to implement the basic concept.  Then a more 
robust solution using WAL etc can be developed later.


I suspect that such a TRANSIENT INDEX would still be useful even when a 
more robust in-memory index method was available, as I expect it would 
be faster to set up than a robust memory index - which might be good 
when you need to have one or more indexes for a short period of time, or 
when the size of the index is so small that recreating it requires very 
little time (total elapsed time might even be less than a disk-backed one?).



Cheers,
Gavin


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-04 Thread Simon Riggs
On 4 November 2013 16:09, Robert Haas robertmh...@gmail.com wrote:
 On Sat, Nov 2, 2013 at 6:07 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On 29 October 2013 16:10, Peter Geoghegan p...@heroku.com wrote:
 On Tue, Oct 29, 2013 at 7:53 AM, Leonardo Francalanci m_li...@yahoo.it 
 wrote:
 I don't see much interest in insert-efficient indexes.

 Presumably someone will get around to implementing a btree index
 insertion buffer one day. I think that would be a particularly
 compelling optimization for us, because we could avoid ever inserting
 index tuples that are already dead when the deferred insertion
 actually occurs.

 That's pretty much what the LSM-tree is.

 What is pretty cool about this sort of thing is that there's no
 intrinsic reason the insertion buffer needs to be block-structured or
 disk-backed.  In theory, you can structure the in-memory portion of
 the tree any way you like, using pointers and arbitrary-size memory
 allocations and all that fun stuff.  You need to log that there's a
 deferred insert (or commit to flushing the insertion buffer before
 every commit, which would seem to miss the point) so that recovery can
 reconstruct the in-memory data structure and flush it, but that's it:
 the WAL format need not know any other details of the in-memory
 portion of the tree.  I think that, plus the ability to use pointers
 and so forth, might lead to significant performance gains.

Sounds like it could be cool, yes.

 In practice, the topology of our shared memory segment makes this a
 bit tricky.  The problem isn't so much that it's fixed size as that it
 lacks a real allocator, and that all the space used for shared_buffers
 is nailed down and can't be borrowed for other purposes.  I'm very
 interested in solving the problem of getting a real allocator for
 shared memory because I think I need it for parallel sort

Agreed, you need a shared memory allocator for parallel query. It's
just too tempting to build a hash table in shared memory on one
thread, then use the hash table from multiple sessions as we do a
parallel scan. Roughly same thing for parallel sort - passing pointers
to complete data objects around is much easier than actually moving
the data between processes, which would slow things down. We do also
need generalised inter-node pipe but that's not the best solution to
every problem.

It's also a great way of controlling over-allocation of resources.

 , and even if
 there's a way to avoid needing it there I have a strong feeling that
 we'll want it for other applications of dynamic shared memory.  But it
 should be possible to write the allocator in such a way that you just
 give it a chunk of memory to manage, and it does, remaining agnostic
 about whether the memory is from the main shared memory segment or a
 dynamic one.

Agreed

 Of course, it's possible that even we do get a shared memory
 allocator, a hypothetical person working on this project might prefer
 to make the data block-structured anyway and steal storage from
 shared_buffers.  So my aspirations in this area may not even be
 relevant.  But I wanted to mention them, just in case anyone else is
 thinking about similar things, so that we can potentially coordinate.

If anyone was going to work on LSM tree, I would advise building a
tree in shared/temp buffers first, then merging with the main tree.
The merge process could use the killed tuple approach to mark the
merging.

The most difficult thing about buffering the inserts is deciding which
poor sucker gets the task of cleaning up. That's probably better as an
off-line process, which is where the work comes in. Non shared
buffered approaches would add too much overhead to the main task.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-04 Thread Jeff Janes
On Mon, Nov 4, 2013 at 8:09 AM, Robert Haas robertmh...@gmail.com wrote:

 On Sat, Nov 2, 2013 at 6:07 AM, Simon Riggs si...@2ndquadrant.com wrote:
  On 29 October 2013 16:10, Peter Geoghegan p...@heroku.com wrote:
  On Tue, Oct 29, 2013 at 7:53 AM, Leonardo Francalanci m_li...@yahoo.it
 wrote:
  I don't see much interest in insert-efficient indexes.
 
  Presumably someone will get around to implementing a btree index
  insertion buffer one day. I think that would be a particularly
  compelling optimization for us, because we could avoid ever inserting
  index tuples that are already dead when the deferred insertion
  actually occurs.
 
  That's pretty much what the LSM-tree is.

 What is pretty cool about this sort of thing is that there's no
 intrinsic reason the insertion buffer needs to be block-structured or
 disk-backed.


How do we commit to not spilling to disk, in the face of an unbounded
number of indexes existing and wanting to use this mechanism
simultaneously?  If it routinely needs to spill to disk, that would
probably defeat the purpose of having it in the first place, but committing
to never doing so seems to be extremely restrictive. As you say it is also
freeing, in terms of using pointers and such, but I think the restrictions
would outweigh the freedom.




  In theory, you can structure the in-memory portion of
 the tree any way you like, using pointers and arbitrary-size memory
 allocations and all that fun stuff.  You need to log that there's a
 deferred insert (or commit to flushing the insertion buffer before
 every commit, which would seem to miss the point) so that recovery can
 reconstruct the in-memory data structure and flush it, but that's it:
 the WAL format need not know any other details of the in-memory
 portion of the tree.  I think that, plus the ability to use pointers
 and so forth, might lead to significant performance gains.

 In practice, the topology of our shared memory segment makes this a
 bit tricky.  The problem isn't so much that it's fixed size as that it
 lacks a real allocator, and that all the space used for shared_buffers
 is nailed down and can't be borrowed for other purposes.


I think the fixed size is also a real problem, especially given the
ubiquitous advice not to exceed 2 to 8 GB.

Cheers,

Jeff


Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-04 Thread Claudio Freire
On Mon, Nov 4, 2013 at 5:01 PM, Simon Riggs si...@2ndquadrant.com wrote:
 Of course, it's possible that even we do get a shared memory
 allocator, a hypothetical person working on this project might prefer
 to make the data block-structured anyway and steal storage from
 shared_buffers.  So my aspirations in this area may not even be
 relevant.  But I wanted to mention them, just in case anyone else is
 thinking about similar things, so that we can potentially coordinate.

 If anyone was going to work on LSM tree, I would advise building a
 tree in shared/temp buffers first, then merging with the main tree.
 The merge process could use the killed tuple approach to mark the
 merging.

 The most difficult thing about buffering the inserts is deciding which
 poor sucker gets the task of cleaning up. That's probably better as an
 off-line process, which is where the work comes in. Non shared
 buffered approaches would add too much overhead to the main task.

Thing is, if you want crash safety guarantees, you cannot use temp
(unlogged) buffers, and then you always have to flush to WAL at each
commit. If the staging index is shared, then it could mean a lot of
WAL (ie: probably around double the amount of WAL a regular b-tree
would generate).

Process-private staging trees that get merged on commit, ie:
transaction-scope staging trees, on the other hand, do not require WAL
logging, they can use temp buffers, and since they don't outlive the
transaction, it's quite obvious who does the merging (the committer).
Question is what kind of workload does that speed up with any
significance and whether the amount of work is worth that speedup on
those workloads.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-04 Thread Simon Riggs
On 4 November 2013 19:55, Gavin Flower gavinflo...@archidevsys.co.nz wrote:

 How about having a 'TRANSIENT INDEX' that only exists in memory, so there is
 no requirement to write it to disk or to replicate directly? This type of
 index would be very fast and easier to implement.  Recovery would involve
 rebuilding the index, and sharing would involve recreating on a slave.
 Probably not appropriate for a primary index, but may be okay for secondary
 indexes used to speed specific queries.

 This could be useful in some situations now, and allow time to get
 experience in how best to implement the basic concept.  Then a more robust
 solution using WAL etc can be developed later.

 I suspect that such a TRANSIENT INDEX would still be useful even when a more
 robust in memory index method was available.  As I expect it would be faster
 to set up than a robust memory index - which might be good when you need to
 have one or more indexes for a short period of time, or the size of the
 index is so small that recreating it requires very little time (total
 elapsed time might even be less than a disk backed one?).

UNLOGGED indexes have been discussed over many years and there is
pretty good acceptance of the idea for some use cases.

The main tasks are
* marking the index so it is unavailable on a hot standby
* rebuilding the index in full at the end of recovery - requires an
additional process to rebuild them possibly in priority order
* make sure the above doesn't violate security
* marking the index so it can't be used in the planner until rebuild
is complete - subtle stuff

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-04 Thread Simon Riggs
On 30 October 2013 14:34, Yann Fontana yann.font...@gmail.com wrote:


 On 30 October 2013 11:23, Leonardo Francalanci m_li...@yahoo.it wrote:

  In terms of generality, do you think its worth a man year of developer
  effort to replicate what you have already achieved? Who would pay?


 I work on an application that does exactly what Leonardo described. We hit
 the exact same problem, and came up with the same exact same solution (down
 to the 15 minutes interval). But I have also worked on other various
 datamarts (all using Oracle), and they are all subject to this problem in
 some form: B-tree indexes slow down bulk data inserts too much and need to
 be disabled or dropped and then recreated after the load. In some cases this
 is done easily enough, in others it's more complicated (example: every day,
 a process imports from 1 million to 1 billion records into a table partition
 that may contain from 0 to 1 billion records. To be as efficient as
 possible, you need some logic to compare the number of rows to insert to the
 number of rows already present, in order to decide whether to drop the
 indexes or not).

 Basically, my point is that this is a common problem for datawarehouses and
 datamarts. In my view, indexes that don't require developers to work around
 poor insert performance would be a significant feature in a
 datawarehouse-ready DBMS.


Everybody on this thread is advised to look closely at Min Max indexes
before starting any further work.

MinMax will give us access to many new kinds of plan, plus they are
about as close to perfectly efficient, by which I mean almost zero
overhead, with regard to inserts as it is possible to get.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-02 Thread Simon Riggs
On 29 October 2013 16:10, Peter Geoghegan p...@heroku.com wrote:
 On Tue, Oct 29, 2013 at 7:53 AM, Leonardo Francalanci m_li...@yahoo.it 
 wrote:
 I don't see much interest in insert-efficient indexes.

 Presumably someone will get around to implementing a btree index
 insertion buffer one day. I think that would be a particularly
 compelling optimization for us, because we could avoid ever inserting
 index tuples that are already dead when the deferred insertion
 actually occurs.

That's pretty much what the LSM-tree is.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-31 Thread Leonardo Francalanci
Jeff Janes wrote
 True, but that is also true of indexes created in bulk.  It all has to
 reach disk eventually--
 [...]
 If the checkpoint interval is as long as the partitioning period, then
 hopefully the active index buffers get re-dirtied while protected in
 shared_buffers, and only get written to disk once.  

Honestly, I made a lot of tests in the past, and I don't remember if I tried
15-minute checkpoints + high shared_buffers. That might work. I'm going to
try it and see what happens.


Jeff Janes wrote
 If the buffers get read, dirtied, and evicted from a small shared_buffers
 over and over again
 then you are almost guaranteed that will get written to disk multiple
 times

(as I understand, but I might be wrong): 
high shared_buffers don't help because, with such random index writing, lots
and lots of pages get dirtied, even if the change in the page was minimal.
So, in the 15-minute period, you write the same pages over and over again.
Even if you have high shared_buffers, the same page will get sync-ed to disk
multiple times (at every checkpoint).
The idea of those other indexes is to avoid the random writing, maximizing
the writing in sequence, even if that means writing more bytes. In other
words: writing a full 8KB is no different from writing 20 bytes in a page, as
we'll have to sync the whole page anyway...

I'll try a 15-minute checkpoint interval... and see what happens.






-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-31 Thread Leonardo Francalanci
Gavin Flower-2 wrote
 How about being able to mark indexes:
  'MEMORY ONLY' to make them not go to disk
 and
  'PERSISTENT | TRANSIENT' to mark if they should be recreated on 
 machine bootup?

I would love that. But:

1) I'd like to make some tests with a memory drive, and confirm that in
fact this would help (I'm sure I tried in the past, but I don't remember the
outcome)
2) I don't know if the fact that they are in memory should be handled by the
db or not. I was thinking about something more like RECREATE IF NOT FOUND,
that is: if the files aren't found at postgresql startup, re-create the
index...
3) I don't know how many people would be interested (and how
doable/complicated that would be, considering log-replay, replication etc
etc)







-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-30 Thread Leonardo Francalanci
 Hmm, you realise Alvaro is working on MinMax indexes in this release?
 They are very efficient with regard to index inserts and specially
 designed for use on large tables.
 
 Prior work by Heikki on Grouped Item Tuples was a way of reducing the
 size of indexes, yet still allowing uniqueness checks. That is
 implemented in SQLServer already and is very useful.


Reading the implementation of those features, I don't think they can help in 
the cases handled by the index types I mentioned (insertions of random values 
in big tables).



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-30 Thread Simon Riggs
On 30 October 2013 07:55, Leonardo Francalanci m_li...@yahoo.it wrote:
 Hmm, you realise Alvaro is working on MinMax indexes in this release?
 They are very efficient with regard to index inserts and specially
 designed for use on large tables.

 Prior work by Heikki on Grouped Item Tuples was a way of reducing the
 size of indexes, yet still allowing uniqueness checks. That is
 implemented in SQLServer already and is very useful.


 Reading the implementation of those features, I don't think they can help in 
 the cases handled by the index types I mentioned (insertions of random values 
 in big tables).

Presumably the data you are inserting isn't actually random. Please
describe the use case you are considering in more detail and some view
on how frequent that is, with some examples. Once we understand the
use case and agree it is important, we might solve problems.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-30 Thread Leonardo Francalanci
 Presumably the data you are inserting isn't actually random. Please
 describe the use case you are considering in more detail and some view
 on how frequent that is, with some examples. Once we understand the
 use case and agree it is important, we might solve problems.


Collecting calls data for mobile network operators (and no, I don't work for 
the NSA...)
Easily 5000-10000 inserts per second. Indexes on timestamp and ID (not a 
problem, always increasing so no btree issues) and on called #, calling #, 
imsi, imei. The last four obviously are random, out of millions of possible 
values.
After the first few million records, the disks can't keep up with the 
amount of random writing in the indexes. Workaround: the table is partitioned 
every 15 minutes, and indexes are created in bulk after we start the new 
15-minute partition. Searches on the current 15 minutes are not allowed (as it is 
not indexed), and searches on older data are K*log(N) (where K is the number of 
partitions). 
Yes, I could throw more disks, use SSDs, shard more, etc etc. But I still 
think that btrees just aren't fit for this kind of problem. I don't delete data, 
I don't update data, there's not that much concurrency going on. I would 
sacrifice search speed (K*log(N) is already much slower than regular btree 
usage) for realtime insertion.

I don't think I'm the only one having a big system to be indexed by random 
values. 

In fact, I didn't want to turn this thread into a "help me with this workload" 
thread. I just wanted to know if there were some other known issues with these 
different indexes, other than "not enough time to implement them correctly": I 
was afraid that someone had already dismissed them as "good in theory, bad in 
practice"...



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-30 Thread Simon Riggs
On 30 October 2013 10:35, Leonardo Francalanci m_li...@yahoo.it wrote:
 Presumably the data you are inserting isn't actually random. Please
 describe the use case you are considering in more detail and some view
 on how frequent that is, with some examples. Once we understand the
 use case and agree it is important, we might solve problems.


 Collecting calls data for mobile network operators (and no, I don't work for 
 the NSA...)
 Easily 5000-10000 inserts per second. Indexes on timestamp and ID (not a 
 problem, always increasing so no btree issues) and on called #, calling #, 
 imsi, imei. The last four obviously are random, out of millions of possible 
 values.
 After the few first millions of records, the disks can't keep up with the 
 amount of random writing in the indexes. Workaround: the table is partitioned 
 every 15 minutes, and indexes created in bulk after we start the new 
 15-minutes partition. Searches on current 15 minutes are not allowed (as it 
 is not indexed), and searches on older data are K*log(N) (where K is the 
 number of partitions).
 Yes, I could throw more disks, use ssd, sharding more, etc etc. But I still 
 think that btree just aren't fit for this kind of problem. I don't delete 
 data, I don't update data, there's not that much concurrency going on. I 
 would sacrifice search speed (K*log(N) is already much slower than regular 
 btree usage) for realtime insertion.

 I don't think I'm the only one having a big system to be indexed by random 
 values.

 In fact, I didn't want to turn this thread into a help me with this 
 workload thread. I just wanted to know if there was some other known issues 
 with these different indexes other than not enough time to implement them 
 correctly: I was afraid that someone already dismissed them as good in 
 theory, bad in practice...

What is the reason for needing such fast access to individual groups
of records? Sure sounds like the NSA or similar ;-)

Sacrificing timeliness for efficiency is a common solution. I'm seeing
lots of areas where being able to specify the timeliness that is
acceptable in a query leads to various optimisations of this and
similar.

Indexes are a declarative solution.  We would need to be able to
specify the tolerances to be able to do this. (You can write your own
index...)

In terms of generality, do you think it's worth a man year of developer
effort to replicate what you have already achieved? Who would pay?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-30 Thread Leonardo Francalanci
 What is the reason for needing such fast access to individual groups
 of records? Sure sounds like the NSA or similar ;-)


Users need to search all calls originated from/to a user or from/to a specific 
mobile phone to answer/analyze customers' probl... ok, I give up: I work for 
the NSA ;)

 In terms of generality, do you think its worth a man year of developer
 effort to replicate what you have already achieved? Who would pay?


1) I haven't achieved what I need: realtime indexing. I can't query the 
current 15-minute table efficiently. Plus, K*log(N) is not that great when 
you have a lot of K.
2) I'm not suggesting that this is top priority. I'm asking if there's 
something else, other than "we don't have time for this", that I don't know. In 
fact, I don't even know if those index types would really help in my 
(specific) case. That's why my original question was "why aren't there 
developments in this area": I didn't mean to imply someone should do it. I just 
wanted to know if those indexes were already discussed (and maybe dismissed for 
some reason) in the past...



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-30 Thread Kaare Rasmussen

On 2013-10-30 12:08, Simon Riggs wrote:
effort to replicate what you have already achieved? Who would pay? 


The NSA, obviously ;-)


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-30 Thread Simon Riggs
On 30 October 2013 11:23, Leonardo Francalanci m_li...@yahoo.it wrote:
 What is the reason for needing such fast access to individual groups
 of records? Sure sounds like the NSA or similar ;-)


 Users need to search all calls originated from/to a user or from/to a 
 specific mobile phone to answer/analyze customers' probl... ok, I give up: I 
 work for the NSA ;)

 In terms of generality, do you think its worth a man year of developer
 effort to replicate what you have already achieved? Who would pay?


 1) I haven't achieved what I need: realtime indexing. I can't query the 
 current 15 minutes table efficiently. Plus, K*log(N) is not that great when 
 you have a lot of K.
 2) I'm not suggesting that this is top priority. I'm asking if there's 
 something else, other than we don't have time for this, that I don't know. 
 In fact, I don't even know if those indexes types would really help in my 
 (specific) case. That's why my original question was why aren't there 
 developments in this area: I didn't mean to imply someone should do it. I 
 just wanted to know if those indexes were already discussed (and maybe 
 dismissed for some reason) in the past...

OK, I understand now.

LSM-trees seem patent free, since open source implementations exist.
The main concept is to partition the index into multiple trees, so
that the current insertion sub-tree fits more easily in memory.  That
sounds good and was exactly the solution I'd come up with as well,
which is a good cross check. It leads to a slow increase in index
response times, but we could offset that by having minimum values on
each subtree and using partitioning logic as with a minmax index.

LSM-tree also covers the goal of maintaining just 2 sub-trees and a
concurrent process of merging sub-trees. That sounds like it would
take a lot of additional time to get right and would need some
off-line process to perform the merge.
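
For intuition only, a user-level approximation of that two-tree arrangement
(invented names, not a proposal for the server-side implementation): new rows
land in a small, hot staging table, a periodic job drains it into the big
indexed table in key order, and lookups consult both until the drain catches up:

-- periodic drain:
with moved as (delete from calls_staging returning *)
insert into calls_main select * from moved order by called_no;

-- lookups consult both trees:
select * from calls_main    where called_no = '15551234567'
union all
select * from calls_staging where called_no = '15551234567';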

Please somebody advise patent status of Y-trees otherwise I wouldn't bother.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-30 Thread Leonardo Francalanci
 LSM-trees seem patent free

I'm no expert, and I gave it just a look some time ago: it looked to me very 
complicated to get right... and as far as I remember you don't get that much 
gain, unless you go multi-level which would complicate things further

 Please somebody advise patent status of Y-trees otherwise I wouldn't bother.
 
y-trees look much easier to get right... (and to me they also make more 
sense, but I'm not skilled enough to judge). 

There's also the FD-tree, which looks a lot like the (patented...) fractal tree:
http://www.ntu.edu.sg/home/bshe/fdtree_pvldb.pdf


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-30 Thread Yann Fontana
On 30 October 2013 11:23, Leonardo Francalanci m_li...@yahoo.it wrote:

  In terms of generality, do you think its worth a man year of developer
  effort to replicate what you have already achieved? Who would pay?


I work on an application that does exactly what Leonardo described. We hit
the exact same problem, and came up with the same exact same solution (down
to the 15 minutes interval). But I have also worked on other various
datamarts (all using Oracle), and they are all subject to this problem in
some form: B-tree indexes slow down bulk data inserts too much and need to
be disabled or dropped and then recreated after the load. In some cases
this is done easily enough, in others it's more complicated (example: every
day, a process imports from 1 million to 1 billion records into a table
partition that may contain from 0 to 1 billion records. To be as efficient
as possible, you need some logic to compare the number of rows to insert to
the number of rows already present, in order to decide whether to drop the
indexes or not).

Basically, my point is that this is a common problem for datawarehouses and
datamarts. In my view, indexes that don't require developers to work around
poor insert performance would be a significant feature in a
datawarehouse-ready DBMS.

Yann


Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-30 Thread Merlin Moncure
On Wed, Oct 30, 2013 at 9:23 AM, Leonardo Francalanci m_li...@yahoo.it wrote:
 LSM-trees seem patent free

 I'm no expert, and I gave it just a look some time ago: it looked to me very 
 complicated to get right... and as far as I remember you don't get that much 
 gain, unless you go multi-level which would complicate things further

 Please somebody advise patent status of Y-trees otherwise I wouldn't bother.

 y-trees look much more easier to get right... (and to me they also make more 
 sense, but I'm not skilled enough to judge).

 There's also the FD-tree, which looks a lot like the (patented...) fractal 
 tree:
 http://www.ntu.edu.sg/home/bshe/fdtree_pvldb.pdf

Skimming the white paper, it's clear right from the start that they
make assumptions about the hardware that already appear to be obsolete
-- extremely poor write performance vs read performance of SSDs.
Later-generation SSDs increasingly use hardware optimizations that
convert random writes to sequential writes using various tricks.

Point being: hardware is marching along pretty fast (after 20+ years
of stagnation) and it's dangerous (IMO) to make big software
investments based on the situation on the ground *today*.

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-30 Thread Jeff Janes
On Wed, Oct 30, 2013 at 3:35 AM, Leonardo Francalanci m_li...@yahoo.it wrote:

  Presumably the data you are inserting isn't actually random. Please
  describe the use case you are considering in more detail and some view
  on how frequent that is, with some examples. Once we understand the
  use case and agree it is important, we might solve problems.


 Collecting calls data for mobile network operators (and no, I don't work
 for the NSA...)
 Easily 5000-1 inserts per second. Indexes in timestamp and ID (not a
 problem, always increasing so no btree issues) and in called #, calling #,
 imsi, imei. The last four obviously are random, out of millions of possible
 values.
 After the few first millions of records, the disks can't keep up with the
 amount of random writing in the indexes.


So, like, 3 minutes' worth?  How much RAM and shared_buffers do you have?
 The index insertions should be fast until the size of the active part of
the indexes being inserted into exceeds shared_buffers by some amount (what
that amount is would depend on how much dirty data the kernel is willing to
allow in the page cache before it starts suffering anxiety about it).  If
you have enough shared_buffers to make that last for 15 minutes, then you
shouldn't have a problem inserting with live indexes.
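
(Back-of-envelope only, with assumed numbers: at the ~5000 rows/s mentioned
above, 15 minutes is ~4.5M rows per partition; with 4 random-key btrees at
very roughly 40 bytes per leaf entry and ~90% fill, the "active" index pages
come to something on the order of a gigabyte, which gives a feel for how
much shared_buffers this approach wants:)

  -- purely illustrative arithmetic, not a measurement
  SELECT pg_size_pretty((4500000::bigint * 4 * 40 / 0.9)::bigint)
         AS rough_active_index_size;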

Cheers,

Jeff


Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-30 Thread Jeff Janes
On Wed, Oct 30, 2013 at 4:23 AM, Leonardo Francalanci m_li...@yahoo.it wrote:



 1) I haven't achieved what I need: realtime indexing. I can't query the
 current 15 minutes table efficiently. Plus, K*log(N) is not that great
 when you have a lot of K.


Are partitions read-only once time has moved on, or can stragglers show up
that need to be inserted into older partitions?

You could periodically merge older partitions into larger tables, index
those aggregated tables, then transactionally disinherit the old partitions
and inherit the new aggregated one.  This would keep the value of K down,
at the expense of re-writing data multiple times (but all methods write data
multiple times, some just hide it from you).
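
(For instance, a sketch under the assumption that the old partitions are
read-only by then; table and column names are invented, and constraints and
permissions are omitted:)

  -- Build the merged table and its indexes first (the slow part)...
  CREATE TABLE calls_agg_20131029 AS
      SELECT * FROM calls_p_1000
      UNION ALL
      SELECT * FROM calls_p_1015;
  CREATE INDEX ON calls_agg_20131029 (called_number);
  CREATE INDEX ON calls_agg_20131029 (calling_number);
  -- (if the parent has CHECK constraints they must be added to the new
  -- child before it can INHERIT)

  -- ...then swap it in with a short transaction, which lowers K:
  BEGIN;
  ALTER TABLE calls_p_1000 NO INHERIT calls;
  ALTER TABLE calls_p_1015 NO INHERIT calls;
  ALTER TABLE calls_agg_20131029 INHERIT calls;
  COMMIT;
  -- the old partitions can be dropped once nothing is still reading them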



 2) I'm not suggesting that this is top priority. I'm asking if there's
 something else, other than we don't have time for this, that I don't
 know. In fact, I don't even know if those indexes types would really help
 in my (specific) case. That's why my original question was why aren't
 there developments in this area: I didn't mean to imply someone should do
 it. I just wanted to know if those indexes were already discussed (and
 maybe dismissed for some reason) in the past...


There have been several ideas discussed, but not a whole lot of time to
implement them.

By the way, what is the transaction structure of your inserts?  Are they
large batches between commits, or is each row committed?

Cheers,

Jeff


Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-30 Thread Leonardo Francalanci
Jeff Janes wrote
 The index insertions should be fast until the size of the active part of
 the indexes being inserted into exceeds shared_buffers by some amount (what
 that amount is would depend on how much dirty data the kernel is willing to
 allow in the page cache before it starts suffering anxiety about it).  If
 you have enough shared_buffers to make that last for 15 minutes, then you
 shouldn't have a problem inserting with live indexes.

Sooner or later you'll have to checkpoint those shared_buffers... and we are
talking about GB of data (my understanding is that we change basically every
btree page, resulting in re-writing of the whole index).






--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Fast-insertion-indexes-why-no-developments-tp5776227p5776413.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-30 Thread Leonardo Francalanci
Jeff Janes wrote
 Are partitions read-only once time has moved on, or can stragglers show up
 that need to be inserted into older partitions?
 
 You could periodically merge older partitions into larger tables, index
 those aggregated tables, then transactionally disinherit the old partitions
 and inherit the new aggregated one.  This would keep the value of K down,
 at the expense of re-writing data multiple times (but all methods write
 data multiple times, some just hide it from you).

Yes, we could merge the partitions: the idea was to merge them during
night hours, when traffic is low (and NSA people are sleeping ;) )



Jeff Janes wrote
 By the way, what is the transaction structure of your inserts?  Are they
 large batches between commits, or is each row committed?

Of course large batches (using COPY)




--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Fast-insertion-indexes-why-no-developments-tp5776227p5776416.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-30 Thread Leonardo Francalanci
 Point being: hardware is marching along pretty fast (after 20+ years
 of stagnation) and it's dangerous (IMO) to make big software
 investments based on the situation on the ground *today*.


Yes, that's a good point.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-30 Thread Leonardo Francalanci
Jeff Janes wrote
 You could periodically merge older partitions into larger tables, index
 those aggregated tables, then transactionally disinherit the old partitions
 and inherit the new aggregated one.  This would keep the value of K down,
 at the expense of re-writing data multiple times (but all methods write
 data multiple times, some just hide it from you).

Forgot to add:

I thought also that we could:

- use RAM as the tablespace for indexes, and move the indexes later (but
postgresql doesn't handle a machine crash very well in this case... it would
be cool to be able to mark an index as recreate on crash...)
- use unlogged tables and turn them into logged tables to somehow speed up
the insertion; I actually started to write a patch for it, but making it work
for replication was too hard (not that I'm using replication, but a patch
that only worked for wal_level = minimal wouldn't be accepted). But this
wouldn't solve the problem anyway.
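
(Both ideas sketched below, with invented paths and names; note that
ALTER TABLE ... SET LOGGED only appeared in later PostgreSQL releases and
does not exist at the time of this thread:)

  -- Idea 1: keep the hot per-partition indexes on a RAM-backed tablespace,
  -- accepting that they are lost on a crash and must be rebuilt by hand.
  CREATE TABLESPACE ram_ts LOCATION '/mnt/ramdisk/pgts';
  CREATE INDEX calls_p_called_idx
      ON calls_p (called_number) TABLESPACE ram_ts;

  -- Idea 2: load into an unlogged table and flip it to logged afterwards.
  CREATE UNLOGGED TABLE calls_load (LIKE calls INCLUDING DEFAULTS);
  COPY calls_load FROM '/data/batch.csv' WITH (FORMAT csv);
  ALTER TABLE calls_load SET LOGGED;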




--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Fast-insertion-indexes-why-no-developments-tp5776227p5776418.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-30 Thread Jeff Janes
On Wed, Oct 30, 2013 at 9:54 AM, Leonardo Francalanci m_li...@yahoo.it wrote:

 Jeff Janes wrote
  The index insertions should be fast until the size of the active part of
  the indexes being inserted into exceeds shared_buffers by some amount (what
  that amount is would depend on how much dirty data the kernel is willing to
  allow in the page cache before it starts suffering anxiety about it).  If
  you have enough shared_buffers to make that last for 15 minutes, then you
  shouldn't have a problem inserting with live indexes.

 Sooner or later you'll have to checkpoint those shared_buffers...


True, but that is also true of indexes created in bulk.  It all has to
reach disk eventually--either the checkpointer writes it out and fsyncs it,
or the background writer or user backends write it out and the checkpoint
fsyncs it.  If bulk creation uses a ring buffer strategy (I don't know if
it does), then it might kick the buffers to kernel in more or less physical
order, which would help the kernel get them to disk in long sequential
writes.  Or not.  I think that this is where sorted checkpoint could really
help.

 and we are
 talking about GB of data (my understanding is that we change basically every
 btree page, resulting in re-writing of the whole index).

If the checkpoint interval is as long as the partitioning period, then
hopefully the active index buffers get re-dirtied while protected in
shared_buffers, and only get written to disk once.  If the buffers get
read, dirtied, and evicted from a small shared_buffers over and over again
then you are almost guaranteed that they will get written to disk multiple times
while they are still hot, unless your kernel is very aggressive about
caching dirty data (which will cause other problems).
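
(For concreteness, the kind of settings that line up the checkpoint interval
with the partitioning period; the values are purely illustrative, not
recommendations:)

  # postgresql.conf fragment, values illustrative only
  checkpoint_timeout = 15min          # matches the 15-minute partitions
  checkpoint_completion_target = 0.9
  shared_buffers = 8GB                # large enough for the active index pages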

Cheers,

Jeff


Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-30 Thread Claudio Freire
On Wed, Oct 30, 2013 at 10:53 AM, Simon Riggs si...@2ndquadrant.com wrote:

 LSM-tree also covers the goal of maintaining just 2 sub-trees and a
 concurrent process of merging sub-trees. That sounds like it would
 take a lot of additional time to get right and would need some
 off-line process to perform the merge.



Not necessarily.

Merging means applying insertions/deletions from one subtree to another.
While it's normally preferable and more efficient to do it in batches, I've
successfully implemented in-memory versions that use other writers to
perform the task, amortizing the cost of merging across many operations. In
essence, when there's a need to merge two subtrees, an inserting process
also merges one entry, so the trees slowly get merged. That works very well
in memory; it's clearly not necessarily generalizable to external storage,
but it's a technique to have in mind.

Alternatively, vacuum could do it. It's quite clearly a vacuuming task
anyway.


Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-30 Thread Gavin Flower

On 31/10/13 06:46, Jeff Janes wrote:
On Wed, Oct 30, 2013 at 9:54 AM, Leonardo Francalanci m_li...@yahoo.it wrote:


Jeff Janes wrote
 The index insertions should be fast until the size of the active part of
 the indexes being inserted into exceeds shared_buffers by some amount (what
 that amount is would depend on how much dirty data the kernel is willing to
 allow in the page cache before it starts suffering anxiety about it).  If
 you have enough shared_buffers to make that last for 15 minutes, then you
 shouldn't have a problem inserting with live indexes.

Sooner or later you'll have to checkpoint those shared_buffers...


True, but that is also true of indexes created in bulk.  It all has to 
reach disk eventually--either the checkpointer writes it out and 
fsyncs it, or the background writer or user backends writes it out and 
the checkpoint fsyncs it.  If bulk creation uses a ring buffer 
strategy (I don't know if it does), then it might kick the buffers to 
kernel in more or less physical order, which would help the kernel get 
them to disk in long sequential writes.  Or not.  I think that this is 
where sorted checkpoint could really help.


 and we are
 talking about GB of data (my understanding is that we change basically every
 btree page, resulting in re-writing of the whole index).

If the checkpoint interval is as long as the partitioning period, then 
hopefully the active index buffers get re-dirtied while protected in 
shared_buffers, and only get written to disk once.  If the buffers get 
read, dirtied, and evicted from a small shared_buffers over and over 
again then you are almost guaranteed that will get written to disk 
multiple times while they are still hot, unless your kernel is very 
aggressive about caching dirty data (which will cause other problems).


Cheers,

Jeff

How about being able to mark indexes:
'MEMORY ONLY' to make them not go to disk
and
'PERSISTENT | TRANSIENT' to mark if they should be recreated on 
machine bootup?


or something similar
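
(Something along these lines, in purely hypothetical syntax that PostgreSQL
does not support, just to illustrate the idea:)

  CREATE INDEX calls_called_idx ON calls (called_number)
      WITH (storage = 'memory_only', persistence = 'transient');
  -- "transient" here would mean: no WAL, no checkpointing, and the index
  -- is rebuilt automatically the first time the server starts after a crash.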


Cheers,
Gavin


[HACKERS] Fast insertion indexes: why no developments

2013-10-29 Thread Leonardo Francalanci
Hi,


I don't see much interest in insert-efficient indexes. These are the ones I've 
found:

- LSM-tree (used by Cassandra and SQLite4?)
- Y-Tree 
(http://www.bossconsulting.com/oracle_dba/white_papers/DW%20in%20oracle/P23%20(ytree%20index%20structure%20for%20DWs).pdf
 )
- Fractal indexes (TokuDB, patented)

While I understand that b*trees are still the best compromise in 
insertion/search speed, disk size, concurrency, and more generally in OLTP 
workloads, they are useless when it comes to inserting random (not ordered) 
values into big tables (50M rows).

I would like to know if the lack of development in this area (not only in 
Postgresql, but in databases in general) is due to:

1) complex implementation
2) poor search performance
3) poor concurrency performance
4) not interesting for most users
5) something else???

I thought this was going to change due to the fast-insertion needs of 
social applications, but TokuDB seems to be the only successful player 
in the area (I don't know how much of that is due to good marketing). Most other 
DB technologies claim faster insertion speed (MongoDB and the like...) but in 
the end they rely on the old b*tree + sharding instead of using different 
indexing mechanisms (with the exception of Cassandra).


Thank you in advance

Leonardo



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-29 Thread Craig Ringer
On 10/29/2013 03:53 PM, Leonardo Francalanci wrote:
 5) something else???

Quite likely nobody has had the enthusiasm and interest to implement a
viable, quality implementation and stick with it long enough to get it
committed.

There are a great many good ideas for improvements to Pg that just don't
have the people and time behind them to make them happen.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-29 Thread Tom Lane
Craig Ringer cr...@2ndquadrant.com writes:
 On 10/29/2013 03:53 PM, Leonardo Francalanci wrote:
 5) something else???

 Quite likely nobody has had the enthusiasm and interest to implement a
 viable, quality implementation and stick with it long enough to get it
 committed.

 There are a great many good ideas for improvements to Pg that just don't
 have the people and time behind them to make them happen.

Before getting too excited about some new academic index type, it's worth
noting the sad state in which hash indexes have languished for years.
Nobody's bothered to add WAL support, let alone do any other real work
on them.  The non-btree index types that have been getting love are the
ones that offer the ability to index queries that btree can't.  I think
a new index type whose only benefit is the claim to be faster in a narrow
use-case is likely to end up like hash, not getting used enough to be
properly maintained.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-29 Thread Merlin Moncure
On Tue, Oct 29, 2013 at 2:53 AM, Leonardo Francalanci m_li...@yahoo.it wrote:
 Hi,


 I don't see much interest in insert-efficient indexes. These are the ones 
 I've found:

 - LSM-tree (used by Cassandra and SQLite4?)
 - Y-Tree 
 (http://www.bossconsulting.com/oracle_dba/white_papers/DW%20in%20oracle/P23%20(ytree%20index%20structure%20for%20DWs).pdf
  )
 - Fractal indexes (TokuDB, patented)

 While I understand that b*trees are still the best compromise in 
 insertion/search speed, disk size, concurrency, and more in general in OLTP 
 workloads, they are useless when it comes to insertion in big data tables 
 (50M rows) of random values (not ordered values).

 I would like to know if the lack of development in this area (not only in 
 Postgresql, but in databases in general) is due to:

 1) complex implementation
 2) poor search performance
 3) poor concurrency performance
 4) not interesting for most users
 5) something else???

 I thought this was going to change due to the fast-insertion speeds needs of 
 Social Applications, but only TokuDB seems to be the only successful 
 player in the area (I don't know how much of it is due to good marketing). 
 Most other DB technology claims faster insertion speed (MongoDB and the 
 like...) but in the end they rely on the old b*tree + sharding instead of 
 using different indexing mechanisms (with the exception of Cassandra).

Another point to add: I don't really see btree as a barrier to
performance for most of the problems I face.  The real barriers to
database performance are storage, contention, and query planning.
Postgres btree indexes are pretty fast, and for stuff like bulk
insertions there are some optimization techniques available (such as
sharding or create index concurrently).

Stuff I'd like to see in terms of postgres indexing:
*) faster wal logged hash index
*) composite gist/gin
*) faster gist/gin (to the extent that it's possible).

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-29 Thread Leonardo Francalanci
 Before getting too excited about some new academic index type, it's worth
 noting the sad state in which hash indexes have languished for years.
 Nobody's bothered to add WAL support, let alone do any other real work
 on them.  The non-btree index types that have been getting love are the
 ones that offer the ability to index queries that btree can't.  I think
 a new index type whose only benefit is the claim to be faster in a narrow
 use-case is likely to end up like hash, not getting used enough to be
 properly maintained.
             regards, tom lane

Aren't hash indexes in a poor state because they are not faster than btree in 
every condition?



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-29 Thread Alvaro Herrera
Leonardo Francalanci wrote:
  Before getting too excited about some new academic index type, it's worth
  noting the sad state in which hash indexes have languished for years.
  Nobody's bothered to add WAL support, let alone do any other real work
  on them.  The non-btree index types that have been getting love are the
  ones that offer the ability to index queries that btree can't.  I think
  a new index type whose only benefit is the claim to be faster in a narrow
  use-case is likely to end up like hash, not getting used enough to be
  properly maintained.
 
 Aren't hash indexes in a poor state because they are not faster than
 btree in every condition?

Chicken and egg.  Maybe they can be made faster than btrees (in some
situations) with enough tweaks, but because there are so many
outstanding problems, no one wants to do the huge amount of legwork to
even get to the point where such tweaks can be made in the first place.

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-29 Thread k...@rice.edu
On Tue, Oct 29, 2013 at 02:53:37PM +, Leonardo Francalanci wrote:
  Before getting too excited about some new academic index type, it's worth
  noting the sad state in which hash indexes have languished for years.
  Nobody's bothered to add WAL support, let alone do any other real work
  on them.  The non-btree index types that have been getting love are the
  ones that offer the ability to index queries that btree can't.  I think
  a new index type whose only benefit is the claim to be faster in a narrow
  use-case is likely to end up like hash, not getting used enough to be
  properly maintained.
              regards, tom lane
 
 Aren't hash indexes in a poor state because they are not faster than btree in 
 every condition?
 

Hi Leonardo,

If there were ONE perfect index, better in every condition, postgres would be
using it. As in everything else, each type has its strengths and weaknesses.
The hash index allows equality searches for very large key lengths using a
relatively very small index size. As has been mentioned before, we still do
not have WAL logging for hash indexes. But even so, for I/O-bound systems
hash indexes are twice as fast for searches as the btree equivalent.

Regards,
Ken


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-29 Thread Tom Lane
Leonardo Francalanci m_li...@yahoo.it writes:
 Before getting too excited about some new academic index type, it's worth
 noting the sad state in which hash indexes have languished for years.

 Aren't hash indexes in a poor state because they are not faster than btree in 
 every condition?

They should, in theory, be faster than btrees -- O(1) not O(log N) page
fetches per lookup.  In practice they don't seem to be faster, and
nobody's bothered to find out exactly why.  Again, this isn't a terribly
encouraging precedent for implementing some other index type that's
supposed to (sometimes) be faster than btrees.

None of this is meant to discourage you from trying to write an index
type if you have the time and motivation to pursue it.  Just trying to
answer your question as to why nobody's done it already.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-29 Thread Leonardo Francalanci
 Another point to add: I don't really see btree as a barrier to
 performance for most of the problems I face.  The real barriers to
 database performance are storage, contention, and query planning.

Ehm, that's true for regular OLTP stuff, which I understand is what most (95%?) 
of people use/need. But if you try to insert rows into a 50M-row table with a 
couple of indexes, btrees just can't keep up. 
Of course, you can't have it all: fast insertion into big tables, good 
concurrency, good query times... 

 Postgres btreee indexes are pretty fast and for stuff like bulk
 insertions there are some optimization techniques available (such as
 sharding or create index concurrently).


At the moment I'm relying on partitioning + creating indexes in bulk on the 
latest table (the partitioning is based on time). But that means K*log(N) 
search times (where K is the number of partitions).
That's why I gave a look at these different indexing mechanisms.
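
(For reference, a minimal sketch of that scheme; table and column names are
invented:)

  -- each 15-minute slice is a separate child table, loaded index-free
  CREATE TABLE calls_20131029_1000 () INHERITS (calls);
  COPY calls_20131029_1000 FROM '/data/slice_1000.csv' WITH (FORMAT csv);

  -- once the slice is closed, build its indexes in bulk
  CREATE INDEX ON calls_20131029_1000 (called_number);
  CREATE INDEX ON calls_20131029_1000 (calling_number);
  CREATE INDEX ON calls_20131029_1000 (imsi);
  CREATE INDEX ON calls_20131029_1000 (imei);

  -- a search on called_number must now probe one such index per partition,
  -- hence the K*log(N) cost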


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-29 Thread Leonardo Francalanci
 They should, in theory, be faster than btrees -- O(1) not O(log N) page
 fetches per lookup.  In practice they don't seem to be faster, and
 nobody's bothered to find out exactly why.  Again, this isn't a terribly
 encouraging precedent for implementing some other index type that's
 supposed to (sometimes) be faster than btrees.

Yes, I understand. Which is also why I was curious to know if the claims 
those papers (and the databases using them) make were real...

Thank you everybody for your replies.

Leonardo


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-29 Thread Claudio Freire
On Tue, Oct 29, 2013 at 1:10 PM, Peter Geoghegan p...@heroku.com wrote:

 On Tue, Oct 29, 2013 at 7:53 AM, Leonardo Francalanci m_li...@yahoo.it
 wrote:
  I don't see much interest in insert-efficient indexes.

 Presumably someone will get around to implementing a btree index
 insertion buffer one day. I think that would be a particularly
 compelling optimization for us, because we could avoid ever inserting
 index tuples that are already dead when the deferred insertion
 actually occurs.



Well, that should be relatively easy the way web mining does it (with
inverted indexes).

Have a small (presumably RAM-fitting) staging index where inserts take
place, and regularly dump it into the master index, the rationale being
that, by the time you dump it, for one it'll be more efficient to do many
inserts at once, and for two there will be lots of dead tuples you don't
even have to consider.

And when I say relatively easy, I mean it in the sense that it only needs
careful juggling of existing data structures.
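
(A rough SQL-level approximation of that idea, with invented names; a real
version would of course live inside the index access method itself:)

  -- small, RAM-resident staging table with its own indexes
  CREATE UNLOGGED TABLE calls_staging (LIKE calls INCLUDING INDEXES);
  -- (unlogged: a crash loses the not-yet-flushed rows; use a logged table
  -- if that is not acceptable)

  -- writers insert into calls_staging; every few minutes everything is
  -- moved across in one batch, so the big indexes see many insertions at
  -- once and rows that already died never reach them at all
  WITH moved AS (
      DELETE FROM calls_staging RETURNING *
  )
  INSERT INTO calls SELECT * FROM moved;

  -- until the next flush, readers have to look in both places
  SELECT * FROM calls         WHERE called_number = '5550100'
  UNION ALL
  SELECT * FROM calls_staging WHERE called_number = '5550100';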


Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-29 Thread Merlin Moncure
On Tue, Oct 29, 2013 at 10:49 AM, Leonardo Francalanci m_li...@yahoo.it wrote:
 Another point to add: I don't really see btree as a barrier to
 performance for most of the problems I face.  The real barriers to
 database performance are storage, contention, and query planning.

 Ehm that's true for regular OLTP stuff, which I understand is what most 
 (95%?) of people use/need. But if you try to insert rows into a 50M table 
 with a couple of indexes, btrees just can't keep up.
 Of course, you can't have it all: fast at big table insertion, good 
 contention, good query times...

 Postgres btreee indexes are pretty fast and for stuff like bulk
 insertions there are some optimization techniques available (such as
 sharding or create index concurrently).


 At the moment I'm relying on partitioning + creating indexes in bulk on 
 latest table (the partitioning is based on time). But that means K*log(N) 
 search times (where K is the number of partitions).
 That's why I gave a look at these different indexing mechanisms.

I bet you've mis-diagnosed the problem.  Btrees don't have a problem
keeping up with 50m records; your problem is that after a certain
point your page cache can't keep up with the pseudo-random i/o
patterns and you start seeing faults to storage.  Disk storage is
several orders of magnitude slower than memory and thus performance
collapses.  This has nothing to do with the btree algorithm except to the
extent it affects i/o patterns.

With the advances in storage over the last several years, such that
commodity-priced SSDs are now available, I think a lot of the assumptions
underlying these trade-offs will change.

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-29 Thread Peter Geoghegan
On Tue, Oct 29, 2013 at 7:53 AM, Leonardo Francalanci m_li...@yahoo.it wrote:
 I don't see much interest in insert-efficient indexes.

Presumably someone will get around to implementing a btree index
insertion buffer one day. I think that would be a particularly
compelling optimization for us, because we could avoid ever inserting
index tuples that are already dead when the deferred insertion
actually occurs.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-29 Thread Leonardo Francalanci
 I bet you've mis-diagnosed the problem.  Btrees don't have a problem
 keeping up with 50m records; you're problem is that after a certain
 point your page cache can't keep up with the pseudo-random i/o
 patterns and you start seeing faults to storage.
 [...]   This has nothing to do the btree algorithm except to the
 extent it affects i/o patterns.


Of course; that's why those different index types aim to use more sequential 
than random writes.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-29 Thread Jeff Janes
On Tue, Oct 29, 2013 at 8:16 AM, Tom Lane t...@sss.pgh.pa.us wrote:

 Leonardo Francalanci m_li...@yahoo.it writes:
  Before getting too excited about some new academic index type, it's worth
  noting the sad state in which hash indexes have languished for years.

  Aren't hash indexes in a poor state because they are not faster than
 btree in every condition?

 They should, in theory, be faster than btrees -- O(1) not O(log N) page
 fetches per lookup.


However, all but one or two of those page fetches are almost surely cached,
so if the problem is IO then the benefits are not likely to be seen.



 In practice they don't seem to be faster, and
 nobody's bothered to find out exactly why.


We know why, more or less.  Hash indexes use lmgr locks to protect against
bucket splits conflicting with ordinary operations, and that destroys
performance even in isolation, and destroys it even more in concurrent
situations.

Robert removed the lmgr lock on the meta page by using a retry loop with
lightweight locks.  I've outlined how to remove the heavyweight lock on the
bucket page as well, but it would require an on-disk change to the index so
that each page knows how far the bucket it is in has been split, and it
also might abuse the intention of lightweight locks a bit.  But I'm
reluctant to put much time into that without there being any prospects of
solving the problem of how to WAL bucket splits when buckets can have an
unbounded number of overflow pages.

(Once each page knows its own split level, we could also remove the need
for even a light-weight lock on the metapage for most operations by
stuffing some of the key info from that into the relcache.)

Cheers,

Jeff


Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-29 Thread Tom Lane
Jeff Janes jeff.ja...@gmail.com writes:
 Robert removed the lmgr lock on the meta page by using a retry loop with
 lightweight locks.  I've outlined how to remove the heavyweight lock on the
 bucket page as well, but it would require an on-disk change to the index so
 that each page knows how far the bucket it is in has been split, and it
 also might abuse the intention of lightweight locks a bit.

FWIW, I don't think that on-disk changes to hash indexes would be a
showstopper problem at this point.  We could force people to reindex them
by means of changing the index version number on the metapage.  The
reindex downtime would be annoying for production situations --- but given
the lack of WAL support, who'd be using one in production anyway?

 But I'm
 reluctant to put much time into that without there being any prospects of
 solving the problem of how to WAL bucket splits when buckets can have an
 unbounded number of overflow pages.

Agreed, if we don't see how to implement WAL logging then it's improbable
they'll ever get to production quality :-(.

ISTM the issue here is that we'd need to acknowledge incompletely-split
buckets as a valid state, no?  But that could be a good thing anyway,
if it'd mean that we don't have to completely lock a bucket while
splitting it.  All the other index types have comparable situations
where a maintenance operation might be only partly done.

Not that I'm volunteering to put any time into this myself.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-29 Thread Simon Riggs
On 29 October 2013 07:53, Leonardo Francalanci m_li...@yahoo.it wrote:

 I don't see much interest in insert-efficient indexes.

Hmm, you realise Alvaro is working on MinMax indexes in this release?
They are very efficient with regard to index inserts and specially
designed for use on large tables.

Prior work by Heikki on Grouped Item Tuples was a way of reducing the
size of indexes, yet still allowing uniqueness checks. That is
implemented in SQLServer already and is very useful.

Your comment about the lack of development in indexes seems counter to
the literature that I've seen. The main problem is people keep
patenting things, making it fairly difficult for everyone.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-29 Thread Leonardo Francalanci
 Hmm, you realise Alvaro is working on MinMax indexes in this release?

 They are very efficient with regard to index inserts and specially
 designed for use on large tables.
 
 Prior work by Heikki on Grouped Item Tuples was a way of reducing the
 size of indexes, yet still allowing uniqueness checks. That is
 implemented in SQLServer already and is very useful.

Ah! Didn't know that!

 Your comment about the lack of development in indexes seems counter to
 the literature that I've seen. The main problem is people keep
 patenting things, making it fairly difficult for everyone.

Mmh, maybe I wasn't clear: I meant the lack of development (maybe I should have 
said implementation?), in postgresql and in other SQL databases, of the 
fast-insertion indexes you can find in the literature.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers