Re: [GENERAL] Postgres as In-Memory Database?

2014-04-07 Thread Andrew Sullivan
On Tue, Apr 08, 2014 at 12:46:37AM +0200, Stefan Keller wrote:
> Hi Andrew
> 
> 2014-04-07 23:37 GMT+02:00 Andrew Sullivan wrote:
> > (1) this has been discussed many times in the past (...)
> 
> Can you point me to one of these discussions?
> 
> Actually, I browsed once again the mailing list and this is one of the few
> posts I found:
> "In-Memory Columnar Store" 9.12.13 by knizhnik.

I think you have to go back in time further than that.  I recall
in-memory table pinning being a recurrent topic during the 8.x series.
I also seem to recall it being mostly on the hackers list.

> I said that I'd like to discuss things before I code.

Ok.  I think in the past what has been successful is some basic design
combined with a POC or some such, generally discussed on -hackers
since that's where all the people who really know the back end hang
out.  It seems to me that most of the "built in replication" stuff
that ended up happening worked that way, and that seems to be a roughly
similar amount of work to this, but I haven't thought about it too much.

> To me it's unclear why the design of Postgres should prevent the
> implementation of "in-memory tables", e.g. as foreign data wrappers (see
> e.g. the white papers for SQL Server mentioned before).

I don't think it does.  But new code in the back end isn't free: it
presents a future maintenance burden that others may not be willing to
pay.  These things always have to be traded off.

Best regards,

A

-- 
Andrew Sullivan
a...@crankycanuck.ca


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Postgres as In-Memory Database?

2014-04-07 Thread Stefan Keller
Hi Andrew

2014-04-07 23:37 GMT+02:00 Andrew Sullivan wrote:
> (1) this has been discussed many times in the past (...)

Can you point me to one of these discussions?

Actually, I browsed once again the mailing list and this is one of the few
posts I found:
"In-Memory Columnar Store" 9.12.13 by knizhnik.

> (2) nobody seems to be saying, "I have $n to spend on this effort and
> $thesepatches to contribute towards this end along with $thisdesign,"
> but instead to be saying, "It'd be nice if someone else did this work;"

I can't see that anybody suggested that. On the contrary:
I said that I'd like to discuss things before I code.

> (3) there _are_ several in-memory-only databases on the
> market, including free-software ones, so it isn't clear what Postgres
> would contribute, especially since its basic design isn't obviously
> amenable to this sort of use.

To me it's unclear why the design of Postgres should prevent the
implementation of "in-memory tables", e.g. as foreign data wrappers (see
e.g. the white papers for SQL Server mentioned before).

Regards,
Stefan


2014-04-07 23:37 GMT+02:00 Andrew Sullivan :

> On Mon, Apr 07, 2014 at 10:43:58PM +0200, Stefan Keller wrote:
> > running out-of-memory [1][2] - i.e. pretty much what has been discussed
> > here - although little bit reluctantly :-)
>
> It is just possible that some of the reluctance is because (1) this
> has been discussed many times in the past, partly with the arguments
> you've already seen in this thread, and with much the same results;
> (2) nobody seems to be saying, "I have $n to spend on this effort and
> $thesepatches to contribute towards this end along with $thisdesign,"
> but instead to be saying, "It'd be nice if someone else did this
> work;" and (3) there _are_ several in-memory-only databases on the
> market, including free-software ones, so it isn't clear what Postgres
> would contribute, especially since its basic design isn't obviously
> amenable to this sort of use.
>
> Best regards,
>
> A
>
> --
> Andrew Sullivan
> a...@crankycanuck.ca
>
>
>


Re: [GENERAL] Postgres as In-Memory Database?

2014-04-07 Thread Andrew Sullivan
On Mon, Apr 07, 2014 at 10:43:58PM +0200, Stefan Keller wrote:
> running out-of-memory [1][2] - i.e. pretty much what has been discussed
> here - although little bit reluctantly :-)

It is just possible that some of the reluctance is because (1) this
has been discussed many times in the past, partly with the arguments
you've already seen in this thread, and with much the same results;
(2) nobody seems to be saying, "I have $n to spend on this effort and
$thesepatches to contribute towards this end along with $thisdesign,"
but instead to be saying, "It'd be nice if someone else did this
work;" and (3) there _are_ several in-memory-only databases on the
market, including free-software ones, so it isn't clear what Postgres
would contribute, especially since its basic design isn't obviously
amenable to this sort of use.

Best regards,

A

-- 
Andrew Sullivan
a...@crankycanuck.ca


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Postgres as In-Memory Database?

2014-04-07 Thread Stefan Keller
Hi Hadi, hi all

It makes sense to me to design cstore_fdw for volumes of data
larger than main memory.

Coming back to my original thread, I'd like to ponder further on what makes
in-memory databases special - and how to configure or extend Postgres to
implement that.

I found e.g. some brand-new SQL Server features called "memory-optimized
tables" which "fully reside in memory and can’t be paged out", are garbage
collected, have special indexes, persist changes using transaction log and
checkpoint streams, and are monitored to avoid running out of memory [1][2]
- i.e. pretty much what has been discussed here - although a little bit
reluctantly :-)

Yours, Stefan

[1] "SQL Server In-Memory OLTP Internals Overview for CTP2" (PDF)
http://t.co/T6zToWc6y6
[2] "SQL Server 2014 In-Memory OLTP: Memory Management for Memory-Optimized
Tables"
http://blogs.technet.com/b/dataplatforminsider/archive/2013/11/14/sql-server-2014-in-memory-oltp-memory-management-for-memory-optimized-tables.aspx



2014-04-07 17:40 GMT+02:00 Hadi Moshayedi :

> Hey Stefan,
>
> @Hadi: Can you say something about usage of cstore FDW in-memory?
>>
>>
> We designed cstore_fdw with applications in mind where the volume of data
> is much larger than main memory. In general, columnar stores usually bring
> two benefits:
>
> 1. Doing less disk I/O than row stores. We can skip reading entire columns
> or column blocks that are not related to the given query. This is effective
> when (a) volume of data is larger than main memory so OS cannot cache whole
> dataset, (b) most of our queries only require a small subset of columns to
> complete.
>
> 2. Vector processing and making better use of CPU. This usually helps most
> when data is in memory. If data is on disk and is not cached, I/O cost is
> usually higher than CPU cost, and vector processing may not help much.
>
> cstore_fdw tries to optimize for #1. Also note that because we use
> compression, more data can be cached in memory and the chance of hitting disk
> decreases.
>
> But we don't do vector processing yet, and it is not in our three-month
> timeline.
>
> If you want to be able to use more CPU cores in PostgreSQL, you can have a
> look at CitusDB [1] which is built upon PostgreSQL and distributes queries
> to use all CPU cores on one or more machines.
>
> [1] http://citusdata.com/
>
> -- Hadi
>
>


Re: [GENERAL] Postgres as In-Memory Database?

2014-04-07 Thread Hadi Moshayedi
Hey Stefan,

@Hadi: Can you say something about usage of cstore FDW in-memory?
>
>
We designed cstore_fdw with applications in mind where the volume of data
is much larger than main memory. In general, columnar stores usually bring
two benefits:

1. Doing less disk I/O than row stores. We can skip reading entire columns
or column blocks that are not related to the given query. This is effective
when (a) volume of data is larger than main memory so OS cannot cache whole
dataset, (b) most of our queries only require a small subset of columns to
complete.

2. Vector processing and making better use of CPU. This usually helps most
when data is in memory. If data is on disk and is not cached, I/O cost is
usually higher than CPU cost, and vector processing may not help much.

cstore_fdw tries to optimize for #1. Also note that because we use
compression, more data can be cached in memory and the chance of hitting disk
decreases.
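The row-store vs. column-store access pattern described above can be
sketched in a few lines of Python (plain lists standing in for on-disk
blocks; this is only an illustration of the general idea, not cstore_fdw's
actual storage format):

```python
# Toy row store: each record is stored together, so a query over one
# column still touches every field of every row.
rows = [
    {"id": 1, "name": "alice", "balance": 100},
    {"id": 2, "name": "bob",   "balance": 250},
    {"id": 3, "name": "carol", "balance": 175},
]
total_row_store = sum(r["balance"] for r in rows)  # scans whole records

# Toy column store: each column is stored contiguously, so the same
# query reads only the "balance" column and skips the others entirely.
columns = {
    "id":      [1, 2, 3],
    "name":    ["alice", "bob", "carol"],
    "balance": [100, 250, 175],
}
total_col_store = sum(columns["balance"])  # touches one column only

assert total_row_store == total_col_store == 525
```

In a real columnar engine the untouched columns can stay on disk entirely,
and storing each column contiguously is also what makes per-column
compression effective, since all values in a block share one type.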

But we don't do vector processing yet, and it is not in our three-month
timeline.

If you want to be able to use more CPU cores in PostgreSQL, you can have a
look at CitusDB [1] which is built upon PostgreSQL and distributes queries
to use all CPU cores on one or more machines.

[1] http://citusdata.com/

-- Hadi


Re: [GENERAL] Postgres as In-Memory Database?

2014-04-07 Thread Stefan Keller
Hi,

I wrote:
> Finally the paper is mostly about column stores - nothing about
> persistence.

Regarding column stores: on 2014-04-03 18:43 GMT+02:00, Hadi wrote about the
release of a PostgreSQL columnar store called "cstore_fdw" [1]!

@Hadi: Can you say something about usage of cstore FDW in-memory?

Regards, S.

[1] http://citusdata.com/blog/76-postgresql-columnar-store-for-analytics




2014-04-02 0:32 GMT+02:00 Stefan Keller :

> Hi Yeb
>
> Thanks for the pointers.
>
> Of course disk access is not obsolete: As I said, I suppose changes are
> streamed to disk.
>
> When I mentioned "no disk access" I meant the indices of an RDBMS, which
> are designed to handle disk access - which seem to me different in
> in-memory databases.
>
> The paper you referred to comes from SAP's chief scientist, and it actually
> confirms my claim that there's no need for a primary index, since the
> primary attribute (i.e. all attributes) is already kept sorted in-memory.
>
> It also mentions an insert-only technique: "This approach has been adopted
> before in POSTGRES [21] in 1987 and was called 'time-travel'."
> I would be interested in what "time-travel" is and whether it is still used
> by Postgres.
> Finally, the paper is mostly about column stores - nothing about
> persistence. It mentions "disaster recovery" in the last section about
> future work, though.
>
> -S.
>
>
>
>
> 2014-04-01 21:57 GMT+02:00 Yeb Havinga :
>
>  On 2014-04-01 04:20, Jeff Janes wrote:
>>
>> On Sunday, March 30, 2014, Stefan Keller  wrote:
>>
>>>  Hi Jeff
>>>
>>> 2013/11/20 Jeff Janes 
>>>

  I don't know what you mean about enhancements in the buffer pool.
  For an in-memory database, there shouldn't be a buffer pool in the first
 place, as it is *all* in memory.

>>>
>>>  You are right: In-memory DBs are making buffer-pooling obsolete -
>>> except for making data persistent (see below).
>>>
>>
>>
>>  I would be very reluctant to use any database engine which considered
>> disk access obsolete.
>>
>>
>> The disk is not obsolete but something called 'anti-caching' is used:
>> http://www.vldb.org/pvldb/vol6/p1942-debrabant.pdf
>>
>>
>>
>>
>>   Are there any show cases out there?
>>>
>>
>>  What did the HANA users have to say?  Seems like they would be in the
>> best position to provide the test cases.
>>
>>
>> This paper provides some insights into the research behind HANA
>> http://www.sigmod09.org/images/sigmod1ktp-plattner.pdf
>>
>> regards
>> Yeb
>>
>>
>


Re: [GENERAL] Postgres as In-Memory Database?

2014-04-07 Thread Stefan Keller
Hi Florian

Thanks for the remark. I recently saw somebody from the "core" team (was it
at the PgCon Dev Meeting [1] or on a blog?) mention an intention to revive
it.

Yours, Stefan

[1] http://wiki.postgresql.org/wiki/PgCon_2013_Developer_Meeting



2014-04-07 8:15 GMT+02:00 Florian Weimer :

> On 04/02/2014 12:32 AM, Stefan Keller wrote:
>
>  It also mentions an insert-only technique: "This approach has been
>> adopted before in POSTGRES [21] in 1987 and was called "time-travel".
>> I would be interested what "time-travel" is and if this is still used by
>> Postgres.
>>
>
> Back in the old days, PostgreSQL never deleted any tuples.  Rows were
> deleted by writing the deletion time into a column.  As a result, you could
> go back to old data just by telling PostgreSQL to report rows which were
> visible at a given time.
>
> Obviously, this approach precluded use of PostgreSQL in many scenarios.
>  For example, you wouldn't want to use it as your web application session
> store.
>
> --
> Florian Weimer / Red Hat Product Security Team
>


Re: [GENERAL] Postgres as In-Memory Database?

2014-04-07 Thread Florian Weimer

On 04/02/2014 12:32 AM, Stefan Keller wrote:


It also mentions an insert-only technique: "This approach has been
adopted before in POSTGRES [21] in 1987 and was called "time-travel".
I would be interested what "time-travel" is and if this is still used by
Postgres.


Back in the old days, PostgreSQL never deleted any tuples.  Rows were 
deleted by writing the deletion time into a column.  As a result, you 
could go back to old data just by telling PostgreSQL to report rows 
which were visible at a given time.


Obviously, this approach precluded use of PostgreSQL in many scenarios. 
 For example, you wouldn't want to use it as your web application 
session store.
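The scheme described above can be sketched as follows. This is a
hypothetical schema for illustration only - not the historic POSTGRES
on-disk format - and SQLite (via Python's sqlite3) merely keeps the sketch
self-contained:

```python
import sqlite3

# "Time-travel" sketch: rows are never physically deleted; a DELETE just
# stamps a deletion time, and an "as of" query filters on the interval.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE accounts (
        id         INTEGER,
        balance    INTEGER,
        valid_from INTEGER NOT NULL,  -- insertion "time"
        valid_to   INTEGER            -- deletion "time"; NULL = still live
    )
""")

# t=1: insert a row; t=5: "delete" it by writing the deletion time.
conn.execute("INSERT INTO accounts VALUES (1, 100, 1, NULL)")
conn.execute("UPDATE accounts SET valid_to = 5 WHERE id = 1")

def rows_as_of(t):
    # Report the rows that were visible at time t.
    return conn.execute(
        "SELECT id, balance FROM accounts "
        "WHERE valid_from <= ? AND (valid_to IS NULL OR valid_to > ?)",
        (t, t),
    ).fetchall()

assert rows_as_of(3) == [(1, 100)]  # visible before the "deletion"
assert rows_as_of(7) == []          # no longer visible afterwards
```

The downside Florian describes follows directly: every "deleted" row stays
on disk forever, which is why this would be a poor fit for something like a
session store.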


--
Florian Weimer / Red Hat Product Security Team




Re: [GENERAL] Postgres as In-Memory Database?

2014-04-01 Thread Stefan Keller
Hi Yeb

Thanks for the pointers.

Of course disk access is not obsolete: As I said, I suppose changes are
streamed to disk.

When I mentioned "no disk access" I meant the indices of an RDBMS, which
are designed to handle disk access - which seem to me different in
in-memory databases.

The paper you referred to comes from SAP's chief scientist, and it actually
confirms my claim that there's no need for a primary index, since the
primary attribute (i.e. all attributes) is already kept sorted in-memory.

It also mentions an insert-only technique: "This approach has been adopted
before in POSTGRES [21] in 1987 and was called 'time-travel'."
I would be interested in what "time-travel" is and whether it is still used
by Postgres.
Finally, the paper is mostly about column stores - nothing about
persistence. It mentions "disaster recovery" in the last section about
future work, though.

-S.




2014-04-01 21:57 GMT+02:00 Yeb Havinga :

>  On 2014-04-01 04:20, Jeff Janes wrote:
>
> On Sunday, March 30, 2014, Stefan Keller  wrote:
>
>>  Hi Jeff
>>
>> 2013/11/20 Jeff Janes 
>>
>>>
>>>  I don't know what you mean about enhancements in the buffer pool.  For
>>> an in-memory database, there shouldn't be a buffer pool in the first place,
>>> as it is *all* in memory.
>>>
>>
>>  You are right: In-memory DBs are making buffer-pooling obsolete -
>> except for making data persistent (see below).
>>
>
>
>  I would be very reluctant to use any database engine which considered
> disk access obsolete.
>
>
> The disk is not obsolete but something called 'anti-caching' is used:
> http://www.vldb.org/pvldb/vol6/p1942-debrabant.pdf
>
>
>
>
>   Are there any show cases out there?
>>
>
>  What did the HANA users have to say?  Seems like they would be in the
> best position to provide the test cases.
>
>
> This paper provides some insights into the research behind HANA
> http://www.sigmod09.org/images/sigmod1ktp-plattner.pdf
>
> regards
> Yeb
>
>


Re: [GENERAL] Postgres as In-Memory Database?

2014-04-01 Thread Yeb Havinga

On 2014-04-01 04:20, Jeff Janes wrote:
On Sunday, March 30, 2014, Stefan Keller wrote:


Hi Jeff

2013/11/20 Jeff Janes


I don't know what you mean about enhancements in the buffer
pool.  For an in-memory database, there shouldn't be a buffer
pool in the first place, as it is *all* in memory.


You are right: In-memory DBs are making buffer-pooling obsolete -
except for making data persistent (see below).



I would be very reluctant to use any database engine which considered 
disk access obsolete.


The disk is not obsolete but something called 'anti-caching' is used:
http://www.vldb.org/pvldb/vol6/p1942-debrabant.pdf


Are there any showcases out there?


What did the HANA users have to say?  Seems like they would be in the 
best position to provide the test cases.


This paper provides some insights into the research behind HANA 
http://www.sigmod09.org/images/sigmod1ktp-plattner.pdf


regards
Yeb



Re: [GENERAL] Postgres as In-Memory Database?

2014-04-01 Thread Stefan Keller
Hi Jeff

I agree with most of your statements.

2014-04-01 4:20 GMT+02:00 Jeff Janes :

> On Sunday, March 30, 2014, Stefan Keller  wrote:
>
>> Hi Jeff
>>
>>
>> 2013/11/20 Jeff Janes 
>>
>>>
>>> I don't know what you mean about enhancements in the buffer pool.  For
>>> an in-memory database, there shouldn't be a buffer pool in the first place,
>>> as it is *all* in memory.
>>>
>>
>> You are right: In-memory DBs are making buffer-pooling obsolete - except
>> for making data persistent (see below).
>>
>
> I would be very reluctant to use any database engine which considered disk
> access obsolete.  There will always be a risk where data grows to exceed
> RAM, and where it would be inconvenient to expand RAM fast enough to
> accommodate it.  I've played those games enough with Perl and C in-memory
> systems.  You fight and squeeze to fit the data into RAM, then the data
> size grows 3% and all your work is for naught.  You can buy more RAM, if
> you have the budget, and the RAM isn't back-ordered for 3 months because
> the factory that makes it had a fire, and if more RAM fits on your
> motherboard, and
>

I think there are some use cases where you can plan ahead and foresee how
the data will grow.


>
>
>> >
>> > Do you know why it is slow?  I'd give high odds that it would be a
>> specific implementation detail in
>> > the code that is suboptimal, or maybe a design decision of PostGIS,
>> rather than some high level
>> > architectural decision of PostgreSQL.
>>
>> Referring to the application is something you can always say - but
>> shouldn't prevent enhancing Postgres.
>>
>>
>> Postgres has been enhanced.  Now we need to change osm2pgsql to take
>> advantage of them.  It defines indexes on the tables that are going to be
>> bulk loaded with COPY, which defeats some recent optimizations made to
>> COPY.  The creation of the indexes should be delayed until after the bulk
>> load is done.
>>
>> A further enhancement to Postgres would be to automatically
>> defer creation of the indexes when a table is truncated or created within a
>> transaction, so that users get the benefit of the improvement
>>
>> These enhancements to osm2pgsql seem reasonable to me. I hope
>> somebody has time to take care of it.
>>
>
> I have a fork of osm2pgsql on github which delays the index build until
> the COPY is done.  I'm not really motivated to convince anyone to merge it
> (as my interest is postgresql not osm itself), but if someone wants to pick
> it up, that is fine with me.  It helps somewhat, but it is not a
> game-changer because there are other bigger bottlenecks, at least for HDD
> based systems.
>
> One of the bigger bottlenecks is building the GIN indexes on the way table
> at the end.  Setting maintenance_work_mem to huge values helps a lot, if
> you can find a safe setting for it considering multiple index builds it
> might be doing (at that point in the load, osm2pgsql's node cache has been
> released, so there is substantial RAM to re-purpose).  It would be better
> for this use if PostgreSQL built the index by using an external sort,
> rather than iterating over the table building maintenance_work_mem sized
> chunks of red-black trees.  The problem there is that osm uses the gin
> index in an odd way (the vast majority of nodes occur in exactly one way,
> with a minority occurring in more than one), and using a disk sort might
> not be ideal for the more common use cases where GIN is used, where a given
> token usually occurs in far more than one document.  So an improvement that
> only improves osm2pgsql and degrades other uses is unlikely to be adopted.
>
> Another bottleneck is just the raw COPY into the node table.  When that is
> running against an unindexed table which was created in the same
> transaction, I see that osm2pgsql takes about 50% of a CPU to print a
> copy-stream, and postgresql uses about 50% of a CPU to parse that stream
> and insert into the table.  So they add up to about 1 CPU despite the fact
> that this is a multiple-CPU machine.  So they seem to be playing ping-pong
> with the pipe buffer when in theory they should each be able to run at
> almost full speed.  I don't know how to get them to stop playing ping-pong,
> but I have other use cases where this shows up, so a trade-off-free
> solution would be nifty.  I suspect that that is more of a kernel issue than either
> postgresql or osm2pgsql.
>
> You could do the COPY in parallel in multiple threads, but the problem
> there is you can't use the "created in same transaction" optimization to
> avoid WAL overhead.  There is no fix to this without changing PostgreSQL to
> accommodate it, but I have no clear idea how one would do that.  Importing
> a snapshot doesn't seem like it would be enough, as you can only import
> snapshots for reading, not for writing.
>
> Also, I'm not too sure how seriously to take the goal of optimizing
> osm2pgsql.  Development on it seems to be less than vigorous.  And its
> purpose is to create a database to be used, so wouldn't it make more sense
> to optimize the use, not the creation?  And if you do want to optimize the
> creation, the obvious way to do it is to create the export in a way more
> closely aligned to that need, rather than a generic export.

Re: [GENERAL] Postgres as In-Memory Database?

2014-03-31 Thread Alban Hertroys
On 01 Apr 2014, at 4:20, Jeff Janes  wrote:

> On Sunday, March 30, 2014, Stefan Keller  wrote:
> There seem to be two main things which make in-memory dbs special: 
> 1. Index: Having all data in memory there is no need for a default index. A 
> full-table scan "suddenly" becomes the default.
> 
> Surely not.  I would say that full table scans are *already* the default, 
> deviated from only if it thinks an index seems to be better.  If you don't 
> make an index, it can't seem to be better.  And I don't know of any 
> higher-level in-memory language which fails to provide a way to do efficient 
> searching into an in-memory structure, usually in the form of hash tables or 
> balanced trees.  If "let's seq scan everything as long as it is already in 
> memory" is a good idea, why would Perl, Java, Python, etc. (not to mention C 
> libraries and the source code of PostgreSQL itself) provide ways to do 
> efficient searches in memory?
> 
> The caveat here is you can't make certain constraints without an index.  In 
> theory you *could* have an unique constraint without an index to support it, 
> but if it were implemented you would probably rapidly learn that you don't 
> actually want to do that.

That’s what I thought initially reading that line as well, but that’s not 
necessarily true for an in-memory database.

The keyword here is “default index”. The reasoning is probably along the lines 
that if all your data is in-memory, then keeping it all sorted on the primary 
column(s) is relatively cheap (compared to doing so on disk). You could even 
split off the primary column(s) from the rest of the records and link back to 
them using pointers, which makes it easier to keep them in a (somewhat) 
balanced btree.

When you do that, the data effectively becomes the primary key index and would 
theoretically also be usable as the primary key constraint. So you _do_ have an 
index, but it’s not a separate one; it’s part of your data. That frees up 
memory that you do not need to preserve for an index, which is probably rather 
a gain for an in-memory database.

Apparently this works for HANA, but for an MVCC database like Postgres that’s a 
little more involved.
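The "data is the index" idea can be sketched with Python's bisect module (an
illustration only; a real engine would use something closer to the balanced
btree mentioned above, not a flat sorted array):

```python
import bisect

# Keep records sorted on the primary key, so binary search over the data
# itself replaces a separate primary-key index and enforces uniqueness.
keys = []  # sorted primary keys
rows = []  # payloads, kept aligned with keys

def insert(key, payload):
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        raise ValueError("duplicate primary key")  # constraint "for free"
    keys.insert(i, key)
    rows.insert(i, payload)

def lookup(key):
    # O(log n) binary search over the data itself -- no separate index.
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return rows[i]
    return None

insert(42, "answer")
insert(7, "lucky")
assert lookup(42) == "answer"
assert lookup(8) is None
try:
    insert(7, "dup")       # duplicate key is rejected
except ValueError:
    pass
```

Note how the uniqueness check comes "for free" from the same binary search
that serves lookups - there is no separate index structure consuming memory.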

Of course, with databases like that you want them replicated on different 
hardware and kernels to reduce data-loss risks. And that brings us closer to 
what ISTR Stonebraker is doing these days.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.





Re: [GENERAL] Postgres as In-Memory Database?

2014-03-31 Thread Jeff Janes
On Sunday, March 30, 2014, Stefan Keller  wrote:

> Hi Jeff
>
> 2013/11/20 Jeff Janes 
> 
> >
>
>>
>> I don't know what you mean about enhancements in the buffer pool.  For an
>> in-memory database, there shouldn't be a buffer pool in the first place, as
>> it is *all* in memory.
>>
>
> You are right: In-memory DBs are making buffer-pooling obsolete - except
> for making data persistent (see below).
>


I would be very reluctant to use any database engine which considered disk
access obsolete.  There will always be a risk where data grows to exceed
RAM, and where it would be inconvenient to expand RAM fast enough to
accommodate it.  I've played those games enough with Perl and C in-memory
systems.  You fight and squeeze to fit the data into RAM, then the data
size grows 3% and all your work is for naught.  You can buy more RAM, if
you have the budget, and the RAM isn't back-ordered for 3 months because
the factory that makes it had a fire, and if more RAM fits on your
motherboard, and


> >
> > Do you know why it is slow?  I'd give high odds that it would be a
> specific implementation detail in
> > the code that is suboptimal, or maybe a design decision of PostGIS,
> rather than some high level
> > architectural decision of PostgreSQL.
>
> Referring to the application is something you can always say - but
> shouldn't prevent enhancing Postgres.
>
>
> Postgres has been enhanced.  Now we need to change osm2pgsql to take
> advantage of them.  It defines indexes on the tables that are going to be
> bulk loaded with COPY, which defeats some recent optimizations made to
> COPY.  The creation of the indexes should be delayed until after the bulk
> load is done.
>
> A further enhancement to Postgres would be to automatically defer
> creation of the indexes when a table is truncated or created within a
> transaction, so that users get the benefit of the improvement
>
> These enhancements to osm2pgsql seem reasonable to me. I hope
> somebody has time to take care of it.
>

I have a fork of osm2pgsql on github which delays the index build until the
COPY is done.  I'm not really motivated to convince anyone to merge it (as
my interest is postgresql not osm itself), but if someone wants to pick it
up, that is fine with me.  It helps somewhat, but it is not a game-changer
because there are other bigger bottlenecks, at least for HDD based systems.
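The load-then-index ordering described here can be sketched as follows
(SQLite via Python's sqlite3 stands in for PostgreSQL purely to keep the
sketch runnable; with Postgres the load step would be COPY, but the ordering
is the same):

```python
import sqlite3

# Bulk-load into an unindexed table first, then build the index in one
# pass afterwards -- instead of maintaining it row-by-row during the load.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (id INTEGER, lat REAL, lon REAL)")

# 1. Bulk load while the table has no indexes.
conn.executemany(
    "INSERT INTO nodes VALUES (?, ?, ?)",
    ((i, i * 0.001, -i * 0.001) for i in range(10000)),
)

# 2. Only now build the index, in a single bulk operation.
conn.execute("CREATE INDEX nodes_id_idx ON nodes (id)")

# Point lookups use the freshly built index.
count = conn.execute(
    "SELECT COUNT(*) FROM nodes WHERE id = 42"
).fetchone()
assert count == (1,)
```

The trade-off Jeff mentions for the Postgres case applies here too: any
constraint normally enforced by an index (e.g. uniqueness) is not checked
until the index is finally built.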

One of the bigger bottlenecks is building the GIN indexes on the way table
at the end.  Setting maintenance_work_mem to huge values helps a lot, if
you can find a safe setting for it considering multiple index builds it
might be doing (at that point in the load, osm2pgsql's node cache has been
released, so there is substantial RAM to re-purpose).  It would be better
for this use if PostgreSQL built the index by using an external sort,
rather than iterating over the table building maintenance_work_mem sized
chunks of red-black trees.  The problem there is that osm uses the gin
index in an odd way (the vast majority of nodes occur in exactly one way,
with a minority occurring in more than one), and using a disk sort might
not be ideal for the more common use cases where GIN is used, where a given
token usually occurs in far more than one document.  So an improvement that
only improves osm2pgsql and degrades other uses is unlikely to be adopted.

Another bottleneck is just the raw COPY into the node table.  When that is
running against an unindexed table which was created in the same
transaction, I see that osm2pgsql takes about 50% of a CPU to print a
copy-stream, and postgresql uses about 50% of a CPU to parse that stream
and insert into the table.  So they add up to about 1 CPU despite the fact
that this is a multiple-CPU machine.  So they seem to be playing ping-pong
with the pipe buffer when in theory they should each be able to run at
almost full speed.  I don't know how to get them to stop playing ping-pong,
but I have other use cases where this shows up, so a trade-off-free solution would be
nifty.  I suspect that that is more of a kernel issue than either
postgresql or osm2pgsql.

You could do the COPY in parallel in multiple threads, but the problem
there is you can't use the "created in same transaction" optimization to
avoid WAL overhead.  There is no fix to this without changing PostgreSQL to
accommodate it, but I have no clear idea how one would do that.  Importing
a snapshot doesn't seem like it would be enough, as you can only import
snapshots for reading, not for writing.

Also, I'm not too sure how seriously to take the goal of optimizing
osm2pgsql.  Development on it seems to be less than vigorous.  And its
purpose is to create a database to be used, so wouldn't it make more sense
to optimize the use, not the creation?  And if you do want to optimize the
creation, the obvious way to do it is to create the export in a way more
closely aligned to that need, rather than a generic export.


> In the meantime I discussed with HANA users and thought about what makes
> in-memory dbs special and how to configure Postgres to be an in-memory db.

Re: [GENERAL] Postgres as In-Memory Database?

2014-03-30 Thread Stefan Keller
Hi Jeff

2013/11/20 Jeff Janes 

> On Mon, Nov 18, 2013 at 5:39 PM, Stefan Keller  wrote:
>
>> Hi Jeff and Martin
>>
>> On 18. November 2013 17:44 Jeff Janes  wrote:
>> > I rather doubt that.  All the bottlenecks I know about for well cached
>> read-only workloads are around
>> > locking for in-memory concurrency protection, and have little or
>> nothing to do with secondary storage.
>>
>> Interesting point. But I think this is only partially the case - as
>> Stonebraker asserts [1]. While I don't see how to speed-up locking (and
>> latching), AFAIK there is quite some room for enhancement in buffer pooling
>> (see also [2]). Especially in GIS environments there are heavy calculations
>> and random access operations - so buffer pool will play a role.
>>
>
> I don't know what you mean about enhancements in the buffer pool.  For an
> in-memory database, there shouldn't be a buffer pool in the first place, as
> it is *all* in memory.
>

You are right: In-memory DBs are making buffer-pooling obsolete - except
for making data persistent (see below).


> But anyway, it seems to me like the imported database size for the
> OpenStreetMap is going to be about 500GB (extrapolating from Antarctica,
> the only file I could download within a reasonable time), and none of the
> servers listed in the benchmark had anywhere near that amount of memory, so
> that has little chance of working as an in-memory database.
>
> Improvements made for just in-memory situations suddenly become worthless
> if your data grows 10% larger and no longer fits in memory, while
> improvements that work for everyone work for everyone.  There is a place
> for both, but it is not surprising that a project to make general-purpose
> tools spends more time on the latter than the former.
>
> To Martin: Stonebraker explicitly supports my hypothesis that in-memory
>> databases become prevalent in the future and that "elephants" will be
>> challenged if they don't adapt to new architectures, like in-memory and
>> column stores.
>>
>
> This would be more persuasive if there were impressive osm2vertica or
> osm2oracle benchmarking results to show.  Otherwise the claims just look
> like commercial marketing material to me.  I'm not saying there are no
> improvements to be made, but the way to make them is to figure out where
> the bottlenecks are, not read other people's advertisements and chase them.
>
> The specific use case here is a PostGIS query of an OpenStreetMap data of
>> the whole world (see [3]).
>>
>> On 2013/11/18 Jeff Janes  wrote:
>>
>>> >> On Sun, Nov 17, 2013 at 4:02 PM, Stefan Keller 
>>  wrote:
>> >> BTW: Having said (to Martijn) that using Postgres is probably more
>> efficient than programming an in-memory
>> > database in a decent language: OpenStreetMap has a very, very large
>> Node table which is heavily
>> > used by other tables (like ways) - and becomes rather slow in Postgres.
>> >
>> > Do you know why it is slow?  I'd give high odds that it would be a
>> specific implementation detail in
>> > the code that is suboptimal, or maybe a design decision of PostGIS,
>> rather than some high level
>> > architectural decision of PostgreSQL.
>>
>> Referring to the application is something you can always say - but it
>> shouldn't prevent enhancing Postgres.
>>
>
> Postgres has been enhanced.  Now we need to change osm2pgsql to take
> advantage of these changes.  It defines indexes on the tables that are going to be
> bulk loaded with COPY, which defeats some recent optimizations made to
> COPY.  The creation of the indexes should be delayed until after the bulk
> load is done.
>
> A further enhancement to Postgres would be to automatically defer
> creation of the indexes when a table is truncated or created within a
> transaction, so that users get the benefit of the improvement even without
> taking special steps.  But that would be quite a bit of work, and would
> probably also change user-visible behavior, in that violations of unique
> constraints would not be detected until later in the process.
>

These enhancements to osm2pgsql seem reasonable to me. I hope
somebody has time to take care of them.

In the meantime I discussed this with HANA users and thought about what makes
in-memory DBs special and how to configure Postgres to be an in-memory DB.

There seem to be three main things which make in-memory DBs special:
1. Indexing: With all data in memory there is no need for a default index; a
full-table scan "suddenly" becomes the default access path.
2. Persistence: All updates/changes to the DB are streamed to disk.
3. Growth: An increase in data simply needs to be compensated with more
memory (since memory has become cheap).

AFAIK one can configure Postgres for all these properties: one can force
the use of full-table scans(?) and replicate to disk(?).
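Concretely, the non-durability settings quoted later in this thread (and collected in the manual page cited there) amount to a postgresql.conf fragment like the following - a sketch, only for data you can afford to lose on a crash:

```ini
# Non-durable configuration: trades crash-safety for speed.
fsync = off                   # never force WAL writes to physical storage
synchronous_commit = off      # commits do not wait for the WAL flush
full_page_writes = off        # reduces WAL volume
bgwriter_lru_maxpages = 0     # disable the background writer
```

Combined with a data directory on a RAM disk (as the manual page suggests), this removes most disk waits; it does not, by itself, change the executor's plans.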

What do you think about this? Are there any show cases out there?

-S.


Re: [GENERAL] Postgres as In-Memory Database?

2013-11-20 Thread Jeff Janes
On Mon, Nov 18, 2013 at 5:39 PM, Stefan Keller  wrote:

> Hi Jeff and Martin
>
> On 18. November 2013 17:44 Jeff Janes  wrote:
> > I rather doubt that.  All the bottlenecks I know about for well cached
> read-only workloads are around
> > locking for in-memory concurrency protection, and have little or nothing
> to do with secondary storage.
>
> Interesting point. But I think this is only partially the case - as
> Stonebraker asserts [1]. While I don't see how to speed-up locking (and
> latching), AFAIK there is quite some room for enhancement in buffer pooling
> (see also [2]). Especially in GIS environments there are heavy calculations
> and random access operations - so buffer pool will play a role.
>

I don't know what you mean about enhancements in the buffer pool.  For an
in-memory database, there shouldn't be a buffer pool in the first place, as
it is *all* in memory.  But anyway, it seems to me like the imported
database size for the OpenStreetMap is going to be about 500GB
(extrapolating from Antarctica, the only file I could download within a
reasonable time), and none of the servers listed in the benchmark had
anywhere near that amount of memory, so that has little chance of working
as an in-memory database.

Improvements made for just in-memory situations suddenly become worthless
if your data grows 10% larger and no longer fits in memory, while
improvements that work for everyone work for everyone.  There is a place
for both, but it is not surprising that a project to make general-purpose
tools spends more time on the latter than the former.

To Martin: Stonebraker explicitly supports my hypothesis that in-memory
> databases become prevalent in the future and that "elephants" will be
> challenged if they don't adapt to new architectures, like in-memory and
> column stores.
>

This would be more persuasive if there were impressive osm2vertica or
osm2oracle benchmarking results to show.  Otherwise the claims just look
like commercial marketing material to me.  I'm not saying there are no
improvements to be made, but the way to make them is to figure out where
the bottlenecks are, not read other people's advertisements and chase them.



The specific use case here is a PostGIS query of an OpenStreetMap data of
> the whole world (see [3]).
>
> On 2013/11/18 Jeff Janes  wrote:
>
>> >> On Sun, Nov 17, 2013 at 4:02 PM, Stefan Keller 
>  wrote:
> >> BTW: Having said (to Martijn) that using Postgres is probably more
> efficient than programming an in-memory
> > database in a decent language: OpenStreetMap has a very, very large Node
> table which is heavily
> > used by other tables (like ways) - and becomes rather slow in Postgres.
> >
> > Do you know why it is slow?  I'd give high odds that it would be a
> specific implementation detail in
> > the code that is suboptimal, or maybe a design decision of PostGIS,
> rather than some high level
> > architectural decision of PostgreSQL.
>
> Referring to the application is something you can always say - but it
> shouldn't prevent enhancing Postgres.
>

Postgres has been enhanced.  Now we need to change osm2pgsql to take
advantage of these changes.  It defines indexes on the tables that are going to be
bulk loaded with COPY, which defeats some recent optimizations made to
COPY.  The creation of the indexes should be delayed until after the bulk
load is done.

A further enhancement to Postgres would be to automatically defer
creation of the indexes when a table is truncated or created within a
transaction, so that users get the benefit of the improvement even without
taking special steps.  But that would be quite a bit of work, and would
probably also change user-visible behavior, in that violations of unique
constraints would not be detected until later in the process.
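The load-then-index ordering Jeff describes is easy to try outside Postgres as well. A minimal sketch using Python's bundled SQLite purely to illustrate the principle (the table layout, index name, and row count here are made up; osm2pgsql/COPY specifics differ):

```python
import random
import sqlite3
import time

def load(index_first: bool, rows: int = 100_000) -> float:
    """Time a bulk insert, building the index either before or after."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE nodes (id INTEGER, lat REAL, lon REAL)")
    data = [(random.randrange(10**9), random.random(), random.random())
            for _ in range(rows)]
    if index_first:
        con.execute("CREATE INDEX nodes_id ON nodes (id)")
    start = time.perf_counter()
    con.executemany("INSERT INTO nodes VALUES (?, ?, ?)", data)
    if not index_first:
        # Deferred: one bulk index build instead of per-row maintenance.
        con.execute("CREATE INDEX nodes_id ON nodes (id)")
    con.commit()
    elapsed = time.perf_counter() - start
    con.close()
    return elapsed

t_before = load(index_first=True)
t_after = load(index_first=False)
print(f"index first: {t_before:.3f}s  index deferred: {t_after:.3f}s")
```

On most machines the deferred build comes out ahead; the same reasoning is why osm2pgsql should create its indexes only after the COPY phase.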

Cheers,

Jeff


Re: [GENERAL] Postgres as In-Memory Database?

2013-11-20 Thread Stefan Keller
Hi Bruce

2013/11/20 Bruce Momjian 

> On Sun, Nov 17, 2013 at 09:00:05PM +0900, Michael Paquier wrote:
> > On Sun, Nov 17, 2013 at 8:25 PM, Stefan Keller 
> wrote:
> > > How can Postgres be used and configured as an In-Memory Database?
> > >
> > > Does anybody know of thoughts or presentations about this "NoSQL
> feature" -
> > > beyond e.g. "Perspectives on NoSQL" from Gavin Roy at PGCon 2010)?
> > >
> > > Given, say 128 GB memory or more, and (read-mostly) data that fits
> into
> > > this, what are the hints to optimize Postgres (postgresql.conf etc.)?
> > In this case as you are trading system safety (system will not be
> > crash-safe) for performance... The following parameters would be
> > suited:
> > - Improve performance by reducing the amount of data flushed:
> > fsync = off
> > synchronous_commit=off
> > - Reduce the size of WALs:
> > full_page_writes = off
> > - Disable the background writer:
> > bgwriter_lru_maxpages = 0
> > Regards,
>
> FYI, the Postgres manual covers non-durability settings:
>
> http://www.postgresql.org/docs/9.3/static/non-durability.html


Thanks for the hint. On 17 November 2013 22:26 I referred to the same
documentation page.
Aside from config params, it suggests using a memory-backed file system (i.e.
a RAM disk).
But what I am interested in is how Postgres can be functionally enhanced
given that the dataset fits into (some big) memory!
Being aware and assured that the dataset is in memory, doesn't this lead to a
significant speed-up, as Stonebraker, Oracle and SAP affirm?

-S.


Re: [GENERAL] Postgres as In-Memory Database?

2013-11-20 Thread Jeff Janes
On Tue, Nov 19, 2013 at 7:41 PM, Edson Richter wrote:
>
>
>  Ok, I still have one doubt (I'm learning a lot, tkx!):
>
> What happens, then, if data has been committed (so it is in xlog), but it
> is not in data pages yet, and it doesn't fit in memory buffers anymore: how
> would PostgreSQL query data without having to wait for a checkpoint to
> happen and the data to be available in data pages?
>

PostgreSQL never just drops a dirty page from the buffers, unless the
object it is part of was dropped (or the system crashes, in which case it
has to go through recovery).  Rather it would first evict the dirty page by
writing it to the kernel (which in turn will write it to disk, eventually),
at which point it is the kernel's responsibility to send the correct data
back upon request when it is later needed again--either by fetching it from
its own cache if it is still there or by reading it from disk.
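The eviction rule described above - a dirty page is always written out, never simply dropped - can be sketched as a toy LRU buffer pool (hypothetical names; the real shared-buffers code uses a clock-sweep, buffer pins, and the WAL-before-data rule, all omitted here):

```python
from collections import OrderedDict

class BufferPool:
    """Toy LRU buffer pool: a dirty page is written back before
    eviction, never simply dropped."""
    def __init__(self, capacity, storage):
        self.capacity = capacity
        self.storage = storage            # stands in for the kernel/disk
        self.pages = OrderedDict()        # page_id -> (data, dirty_flag)

    def read(self, page_id):
        if page_id in self.pages:
            self.pages.move_to_end(page_id)            # LRU touch
            return self.pages[page_id][0]
        data = self.storage.get(page_id, b"")          # fetch from "disk"
        self._install(page_id, data, dirty=False)
        return data

    def write(self, page_id, data):
        self._install(page_id, data, dirty=True)

    def _install(self, page_id, data, dirty):
        if page_id in self.pages:
            dirty = dirty or self.pages[page_id][1]    # keep existing dirt
        self.pages[page_id] = (data, dirty)
        self.pages.move_to_end(page_id)
        while len(self.pages) > self.capacity:
            victim, (vdata, vdirty) = self.pages.popitem(last=False)
            if vdirty:
                self.storage[victim] = vdata           # write back first

disk = {}
pool = BufferPool(capacity=2, storage=disk)
pool.write("A", b"a1")
pool.write("B", b"b1")
pool.read("C")       # pool is full: evicts "A", flushing it to "disk"
print(disk)          # → {'A': b'a1'}
```

Once the page is back on "disk", a later read simply fetches it again, which is the kernel's-responsibility hand-off Jeff describes.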

Cheers,

Jeff


Re: [GENERAL] Postgres as In-Memory Database?

2013-11-20 Thread Bruce Momjian
On Sun, Nov 17, 2013 at 09:00:05PM +0900, Michael Paquier wrote:
> On Sun, Nov 17, 2013 at 8:25 PM, Stefan Keller  wrote:
> > How can Postgres be used and configured as an In-Memory Database?
> >
> > Does anybody know of thoughts or presentations about this "NoSQL feature" -
> > beyond e.g. "Perspectives on NoSQL" from Gavin Roy at PGCon 2010)?
> >
> > Given, say 128 GB memory or more, and (read-mostly) data that fits into
> > this, what are the hints to optimize Postgres (postgresql.conf etc.)?
> In this case as you are trading system safety (system will not be
> crash-safe) for performance... The following parameters would be
> suited:
> - Improve performance by reducing the amount of data flushed:
> fsync = off
> synchronous_commit=off
> - Reduce the size of WALs:
> full_page_writes = off
> - Disable the background writer:
> bgwriter_lru_maxpages = 0
> Regards,

FYI, the Postgres manual covers non-durability settings:

http://www.postgresql.org/docs/9.3/static/non-durability.html

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Postgres as In-Memory Database?

2013-11-19 Thread Edson Richter

Em 20/11/2013 01:30, Jeff Janes escreveu:



On Tuesday, November 19, 2013, Edson Richter wrote:

Em 19/11/2013 22:29, Jeff Janes escreveu:

On Sun, Nov 17, 2013 at 4:46 PM, Edson Richter
> wrote:

Yes, those optimizations I was talking about: having database
server store transaction log in high speed solid state disks
and consider it done while background thread will update data
in slower disks...

There is no reason to wait for fsync in slow disks to
guarantee consistency... If database server crashes, then it
just need to "redo" log transactions from fast disk into
slower data storage and database server is ready to go (I
think this is Sybase/MS SQL strategy for years).



Using a nonvolatile write cache for pg_xlog is certainly possible
and often done with PostgreSQL.  It is not important that the
nonvolatile write cache is fronting for SSD, fronting for HDD is
fine as the write cache turns the xlog into pure sequential
writes and HDD should not have a problem keeping up.

Cheers,

Jeff

Hum... I agree about the technology (SSD vs HDD, etc.) - but maybe I
misunderstood: I have read that to keep data safe at all times I
must use fsync, and as a result every transaction must wait for data
to be written to disk before returning success.


A transaction must wait for the *xlog* to fsynced to "disk", but 
non-volatile write cache counts as disk.  It does not need to wait for 
the ordinary data files to be fsynced.  Checkpoints do need to wait 
for the ordinary data files to be fsynced, but the checkpoint process 
is a background process and it can wait for that without impeding user 
processes.


If the checkpointer falls far enough behind, then things do start to 
fall apart, but I think that this is true of any system. So you can't 
just get a BBU for the xlog and ignore all other IO 
entirely--eventually the other data does need to reach disk, and if it 
gets dirtied faster than it gets cleaned for a prolonged period then 
things will freeze up.


By using the approach I've described you will have fsync (and data
will be 100% safe), but a transaction is considered a success once
written to the transaction log, which is purely sequential (and even
pre-allocated space, without needing to ask the OS for new files or new
space) - and also no need to wait for slow operations to write
data in data pages.

Am I wrong?


No user-facing process needs to wait for the data pages to fsync, 
unless things have really gotten fouled up.


Cheers,

Jeff

Ok, I still have one doubt (I'm learning a lot, tkx!):

What happens, then, if data has been committed (so it is in xlog), but it 
is not in data pages yet, and it doesn't fit in memory buffers anymore: 
how would PostgreSQL query data without having to wait for a checkpoint 
to happen and the data to be available in data pages?


Regards,

Edson


Re: [GENERAL] Postgres as In-Memory Database?

2013-11-19 Thread Jeff Janes
On Tuesday, November 19, 2013, Edson Richter wrote:

>  Em 19/11/2013 22:29, Jeff Janes escreveu:
>
>  On Sun, Nov 17, 2013 at 4:46 PM, Edson Richter wrote:
>
>
>>  Yes, those optimizations I was talking about: having database server
>> store transaction log in high speed solid state disks and consider it done
>> while background thread will update data in slower disks...
>>
>> There is no reason to wait for fsync in slow disks to guarantee
>> consistency... If database server crashes, then it just need to "redo" log
>> transactions from fast disk into slower data storage and database server is
>> ready to go (I think this is Sybase/MS SQL strategy for years).
>>
>
>
>  Using a nonvolatile write cache for pg_xlog is certainly possible and
> often done with PostgreSQL.  It is not important that the nonvolatile write
> cache is fronting for SSD, fronting for HDD is fine as the write cache
> turns the xlog into pure sequential writes and HDD should not have a
> problem keeping up.
>
>  Cheers,
>
>  Jeff
>
> Hum... I agree about the technology (SSD vs HDD, etc.) - but maybe I
> misunderstood: I have read that to keep data safe at all times I must use
> fsync, and as a result every transaction must wait for data to be written to
> disk before returning success.
>

A transaction must wait for the *xlog* to fsynced to "disk", but
non-volatile write cache counts as disk.  It does not need to wait for the
ordinary data files to be fsynced.  Checkpoints do need to wait for the
ordinary data files to be fsynced, but the checkpoint process is a
background process and it can wait for that without impeding user processes.

If the checkpointer falls far enough behind, then things do start to fall
apart, but I think that this is true of any system. So you can't just
get a BBU for the xlog and ignore all other IO entirely--eventually the
other data does need to reach disk, and if it gets dirtied faster than it
gets cleaned for a prolonged period then things will freeze up.



> By using the approach I've described you will have fsync (and data will be
> 100% safe), but a transaction is considered a success once written to the
> transaction log, which is purely sequential (and even pre-allocated space,
> without needing to ask the OS for new files or new space) - and also no need
> to wait for slow operations to write data in data pages.
>
> Am I wrong?
>

No user-facing process needs to wait for the data pages to fsync, unless
things have really gotten fouled up.
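The division of labour described above - a commit waits only for the fsync of the log, data pages reach their files later at checkpoint time, and crash recovery replays the log - can be sketched in miniature (a hypothetical toy format; the real WAL is page-oriented binary with LSNs, and a real checkpoint does not rewrite everything):

```python
import json
import os
import tempfile

class TinyWAL:
    """Toy write-ahead log: commit = append + fsync of the log only;
    data pages reach their file later, at checkpoint time."""
    def __init__(self, directory):
        self.log_path = os.path.join(directory, "xlog")
        self.data_path = os.path.join(directory, "data")
        self.buffers = {}                       # in-memory data pages

    def commit(self, key, value):
        with open(self.log_path, "a") as log:
            log.write(json.dumps([key, value]) + "\n")
            log.flush()
            os.fsync(log.fileno())              # the only fsync a commit waits on
        self.buffers[key] = value               # dirty page, not yet on disk

    def checkpoint(self):
        with open(self.data_path, "w") as data:
            json.dump(self.buffers, data)
            data.flush()
            os.fsync(data.fileno())             # data durable ...
        open(self.log_path, "w").close()        # ... only now may the log be recycled

    def recover(self):
        # After a crash: start from the last checkpoint, replay the log.
        self.buffers = {}
        if os.path.exists(self.data_path):
            with open(self.data_path) as data:
                self.buffers = json.load(data)
        if os.path.exists(self.log_path):
            with open(self.log_path) as log:
                for line in log:
                    key, value = json.loads(line)
                    self.buffers[key] = value

tmp = tempfile.mkdtemp()
db = TinyWAL(tmp)
db.commit("x", 1)
db.checkpoint()
db.commit("x", 2)          # committed: in xlog, not yet in the data file
crashed = TinyWAL(tmp)     # simulate a restart that lost all memory state
crashed.recover()
print(crashed.buffers)     # → {'x': 2}
```

Note the ordering inside `checkpoint`: the slower data file is fsynced before the log is truncated, which is the same constraint John Pierce raises about recycling WAL from fast storage.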

Cheers,

Jeff


Re: [GENERAL] Postgres as In-Memory Database?

2013-11-19 Thread Edson Richter

Em 19/11/2013 22:29, Jeff Janes escreveu:
On Sun, Nov 17, 2013 at 4:46 PM, Edson Richter wrote:


Yes, those optimizations I was talking about: having database
server store transaction log in high speed solid state disks and
consider it done while background thread will update data in
slower disks...

There is no reason to wait for fsync in slow disks to guarantee
consistency... If database server crashes, then it just need to
"redo" log transactions from fast disk into slower data storage
and database server is ready to go (I think this is Sybase/MS SQL
strategy for years).



Using a nonvolatile write cache for pg_xlog is certainly possible and 
often done with PostgreSQL.  It is not important that the nonvolatile 
write cache is fronting for SSD, fronting for HDD is fine as the write 
cache turns the xlog into pure sequential writes and HDD should not 
have a problem keeping up.


Cheers,

Jeff
Hum... I agree about the technology (SSD vs HDD, etc.) - but maybe I 
misunderstood: I have read that to keep data safe at all times I must use 
fsync, and as a result every transaction must wait for data to be written 
to disk before returning success.
By using the approach I've described you will have fsync (and data will 
be 100% safe), but a transaction is considered a success once written to the 
transaction log, which is purely sequential (and even pre-allocated space, 
without needing to ask the OS for new files or new space) - and also no need 
to wait for slow operations to write data in data pages.


Am I wrong?

Edson



Re: [GENERAL] Postgres as In-Memory Database?

2013-11-19 Thread Jeff Janes
On Sun, Nov 17, 2013 at 4:46 PM, Edson Richter wrote:


> Yes, those optimizations I was talking about: having database server store
> transaction log in high speed solid state disks and consider it done while
> background thread will update data in slower disks...
>
> There is no reason to wait for fsync in slow disks to guarantee
> consistency... If database server crashes, then it just need to "redo" log
> transactions from fast disk into slower data storage and database server is
> ready to go (I think this is Sybase/MS SQL strategy for years).
>


Using a nonvolatile write cache for pg_xlog is certainly possible and often
done with PostgreSQL.  It is not important that the nonvolatile write cache
is fronting for SSD, fronting for HDD is fine as the write cache turns the
xlog into pure sequential writes and HDD should not have a problem keeping
up.

Cheers,

Jeff


Re: [GENERAL] Postgres as In-Memory Database?

2013-11-19 Thread Stefan Keller
Dear Bricklen and Andrew

2013/11/19 bricklen 

> On Mon, Nov 18, 2013 at 11:16 PM, Stefan Keller 
>  wrote:
>
>>
>> > I don't think there's any evidence that the Postgres developers ignore
>> > useful optimisations.  What you're arguing is that the optimisation
>> > you have in mind isn't covered.
>>
>> No; my point is that I - and others like Stonebraker, Oracle and SAP etc.
>> - see room for optimization because assumptions about HW changed. To me,
>> that should be enough evidence to start thinking about enhancements.
>>
>
>
> You must not read the -hackers list often enough, there are regularly long
> discussions about changing settings and adding features to take into
> account new hardware capabilities.
> If you feel so strongly that the core developers are not scratching your
> itch, donate some code or money to fund the features you feel are missing.
>

I usually discuss things - with core devs and others - before I
code.
And coding was obviously what was needed regarding
the file_fixed_length_record_fdw.
I'm reading -hackers often and don't get a single valuable hit when
searching for "in-memory" in the postgres-* lists.
So, may we get back on track?


Re: [GENERAL] Postgres as In-Memory Database?

2013-11-19 Thread bricklen
On Mon, Nov 18, 2013 at 11:16 PM, Stefan Keller  wrote:

>
> > I don't think there's any evidence that the Postgres developers ignore
> > useful optimisations.  What you're arguing is that the optimisation
> > you have in mind isn't covered.
>
> No; my point is that I - and others like Stonebraker, Oracle and SAP etc.
> - see room for optimization because assumptions about HW changed. To me,
> that should be enough evidence to start thinking about enhancements.
>


You must not read the -hackers list often enough, there are regularly long
discussions about changing settings and adding features to take into
account new hardware capabilities.
If you feel so strongly that the core developers are not scratching your
itch, donate some code or money to fund the features you feel are missing.


Re: [GENERAL] Postgres as In-Memory Database?

2013-11-19 Thread Andrew Dunstan


On 11/17/2013 07:02 PM, Stefan Keller wrote:
2013/11/18 Andreas Brandl wrote:

> What is your use-case?

It's geospatial data from OpenStreetMap stored in a schema optimized 
for PostGIS extension (produced by osm2pgsql).


BTW: Having said (to Martijn) that using Postgres is probably more 
efficient than programming an in-memory database in a decent 
language: OpenStreetMap has a very, very large Node table which is 
heavily used by other tables (like ways) - and becomes rather slow in 
Postgres. Since it's of fixed length I'm looking at 
file_fixed_length_record_fdw extension [1][2] (which is in-memory) to 
get the best of both worlds.


--Stefan

[1] 
http://wiki.postgresql.org/wiki/Foreign_data_wrappers#file_fixed_length_record_fdw

[2] https://github.com/adunstan/file_fixed_length_record_fdw



First, please don't top-post on the PostgreSQL lists. See 



Second, what the heck makes you think that this is in any sense 
in-memory? You can process a multi-terabyte fixed length file. It's not 
held in memory.


cheers

andrew





Re: [GENERAL] Postgres as In-Memory Database?

2013-11-18 Thread Stefan Keller
Hi Andrew

You wrote:
> And indeed, given the specifics of the use
> case you're outlining, it's as much a demonstration of that evaluation
> as a repudiation of it.

Maybe my use case seems to be a special case (to me and to over a million
users of OpenStreetMap it's not).
Anyhow: That's why I'm investigating an FDW extension.

> I don't think there's any evidence that the Postgres developers ignore
> useful optimisations.  What you're arguing is that the optimisation
> you have in mind isn't covered.

No; my point is that I - and others like Stonebraker, Oracle and SAP etc. -
see room for optimization because assumptions about HW changed. To me, that
should be enough evidence to start thinking about enhancements.

Yours, S.



2013/11/19 Andrew Sullivan 

> On Tue, Nov 19, 2013 at 02:39:17AM +0100, Stefan Keller wrote:
> > Referring to the application is something you can always say - but it
> > shouldn't prevent enhancing Postgres.
>
> With respect, that sounds like a sideways version of, "You should
> optimise for $usecase".  You could be right, but I think the judgement
> of the Postgres developers has generally been that special cases are
> not the mainline case.  And indeed, given the specifics of the use
> case you're outlining, it's as much a demonstration of that evaluation
> as a repudiation of it.
>
> I don't think there's any evidence that the Postgres developers ignore
> useful optimisations.  What you're arguing is that the optimisation
> you have in mind isn't covered.  What you need is an argument that it
> is generally useful.  Otherwise, the right thing to do is get a
> specialised tool (which might be a special optimisation of the
> Postgres code).
>
> Best,
>
> A
>
> --
> Andrew Sullivan
> a...@crankycanuck.ca
>
>


Re: [GENERAL] Postgres as In-Memory Database?

2013-11-18 Thread Andrew Sullivan
On Tue, Nov 19, 2013 at 02:39:17AM +0100, Stefan Keller wrote:
> Referring to the application is something you can always say - but it
> shouldn't prevent enhancing Postgres.

With respect, that sounds like a sideways version of, "You should
optimise for $usecase".  You could be right, but I think the judgement
of the Postgres developers has generally been that special cases are
not the mainline case.  And indeed, given the specifics of the use
case you're outlining, it's as much a demonstration of that evaluation
as a repudiation of it.

I don't think there's any evidence that the Postgres developers ignore
useful optimisations.  What you're arguing is that the optimisation
you have in mind isn't covered.  What you need is an argument that it
is generally useful.  Otherwise, the right thing to do is get a
specialised tool (which might be a special optimisation of the
Postgres code).

Best,

A

-- 
Andrew Sullivan
a...@crankycanuck.ca




Re: [GENERAL] Postgres as In-Memory Database?

2013-11-18 Thread Stefan Keller
Hi Jeff and Martin

On 18. November 2013 17:44 Jeff Janes  wrote:
> I rather doubt that.  All the bottlenecks I know about for well cached
read-only workloads are around
> locking for in-memory concurrency protection, and have little or nothing
to do with secondary storage.

Interesting point. But I think this is only partially the case - as
Stonebraker asserts [1]. While I don't see how to speed-up locking (and
latching), AFAIK there is quite some room for enhancement in buffer pooling
(see also [2]). Especially in GIS environments there are heavy calculations
and random access operations - so buffer pool will play a role.

To Martin: Stonebraker explicitly supports my hypothesis that in-memory
databases become prevalent in the future and that "elephants" will be
challenged if they don't adapt to new architectures, like in-memory and
column stores.

The specific use case here is a PostGIS query of an OpenStreetMap data of
the whole world (see [3]).

On 2013/11/18 Jeff Janes  wrote:

> >> On Sun, Nov 17, 2013 at 4:02 PM, Stefan Keller 
 wrote:
>> BTW: Having said (to Martijn) that using Postgres is probably more
efficient than programming an in-memory
> database in a decent language: OpenStreetMap has a very, very large Node
table which is heavily
> used by other tables (like ways) - and becomes rather slow in Postgres.
>
> Do you know why it is slow?  I'd give high odds that it would be a
specific implementation detail in
> the code that is suboptimal, or maybe a design decision of PostGIS,
rather than some high level
> architectural decision of PostgreSQL.

Referring to the application is something you can always say - but it
shouldn't prevent enhancing Postgres.
The PostGIS extension isn't involved in this use case. It's
about handling a huge table with a bigint id and two numbers
representing lat/lon. As I said, an obvious solution is to access the
tuples as fixed-length records (which isn't a universal solution - but
exploits the fact that the data is in memory).

You can replicate this use case by trying to load the planet file into
Postgres using osm2pgsql (see [3]). The record load time is currently about
20 hours(!), I think, with 32GB RAM and SSDs.

--Stefan


[1] Michael Stonebraker: “The Traditional RDBMS Wisdom is All Wrong”:
http://blog.jooq.org/2013/08/24/mit-prof-michael-stonebraker-the-traditional-rdbms-wisdom-is-all-wrong/
[2] Oracle Database In-Memory Option - A Preview: In-Memory Acceleration
for All Applications
http://www.oracle.com/us/corporate/features/database-in-memory-option/index.html
[3] osm2pgsql benchmark:
http://wiki.openstreetmap.org/wiki/Osm2pgsql/benchmarks

2013/11/18 Jeff Janes 

> On Sun, Nov 17, 2013 at 4:02 PM, Stefan Keller  wrote:
>
>> 2013/11/18 Andreas Brandl  wrote:
>> > What is your use-case?
>>
>> It's geospatial data from OpenStreetMap stored in a schema optimized for
>> PostGIS extension (produced by osm2pgsql).
>>
>> BTW: Having said (to Martijn) that using Postgres is probably more
>> efficient than programming an in-memory database in a decent language:
>> OpenStreetMap has a very, very large Node table which is heavily used by
>> other tables (like ways) - and becomes rather slow in Postgres.
>>
>
>
> Do you know why it is slow?  I'd give high odds that it would be a
> specific implementation detail in the code that is suboptimal, or maybe a
> design decision of PostGIS, rather than some high level architectural
> decision of PostgreSQL.
>
> Cheers,
>
> Jeff
>


Re: [GENERAL] Postgres as In-Memory Database?

2013-11-18 Thread Jeff Janes
On Sun, Nov 17, 2013 at 4:02 PM, Stefan Keller  wrote:

> 2013/11/18 Andreas Brandl  wrote:
> > What is your use-case?
>
> It's geospatial data from OpenStreetMap stored in a schema optimized for
> PostGIS extension (produced by osm2pgsql).
>
> BTW: Having said (to Martijn) that using Postgres is probably more
> efficient than programming an in-memory database in a decent language:
> OpenStreetMap has a very, very large Node table which is heavily used by
> other tables (like ways) - and becomes rather slow in Postgres.
>


Do you know why it is slow?  I'd give high odds that it would be a specific
implementation detail in the code that is suboptimal, or maybe a design
decision of PostGIS, rather than some high level architectural decision of
PostgreSQL.

Cheers,

Jeff


Re: [GENERAL] Postgres as In-Memory Database?

2013-11-18 Thread Jeff Janes
On Sun, Nov 17, 2013 at 1:33 PM, Stefan Keller  wrote:

> Hi Edson,
>
> On 2013/11/17 Edson Richter  you wrote:
> > One question: would you please expand your answer and explain how would
> this adversely affect async replication?
>
> Is this a question or a hint (or both) :-)? Of course almost all
> non-durable settings [1] will delay replication.
>
> I think I have to add that the pure speed of a read-mostly database is the
> main scenario I have in mind.
> Durability, high availability and scaling out are perhaps additional or
> separate scenarios.
>

I think the main bottleneck you will run into is the client-server
architecture.  PostgreSQL does not have an embedded mode, so every interaction
has to bounce data back and forth between processes.


>
> So, to come back to my question: I think that Postgres could be even
> faster by magnitudes, if the assumption of writing to slow secondary
> storage (like disks) is removed (or replaced).
>

I rather doubt that.  All the bottlenecks I know about for well cached
read-only workloads are around locking for in-memory concurrency
protection, and have little or nothing to do with secondary storage.

Cheers,

Jeff


Re: [GENERAL] Postgres as In-Memory Database?

2013-11-17 Thread John R Pierce

On 11/17/2013 4:46 PM, Edson Richter wrote:


There is no reason to wait for fsync in slow disks to guarantee 
consistency... If database server crashes, then it just need to "redo" 
log transactions from fast disk into slower data storage and database 
server is ready to go (I think this is Sybase/MS SQL strategy for years).


you need to fsync that slower disk before you can purge the older WAL or 
redo log, whatever, from your fast storage.   this fsync can, of course, 
be quite a ways behind the current commit status.





--
john r pierce  37N 122W
somewhere on the middle of the left coast





Re: [GENERAL] Postgres as In-Memory Database?

2013-11-17 Thread Edson Richter

Em 17/11/2013 22:02, Gavin Flower escreveu:

On 18/11/13 12:53, Stefan Keller wrote:

Hi Martijn

2013/11/17 Martijn van Oosterhout wrote:

> If your dataset fits in memory then the problem is trivial: any decent
> programming language provides you with all the necessary tools to deal
> with data purely in memory.

What about Atomicity, Concurrency and about SQL query language and 
the extension mechanisms of Postgres? To me, that's not trivial.


> There are also quite a lot of databases that cover this area.

Agreed. That's what partially triggered my question. It's notably 
Oracle TimesTen, MS SQL Server 2014 (project Hekaton), (distributed) 
"MySQL Cluster", SAP HANA or SQLite 3. To me this rather confirms 
that an architecture and/or configuration for in-memory use could be 
an issue in Postgres as well.


The current architecture of Postgres assumes that memory is 
expensive and optimizes for avoiding disk I/O. Having more memory 
available affects database design, e.g. by allowing the working set 
to be stored entirely in main memory.


--Stefan


[...]

It would allow optimised indexes that store memory pointers to 
individual records, rather than to a block that must then be searched 
for the record - as well as other optimisations that only make sense 
when data is known to be in RAM (and RAM is plentiful).  As big 
servers can already have a terabyte or more of RAM, that will become 
more & more commonplace.  I have 32GB on my development box.



Cheers,
Gavin


Yes, those are the optimizations I was talking about: having the database 
server store the transaction log on high-speed solid state disks and 
consider the commit done, while a background thread updates the data on 
slower disks...


There is no reason to wait for fsync on slow disks to guarantee 
consistency... If the database server crashes, it just needs to "redo" 
log transactions from the fast disk into the slower data storage and the 
database server is ready to go (I think this has been the Sybase/MS SQL 
strategy for years).


Also, consider having lazy loading (the current behavior?) or eager 
loading (perhaps; I just learned a bit about pg_warmcache).


And, of course, indexes that would point to memory areas instead of disk 
pages when the data is in RAM - as you just mentioned.



Regards,

Edson


Re: [GENERAL] Postgres as In-Memory Database?

2013-11-17 Thread Gavin Flower

On 18/11/13 12:53, Stefan Keller wrote:

Hi Martijn

2013/11/17 Martijn van Oosterhout wrote:


> If your dataset fits in memory then the problem is trivial: any decent
> programming language provides you with all the necessary tools to deal
> with data purely in memory.

What about atomicity, concurrency, the SQL query language and the 
extension mechanisms of Postgres? To me, that's not trivial.


> There are also quite a lot of databases that cover this area.

Agreed. That's what partially triggered my question. It's notably 
Oracle TimesTen, MS SQL Server 2014 (project Hekaton), (distributed) 
"MySQL Cluster", SAP HANA or SQLite 3. To me this rather confirms 
that an architecture and/or configuration for in-memory use could be 
an issue in Postgres as well.


The current architecture of Postgres assumes that memory is 
expensive and optimizes for avoiding disk I/O. Having more memory 
available affects database design, e.g. by allowing the working set 
to be stored entirely in main memory.


--Stefan


[...]

It would allow optimised indexes that store memory pointers to 
individual records, rather than to a block that must then be searched for 
the record - as well as other optimisations that only make sense when data 
is known to be in RAM (and RAM is plentiful).  As big servers can already 
have a terabyte or more of RAM, that will become more & more commonplace.  
I have 32GB on my development box.
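As a toy contrast of the two index shapes Gavin describes (all names invented for the example, and nothing like the real buffer-manager code): a conventional index resolves a key to a block number that still has to be searched, while a RAM-only index could hold a direct reference to the record:

```python
# A "block" is a list of records; a conventional index maps key -> block
# number and then scans the block; a pointer index maps key -> the record.

blocks = {7: [{"id": 10, "v": "x"}, {"id": 11, "v": "y"}]}  # block no -> records

def lookup_via_block(index, key):
    """Conventional shape: find the block, then search inside it."""
    block_no = index[key]
    for rec in blocks[block_no]:   # the extra search within the block
        if rec["id"] == key:
            return rec

def lookup_via_pointer(index, key):
    """In-memory shape: the index entry IS the record reference."""
    return index[key]              # direct reference, no block scan

block_index = {10: 7, 11: 7}
pointer_index = {10: blocks[7][0], 11: blocks[7][1]}
```

The pointer variant skips the per-lookup scan entirely, which is only safe when records can never be evicted to disk behind the index's back.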



Cheers,
Gavin


Re: [GENERAL] Postgres as In-Memory Database?

2013-11-17 Thread Stefan Keller
2013/11/18 Andreas Brandl wrote:
> What is your use-case?

It's geospatial data from OpenStreetMap stored in a schema optimized for
the PostGIS extension (produced by osm2pgsql).

BTW: Having said (to Martijn) that using Postgres is probably more
efficient than programming an in-memory database in a decent language:
OpenStreetMap has a very, very large node table which is heavily
referenced by other tables (like ways) - and becomes rather slow in
Postgres. Since it's of fixed length, I'm looking at the
file_fixed_length_record_fdw extension [1][2] (which is in-memory) to get
the best of both worlds.
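The property such a fixed-length-record FDW exploits is that record i lives at byte offset i * record_size, so random access needs no scan or index. A minimal sketch in Python; the 16-byte node layout (id plus two scaled coordinates) is purely illustrative, not the osm2pgsql format:

```python
import struct

# Fixed-length records: record i sits at offset i * RECORD.size,
# so lookup is a single offset computation plus one unpack.
RECORD = struct.Struct("<qii")  # 8-byte id, two 4-byte scaled coordinates

def read_node(buf: bytes, i: int):
    """Return record i from a packed buffer of fixed-length records."""
    return RECORD.unpack_from(buf, i * RECORD.size)

# Pack three "nodes" and fetch the middle one directly by offset.
nodes = [(1, 85000000, 472000000),
         (2, 85100000, 472100000),
         (3, 85200000, 472200000)]
buf = b"".join(RECORD.pack(*n) for n in nodes)
```

With variable-length rows this O(1) addressing is impossible, which is why the fixed-length constraint buys so much for a huge, heavily referenced node table.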

--Stefan

[1]
http://wiki.postgresql.org/wiki/Foreign_data_wrappers#file_fixed_length_record_fdw
[2] https://github.com/adunstan/file_fixed_length_record_fdw


2013/11/18 Andreas Brandl 

> Hi Stefan,
>
> > How can Postgres be used and configured as an In-Memory Database?
> >
>
> we've put the data directory on our buildserver directly on a ramdisk
> (e.g. /dev/shm) to improve build times.
>
> Obviously you then don't care too much about durability here, so one can
> switch off all related settings (as has already been pointed out). The only
> thing to do on a server reboot would be to re-create a fresh data directory
> on the ramdisk.
>
> So if you're able to start from scratch relatively cheap (i.e. on a server
> reboot), don't care about durability/crash safety at all and your database
> fits into ram that solution is easy to handle.
>
> I've also tried having only a separate tablespace on ramdisk but abandoned
> the idea because postgres seemed too surprised to see the tablespace empty
> after a reboot (all tables gone).
>
> Overall the above solution works and improves our build times but I think
> there are better ways to have in-memory/application caches than using a
> postgres.
>
> What is your use-case?
>
> Regards
> Andreas
>
>
>


Re: [GENERAL] Postgres as In-Memory Database?

2013-11-17 Thread Stefan Keller
Hi Martijn

2013/11/17 Martijn van Oosterhout wrote:

> If your dataset fits in memory then the problem is trivial: any decent
> programming language provides you with all the necessary tools to deal
> with data purely in memory.

What about atomicity, concurrency, the SQL query language and the
extension mechanisms of Postgres? To me, that's not trivial.

> There are also quite a lot of databases that cover this area.

Agreed. That's what partially triggered my question. It's notably Oracle
TimesTen, MS SQL Server 2014 (project Hekaton), (distributed) "MySQL
Cluster", SAP HANA or SQLite 3. To me this rather confirms that an
architecture and/or configuration for in-memory use could be an issue in
Postgres as well.

The current architecture of Postgres assumes that memory is expensive
and optimizes for avoiding disk I/O. Having more memory available
affects database design, e.g. by allowing the working set to be stored
entirely in main memory.

--Stefan


2013/11/17 Martijn van Oosterhout 

> On Sun, Nov 17, 2013 at 10:33:30PM +0100, Stefan Keller wrote:
> > I think I have to add, that pure speed of a read-mostly database is the
> > main scenario I have in mind.
> > Durability, high availability and scaling out are perhaps additional or
> > separate scenarios.
> >
> > So, to come back to my question: I think that Postgres could be even
> > faster by magnitudes, if the assumption of writing to slow secondary
> > storage (like disks) is removed (or replaced).
>
> If your dataset fits in memory then the problem is trivial: any decent
> programming language provides you with all the necessary tools to deal
> with data purely in memory.  There are also quite a lot of databases
> that cover this area.
>
> PostgreSQL excels in the area where your data is much larger than your
> memory. This is a much more difficult problem and I think one worth
> focussing on. Pure in memory databases are just not as interesting.
>
> Have a nice day,
> --
> Martijn van Oosterhout  http://svana.org/kleptog/
> > He who writes carelessly confesses thereby at the very outset that he
> does
> > not attach much importance to his own thoughts.
>-- Arthur Schopenhauer
>
>
>


Re: [GENERAL] Postgres as In-Memory Database?

2013-11-17 Thread Andreas Brandl
Hi Stefan,

> How can Postgres be used and configured as an In-Memory Database?
> 

we've put the data directory on our buildserver directly on a ramdisk (e.g. 
/dev/shm) to improve build times.

Obviously you then don't care too much about durability, so one can switch 
off all the related settings (as has already been pointed out). The only 
thing to do on a server reboot is to re-create a fresh data directory on 
the ramdisk.

So if you're able to start from scratch relatively cheaply (i.e. on a 
server reboot), don't care about durability/crash safety at all, and your 
database fits into RAM, that solution is easy to handle. 
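A minimal sketch of that setup - the mount point, size and paths are assumptions for illustration, not taken from this thread: a dedicated tmpfs mount in /etc/fstab, onto which a fresh data directory is initdb'ed after every reboot:

```
# /etc/fstab - dedicated RAM-backed mount for a disposable data directory
tmpfs  /mnt/pgram  tmpfs  size=16g,mode=0700,uid=postgres,gid=postgres  0  0

# after each reboot (e.g. from an init script), something like:
#   initdb -D /mnt/pgram/data
#   pg_ctl -D /mnt/pgram/data start
```

Using a dedicated mount rather than /dev/shm itself keeps a size cap on the instance and avoids sharing the system-wide shm area.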

I've also tried having only a separate tablespace on ramdisk but abandoned the 
idea because postgres seemed too surprised to see the tablespace empty after a 
reboot (all tables gone).

Overall the above solution works and improves our build times, but I think 
there are better ways to have in-memory/application caches than using a 
Postgres instance. 

What is your use-case?

Regards
Andreas




Re: [GENERAL] Postgres as In-Memory Database?

2013-11-17 Thread Edson Richter

On 17/11/2013 20:46, Andreas Brandl wrote:

Edson,


On 17/11/2013 19:26, Stefan Keller wrote:

Hi Edson

As Rob wrote: Having a feature like an in-memory table, as SQLite has
[1], would make application caches obsolete and be interesting to
discuss (but that wasn't exactly what I asked above).

Hi, Stephan,

I don't think any feature you add to the database server would make
app server caches obsolete: app server caches simply have no lag at
all:

1) They don't need a network connection to the database server
2) They don't need to materialize results (for instance, I have in mind a
Java or .NET app server holding hundreds of thousands of objects in
memory).

IMHO, no matter how much you improve the database, app server caches
provide an additional level of speed that cannot be achieved by the
database.

That said, I can still see huge improvements in the database server.
Having a strong in-memory mode of operation would bring substantial
improvements. For instance, if you have an in-memory database (tables,
indexes, etc.) for all sorts of queries, and just **commit** to disk,
then you will have unprecedented performance.
I would benefit from this architecture, since a typical customer
database is < 64 GB in size (after 2 or 3 years of data recording). So
a database server with 64 GB of memory would keep everything in
memory, and just commit data to disk.

In this case, committed data would be instantly available to queries
(because it is all in memory) while the log (the changes) is recorded on
a fast disk (an SSD, perhaps) and those changes are then made
persistent, written asynchronously to slow high-capacity disks (SCSI or SAS).

This would also allow a hybrid mode of operation (keeping as many data
pages as possible in memory, with a target of 50% or more in memory).

When the database server is started, it could use lazy loading (data is
loaded and kept in memory as it is used) or eager loading (for slower
startup but faster execution).

not sure I fully understand your point. Isn't this the typical 
mode-of-operation plus added cache warming?

Anyways, just wanted to point you to [1] which gives a good overview of cache 
warming techniques.

Regards,
Andreas

[1] http://raghavt.blogspot.fr/2012/04/caching-in-postgresql.html



Wonderful, I never knew about it.
I'll read it ASAP.

Regards

Edson




Re: [GENERAL] Postgres as In-Memory Database?

2013-11-17 Thread Andreas Brandl
Edson,

> On 17/11/2013 19:26, Stefan Keller wrote:
> > Hi Edson
> >
> > As Rob wrote: Having a feature like an in-memory table, as SQLite has
> > [1], would make application caches obsolete and be interesting to
> > discuss (but that wasn't exactly what I asked above).
> 
> Hi, Stephan,
> 
> I don't think any feature you add to the database server would make
> app server caches obsolete: app server caches simply have no lag at
> all:
> 
> 1) They don't need a network connection to the database server
> 2) They don't need to materialize results (for instance, I have in mind
> a Java or .NET app server holding hundreds of thousands of objects in
> memory).
> 
> IMHO, no matter how much you improve the database, app server caches
> provide an additional level of speed that cannot be achieved by the
> database.
> 
> That said, I can still see huge improvements in the database server.
> Having a strong in-memory mode of operation would bring substantial
> improvements. For instance, if you have an in-memory database (tables,
> indexes, etc.) for all sorts of queries, and just **commit** to disk,
> then you will have unprecedented performance.
> I would benefit from this architecture, since a typical customer
> database is < 64 GB in size (after 2 or 3 years of data recording). So
> a database server with 64 GB of memory would keep everything in
> memory, and just commit data to disk.
> 
> In this case, committed data would be instantly available to queries
> (because it is all in memory) while the log (the changes) is recorded on
> a fast disk (an SSD, perhaps) and those changes are then made
> persistent, written asynchronously to slow high-capacity disks (SCSI or SAS).
> 
> This would also allow a hybrid mode of operation (keeping as many data
> pages as possible in memory, with a target of 50% or more in memory).
> 
> When the database server is started, it could use lazy loading (data
> is loaded and kept in memory as it is used) or eager loading (for
> slower startup but faster execution).

not sure I fully understand your point. Isn't this the typical 
mode-of-operation plus added cache warming?

Anyways, just wanted to point you to [1] which gives a good overview of cache 
warming techniques.

Regards,
Andreas

[1] http://raghavt.blogspot.fr/2012/04/caching-in-postgresql.html




Re: [GENERAL] Postgres as In-Memory Database?

2013-11-17 Thread Martijn van Oosterhout
On Sun, Nov 17, 2013 at 10:33:30PM +0100, Stefan Keller wrote:
> I think I have to add, that pure speed of a read-mostly database is the
> main scenario I have in mind.
> Durability, high availability and scaling out are perhaps additional or
> separate scenarios.
> 
> So, to come back to my question: I think that Postgres could be even faster
> by magnitudes, if the assumption of writing to slow secondary storage (like
> disks) is removed (or replaced).

If your dataset fits in memory then the problem is trivial: any decent
programming language provides you with all the necessary tools to deal
with data purely in memory.  There are also quite a lot of databases
that cover this area.

PostgreSQL excels in the area where your data is much larger than your
memory. This is a much more difficult problem and I think one worth
focussing on. Pure in memory databases are just not as interesting.

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he does
> not attach much importance to his own thoughts.
   -- Arthur Schopenhauer




Re: [GENERAL] Postgres as In-Memory Database?

2013-11-17 Thread Edson Richter

On 17/11/2013 19:26, Stefan Keller wrote:

Hi Edson

As Rob wrote: Having a feature like an in-memory table, as SQLite has 
[1], would make application caches obsolete and be interesting to discuss 
(but that wasn't exactly what I asked above).


Hi Stefan,

I don't think any feature you add to the database server would make 
app server caches obsolete: app server caches simply have no lag at 
all:


1) They don't need a network connection to the database server
2) They don't need to materialize results (for instance, I have in mind a 
Java or .NET app server holding hundreds of thousands of objects in memory).


IMHO, no matter how much you improve the database, app server caches 
provide an additional level of speed that cannot be achieved by the database.


That said, I can still see huge improvements in the database server.
Having a strong in-memory mode of operation would bring substantial improvements.
For instance, if you have an in-memory database (tables, indexes, etc.) for 
all sorts of queries, and just **commit** to disk, then you will have 
unprecedented performance.
I would benefit from this architecture, since a typical customer 
database is < 64 GB in size (after 2 or 3 years of data recording). So 
a database server with 64 GB of memory would keep everything in memory, 
and just commit data to disk.


In this case, committed data would be instantly available to queries 
(because it is all in memory) while the log (the changes) is recorded on a 
fast disk (an SSD, perhaps) and those changes are then made persistent, 
written asynchronously to slow high-capacity disks (SCSI or SAS).


This would also allow a hybrid mode of operation (keeping as many data 
pages as possible in memory, with a target of 50% or more in memory).


When the database server is started, it could use lazy loading (data is 
loaded and kept in memory as it is used) or eager loading (for slower 
startup but faster execution).


Maybe I'm just wondering too much, since I don't know the PostgreSQL 
internals...



Regards,

Edson



--Stefan


[1] http://www.sqlite.org/inmemorydb.html
[2] http://www.postgresql.org/docs/9.1/static/non-durability.html

2013/11/17 Edson Richter


On 17/11/2013 12:15, rob stone wrote:


On Sun, 2013-11-17 at 12:25 +0100, Stefan Keller wrote:

How can Postgres be used and configured as an In-Memory
Database?


Does anybody know of thoughts or presentations about this
"NoSQL
feature" - beyond e.g. "Perspectives on NoSQL" from Gavin
Roy at PGCon
2010)?


Given, say 128 GB memory or more, and (read-mostly) data
that fits
into this, what are the hints to optimize Postgres
(postgresql.conf
etc.)?


-- Stefan

Not as being completely "in memory".
Back in the "good ol' days" of DMS II (written in Algol and run on
Burroughs mainframes) and Linc II (also Burroughs) it was possible
to define certain tables as being memory resident. This was useful
for low-volatility data such as salutations, street types, county
or state codes, time zones, preferred languages, etc.
It saved disk I/O twice. Firstly building your drop down lists and
secondly when the entered data hit the server and was validated.
It would be good to have a similar feature in PostgreSQL.
If a table was altered by, say inserting a new street type, then
the database engine has to refresh the cache. This is the only
overhead.

Cheers,
Robert


For this purpose (building drop down lists, salutations, street
types, county or state codes), I keep a permanent data cache at
app server side (after all, they will be shared among all users -
even on a multi tenant application). This avoids network
connection, and keep database server memory available for database
operations (like reporting and transactions).
But I agree there are lots of gains in having an "in memory" option
for tables and the like. I believe PostgreSQL already does that
automatically (the most used tables are kept in the memory cache).

Edson.











Re: [GENERAL] Postgres as In-Memory Database?

2013-11-17 Thread Stefan Keller
Hi Edson,

On 2013/11/17 Edson Richter wrote:
> One question: would you please expand your answer and explain how this
> would adversely affect async replication?

Is this a question or a hint (or both) :-)? Of course almost all
non-durable settings [1] will delay replication.

I think I have to add that pure speed of a read-mostly database is the
main scenario I have in mind.
Durability, high availability and scaling out are perhaps additional or
separate scenarios.

So, to come back to my question: I think that Postgres could be even
faster by orders of magnitude if the assumption of writing to slow
secondary storage (like disks) is removed (or replaced).

--Stefan

[1] http://www.postgresql.org/docs/9.1/static/non-durability.html




2013/11/17 Edson Richter 

> On 17/11/2013 10:00, Michael Paquier wrote:
>
>> On Sun, Nov 17, 2013 at 8:25 PM, Stefan Keller wrote:
>>
>>> How can Postgres be used and configured as an In-Memory Database?
>>>
>>> Does anybody know of thoughts or presentations about this "NoSQL
>>> feature" -
>>> beyond e.g. "Perspectives on NoSQL" from Gavin Roy at PGCon 2010)?
>>>
>>> Given, say 128 GB memory or more, and (read-mostly) data that fits into
>>> this, what are the hints to optimize Postgres (postgresql.conf etc.)?
>>>
>> In this case as you are trading system safety (system will not be
>> crash-safe) for performance... The following parameters would be
>> suited:
>> - Improve performance by reducing the amount of data flushed:
>> fsync = off
>> synchronous_commit=off
>> - Reduce the size of WALs:
>> full_page_writes = off
>> - Disable the background writer:
>> bgwriter_lru_maxpages = 0
>> Regards,
>>
> One question: would you please expand your answer and explain how this
> would adversely affect async replication?
>
> Edson
>
>
>


Re: [GENERAL] Postgres as In-Memory Database?

2013-11-17 Thread Stefan Keller
Hi Edson

As Rob wrote: Having a feature like an in-memory table, as SQLite has [1],
would make application caches obsolete and be interesting to discuss (but
that wasn't exactly what I asked above).
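For reference, the SQLite feature in [1] is a database that lives entirely in RAM and vanishes when the connection closes. A minimal demonstration with Python's standard-library sqlite3 module (the table and rows are invented for the example; ":memory:" is the documented special database name):

```python
import sqlite3

# An in-memory SQLite database: full SQL, no file, gone on close.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE street_type (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO street_type (name) VALUES (?)",
                 [("Avenue",), ("Boulevard",), ("Lane",)])
rows = conn.execute("SELECT name FROM street_type ORDER BY id").fetchall()
```

This is exactly the kind of per-process, zero-durability table the thread is contrasting with Postgres's client-server, disk-backed model.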

--Stefan



[1] http://www.sqlite.org/inmemorydb.html
[2] http://www.postgresql.org/docs/9.1/static/non-durability.html

2013/11/17 Edson Richter 

> On 17/11/2013 12:15, rob stone wrote:
>
>
>> On Sun, 2013-11-17 at 12:25 +0100, Stefan Keller wrote:
>>
>>> How can Postgres be used and configured as an In-Memory Database?
>>>
>>>
>>> Does anybody know of thoughts or presentations about this "NoSQL
>>> feature" - beyond e.g. "Perspectives on NoSQL" from Gavin Roy at PGCon
>>> 2010)?
>>>
>>>
>>> Given, say 128 GB memory or more, and (read-mostly) data that fits
>>> into this, what are the hints to optimize Postgres (postgresql.conf
>>> etc.)?
>>>
>>>
>>> -- Stefan
>>>
>> Not as being completely "in memory".
>> Back in the "good ol' days" of DMS II (written in Algol and run on
>> Burroughs mainframes) and Linc II (also Burroughs) it was possible to
>> define certain tables as being memory resident. This was useful for
>> low-volatility data such as salutations, street types, county or state
>> codes, time zones, preferred languages, etc.
>> It saved disk I/O twice. Firstly building your drop down lists and
>> secondly when the entered data hit the server and was validated.
>> It would be good to have a similar feature in PostgreSQL.
>> If a table was altered by, say inserting a new street type, then the
>> database engine has to refresh the cache. This is the only overhead.
>>
>> Cheers,
>> Robert
>>
>
> For this purpose (building drop down lists, salutations, street types,
> county or state codes), I keep a permanent data cache at app server side
> (after all, they will be shared among all users - even on a multi tenant
> application). This avoids network connection, and keep database server
> memory available for database operations (like reporting and transactions).
> But I agree there are lots of gains in having an "in memory" option for
> tables and the like. I believe PostgreSQL already does that automatically
> (the most used tables are kept in the memory cache).
>
> Edson.
>
>
>
>
>


Re: [GENERAL] Postgres as In-Memory Database?

2013-11-17 Thread Edson Richter

On 17/11/2013 12:15, rob stone wrote:


On Sun, 2013-11-17 at 12:25 +0100, Stefan Keller wrote:

How can Postgres be used and configured as an In-Memory Database?


Does anybody know of thoughts or presentations about this "NoSQL
feature" - beyond e.g. "Perspectives on NoSQL" from Gavin Roy at PGCon
2010)?


Given, say 128 GB memory or more, and (read-mostly) data that fits
into this, what are the hints to optimize Postgres (postgresql.conf
etc.)?


-- Stefan

Not as being completely "in memory".
Back in the "good ol' days" of DMS II (written in Algol and run on
Burroughs mainframes) and Linc II (also Burroughs) it was possible to
define certain tables as being memory resident. This was useful for
low-volatility data such as salutations, street types, county or state
codes, time zones, preferred languages, etc.
It saved disk I/O twice. Firstly building your drop down lists and
secondly when the entered data hit the server and was validated.
It would be good to have a similar feature in PostgreSQL.
If a table was altered by, say inserting a new street type, then the
database engine has to refresh the cache. This is the only overhead.

Cheers,
Robert


For this purpose (building drop-down lists, salutations, street types, 
county or state codes), I keep a permanent data cache on the app server 
side (after all, the entries will be shared among all users - even in a 
multi-tenant application). This avoids the network connection and keeps 
database server memory available for database operations (like reporting 
and transactions).
But I agree there are lots of gains in having an "in memory" option for 
tables and the like. I believe PostgreSQL already does that automatically 
(the most used tables are kept in the memory cache).
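Such a read-through lookup cache can be sketched in a few lines; this is purely illustrative (the loader callable and TTL are invented for the example, not any particular app-server API):

```python
import time

# Read-through cache for low-volatility lookup data: the database is hit
# once per TTL window; every other request is served from process memory,
# with no network round trip.

class LookupCache:
    def __init__(self, load_from_db, ttl_seconds=300):
        self.load_from_db = load_from_db  # callable doing the real DB query
        self.ttl = ttl_seconds
        self._data = None
        self._loaded_at = 0.0

    def get(self):
        now = time.monotonic()
        if self._data is None or now - self._loaded_at > self.ttl:
            self._data = self.load_from_db()   # the only DB round trip
            self._loaded_at = now
        return self._data

    def invalidate(self):
        # Call when the underlying table changes - the same cache-refresh
        # overhead Robert mentions for memory-resident tables.
        self._data = None
```

The trade-off is staleness: between refreshes the app serves possibly outdated entries, which is acceptable precisely because this data changes rarely.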


Edson.





Re: [GENERAL] Postgres as In-Memory Database?

2013-11-17 Thread rob stone


On Sun, 2013-11-17 at 12:25 +0100, Stefan Keller wrote:
> How can Postgres be used and configured as an In-Memory Database?
> 
> 
> Does anybody know of thoughts or presentations about this "NoSQL
> feature" - beyond e.g. "Perspectives on NoSQL" from Gavin Roy at PGCon
> 2010)?
> 
> 
> Given, say 128 GB memory or more, and (read-mostly) data that fits
> into this, what are the hints to optimize Postgres (postgresql.conf
> etc.)?
> 
> 
> -- Stefan

Not as being completely "in memory".
Back in the "good ol' days" of DMS II (written in Algol and run on
Burroughs mainframes) and Linc II (also Burroughs) it was possible to
define certain tables as being memory resident. This was useful for
low-volatility data such as salutations, street types, county or state
codes, time zones, preferred languages, etc.
It saved disk I/O twice. Firstly building your drop down lists and
secondly when the entered data hit the server and was validated.
It would be good to have a similar feature in PostgreSQL.
If a table was altered by, say inserting a new street type, then the
database engine has to refresh the cache. This is the only overhead.

Cheers,
Robert





Re: [GENERAL] Postgres as In-Memory Database?

2013-11-17 Thread Edson Richter

On 17/11/2013 10:00, Michael Paquier wrote:

On Sun, Nov 17, 2013 at 8:25 PM, Stefan Keller wrote:

How can Postgres be used and configured as an In-Memory Database?

Does anybody know of thoughts or presentations about this "NoSQL feature" -
beyond e.g. "Perspectives on NoSQL" from Gavin Roy at PGCon 2010)?

Given, say 128 GB memory or more, and (read-mostly) data that fits into
this, what are the hints to optimize Postgres (postgresql.conf etc.)?

In this case as you are trading system safety (system will not be
crash-safe) for performance... The following parameters would be
suited:
- Improve performance by reducing the amount of data flushed:
fsync = off
synchronous_commit=off
- Reduce the size of WALs:
full_page_writes = off
- Disable the background writer:
bgwriter_lru_maxpages = 0
Regards,
One question: would you please expand your answer and explain how this 
would adversely affect async replication?


Edson




Re: [GENERAL] Postgres as In-Memory Database?

2013-11-17 Thread Michael Paquier
On Sun, Nov 17, 2013 at 8:25 PM, Stefan Keller wrote:
> How can Postgres be used and configured as an In-Memory Database?
>
> Does anybody know of thoughts or presentations about this "NoSQL feature" -
> beyond e.g. "Perspectives on NoSQL" from Gavin Roy at PGCon 2010)?
>
> Given, say 128 GB memory or more, and (read-mostly) data that fits into
> this, what are the hints to optimize Postgres (postgresql.conf etc.)?
In this case as you are trading system safety (system will not be
crash-safe) for performance... The following parameters would be
suited:
- Improve performance by reducing the amount of data flushed:
fsync = off
synchronous_commit=off
- Reduce the size of WALs:
full_page_writes = off
- Disable the background writer:
bgwriter_lru_maxpages = 0
Regards,
-- 
Michael

