Re: [HACKERS] [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL

2014-01-10 Thread Robert Haas

On Thu, Jan 9, 2014 at 12:46 PM, Claudio Freire klaussfre...@gmail.com wrote:
 On Thu, Jan 9, 2014 at 2:22 PM, Robert Haas robertmh...@gmail.com wrote:
 It would be nice to have better operating system support for this.
 For example, IIUC, 64-bit Linux has 128TB of address space available
 for user processes.  When you clone(), it can either share the entire
 address space (i.e. it's a thread) or none of it (i.e. it's a
 process).  There's no option to, say, share 64TB and not the other
 64TB, which would be ideal for us.  We could then map dynamic shared
 memory segments into the shared portion of the address space and do
 backend-private allocations in the unshared part.  Of course, even if
 we had that, it wouldn't be portable, so who knows how much good it
 would do.  But it would be awfully nice to have the option.

 You can map a segment at fork time, and unmap it after forking. That
 doesn't really use RAM, since it's supposed to be lazily allocated (it
 can be forced to be so, I believe, with PROT_NONE and MAP_NORESERVE,
 but I don't think that's portable).

 That guarantees it's free.

It guarantees that it is free as of the moment you unmap it, but it
doesn't guarantee that future memory allocations or shared library
loads couldn't stomp on the space.

Also, that not-portable thing is a bit of a problem.  I've got no
problem with the idea that third-party code may be platform-specific,
but I think the stuff we ship in core has got to work on more or less
all reasonably modern systems.

 Next, you can map shared memory at explicit addresses (linux's mmap
 has support for that, and I seem to recall Windows did too).

 All you have to do, is some book-keeping in shared memory (so all
 processes can coordinate new mappings).

I did something like this back in 1998 or 1999 at the operating system
level, and it turned out not to work very well.  I was working on an
experimental research operating system kernel, and we wanted to add
support for mmap(), so we set aside a portion of the virtual address
space for file mappings.  That region was shared across all processes
in the system.  One problem is that there's no guarantee the space is
big enough for whatever you want to map; and the other problem is that
it can easily get fragmented.  Now, 64-bit address spaces go some way
to ameliorating these concerns so maybe it can be made to work, but I
would be a teeny bit cautious about using the word "just" to describe
the complexity involved.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL

2014-01-10 Thread Robert Haas

On Thu, Jan 9, 2014 at 2:09 PM, Amit Kapila amit.kapil...@gmail.com wrote:
 On Thu, Jan 9, 2014 at 12:21 AM, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Jan 7, 2014 at 10:20 PM, Amit Kapila amit.kapil...@gmail.com wrote:
 On Tue, Jan 7, 2014 at 2:46 AM, Robert Haas robertmh...@gmail.com wrote:

 Well, right now we just reopen the same object from all of the
 processes, which seems to work fine and doesn't require any of this
 complexity.  The only problem I don't know how to solve is how to make
 a segment stick around for the whole postmaster lifetime.  If
 duplicating the handle into the postmaster without its knowledge gets
 us there, it may be worth considering, but that doesn't seem like a
 good reason to rework the rest of the existing mechanism.

 I think one has to try this to see if it works as per the need. If it's not
 urgent, I can try this early next week?

 Anything we want to get into 9.4 has to be submitted by next Tuesday,
 but I don't know that we're going to get this into 9.4.

 Using DuplicateHandle(), we can make segment stick for Postmaster
 lifetime. I have used below test (used dsm_demo module) to verify:
 Session - 1
 select dsm_demo_create('this message is from session-1');
  dsm_demo_create
 -
82712

 Session - 2
 -
 select dsm_demo_read(82712);
dsm_demo_read
 
  this message is from session-1
 (1 row)

 Session-1
 \q

 -- till here it will work without DuplicateHandle as well

 Session -2
 select dsm_demo_read(82712);
dsm_demo_read
 
  this message is from session-1
 (1 row)

 Session -2
 \q

 Session -3
 select dsm_demo_read(82712);
dsm_demo_read
 
  this message is from session-1
 (1 row)

 -- above shows that handle stays around.

 Note -
 Currently I have to bypass the code below in dsm_attach(), as it assumes
 segment will not stay if it's removed from control file.

 /*
 * If we didn't find the handle we're looking for in the control
 * segment, it probably means that everyone else who had it mapped,
 * including the original creator, died before we got to this point.
 * It's up to the caller to decide what to do about that.
 */
 if (seg->control_slot == INVALID_CONTROL_SLOT)
 {
 dsm_detach(seg);
 return NULL;
 }


 Could you let me know what exactly you are expecting in patch,
 just a call to DuplicateHandle() after CreateFileMapping() or something
 else as well?

Well, I guess what I was thinking is that we could have a call
dsm_keep_segment() which would be invoked on an already-created
dsm_segment *.  On Linux, that would just bump the reference count in
the control segment up by one so that it doesn't get destroyed until
postmaster shutdown.  On Windows it may as well still do that for
consistency, but will also need to do this DuplicateHandle() trick.
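
The reference-count idea can be illustrated with a toy model. This is only a sketch: `control_slot`, `slot_attach`, `slot_detach`, and `slot_keep` are hypothetical names invented here, and the real control segment's bookkeeping is more involved than a bare counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for one slot of the DSM control segment (hypothetical). */
typedef struct control_slot
{
    unsigned handle;      /* segment handle, e.g. 82712 in the test above */
    int      refcnt;      /* number of outstanding references */
    bool     destroyed;   /* set once the last reference goes away */
} control_slot;

/* A backend maps the segment: take one reference. */
static void
slot_attach(control_slot *s)
{
    assert(!s->destroyed);
    s->refcnt++;
}

/* A backend unmaps the segment: drop one reference, destroy at zero. */
static void
slot_detach(control_slot *s)
{
    assert(s->refcnt > 0);
    if (--s->refcnt == 0)
        s->destroyed = true;
}

/*
 * The proposed "keep" operation: take one extra reference that no
 * backend ever releases, so the count cannot reach zero before
 * postmaster shutdown.
 */
static void
slot_keep(control_slot *s)
{
    assert(!s->destroyed);
    s->refcnt++;
}
```

With this model, a segment that was created, kept, and then detached by every session still has a count of one, which matches the behaviour the dsm_demo test is checking for. On Windows the extra count alone would not be enough, since the operating system drops the mapping object when the last handle closes; that is where DuplicateHandle() comes in.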

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL

2014-01-10 Thread Claudio Freire

On Fri, Jan 10, 2014 at 3:23 PM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Jan 9, 2014 at 12:46 PM, Claudio Freire klaussfre...@gmail.com 
 wrote:
 On Thu, Jan 9, 2014 at 2:22 PM, Robert Haas robertmh...@gmail.com wrote:
 It would be nice to have better operating system support for this.
 For example, IIUC, 64-bit Linux has 128TB of address space available
 for user processes.  When you clone(), it can either share the entire
 address space (i.e. it's a thread) or none of it (i.e. it's a
 process).  There's no option to, say, share 64TB and not the other
 64TB, which would be ideal for us.  We could then map dynamic shared
 memory segments into the shared portion of the address space and do
 backend-private allocations in the unshared part.  Of course, even if
 we had that, it wouldn't be portable, so who knows how much good it
 would do.  But it would be awfully nice to have the option.

 You can map a segment at fork time, and unmap it after forking. That
 doesn't really use RAM, since it's supposed to be lazily allocated (it
 can be forced to be so, I believe, with PROT_NONE and MAP_NORESERVE,
 but I don't think that's portable).

 That guarantees it's free.

 It guarantees that it is free as of the moment you unmap it, but it
 doesn't guarantee that future memory allocations or shared library
 loads couldn't stomp on the space.

You would only unmap prior to remapping, only the to-be-mapped
portion, so I don't see a problem.

 Also, that not-portable thing is a bit of a problem.  I've got no
 problem with the idea that third-party code may be platform-specific,
 but I think the stuff we ship in core has got to work on more or less
 all reasonably modern systems.

 Next, you can map shared memory at explicit addresses (linux's mmap
 has support for that, and I seem to recall Windows did too).

 All you have to do, is some book-keeping in shared memory (so all
 processes can coordinate new mappings).

 I did something like this back in 1998 or 1999 at the operating system
 level, and it turned out not to work very well.  I was working on an
 experimental research operating system kernel, and we wanted to add
 support for mmap(), so we set aside a portion of the virtual address
 space for file mappings.  That region was shared across all processes
 in the system.  One problem is that there's no guarantee the space is
 big enough for whatever you want to map; and the other problem is that
 it can easily get fragmented.  Now, 64-bit address spaces go some way
 to ameliorating these concerns so maybe it can be made to work, but I
 would be a teeny bit cautious about using the word "just" to describe
 the complexity involved.

Ok, yes, fragmentation could be an issue if the address range is not
humongous enough.



Re: [HACKERS] [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL

2014-01-10 Thread Robert Haas

On Fri, Jan 10, 2014 at 1:35 PM, Claudio Freire klaussfre...@gmail.com wrote:
 You can map a segment at fork time, and unmap it after forking. That
 doesn't really use RAM, since it's supposed to be lazily allocated (it
 can be forced to be so, I believe, with PROT_NONE and MAP_NORESERVE,
 but I don't think that's portable).

 That guarantees it's free.

 It guarantees that it is free as of the moment you unmap it, but it
 doesn't guarantee that future memory allocations or shared library
 loads couldn't stomp on the space.

 You would only unmap prior to remapping, only the to-be-mapped
 portion, so I don't see a problem.

OK, yeah, that way works.  That's more or less what Noah proposed
before.  But I was skeptical it would work well everywhere.  I suppose
we won't know until somebody tries it.  (I didn't.)

 Ok, yes, fragmentation could be an issue if the address range is not
 humongous enough.

I've often thought that 64-bit machines are so capable that there's no
reason to go any higher.  But lately I've started to wonder.  There
are already machines out there with 2^40 bytes of physical memory,
and the number just keeps creeping up.  When you reserve a couple of
bits to indicate user or kernel space, and then consider that virtual
address space can be many times larger than physical memory, it starts
not to seem like that much.

But I'm not that excited about the amount of additional memory we'll
eat when somebody decides to make a pointer 16 bytes.  Ugh.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL

2014-01-10 Thread Tom Lane

Robert Haas robertmh...@gmail.com writes:
 I've often thought that 64-bit machines are so capable that there's no
 reason to go any higher.  But lately I've started to wonder.  There
 are already machines out there with 2^40 bytes of physical memory,
 and the number just keeps creeping up.  When you reserve a couple of
 bits to indicate user or kernel space, and then consider that virtual
 address space can be many times larger than physical memory, it starts
 not to seem like that much.

 But I'm not that excited about the amount of additional memory we'll
 eat when somebody decides to make a pointer 16 bytes.  Ugh.

Once you really need that, you're not going to care about doubling
the size of pointers.  At worst, you're giving up 1 bit of address
space to gain 64 more.

(Still, I rather doubt it'll happen in my lifetime.)

regards, tom lane



Re: [HACKERS] [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL

2014-01-09 Thread Robert Haas

On Wed, Jan 8, 2014 at 2:39 PM, knizhnik knizh...@garret.ru wrote:
 I wonder what is the intended use case of dynamic shared memory?
 Is it primarily oriented toward PostgreSQL extensions, or will it also be used in
 PostgreSQL core?

My main motivation is that I want to use it to support parallel query.
 There is unfortunately quite a bit of work left to be done before we
can make that a reality, but that's the goal.

 Maybe I am wrong, but I do not see any reason for the same extension to
 create multiple DSM segments.

Right.

 And the total number of DSM segments is expected to be not very large (10). The
 same is true for the synchronization primitives (LWLocks, for example) needed to
 synchronize access to these DSM segments. So I am not sure the ability to
 place locks in DSM is really so critical...
 We can just reserve some space for LWLocks that can be used by extensions,
 so that LWLockAssign() can be used without RequestAddinLWLocks(), or so that
 RequestAddinLWLocks() can be used not only from preloaded extensions.

If you're doing all of this at postmaster startup time, that all works
fine.  If you want to be able to load up an extension on the fly, then
it doesn't.  You can only RequestAddinLWLocks() at postmaster start
time, not afterwards, so currently any extension that wants to use
lwlocks has to be loaded at postmaster startup time, or you're out of
luck.

Well.  Technically we reserve something like 3 extra lwlocks that
could be assigned later.  But relying on those to be available is not
very reliable, and also, 3 is not very many, considering that we have
something north of 32k core lwlocks in the default configuration.

 IMHO the main trouble with DSM is lack of guarantee that segment is always
 mapped to the same virtual address.
 Without such guarantee it is not possible to use direct (normal) pointers
 inside DSM.
 But there seems to be no reasonable solution.

Yeah, that basically sucks.  But it's very hard to do any better.  At
least on a 64-bit platform, there's an awful lot of address space
available, and in theory it ought to be possible to find a portion of
that address space that isn't in use by any Postgres process and have
all of the backends map the shared memory segment there.  But there's
no portable way to do that, and it seems like it would require an
awful lot of IPC to achieve consensus on where to put a new mapping.
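
When a segment may land at a different address in each backend, a common workaround is to store offsets from the segment base instead of raw pointers. The following is a minimal sketch of that idea; `relptr`, `relptr_store`, `relptr_access`, and `node` are hypothetical names used only for illustration, not an existing API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type: an offset from the segment base; 0 stands for NULL. */
typedef size_t relptr;

/* Convert a real pointer inside the segment into a base-relative offset. */
static relptr
relptr_store(const char *base, const void *ptr)
{
    if (ptr == NULL)
        return 0;
    assert((const char *) ptr > base);  /* offset 0 is reserved for NULL */
    return (size_t) ((const char *) ptr - base);
}

/* Convert an offset back into a pointer valid for this process's mapping. */
static void *
relptr_access(char *base, relptr off)
{
    return off == 0 ? NULL : base + off;
}

/* Example node: the link is an offset, so it survives a different mapping. */
typedef struct node
{
    int    value;
    relptr next;
} node;
```

The cost is that every dereference needs the local base address, and ordinary pointer-based data structures cannot be placed in the segment unchanged.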

On non-Windows platforms, Noah had the idea that we could reserve a large
chunk of address space mapped as PROT_NONE and then overwrite it with
mappings later as needed.  However, I'm not sure how portable that is
or whether it'll cause performance consequences (like page table
bloat) if the space doesn't end up getting used (or if it does).  And
unless you have an awful lot of space available, it's hard to be sure
that new mappings are going to fit.  And then there's Windows.
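
The reserve-then-overwrite idea might look roughly like this on Linux. It is a sketch under stated assumptions: `reserve_address_space` and `map_into_reservation` are hypothetical helpers, MAP_NORESERVE and MAP_ANONYMOUS are not POSIX, and the portability and page-table concerns above are not addressed.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/*
 * Reserve `size` bytes of address space without committing memory.
 * Intended to run before fork(), so that every child inherits the same
 * inaccessible range at the same address.
 */
static void *
reserve_address_space(size_t size)
{
    void *p = mmap(NULL, size, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);

    return p == MAP_FAILED ? NULL : p;
}

/*
 * Map `len` bytes at base + offset, replacing that part of the
 * reservation.  MAP_FIXED replaces whatever is already mapped in the
 * range, so no separate munmap() is needed and there is no window in
 * which another allocation could claim the addresses.  `offset` and
 * `len` must be multiples of the page size.
 *
 * A real shared segment would pass a descriptor (for example from
 * shm_open()); fd < 0 maps anonymous memory, which is only useful for
 * demonstration because unrelated processes cannot attach to it.
 */
static void *
map_into_reservation(void *base, size_t offset, size_t len, int fd)
{
    int   flags = MAP_SHARED | MAP_FIXED | (fd < 0 ? MAP_ANONYMOUS : 0);
    void *p = mmap((char *) base + offset, len,
                   PROT_READ | PROT_WRITE, flags, fd, 0);

    return p == MAP_FAILED ? NULL : p;
}
```

Each process still has to issue its own mmap() call at the agreed offset, so some shared bookkeeping about which offsets are in use is needed on top of this.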

It would be nice to have better operating system support for this.
For example, IIUC, 64-bit Linux has 128TB of address space available
for user processes.  When you clone(), it can either share the entire
address space (i.e. it's a thread) or none of it (i.e. it's a
process).  There's no option to, say, share 64TB and not the other
64TB, which would be ideal for us.  We could then map dynamic shared
memory segments into the shared portion of the address space and do
backend-private allocations in the unshared part.  Of course, even if
we had that, it wouldn't be portable, so who knows how much good it
would do.  But it would be awfully nice to have the option.

I haven't given up hope that we'll some day find a way to make
same-address mappings work, at least on some platforms.  But I don't
expect it to happen soon.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL

2014-01-09 Thread Claudio Freire

On Thu, Jan 9, 2014 at 2:22 PM, Robert Haas robertmh...@gmail.com wrote:
 It would be nice to have better operating system support for this.
 For example, IIUC, 64-bit Linux has 128TB of address space available
 for user processes.  When you clone(), it can either share the entire
 address space (i.e. it's a thread) or none of it (i.e. it's a
 process).  There's no option to, say, share 64TB and not the other
 64TB, which would be ideal for us.  We could then map dynamic shared
 memory segments into the shared portion of the address space and do
 backend-private allocations in the unshared part.  Of course, even if
 we had that, it wouldn't be portable, so who knows how much good it
 would do.  But it would be awfully nice to have the option.

You can map a segment at fork time, and unmap it after forking. That
doesn't really use RAM, since it's supposed to be lazily allocated (it
can be forced to be so, I believe, with PROT_NONE and MAP_NORESERVE,
but I don't think that's portable).

That guarantees it's free.

Next, you can map shared memory at explicit addresses (linux's mmap
has support for that, and I seem to recall Windows did too).

All you have to do, is some book-keeping in shared memory (so all
processes can coordinate new mappings).



Re: [HACKERS] [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL

2014-01-09 Thread Amit Kapila

On Thu, Jan 9, 2014 at 12:21 AM, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Jan 7, 2014 at 10:20 PM, Amit Kapila amit.kapil...@gmail.com wrote:
 On Tue, Jan 7, 2014 at 2:46 AM, Robert Haas robertmh...@gmail.com wrote:

 Well, right now we just reopen the same object from all of the
 processes, which seems to work fine and doesn't require any of this
 complexity.  The only problem I don't know how to solve is how to make
 a segment stick around for the whole postmaster lifetime.  If
 duplicating the handle into the postmaster without its knowledge gets
 us there, it may be worth considering, but that doesn't seem like a
 good reason to rework the rest of the existing mechanism.

 I think one has to try this to see if it works as per the need. If it's not
 urgent, I can try this early next week?

 Anything we want to get into 9.4 has to be submitted by next Tuesday,
 but I don't know that we're going to get this into 9.4.

Using DuplicateHandle(), we can make segment stick for Postmaster
lifetime. I have used below test (used dsm_demo module) to verify:
Session - 1
select dsm_demo_create('this message is from session-1');
 dsm_demo_create
-----------------
           82712

Session - 2
select dsm_demo_read(82712);
         dsm_demo_read
--------------------------------
 this message is from session-1
(1 row)

Session - 1
\q

-- till here it will work without DuplicateHandle as well

Session - 2
select dsm_demo_read(82712);
         dsm_demo_read
--------------------------------
 this message is from session-1
(1 row)

Session - 2
\q

Session - 3
select dsm_demo_read(82712);
         dsm_demo_read
--------------------------------
 this message is from session-1
(1 row)

-- above shows that handle stays around.

Note -
Currently I have to bypass the code below in dsm_attach(), as it assumes
segment will not stay if it's removed from control file.

/*
* If we didn't find the handle we're looking for in the control
* segment, it probably means that everyone else who had it mapped,
* including the original creator, died before we got to this point.
* It's up to the caller to decide what to do about that.
*/
if (seg->control_slot == INVALID_CONTROL_SLOT)
{
dsm_detach(seg);
return NULL;
}


Could you let me know what exactly you are expecting in patch,
just a call to DuplicateHandle() after CreateFileMapping() or something
else as well?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL

2014-01-09 Thread knizhnik


On 01/09/2014 09:22 PM, Robert Haas wrote:

On Wed, Jan 8, 2014 at 2:39 PM, knizhnik knizh...@garret.ru wrote:

I wonder what is the intended use case of dynamic shared memory?
Is it primarily oriented toward PostgreSQL extensions, or will it also be used in
PostgreSQL core?

My main motivation is that I want to use it to support parallel query.
  There is unfortunately quite a bit of work left to be done before we
can make that a reality, but that's the goal.


I do not want to waste your time, but this topic is very interesting to
me, and I would be very pleased if you could say a few words about how DSM
can help to implement parallel query processing.
It seems to me that the main complexity is in the optimizer: it needs to
split the query plan into several subplans that can be executed
concurrently and then merge their partial results.
As far as I understand, it is not possible to use multithreading for
parallel query execution because most of the PostgreSQL code is
non-reentrant. So we need to execute these subplans in several processes.
And unlike with threads, the only efficient way of exchanging data between
processes is shared memory. So it is clear why we need shared memory
for parallel query execution. But why does it have to be dynamic? Why can
it not be preallocated at start time, like most of the other resources used
by PostgreSQL?





Maybe I am wrong, but I do not see any reason for the same extension to
create multiple DSM segments.

Right.


And the total number of DSM segments is expected to be not very large (10). The
same is true for the synchronization primitives (LWLocks, for example) needed to
synchronize access to these DSM segments. So I am not sure the ability to
place locks in DSM is really so critical...
We can just reserve some space for LWLocks that can be used by extensions,
so that LWLockAssign() can be used without RequestAddinLWLocks(), or so that
RequestAddinLWLocks() can be used not only from preloaded extensions.

If you're doing all of this at postmaster startup time, that all works
fine.  If you want to be able to load up an extension on the fly, then
it doesn't.  You can only RequestAddinLWLocks() at postmaster start
time, not afterwards, so currently any extension that wants to use
lwlocks has to be loaded at postmaster startup time, or you're out of
luck.

Well.  Technically we reserve something like 3 extra lwlocks that
could be assigned later.  But relying on those to be available is not
very reliable, and also, 3 is not very many, considering that we have
something north of 32k core lwlocks in the default configuration.


3 is definitely too small.
But you agreed with me that the number of DSM segments will not be very large.
And if we do not need fine-grained locking (and IMHO it is not needed for
most extensions), then we need just a few locks (most likely one) per DSM
segment.
That means that if instead of 3 we reserve, let's say, 30 LWLocks, it
will be enough for most extensions. And there will be almost no extra
resource overhead because, as you wrote, PostgreSQL has 32k locks in the
default configuration.


Certainly, if we need an independent lock for each page of DSM memory, then
there will be no other choice except placing the locks in the DSM segment
itself. But once again, I do not think that most extensions needing
shared memory will use such fine-grained locking.
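
For comparison, placing a lock inside the segment itself can be sketched with a C11 atomic flag in a header at the start of the mapping. This is an illustration only: `segment_header` and the three functions are hypothetical, and a bare spinlock is not a substitute for an LWLock (no shared mode, no sleeping, no error cleanup).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical header placed at the start of a shared segment. */
typedef struct segment_header
{
    atomic_flag lock;   /* the lock lives in the segment it protects */
    size_t      used;   /* example of data guarded by the lock */
} segment_header;

/* Called once by the process that creates the segment. */
static void
segment_init(segment_header *h)
{
    assert(h != NULL);
    atomic_flag_clear(&h->lock);
    h->used = 0;
}

/* Spin until the flag is acquired. */
static void
segment_lock(segment_header *h)
{
    while (atomic_flag_test_and_set_explicit(&h->lock, memory_order_acquire))
        ;   /* busy-wait; a real implementation would back off or sleep */
}

/* Release the flag so another process can take it. */
static void
segment_unlock(segment_header *h)
{
    atomic_flag_clear_explicit(&h->lock, memory_order_release);
}
```

Because the lock is part of the segment, it needs no reservation at postmaster start and disappears with the segment, which is the property the reserved-pool approach cannot offer once the pool is exhausted.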







Re: [HACKERS] [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL

2014-01-09 Thread knizhnik


On 01/09/2014 09:46 PM, Claudio Freire wrote:

On Thu, Jan 9, 2014 at 2:22 PM, Robert Haas robertmh...@gmail.com wrote:

It would be nice to have better operating system support for this.
For example, IIUC, 64-bit Linux has 128TB of address space available
for user processes.  When you clone(), it can either share the entire
address space (i.e. it's a thread) or none of it (i.e. it's a
process).  There's no option to, say, share 64TB and not the other
64TB, which would be ideal for us.  We could then map dynamic shared
memory segments into the shared portion of the address space and do
backend-private allocations in the unshared part.  Of course, even if
we had that, it wouldn't be portable, so who knows how much good it
would do.  But it would be awfully nice to have the option.

You can map a segment at fork time, and unmap it after forking. That
doesn't really use RAM, since it's supposed to be lazily allocated (it
can be forced to be so, I believe, with PROT_NONE and MAP_NORESERVE,
but I don't think that's portable).

That guarantees it's free.

Next, you can map shared memory at explicit addresses (linux's mmap
has support for that, and I seem to recall Windows did too).

All you have to do, is some book-keeping in shared memory (so all
processes can coordinate new mappings).

As far as I understand, the main advantage of DSM is that a segment can be
allocated at any time, not only at fork time.
And it is not because of memory consumption: even without unmapping,
allocating a memory region does not cause any loss of physical memory.
And there is usually no problem with exhaustion of virtual address space on
64-bit architectures. Also, by using some combination of flags (such as
MAP_NORESERVE), it is usually possible to completely eliminate the overhead
of reserving an address range in virtual space. But mapping a
dynamically created segment (not at fork time) at the same address
really does seem to be a big challenge.






Re: [HACKERS] [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL

2014-01-09 Thread Claudio Freire

On Thu, Jan 9, 2014 at 4:24 PM, knizhnik knizh...@garret.ru wrote:
 On 01/09/2014 09:46 PM, Claudio Freire wrote:

 On Thu, Jan 9, 2014 at 2:22 PM, Robert Haas robertmh...@gmail.com wrote:

 It would be nice to have better operating system support for this.
 For example, IIUC, 64-bit Linux has 128TB of address space available
 for user processes.  When you clone(), it can either share the entire
 address space (i.e. it's a thread) or none of it (i.e. it's a
 process).  There's no option to, say, share 64TB and not the other
 64TB, which would be ideal for us.  We could then map dynamic shared
 memory segments into the shared portion of the address space and do
 backend-private allocations in the unshared part.  Of course, even if
 we had that, it wouldn't be portable, so who knows how much good it
 would do.  But it would be awfully nice to have the option.

 You can map a segment at fork time, and unmap it after forking. That
 doesn't really use RAM, since it's supposed to be lazily allocated (it
 can be forced to be so, I believe, with PROT_NONE and MAP_NORESERVE,
 but I don't think that's portable).

 That guarantees it's free.

 Next, you can map shared memory at explicit addresses (linux's mmap
 has support for that, and I seem to recall Windows did too).

 All you have to do, is some book-keeping in shared memory (so all
 processes can coordinate new mappings).

 As far as I understand, the main advantage of DSM is that a segment can be
 allocated at any time, not only at fork time.
 And it is not because of memory consumption: even without unmapping,
 allocating a memory region does not cause any loss of physical memory.
 And there is usually no problem with exhaustion of virtual address space on
 64-bit architectures. Also, by using some combination of flags (such as
 MAP_NORESERVE), it is usually possible to completely eliminate the overhead
 of reserving an address range in virtual space. But mapping a
 dynamically created segment (not at fork time) at the same address
 really does seem to be a big challenge.

At fork time I only wrote about reserving the address space. After
reserving it, all you have to do is implement an allocator that works
in shared memory (protected by a lwlock of course).

In essence, a hypothetical pg_dsm_alloc(region_name) would use regular
shared memory to coordinate returning an already mapped region (same
address which is guaranteed to work since we reserved that region), or
allocate one (within the reserved address space).
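
The bookkeeping side of such a pg_dsm_alloc() could be sketched as a table of named regions plus a bump pointer into the reserved range. Everything here is hypothetical (`region_table`, `region_find_or_alloc`, the fixed limits), and locking is left to the caller, who would hold an lwlock around the call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_REGIONS 16
#define NAME_LEN    32

/* One named region inside the reserved address range. */
typedef struct region_entry
{
    char   name[NAME_LEN];
    size_t offset;          /* start, relative to the reservation base */
    size_t size;
} region_entry;

/* Lives in regular shared memory so that every backend sees it. */
typedef struct region_table
{
    size_t       reserved_size;     /* total size of the reservation */
    size_t       next_free;         /* bump pointer: first unused offset */
    int          nregions;
    region_entry regions[MAX_REGIONS];
} region_table;

/*
 * Return the offset of the named region, allocating it on first use.
 * Returns (size_t) -1 if the table or the reserved range is full.
 * The caller must hold the lock protecting the table, and `size`
 * should be a multiple of the page size.
 */
static size_t
region_find_or_alloc(region_table *t, const char *name, size_t size)
{
    region_entry *e;

    assert(strlen(name) < NAME_LEN);

    /* Already allocated by some backend: everyone gets the same offset. */
    for (int i = 0; i < t->nregions; i++)
        if (strcmp(t->regions[i].name, name) == 0)
            return t->regions[i].offset;

    if (t->nregions == MAX_REGIONS ||
        size > t->reserved_size - t->next_free)
        return (size_t) -1;

    e = &t->regions[t->nregions++];
    strcpy(e->name, name);
    e->offset = t->next_free;
    e->size = size;
    t->next_free += size;
    return e->offset;
}
```

Each backend would then map the segment at the reservation base plus the returned offset. A bump pointer never reuses space, so this sketch avoids fragmentation only by never freeing; a version that supports releasing regions would have to deal with the fragmentation concern raised earlier in the thread.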



Re: [HACKERS] [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL

2014-01-09 Thread knizhnik


On 01/09/2014 11:09 PM, Amit Kapila wrote:

On Thu, Jan 9, 2014 at 12:21 AM, Robert Haas robertmh...@gmail.com wrote:

On Tue, Jan 7, 2014 at 10:20 PM, Amit Kapila amit.kapil...@gmail.com wrote:

On Tue, Jan 7, 2014 at 2:46 AM, Robert Haas robertmh...@gmail.com wrote:

Well, right now we just reopen the same object from all of the
processes, which seems to work fine and doesn't require any of this
complexity.  The only problem I don't know how to solve is how to make
a segment stick around for the whole postmaster lifetime.  If
duplicating the handle into the postmaster without its knowledge gets
us there, it may be worth considering, but that doesn't seem like a
good reason to rework the rest of the existing mechanism.

I think one has to try this to see if it works as per the need. If it's not
urgent, I can try this early next week?

Anything we want to get into 9.4 has to be submitted by next Tuesday,
but I don't know that we're going to get this into 9.4.

Using DuplicateHandle(), we can make segment stick for Postmaster
lifetime. I have used below test (used dsm_demo module) to verify:
Session - 1
select dsm_demo_create('this message is from session-1');
  dsm_demo_create
-
82712

Session - 2
-
select dsm_demo_read(82712);
dsm_demo_read

  this message is from session-1
(1 row)

Session-1
\q

-- till here it will work without DuplicateHandle as well

Session -2
select dsm_demo_read(82712);
dsm_demo_read

  this message is from session-1
(1 row)

Session -2
\q

Session -3
select dsm_demo_read(82712);
dsm_demo_read

  this message is from session-1
(1 row)

-- above shows that handle stays around.

Note -
Currently I have to bypass the code below in dsm_attach(), as it assumes
segment will not stay if it's removed from control file.

/*
* If we didn't find the handle we're looking for in the control
* segment, it probably means that everyone else who had it mapped,
* including the original creator, died before we got to this point.
* It's up to the caller to decide what to do about that.
*/
if (seg->control_slot == INVALID_CONTROL_SLOT)
{
dsm_detach(seg);
return NULL;
}


Could you let me know what exactly you are expecting in patch,
just a call to DuplicateHandle() after CreateFileMapping() or something
else as well?


As far as I understand, DuplicateHandle() should really do the trick:
protect the segment from deallocation.

But should the postmaster be somehow notified about this handle?
For example, if we really want to delete this segment (drop extension),
we should somehow make the postmaster close this handle.

How can that be done?




Re: [HACKERS] [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL

2014-01-09 Thread Amit Kapila

On Fri, Jan 10, 2014 at 1:00 AM, knizhnik knizh...@garret.ru wrote:
 On 01/09/2014 11:09 PM, Amit Kapila wrote:


 Using DuplicateHandle(), we can make segment stick for Postmaster
 lifetime. I have used below test (used dsm_demo module) to verify:

 As far as I understand, DuplicateHandle() should really do the trick: protect
 the segment from deallocation.
 But should the postmaster be somehow notified about this handle?
 For example, if we really want to delete this segment (drop extension), we
 should somehow make the postmaster close this handle.
 How can that be done?

I think we need to use some form of IPC to communicate it to Postmaster.
I could not think of any other way atm.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL

2014-01-09 Thread knizhnik


On 01/09/2014 11:30 PM, Claudio Freire wrote:

On Thu, Jan 9, 2014 at 4:24 PM, knizhnik knizh...@garret.ru wrote:

On 01/09/2014 09:46 PM, Claudio Freire wrote:

On Thu, Jan 9, 2014 at 2:22 PM, Robert Haas robertmh...@gmail.com wrote:

It would be nice to have better operating system support for this.
For example, IIUC, 64-bit Linux has 128TB of address space available
for user processes.  When you clone(), it can either share the entire
address space (i.e. it's a thread) or none of it (i.e. it's a
process).  There's no option to, say, share 64TB and not the other
64TB, which would be ideal for us.  We could then map dynamic shared
memory segments into the shared portion of the address space and do
backend-private allocations in the unshared part.  Of course, even if
we had that, it wouldn't be portable, so who knows how much good it
would do.  But it would be awfully nice to have the option.

You can map a segment at fork time, and unmap it after forking. That
doesn't really use RAM, since it's supposed to be lazily allocated (it
can be forced to be so, I believe, with PROT_NONE and MAP_NORESERVE,
but I don't think that's portable).

That guarantees it's free.

Next, you can map shared memory at explicit addresses (linux's mmap
has support for that, and I seem to recall Windows did too).

All you have to do, is some book-keeping in shared memory (so all
processes can coordinate new mappings).

As far as I understand, the main advantage of DSM is that a segment can be
allocated at any time, not only at fork time.
And it is not because of memory consumption: even without unmap, allocating
some memory region does not consume physical memory. And there is
usually no problem with exhaustion of virtual address space on 64-bit
architectures. Besides, using some combination of flags (such as
MAP_NORESERVE), it is usually possible to completely eliminate the overhead
of reserving some address range in virtual space. But mapping a dynamically
created segment (not at fork time) to the same address really seems to be a
big challenge.

At fork time I only wrote about reserving the address space. After
reserving it, all you have to do is implement an allocator that works
in shared memory (protected by a lwlock of course).

In essence, a hypothetical pg_dsm_alloc(region_name) would use regular
shared memory to coordinate returning an already mapped region (same
address which is guaranteed to work since we reserved that region), or
allocate one (within the reserved address space).
Why do we need named segments? There is the ShmemAlloc function in the
PostgreSQL API.
If RequestAddinShmemSpace could be used without the requirement to place the
module in the preload list, wouldn't that be enough for most extensions?

And ShmemInitHash can be used to maintain named regions if needed...

So if we have some reserved address space, do we actually need a
special allocator for this space to allocate new segments in it?

Why is the existing shared memory API not enough?




Re: [HACKERS] [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL

2014-01-09 Thread Claudio Freire

On Thu, Jan 9, 2014 at 4:39 PM, knizhnik knizh...@garret.ru wrote:
 At fork time I only wrote about reserving the address space. After
 reserving it, all you have to do is implement an allocator that works
 in shared memory (protected by a lwlock of course).

 In essence, a hypothetical pg_dsm_alloc(region_name) would use regular
 shared memory to coordinate returning an already mapped region (same
 address which is guaranteed to work since we reserved that region), or
 allocate one (within the reserved address space).

 Why do we need named segments? There is the ShmemAlloc function in the
 PostgreSQL API.
 If RequestAddinShmemSpace could be used without the requirement to place the
 module in the preload list, wouldn't that be enough for most extensions?
 And ShmemInitHash can be used to maintain named regions if needed...

If you want to dynamically create the segments, you need some way to
identify them. That is, the name. Otherwise, RequestWhateverShmemSpace
won't know when to return an already-mapped region or not.

Mind you, the name can be a number. No need to make it a string.

 So if we have some reserved address space, do we actually need a special
 allocator for this space to allocate new segments in it?
 Why is the existing shared memory API not enough?

I don't know this existing API you mention. But I think this is quite
a specific case very unlikely to be serviced from existing APIs. You
need a data structure that can map names to regions, any hash map will
do, or even an array since one wouldn't expect it to be too big, or
require it to be too fast, and then you need to unmap the reserve
mapping and put a shared region there instead, before returning the
pointer to this shared region.

So, the special thing is, the book-keeping region sits in regular
shared memory, whereas the allocated regions sit in newly-created
segments. And segments are referenced by pointers (since the address
space is fixed and shared). Is there something like that already?
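
To make the book-keeping idea concrete, here is a minimal sketch in C of a table that maps numeric region names to offsets inside the reserved address range. Everything in it (RegionTable, region_find_or_create, MAX_REGIONS, the bump-pointer placement) is hypothetical and not an existing PostgreSQL API. It assumes the table itself lives in regular shared memory and that the caller already holds whatever lock protects it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_REGIONS 64          /* hypothetical fixed capacity */

typedef struct {
    uint32_t name;              /* numeric region name; 0 marks a free slot */
    size_t   offset;            /* start of the region in the reserved range */
    size_t   size;              /* length of the region in bytes */
} RegionEntry;

typedef struct {
    size_t      next_free;      /* bump pointer into the reserved range */
    size_t      reserved_size;  /* total size of the reserved range */
    RegionEntry entries[MAX_REGIONS];
} RegionTable;

static void region_table_init(RegionTable *t, size_t reserved_size)
{
    t->next_free = 0;
    t->reserved_size = reserved_size;
    for (int i = 0; i < MAX_REGIONS; i++)
        t->entries[i].name = 0;
}

/*
 * Look up `name`; if it is unknown, assign it the next free piece of the
 * reserved range. *found reports whether the region already existed, so the
 * caller knows whether to create the backing segment or just map it.
 * Returns NULL when the table or the reserved range is exhausted.
 * The caller must hold the lock that protects the table.
 */
static const RegionEntry *region_find_or_create(RegionTable *t, uint32_t name,
                                                size_t size, int *found)
{
    RegionEntry *free_slot = NULL;

    assert(name != 0);
    for (int i = 0; i < MAX_REGIONS; i++) {
        RegionEntry *e = &t->entries[i];
        if (e->name == name) {
            *found = 1;
            return e;
        }
        if (e->name == 0 && free_slot == NULL)
            free_slot = e;
    }
    *found = 0;
    if (free_slot == NULL || size > t->reserved_size - t->next_free)
        return NULL;
    free_slot->name = name;
    free_slot->offset = t->next_free;
    free_slot->size = size;
    t->next_free += size;
    return free_slot;
}
```

A linear scan over a small array is used here because, as noted above, the table is not expected to be large or performance-critical; a hash map would work equally well.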



Re: [HACKERS] [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL

2014-01-09 Thread Claudio Freire

On Thu, Jan 9, 2014 at 4:48 PM, Claudio Freire klaussfre...@gmail.com wrote:
 On Thu, Jan 9, 2014 at 4:39 PM, knizhnik knizh...@garret.ru wrote:
 At fork time I only wrote about reserving the address space. After
 reserving it, all you have to do is implement an allocator that works
 in shared memory (protected by a lwlock of course).

 In essence, a hypothetical pg_dsm_alloc(region_name) would use regular
 shared memory to coordinate returning an already mapped region (same
 address which is guaranteed to work since we reserved that region), or
 allocate one (within the reserved address space).

 Why do we need named segments? There is the ShmemAlloc function in the
 PostgreSQL API.
 If RequestAddinShmemSpace could be used without the requirement to place the
 module in the preload list, wouldn't that be enough for most extensions?
 And ShmemInitHash can be used to maintain named regions if needed...

 If you want to dynamically create the segments, you need some way to
 identify them. That is, the name. Otherwise, RequestWhateverShmemSpace
 won't know when to return an already-mapped region or not.

 Mind you, the name can be a number. No need to make it a string.

 So if we have some reserved address space, do we actually need a special
 allocator for this space to allocate new segments in it?
 Why is the existing shared memory API not enough?


Oh, I see where the confusion comes from now.

The reserve mapping I was proposing was a MAP_NORESERVE mapping with
PROT_NONE, i.e. forbidden access, which guarantees the OS won't try to
allocate physical RAM to it.

You'd have to re-map it before using, so it's not like a regular
shared memory region where you can simply allocate pointers and
intersperse bookkeeping data in-place.
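
A minimal sketch of that reserve-then-remap idea, assuming Linux mmap semantics. The helper names reserve_address_space and commit_pages are hypothetical. For brevity the committed pages are anonymous shared memory; in a real multi-process setup they would instead be backed by a named segment (a file descriptor) so that other backends could map the same object at the same address.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* exposes MAP_ANONYMOUS and MAP_NORESERVE on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Reserve `size` bytes of address space. PROT_NONE forbids all access and
 * MAP_NORESERVE asks the kernel not to set aside swap, so the reservation
 * costs no physical memory.
 */
static void *reserve_address_space(size_t size)
{
    void *base = mmap(NULL, size, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    return base == MAP_FAILED ? NULL : base;
}

/*
 * Re-map `npages` pages, starting at page index `first_page` of the
 * reservation, as readable and writable shared memory. MAP_FIXED replaces
 * the placeholder pages in place, so the resulting address is known up front.
 */
static void *commit_pages(void *base, size_t first_page, size_t npages)
{
    assert(base != NULL && npages > 0);
    size_t page = (size_t) sysconf(_SC_PAGESIZE);
    void *want = (char *) base + first_page * page;
    void *got = mmap(want, npages * page, PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    return got == MAP_FAILED ? NULL : got;
}
```

Touching any part of the reservation that has not been committed faults, which is why the book-keeping data cannot live inside the reserved range itself.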



Re: [HACKERS] [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL

2014-01-09 Thread knizhnik


On 01/09/2014 11:48 PM, Claudio Freire wrote:

On Thu, Jan 9, 2014 at 4:39 PM, knizhnik knizh...@garret.ru wrote:

At fork time I only wrote about reserving the address space. After
reserving it, all you have to do is implement an allocator that works
in shared memory (protected by a lwlock of course).

In essence, a hypothetical pg_dsm_alloc(region_name) would use regular
shared memory to coordinate returning an already mapped region (same
address which is guaranteed to work since we reserved that region), or
allocate one (within the reserved address space).

Why do we need named segments? There is the ShmemAlloc function in the
PostgreSQL API.
If RequestAddinShmemSpace could be used without the requirement to place the
module in the preload list, wouldn't that be enough for most extensions?
And ShmemInitHash can be used to maintain named regions if needed...

If you want to dynamically create the segments, you need some way to
identify them. That is, the name. Otherwise, RequestWhateverShmemSpace
won't know when to return an already-mapped region or not.

Mind you, the name can be a number. No need to make it a string.


So if we have some reserved address space, do we actually need a special
allocator for this space to allocate new segments in it?
Why is the existing shared memory API not enough?

I don't know this existing API you mention. But I think this is quite
a specific case very unlikely to be serviced from existing APIs. You
need a data structure that can map names to regions, any hash map will
do, or even an array since one wouldn't expect it to be too big, or
require it to be too fast, and then you need to unmap the reserve
mapping and put a shared region there instead, before returning the
pointer to this shared region.

So, the special thing is, the book-keeping region sits in regular
shared memory, whereas the allocated regions sit in newly-created
segments. And segments are referenced by pointers (since the address
space is fixed and shared). Is there something like that already?

By the existing API I mostly mean these 6 functions:

RequestAddinShmemSpace()
RequestAddinLWLocks()
ShmemInitStruct()
LWLockAssign()
ShmemAlloc()
ShmemInitHash()

If it becomes possible to use these functions without requiring the
module to be included in the shared_preload_libraries list, then do we
really need DSM?

And that can be achieved by
1. Reserving address space (as you suggested)
2. Reserving some fixed number of free LWLocks (not very large, < 100).

I have nothing against creating a dedicated allocator of named
shared memory segments within the reserved address space.
I am just not sure it is actually needed. In some sense
RequestAddinShmemSpace() can be such an allocator.







Re: [HACKERS] [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL

2014-01-09 Thread Jim Nasby


On 1/9/14, 1:18 PM, knizhnik wrote:

So it is clear why we need shared memory for parallel query execution. But
why does it have to be dynamic? Why can it not be preallocated at start time,
like most other resources used by PostgreSQL?


That would limit us to doing something like allocating a fixed maximum of 
parallel processes (which might be workable) and only allocating a very small 
amount of memory for IPC. Small as in can only handle a small number of tuples. 
That sounds like a really inefficient way to shuffle data to and from parallel 
processes, especially because one or both sides would probably have to actually 
copy the data if we're doing it that way.

With DSM if you want to do something like a parallel sort each process can put 
their results into memory that the parent process can directly access.
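
A rough sketch of that parallel sort pattern in plain C, not PostgreSQL code. The function parallel_sort is hypothetical, and an anonymous shared mapping created before fork() stands in for a DSM segment, which would be created on demand instead. Two workers each sort one half in place, and the parent merges straight out of the shared memory without any copy through a pipe or queue.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* exposes MAP_ANONYMOUS on glibc */
#endif
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *) a, y = *(const int *) b;
    return (x > y) - (x < y);
}

/* Sort data[0..n) using two worker processes; returns 0 on success. */
static int parallel_sort(int *data, size_t n)
{
    assert(data != NULL);
    if (n == 0)
        return 0;

    size_t bytes = n * sizeof(int);
    int *shared = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shared == MAP_FAILED)
        return -1;
    memcpy(shared, data, bytes);

    size_t half = n / 2;
    size_t start[2] = {0, half};
    size_t count[2] = {half, n - half};
    pid_t pids[2];

    /* Each worker sorts its half directly inside the shared mapping. */
    for (int w = 0; w < 2; w++) {
        pids[w] = fork();
        if (pids[w] == 0) {
            qsort(shared + start[w], count[w], sizeof(int), cmp_int);
            _exit(0);
        }
    }

    int ok = 1;
    for (int w = 0; w < 2; w++) {
        int status;
        if (pids[w] < 0 || waitpid(pids[w], &status, 0) < 0 ||
            !WIFEXITED(status) || WEXITSTATUS(status) != 0)
            ok = 0;
    }

    /* The parent reads the workers' results in place and merges them. */
    if (ok) {
        size_t i = 0, j = half, k = 0;
        while (i < half && j < n)
            data[k++] = shared[i] <= shared[j] ? shared[i++] : shared[j++];
        while (i < half)
            data[k++] = shared[i++];
        while (j < n)
            data[k++] = shared[j++];
    }
    munmap(shared, bytes);
    return ok ? 0 : -1;
}
```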

Of course the other enormous win for DSM is it's the foundation for finally 
being able to resize things without a restart. For large dollar sites that 
ability would be hugely beneficial.
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net



Re: [HACKERS] [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL

2014-01-08 Thread Robert Haas

On Tue, Jan 7, 2014 at 10:20 PM, Amit Kapila amit.kapil...@gmail.com wrote:
 On Tue, Jan 7, 2014 at 2:46 AM, Robert Haas robertmh...@gmail.com wrote:
 On Mon, Jan 6, 2014 at 4:04 PM, james ja...@mansionfamily.plus.com wrote:
 The point remains that you need to duplicate it into every process that
 might
 want to use it subsequently, so it makes sense to DuplicateHandle into the
 parent, and then to advertise that  handle value publicly so that other
 child
 processes can DuplicateHandle it back into their own process.

 Well, right now we just reopen the same object from all of the
 processes, which seems to work fine and doesn't require any of this
 complexity.  The only problem I don't know how to solve is how to make
 a segment stick around for the whole postmaster lifetime.  If
 duplicating the handle into the postmaster without its knowledge gets
 us there, it may be worth considering, but that doesn't seem like a
 good reason to rework the rest of the existing mechanism.

 I think one has to try this to see if it works as per the need. If it's not
 urgent, I can try this early next week?

Anything we want to get into 9.4 has to be submitted by next Tuesday,
but I don't know that we're going to get this into 9.4.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL

2014-01-08 Thread knizhnik


On 01/08/2014 10:51 PM, Robert Haas wrote:

On Tue, Jan 7, 2014 at 10:20 PM, Amit Kapila amit.kapil...@gmail.com wrote:

On Tue, Jan 7, 2014 at 2:46 AM, Robert Haas robertmh...@gmail.com wrote:

On Mon, Jan 6, 2014 at 4:04 PM, james ja...@mansionfamily.plus.com wrote:

The point remains that you need to duplicate it into every process that
might
want to use it subsequently, so it makes sense to DuplicateHandle into the
parent, and then to advertise that  handle value publicly so that other
child
processes can DuplicateHandle it back into their own process.

Well, right now we just reopen the same object from all of the
processes, which seems to work fine and doesn't require any of this
complexity.  The only problem I don't know how to solve is how to make
a segment stick around for the whole postmaster lifetime.  If
duplicating the handle into the postmaster without its knowledge gets
us there, it may be worth considering, but that doesn't seem like a
good reason to rework the rest of the existing mechanism.

I think one has to try this to see if it works as per the need. If it's not
urgent, I can try this early next week?

Anything we want to get into 9.4 has to be submitted by next Tuesday,
but I don't know that we're going to get this into 9.4.


I wonder what the intended use case of dynamic shared memory is.
Is it primarily oriented towards PostgreSQL extensions, or will it also be
used in PostgreSQL core?
In the case of extensions, shared memory may be needed to store some
collected/calculated information which will be used by extension functions.


The main advantage of DSM (from my point of view), compared with the existing
mechanism of preloaded extensions, is that it is not necessary to restart the
server to add a new extension requiring shared memory.
A DSM segment can be attached or created by the _PG_init function of the
loaded module.
But there is not much sense in this mechanism if the segment
is deleted when there are no more processes attached to it.
So to make DSM really useful for extensions, it needs some mechanism to
pin a segment in memory for the whole server/extension lifetime.


Maybe I am wrong, but I do not see any reason for the same extension to
create multiple DSM segments.
And the total number of DSM segments is expected to be not very large (10).
The same is true for the synchronization primitives (LWLocks, for example)
needed to synchronize access to these DSM segments. So I am not sure that the
possibility of placing locks in DSM is really so critical...
We can just reserve some space for LWLocks which can be used by
extensions, so that LWLockAssign() can be used without
RequestAddinLWLocks, or RequestAddinLWLocks can be used not only from
preloaded extensions.


IMHO the main trouble with DSM is the lack of a guarantee that a segment is
always mapped to the same virtual address.
Without such a guarantee it is not possible to use direct (normal)
pointers inside DSM.

But there seems to be no reasonable solution.
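
One common workaround when the mapping address differs between processes is to store offsets from the segment base instead of raw pointers. The sketch below is an illustration only, not something the thread proposes; relptr, relptr_access, relptr_store and the Node list are hypothetical names. Offset 0 is treated as the null value, so nothing addressable may be stored at the very start of the segment.

```c
#include <assert.h>
#include <stddef.h>

typedef size_t relptr;   /* byte offset from the segment base; 0 means null */

/* Turn a stored offset into a pointer valid in this process's mapping. */
static void *relptr_access(void *seg_base, relptr rp)
{
    return rp == 0 ? NULL : (char *) seg_base + rp;
}

/* Turn a local pointer into an offset that is valid in every mapping. */
static relptr relptr_store(void *seg_base, void *p)
{
    if (p == NULL)
        return 0;
    assert((char *) p > (char *) seg_base);
    return (size_t) ((char *) p - (char *) seg_base);
}

/* A singly linked list whose links survive being mapped elsewhere. */
typedef struct {
    int    value;
    relptr next;
} Node;

static int list_sum(void *seg_base, relptr head)
{
    int sum = 0;
    for (Node *n = relptr_access(seg_base, head); n != NULL;
         n = relptr_access(seg_base, n->next))
        sum += n->value;
    return sum;
}
```

The cost is one addition per dereference and the inability to reuse code, such as an existing hash table implementation, that assumes ordinary pointers.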




Re: [HACKERS] [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL

2014-01-07 Thread Amit Kapila

On Tue, Jan 7, 2014 at 2:46 AM, Robert Haas robertmh...@gmail.com wrote:
 On Mon, Jan 6, 2014 at 4:04 PM, james ja...@mansionfamily.plus.com wrote:
 The point remains that you need to duplicate it into every process that
 might
 want to use it subsequently, so it makes sense to DuplicateHandle into the
 parent, and then to advertise that  handle value publicly so that other
 child
 processes can DuplicateHandle it back into their own process.

 Well, right now we just reopen the same object from all of the
 processes, which seems to work fine and doesn't require any of this
 complexity.  The only problem I don't know how to solve is how to make
 a segment stick around for the whole postmaster lifetime.  If
 duplicating the handle into the postmaster without its knowledge gets
 us there, it may be worth considering, but that doesn't seem like a
 good reason to rework the rest of the existing mechanism.

I think one has to try this to see if it works as per the need. If it's not
urgent, I can try this early next week?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL

2014-01-06 Thread james


On 06/01/2014 03:14, Robert Haas wrote:

That's up to the application.  After calling dsm_create(), you call
dsm_segment_handle() to get the 32-bit integer handle for that
segment.  Then you have to get that to the other process(es) somehow.
If you're trying to share a handle with a background worker, you can
stuff it in bgw_main_arg.  Otherwise, you'll probably need to store it
in the main shared memory segment, or a file, or whatever.
Well, that works for sysv shm, sure.  But I was interested (possibly from
Konstantin) in how the handle transfer takes place at the moment,
particularly if it is possible to create additional segments dynamically.
I haven't looked at the extension at all.





Re: [HACKERS] [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL

2014-01-06 Thread james


On 06/01/2014 04:20, Amit Kapila wrote:

Duplicate handle should work, but we need to communicate the handle
to other process using IPC.
Only if the other process needs to use it.  The IPC is not to transfer the
handle to the other process, just to tell it which slot in its handle table
contains the handle.  If you just want to ensure that its use-count never
goes to zero, the receiver does not need to know what the handle is.

However ...

The point remains that you need to duplicate it into every process that
might want to use it subsequently, so it makes sense to DuplicateHandle into
the parent, and then to advertise that handle value publicly so that other
child processes can DuplicateHandle it back into their own process.

The handle value can change so you also need to refer to the handle in the
parent and map it in each child to the local equivalent.




Re: [HACKERS] [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL

2014-01-06 Thread Robert Haas

On Mon, Jan 6, 2014 at 4:04 PM, james ja...@mansionfamily.plus.com wrote:
 The point remains that you need to duplicate it into every process that
 might
 want to use it subsequently, so it makes sense to DuplicateHandle into the
 parent, and then to advertise that  handle value publicly so that other
 child
 processes can DuplicateHandle it back into their own process.

Well, right now we just reopen the same object from all of the
processes, which seems to work fine and doesn't require any of this
complexity.  The only problem I don't know how to solve is how to make
a segment stick around for the whole postmaster lifetime.  If
duplicating the handle into the postmaster without its knowledge gets
us there, it may be worth considering, but that doesn't seem like a
good reason to rework the rest of the existing mechanism.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL

2014-01-05 Thread Robert Haas

On Sat, Jan 4, 2014 at 3:27 PM, knizhnik knizh...@garret.ru wrote:
 1. I want IMCS to work with PostgreSQL versions not supporting DSM (dynamic
 shared memory), like 9.2, 9.3.1,...

Yeah.  If it's loaded at postmaster start time, then it can work with
any version.  On 9.4+, you could possibly make it work even if it's
loaded on the fly by using the dynamic shared memory facilities.
However, there are currently some limitations to those facilities that
make some things you might want to do tricky.  There are pending
patches to lift some of these limitations.

 2. IMCS is using the PostgreSQL hash table implementation (ShmemInitHash,
 hash_search,...)
 Maybe I missed something - I only just noticed DSM and have had no chance to
 investigate it, but it looks like a hash table cannot be allocated in DSM...

It wouldn't be very difficult to write an analog of ShmemInitHash() on
top of the dsm_toc patch that is currently pending.  A problem,
though, is that it's not currently possible to put LWLocks in dynamic
shared memory, and even spinlocks will be problematic if
--disable-spinlocks is used.  I'm due to write a post about these
problems; perhaps I should go do that.

 3. IMCS is allocating memory using ShmemAlloc. In case of using DSM I have
 to provide own allocator (although creation of non-releasing memory
 allocator should not be a big issue).

The dsm_toc infrastructure would solve this problem.

 4. Current implementation of DSM still suffers from 256Gb problem. Certainly
 I can create multiple segments and so provide workaround without using huge
 pages, but it complicates allocator.

So it sounds like DSM should also support huge pages somehow.  I'm not
sure what that should look like.

 5. I wonder if I dynamically add new DSM segment - will it be available for
 other PostgreSQL processes? For example I run query which loads data in IMCS
 and so needs more space and allocates new DSM segment. Then another query is
 executed by other PostgreSQL process which tries to access this data. This
 process is not forked from the process created this new DSM segment, so I do
 not understand how this segment will be mapped to the address space of this
 process, preserving address... Certainly I can prohibit dynamic extension of
 IMCS storage (hoping that in this case there will be no such problem with
 DSM). But in this case we will lose the main advantage of using DSM instead
 of old schema of plugin's private shared memory.

You can definitely dynamically add a new DSM segment; that's the point
of making it *dynamic* shared memory.  What's a bit tricky as things
stand today is making sure that it sticks around.  The current model
is that the DSM segment is destroyed when the last process unmaps it.
It would be easy enough to lift that limitation on systems other than
Windows; we could just add a dsm_keep_until_shutdown() API or
something similar.  But on Windows, segments are *automatically*
destroyed *by the operating system* when the last process unmaps them,
so it's not quite so clear to me how we can allow it there.  The main
shared memory segment is no problem because the postmaster always has
it mapped, even if no one else does, but that doesn't help for dynamic
shared memory segments.
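
For comparison, here is a small sketch of why this is easier outside Windows, assuming POSIX shared memory (shm_open). A named object survives even when no process has it mapped, and disappears only when it is explicitly unlinked. The helpers seg_map and seg_destroy and the "/sketch_dsm.<handle>" naming scheme are hypothetical and are not the actual dsm implementation; they only mirror the idea of reopening the same object from a 32-bit handle.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical naming scheme: derive the object name from the handle. */
static void seg_name(uint32_t handle, char *buf, size_t len)
{
    snprintf(buf, len, "/sketch_dsm.%u", (unsigned) handle);
}

/* Map the segment for `handle`, creating it first when `create` is set. */
static void *seg_map(uint32_t handle, size_t size, int create)
{
    char name[64];

    assert(size > 0);
    seg_name(handle, name, sizeof name);
    int fd = shm_open(name, O_RDWR | (create ? (O_CREAT | O_EXCL) : 0), 0600);
    if (fd < 0)
        return NULL;
    if (create && ftruncate(fd, (off_t) size) != 0) {
        close(fd);
        shm_unlink(name);
        return NULL;
    }
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                    /* the mapping stays valid without the fd */
    if (p == MAP_FAILED) {
        if (create)
            shm_unlink(name);
        return NULL;
    }
    return p;
}

/* Remove the name; the memory is freed once the last mapping goes away. */
static int seg_destroy(uint32_t handle)
{
    char name[64];

    seg_name(handle, name, sizeof name);
    return shm_unlink(name);
}
```

With these helpers, a process can call seg_map(h, size, 1), write data, and munmap() the result; a later seg_map(h, size, 0) from any process still sees the data until seg_destroy(h) is called. A Windows file mapping has no equivalent of the name outliving the last open handle, which is the gap discussed here.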

 6. IMCS has some configuration parameters which have to be set through
 postgresql.conf. So in any case the user has to edit the postgresql.conf file.
 When using DSM it will not be necessary to add IMCS to the
 shared_preload_libraries list. But I do not think that this is such a
 restrictive and critical requirement, is it?

I don't really see a problem here.  One of the purposes of dynamic
shared memory (and dynamic background workers) is precisely that you
don't *necessarily* need to put extensions that use shared memory in
shared_preload_libraries - or in other words, you can add the
extension to a running server without restarting it.  If you know in
advance that you will want it, you probably still *want* to put it in
shared_preload_libraries, but part of the idea is that we can get away
from requiring that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL

2014-01-05 Thread james


On 05/01/2014 16:50, Robert Haas wrote:

  But on Windows, segments are *automatically*
destroyed *by the operating system* when the last process unmaps them,
so it's not quite so clear to me how we can allow it there.  The main
shared memory segment is no problem because the postmaster always has
it mapped, even if no one else does, but that doesn't help for dynamic
shared memory segments.

Surely you just need to DuplicateHandle into the parent process?  If you
want to (tidily) dispose of it at some time, then you'll need to tell the
postmaster that you have done so and what the handle is in its process,
but if you just want it to stick around, then you can just pass it up.

Re: [HACKERS] [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL

2014-01-05 Thread Robert Haas

On Sun, Jan 5, 2014 at 12:34 PM, james ja...@mansionfamily.plus.com wrote:
 On 05/01/2014 16:50, Robert Haas wrote:

  But on Windows, segments are *automatically*
 destroyed *by the operating system* when the last process unmaps them,
 so it's not quite so clear to me how we can allow it there.  The main
 shared memory segment is no problem because the postmaster always has
 it mapped, even if no one else does, but that doesn't help for dynamic
 shared memory segments.

 Surely you just need to DuplicateHandle into the parent process?  If you
 want to (tidily) dispose of it at some time, then you'll need to tell the
 postmaster that you have done so and what the handle is in its process,
 but if you just want it to stick around, then you can just pass it up.

Uh, I don't know, maybe?  Does the postmaster have to do something to
receive the duplicated handle, or can the child just throw it over the
wall to the parent and let it rot until the postmaster finally exits?
The latter would be nicer for our purposes, perhaps, as running more
code from within the postmaster is risky for us.  If a regular backend
process dies, the postmaster will restart everything and the database
will come back on line, but if the postmaster itself dies, we're hard
down.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL

2014-01-05 Thread knizhnik

From my point of view it is not a big problem that it is not possible
to place an LWLock in DSM.
I can allocate LWLocks in the standard way, using RequestAddinLWLocks, and
use them for synchronization.


Concerning support for huge pages: actually I do not think that it
should involve anything more than just setting the MAP_HUGETLB flag.
Allocation of the corresponding number of huge pages should be done by the
system administrator.
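
As a sketch of what "just setting the MAP_HUGETLB flag" could look like on Linux, the hypothetical helper alloc_shared below tries a huge-page mapping first and falls back to normal pages when the administrator has not reserved enough of them. It assumes `size` is a multiple of the huge page size (commonly 2 MB on x86-64), because a later munmap() of a huge-page mapping requires that.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* exposes MAP_ANONYMOUS and MAP_HUGETLB on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Allocate `size` bytes of shared memory, preferring huge pages.
 * *used_huge is set to 1 if the huge-page attempt succeeded, else 0.
 * Returns NULL if both attempts fail.
 */
static void *alloc_shared(size_t size, int *used_huge)
{
    void *p = MAP_FAILED;

    assert(size > 0 && used_huge != NULL);
#ifdef MAP_HUGETLB
    /* Fails (typically with ENOMEM) when no huge pages are available. */
    p = mmap(NULL, size, PROT_READ | PROT_WRITE,
             MAP_SHARED | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
#endif
    *used_huge = (p != MAP_FAILED);
    if (p == MAP_FAILED)
        p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                 MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

The number of huge pages available to such a call is controlled outside the program, for example through the vm.nr_hugepages sysctl on Linux.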


And what I still do not completely understand is how DSM ensures that a
segment created by one PostgreSQL process will be mapped to the same
virtual memory address in all other PostgreSQL processes.
As far as I understand, right now (with standard PostgreSQL shared memory
segments) it is enforced by fork().
Shared memory segments are allocated in one process and all other
processes are forked from this process, inheriting these memory segments.


But if a new DSM segment is allocated during execution of some query,
then we should add it to the virtual address space of all PostgreSQL
processes. Even if we somehow notify them all about the presence of the new
segment, there is absolutely no guarantee that all of them can map this
segment to the specified memory address (it may for some reason already be
used by some other shared object).
Or maybe DSM doesn't guarantee that a DSM segment is mapped to the same
address in all processes?
In that case it significantly complicates DSM usage: it will not be
possible to use direct pointers.


Can you please clarify for me how dynamically allocated DSM segments will be
shared by all PostgreSQL processes?



On 01/05/2014 08:50 PM, Robert Haas wrote:

On Sat, Jan 4, 2014 at 3:27 PM, knizhnik knizh...@garret.ru wrote:

1. I want IMCS to work with PostgreSQL versions not supporting DSM (dynamic
shared memory), like 9.2, 9.3.1,...

Yeah.  If it's loaded at postmaster start time, then it can work with
any version.  On 9.4+, you could possibly make it work even if it's
loaded on the fly by using the dynamic shared memory facilities.
However, there are currently some limitations to those facilities that
make some things you might want to do tricky.  There are pending
patches to lift some of these limitations.


2. IMCS is using the PostgreSQL hash table implementation (ShmemInitHash,
hash_search,...)
Maybe I missed something - I only just noticed DSM and have had no chance to
investigate it, but it looks like a hash table cannot be allocated in DSM...

It wouldn't be very difficult to write an analog of ShmemInitHash() on
top of the dsm_toc patch that is currently pending.  A problem,
though, is that it's not currently possible to put LWLocks in dynamic
shared memory, and even spinlocks will be problematic if
--disable-spinlocks is used.  I'm due to write a post about these
problems; perhaps I should go do that.


3. IMCS is allocating memory using ShmemAlloc. In case of using DSM I have
to provide own allocator (although creation of non-releasing memory
allocator should not be a big issue).

The dsm_toc infrastructure would solve this problem.


4. Current implementation of DSM still suffers from 256Gb problem. Certainly
I can create multiple segments and so provide workaround without using huge
pages, but it complicates allocator.

So it sounds like DSM should also support huge pages somehow.  I'm not
sure what that should look like.


5. I wonder, if I dynamically add a new DSM segment, will it be available to
other PostgreSQL processes? For example, I run a query which loads data into IMCS
and so needs more space and allocates a new DSM segment. Then another query is
executed by another PostgreSQL process which tries to access this data. This
process is not forked from the process that created the new DSM segment, so I do
not understand how this segment will be mapped into the address space of this
process, preserving the address... Certainly I can prohibit dynamic extension of
IMCS storage (hoping that in this case there will be no such problem with
DSM). But in this case we will lose the main advantage of using DSM instead
of the old scheme of a plugin's private shared memory.

You can definitely dynamically add a new DSM segment; that's the point
of making it *dynamic* shared memory.  What's a bit tricky as things
stand today is making sure that it sticks around.  The current model
is that the DSM segment is destroyed when the last process unmaps it.
It would be easy enough to lift that limitation on systems other than
Windows; we could just add a dsm_keep_until_shutdown() API or
something similar.  But on Windows, segments are *automatically*
destroyed *by the operating system* when the last process unmaps them,
so it's not quite so clear to me how we can allow it there.  The main
shared memory segment is no problem because the postmaster always has
it mapped, even if no one else does, but that doesn't help for dynamic
shared memory segments.


6. IMCS has some configuration parameters which have to be set through
postgresql.conf. So in any case the user has to edit the postgresql.conf file.
In case of using DSM

Re: [HACKERS] [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL

2014-01-05 Thread james


On 05/01/2014 18:02, Robert Haas wrote:

On Sun, Jan 5, 2014 at 12:34 PM, james ja...@mansionfamily.plus.com wrote:

On 05/01/2014 16:50, Robert Haas wrote:

  But on Windows, segments are *automatically*
destroyed *by the operating system* when the last process unmaps them,
so it's not quite so clear to me how we can allow it there.  The main
shared memory segment is no problem because the postmaster always has
it mapped, even if no one else does, but that doesn't help for dynamic
shared memory segments.

Surely you just need to DuplicateHandle into the parent process?  If you
want to (tidily) dispose of it at some time, then you'll need to tell the
postmaster that you have done so and what the handle is in its process,
but if you just want it to stick around, then you can just pass it up.

Uh, I don't know, maybe?  Does the postmaster have to do something to
receive the duplicated handle


In principle, no, so long as the child has a handle to the parent process
that has the appropriate permissions.  Given that these processes have a
parent/child relationship that shouldn't be too hard to arrange.

, or can the child just throw it over the
wall to the parent and let it rot until the postmaster finally exits?

Yes.  Though it might be a good idea to record the handle somewhere (perhaps
in a table) so that any potential issues from an insane system spamming the
postmaster with handles are apparent.

I'm intrigued - how are the handles shared between children that are peers
in the current scheme?  Some handle transfer must already be in place.

Could you share the handles to an immortal worker if you want to reduce any
potential impact on the postmaster?

The latter would be nicer for our purposes, perhaps, as running more
code from within the postmaster is risky for us.  If a regular backend
process dies, the postmaster will restart everything and the database
will come back on line, but if the postmaster itself dies, we're hard
down.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL

2014-01-05 Thread Robert Haas

On Sun, Jan 5, 2014 at 1:28 PM, knizhnik knizh...@garret.ru wrote:
 From my point of view it is not a big problem that it is not possible to
 place LWLock in DSM.
 I can allocate LWLocks in standard way - using RequestAddinLWLocks and use
 them for synchronization.

Sure, well, that works fine if you're being loaded from
shared_preload_libraries.  If you want to be able to load the
extension after startup time, though, it's no good.

 And what I still do not completely understand is how DSM ensures that
 a segment created by one PostgreSQL process will be mapped to the same
 virtual memory address in all other PostgreSQL processes.

It doesn't.  One process calls dsm_create() to create a shared memory
segment.  Other processes call dsm_attach() to attach it.  There's no
guarantee that they'll map it at the same address; they'll just map it
somewhere.

 Or maybe DSM doesn't guarantee that a DSM segment is mapped to the same
 address in all processes?
 In that case it significantly complicates DSM usage: it will not be possible
 to use direct pointers.

Yeah, that's where we're at.
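
One common way to live with this is to store offsets from the start of
the segment instead of raw pointers, and to turn an offset into a pointer
only after each process has attached.  The following is a minimal,
self-contained sketch in plain C, not PostgreSQL code: seg_off,
seg_header, seg_init, seg_push, seg_sum, seg_ptr and seg_demo are
hypothetical names invented for the illustration, and the "segment" is
just a heap buffer standing in for whatever address dsm_attach() happened
to return in a given process.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Offset from the start of the segment; 0 plays the role of NULL. */
typedef uint32_t seg_off;

/* A list node that lives inside the segment and links by offset. */
typedef struct node
{
    int     value;
    seg_off next;       /* offset of the next node, or 0 at the end */
} node;

/* Header kept at offset 0, so no node can ever have offset 0. */
typedef struct seg_header
{
    seg_off head;       /* offset of the first node, or 0 if empty */
    seg_off used;       /* bytes handed out so far, header included */
} seg_header;

/* Turn an offset into a pointer valid for this particular mapping. */
static void *
seg_ptr(void *base, seg_off off)
{
    return off == 0 ? NULL : (char *) base + off;
}

/* Prepare an empty segment; the buffer must hold at least the header. */
static void
seg_init(void *base)
{
    seg_header *hdr = base;

    hdr->head = 0;
    hdr->used = sizeof(seg_header);
}

/* Push a value on the front of the list; returns 0 when out of space. */
static int
seg_push(void *base, size_t size, int value)
{
    seg_header *hdr = base;
    node       *n;

    if (hdr->used + sizeof(node) > size)
        return 0;
    n = (node *) ((char *) base + hdr->used);
    n->value = value;
    n->next = hdr->head;
    hdr->head = hdr->used;
    hdr->used += sizeof(node);
    return 1;
}

/* Sum the list through whatever base address the caller has. */
static long
seg_sum(void *base)
{
    seg_header *hdr = base;
    long        total = 0;
    node       *n;

    for (n = seg_ptr(base, hdr->head); n != NULL; n = seg_ptr(base, n->next))
        total += n->value;
    return total;
}

/*
 * Build a list in one buffer, copy the bytes to a second buffer at a
 * different address (standing in for another process's mapping), wipe
 * the first, and walk the copy.
 */
static long
seg_demo(void)
{
    enum { SEG_SIZE = 256 };
    void *a = calloc(1, SEG_SIZE);
    void *b = calloc(1, SEG_SIZE);
    long  total;

    assert(a != NULL && b != NULL);
    seg_init(a);
    seg_push(a, SEG_SIZE, 1);
    seg_push(a, SEG_SIZE, 2);
    seg_push(a, SEG_SIZE, 39);
    memcpy(b, a, SEG_SIZE);
    memset(a, 0, SEG_SIZE);
    total = seg_sum(b);
    free(a);
    free(b);
    return total;
}
```

The walk over the copy still works because nothing stored in the buffer
depends on where the buffer lives.  The cost is that every dereference
goes through seg_ptr(), and existing code that keeps raw pointers in
shared memory would have to be converted.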

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL

2014-01-05 Thread Robert Haas

On Sun, Jan 5, 2014 at 1:44 PM, james ja...@mansionfamily.plus.com wrote:
 I'm intrigued - how are the handles shared between children that are peers
 in the current scheme?  Some handle transfer must already be in place.

That's up to the application.  After calling dsm_create(), you call
dsm_segment_handle() to get the 32-bit integer handle for that
segment.  Then you have to get that to the other process(es) somehow.
If you're trying to share a handle with a background worker, you can
stuff it in bgw_main_arg.  Otherwise, you'll probably need to store it
in the main shared memory segment, or a file, or whatever.

 Could you share the handles to an immortal worker if you want to reduce any
 potential impact on the postmaster?

You could, but this seems like thin justification for spawning another
process, and how immortal is that worker really?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL

2014-01-05 Thread Amit Kapila

On Sun, Jan 5, 2014 at 11:04 PM, james ja...@mansionfamily.plus.com wrote:
 On 05/01/2014 16:50, Robert Haas wrote:

  But on Windows, segments are *automatically*
 destroyed *by the operating system* when the last process unmaps them,
 so it's not quite so clear to me how we can allow it there.  The main
 shared memory segment is no problem because the postmaster always has
 it mapped, even if no one else does, but that doesn't help for dynamic
 shared memory segments.

 Surely you just need to DuplicateHandle into the parent process?

   Ideally DuplicateHandle should work, but while going through the Windows
   internals of the shared memory functions at the link below, I observed
   that they mention it will work for a child process.
   http://msdn.microsoft.com/en-us/library/ms810613.aspx
   Refer to the section "Inheriting and duplicating memory-mapped file
   object handles".

  If you
 want to (tidily) dispose of it at some time, then you'll need to tell the
 postmaster that you have done so and what the handle is in its process,
 but if you just want it to stick around, then you can just pass it up.

DuplicateHandle should work, but we need to communicate the handle
to the other process using IPC.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL

2014-01-04 Thread David Fetter

I'm sorry I misunderstood about the extension you wrote.

Is there some way not to use shared memory for it?

Cheers,
David.


-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: [HACKERS] [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL

2014-01-04 Thread knizhnik


On 01/04/2014 12:05 PM, David Fetter wrote:

I'm sorry I misunderstood about the extension you wrote.

Is there some way not to use shared memory for it?


No, IMCS (In-Memory Columnar Store) stores its data in shared memory.
Certainly I could allocate shared memory myself, but for portability
and ease of maintenance I decided to reuse the PostgreSQL shared memory
mechanism. The only requirement is that the IMCS extension (like the
pg_stat_statements extension) must be included in the
shared_preload_libraries list in postgresql.conf.

IMCS memory does not interleave in any way with the shared memory used for
PostgreSQL shared buffers.
The only limitation is the 256Gb limit on Linux, which can be
resolved using the patch included in the IMCS distribution.







Re: [HACKERS] [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL

2014-01-04 Thread Tom Lane

knizhnik knizh...@garret.ru writes:
 On 01/04/2014 12:05 PM, David Fetter wrote:
 Is there some way not to use shared memory for it?

 No, IMCS (In-Memory Columnar Store) is storing data in shared memory.

It would probably be better if it made use of the dynamic shared memory
features that exist in HEAD.

regards, tom lane



Re: [HACKERS] [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL

2014-01-04 Thread knizhnik


On 01/04/2014 11:11 PM, Tom Lane wrote:

knizhnik knizh...@garret.ru writes:

On 01/04/2014 12:05 PM, David Fetter wrote:

Is there some way not to use shared memory for it?

No, IMCS (In-Memory Columnar Store) is storing data in shared memory.

It would probably be better if it made use of the dynamic shared memory
features that exist in HEAD.

regards, tom lane


Thank you, I will try it.
But I have some concerns:

1. I want IMCS to work with PostgreSQL versions not supporting DSM 
(dynamic shared memory), like 9.2, 9.3.1,...


2. IMCS is using the PostgreSQL hash table implementation (ShmemInitHash, 
hash_search,...)
Maybe I missed something - I only just noticed DSM and have had no chance to 
investigate it, but it looks like a hash table cannot be allocated in DSM...


3. IMCS is allocating memory using ShmemAlloc. If I use DSM I 
have to provide my own allocator (although creating a non-releasing memory 
allocator should not be a big issue).


4. The current implementation of DSM still suffers from the 256Gb problem. 
Certainly I can create multiple segments and so provide a workaround 
without using huge pages, but that complicates the allocator.


5. I wonder, if I dynamically add a new DSM segment, will it be available 
to other PostgreSQL processes? For example, I run a query which loads data 
into IMCS and so needs more space and allocates a new DSM segment. Then 
another query is executed by another PostgreSQL process which tries to 
access this data. This process is not forked from the process that created 
the new DSM segment, so I do not understand how this segment will be 
mapped into the address space of this process, preserving the address... 
Certainly I can prohibit dynamic extension of IMCS storage (hoping that 
in this case there will be no such problem with DSM). But in this case 
we will lose the main advantage of using DSM instead of the old scheme of 
a plugin's private shared memory.


6. IMCS has some configuration parameters which have to be set through 
postgresql.conf. So in any case the user has to edit the postgresql.conf file.
With DSM it would not be necessary to add IMCS to the 
shared_preload_libraries list. But I do not think that is such a 
restrictive and critical requirement, is it?
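
To make point 3 concrete, a non-releasing allocator over one fixed region
can be as small as the sketch below.  This is only an illustration under
stated assumptions, not IMCS or PostgreSQL code: arena_header,
arena_init, arena_alloc, arena_align_up and ARENA_ALIGN are hypothetical
names, the 8-byte alignment is an assumption, and there is no locking, so
concurrent callers would need their own lock around arena_alloc().  It
returns offsets rather than pointers so that the result does not depend
on where the region is mapped.

```c
#include <assert.h>
#include <stddef.h>

/* Bookkeeping stored at the very start of the region. */
typedef struct arena_header
{
    size_t size;        /* total bytes in the region, header included */
    size_t used;        /* bytes handed out so far, header included */
} arena_header;

#define ARENA_ALIGN 8   /* assumed alignment for every block */

/* Round n up to the next multiple of ARENA_ALIGN. */
static size_t
arena_align_up(size_t n)
{
    return (n + ARENA_ALIGN - 1) & ~(size_t) (ARENA_ALIGN - 1);
}

/* Initialise a region of `size` bytes starting at `base`. */
static void
arena_init(void *base, size_t size)
{
    arena_header *hdr = base;

    assert(size >= sizeof(arena_header));
    hdr->size = size;
    hdr->used = arena_align_up(sizeof(arena_header));
}

/*
 * Hand out nbytes from the region.  Returns the offset of the new block
 * from `base`, or 0 if the region cannot satisfy the request.  Blocks are
 * never returned to the region.
 */
static size_t
arena_alloc(void *base, size_t nbytes)
{
    arena_header *hdr = base;
    size_t        start = hdr->used;
    size_t        need;

    /* Reject requests that cannot fit before rounding, to avoid overflow. */
    if (start > hdr->size || nbytes > hdr->size - start)
        return 0;
    need = arena_align_up(nbytes);
    if (need > hdr->size - start)
        return 0;
    hdr->used = start + need;
    return start;
}
```

A caller would run arena_init() once in the process that creates the
region, and afterwards every process would call arena_alloc() with its
own base address and add the returned offset to that base.  Spreading one
arena over several segments, as in point 4, would need an extra segment
index next to the offset, which is where the extra complexity comes from.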





Re: [HACKERS] [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL

2014-01-03 Thread David Fetter

On Thu, Jan 02, 2014 at 08:48:24PM +0400, knizhnik wrote:
 I want to announce implementation of In-Memory Columnar Store
 extension for PostgreSQL.
 Vertical representation of data is stored in PostgreSQL shared memory.

Thanks for the hard work!

I noticed a couple of things about this that probably need some
improvement.

1.  There are unexplained patches against other parts of PostgreSQL,
which means that they may break other parts of PostgreSQL in equally
inexplicable ways.  Please rearrange the patch so it doesn't require
this.  This leads to:

2.  The add-on is not formatted as an EXTENSION, which would allow
people to add it or remove it cleanly.

Would you be so kind as to fix these?

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: [HACKERS] [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL

2014-01-03 Thread knizhnik


Hi David,

Sorry, but I do not completely understand your suggestions:

1. IMCS does contain a single patch file, sysv_shmem.patch.
Applying this patch is not mandatory for using IMCS: it just solves the 
problem of supporting more than 256Gb of shared memory.
Right now PostgreSQL is not able to use more than 256Gb of shared buffers 
on Linux with standard 4kb pages.

I have found proposal for using MAP_HUGETLB flag in commit fest:

http://www.postgresql.org/message-id/20131125032920.ga23...@toroid.org

but unfortunately it was rejected. Huge pages are used intensively by 
Oracle and I think that they will be useful for improving the performance of 
PostgreSQL. So not just IMCS can benefit from this patch. My patch is 
much simpler - I deliberately limited the scope of this patch to one file. 
Certainly, switching huge TLB on/off should be done through the postgresql.conf 
configuration file.


In any case - IMCS can be used without this patch: you just cannot 
use more than 256Gb of memory, even if your system has more RAM.


2. I do not understand "The add-on is not formatted as an EXTENSION".
IMCS was created as a standard extension - I just looked at the examples of 
other PostgreSQL extensions included in the PostgreSQL distribution
(for example pg_stat_statements). It can be added using the "create 
extension imcs" command and removed using the "drop extension imcs" command.


If there are some violations of the PostgreSQL extension rules, please let 
me know and I will fix them.

But I thought that I had done everything in the proper way.









