Re: [PERFORM] Benchmark Data requested --- pgloader CE design ideas
On Thursday, February 7, 2008, Greg Smith wrote:
> On Wednesday, February 6, 2008, Dimitri Fontaine wrote:
>> In other cases, a logical line is a physical line, so we start after
>> the first newline met from the given lseek start position, and
>> continue reading after the last lseek position until a newline.
>
> Now you're talking. Find a couple of split points that way, fine-tune
> the boundaries a bit so they rest on line termination points, and off
> you go.

I was thinking of not even reading the file content from the controller
thread: just decide the splitting points in bytes (0..ST_SIZE/4,
ST_SIZE/4+1..2*ST_SIZE/4, etc.) and let each reading thread fine-tune by
beginning to process input only after having read its first newline, etc.

And while we're still at the design board, I'm also thinking of adding a
per-section parameter (with a global default value possible),
split_file_reading, which defaults to False and which you'll have to set
to True for pgloader to behave the way we're talking about.

When split_file_reading = False and section_threads != 1, pgloader will
have to manage several processing threads per section but only one
file-reading thread, handing the read input to the processing threads in
a round-robin fashion. In the future the choice of processing thread may
(another knob) be smarter than that, as soon as we get CE support into
pgloader.

When split_file_reading = True and section_threads != 1, pgloader will
have to manage several processing threads per section, each one
responsible for reading its own part of the file, with the processing
boundaries discovered at reading time. Adding CE support in this case
means managing two separate thread pools per section: one responsible for
split file reading and another responsible for data buffering and routing
(COPY to a partition instead of to the parent table).

In both cases, maybe it would also be useful for pgloader to have a
separate thread for COPYing the buffer to the server, allowing it to
continue preparing the next buffer in the meantime?

This will need some re-architecturing of pgloader, but it seems worth it
(I'm not entirely sold on the two-thread-pools idea, though, and this
last continue-reading-while-copying idea still has to be examined).

Some of the work needing to be done is by now quite clear to me, but a
part of it still needs its design-time share. As usual, though, the real
hard part is knowing what exactly we want to get done, and we're showing
good progress here :)

Greg's behaviour:
  max_threads = N
  max_parallel_sections = 1
  section_threads = -1
  split_file_reading = True

Simon's behaviour:
  max_threads = N
  max_parallel_sections = 1   # I don't think Simon wants parallel sections
  section_threads = -1
  split_file_reading = False

Comments?
--
dim
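[Editorial sketch] For concreteness, here is a minimal Python sketch of
that byte-splitting scheme (Python being pgloader's language). The
ownership rule is an assumption that makes the chunks cover the file
exactly once without any inter-thread feedback: a line belongs to the
chunk its first byte falls in, so every reader but the first discards the
line it lands on, and every reader finishes the line that straddles its
end boundary.

    import os

    def split_points(path, nthreads):
        # Plain byte offsets, as above: 0..ST_SIZE/4,
        # ST_SIZE/4+1..2*ST_SIZE/4, and so on.
        size = os.stat(path).st_size
        step = size // nthreads
        return [(i * step, size if i == nthreads - 1 else (i + 1) * step)
                for i in range(nthreads)]

    def read_chunk(path, start, end):
        # Yield the logical lines owned by the byte range [start, end).
        with open(path, 'rb') as f:
            f.seek(start)
            if start > 0:
                f.readline()        # partial line: the previous reader owns it
            while f.tell() <= end:  # <= : finish the line straddling `end`
                line = f.readline()
                if not line:
                    break           # EOF
                yield line

Because both sides of each boundary apply the same test, the chunks come
out disjoint and complete, which answers the feedback objection raised in
the messages below.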
Re: [PERFORM] Benchmark Data requested --- pgloader CE design ideas
On Thu, Feb 07, 2008 at 12:06:42PM -0500, Greg Smith wrote:
> On Thu, 7 Feb 2008, Dimitri Fontaine wrote:
>> I was thinking of not even reading the file content from the
>> controller thread: just decide the splitting points in bytes
>> (0..ST_SIZE/4, ST_SIZE/4+1..2*ST_SIZE/4, etc.) and let each reading
>> thread fine-tune by beginning to process input only after having read
>> its first newline, etc.
>
> The problem I was pointing out is that if chunk#2 moved forward a few
> bytes before it started reading in search of a newline, how will
> chunk#1 know that it's supposed to read up to that further point? You
> have to stop #1 from reading further when it catches up with where #2
> started. Since the start of #2 is fuzzy until some reading is done,
> what you're describing will need #2 to send some feedback to #1 after
> they've both started, and that sounds bad to me. I like designs where
> the boundaries between threads are clearly defined before any of them
> start and none of them ever talk to the others.

As long as both processes understand the start condition, there is not a
problem. p1 starts at the beginning and processes through the chunk2
offset until it reaches the start condition. p2 starts loading from the
chunk2 offset plus the amount needed to reach the start condition, ...

  DBfile   |---------|--x------|----x----|--x------|
            chunk1------^ chunk2-----^ chunk3--^ ...

  (| = nominal chunk offsets; x = first start condition -- e.g. a
  newline -- found at or after each offset; each piece runs from one x
  to the next)

As long as both pieces use the same test, they will each process
non-overlapping segments of the file and still process 100% of the file.

Ken

>> In both cases, maybe it would also be useful for pgloader to have a
>> separate thread for COPYing the buffer to the server, allowing it to
>> continue preparing the next buffer in the meantime?
>
> That sounds like a V2.0 design to me. I'd only chase after that level
> of complexity if profiling suggests that's where the bottleneck really
> is.
>
> --
> * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [PERFORM] Benchmark Data requested --- pgloader CE design ideas
On Thu, 7 Feb 2008, Dimitri Fontaine wrote:
> I was thinking of not even reading the file content from the controller
> thread: just decide the splitting points in bytes (0..ST_SIZE/4,
> ST_SIZE/4+1..2*ST_SIZE/4, etc.) and let each reading thread fine-tune
> by beginning to process input only after having read its first newline,
> etc.

The problem I was pointing out is that if chunk#2 moved forward a few
bytes before it started reading in search of a newline, how will chunk#1
know that it's supposed to read up to that further point? You have to
stop #1 from reading further when it catches up with where #2 started.
Since the start of #2 is fuzzy until some reading is done, what you're
describing will need #2 to send some feedback to #1 after they've both
started, and that sounds bad to me. I like designs where the boundaries
between threads are clearly defined before any of them start and none of
them ever talk to the others.

> In both cases, maybe it would also be useful for pgloader to have a
> separate thread for COPYing the buffer to the server, allowing it to
> continue preparing the next buffer in the meantime?

That sounds like a V2.0 design to me. I'd only chase after that level of
complexity if profiling suggests that's where the bottleneck really is.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [PERFORM] Benchmark Data requested --- pgloader CE design ideas
>> I was thinking of not even reading the file content from the
>> controller thread: just decide the splitting points in bytes
>> (0..ST_SIZE/4, ST_SIZE/4+1..2*ST_SIZE/4, etc.) and let each reading
>> thread fine-tune by beginning to process input only after having read
>> its first newline, etc.
>
> The problem I was pointing out is that if chunk#2 moved forward a few
> bytes before it started reading in search of a newline, how will
> chunk#1 know that it's supposed to read up to that further point? You
> have to stop #1 from reading further when it catches up with where #2
> started. Since the start of #2 is fuzzy until some reading is done,
> what you're describing will need #2 to send some feedback to #1 after
> they've both started, and that sounds bad to me. I like designs where
> the boundaries between threads are clearly defined before any of them
> start and none of them ever talk to the others.

I don't think that any communication is needed beyond the beginning of
the threads. Each thread knows that it should start at byte offset X and
end at byte offset Y, but if Y happens to be in the middle of a record
then it just keeps going until the end of the record. As long as the
algorithm for reading past the end marker is the same as the algorithm
for skipping past the beginning marker, then all is well.

-- Mark Lewis
Re: [PERFORM] Benchmark Data requested --- pgloader CE design ideas
On Thu, 7 Feb 2008, Greg Smith wrote:
> The problem I was pointing out is that if chunk#2 moved forward a few
> bytes before it started reading in search of a newline, how will
> chunk#1 know that it's supposed to read up to that further point? You
> have to stop #1 from reading further when it catches up with where #2
> started. Since the start of #2 is fuzzy until some reading is done,
> what you're describing will need #2 to send some feedback to #1 after
> they've both started, and that sounds bad to me.

It doesn't have to be fuzzy at all. Both threads will presumably be able
to use the same algorithm to work out where the boundary is; therefore
they'll get the same result. No need to pass back information.

Matthew

--
There is something in the lecture course which may not have been visible
so far, which is reality
    -- Computer Science Lecturer
Re: [PERFORM] Benchmark Data requested --- pgloader CE design ideas
On Wed, 2008-02-06 at 12:27 +0100, Dimitri Fontaine wrote:
> Multi-Threading behavior and CE support
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>
> Now, pgloader will be able to run N threads, each one loading some data
> to a partitioned child-table target. N will certainly be configured
> depending on the number of server cores and not depending on the number
> of partitions...
>
> So what do we do when reading a tuple we want to store in a partition
> which has no dedicated thread started yet, and we already have N
> threads running? I'm thinking about some LRU(Thread) scheme to choose a
> thread to terminate (launch COPY with the current buffer and quit) and
> start a new one for the current partition target.

For me, it would be good to see a --parallel=n parameter that would allow
pg_loader to distribute rows in a round-robin manner to n different
concurrent COPY statements, i.e. a non-routing version. Making that work
well, whilst continuing to do error handling, seems like a challenge, but
a very useful goal.

Adding intelligence to the row distribution may be technically hard but
may also simply move the bottleneck onto pg_loader. We may need multiple
threads in pg_loader, or we may just need multiple sessions from
pg_loader. Experience from doing the non-routing parallel version may
help in deciding whether to go for the routing version.

--
Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com
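[Editorial sketch] A minimal sketch of that non-routing --parallel=n
idea, in Python (pgloader's language); copy_worker is a hypothetical
callable that drains one queue into its own connection/COPY statement and
stops when it sees the None sentinel.

    import itertools
    import queue
    import threading

    def round_robin_load(lines, n, copy_worker):
        # One reader, n COPY workers; rows are dealt out round-robin.
        queues = [queue.Queue(maxsize=10000) for _ in range(n)]
        workers = [threading.Thread(target=copy_worker, args=(q,))
                   for q in queues]
        for w in workers:
            w.start()
        for q, line in zip(itertools.cycle(queues), lines):
            q.put(line)
        for q in queues:
            q.put(None)             # sentinel: no more rows for this worker
        for w in workers:
            w.join()

Per-row error handling stays with whichever worker owns the row, which is
what makes "whilst continuing to do error handling" the hard part.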
Re: [PERFORM] Benchmark Data requested --- pgloader CE design ideas
Hi,

I've been thinking about this topic some more, and as I don't know when
I'll be able to go and implement it, I want to publish the ideas here.
This way I'll be able to find them again :)

On Tuesday, February 5, 2008, Dimitri Fontaine wrote:
> On Tuesday, February 5, 2008, Simon Riggs wrote:
>> Much better than triggers and rules, but it will be hard to get it to
>> work.
>
> Well, I'm thinking about providing a somewhat modular approach where
> pgloader code is able to recognize CHECK constraints, load a module
> registered to the operator and data types, then use it.

Here's how I think I'm going to implement it:

User level configuration
-=-=-=-=-=-=-=-=-=-

At the user level, you will have to add a constraint_exclusion = on
parameter to a pgloader section configuration for it to bother checking
whether the destination table has some children, etc. You'll also need to
provide a global ce_path parameter (where to find user Python constraint
exclusion modules) and a ce_modules parameter for each section where
constraint_exclusion = on:

  ce_modules = columnA:module:class, columnB:module:class

As the ce_path could point to any number of modules, where a single type
may be supported by several modules, I'll let the user choose which
module to use.

Constraint exclusion modules
-=-=-=-=-=-=-=-=-=-=-=-=-

The modules will provide one or several classes (kind of a packaging
issue), each of which will have to register which datatypes and operators
it knows about. Here's some pseudo-code of a module, which certainly is
the best way to express a code design idea:

  class MyCE:
      def __init__(self, operator, constant, cside='r'):
          # CHECK ( col operator constant )  <=>  cside = 'r'; could be 'l'
          ...

      @classmethod
      def support_type(cls, type):
          return type in ['integer', 'bigint', 'smallint', 'real', 'double']

      @classmethod
      def support_operator(cls, op):
          return op in ['=', '<', '>', '<=', '>=', '%']

      def check(self, op, data):
          if op == '>':
              return self.gt(data)
          ...

      def gt(self, data):
          if self.cside == 'l':
              return self.constant > data
          elif self.cside == 'r':
              return data > self.constant

This way pgloader will be able to support any datatype (user datatypes
like IP4R included) and operator (@@, ~= or whatever). For pgloader to
handle a CHECK() constraint, though, it'll have to be configured to use a
CE class supporting the operators and datatypes used.

PGLoader constraint exclusion support
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

The CHECK() constraint being a tree of check expressions[*] linked by
logical operators, pgloader will have to build some logic tree of MyCE
(user CE module) instances and evaluate all the checks in order to be
able to choose the partition for an input line.

[*]: check((a % 10) = 1) makes an expression tree containing 2 check
nodes.

After having parsed pg_constraint.consrc (not conbin, which seems too
much of an internal dump to use from user code) and built a CHECK tree
for each partition, pgloader will try to decide whether it's dealing with
range partitioning (the most common case). If each partition's CHECK tree
is AND(a >= b, a < c) or a variation of it, we have range partitioning.
Then surely we can optimize the code run to choose the partition to COPY
data to and still use the module's operator implementation, e.g. by
making a binary search on a tree of partition limits.

If you want some other widely used (or not) partitioning scheme to be
recognized and optimized by pgloader, just tell me and we'll see about
it :) Having this step as a user module seems overkill at the moment,
though.
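[Editorial sketch] As an illustration of the range-partitioning shortcut
above, a hypothetical partition chooser (all names invented) doing the
binary search over the ordered lower limits extracted from each
partition's CHECK(a >= lower AND a < upper) tree:

    import bisect

    def choose_partition(value, lowers, partitions):
        # lowers: sorted lower bounds, one per partition, in the same
        # order as partitions; the upper-bound check of the last
        # partition is omitted for brevity.
        i = bisect.bisect_right(lowers, value) - 1
        if i < 0:
            raise ValueError("no partition accepts %r" % value)
        return partitions[i]

    # choose_partition(15, [0, 10, 20], ['t_00', 't_10', 't_20']) -> 't_10'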
Multi-Threading behavior and CE support
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

Now, pgloader will be able to run N threads, each one loading some data
to a partitioned child-table target. N will certainly be configured
depending on the number of server cores and not depending on the number
of partitions...

So what do we do when reading a tuple we want to store in a partition
which has no dedicated thread started yet, and we already have N threads
running? I'm thinking about some LRU(Thread) scheme to choose a thread to
terminate (launch COPY with the current buffer and quit) and start a new
one for the current partition target. Hopefully there won't be such high
values of N that the LRU is a bad choice per se, and the input data won't
be so messy as to have to stop/start threads at each new line.

Comments welcome, regards,
--
dim
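[Editorial sketch] A minimal sketch of that LRU(Thread) bookkeeping in
Python; start_thread and flush_and_quit stand in for pgloader's real
thread machinery and are assumptions:

    from collections import OrderedDict

    class PartitionThreads:
        # At most n live loader threads, keyed by target partition; the
        # least recently used one is flushed and retired to make room.
        def __init__(self, n, start_thread):
            self.n = n
            self.start_thread = start_thread   # hypothetical factory
            self.threads = OrderedDict()       # LRU order: oldest first

        def get(self, partition):
            if partition in self.threads:
                self.threads.move_to_end(partition)  # most recently used
                return self.threads[partition]
            if len(self.threads) >= self.n:
                _, victim = self.threads.popitem(last=False)  # evict LRU
                victim.flush_and_quit()  # hypothetical: COPY buffer, stop
            thread = self.threads[partition] = self.start_thread(partition)
            return thread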
Re: [PERFORM] Benchmark Data requested --- pgloader CE design ideas
On Wednesday, February 6, 2008, Simon Riggs wrote:
> For me, it would be good to see a --parallel=n parameter that would
> allow pg_loader to distribute rows in a round-robin manner to n
> different concurrent COPY statements, i.e. a non-routing version.

What happens when you want at most N parallel threads and have several
sections configured: do you want pgloader to serialize section loading
(often there's one section per table; sometimes different sections target
the same table) but parallelise each section's loading?

I'm thinking we should have a global max_threads knob *and* a per-section
max_thread one if we want to go this way, but then multi-threaded
sections will somewhat fight against other sections (multi-threaded or
not) for threads to use. So I'll also add a parameter to configure how
many (max) sections to load in parallel at any time.

We'll then have (default values presented):

  max_threads = 1
  max_parallel_sections = 1
  section_threads = -1

The section_threads parameter would be overridable at the section level
but would need to stay <= max_threads (if not, it is discarded and a
warning issued). When section_threads is -1, pgloader tries to use the
highest number of threads possible, still within the max_threads global
limit.

If max_parallel_sections is -1, pgloader starts a new thread per each new
section, maxing out at max_threads, then waits for a thread to finish
before launching a new section loading.

If you have N max_threads and max_parallel_sections = section_threads =
-1, then we'll see some kind of a fight between new section threads and
in-section threads (the parallel non-routing COPY behaviour). But then
it's a user choice. Adding the constraint exclusion support to this would
not mess it up, but it'll only be of interest when section_threads != 1
and max_threads > 1.

> Making that work well, whilst continuing to do error handling, seems
> like a challenge, but a very useful goal.

Quick tests showed me the Python threading model allows for easy sharing
of objects between several threads, so I don't think I'll need to adjust
my reject code when going per-section multi-threaded. I'll just have to
use a semaphore object so rejects keep being handled one line at a time.
Not that complex, if reliable.

> Adding intelligence to the row distribution may be technically hard but
> may also simply move the bottleneck onto pg_loader. We may need
> multiple threads in pg_loader, or we may just need multiple sessions
> from pg_loader. Experience from doing the non-routing parallel version
> may help in deciding whether to go for the routing version.

If non-routing per-section multi-threading is a user request and not that
hard to implement (thanks to Python), that sounds like a good enough
reason for me to provide it :)

I'll keep you (and the list) informed as soon as I have the code to play
with.
--
dim
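[Editorial sketch] For illustration, those knobs might land in a pgloader
configuration file roughly like this (a sketch only: [pgsql] is the usual
pgloader global section, but the placement of these options is an
assumption, with the names taken from the message above):

    [pgsql]
    # global defaults
    max_threads = 8
    max_parallel_sections = 2

    [orders]
    # one loading section; -1 = as many threads as max_threads allows
    section_threads = -1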
Re: [PERFORM] Benchmark Data requested --- pgloader CE design ideas
On Wed, 6 Feb 2008, Simon Riggs wrote:
> For me, it would be good to see a --parallel=n parameter that would
> allow pg_loader to distribute rows in a round-robin manner to n
> different concurrent COPY statements, i.e. a non-routing version.

Let me expand on this. In many of these giant COPY situations the
bottleneck is plain old sequential I/O to a single process. You can
almost predict how fast the rows will load using dd. Having a process
that pulls rows in and distributes them round-robin is good, but it won't
crack that bottleneck.

The useful approaches I've seen for other databases all presume that the
data files involved are large enough that on big hardware, you can start
multiple processes running at different points in the file and beat
anything possible with a single reader. If I'm loading a TB file, odds
are good I can split that into 4 or more vertical pieces (say rows 1-25%,
25-50%, 50-75%, 75-100%), start 4 loaders at once, and get way more than
1 disk's worth of throughput reading. You have to play with the exact
number, because if you push the split too far you introduce seek slowdown
instead of improvements, but that's the basic design I'd like to see one
day. Parallel loading by itself isn't useful for the cases I'm thinking
about until something like this comes around.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [PERFORM] Benchmark Data requested --- pgloader CE design ideas
Hi Greg,

On 2/6/08 7:56 AM, Greg Smith [EMAIL PROTECTED] wrote:
> If I'm loading a TB file, odds are good I can split that into 4 or more
> vertical pieces (say rows 1-25%, 25-50%, 50-75%, 75-100%), start 4
> loaders at once, and get way more than 1 disk's worth of throughput
> reading. You have to play with the exact number, because if you push
> the split too far you introduce seek slowdown instead of improvements,
> but that's the basic design I'd like to see one day. Parallel loading
> by itself isn't useful for the cases I'm thinking about until something
> like this comes around.

Just load 4 relfiles. You have to be able to handle partial relfiles,
which changes the storage mgmt a bit, but the benefits are easier to
achieve.

- Luke
Re: [PERFORM] Benchmark Data requested --- pgloader CE design ideas
Greg Smith wrote:
> On Wed, 6 Feb 2008, Simon Riggs wrote:
>> For me, it would be good to see a --parallel=n parameter that would
>> allow pg_loader to distribute rows in a round-robin manner to n
>> different concurrent COPY statements, i.e. a non-routing version.
>
> Let me expand on this. In many of these giant COPY situations the
> bottleneck is plain old sequential I/O to a single process. You can
> almost predict how fast the rows will load using dd. Having a process
> that pulls rows in and distributes them round-robin is good, but it
> won't crack that bottleneck.
>
> The useful approaches I've seen for other databases all presume that
> the data files involved are large enough that on big hardware, you can
> start multiple processes running at different points in the file and
> beat anything possible with a single reader. If I'm loading a TB file,
> odds are good I can split that into 4 or more vertical pieces (say rows
> 1-25%, 25-50%, 50-75%, 75-100%), start 4 loaders at once, and get way
> more than 1 disk's worth of throughput reading. You have to play with
> the exact number, because if you push the split too far you introduce
> seek slowdown instead of improvements, but that's the basic design I'd
> like to see one day. Parallel loading by itself isn't useful for the
> cases I'm thinking about until something like this comes around.

Some food for thought here: most BI-type applications which do data
conversion/cleansing also tend to sort the data before it is loaded into
the database, so starting parallel loaders at totally different points
ruins that effort. A more pragmatic approach would be to interleave the
reads: if there are N parallel streams, each one is offset by 1 from the
others and jumps by N rows, so the seeks are pretty much narrowed down to
a few rows (ideally 1) instead of jumping 1/Nth of the file every time a
read happens.

For example, to replicate this with dd and see the impact, use a big file
with the seek option and various block sizes. Somebody out here once did
that test and showed that the seek time on the file being read is reduced
significantly; and depending on the file system (which unfortunately UFS
on Solaris does not do best by default), intelligent prefetching means
all the reads for the next stream will already be in memory.

Regards,
Jignesh
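[Editorial sketch] In Python, the interleaved scheme reads like this
minimal sketch: every stream still scans the whole file, but because the
N readers stay within a few rows of one another, most reads after the
first are served from the OS page cache:

    def strided_lines(path, stream_id, nstreams):
        # Stream k of N processes rows k, k+N, k+2N, ...
        with open(path, 'rb') as f:
            for i, line in enumerate(f):
                if i % nstreams == stream_id:
                    yield line

The trade-off against the byte-splitting design discussed elsewhere in
the thread is N redundant scans in exchange for keeping all streams close
together near the (sorted) head of the file.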
Re: [PERFORM] Benchmark Data requested --- pgloader CE design ideas
On Wednesday, February 6, 2008, Greg Smith wrote:
> If I'm loading a TB file, odds are good I can split that into 4 or more
> vertical pieces (say rows 1-25%, 25-50%, 50-75%, 75-100%), start 4
> loaders at once, and get way more than 1 disk's worth of throughput
> reading.

pgloader already supports starting at any input file line number, and
limiting itself to any number of reads:

  -C COUNT, --count=COUNT
      number of input lines to process
  -F FROMCOUNT, --from=FROMCOUNT
      number of input lines to skip

So you could already launch 4 pgloader processes with the same
configuration file but different command-line arguments. If there's
interest/demand, it's easy enough for me to add those parameters as file
configuration knobs too.

You still have to pay for client-to-server communication instead of
having the backend read the file locally, but now maybe we begin to
compete?

Regards,
--
dim
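[Editorial sketch] A sketch of that suggestion in Python; the
--from/--count spellings are quoted above, but the rest of the invocation
(the -c configuration flag, the trailing section-name argument) is an
assumption about the pgloader command line:

    import subprocess

    def launch_parallel(conffile, section, total_lines, n=4):
        # n pgloader processes over the same configuration, each given a
        # contiguous slice of the input via --from/--count.
        step = total_lines // n
        procs = []
        for i in range(n):
            count = step if i < n - 1 else total_lines - i * step
            procs.append(subprocess.Popen(
                ['pgloader', '-c', conffile,
                 '--from', str(i * step),
                 '--count', str(count),
                 section]))
        return [p.wait() for p in procs]

Note the caveat in the next message, though: -F/--from is implemented by
reading and skipping lines, so each extra process still pays a partial
sequential scan; the lseek()-based split discussed elsewhere in the
thread is what removes that cost.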
Re: [PERFORM] Benchmark Data requested --- pgloader CE design ideas
Improvements are welcome, but to compete in the industry, loading will
need to speed up by a factor of 100.

Note that the Bizgres loader already does many of these ideas, and it
sounds like pgloader does too.

- Luke

Msg is shrt cuz m on ma treo
Re: [PERFORM] Benchmark Data requested --- pgloader CE design ideas
On Wednesday, February 6, 2008, at 18:37:41, Dimitri Fontaine wrote:
> On Wednesday, February 6, 2008, Greg Smith wrote:
>> If I'm loading a TB file, odds are good I can split that into 4 or
>> more vertical pieces (say rows 1-25%, 25-50%, 50-75%, 75-100%), start
>> 4 loaders at once, and get way more than 1 disk's worth of throughput
>> reading.
>
> pgloader already supports starting at any input file line number, and
> limiting itself to any number of reads:

In fact, the -F option works by having pgloader read the given number of
lines but skip processing them, which is not at all what Greg is talking
about here, I think.

Plus, I think it would be easier for me to code some stat(), then
lseek(), then read() into the pgloader reader machinery than to change
the code architecture to support a separate thread for the file reader.

Greg, what would you think of a pgloader which splits file reading based
on the file size as given by stat (os.stat(file)[ST_SIZE]) and the number
of threads: we split into as many pieces as the section_threads section
config value.

This behaviour won't be available for sections where type = text and
field_count[*] is given, because in this case I don't see how pgloader
could reliably recognize the beginning of a new logical line and start
processing there. In other cases, a logical line is a physical line, so
we start after the first newline met from the given lseek start position,
and continue reading after the last lseek position until a newline.

[*]: http://pgloader.projects.postgresql.org/#_text_format_configuration_parameters

Comments?
--
dim
Re: [PERFORM] Benchmark Data requested --- pgloader CE design ideas
On Wednesday, February 6, 2008, at 18:49:56, Luke Lonergan wrote:
> Improvements are welcome, but to compete in the industry, loading will
> need to speed up by a factor of 100.

Oh, I meant competing with the internal COPY command instead of the \copy
one, not with the competition. AIUI, competing with the competition will
need some PostgreSQL internal improvements, which I'll let the -hackers
do :)

> Note that the Bizgres loader already does many of these ideas, and it
> sounds like pgloader does too.

We're talking about how to improve pgloader here :)
--
dim
Re: [PERFORM] Benchmark Data requested --- pgloader CE design ideas
On Wed, 6 Feb 2008, Dimitri Fontaine wrote:
> In fact, the -F option works by having pgloader read the given number
> of lines but skip processing them, which is not at all what Greg is
> talking about here, I think.

Yeah, that's not useful.

> Greg, what would you think of a pgloader which splits file reading
> based on the file size as given by stat (os.stat(file)[ST_SIZE]) and
> the number of threads: we split into as many pieces as the
> section_threads section config value.

Now you're talking. Find a couple of split points that way, fine-tune the
boundaries a bit so they rest on line termination points, and off you go.

Don't forget that the basic principle here implies you'll never know
until you're done just how many lines were really in the file. When
thread #1 is running against chunk #1, it will never have any idea what
line chunk #2 really started at until it reaches there, at which point
it's done and that information isn't helpful anymore. You have to stop
thinking in terms of lines for this splitting; all you can do is split
the file into useful byte sections and then count the lines within them
as you go. Anything else requires a counting scan of the file, and such a
sequential read is exactly what can't happen (especially not more than
once); it just takes too long.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD