Re: [HACKERS] Segmentation fault with HEAD.
Rune, this is a readline issue of some sort. Basically this seg fault happens when you quit and the .psql_history file changes. I don't really know why it happens or what the right solution is, but if you empty the contents of this file (or maybe delete it) then your seg fault will go away (until it shows up again in a few months or so :-) ). This is probably not THE proper solution, but it does the job. Alon.

On 11/14/06 4:49 AM, Rune Bromer [EMAIL PROTECTED] wrote:

Hi. I just compiled the version in HEAD with no problems. I connected to template1, ran some tests and quit. I got the following error:

template1=# \q
psql(15213) malloc: *** error for object 0x1804e00: incorrect checksum for freed object - object was probably modified after being freed, break at szone_error to debug
psql(15213) malloc: *** set a breakpoint in szone_error to debug
Segmentation fault

Now, I can't find a list of known bugs to compare against, so sorry if this is a known bug. If you need more information, please let me know how to record it, and I will post it as soon as possible. Rune

---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Segmentation fault with HEAD.
Thank you very much, I'll verify this tomorrow and return if the error persists. What led you to the conclusion that it was a readline error? I looked into it for a couple of hours, but as I'm new to the Postgres source code I couldn't find the error :)

For the life of me I can't remember! That was some time ago. It sure wasn't obvious to me - took some time. I think a few of my colleagues were having some issues with Apple's default readline and they had to install a 3rd-party readline. I didn't get around to doing that yet. Is your error on Mac OS?
[HACKERS] Unnecessary rescan for non scrollable holdable cursors
Hi, When persisting a holdable cursor at COMMIT time we currently choose to rewind the executor and re-scan the whole result set into the tuplestore in order to be able to scroll backwards later on. Then we reposition the cursor to the position we were at. However, unless I am missing something, this seems to be done always, even if the cursor is not scrollable. I suppose adding a simple conditional or two in PersistHoldablePortal() in portalcmds.c could save the rescan and avoid filling up the tuplestore with tuples that will never be looked at, in the case that we never want to scroll back. Anyway, definitely not critical, but it should save some time and space in those specific situations. Regards, Alon.
Re: [HACKERS] Copy From Insert UNLESS
Alon Goldshuv on Bizgres has been working on this as well. Maybe you could collaborate? Alon?

I would love to collaborate. The proposal is neat; however, I am not too excited about handling errors at such high granularity, as far as the user is concerned. I am more on the same line as Tom Lane's statement in Simon's thread (Practical error logging for very large COPY statements): the general problem that needs to be solved is to trap any error that occurs during attempted insertion of a COPY row, and instead of aborting the copy, record the data and the error message someplace else. Seen in that light, implementing a special path for uniqueness violations is pretty pointless.

But I definitely share your struggle to find a good way to handle those unique/FK constraints... Out of your current list of possible known solutions:
- Temporary table that filters out the evil tuples.
- BEFORE TRIGGER handling the tuple if the constraint of interest is violated.
- INSERT wrapped in a subtransaction.
- (Other variations)

I really don't like temporary tables (too much user intervention) or subtransactions (slow). I also don't like using pg_loader for that matter; although it's a nice tool, isolating errors with it for large data sets is impractical. I guess the BEFORE TRIGGER is the closest solution to what I would like to achieve. I think something can be done even without a trigger. We could trap any of the following:
- bad data (any error before the tuple can be created)
- domain constraints
- check constraints
- NOT NULL constraints

As far as UNIQUE goes, maybe there is a good way to do a bt scan against the index right before the simple_heap_insert call? Hopefully without too much code duplication. I am not too familiar with that code, so I don't have a very specific idea yet.
I don't know how much slower things will become with this extra scan (I would think it will still be simpler and faster than a subtransaction), but I figure that there is a price to pay if you want single-row error isolation. Otherwise, if the user wants to run COPY as it is currently (all data rows or nothing) they could still do it at the same speed using the current code path, bypassing the extra scan. Not sure this was very helpful, but these are my thoughts at the moment. Regards, Alon.
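The trap-and-record idea discussed above can be sketched in C. This is purely illustrative; load_rows, row_is_wellformed and the error log are hypothetical names invented for the sketch, not PostgreSQL code. A real implementation would sit inside COPY's per-row loop and record errors in an error table rather than an in-memory array:

```c
/*
 * Illustrative sketch of single-row error isolation: a malformed row is
 * recorded in an error log and skipped instead of aborting the whole
 * load.  All names here are hypothetical, not PostgreSQL APIs.
 */
#include <stdio.h>
#include <string.h>

#define MAX_ERRORS 16

typedef struct
{
    int  lineno;
    char msg[64];
} RowError;

static RowError errlog[MAX_ERRORS];
static int      nerrors = 0;

/* Pretend parser: a row is "bad" if it lacks the expected delimiter. */
static int
row_is_wellformed(const char *row)
{
    return strchr(row, '|') != NULL;
}

/* Returns the number of rows successfully "inserted". */
int
load_rows(const char *rows[], int nrows)
{
    int loaded = 0;

    for (int i = 0; i < nrows; i++)
    {
        if (!row_is_wellformed(rows[i]))
        {
            if (nerrors < MAX_ERRORS)
            {
                errlog[nerrors].lineno = i + 1;
                snprintf(errlog[nerrors].msg, sizeof(errlog[nerrors].msg),
                         "missing delimiter");
                nerrors++;
            }
            continue;           /* skip the bad row, keep loading */
        }
        loaded++;               /* a real COPY would form and insert the tuple */
    }
    return loaded;
}
```

The point is only to show the control flow: no subtransaction is involved for errors caught before the tuple is formed, so the overhead per good row is essentially zero.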
Re: [HACKERS] Libpq COPY optimization
I'll send it to -patches shortly. Alon.

On 1/23/06 10:20 PM, Bruce Momjian pgman@candle.pha.pa.us wrote: Can I get an updated patch for this?

Tom Lane wrote: Alon Goldshuv [EMAIL PROTECTED] writes: Please help me understand this better. It appears to me that when the client-backend pipe fills up, pqSendSome() consumes any incoming NOTICE/WARNING messages before waiting, which should prevent deadlock.

Hm, I had forgotten that the low-level pqSendSome routine does that. That makes the PQconsumeInput call in PQputCopyData redundant (or almost; see below). The parseInput call is still needed, because it's there to pull NOTICE messages out of the input buffer and get rid of them, rather than possibly having the input buffer grow to exceed memory. But when there's nothing for it to do, parseInput is cheap enough that there's no real need to bypass it. In short, if you just remove the PQconsumeInput call I think you'll find that it does what you want. The only case where it's helpful to have it there is if there's an incomplete message in the input buffer, as parseInput isn't quite so fast if it has to determine that the message is incomplete. Without the PQconsumeInput call, the incomplete-message state could persist for a long time, and you'd pay the parseInput overhead each time through PQputCopyData. However, that's certainly not the normal situation, so I think we could leave that case slightly pessimal. It's certainly true that that path in parseInput is a lot faster than a kernel call, so it'd still be better than it is now. regards, tom lane
Re: [HACKERS] Libpq COPY optimization
Tom, It's not enough better, because it will still deadlock given a sufficiently large message-to-send. I don't think you can postpone the clearing-input action until after all the data is sent. regards, tom lane Please help me understand this better. It appears to me that when the client-backend pipe fills up, pqSendSome() consumes any incoming NOTICE/WARNING messages before waiting, which should prevent deadlock. I'll look at the code again, maybe I missed something. Thx, Alon.
[HACKERS] Libpq COPY optimization
The following is a suggestion for optimizing the libpq COPY FROM call for better performance. I submitted a similar suggestion a while ago, but it wasn't safe enough. This one is better. It shows a pretty significant improvement while maintaining deadlock prevention. The change is localized to PQputCopyData, not requiring an alternate version of pqPutMsgEnd. The change is that before returning, PQputCopyData would process inbound NOTICEs *only if* the buffer was flushed (and drop the other PQconsumeInput/parseInput calls in PQputCopyData). At the end of the copy operation, libpq's caller will call PQputCopyEnd followed by PQgetResult. PQputCopyEnd flushes the buffer, but I don't think we need to add any PQconsumeInput/parseInput there, because any remaining inbound NOTICEs will be handled immediately afterward when libpq's caller calls PQgetResult. Alon.

[Attachment: interrupt-checking-patch-for-libpq.patch.txt]
[HACKERS] Libpq optimization
In the libpq COPY interface function PQputCopyData():

    /*
     * Check for NOTICE messages coming back from the server. Since the
     * server might generate multiple notices during the COPY, we have to
     * consume those in a reasonably prompt fashion to prevent the comm
     * buffers from filling up and possibly blocking the server.
     */
    if (!PQconsumeInput(conn))
        return -1;              /* I/O failure */
    parseInput(conn);

I moved this to a different location, just a bit further down, after the "is the output buffer full and are we ready to flush?" check in the same function:

    if ((conn->outBufSize - conn->outCount - 5) < nbytes)
    {
        /* here */
    }

As the code comment suggests, it is extremely important to consume incoming messages from the server to prevent deadlock. However, we should only worry about it before sending data out. Most calls to PQputCopyData don't actually send any data but just place it in the output buffer and return. Therefore we can perform the PQconsumeInput/parseInput right before flushing, instead of reading from the server every time we call PQputCopyData without sending anything (which happens probably 99% of the time). Right? Or am I missing something? This change improves COPY performance. thx, Alon.
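A self-contained model of the proposed behavior may help. MockConn, put_copy_data, drain_notices and flush_out are invented stand-ins for this sketch, not libpq's real internals; the point is that server input is drained only on the rare calls that actually need to flush:

```c
/*
 * Illustrative model of the proposed libpq change: pending server
 * notices are drained only when the output buffer is about to be
 * flushed, not on every put-data call.  All names are made up.
 */
#include <string.h>

#define OUT_BUF_SIZE 8192

typedef struct
{
    char out[OUT_BUF_SIZE];
    int  outCount;
    int  noticesDrained;    /* stands in for PQconsumeInput+parseInput calls */
    int  flushes;
} MockConn;

static void
drain_notices(MockConn *conn)
{
    conn->noticesDrained++;
}

static void
flush_out(MockConn *conn)
{
    conn->outCount = 0;     /* pretend the bytes went to the server */
    conn->flushes++;
}

/* Modeled loosely on PQputCopyData: buffer the bytes; only when a
 * flush is needed do we first drain pending NOTICEs, to keep both
 * directions of the pipe from filling up (deadlock prevention). */
int
put_copy_data(MockConn *conn, const char *data, int nbytes)
{
    if (conn->outCount + nbytes + 5 > OUT_BUF_SIZE)
    {
        drain_notices(conn);
        flush_out(conn);
    }
    memcpy(conn->out + conn->outCount, data, nbytes);
    conn->outCount += nbytes;
    return 1;
}
```

With 100-byte rows and an 8K buffer, roughly 1 call in 80 triggers a drain/flush; the other 79 just append to the buffer and return, which is where the performance gain comes from.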
Re: [HACKERS] Libpq optimization
Tom, "And, unfortunately, you've broken it. The pqFlush call visible in that routine is not the only place that may try to send data (see also pqPutMsgEnd)." You are right, thanks for pointing that out. Still, in pqPutMsgEnd we will be sending data only after 8K is reached, which is about once in 80 calls for a 100-byte row size... Alon.
[HACKERS] COPY FROM performance improvements
This is a second iteration of a previous thread that wasn't resolved a few weeks ago. I made some more modifications to the code to make it compatible with the current COPY FROM code, and it should be more agreeable this time. The main premise of the new code is that it improves the text data parsing speed by about 4-5x, resulting in total improvements that lie between 15% and 95% for data importing (higher-range gains will occur on large data rows without many columns, implying more parsing and less converting to internal format). This is done by replacing char-at-a-time parsing with buffered parsing, and also by using fast scan routines and a minimum amount of loading/appending into the line and attribute buffers. The new code passes both COPY regression tests (copy, copy2) and doesn't break any of the others. It also supports encoding conversions (thanks Peter and Tatsuo for your feedback) and the 3 line-end types. Having said that, using COPY with different encodings was only minimally tested. We are looking into creating new tests and hopefully adding them to the postgres regression suite one day if it's desired by the community. This new code improves the delimited data format parsing. BINARY and CSV will stay the same and will be executed separately for now (therefore there is some code duplication). In the future I plan to write improvements to the CSV path too, so that it will be executed without duplication of code. I am still missing support for data that uses COPY_OLD_FE (question: what are the use cases? When will it be used? Please advise). I'll send out the patch soon. It's basically there to show that there is a way to load data faster. In future releases of the patch it will be more complete and elegant. I'll appreciate any comments/advice. Alon.
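The buffered-parsing idea can be illustrated with a tiny, self-contained C sketch (count_lines_slow and count_lines_fast are hypothetical names invented here; the real patch operates on COPY's input, line and attribute buffers, and also handles escapes): a char-at-a-time loop versus a memchr-based fast scan that locates line boundaries in whole spans.

```c
/*
 * Sketch of the buffered-parsing idea: instead of examining one
 * character per loop iteration, scan for the next newline with memchr
 * and process whole line spans at once.  Simplified; no escape or
 * delimiter handling, just line-boundary detection.
 */
#include <string.h>

/* Count lines char-at-a-time (the old style of parsing). */
int
count_lines_slow(const char *buf, int len)
{
    int n = 0;

    for (int i = 0; i < len; i++)
        if (buf[i] == '\n')
            n++;
    return n;
}

/* Count lines with a fast scan routine (the buffered style).  memchr
 * is typically vectorized by the C library, so it examines many bytes
 * per iteration instead of one. */
int
count_lines_fast(const char *buf, int len)
{
    int         n = 0;
    const char *p = buf;
    const char *end = buf + len;

    while ((p = memchr(p, '\n', end - p)) != NULL)
    {
        n++;
        p++;
    }
    return n;
}
```

Both functions return the same answer; the difference is purely how many iterations (and per-character branches) are executed per input byte.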
Re: [HACKERS] NOLOGGING option, or ?
Bruce, Point taken. "Now, you can argue that a different escape should be possible, or that some other escape syntax could be used, but the existing mechanism is clearly 100% reliable when used properly and not broken." I think that having an option for another escape syntax (such as an ESCAPE AS clause in delimited-format (non-CSV) COPY) with a default of '\\' is a good compromise that will allow users to escape their data (as COPY currently does), or, by specifying another escape character, allow all of their backslashes to be treated as data. I'll start a new discussion about it in a new thread soon. Thx, Alon.
Re: [HACKERS] NOLOGGING option, or ?
2) A modified command syntax for introducing direct single-row error handling. By direct I mean: a row that is rejected from within the COPY command context does not throw an error and roll back the whole transaction. Instead the error is caught and recorded elsewhere, maybe in some error table, with some more information that can later on be retrieved. The following rows continue to be processed. This way there is barely any error handling overhead.

Is there any idea on exactly how this would be done? Do you plan on using savepoints to implement it? I fail to see how this is barely any overhead. Savepoints are not that expensive but they are not free either. (No, I haven't measured it.)

Good question, I am not entirely sure if this is possible yet, as I haven't thought it through entirely. I guess data errors could be divided into 2 main categories: malformed data, where the error is detected even before forming a tuple; and errors caused by some constraint violation, that is, after the tuple is formed and inserted. From what I hear and experience, the big majority of errors are of the first type. In that case the error could be caught, the data line + line number + error description could be inserted into an ERROR table (all TEXT fields), and then COPY can skip forming a tuple and move on to parsing the next. In this process there is barely any overhead. The more difficult part obviously is handling the second error type, which I haven't looked at deeply yet. Hopefully it is not impossible to do while keeping transaction integrity (any ideas, anyone?). The overhead for this one will probably be larger, but again, we expect those to happen less often (in most cases at least). Nevertheless, it is surely much faster than recursively narrowing down batch sizes. Alon.
Re: [HACKERS] NOLOGGING option, or ?
I have been working on improving the COPY command performance and as a result also came up with other thoughts that may possibly be better off implemented in a new command (i.e. LOAD DATA) rather than added to the existing COPY. The improvements I made were in the COPY parsing logic: changing char-by-char parsing to multi-char buffered fast parsing, while using a byte buffer, which is faster than StringInfoData, and minimizing the number of loads into the buffer. The data conversion and insertion parts of COPY I left untouched. As a result the parsing performance increased by about 550%, and the overall COPY performance increased by: around 40% for a 15-column (mixed types) table; around 90% for a 1-column table. (The difference between the two is caused by data conversion overhead.) I will post the patch and more numbers to the list later today with more details. I'll just comment now that it is only available for delimited ASCII input data when client and server encodings are the same. CSV and encoding conversions may be added later; this is merely to show that data could be loaded much faster. Here are some things that make me think a new LOAD command is a good idea: 1) There seems to be a possibility that many COPY modifications/improvements may be problematic to incorporate in the current postgres COPY code. Furthermore, it may be desired to keep the COPY command as is, and also have a way to run an improved COPY command, for purposes of backwards compatibility. 2) A modified command syntax for introducing direct single-row error handling. By direct I mean: a row that is rejected from within the COPY command context does not throw an error and roll back the whole transaction. Instead the error is caught and recorded elsewhere, maybe in some error table, with some more information that can later on be retrieved. The following rows continue to be processed. This way there is barely any error handling overhead.
Having recursive row isolation into smaller batches is extremely expensive for non-small data sets. It's not an option for serious users. 3) Maybe have an option to indicate the EOL (end of line) format in the command syntax. Current COPY code detects the line end according to the first data line; this is problematic if the 1st data line is malformed, and it also doesn't allow having any CRs, for example, in the data file when the EOL is only a linefeed. That causes extra data errors in COPY processing. A CR is a valid data character. Specifying the line end in the command syntax will save all this badness. 4) Data integrity and escaping improvements. My patch now treats all characters as data (unless it's an escaped delimiter or EOL) and therefore data integrity is preserved. (Take for example the valid data field file:\new\bang; after a COPY into the database, querying for it from psql will result in wrong data: backslashes are gone, there are 2 data lines, and a bell will ring for \b!) However, some people who are already used to the postgres COPY escaping may want to keep it. They could do so by still using the old COPY. 5) Allow an ERRORLIMIT to control aborting a load after a certain number of errors (a pre-requisite for this is point number 2 above). 6) Allow LIMIT and OFFSET, for files with header rows for example (could be done in COPY too). 7) Allow the blocks to be directly written to the table, rather than via the buffer cache. 8) Allow a bulk index insertion operation at the end of the LOAD step, if the data has been loaded in sorted order. Use something like the SORTED INDEXES statement of Oracle sql*loader to specify the sort order of the incoming data, so that the index build step can bypass another external sort before loading directly into the index. 9) Allow for Simon's WAL bypass. I have surely missed some problems that hide behind the idea, but these points make me believe that LOAD DATA is a good idea. Alon.
On 5/31/05 7:47 PM, Tom Lane [EMAIL PROTECTED] wrote: Simon Riggs [EMAIL PROTECTED] writes: Recent test results have shown a substantial performance improvement (+25%) if WAL logging is disabled for large COPY statements. How much of that is left after we fix the 64-bit-CRC issue? Now, I would like to discuss adding an enable_logging USERSET GUC, [ fear and loathing ... ] I don't like the idea of a GUC at all, and USERSET is right out. I think it would have to be system-wide (cf fsync) to be even implementable let alone somewhat predictable. Even if it could be done per-backend with reasonable semantics, random users should not get to make that decision --- it should be the DBA's call, which means it needs at least SUSET permissions. BTW, I'm sure you are the last one who needs to be reminded that any such thing breaks PITR completely. Which is surely sufficient reason not to let it be USERSET. regards, tom lane
Re: [HACKERS] NOLOGGING option, or ?
Bruce, The patch is not there to show that something is broken; it is there to show how things could be done in another way, which may or may not be more desirable.

"COPY works as designed. The idea that some guy we have never heard of is going to appear and rewrite COPY's processing and tell us that the existing code is actually broken seems pretty arrogant to me. If it is broken (meaning doesn't work as designed), please show us facts rather than conjecture."

I am sure that the code works as designed. In my previous email I was referring to the fact that when using COPY with a data field that happens to have backslashes in it, and then querying this data field, you will get different results. For example, do a COPY of a field c:\\one\ten\nine and query for it. The result will be much different. This is a problem with clickstream data, for example. That's all. It's very possible that there are reasons I missed for why things are the way they are. "(Please explain how you handle literal delimiters and nulls with no escape processing.)" Escape processing is done only for these 2 cases. Thx, Alon.
Re: [HACKERS] NOLOGGING option, or ?
yeah, this is all a bit confusing, but I *hope* I can clarify things here as I think I have a better understanding now. Before that, let me just clarify that the performance improvements in the patch have nothing to do with the escaping mechanism. Escapes could change. The performance gain is due to buffered processing with minimal line/attribute buffer loads. I think that the basic issue is that there are some database users that would like to take their data and put it into the database without pre-processing it, regardless of whether there are any backslashes in it or 0x0D (CRs) etc... These are the users I am targeting in my patch, as these are the users I ran into in the field. The only responsibility of these users is to explicitly escape any delimiter or 0x0A (LF) characters that they intend to have as DATA. That's all. On the other hand, there are users that would like to pre-process their data with C-escape sequences (or, alternatively, users that already have their data escaped); this is what the postgres COPY targets these days. 2 different ways to do it... neither of them is right or wrong. Examples: users that my patch targets may have a data row as such (delim = '|', EOL = [LF]): c:\one\two|d:\ten\nine[LF] Using the way I do escaping in my patch, those 2 fields of data will end up in the DB as Field 1: c:\one\two Field 2: d:\ten\nine which is what the user would want.
If they wanted to have a pipe char in the second field they could escape it as such: d:\ten\nine here is a pipe \| [LF] and no error will occur, and the result will be: Field 2: d:\ten\nine here is a pipe | If you try to insert that first data line above using the existing COPY command you will get an undesired result: Field 1: c:one[TAB]wo Field 2: d:[TAB]en[LF]ine (here [TAB] and [LF] stand for the literal tab and linefeed characters that escape processing produces). Now, the other way around: users that do intend for their data to have escape sequences in it may have a line like this: that's a \t tab| and this is a \nline feed [LF] and will get the desired result of: Field 1: that's a [TAB] tab Field 2: and this is a [LF]line feed while using my code they will get undesired results: Field 1: that's a \t tab Field 2: and this is a \nline feed So, basically, it really depends on the target audience... Bruce, does that sound right to you? Alon.

-----Original Message----- From: [EMAIL PROTECTED] on behalf of Steve Atkins Sent: Wed 6/1/2005 10:47 PM To: pgsql-hackers@postgresql.org Subject: Re: [HACKERS] NOLOGGING option, or ?

On Wed, Jun 01, 2005 at 07:35:33PM -0700, Luke Lonergan wrote: I propose an extended syntax to COPY with a change in semantics to remove the default of WITH ESCAPE '\'. Er, doesn't this break existing database dumps? Yes, one of the previously stated reasons to create another command for loading data. Another possible approach is to keep the default, but allow the escape processing to be turned off. I've been following this thread, and I'm a little confused. Could you possibly clarify what you mean, by providing a couple of lines of input as it would be formatted with escape processing turned off - containing a text field with an embedded newline and tab and a null field. Cheers, Steve
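A minimal C sketch of the escaping rule described in this message (a backslash is ordinary data unless it escapes the delimiter). split_line and its caller-provided field buffers are hypothetical names invented for the sketch, not the patch's actual code, and EOL handling is omitted:

```c
/*
 * Split 'line' on unescaped '|' delimiters.  A backslash escapes only
 * a following delimiter; every other backslash is kept verbatim as
 * data.  Unescaped fields are written into the caller's buffers.
 * Illustrative sketch only: no bounds checking on field length.
 */
#include <string.h>

int
split_line(const char *line, char fields[][64], int maxfields)
{
    int nf = 0;                 /* index of the field being filled */
    int fi = 0;                 /* write position within that field */

    for (const char *p = line; *p && nf < maxfields; p++)
    {
        if (*p == '\\' && p[1] == '|')
        {
            fields[nf][fi++] = '|';     /* escaped delimiter: data */
            p++;                        /* consume the delimiter too */
        }
        else if (*p == '|')
        {
            fields[nf][fi] = '\0';      /* end of field */
            nf++;
            fi = 0;
        }
        else
            fields[nf][fi++] = *p;      /* backslashes etc. kept as-is */
    }
    if (nf < maxfields)
        fields[nf][fi] = '\0';
    return nf + 1;                      /* number of fields produced */
}
```

Running this on the example row c:\one\two|d:\ten\nine yields the two fields unchanged, with backslashes intact, while \| still lets a literal pipe through as data.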