[Monotone-devel] severe performance penalty for mtn log individual-file?
Hi,

is there a reason that mtn log individual-file has such a huge performance penalty? It is a pain to use currently:

  + time mtn log
  4,93 real 4,15 user 0,16 sys
  + time mtn log src/com/msc/sdm/application/SdmVersion.java
  349,39 real 258,93 user 1,10 sys
  hunter[16]$
  hunter[19]$ mtn --version
  monotone 0.27 (Basis-Revision: 341e4a18c594cec49896fa97bd4e74de7bee5827)
  hunter[20]$

--
Regards,
Georg.

___
Monotone-devel mailing list
Monotone-devel@nongnu.org
http://lists.nongnu.org/mailman/listinfo/monotone-devel
Re: [Monotone-devel] encrypted monotone (and digression on
On Sun, Jul 09, 2006 at 12:10:42PM -0700, Nathaniel Smith wrote:
> Just noticed this project: http://aleph0.info/apso/
> Early stages, but might interest some people here.

Er... That page is terribly outdated. The project has gone through many changes since I set up the page. And I'm curious to know how you got that link, since I only told 5 guys about it. :-) I will try to update it later, but I am really busy these days, so I'm not sure when I'll be able to do that.

> Currently proprietary licensed, though the webpage claims that will change.

I am trying to understand the implications of saying "GPL v2 or a later version". GPL v3 seems to have problems with cryptography (and in particular, that project can be used to hide source code, which is something RMS would not like, I guess). If it's released as v2 or later, and then someone writes a plugin and releases it under v3, well, I'm not sure that would be good.

> I haven't looked at their technique yet; my plan to do something like this was to just teach mtn-dumb how to wrap encryption around each of its packets, and HMAC its merkle keys. The advantage is that mtn-dumb is transport only; you can't get nearly so much encryption if you have to be able to do fancy VCS operations like finding heads, where you need indexing, etc. So it's actually a good thing in an encrypted store if the only things it supports are push and pull.

What I did was to encrypt packets and store them in another database. For other VC systems, I plan to encrypt deltas and any meta-information necessary to rebuild the database.

> (I've been thinking about encrypted storage because I've been getting increasingly frustrated at moving files around by hand between desktop/laptop/school, and thinking how to write a runs-in-background, I-don't-have-to-think-about-it, promiscuously-communicating eager file transmitter, and the obvious solution is to use some simple wrappers around monotone.
> But then you want to be able to use an untrusted host as a rendezvous point.

Exactly! :-)

J.
[Monotone-devel] Automate stdio chunk size
Hi all,

Currently, the automation stdio interface emits all data in chunks that do not exceed 1024 bytes in size. This value was initially chosen somewhat randomly, but seems to be way too low.

Frontends like ViewMTN, Lara, Guitone, or TracMonotone make heavy use of the automation interface and thus could benefit from an increased chunk size. For example, after setting the chunk size to 1M, reading mtn's ChangeLog through 'automate get_file' took roughly 1/10 of the time it took before.

So are there any objections against setting constants::automate_stdio_size to, say, 1MB? Is there anyone('s application) depending on smaller chunks? Should we increase the automate format version number?

Regards,
Thomas

--
Thomas Moschny [EMAIL PROTECTED]
Re: [Monotone-devel] Automate stdio chunk size
Hi Thomas =)

> [...] So are there any objections against setting constants::automate_stdio_size to, say, 1MB? Is there anyone('s application) depending on smaller chunks? Should we increase the automate format version number?

As long as this chunk size is still reported by the chunk size part of stdio's output (cmdNo:err:[l|m]:chunkSize:) I have no problems with tweaking this value. I depend on a correct value there, nothing more.

Thomas

--
Developer of Guitone
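A minimal sketch of a client-side reader for the header quoted above (cmdNo:err:[l|m]:chunkSize:), written so that it never assumes a particular maximum chunk size. The field meanings are an assumption taken from the names in that header: a command number, an error code, a marker that is `l` for the last chunk of a command and `m` when more follow, and a byte count followed by exactly that many bytes of data. The names `Chunk` and `read_chunks` are hypothetical, and the stream is assumed to be buffered so that `read(n)` returns `n` bytes unless the stream ends.

```python
import io
from typing import BinaryIO, Iterator, NamedTuple, Optional


class Chunk(NamedTuple):
    cmd_no: int
    err: int
    last: bool   # True for an 'l' marker, False for 'm' (assumed meaning)
    data: bytes


def _read_field(stream: BinaryIO, allow_eof: bool = False) -> Optional[bytes]:
    """Read one header field, up to but not including the ':' separator."""
    field = bytearray()
    while True:
        byte = stream.read(1)
        if not byte:
            if allow_eof and not field:
                return None  # clean end of stream between two chunks
            raise EOFError("stream ended inside a chunk header")
        if byte == b":":
            return bytes(field)
        field += byte


def read_chunks(stream: BinaryIO) -> Iterator[Chunk]:
    """Yield chunks until the stream ends; the size comes from each header."""
    while True:
        cmd_no = _read_field(stream, allow_eof=True)
        if cmd_no is None:
            return
        err = _read_field(stream)
        marker = _read_field(stream)
        size = int(_read_field(stream))
        if marker not in (b"l", b"m"):
            raise ValueError(f"unexpected chunk marker {marker!r}")
        data = stream.read(size)
        if len(data) != size:
            raise EOFError("stream ended inside chunk data")
        yield Chunk(int(cmd_no), int(err), marker == b"l", data)


if __name__ == "__main__":
    sample = io.BytesIO(b"0:0:m:5:hello0:0:l:3:a:b")
    for chunk in read_chunks(sample):
        print(chunk)
```

Because the reader takes the length from every header, a change of the server-side maximum would not affect it, which is the property Thomas Keller says he depends on.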
Re: [Monotone-devel] Automate stdio chunk size
On Monday 10 July 2006 13:59, Thomas Keller wrote:
> As long as this chunk size is still reported by the chunk size part of stdio's output (cmdNo:err:[l|m]:chunkSize:) I have no problems with tweaking this value. I depend on a correct value there, nothing more.

Sure, this won't be changed.

- Thomas M.

--
Thomas Moschny [EMAIL PROTECTED]
[Monotone-devel] Re: Automate stdio chunk size
Thomas Moschny [EMAIL PROTECTED] writes:

> [...] So are there any objections against setting constants::automate_stdio_size to, say, 1MB? Is there anyone('s application) depending on smaller chunks? Should we increase the automate format version number?

How about making the chunk size settable using a new command (leaving the default as it is)? Or set the default to 1M (or BUFSIZ, or something), and then clients that would deadlock have a way to set it to something smaller.

I'm not sure whether 1M would cause a problem. I wouldn't rule it out for some simpler clients (using synchronous I/O and polling various inputs).
[Monotone-devel] Re: severe performance penalty for mtn log individual-file?
Georg-W. Koltermann [EMAIL PROTECTED] writes:

> is there a reason that mtn log individual-file has such a huge performance penalty? It is a pain to use currently:
>
>   + time mtn log
>   4,93 real 4,15 user 0,16 sys
>   + time mtn log src/com/msc/sdm/application/SdmVersion.java
>   349,39 real 258,93 user 1,10 sys

I'm not sure. There's a note here http://venge.net/monotone/wiki/PerformanceWork, and it's been discussed on the list before.

Intuitively it feels as though mtn log file should be a bit slower than mtn log, because it's doing mtn log and filtering the messages. The note on the wiki seems unconvincing since mtn log shows which files have been changed in each revision. OK, there's the issue of file renaming, but (intuitively) surely that can't be *that* costly to compute, especially since renames are so relatively rare?

[...]
[Monotone-devel] Re: severe performance penalty for mtn log individual-file?
Bruce Stephens [EMAIL PROTECTED] writes:

> [...] I'm not sure. There's a note here http://venge.net/monotone/wiki/PerformanceWork, and it's been discussed on the list before.

http://article.gmane.org/gmane.comp.version-control.monotone.devel/6213 gives a plausible explanation.
Re: [Monotone-devel] Re: Automate stdio chunk size
On Monday 10 July 2006 14:45, Bruce Stephens wrote:
> How about making the chunk size settable using a new command (leaving the default as it is)? Or set the default to 1M (or BUFSIZ, or something), and then clients that would deadlock have a way to set it to something smaller.
>
> I'm not sure whether 1M would cause a problem. I wouldn't rule it out for some simpler clients (using synchronous I/O and polling various inputs).

After thinking a while about it, it is no longer clear to me why there is a need for chunked output *at all* ...

The reading side of a pipe can always read the data in arbitrarily (and independently of the sender) sized packets, even when using synchronous I/O, by simply specifying the size in the read() call. The sender must of course check how many bytes of its write() call actually got written.

- Thomas

--
Thomas Moschny [EMAIL PROTECTED]
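The point about read() and write() can be sketched as follows. This is only an illustration on an ordinary OS pipe, not monotone code; `write_all`, `read_all` and `demo` are hypothetical names. The writer loops until every byte has been accepted, and the reader picks its own read size without any agreement with the writer.

```python
import os
import threading


def write_all(fd: int, data: bytes) -> None:
    """Call write() repeatedly until every byte has been accepted."""
    view = memoryview(data)
    while view:
        written = os.write(fd, view)  # may accept fewer bytes than offered
        view = view[written:]


def read_all(fd: int, read_size: int) -> bytes:
    """Read until EOF, using a read size chosen by the reader alone."""
    parts = []
    while True:
        part = os.read(fd, read_size)  # returns at most read_size bytes
        if not part:
            return b"".join(parts)
        parts.append(part)


def demo(payload: bytes, read_size: int) -> bytes:
    """Send payload through a pipe and read it back in read_size pieces."""
    read_fd, write_fd = os.pipe()

    def produce() -> None:
        try:
            write_all(write_fd, payload)
        finally:
            os.close(write_fd)  # closing the write end signals EOF

    writer = threading.Thread(target=produce)
    writer.start()
    try:
        return read_all(read_fd, read_size)
    finally:
        os.close(read_fd)
        writer.join()


if __name__ == "__main__":
    data = bytes(range(256)) * 500
    print(demo(data, 4096) == data)
```

Note that this only works because the reader can wait for EOF. A long-lived stdio session carrying several commands has no EOF between commands, which is where the framing question in the replies comes from.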
Re: [Monotone-devel] Re: severe performance penalty for mtn log individual-file?
On Monday 10 July 2006 14:56, Bruce Stephens wrote:
> The note on the wiki seems unconvincing since mtn log shows which files have been changed in each revision. OK, there's the issue of file renaming, but (intuitively) surely that can't be *that* costly to compute, especially since renames are so relatively rare?

There surely must be something wrong, because a quickly hacked python version of 'mtn log file' is *much* faster (and yes, it takes care of renames).

- Thomas

--
Thomas Moschny [EMAIL PROTECTED]
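The script mentioned above was not posted, so the following is not it. It is only a toy sketch of the idea discussed in this thread (walk the history and keep the revisions that touch one file, following renames backwards), on an invented in-memory model. The dictionary layout (`parents`, `changed`, `renames`) and the function name `log_for_file` are assumptions made for illustration and say nothing about how monotone stores revisions.

```python
from collections import deque
from typing import Dict, List


def log_for_file(revisions: Dict[str, dict], head: str, path: str) -> List[str]:
    """Return ids of revisions touching `path`, in breadth-first order from head.

    Each revision is a dict with:
      parents: list of parent revision ids
      changed: set of paths whose content changed (names as of that revision)
      renames: dict mapping old name -> new name for renames in that revision
    """
    hits: List[str] = []
    seen = set()
    queue = deque([(head, path)])
    while queue:
        rev_id, name = queue.popleft()
        if (rev_id, name) in seen:
            continue
        seen.add((rev_id, name))
        rev = revisions[rev_id]
        touched = name in rev["changed"]
        name_in_parents = name
        for old, new in rev["renames"].items():
            if new == name:
                # The file was called `old` before this revision.
                name_in_parents = old
                touched = True
        if touched and rev_id not in hits:
            hits.append(rev_id)
        for parent in rev["parents"]:
            queue.append((parent, name_in_parents))
    return hits


if __name__ == "__main__":
    history = {
        "r1": {"parents": [], "changed": {"a.txt"}, "renames": {}},
        "r2": {"parents": ["r1"], "changed": {"other.txt"}, "renames": {}},
        "r3": {"parents": ["r2"], "changed": set(), "renames": {"a.txt": "b.txt"}},
        "r4": {"parents": ["r3"], "changed": {"b.txt"}, "renames": {}},
    }
    print(log_for_file(history, "r4", "b.txt"))
```

In this model each revision is visited once per name, so the filtering costs about as much as the plain walk, which matches Bruce's intuition that renames alone should not explain a large slowdown.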
Re: [Monotone-devel] Re: Automate stdio chunk size
Thomas Moschny wrote:
> After thinking a while about it, it is no longer clear to me why there is a need for chunked output *at all* ...
>
> The reading side of a pipe can always read the data in arbitrarily (and independently of the sender) sized packets, even when using synchronous I/O, by simply specifying the size in the read() call. The sender must of course check how many bytes of its write() call actually got written.

Well, maybe there is no need for chunked output, but there is definitely the need for some EOF token which tells the client "hey, I got all the data". Ideally this would be paired with the checksum of the just outputted data so the client can ensure that it got all data correctly.

Thomas Keller.
Re: [Monotone-devel] Re: Automate stdio chunk size
On Mon, 2006-07-10 at 16:09 +0200, Thomas Keller wrote:
> Thomas Moschny wrote:
> > After thinking a while about it, it is no longer clear to me why there is a need for chunked output *at all* ...
> >
> > The reading side of a pipe can always read the data in arbitrarily (and independently of the sender) sized packets, even when using synchronous I/O, by simply specifying the size in the read() call. The sender must of course check how many bytes of its write() call actually got written.
>
> Well, maybe there is no need for chunked output, but there is definitely the need for some EOF token which tells the client "hey, I got all the data". Ideally this would be paired with the checksum of the just outputted data so the client can ensure that it got all data correctly.

We can't use an in-stream EOF token, because the stream should be binary-safe. So this means prefixing each data chunk with the size of that chunk. A chunk is output when it reaches the maximum size (because having a known maximum size seems convenient), or when the stream is flushed (my understanding is that this is the Right Thing to do, plus it could be nice if we have commands that take a long time to finish).

I think we need to keep the chunked output format, but there's no reason not to increase the maximum size. Just that when we do, we should bump the interface version, since the maximum size is a documented part of the interface. But I'm not sure how important this is, since I doubt anyone is relying on that.

There are changes to inventory in the works that would require changing the interface version anyway; perhaps we should increase the chunk size at the same time we land that?

Tim
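The size-prefixed framing described above can be sketched from the sending side like this. It is an illustration only: the header layout follows the cmdNo:err:[l|m]:chunkSize: form quoted earlier in the thread, with `l` assumed to mark the last chunk of a command and `m` to mean more follow, and `encode_chunks` is a hypothetical name. The sketch splits purely by maximum size and does not model the flush-triggered chunks Tim mentions.

```python
from typing import Iterator


def encode_chunks(cmd_no: int, data: bytes, max_chunk: int,
                  err: int = 0) -> Iterator[bytes]:
    """Split data into size-prefixed chunks of at most max_chunk bytes."""
    if max_chunk <= 0:
        raise ValueError("max_chunk must be positive")
    if not data:
        # An empty result still needs one terminating chunk.
        yield b"%d:%d:l:0:" % (cmd_no, err)
        return
    for start in range(0, len(data), max_chunk):
        piece = data[start:start + max_chunk]
        is_last = start + max_chunk >= len(data)
        marker = b"l" if is_last else b"m"
        # The header carries the exact byte count, so the payload may
        # contain any byte values, including ':' and newlines.
        yield b"%d:%d:%s:%d:" % (cmd_no, err, marker, len(piece)) + piece


if __name__ == "__main__":
    for chunk in encode_chunks(0, b"abcdefg", 3):
        print(chunk)
```

Since every header states its own length, the maximum size is a tuning knob on the sender rather than something a well-behaved reader has to know in advance.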
Re: [Monotone-devel] Re: Automate stdio chunk size
> We can't use an in-stream EOF token, because the stream should be binary-safe. So this means prefixing each data chunk with the size of that chunk. A chunk is output when it reaches the maximum size (because having a known maximum size seems convenient), or when the stream is flushed (my understanding is that this is the Right Thing to do, plus it could be nice if we have commands that take a long time to finish).

Well, the EOF token wouldn't really have to be '\0', just something a parser could distinguish from the normal output flow. For example, in emails the header is separated from the body by double newlines \n\n. If basic_io became standard for all output of the automation interface, there could even be some well-defined end token there, like

... command_finished 1234...

where the 1234... part could be the checksum for the complete output echoed before that token.

Thomas.
[Monotone-devel] Re: Automate stdio chunk size
Thomas Keller [EMAIL PROTECTED] writes:

> Well, the EOF token wouldn't really have to be '\0', just something a parser could distinguish from the normal output flow. For example, in emails the header is separated from the body by double newlines \n\n. If basic_io became standard for all output of the automation interface, there could even be some well-defined end token there, like
>
> ... command_finished 1234...
>
> where the 1234... part could be the checksum for the complete output echoed before that token.

It's surely simpler just to say how many bytes the thing has, then readers know to read that many bytes? (Otherwise you'll have people worrying about the case where the content somehow manages to contain its own checksum.)
Re: [Monotone-devel] Re: Automate stdio chunk size
On Mon, 2006-07-10 at 18:50 +0200, Thomas Keller wrote:
> > We can't use an in-stream EOF token, because the stream should be binary-safe. So this means prefixing each data chunk with the size of that chunk. A chunk is output when it reaches the maximum size (because having a known maximum size seems convenient), or when the stream is flushed (my understanding is that this is the Right Thing to do, plus it could be nice if we have commands that take a long time to finish).
>
> Well, the EOF token wouldn't really have to be '\0', just something a parser could distinguish from the normal output flow. For example, in emails the header is separated from the body by double newlines \n\n. If basic_io became standard for all output of the automation interface, there could even be some well-defined end token there, like

basic_io is not always appropriate, for example automate get_file. This command also means that the output stream can contain arbitrary binary data, so no in-stream EOF token would be safe.

Tim
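A small sketch of why an in-stream end token is not binary-safe while a length prefix is. The token value and all function names here are made up for the illustration; nothing below is part of monotone's interface.

```python
# Hypothetical in-stream end marker, in the spirit of "command_finished".
END_TOKEN = b"\ncommand_finished\n"


def frame_with_token(payload: bytes) -> bytes:
    """Terminate the payload with the end marker."""
    return payload + END_TOKEN


def unframe_with_token(stream: bytes) -> bytes:
    """Return everything before the first end marker."""
    end = stream.find(END_TOKEN)
    if end < 0:
        raise ValueError("end token not found")
    return stream[:end]


def frame_with_length(payload: bytes) -> bytes:
    """Prefix the payload with its byte count and a ':' separator."""
    return b"%d:" % len(payload) + payload


def unframe_with_length(stream: bytes) -> bytes:
    """Read the byte count, then take exactly that many bytes."""
    size, sep, rest = stream.partition(b":")
    if not sep:
        raise ValueError("missing length prefix")
    count = int(size)
    if len(rest) < count:
        raise ValueError("payload shorter than announced")
    return rest[:count]


if __name__ == "__main__":
    # A file fetched with something like 'automate get_file' may contain
    # any bytes at all, including the end marker itself.
    payload = b"binary" + END_TOKEN + b"tail"
    print(unframe_with_token(frame_with_token(payload)) == payload)
    print(unframe_with_length(frame_with_length(payload)) == payload)
```

Escaping the token inside the payload would also work, but then every reader and writer has to implement the escaping correctly, which is the complication Bruce's reply points at.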
[Monotone-devel] Re: Monotone-devel Digest, Vol 39, Issue 15
[EMAIL PROTECTED] writes:

> From: Nathaniel Smith [EMAIL PROTECTED]
> Subject: Re: [Monotone-devel] Re: Monotone-devel Digest, Vol 39, Issue 15
>
> [ code to check that mtn process is still alive after sleep is wrong ]

I just saw the code in mtn.py that does a sleep(3) in order to wait for the server to get going. Take a look at the revised attached patch which does the check and will also bail out quickly if the sub-process fails.

-Eric

server-check.patch
Description: Binary data
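The attached patch is not reproduced in the archive, so the following is not its content. It is a generic sketch of the approach described, under the assumption that the server is a child process listening on a known TCP port: poll instead of sleeping for a fixed time, return as soon as the port accepts connections, and give up at once if the child has already exited. `wait_for_server` is a hypothetical name.

```python
import socket
import subprocess
import time


def wait_for_server(proc: subprocess.Popen, host: str, port: int,
                    timeout: float = 10.0, interval: float = 0.1) -> None:
    """Block until host:port accepts a connection or the child process dies."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = proc.poll()
        if status is not None:
            # The child is gone, so waiting any longer is pointless.
            raise RuntimeError(f"server exited early with status {status}")
        try:
            with socket.create_connection((host, port), timeout=interval):
                return  # something is listening, the server is up
        except OSError:
            time.sleep(interval)
    raise TimeoutError(f"server did not listen on {host}:{port} in {timeout}s")
```

Compared with a fixed sleep(3), this returns earlier on fast machines, tolerates slow ones up to the timeout, and reports a crashed sub-process immediately.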
Re: [Monotone-devel] Re: Automate stdio chunk size
On Monday 10 July 2006 17:48 Timothy Brownawell wrote:
> There are changes to inventory in the works, that would require changing the interface version anyway, perhaps we should increase the chunk size at the same time we land that?

Yes. And I think we should change the docs (for the new interface version) to *not* specify a maximum chunk size, thus allowing us to change it freely later, for example through a command line option.

- Thomas

--
Thomas Moschny [EMAIL PROTECTED]
Re: [Monotone-devel] Re: Automate stdio chunk size
On Mon, Jul 10, 2006 at 08:17:36PM +0200, Thomas Moschny wrote:
> On Monday 10 July 2006 17:48 Timothy Brownawell wrote:
> > There are changes to inventory in the works, that would require changing the interface version anyway, perhaps we should increase the chunk size at the same time we land that?
>
> Yes. And I think we should change the docs (for the new interface version) to *not* specify a maximum chunk size, thus allowing us to change it freely later, for example through a command line option.

Err, yes, I'm sort of surprised that's in the docs at all.

The point of having an upper limit is to put an upper bound on how much memory monotone has to use. 1M seems a bit large for this purpose, and I'd be astonished if you actually have to go to 1M to get the benefit. Could someone run timings at different block sizes and pick one that gives most of the speed benefit without being huge?

-- Nathaniel

--
Damn the Solar System. Bad light; planets too distant; pestered with comets; feeble contrivance; could make a better one myself. -- Lord Jeffrey
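A rough sketch of the kind of timing run asked for above. It only measures how long it takes to push a fixed number of bytes through an ordinary OS pipe at different block sizes; it does not run monotone, so it can at best suggest where per-block overhead stops dominating on a given system. `time_pipe_transfer` and the list of sizes are illustrative choices.

```python
import os
import threading
import time
from typing import Tuple


def time_pipe_transfer(total_bytes: int, block_size: int) -> Tuple[int, float]:
    """Push total_bytes through a pipe in block_size writes and reads.

    Returns (bytes received, elapsed seconds).
    """
    read_fd, write_fd = os.pipe()
    block = b"\0" * block_size

    def produce() -> None:
        try:
            remaining = total_bytes
            while remaining > 0:
                piece = memoryview(block)[:min(block_size, remaining)]
                while piece:
                    written = os.write(write_fd, piece)  # may be partial
                    piece = piece[written:]
                    remaining -= written
        finally:
            os.close(write_fd)  # lets the reader see EOF

    writer = threading.Thread(target=produce)
    start = time.perf_counter()
    writer.start()
    received = 0
    try:
        while True:
            part = os.read(read_fd, block_size)
            if not part:
                break
            received += len(part)
    finally:
        os.close(read_fd)
        writer.join()
    return received, time.perf_counter() - start


if __name__ == "__main__":
    total = 16 * 1024 * 1024
    for size in (1024, 4096, 16384, 65536, 262144, 1048576):
        got, seconds = time_pipe_transfer(total, size)
        print(f"{size:>8} bytes/block: {seconds:.3f}s for {got} bytes")
```

Results will differ between operating systems and pipe implementations, so any figure from a run like this should be checked on each platform the frontends target.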
Re: [Monotone-devel] encrypted monotone (and digression on
On Mon, 2006-07-10 at 13:29 -0700, Rob Schoening wrote:
> but my question is really: how vulnerable is mtn serve today to DoS and buffer overrun type exploits?

DoS: It'd be fairly simple to make monotone eat all your CPU (or on an SMP box, as much CPU as a single-threaded program can eat). If you give someone write access, they can also fill up your disk.

Buffer overrun: We tend to not use fixed-size buffers, so I don't think this is terribly likely.

Tim
Fwd: [Monotone-devel] encrypted monotone (and digression on
Wrong reply button...

-- Forwarded message --
From: Nuno Lucas [EMAIL PROTECTED]
Date: Jul 10, 2006 10:08 PM
Subject: Re: [Monotone-devel] encrypted monotone (and digression on
To: Rob Schoening [EMAIL PROTECTED]

On 7/10/06, Rob Schoening [EMAIL PROTECTED] wrote:
> I have a somewhat unrelated question that touches on a more fundamental security issue: what is the relative security risk of running netsync on a public port assuming it's running as a non privileged user? how much of a vulnerability is it for the host that's serving it?

Any network server, even one running as an unprivileged user, has its risks, because as soon as someone can run code on the server (by means of program bugs) the chance of breaking it rises astronomically (there can always be other programs running locally on the server with their own bugs, or even kernel bugs, that allow a privilege escalation).

So, the risks are the same as for any other server program, except for the fact that you can (in theory) trust other, more stable programs more than the still-in-development monotone binaries.

> it is, of course, relatively simple today to deploy mtn on a private port and use SSH port forwarding to access it which all but eliminates this problem.

Or, eventually, one could use mtn dumb over SSH. I believe that is the right answer if your data is of great value to you, but I'll let others confirm this.

> but my question is really: how vulnerable is mtn serve today to DoS and buffer overrun type exploits?

The only defense against DoS attacks is having more bandwidth than your attacker, so monotone can't do anything against this. It's true, though, that monotone's heavy CPU usage makes it a bit more vulnerable in this respect.

Buffer overruns are program errors, so the only answer is either to build monotone with the gcc hardened options to avoid this (not 100% foolproof against all errors, but it can help a *lot*) or to peruse the source code until you are certain there aren't any.

> RS

Best regards,
~Nuno Lucas
Re: [Monotone-devel] Re: Automate stdio chunk size
On 7/10/06, Nathaniel Smith [EMAIL PROTECTED] wrote:
> On Mon, Jul 10, 2006 at 08:17:36PM +0200, Thomas Moschny wrote:
>
> The point of having an upper-limit is to put an upper bound on how much memory monotone has to use. 1M seems a bit large for this purpose, and I'm astonished if you actually have to go to 1M to get the benefit. Could someone run timings at different block sizes and pick one that gives most of the speed benefit without being huge?

Note that this interface will probably be run on a pipe and different block sizes will have a different impact on different operating systems (and even between versions).

I remember there was a paper on the drastic differences in speed between the pipe handling on Windows 2000 and XP for different block sizes. Maybe someone can recollect where that paper was...

Best regards,
~Nuno Lucas

> -- Nathaniel
[Monotone-devel] Re: encrypted monotone (and digression on
Rob Schoening wrote:
> I have a somewhat unrelated question that touches on a more fundamental security issue: what is the relative security risk of running netsync on a public port assuming it's running as a non privileged user? how much of a vulnerability is it for the host that's serving it?

Nobody's shown us exploits yet, but it would be foolish for me to imply that none exist or are possible. I can point to a few things that might reassure you. Whether they do is another matter.

1. Monotone authenticates users (by RSA-signing a nonce and requesting an RSA signature in response) before anything else. One may be able to DoS the server (in a CPU sense) if anonymous requests are permitted; if you insist on authenticated connections from known clients, this risk is reduced.

2. Monotone does ::read() off a network socket and into a fixed-size stack buffer. However, it does this in exactly one place (netsync.cc, session::read_some()) and always issues the read call for the full length of the buffer, starting at its beginning, and never restarts the read or tries to mix parsing and reading.

3. That buffer is immediately appended to a heap std::string and data is parsed from there using safer extractor functions. The extractor functions all test the length of every extraction against the string length, and assert fatally if they are asked to pass the end of the string they're reading from. If there's insufficient data for a complete command packet during parsing, we give up and restart parsing from the string's beginning next time we receive data.

4. Other major parsing points are basic_io.{cc,hh} and xdelta.cc; it is possible that those contain logic that can be tricked into indexing past the end of the std::strings they're reading from. I'd be happy to go through them with a concerned reader doing an audit / inserting more dynamic checks / adding tests that try specific attacks.

5. With the exception of misbehavior in glibc during getaddrinfo() and setlocale(), we appear to be valgrind-clean.

6. You should be able to chroot / jail / zone / otherwise sandbox us, so long as we can access libstdc++, libc, libnss, and our database. We also need to be able to create transient journal files in the directory we keep the database in, as part of the page-transaction system in sqlite.

Still, it's a nontrivial program, you're right to be concerned. Even if you trust our code, we also inherit the possibility of vulnerabilities from sqlite, botan, lua, idna, and boost. We do a fair bit of input validation, don't call printf, are careful to avoid malloc/free or use of raw pointers, etc. but it's hard to be sure.

-graydon
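Point 3 above, extractor functions that check every extraction against the length of the buffer, can be sketched as follows. This is not monotone's code: the class and method names are hypothetical, the 4-byte big-endian length prefix is an assumed encoding chosen only for the example, and the sketch raises an exception where the message says the real functions assert fatally.

```python
class Extractor:
    """Sequential reader that refuses to read past the end of its buffer."""

    def __init__(self, buf: bytes) -> None:
        self._buf = buf
        self._pos = 0

    def remaining(self) -> int:
        return len(self._buf) - self._pos

    def extract_bytes(self, count: int, what: str) -> bytes:
        # Every extraction is checked against the data actually present.
        if count < 0 or count > self.remaining():
            raise ValueError(
                f"{what}: need {count} bytes, only {self.remaining()} available")
        out = self._buf[self._pos:self._pos + count]
        self._pos += count
        return out

    def extract_u32(self, what: str) -> int:
        # Assumed encoding for this sketch: 4-byte big-endian integer.
        return int.from_bytes(self.extract_bytes(4, what), "big")

    def extract_string(self, what: str) -> bytes:
        # The attacker-controlled length goes through the same check, so a
        # bogus length cannot cause a read beyond the received data.
        length = self.extract_u32(what + " length")
        return self.extract_bytes(length, what)


if __name__ == "__main__":
    packet = b"\x00\x00\x00\x03abc\x00\x00\x00\x09xy"
    ex = Extractor(packet)
    print(ex.extract_string("name"))
    try:
        ex.extract_string("value")  # claims 9 bytes, only 2 are present
    except ValueError as err:
        print("rejected:", err)
```

The useful property is that length fields taken from the network never index the buffer directly; they only become arguments to a function that validates them first.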