[Monotone-devel] severe performance penalty for mtn log individual-file?

2006-07-10 Thread Georg-W. Koltermann
Hi,

is there a reason that mtn log individual-file has such a huge
performance penalty?  It is a pain to use currently:

+ time mtn log
4,93 real 4,15 user 0,16 sys
+ time mtn log src/com/msc/sdm/application/SdmVersion.java
  349,39 real   258,93 user 1,10 sys
hunter[16]$
hunter[19]$ mtn --version
monotone 0.27 (Basis-Revision: 341e4a18c594cec49896fa97bd4e74de7bee5827)
hunter[20]$
  

--
Regards,
Georg.



___
Monotone-devel mailing list
Monotone-devel@nongnu.org
http://lists.nongnu.org/mailman/listinfo/monotone-devel


Re: [Monotone-devel] encrypted monotone (and digression on

2006-07-10 Thread Jeronimo Pellegrini
On Sun, Jul 09, 2006 at 12:10:42PM -0700, Nathaniel Smith wrote:
 Just noticed this project:
   http://aleph0.info/apso/
 Early stages, but might interest some people here.

Er... That page is terribly outdated. The project has gone through many
changes after I set up the page.
And I'm curious to know how you got that link, since I only told 5
guys about it.  :-)

I will try to update it later, but I am really busy these days, so
I'm not sure when I'll be able to do that.

 Currently proprietary licensed, though the webpage claims that will
 change.

I am trying to understand the implications of saying "GPL v2 or a later
version". GPL v3 seems to have problems with cryptography (and in
particular, that project can be used to hide source code, which is
something RMS would not like, I guess).
If it's released as v2 or later, and then someone writes a plugin and
releases it under v3, well, I'm not sure that would be good.

 I haven't looked at their technique yet; my plan to do something like
 this was to just teach mtn-dumb how to wrap encryption around each of
 its packets, and HMAC its merkle keys.  The advantage is that mtn-dumb
 is transport only; you can't get nearly so much encryption if you have
 to be able to do fancy VCS operations like finding heads, where you
 need indexing, etc.  So it's actually a good thing in an encrypted
 store if the only things it supports are push and pull.

What I did was to encrypt packets and store them in another database.
For other VC systems, I plan to encrypt deltas and any meta-information
necessary to rebuild the database.

 (I've been thinking about encrypted storage because I've been getting
 increasingly frustrated at moving files around by hand between
 desktop/laptop/school, and thinking how to write a runs-in-background,
 I-don't-have-to-think-about-it, promiscuously-communicating eager file
 transmitter, and the obvious solution is to use some simple wrappers
 around monotone.  But then you want to be able to use an untrusted
 host as a rendezvous point.

Exactly! :-)

J.





[Monotone-devel] Automate stdio chunk size

2006-07-10 Thread Thomas Moschny
Hi all,

Currently, the automation stdio interface emits all data in chunks that do not
exceed 1024 bytes in size. This value was initially chosen somewhat randomly,
but seems to be way too low.

Frontends like ViewMTN, Lara, Guitone, or TracMonotone make heavy use of the 
automation interface and thus could benefit from an increased chunk size.

For example, after setting the chunk size to 1M, reading mtn's ChangeLog 
through 'automate get_file' took roughly 1/10 of the time it took before.

So are there any objections against setting constants::automate_stdio_size to, 
say, 1MB? Is there anyone('s application) depending on smaller chunks? Should 
we increase the automate format version number?

Regards,
Thomas

-- 
Thomas Moschny  [EMAIL PROTECTED]




Re: [Monotone-devel] Automate stdio chunk size

2006-07-10 Thread Thomas Keller


Hi Thomas =)


[...]
 So are there any objections against setting constants::automate_stdio_size to,
 say, 1MB? Is there anyone('s application) depending on smaller chunks? Should
 we increase the automate format version number?


As long as this chunk size is still reported by the chunk size part of 
stdio's output (cmdNo:err:[l|m]:chunkSize:) I have no problems 
with tweaking this value. I depend on a correct value there, nothing more.
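
A client that trusts only the size reported in each chunk header could look roughly like the Python sketch below. It assumes the layout quoted above, cmdNo:err:[l|m]:chunkSize: followed by exactly chunkSize bytes of data, with "m" meaning more chunks follow and "l" marking the last one. The function names are made up for illustration and are not taken from any existing frontend.

```python
import io


def read_field(stream):
    """Read header bytes up to (not including) the next ':'."""
    field = bytearray()
    while True:
        byte = stream.read(1)
        if not byte:
            raise EOFError("stream ended inside a chunk header")
        if byte == b":":
            return bytes(field)
        field += byte


def read_exact(stream, size):
    """Read exactly `size` bytes, looping over short reads."""
    data = bytearray()
    while len(data) < size:
        piece = stream.read(size - len(data))
        if not piece:
            raise EOFError("stream ended inside a chunk body")
        data += piece
    return bytes(data)


def read_command_output(stream):
    """Collect chunks until one is marked 'l'.

    Returns (cmd_no, err, payload). The chunk size is taken from each
    header, so the reader works for any maximum the sender chooses.
    """
    payload = bytearray()
    while True:
        cmd_no = int(read_field(stream))
        err = int(read_field(stream))
        marker = read_field(stream)
        size = int(read_field(stream))
        if marker not in (b"l", b"m"):
            raise ValueError("unexpected chunk marker %r" % marker)
        payload += read_exact(stream, size)
        if marker == b"l":
            return cmd_no, err, bytes(payload)


if __name__ == "__main__":
    sample = io.BytesIO(b"0:0:m:5:hello0:0:l:6: world")
    print(read_command_output(sample))
```

Because nothing in the reader hard-codes 1024, a sender could raise the maximum without breaking it.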


Thomas
-- Developer of Guitone




Re: [Monotone-devel] Automate stdio chunk size

2006-07-10 Thread Thomas Moschny
On Monday 10 July 2006 13:59, Thomas Keller wrote:
 As long as this chunk size is still reported by the chunk size part of
 stdio's output (cmdNo:err:[l|m]:chunkSize:) I have no problems
 with tweaking this value. I depend on a correct value there, nothing more.

Sure, this won't be changed.

- Thomas M.

-- 
Thomas Moschny  [EMAIL PROTECTED]




[Monotone-devel] Re: Automate stdio chunk size

2006-07-10 Thread Bruce Stephens
Thomas Moschny [EMAIL PROTECTED] writes:

[...]

 So are there any objections against setting
 constants::automate_stdio_size to, say, 1MB? Is there anyone('s
 application) depending on smaller chunks? Should we increase the
 automate format version number?

How about making the chunk size settable using a new command (leaving
the default as it is)?  Or set the default to 1M (or BUFSIZ, or
something), and then clients that would deadlock have a way to set it
to something smaller.

I'm not sure whether 1M would cause a problem.  I wouldn't rule it out
for some simpler clients (using synchronous I/O and polling various
inputs).




[Monotone-devel] Re: severe performance penalty for mtn log individual-file?

2006-07-10 Thread Bruce Stephens
Georg-W. Koltermann [EMAIL PROTECTED] writes:

 is there a reason that mtn log individual-file has such a huge
 performance penalty?  It is a pain to use currently:

 + time mtn log
 4,93 real 4,15 user 0,16 sys
 + time mtn log src/com/msc/sdm/application/SdmVersion.java
   349,39 real   258,93 user 1,10 sys

I'm not sure.  There's a note here
http://venge.net/monotone/wiki/PerformanceWork, and it's been
discussed on the list before.

Intuitively it feels as though mtn log file should be a bit slower
than mtn log, because it's doing mtn log and filtering the
messages.

The note on the wiki seems unconvincing since mtn log shows which
files have been changed in each revision.  OK, there's the issue of
file renaming, but (intuitively) surely that can't be *that* costly to
compute, especially since renames are so relatively rare?

[...]





[Monotone-devel] Re: severe performance penalty for mtn log individual-file?

2006-07-10 Thread Bruce Stephens
Bruce Stephens [EMAIL PROTECTED] writes:

[...]

 I'm not sure.  There's a note here
 http://venge.net/monotone/wiki/PerformanceWork, and it's been
 discussed on the list before.

http://article.gmane.org/gmane.comp.version-control.monotone.devel/6213
gives a plausible explanation.




Re: [Monotone-devel] Re: Automate stdio chunk size

2006-07-10 Thread Thomas Moschny
On Monday 10 July 2006 14:45, Bruce Stephens wrote:
 How about making the chunk size settable using a new command (leaving
 the default as it is)?  Or set the default to 1M (or BUFSIZ, or
 something), and then clients that would deadlock have a way to set it
 to something smaller.

 I'm not sure whether 1M would cause a problem.  I wouldn't rule it out
 for some simpler clients (using synchronous I/O and polling various
 inputs).

After thinking a while about it, it is no longer clear to me why there is a
need for chunked output *at all* ...

The reading side of a pipe can always read the data in arbitrarily (and
independently of the sender) sized packets, even when using synchronous I/O,
by simply specifying the size in the read() call. The sender must of course
check how many bytes of its write() call actually got written.
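
The point about read() and write() can be illustrated with a small Python sketch over an ordinary OS pipe. The helper names are hypothetical. The sender loops until every byte has been accepted, and the receiver picks its own piece size regardless of how the sender wrote the data.

```python
import os
import threading


def write_all(fd, data):
    """Write all of `data` to `fd`, retrying after short writes."""
    view = memoryview(data)
    while len(view) > 0:
        written = os.write(fd, view)  # may accept fewer bytes than offered
        view = view[written:]


def read_in_pieces(fd, piece_size):
    """Read until end-of-file, asking for at most `piece_size` bytes per call."""
    pieces = []
    while True:
        piece = os.read(fd, piece_size)
        if not piece:
            return pieces
        pieces.append(piece)


def demo(payload, piece_size):
    """Send `payload` through a pipe and return the pieces the reader saw."""
    read_fd, write_fd = os.pipe()

    def sender():
        try:
            write_all(write_fd, payload)
        finally:
            os.close(write_fd)  # closing is what lets the reader see EOF

    thread = threading.Thread(target=sender)
    thread.start()
    pieces = read_in_pieces(read_fd, piece_size)
    thread.join()
    os.close(read_fd)
    return pieces


if __name__ == "__main__":
    pieces = demo(b"x" * 200000, 4096)
    print(len(pieces), sum(len(p) for p in pieces))
```

Note that the reader here stops at end-of-file, which only works because the sender closes the pipe when it is done. A long-running stdio session keeps the pipe open, so it still needs some other way to mark where one command's output ends.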

- Thomas

-- 
Thomas Moschny  [EMAIL PROTECTED]




Re: [Monotone-devel] Re: severe performance penalty for mtn log individual-file?

2006-07-10 Thread Thomas Moschny
On Monday 10 July 2006 14:56, Bruce Stephens wrote:
 The note on the wiki seems unconvincing since mtn log shows which
 files have been changed in each revision.  OK, there's the issue of
 file renaming, but (intuitively) surely that can't be *that* costly to
 compute, especially since renames are so relatively rare?

There surely must be something wrong, because a quickly hacked python version 
of 'mtn log file' is *much* faster (and yes, it takes care of renames).
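
For illustration only, here is a toy Python sketch of the "walk the log and filter" idea with rename tracking. It is neither monotone's implementation nor the script mentioned above (which is not shown in this thread). It assumes a linear history given newest first, whereas real monotone history is a DAG, and the revision records are a made-up data structure.

```python
def log_for_file(history, path):
    """Return ids of revisions that touched `path`, following renames.

    `history` is a list of dicts, newest revision first. Each dict has:
      'id'      - revision identifier
      'changed' - set of paths modified, named as of that revision
      'renamed' - dict mapping old path -> new path for that revision
    """
    matches = []
    name = path
    for rev in history:
        # Invert the rename map so we can look up the name the file
        # had before this revision.
        old_names = {new: old for old, new in rev["renamed"].items()}
        if name in rev["changed"] or name in old_names:
            matches.append(rev["id"])
        # Older revisions know the file under its previous name.
        name = old_names.get(name, name)
    return matches


if __name__ == "__main__":
    history = [
        {"id": "r4", "changed": {"b.txt"}, "renamed": {}},
        {"id": "r3", "changed": set(), "renamed": {"a.txt": "b.txt"}},
        {"id": "r2", "changed": {"other.txt"}, "renamed": {}},
        {"id": "r1", "changed": {"a.txt"}, "renamed": {}},
    ]
    print(log_for_file(history, "b.txt"))
```

Each revision is visited once and the rename lookup is a dictionary access, which matches the intuition that filtering by file should add little on top of a plain log. It says nothing about where mtn actually spends its time.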

- Thomas

-- 
Thomas Moschny  [EMAIL PROTECTED]




Re: [Monotone-devel] Re: Automate stdio chunk size

2006-07-10 Thread Thomas Keller

Thomas Moschny wrote
 After thinking a while about it, it is no longer clear to me why there is a
 need for chunked output *at all* ...

 The reading side of a pipe can always read the data in arbitrarily (and
 independently of the sender) sized packets, even when using synchronous I/O,
 by simply specifying the size in the read() call. The sender must of course
 check how many bytes of its write() call actually got written.


Well, maybe there is no need for chunked output, but there is definitely
a need for some EOF token which tells the client "hey, I got all the
data". Ideally this would be paired with a checksum of the data just
output, so the client can ensure that it got all the data correctly.


Thomas Keller.




Re: [Monotone-devel] Re: Automate stdio chunk size

2006-07-10 Thread Timothy Brownawell
On Mon, 2006-07-10 at 16:09 +0200, Thomas Keller wrote:
 Thomas Moschny wrote
  After thinking a while about it, it is no longer clear to me why there is
  a need for chunked output *at all* ...

  The reading side of a pipe can always read the data in arbitrarily (and
  independently of the sender) sized packets, even when using synchronous
  I/O, by simply specifying the size in the read() call. The sender must of
  course check how many bytes of its write() call actually got written.
 
 Well, maybe there is no need for chunked output, but there is definitely 
 the need for some EOF token which tells the client hey, I got all the 
 data. Ideally this would be paired with the checksum of the just 
 outputted data so the client can ensure that it got all data correctly.

We can't use an in-stream EOF token, because the stream should be
binary-safe. So this means prefixing each data chunk with the size of
that chunk. A chunk is output when it reaches the maximum size (because
having a known maximum size seems convenient), or when the stream is
flushed (my understanding is that this is the Right Thing to do, plus it
could be nice if we have commands that take a long time to finish).
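
The two emit rules described above (a chunk goes out when the buffer reaches the maximum size, or when the stream is flushed) could be sketched in Python as follows. The header layout reuses the cmdNo:err:[l|m]:chunkSize: form mentioned earlier in the thread. The class and its method names are hypothetical and are not monotone's code.

```python
import io


class ChunkWriter:
    """Buffer output and emit size-prefixed chunks to a binary stream."""

    def __init__(self, out, cmd_no, max_size=1024, err=0):
        self._out = out
        self._cmd_no = cmd_no
        self._err = err
        self._max_size = max_size
        self._buffer = bytearray()

    def _emit(self, size, last):
        marker = b"l" if last else b"m"
        chunk = bytes(self._buffer[:size])
        del self._buffer[:size]
        header = b"%d:%d:%s:%d:" % (self._cmd_no, self._err, marker, len(chunk))
        self._out.write(header + chunk)

    def write(self, data):
        """Rule 1: emit a chunk whenever the buffer reaches the maximum size."""
        self._buffer += data
        while len(self._buffer) >= self._max_size:
            self._emit(self._max_size, last=False)

    def flush(self):
        """Rule 2: emit whatever is buffered when the stream is flushed."""
        if self._buffer:
            self._emit(len(self._buffer), last=False)

    def finish(self):
        """Emit the remaining bytes (possibly none) as the last chunk."""
        self._emit(len(self._buffer), last=True)


if __name__ == "__main__":
    out = io.BytesIO()
    writer = ChunkWriter(out, cmd_no=0, max_size=4)
    writer.write(b"abcdef")
    writer.flush()
    writer.write(b"\x00:")
    writer.finish()
    print(out.getvalue())
```

Since the reader learns each chunk's length from the header, the payload may contain any bytes, including NUL and ':'.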

I think we need to keep the chunked output format, but there's no reason
not to increase the maximum size. Just that when we do we should bump
the interface version, since the maximum size is a documented part of
the interface. But, I'm not sure how important this is, since I doubt
anyone is relying on that.

There are changes to inventory in the works, that would require changing
the interface version anyway, perhaps we should increase the chunk size
at the same time we land that?

Tim






Re: [Monotone-devel] Re: Automate stdio chunk size

2006-07-10 Thread Thomas Keller

 We can't use an in-stream EOF token, because the stream should be
 binary-safe. So this means prefixing each data chunk with the size of
 that chunk. A chunk is output when it reaches the maximum size (because
 having a known maximum size seems convenient), or when the stream is
 flushed (my understanding is that this is the Right Thing to do, plus it
 could be nice if we have commands that take a long time to finish).


Well, the EOF token wouldn't really have to be '\0', just something a
parser could distinguish from the normal output flow. For example, in
emails the header is separated from the body by a double newline (\n\n).
If basic_io became the standard for all output of the automation interface,
there could even be some well-defined end token there, like


...

command_finished 1234...

where the 1234... part could be the checksum for the complete output 
echoed before that token.


Thomas.




[Monotone-devel] Re: Automate stdio chunk size

2006-07-10 Thread Bruce Stephens
Thomas Keller [EMAIL PROTECTED] writes:

 Well, the EOF token wouldn't really have to be '\0', just something a
 parser could distinguish from the normal output flow. F.e. in emails
 the header is separated from the body by double newlines \n\n. If
 basic_io would become standard for all output of the automation
 interface there could even be some well-defined end token there, like

 ...

 command_finished 1234...

 where the 1234... part could be the checksum for the complete output
 echoed before that token.

It's surely simpler just to say how many bytes the thing has, then
readers know to read that many bytes?

(Otherwise you'll have people worrying about the case where the
content somehow manages to contain its own checksum.)




Re: [Monotone-devel] Re: Automate stdio chunk size

2006-07-10 Thread Timothy Brownawell
On Mon, 2006-07-10 at 18:50 +0200, Thomas Keller wrote:
  We can't use an in-stream EOF token, because the stream should be
  binary-safe. So this means prefixing each data chunk with the size of
  that chunk. A chunk is output when it reaches the maximum size (because
  having a known maximum size seems convenient), or when the stream is
  flushed (my understanding is that this is the Right Thing to do, plus it
  could be nice if we have commands that take a long time to finish).
 
 Well, the EOF token wouldn't really have to be '\0', just something a 
 parser could distinguish from the normal output flow. F.e. in emails the 
 header is separated from the body by double newlines \n\n. If basic_io 
 would become standard for all output of the automation interface there 
 could even be some well-defined end token there, like

basic_io is not always appropriate, for example for automate get_file.
This command also means that the output stream can contain arbitrary
binary data, so no in-stream EOF token would be safe.
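
A small Python sketch makes the failure mode concrete. The end token below is a simplified stand-in for the proposed marker (no checksum), and all names are hypothetical. A payload that happens to contain the token is cut short by a token-based reader, while a length-prefixed reader recovers it intact.

```python
END_TOKEN = b"\ncommand_finished\n"  # hypothetical in-stream terminator


def frame_with_token(payload):
    """Terminate the payload with an in-stream end token."""
    return payload + END_TOKEN


def unframe_with_token(stream_bytes):
    """Take everything up to the first end token."""
    return stream_bytes.split(END_TOKEN, 1)[0]


def frame_with_length(payload):
    """Prefix the payload with its byte count and a ':' separator."""
    return b"%d:" % len(payload) + payload


def unframe_with_length(stream_bytes):
    """Read the byte count, then take exactly that many bytes."""
    size_text, rest = stream_bytes.split(b":", 1)
    return rest[: int(size_text)]


if __name__ == "__main__":
    # File contents that happen to include the terminator, as could
    # occur with arbitrary data returned by a get_file-style command.
    tricky = b"file data" + END_TOKEN + b"more file data"
    print(unframe_with_token(frame_with_token(tricky)) == tricky)
    print(unframe_with_length(frame_with_length(tricky)) == tricky)
```

Escaping the token inside the payload would also work, but it forces both sides to scan every byte, which the size prefix avoids.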

Tim






[Monotone-devel] Re: Monotone-devel Digest, Vol 39, Issue 15

2006-07-10 Thread Eric Anderson
[EMAIL PROTECTED] writes:
  From: Nathaniel Smith [EMAIL PROTECTED]
  Subject: Re: [Monotone-devel] Re: Monotone-devel Digest, Vol 39, Issue
   15
  
  [ code to check that mtn process is still alive after sleep is wrong ]

I just saw the code in mtn.py that does a sleep(3) in order to wait
for the server to get going.  Take a look at the revised attached
patch which does the check and will also bail out quickly if the
sub-process fails.
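
The attached patch is not reproduced in this archive. As a rough idea of the approach, a Python sketch of "poll instead of sleep" might look like the following. It assumes the server listens on a TCP port, and the function name, parameters and defaults are all hypothetical.

```python
import socket
import subprocess
import sys
import time


def wait_for_server(process, host, port, timeout=3.0, interval=0.05):
    """Wait until host:port accepts connections.

    Returns True as soon as a connection succeeds, False if `timeout`
    seconds pass first, and raises RuntimeError immediately if the
    server process has already exited.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        exit_code = process.poll()
        if exit_code is not None:
            raise RuntimeError("server exited early with status %d" % exit_code)
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)  # not listening yet, try again shortly
    return False


if __name__ == "__main__":
    # Stand-in for a server that fails at startup.
    failing = subprocess.Popen([sys.executable, "-c", "import sys; sys.exit(3)"])
    failing.wait()
    try:
        wait_for_server(failing, "127.0.0.1", 1)
    except RuntimeError as error:
        print(error)
```

Compared with a fixed sleep, this returns as soon as the server is reachable and reports a dead sub-process without waiting out the full delay.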
-Eric


server-check.patch
Description: Binary data


Re: [Monotone-devel] Re: Automate stdio chunk size

2006-07-10 Thread Thomas Moschny
On Monday 10 July 2006 17:48 Timothy Brownawell wrote:
 There are changes to inventory in the works, that would require changing
 the interface version anyway, perhaps we should increase the chunk size
 at the same time we land that?

Yes. And I think we should change the docs (for the new interface version) to 
*not* specify a maximum chunk size, thus allowing us to change it freely 
later, for example through a command line option.

- Thomas

-- 
Thomas Moschny [EMAIL PROTECTED]




Re: [Monotone-devel] Re: Automate stdio chunk size

2006-07-10 Thread Nathaniel Smith
On Mon, Jul 10, 2006 at 08:17:36PM +0200, Thomas Moschny wrote:
 On Monday 10 July 2006 17:48 Timothy Brownawell wrote:
  There are changes to inventory in the works, that would require changing
  the interface version anyway, perhaps we should increase the chunk size
  at the same time we land that?
 
 Yes. And I think we should change the docs (for the new interface version) to 
 *not* specify a maximum chunk size, thus allowing us to change it freely 
 later, for example through a command line option.

Err, yes, I'm sort of surprised that's in the docs at all.

The point of having an upper limit is to put an upper bound on how
much memory monotone has to use.  1M seems a bit large for this
purpose, and I'd be astonished if you actually have to go to 1M to get
the benefit.  Could someone run timings at different block sizes
and pick one that gives most of the speed benefit without being huge?
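
A starting point for such timings could be a harness like the Python sketch below. It only measures pushing bytes through a plain OS pipe at different block sizes, so it is a stand-in for the shape of the experiment, not a measurement of monotone's automate stdio. Real numbers would have to come from running mtn itself with different values of the constant. The function names are hypothetical.

```python
import os
import threading
import time


def time_transfer(total_bytes, block_size):
    """Send `total_bytes` through a pipe in `block_size` writes and reads.

    Returns (bytes_received, elapsed_seconds).
    """
    read_fd, write_fd = os.pipe()
    block = b"\0" * block_size

    def sender():
        try:
            sent = 0
            while sent < total_bytes:
                count = min(block_size, total_bytes - sent)
                view = memoryview(block)[:count]
                while len(view) > 0:  # handle short writes
                    written = os.write(write_fd, view)
                    view = view[written:]
                sent += count
        finally:
            os.close(write_fd)

    thread = threading.Thread(target=sender)
    start = time.perf_counter()
    thread.start()
    received = 0
    while True:
        piece = os.read(read_fd, block_size)
        if not piece:
            break
        received += len(piece)
    thread.join()
    elapsed = time.perf_counter() - start
    os.close(read_fd)
    return received, elapsed


def compare_block_sizes(total_bytes, block_sizes):
    """Return a dict mapping each block size to its elapsed time in seconds."""
    results = {}
    for size in block_sizes:
        received, elapsed = time_transfer(total_bytes, size)
        if received != total_bytes:
            raise RuntimeError("transfer was truncated")
        results[size] = elapsed
    return results


if __name__ == "__main__":
    sizes = [1024, 4096, 16384, 65536, 262144, 1048576]
    for size, seconds in compare_block_sizes(16 * 1024 * 1024, sizes).items():
        print("%8d bytes per block: %.4f s" % (size, seconds))
```

Running it several times and on each target platform matters, since a single run on one machine is easily dominated by noise.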

-- Nathaniel

-- 
Damn the Solar System.  Bad light; planets too distant; pestered with
comets; feeble contrivance; could make a better one myself.
  -- Lord Jeffrey




Re: [Monotone-devel] encrypted monotone (and digression on

2006-07-10 Thread Timothy Brownawell
On Mon, 2006-07-10 at 13:29 -0700, Rob Schoening wrote:

 but my question is really: how vulnerable is mtn serve today to DoS
 and buffer overrun type exploits?

DoS: It'd be fairly simple to make monotone eat all your CPU (or on an
SMP box, as much CPU as a single-threaded program can eat). If you give
someone write access, they can also fill up your disk.

Buffer overrun: We tend to not use fixed-size buffers, so I don't think
this is terribly likely.

Tim






Fwd: [Monotone-devel] encrypted monotone (and digression on

2006-07-10 Thread Nuno Lucas

Wrong reply button...

-- Forwarded message --
From: Nuno Lucas [EMAIL PROTECTED]
Date: Jul 10, 2006 10:08 PM
Subject: Re: [Monotone-devel] encrypted monotone (and digression on
To: Rob Schoening [EMAIL PROTECTED]


On 7/10/06, Rob Schoening [EMAIL PROTECTED] wrote:

 I have a somewhat unrelated question that touches on a more fundamental
 security issue:

 what is the relative security risk of running netsync on a public port
 assuming it's running as a non privileged user?  how much of a vulnerability
 is it for the host that's serving it?


Any network server, even one running as an unprivileged user, has its
risks, because as soon as someone can run code on the server (by means
of program bugs) the chance of breaking it rises astronomically
(there can always be other programs running locally on the server with
their own bugs, or even kernel bugs, that allow a privilege escalation).

So the risks are the same as for any other server program, except that
you can (in theory) place more trust in other, more stable programs than
in the still-in-development monotone binaries.


 it is, of course, relatively simple today to deploy mtn on a private port
 and use SSH port forwarding to access it which all but eliminates this
 problem. Or, eventually, one could use mtn dumb over SSH.


I believe that is the right answer if your data is of great value for
you, but I'll let others confirm this.


 but my question is really: how vulnerable is mtn serve today to DoS and
 buffer overrun type exploits?


The only defense against DoS attacks is having more bandwidth than
your attacker, so monotone can't do anything about this. It's true,
though, that monotone's heavy CPU usage makes it a bit more vulnerable
in this area.

Buffer overruns are program errors, so the only answer is either to build
monotone with the gcc hardening options to avoid this (not 100% foolproof
against all errors, but it can help a *lot*) or to peruse the source
code until you are certain there aren't any.


RS


Best regards,
~Nuno Lucas




Re: [Monotone-devel] Re: Automate stdio chunk size

2006-07-10 Thread Nuno Lucas

On 7/10/06, Nathaniel Smith [EMAIL PROTECTED] wrote:

 On Mon, Jul 10, 2006 at 08:17:36PM +0200, Thomas Moschny wrote:
 The point of having an upper limit is to put an upper bound on how
 much memory monotone has to use.  1M seems a bit large for this
 purpose, and I'd be astonished if you actually have to go to 1M to get
 the benefit.  Could someone run timings at different block sizes
 and pick one that gives most of the speed benefit without being huge?


Note that this interface will probably be run on a pipe and different
block sizes will have a different impact on different operating
systems (and even between versions).

I remember there was a paper on the drastic differences in speed
between the pipe handling on Windows 2000 and XP for different block
sizes.
Maybe someone can recollect where that paper was...


Best regards,
~Nuno Lucas


 -- Nathaniel





[Monotone-devel] Re: encrypted monotone (and digression on

2006-07-10 Thread Graydon Hoare

Rob Schoening wrote:
 I have a somewhat unrelated question that touches on a more fundamental
 security issue:

 what is the relative security risk of running netsync on a public port
 assuming it's running as a non privileged user?  how much of a
 vulnerability is it for the host that's serving it?


Nobody's shown us exploits yet, but it would be foolish for me to imply 
that none exist or are possible. I can point to a few things that might 
reassure you. Whether they do is another matter.


1. Monotone authenticates users (by RSA-signing a nonce and requesting 
an RSA signature in response) before anything else. One may be able to 
DoS the server (in a CPU sense) if anonymous requests are permitted; if 
you insist on authenticated connections from known clients, this risk is 
reduced.


2. Monotone does ::read() off a network socket and into a fixed-size 
stack buffer. However, it does this in exactly one place (netsync.cc, 
session::read_some()) and always issues the read call for the full 
length of the buffer, starting at its beginning, and never restarts the 
read or tries to mix parsing and reading.


3. That buffer is immediately appended to a heap std::string and data is 
parsed from there using safer extractor functions. The extractor 
functions all test the length of every extraction against the string 
length, and assert fatally if they are asked to pass the end of the 
string they're reading from. If there's insufficient data for a complete 
command packet during parsing, we give up and restart parsing from the 
string's beginning next time we receive data.


4. Other major parsing points are basic_io.{cc,hh} and xdelta.cc; it is 
possible that those contain logic that can be tricked into indexing past 
the end of the std::strings they're reading from. I'd be happy to go 
through them with a concerned reader doing an audit / inserting more 
dynamic checks / adding tests that try specific attacks.


5. With the exception of misbehavior in glibc during getaddrinfo() and 
setlocale(), we appear to be valgrind-clean.


6. You should be able to chroot / jail / zone / otherwise sandbox us, so 
long as we can access libstdc++, libc, libnss, and our database. We also 
need to be able to create transient journal files in the directory we 
keep the database in, as part of the page-transaction system in sqlite.
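

The pattern in points 2 and 3 (append whatever was read to a growable buffer, parse only through length-checked extractors, and start over from the buffer's beginning when a packet is incomplete) can be sketched in Python as below. This is an illustration and not netsync code. The packet layout, a 4-byte big-endian length followed by the payload, is an assumption made for the example, and an exception stands in for both the fatal assertion and the "not enough data yet" case.

```python
class NeedMoreData(Exception):
    """Raised when an extraction would run past the end of the buffer."""


class Extractor:
    """Cursor over immutable bytes where every read is bounds-checked."""

    def __init__(self, data):
        self._data = data
        self.position = 0

    def take(self, length):
        if length < 0 or self.position + length > len(self._data):
            raise NeedMoreData()
        piece = self._data[self.position:self.position + length]
        self.position += length
        return piece

    def take_u32(self):
        return int.from_bytes(self.take(4), "big")


class Session:
    """Accumulates received bytes and yields only complete packets."""

    def __init__(self):
        self._inbuf = bytearray()

    def receive(self, data):
        """Append `data`; return the list of packets completed so far."""
        self._inbuf += data
        packets = []
        while True:
            extractor = Extractor(bytes(self._inbuf))
            try:
                size = extractor.take_u32()
                payload = extractor.take(size)
            except NeedMoreData:
                # Leave the buffer untouched; the next call re-parses
                # from its beginning once more data has arrived.
                return packets
            packets.append(payload)
            del self._inbuf[:extractor.position]


if __name__ == "__main__":
    session = Session()
    print(session.receive(b"\x00\x00\x00\x05he"))
    print(session.receive(b"llo\x00\x00\x00\x01!"))
```

Keeping reading and parsing separate means a malformed length field can at worst make the parser wait for data that never arrives; it cannot make it index outside the buffer.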


Still, it's a nontrivial program; you're right to be concerned. Even if
you trust our code, we also inherit the possibility of vulnerabilities
from sqlite, botan, lua, idna, and boost. We do a fair bit of input
validation, don't call printf, are careful to avoid malloc/free or use
of raw pointers, etc., but it's hard to be sure.


-graydon


