Re: Debian's problems, Debian's future

2002-04-12 Thread Robert Tiberius Johnson
On Wed, 2002-04-10 at 02:28, Anthony Towns wrote: 
 I think you'll find you're also unfairly weighting this against people
 who do daily updates. If you do an update once a month, it's not as much
 of a bother waiting a while to download the Packages files -- you're
 going to have to wait _much_ longer to download the packages themselves.
 
 I'd suggest your formula would be better off being:
 
   bandwidthcost = sum( x = 1..30, prob(x) * cost(x) / x )
 
 (If you update every day for a month, your cost isn't just one download,
 it's 30 downloads. If you update once a week for a month, your cost
 isn't that of a single download, it's four times that. The /x takes that
 into account)

I think it depends on what you're measuring.  I can think of two ways to
measure the goodness of these schemes (there are certainly others): 

1. What is the average bandwidth required at the server? 
2. What is the average bandwidth required at the client? 

The two questions are related: If users update after i days with
prob1(i), then the probability that a connection arriving at a server is
from a user updating after i days is 

prob2(i)=(prob1(i)/i)*norm, 

where norm is a normalization factor so the probabilities sum to 1. 
I've been looking at question 2, and you're suggesting that I look at
question 1, except you forgot the normalization factor.  I think this is
what you mean.  Please correct me if I've misunderstood. 
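In code, the conversion from prob1 to prob2 looks like this (a sketch; the (2/3)^i/2 model and the 30-day cutoff are just the guesses I describe elsewhere, not measurements):

```python
# Sketch: converting the client-side update distribution prob1 into the
# distribution of update ages the server sees.  The (2/3)^i / 2 model
# and the 30-day horizon are assumptions.
def prob1(i):
    return (2.0 / 3.0) ** i / 2.0

days = range(1, 31)
unnorm = [prob1(i) / i for i in days]
norm = 1.0 / sum(unnorm)            # so the prob2 values sum to 1
prob2 = [u * norm for u in unnorm]
```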

Anyway, here are the results you asked for.  I'm NOT including the
normalization factor for easier comparison with your numbers.  My diff
numbers are a little different from yours mainly because I charge 1K of
overhead for each file request. 

Diff scheme 
days    dspace  ebwidth
---
1   12.000K 342.00K
2   24.000K 171.20K
3   36.000K 95.900K
4   48.000K 58.500K
5   60.000K 38.800K
6   72.000K 27.900K
7   84.000K 21.800K
8   96.000K 18.200K
9   108.00K 16.100K
10  120.00K 14.900K
11  132.00K 14.100K
12  144.00K 13.700K
13  156.00K 13.400K
14  168.00K 13.300K
15  180.00K 13.100K

Checksum file scheme with 4 byte checksums:
bsize   dspace  ebwidth
---
20  312.50K 173.70K
40  156.30K 89.300K
60  104.20K 62.200K
80  78.100K 49.300K
100 62.500K 42.200K
120 52.100K 37.900K
140 44.600K 35.300K
160 39.100K 33.600K
180 34.700K 32.700K
200 31.300K 32.200K
220 28.400K 32.100K
240 26.000K 32.200K
260 24.000K 32.500K
280 22.300K 33.000K
300 20.800K 33.600K
320 19.500K 34.300K
340 18.400K 35.100K
360 17.400K 35.900K
380 16.400K 36.800K
400 15.600K 37.700K

I'm probably underestimating the bandwidth of the checksum file scheme. 
I'm pretty confident about the diff scheme estimates, though.

I think the performance of the two schemes is pretty close.  Even though
this looks pretty good for the checksum file scheme, I'm still partial
to the diff scheme because 

- The checksum file scheme bottoms out at 32K, but the diff scheme can
reduce transfers to 13K (using more disk space).

- I trust my estimates of the diff scheme more.  The rsync scheme will
definitely take more bandwidth than my estimates predict.

- As debian gets larger, the checksum files will get larger, and so the
bandwidth will get larger.  So over time, any advantage of the checksum
file scheme will disappear.

- The diff scheme is more flexible and easier to tune.  The checksum
file scheme has a sweet spot at 220-byte blocks.  Predicting the actual
value of this sweet spot may be hard in the real world.

Best,
Rob



-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]




Re: Debian's problems, Debian's future

2002-04-12 Thread Robert Tiberius Johnson
On Wed, 2002-04-10 at 04:35, Michael Bramer wrote:
  Scheme                      Disk space   Bandwidth
  ---------------------------------------------------
  Checksums (bwidth optimal)     26K          81K
  diffs (4 days)                 32K         331K
  diffs (9 days)                 71K          66K
  diffs (20 days)               159K          27K
 
 can you explain your counts?

Sure.  At the end of this message is a script that you can use with the
program gp to recreate my numbers.  Here's a quick description:

Anthony Towns said that the average size of a diff between two Packages
files is 12K (after compression with bzip2).  So if the server keeps d
days of diffs, this will take about d*12K of disk space.  If I go for i
days without updating, then when I do update, if i <= d, then I will
need to download i diff files, using about i*(12K + 1K) bandwidth.  (The
1K is for each GET request, since I'm downloading i files.)  If i > d,
then I need to get the whole Packages.gz file, which I estimate as about
1.6M.  So let bwidth(d,i) = the amount of bandwidth used doing an update
after i days, where the server has kept d days of diffs.

So how much bandwidth is used _on average_?  Well, it depends on how
often everybody updates.  If everybody updates every day, then everybody
would just need to download 1 diff, using 13K.  If everybody updates
every week, then the average bandwidth is 7*13K=91K.  In reality, we
don't know how often people update, but my guess is that people tend to
update often.  So I just guessed that the probability that someone
updates after i days is prob(i)=((2/3)^i)/2.  Why this formula?  It
seemed good at the time.  So then the average bandwidth used is

average_bwidth(d) = sum over i=1,...,infinity of bwidth(d,i) * prob(i)

That's it for the diff stuff.
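For concreteness, the whole diff-scheme model fits in a few lines of Python, equivalent to the gp script at the end of this message (the constants are the estimates above, not measurements):

```python
# Diff-scheme cost model from the description above.  Assumed constants:
# 12K per daily diff, 1K per GET request, 1.6M for the full Packages.gz.
DIFF_K, REQUEST_K, FULL_K = 12.0, 1.0, 1600.0

def bwidth(d, i):
    """Bandwidth (in K) for an update after i days, with d days of diffs kept."""
    if i <= d:
        return i * (DIFF_K + REQUEST_K)   # i diffs, one GET request each
    return FULL_K + REQUEST_K             # too old: fetch full Packages.gz

def prob(i):
    return (2.0 / 3.0) ** i / 2.0         # guessed update-frequency model

def average_bwidth(d, horizon=200):
    # truncate the infinite sum; the tail is negligible by then
    return sum(bwidth(d, i) * prob(i) for i in range(1, horizon))
```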

For the checksum scheme, the disk space required is the number of
checksums times the size of each checksum.  The number of checksums is
the size of Packages.gz divided by the block size.  Since the checksum
file has to be transferred to every client, the size of the checksum
file contributes to the bandwidth estimate, as well.  Additionally, I
estimate that 75 packages change in debian every day (derived by looking
at debian-devel-changes in Feb. and March).  Using a little probability,
I computed the average number of blocks in Packages.gz that will change
in i days.  I then estimate that each of these blocks will have to be
transferred during an update, and use that to estimate the amount of
bandwidth required for an update.  Then I average, as with the diff
scheme.  Let me know if you think there are any problems with this.
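I didn't show that counting step; here is a rough sketch of one way to do it (the block-touching model is an approximation I'm choosing for illustration, and the constants come from the estimates above and the script fragment at the end):

```python
# Hypothetical reconstruction of the expected-changed-blocks estimate.
# Assumed constants: 8000 packages at 200 compressed bytes each, and
# 75 package changes per day (estimated from debian-devel-changes).
NPKGS, BYTES_PER_PKG, CHANGES_PER_DAY = 8000, 200.0, 75

def expected_changed_blocks(bsize, days):
    filesize = NPKGS * BYTES_PER_PKG
    nblocks = filesize / bsize
    # a changed package straddles roughly (its length / bsize) + 1 blocks
    touched = min(1.0, (BYTES_PER_PKG / bsize + 1) / nblocks)
    # probability a given block survives all changes untouched
    survive = (1.0 - touched) ** (CHANGES_PER_DAY * days)
    return nblocks * (1.0 - survive)
```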

I've been playing around recently with a more realistic (in my opinion)
user model.  I now predict that the probability that a user will update
every i days is n/(i+1)^3 (n is a normalization factor).  I like this
model because it predicts that if a user hasn't updated in a long time,
it'll probably be a long time before they update.  This seems
intuitively correct to me.  So here's some new numbers comparing the
diff scheme and the rsync scheme in this new user model.  In my opinion,
diff still wins.

These numbers use prob(i)=(n/(i+1)^3)/i, so these numbers are the
average bandwidth, averaged over what the server sees.  For an
explanation of this, see
http://lists.debian.org/debian-devel/2002/debian-devel-200204/msg01076.html.
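In code, the new model works out to this (a sketch; the 1000-day cutoff stands in for the infinite sum, and as before I leave out the final normalization after the /i weighting):

```python
# The heavier-tailed user model n/(i+1)^3, with the extra /i weighting
# for averaging over what the server sees, as described above.
days = range(1, 1001)                          # cutoff approximating infinity
n = 1.0 / sum(1.0 / (i + 1) ** 3 for i in days)  # normalize the client model
prob = [(n / (i + 1) ** 3) / i for i in days]    # what the server sees
```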

Diffs:
days    dspace  ebwidth
---
1   12.000K 296.80K
2   24.000K 110.70K
3   36.000K 58.800K
4   48.000K 39.100K
5   60.000K 30.000K
6   72.000K 25.300K
7   84.000K 22.600K
8   96.000K 21.000K
9   108.00K 19.900K
10  120.00K 19.200K
11  132.00K 18.700K
12  144.00K 18.400K
13  156.00K 18.100K
14  168.00K 17.900K
15  180.00K 17.800K

Checksum files:
bsize   dspace  ebwidth
---
20  312.50K 315.40K
40  156.30K 161.10K
60  104.20K 111.00K
80  78.100K 86.800K
100 62.500K 73.100K
120 52.100K 64.600K
140 44.600K 59.100K
160 39.100K 55.500K
180 34.700K 53.000K
200 31.300K 51.500K
220 28.400K 50.500K
240 26.000K 50.100K
260 24.000K 50.000K
280 22.300K 50.100K
300 20.800K 50.500K
320 19.500K 51.100K
340 18.400K 51.800K
360 17.400K 52.700K
380 16.400K 53.700K
400 15.600K 54.700K


Best,
Rob

/***
 * Info about debian
 ***/

/* The number of packages in debian */
npkgs=8000.0

/* How big is Packages[.gz] as a function of the number of packages */
compressed_bytes_per_pkg=200.0
uncompressed_bytes_per_pkg=800.0

Re: Debian's problems, Debian's future

2002-04-12 Thread Robert Tiberius Johnson
On Wed, 2002-04-10 at 09:46, Erich Schubert wrote:
 What diff options do you use?
 As the diffs are expected to be applied to the correct version, they
 probably shouldn't contain the old data, but the new data only.

Good point.  I used diff -ed, so I think this is not including
unnecessary context info, as you suggest.  

Best,
Rob







Re: Debian's problems, Debian's future

2002-04-12 Thread Anthony Towns
On Thu, Apr 11, 2002 at 10:40:31PM -0700, Robert Tiberius Johnson wrote:
 On Wed, 2002-04-10 at 02:28, Anthony Towns wrote: 
  I'd suggest your formula would be better off being:
  bandwidthcost = sum( x = 1..30, prob(x) * cost(x) / x )
 I think it depends on what you're measuring.  I can think of two ways to
 measure the goodness of these schemes (there are certainly others): 
 
 1. What is the average bandwidth required at the server? 
 2. What is the average bandwidth required at the client? 

I don't think the bandwidth at the server is a major issue to anyone,
although obviously improvements there are a Good Thing.

Personally, I think amount of time spent waiting for apt-get update
to finish is the important measure (well, apt-get update; apt-get
dist-upgrade is important too, but I don't think we've seen any feasible
ideas at improving the latter).

 prob2(i)=(prob1(i)/i)*norm, 
 
 where norm is a normalization factor so the probabilities sum to 1. 
 I've been looking at question 2, and you're suggesting that I look at
 question 1, except you forgot the normalization factor.  I think this is
 what you mean.  Please correct me if I've misunderstood. 

No, I'm not. I'm saying that the amount of time spent waiting for
apt-get update needs to count every apt-get update you run, not just
the first. So, if over a period of a week, I run it seven times, and you
run it once, I wait seven times as long as you do, so it's seven times
more important to speed things up for me, than for you.

 Anyway, here are the results you asked for.  I'm NOT including the
 normalization factor for easier comparison with your numbers.  My diff
 numbers are a little different from yours mainly because I charge 1K of
 overhead for each file request. 

Merging, and reordering by decreasing estimated bandwidth. The ones marked
with *'s aren't worth considering because there's a method that both
requires less bandwidth and takes up less diskspace. The ones without
stars are thus ordered by increasing diskspace, and decreasing bandwidth.

 days/
 bsize dspace  ebwidth
 ---

Having the ebwidth of the current situation (everyone downloads the
entire Packages file) for comparison would be helpful.

 1 12.000K 342.00K [diff]
 20312.50K *   173.70K [cksum/rsync]
 2 24.000K *   171.20K [diff]
 3 36.000K *   95.900K [diff]
 40156.30K *   89.300K [cksum/rsync]
 60104.20K *   62.200K [cksum/rsync]
 4 48.000K *   58.500K [diff]
 8078.100K *   49.300K [cksum/rsync]
 100   62.500K *   42.200K [cksum/rsync]
 5 60.000K *   38.800K [diff]
 120   52.100K *   37.900K [cksum/rsync]
 400   15.600K 37.700K [cksum/rsync]
 380   16.400K 36.800K [cksum/rsync]
 360   17.400K 35.900K [cksum/rsync]
 140   44.600K *   35.300K [cksum/rsync]
 340   18.400K 35.100K [cksum/rsync]
 320   19.500K 34.300K [cksum/rsync]
 300   20.800K *   33.600K [cksum/rsync]
 160   39.100K *   33.600K [cksum/rsync]
 280   22.300K 33.000K [cksum/rsync]
 180   34.700K *   32.700K [cksum/rsync]
 260   24.000K 32.500K [cksum/rsync]
 240   26.000K 32.200K [cksum/rsync]
 200   31.300K *   32.200K [cksum/rsync]
 220   28.400K 32.100K [cksum/rsync]
 6 72.000K 27.900K [diff]
 7 84.000K 21.800K [diff]
 8 96.000K 18.200K [diff]
 9 108.00K 16.100K [diff]
 10120.00K 14.900K [diff]
 11132.00K 14.100K [diff]
 12144.00K 13.700K [diff]
 13156.00K 13.400K [diff]
 14168.00K 13.300K [diff]
 15180.00K 13.100K [diff]

180k is roughly 10% of the size of the corresponding Packages.gz, so
is relatively trivial. Since we'll probably do it at the same time as
dropping the uncompressed Packages file (sid/main/i386 alone is 6MB),
this is pretty negligible.

Cheers,
aj

-- 
Anthony Towns [EMAIL PROTECTED] http://azure.humbug.org.au/~aj/
I don't speak for anyone save myself. GPG signed mail preferred.

 ``BAM! Science triumphs again!'' 
-- http://www.angryflower.com/vegeta.gif




Re: Debian's problems, Debian's future

2002-04-12 Thread Robert Tiberius Johnson
On Fri, 2002-04-12 at 00:14, Anthony Towns wrote:
 No, I'm not. I'm saying that the amount of time spent waiting for
 apt-get update needs to count every apt-get update you run, not just
 the first. So, if over a period of a week, I run it seven times, and you
 run it once, I wait seven times as long as you do, so it's seven times
 more important to speed things up for me, than for you.

Got it.  Thanks for clearing that up.

 Having the ebwidth of the current situation (everyone downloads the
 entire Packages file) for comparison would be helpful.

You're right.  Here it is: old_ebwidth = 879K.

Best,
Rob







Re: Debian's problems, Debian's future

2002-04-10 Thread Michael Bramer
On Wed, Apr 10, 2002 at 10:25:22AM +1000, Martijn van Oosterhout wrote:
 On Tue, Apr 09, 2002 at 05:02:34PM +0200, Michael Bramer wrote:
  you propose to add 'some' diff files for all files on ftp-master.d.o? 
  
  With rsync we need only one rsync-checksum file per normal file and
  all apts need only download the needed parts.
  
  You get the point?
 
 With the standard rsync algorithm, the rsync checksum files would actually
 be 8 times larger than the original file (you need to store the checksum
 for each possible block in the file).

I don't see that the checksum file is larger than the original file. If
the checksum file is larger, we will have more bytes to download... This
was not the goal.

 What you are suggesting is that the server store checksums for precalculated
 blocks on the server. This would be 4 bytes per 1k in the original file or
 so. The transaction proceeds as follows:
 
 1. Client asks for checksum list off server
 2. Client calculates checksums for local file
 3. Client compares list of server with list of client
 4. Client downloads changed regions.

Yes, this is the way..
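As a sketch, the client side of those four steps could look like this (zlib.adler32 is only a stand-in for whatever checksum is actually used, and the helper names and block size are made up):

```python
# Sketch of the precalculated-block scheme from the client's point of
# view.  zlib.adler32 is a stand-in checksum; the block size is arbitrary.
import zlib

BSIZE = 1024

def block_checksums(data):
    # one checksum per fixed-position block of the file
    return [zlib.adler32(data[i:i + BSIZE]) & 0xffffffff
            for i in range(0, len(data), BSIZE)]

def blocks_to_fetch(local_data, server_sums):
    local = block_checksums(local_data)
    # fetch any block that differs, or that we don't have at all
    return [i for i, s in enumerate(server_sums)
            if i >= len(local) or local[i] != s]
```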

 Note, this is not the rsync algorithm, but the one that is possibly
 patented.

maybe I don't understand the rsync algorithm...

IMHO the rsync algorithm is:
 1.) Computer B splits file B in blocks.
 2.) calculates two checksums:
 a.) a weak ``rolling'' 32-bit checksum
 b.) an md5sum
 3.) Computer B sends these to computer A.
 4.) Computer A searches in file A for parts with the same checksums from
 file B.
 5.) Computer A requests the unmatched blocks from computer B and
 builds file B.

I get this from /usr/share/doc/rsync/tech_report.tex.gz

right? 

The _only_ difference is: precalculate the checksums on computer B

Or maybe store the calculated checksums in a /var/cache/rsync/ cache
dir. 


sorry, I know that patents don't have any logic, but this is the same
algorithm, only with some caching.  Comments?

Gruss
Grisu
-- 
Michael Bramer  -  a Debian Linux Developer  http://www.debsupport.de
PGP: finger [EMAIL PROTECTED]  -- Linux Sysadmin   -- Use Debian Linux
Nicht geschehene Taten ziehen oft einen erstaunlichen Mangel an Folgen 
 nach sich. --   S.J. Lec




Re: Debian's problems, Debian's future

2002-04-10 Thread Robert Tiberius Johnson
On Tue, 2002-04-09 at 17:25, Martijn van Oosterhout wrote:
 What you are suggesting is that the server store checksums for precalculated
 blocks on the server. This would be 4 bytes per 1k in the original file or
 so. The transaction proceeds as follows:
 
 1. Client asks for checksum list off server
 2. Client calculates checksums for local file
 3. Client compares list of server with list of client
 4. Client downloads changed regions.
 
 Note, this is not the rsync algorithm, but the one that is possibly
 patented.

This looks like an interesting algorithm, so I decided to compare it to
the diff scheme analyzed in 
http://lists.debian.org/debian-devel/2002/debian-devel-200204/msg00502.html

The above message also gives my analysis methodology.

The results:


- The following table summarizes the performance of the checksum-based
scheme and the diff-based scheme under the assumption that users tend to
perform apt-get update often.  I think disk space is cheap and bandwidth
is expensive, so 20 days of diffs is the best choice.

Scheme                      Disk space   Bandwidth
---------------------------------------------------
Checksums (bwidth optimal)     26K          81K
diffs (4 days)                 32K         331K
diffs (9 days)                 71K          66K
diffs (20 days)               159K          27K

- The analysis is unfairly favorable to the checksum scheme, because I
do not count the bandwidth required to request all the changed blocks,
only the bandwidth used to transmit the changed blocks.

- For the user model in the message above, the optimal block size for
this algorithm is around 245 bytes.

- In the diff-based scheme, each mirror can decide on a
diskspace/bandwidth tradeoff by simply keeping more old diffs or
deleting some old diffs.  The checksum-based scheme doesn't really
support tweaking at the mirror.

- I tend to update every day.  For people who update every day, the
diff-based scheme only needs to transfer about 8K, but the
checksum-based scheme needs to transfer 45K.  So for me, diffs are
better. :)

Best,
Rob







Re: Debian's problems, Debian's future

2002-04-10 Thread Anthony Towns
On Wed, Apr 10, 2002 at 01:26:17AM -0700, Robert Tiberius Johnson wrote:
 - I tend to update every day.  For people who update every day, the
 diff-based scheme only needs to transfer about 8K, but the
 checksum-based scheme needs to transfer 45K.  So for me, diffs are
 better. :)

I think you'll find you're also unfairly weighting this against people
who do daily updates. If you do an update once a month, it's not as much
of a bother waiting a while to download the Packages files -- you're
going to have to wait _much_ longer to download the packages themselves.

I'd suggest your formula would be better off being:

bandwidthcost = sum( x = 1..30, prob(x) * cost(x) / x )

(If you update every day for a month, your cost isn't just one download,
it's 30 downloads. If you update once a week for a month, your cost
isn't that of a single download, it's four times that. The /x takes that
into account)

Bandwidth cost, then, is something like the average amount downloaded
by a testing/unstable user per day to update main.

My results are something like:

 0 days of diffs: 843.7 KiB (the current situation)
 1  day of diffs: 335.7 KiB
 2 days of diffs: 167.7 KiB
 3 days of diffs:  93.7 KiB
 4 days of diffs:  56.9 KiB
 5 days of diffs:  37.5 KiB
 6 days of diffs:  26.8 KiB
 7 days of diffs:  20.7 KiB
 8 days of diffs:  17.2 KiB
 9 days of diffs:  15.1 KiB
10 days of diffs:  13.9 KiB
11 days of diffs:  13.2 KiB
12 days of diffs:  12.7 KiB
13 days of diffs:  12.4 KiB
14 days of diffs:  12.3 KiB
15 days of diffs:  12.2 KiB

...which pretty much matches what I'd expect: at the moment, just to
update main, people download around 1.2MB per day; if we let them just
download the diff against yesterday, the average would plunge to only
a couple of hundred k, and you rapidly reach the point of diminishing
returns.

I used figures of 1.5MiB for the standard gzipped Packages file you
download if you can't use diffs, and 12KiB for the size of each daily
diff -- if you're three days out of date, you download three diffs and
apply them in order to get up to date. 12KiB is the average size of
daily bzip2'ed --ed diffs over last month for sid/main/i386.

The script I used for the above was (roughly):

#!/usr/bin/python

def cost_diff(day, ndiffs):
    if day <= ndiffs:
        return 12 * 1024 * day
    else:
        return 1.5 * 1024 * 1024

def prob(d):
    return (2.0 / 3.0) ** d / 2.0

def summate(f, p):
    cost = 0.0
    for d in range(1, 31):
        cost += f(d) * p(d) / d
    return cost

for x in range(0, 16):
    print "%s day/s of diffs: %.1f KiB" % \
        (x, summate(lambda y: cost_diff(y, x), prob) / 1024)


I'd be interested in seeing what the rsync stats look like with the
/ days factor added in.

Cheers,
aj

-- 
Anthony Towns [EMAIL PROTECTED] http://azure.humbug.org.au/~aj/
I don't speak for anyone save myself. GPG signed mail preferred.

 ``BAM! Science triumphs again!'' 
-- http://www.angryflower.com/vegeta.gif




Re: Debian's problems, Debian's future

2002-04-10 Thread Anthony Towns
On Wed, Apr 10, 2002 at 07:28:42PM +1000, Anthony Towns wrote:
  0 days of diffs: 843.7 KiB (the current situation)

 ...which pretty much matches what I'd expect: at the moment, just to
 update main, people download around 1.2MB per day;

Uh, obviously this should be 843KiB. (I'd been playing with other
probabilities when I was writing the latter part. Tsktsk.)

Cheers,
aj

-- 
Anthony Towns [EMAIL PROTECTED] http://azure.humbug.org.au/~aj/
I don't speak for anyone save myself. GPG signed mail preferred.

 ``BAM! Science triumphs again!'' 
-- http://www.angryflower.com/vegeta.gif






Re: Debian's problems, Debian's future

2002-04-10 Thread Richard Atterer
On Tue, Apr 09, 2002 at 10:58:24AM +0200, Michael Bramer wrote:
 On Tue, Apr 09, 2002 at 06:39:19PM +1000, Martijn van Oosterhout wrote:
  I beleive this method is patented by somebody, [snip]
 
 has someone a pointer? 

Here's some stuff from my mail archives - I haven't checked whether
the links still work. The following one probably doesn't, but looks
like the patent number is 6167407:

http://164.195.100.11/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PALL&p=1&u=/netahtml/srchnum.htm&r=1&f=G&l=50&s1='6167407'.WKU.&OS=PN/6167407&RS=PN/6167407

Cheers,

  Richard

-- 
  __   _
  |_) /|  Richard Atterer |  CS student at the Technische  |  GnuPG key:
  | \/¯|  http://atterer.net  |  Universität München, Germany  |  0x888354F7
  ¯ '` ¯

- Forwarded message from Clifford Heath [EMAIL PROTECTED] -

Date: 29 Jan 2001 10:05:11 +1100
From: Clifford Heath [EMAIL PROTECTED]
Sender: [EMAIL PROTECTED]
To: Goswin Brederlow [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Subject: Re: reverse checksumming [legal] 

 What came first, rsync or the patent?

OSA refers to its issued patent US6006034 as the SmartPull patent.  This
isn't the patent that threatens rsync, though it might be relevant to some
of the preceeding discussion. I haven't considered whether there's any
overlap with rsync itself - in any case I doubt OSA would attempt to block
use of rsync.  We think that rsync is wonderful!

The patent that may overlap with rsync is US5446888 and its followup number
US5721907, with precedence date Jan 14 1994.  I am not a lawyer, but it
seems to directly conflict with rsync.  I've had correspondence with the
rights holders, as OSA wished to implement something similar in a product
but held off until licensing concerns were addressed. They are Travelling
Software Inc. (TSI), and refer to the technology as SpeedSync, using it in
their LapLink product line.  At the time of my last contact (August 1999),
Travelling Software had not made a determination if rsync infringes on any
intellectual property right of TSI or not.  Read the patent and decide for
yourself. I'm not qualified to hold a legal opinion.

The patent clearly identifies which operations are performed on the host
sending the file, and which on the host receiving it. We discovered a method
which reversed many of the operations with substantial benefit (as Tim
Adam has told you), and filed a patent to this effect, with a defensive
intent. This latest patent has not issued (it's pending).  So we have no
rights (yet!) to ask you to cease and desist from implementing and using it.
Be aware that this might change in the future. I personally believe (and
think OSA agrees) that it would be counter-productive to the industry as a
whole, but it's not my decision.  Who knows, OSA itself might be sold to
some sharks who think differently...

 This is just the rsync algorithm and thats probably way older than the
 patent, so the patent might not hold.

I don't believe that rsync is older, but in any case it's difficult and
expensive to challenge an issued patent over prior art, and I don't think
that Tridge is likely to do that.  If you fear a suit from TSI and would
choose a prior art defense, you will need Tridge's help, as only he could
establish priority.

 Can the text of the Patent be found anywhere online?

http://www.delphion.com/details?pn=US05721907__

--
Clifford Heath, Open Software Associates, mailto:[EMAIL PROTECTED],
Ph +613 9895 2194, Fax 9895 2020, http://www.osa.com.au/~cjh,
56-60 Rutland Rd, Box Hill 3128, Melbourne, Victoria, Australia.


- End forwarded message -






Re: Debian's problems, Debian's future

2002-04-10 Thread Martijn van Oosterhout
On Wed, Apr 10, 2002 at 09:22:50AM +0200, Michael Bramer wrote:
 On Wed, Apr 10, 2002 at 10:25:22AM +1000, Martijn van Oosterhout wrote:
  With the standard rsync algorithm, the rsync checksum files would actually
  be 8 times larger than the original file (you need to store the checksum
  for each possible block in the file).
 
 I don't see that the checksum file is larger than the origanl file. If
 the checksum file is larger, we will have more bytes to download... This
 was not the goal.

That's because the client doesn't download the checksums. Look below.

 maybe I don't understand the rsync algorithm...
 
 IMHO the rsync algorithm is:
  1.) Computer beta splits file B in blocks.
  2.) calculate two checksums 
  a.) weak ``rolling'' 32-bit checksum
  b.) md5sum
  3.) Computer B send this to computer A.
  4.) Computer A search in file A for parts with the same checksums from
  file B
  5.) Computer A request unmatch blocks from computer B and 
  build the file B.
 
 I get this from /usr/share/doc/rsync/tech_report.tex.gz

Computer A wants to download a file F from computer B.

1. Computer A splits its version into blocks, calculates the checksum for
each block.
2. Computer A sends this list to computer B. This should be about 1% of the
size of the original file, depending on the block size.
3. Computer B takes this list and does the rolling checksum over the file.
Basically, it calculates the checksum for bytes 0-1023, checks for it in the
list from the client. If it's a match send back a string indicating which
block it is, else send byte 0. Calculate checksum of 1-1024 and do the same.
The rolling checksum is just an optimisation.
4. Computer A receives list of tokens which are either bytes of data or
indications of which block to copy from the original file.

Notice that:
a. The server (computer B) does *all* the work.
b. The data forms a stream. The client can split itself into two and can be
analysing the next file while the server is still processing the current
one. Your above algorithm requires two requests for each file. The streaming
helps performance over high-latency links.
c. Precalculating checksums on the client is useless
d. Precalculating checksums on the server is also useless because the
storage would be more (remember, checksum for bytes 0-1023, then for 1-1024,
2-1025, etc). It's faster to calculate them than to load them off disk.

So, the main difference between what you are proposing and rsync is 1
versus 2 requests per file. And rsync definitely only has one.

Besides, look at the other posts on this thread. Diff requires less download
than rsync.
-- 
Martijn van Oosterhout kleptog@svana.org   http://svana.org/kleptog/
 Ignorance continues to thrive when intelligent people choose to do
 nothing.  Speaking out against censorship and ignorance is the imperative
 of all intelligent people.






Re: Debian's problems, Debian's future

2002-04-10 Thread Michael Bramer
On Wed, Apr 10, 2002 at 01:26:17AM -0700, Robert Tiberius Johnson wrote:
 This looks like an interesting algorithm, so I decided to compare it to
 the diff scheme analyzed in 
 http://lists.debian.org/debian-devel/2002/debian-devel-200204/msg00502.html
 
 The above message also gives my analysis methodology.
 
 The results:
 
 
 - The following table summarizes the performance of the checksum-based
 scheme and the diff-based scheme under the assumption that users tend to
 perform apt-get update often.  I think disk space is cheap and bandwidth
 is expensive, so 20 days of diffs is the best choice.
 
 Scheme                      Disk space   Bandwidth
 ---------------------------------------------------
 Checksums (bwidth optimal)     26K          81K
 diffs (4 days)                 32K         331K
 diffs (9 days)                 71K          66K
 diffs (20 days)               159K          27K

can you explain your counts?

Gruss
Grisu
-- 
Michael Bramer  -  a Debian Linux Developer  http://www.debsupport.de
PGP: finger [EMAIL PROTECTED]  -- Linux Sysadmin   -- Use Debian Linux
Nicht geschehene Taten ziehen oft einen erstaunlichen Mangel an Folgen 
 nach sich. --   S.J. Lec




Re: Debian's problems, Debian's future

2002-04-10 Thread Michael Bramer
On Wed, Apr 10, 2002 at 08:29:49PM +1000, Martijn van Oosterhout wrote:
 On Wed, Apr 10, 2002 at 09:22:50AM +0200, Michael Bramer wrote:
  On Wed, Apr 10, 2002 at 10:25:22AM +1000, Martijn van Oosterhout wrote:
   With the standard rsync algorithm, the rsync checksum files would actually
   be 8 times larger than the original file (you need to store the checksum
   for each possible block in the file).
  
  I don't see that the checksum file is larger than the origanl file. If
  the checksum file is larger, we will have more bytes to download... This
  was not the goal.
 
 That's because the client doesn't not download the checksums. Look below.
 
  maybe I don't understand the rsync algorithm...
  
  IMHO the rsync algorithm is:
   1.) Computer beta splits file B in blocks.
   2.) calculate two checksums 
   a.) weak ``rolling'' 32-bit checksum
   b.) md5sum
   3.) Computer B send this to computer A.
   4.) Computer A search in file A for parts with the same checksums from
   file B
   5.) Computer A request unmatch blocks from computer B and 
   build the file B.
  
  I get this from /usr/share/doc/rsync/tech_report.tex.gz
 
 Computer A wants to download a file F from computer B.
 
 1. Computer A splits it's version into blocks, calculates the checksum for
 each block.
 2. Computer A sends this list to computer B. This should be 1% the size of
 the original file. Depends on the block size.
 3. Computer B takes this list and does the rolling checksum over the file.
 Basically, it calculates the checksum for bytes 0-1023, checks for it in the
 list from the client. If it's a match send back a string indicating which
 block it is, else send byte 0. Calculate checksum of 1-1024 and do the same.
 The rolling checksum is just an optimisation.
 4. Computer A receives list of tokens which are either bytes of data or
 indications of which block to copy from the original file.

All ok. I wrote the same above, except point '4', and you switched A and
B...

 Notice that:
 a. The server (computer B) does *all* the work.

If you use A as the server, the client does all the work.

 c. Precalculating checksums on the client is useless
 d. Precalculating checksums on the server is also useless because the
 storage would be more (remember, checksum for bytes 0-1023, then for 1-1024,
 2-1025, etc). It's faster to calculate them than to load them off disk.

Precalculating the _block_ checksums is _not_ useless. These checksums
are only about 1% of the size of the original file (depending on the
block size).

 So, the main difference between what you are proposing and rsync is 1
 versus 2 requests per file. And rsync definitely has only one.

The main difference is: the client, and not the server, does all the work!

 Besides, look at the other posts on this thread. Diff requires less download
 than rsync.

I read it, but I don't understand it.

But this is not the problem. IMHO the diff is a kind of hack, and a
cached rsync is a nice framework. But this is only my taste...

Maybe I should read the rsync source code... Done.

  Ok, with the normal rsync program the client makes the block checksums
  and the server searches the file...

Thanks for your help.

Gruss
Grisu
-- 
Michael Bramer  -  a Debian Linux Developer  http://www.debsupport.de
PGP: finger [EMAIL PROTECTED]  -- Linux Sysadmin   -- Use Debian Linux
Bumblebees really can sting, but they only do so in extreme exceptional
situations. In such situations NT does nothing at all anymore. -- from d.a.s.r


pgphEKKQCFH3u.pgp
Description: PGP signature


Re: Debian's problems, Debian's future

2002-04-10 Thread Erich Schubert
 Scheme Disk space Bandwidth
 ---
 Checksums (bwidth optimal)26K   81K
 diffs (4 days)32K  331K
 diffs (9 days)71K   66K
 diffs (20 days)  159K   27K

What diff options do you use?
As the diffs are expected to be applied to the correct version, they
probably shouldn't contain the old data, but the new data only.

Greetings,
Erich


-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]




Re: Debian's problems, Debian's future

2002-04-09 Thread Michael Bramer
hello

we should stop this and start again after woody...

On Thu, Mar 28, 2002 at 08:17:46PM +0100, Jeroen Dekkers wrote:
 On Thu, Mar 28, 2002 at 04:55:17PM +0100, Otto Wyss wrote:
 I'd suggest using diffs, as this brings the best results and is the
   http://lists.debian.org/debian-devel/2001/debian-devel-200111/msg01303.html
   
   (I use apt-pupdate all the time now, it works for me (tm))
   
  Sorry, diffs are simply silly! Use rsync with the uncompressed Packages
  file and diffs aren't necessary. Or use a packer which doesn't hinder
  rsync from saving (gzip --rsyncable). 
 
 This isn't server friendly.

no. sorry. I must say this:

 We can use rsync on the client side:
  - get a rsync-checksum file (using a fixed block size)
  - do the check on the client side
  - download the needed parts of the file via ftp/http
  - build the new file from the old and downloaded parts

With this, the server only needs the extra rsync-checksum files.
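A sketch of this scheme (illustrative only: the checksum-file format and
helper names are made up here, blocks are compared at fixed offsets rather
than searched for as a full reverse-rsync client would, and fetch_range
stands in for an HTTP Range or FTP REST request):

```python
import hashlib

BLOCK = 1024  # the fixed block size the checksum file is built with

def make_checksum_file(data):
    # Server side, done once per mirror push: one MD5 per fixed block.
    # At 1k blocks this is a few percent of the file size.
    return [hashlib.md5(data[i:i + BLOCK]).hexdigest()
            for i in range(0, len(data), BLOCK)]

def blocks_to_fetch(old, server_sums):
    # Client side: which blocks of the server's file differ from ours?
    local = make_checksum_file(old)
    return [i for i, s in enumerate(server_sums)
            if i >= len(local) or local[i] != s]

def rebuild(old, server_sums, fetch_range):
    # Client side: keep matching local blocks, fetch the rest as plain
    # byte ranges; fetch_range(offset, length) simulates the download.
    parts = []
    for i, s in enumerate(server_sums):
        local = old[i * BLOCK:(i + 1) * BLOCK]
        if hashlib.md5(local).hexdigest() == s:
            parts.append(local)
        else:
            parts.append(fetch_range(i * BLOCK, BLOCK))
    return b''.join(parts)
```

Note that comparing at fixed offsets only pays off when changes are made in
place or appended; an insertion near the top shifts every later block,
which is why real rsync searches at every offset.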


Gruss
Grisu
-- 
Michael Bramer  -  a Debian Linux Developer  http://www.debsupport.de
PGP: finger [EMAIL PROTECTED]  -- Linux Sysadmin   -- Use Debian Linux
Once you have looked at Linux more closely, you will never try to pin the
label 'stable' on WinNT! ([EMAIL PROTECTED])




Re: Debian's problems, Debian's future

2002-04-09 Thread Michael Bramer
On Sat, Mar 30, 2002 at 04:49:25AM +0900, Junichi Uekawa wrote:
 [EMAIL PROTECTED] (Otto Wyss) cum veritate scripsit:
 
  Packages.0 from 28-March is probably the newest and the smallest upgrade
  is probably the diff for one day (209k uncompressed, 50k gzipped). On
  the 28th rsync's download was 130k, today it was less than 100k. I don't
  know why your uncompressed diff is bigger than what rsync says.
 
 Also note that this is a one-time thing, and can be served 
 through normal http protocol, or ftp, or whatever.
 
 rsync requires handholding from the server side.
 Which is unlikely to happen for every single server serving
 Debian mirror.

No.

Technically you can move all of this to the client and use ftp/http to
download the parts of the files...

Gruss
Grisu
-- 
Michael Bramer  -  a Debian Linux Developer  http://www.debsupport.de
PGP: finger [EMAIL PROTECTED]  -- Linux Sysadmin   -- Use Debian Linux
How hard can it be, it's just an operating system? -- Linus Torvalds




Re: Debian's problems, Debian's future

2002-04-09 Thread Michael Bramer
On Sat, Mar 30, 2002 at 02:11:00AM +0100, Wichert Akkerman wrote:
 - I would like to have templates with substitution fields.
   
   Already exists.
  
  Any references?
 
 How about the debconf manual?

But sorry, we have some outdated translations in debconf templates
files. No translator knows when someone changes the English template.
Can we please use gettext or something else that avoids 'outdated
translations'? Joey?

Gruss
Grisu
-- 
Michael Bramer  -  a Debian Linux Developer  http://www.debsupport.de
PGP: finger [EMAIL PROTECTED]  -- Linux Sysadmin   -- Use Debian Linux
it was hard to write, so it should be hard to read




Re: Debian's problems, Debian's future

2002-04-09 Thread Michael Bramer
On Thu, Mar 28, 2002 at 04:55:17PM +0100, Otto Wyss wrote:
I'd suggest using diffs, as this brings the best results and is the
  
  [diffs for Packages files that is]
  
   wooo!!!
   
   http://people.debian.org/~dancer/Packages-for-main-i386/
   
   
   # Time for suggesting is up, please implement.
  
  Indeed, it appears it has been implemented more than once.
  
  http://lists.debian.org/debian-devel/2001/debian-devel-200111/msg01303.html
  
  (I use apt-pupdate all the time now, it works for me (tm))
  
 Sorry, diffs are simply silly! Use rsync with the uncompressed Packages
 file and diffs aren't necessary. Or use a packer which doesn't hinder
 rsync from saving (gzip --rsyncable). 

Right.

Now I searched the lists and found the old mails...

Maybe someone would like to read the mails and reply:
  http://lists.debian.org/debian-devel/2001/debian-devel-200111/msg00757.html


Gruss
Grisu
-- 
Michael Bramer  -  a Debian Linux Developer  http://www.debsupport.de
PGP: finger [EMAIL PROTECTED]  -- Linux Sysadmin   -- Use Debian Linux
it was hard to write, so it should be hard to read




Re: Debian's problems, Debian's future

2002-04-09 Thread Michael Bramer
On Fri, Mar 29, 2002 at 11:16:44AM +0100, Eduard Bloch wrote:
 #include hallo.h
 Joey Hess wrote on Wed Mar 27, 2002 um 02:21:49PM:
  That is a rather misleading summary of the situation, which as a
  subscriber to debian-boot, you should understand better. Have you done
  any testing of the proposed base-config patch?
 
 Sure. Peter's patches are AFAIK not ready and I have a bad feeling about
 his dbootstrap modifications. I have a testing installation image with
 hacked base-config (my patches), but I was disappointed, since many
 debconf templates in the packages called during the first base-config
 steps were not translated. It is too late to change them all, so I can
 only keep calling it a pity and hope that people mastering customised CD
 sets would contact me or Peter.

Can you put these files online?

Gruss
Grisu
-- 
Michael Bramer  -  a Debian Linux Developer  http://www.debsupport.de
PGP: finger [EMAIL PROTECTED]  -- Linux Sysadmin   -- Use Debian Linux
Now let me explain why this makes intuitive sense.  --- Prof. Larry Wasserman




Re: Debian's problems, Debian's future

2002-04-09 Thread Martijn van Oosterhout
On Tue, Apr 09, 2002 at 09:09:39AM +0200, Michael Bramer wrote:
 hello
 
 we should stop this and start again after woody...
 
 On Thu, Mar 28, 2002 at 08:17:46PM +0100, Jeroen Dekkers wrote:
  On Thu, Mar 28, 2002 at 04:55:17PM +0100, Otto Wyss wrote:
  I'd suggest using diffs, as this brings the best results and is the
http://lists.debian.org/debian-devel/2001/debian-devel-200111/msg01303.html

(I use apt-pupdate all the time now, it works for me (tm))

   Sorry, diffs are simply silly! Use rsync with the uncompressed Packages
   file and diffs aren't necessary. Or use a packer which doesn't hinder
   rsync from saving (gzip --rsyncable). 
  
  This isn't server friendly.
 
 no. sorry. I must say this:
 
  We can use rsync on the client site. 
   - get a rsync-checksum file (use a fix block size)
   - make the check on the client site and
   - download the file partly per ftp/http 
   - make the new file with the old and downloaded parts
 
 With this the server need only extra rsync-checksum files.

I believe this method is patented by somebody, which is why it's not in
use/supported.

Other than that, it's a very nice idea. I believe there may be some
semi-implementations around somewhere. The concept is no different from
normal rsync.
-- 
Martijn van Oosterhout kleptog@svana.org   http://svana.org/kleptog/
 Ignorance continues to thrive when intelligent people choose to do
 nothing.  Speaking out against censorship and ignorance is the imperative
 of all intelligent people.






Re: Debian's problems, Debian's future

2002-04-09 Thread Michael Bramer
On Tue, Apr 09, 2002 at 06:39:19PM +1000, Martijn van Oosterhout wrote:
 On Tue, Apr 09, 2002 at 09:09:39AM +0200, Michael Bramer wrote:
   This isn't server friendly.
  
  no. sorry. I must say this:
  
   We can use rsync on the client site. 
- get a rsync-checksum file (use a fix block size)
- make the check on the client site and
- download the file partly per ftp/http 
- make the new file with the old and downloaded parts
  
  With this the server need only extra rsync-checksum files.
 
 I believe this method is patented by somebody, which is why it's not in
 use/supported.
 
 Other than that, it's a very nice idea. I believe there may be some
 semi-implementations around somewhere. The concept is no different from
 normal rsync.

Does anyone have a pointer?

This is rsync, only the server is the client and the client works as the
server...

Gruss
Grisu
-- 
Michael Bramer  -  a Debian Linux Developer  http://www.debsupport.de
PGP: finger [EMAIL PROTECTED]  -- Linux Sysadmin   -- Use Debian Linux
There we have it: emacs is a religion, not an editor. I am not willing
to sacrifice my memory to an idol. -- Werner Olschewski




Re: Debian's problems, Debian's future

2002-04-09 Thread Martijn van Oosterhout
On Tue, Apr 09, 2002 at 10:58:24AM +0200, Michael Bramer wrote:
 On Tue, Apr 09, 2002 at 06:39:19PM +1000, Martijn van Oosterhout wrote:
  I believe this method is patented by somebody, which is why it's not in
  use/supported.
  
  Other than that, it's a very nice idea. I believe there may be some
  semi-implementations around somewhere. The concept is no different from
  normal rsync.
 
 has someone a pointer? 
 
 This is rsync, only the server is the client and the client works as the
 server...

Unfortunately no. I just remember it as a passing comment while talking
with Andrew Tridgell (the creator of rsync).

A google search turns up oblique references at:

http://rproxy.samba.org/doc/notes/server-generated-signatures.txt
http://www.sharemation.com/~milele/public/rsync-specification.htm (near bottom)
http://pserver.samba.org/cgi-bin/cvsweb/rproxy/doc/calu_paper/calu_paper.tex?annotate=1.1
http://olstrans.sourceforge.net/release/OLS2000-rsync/OLS2000-rsync.html

Someone on debianplanet suggests it may be a rumour. I don't know, I can't
find any precise patent numbers.

HTH,
-- 
Martijn van Oosterhout kleptog@svana.org   http://svana.org/kleptog/
 Ignorance continues to thrive when intelligent people choose to do
 nothing.  Speaking out against censorship and ignorance is the imperative
 of all intelligent people.






Re: Debian's problems, Debian's future

2002-04-09 Thread Steve Langasek
On Tue, Apr 09, 2002 at 09:53:44AM +0200, Michael Bramer wrote:
 On Sat, Mar 30, 2002 at 02:11:00AM +0100, Wichert Akkerman wrote:
  - I would like to have templates with substitution fields.

Already exists.
   
   Any references?
  
  How about the debconf manual?

 but sorry, we have some outdated translations in debconf templates
 files. No translator knows when someone changes the English template.
 Can we please use gettext or something else that avoids 'outdated
 translations'? Joey?

If you are concerned with ensuring that translators receive automatic
notification when a source debconf template has changed, that's an
infrastructure problem.  Neither debconf nor gettext has automatic
translator notifications built in, and debconf's templates are not an
inferior solution for not providing this.

Debconf, if used correctly, does handle merging of outdated translations
properly.  See debconf-mergetemplate(1).

Steve Langasek
postmodern programmer




Re: Debian's problems, Debian's future

2002-04-09 Thread Tomas Pospisek's Mailing Lists
On Tue, 9 Apr 2002, Martijn van Oosterhout wrote:

 On Tue, Apr 09, 2002 at 10:58:24AM +0200, Michael Bramer wrote:
  On Tue, Apr 09, 2002 at 06:39:19PM +1000, Martijn van Oosterhout wrote:
   I believe this method is patented by somebody, which is why it's not in
   use/supported.

Possibly it was only patented in the non-free united companies of america.
So it might well go into non-free (the inversion of the meaning comes straight 
out of 1984).
*t


 Tomas Pospisek
 SourcePole   -  Linux  Open Source Solutions
 http://sourcepole.ch
 Elestastrasse 18, 7310 Bad Ragaz, Switzerland
 Tel: +41 (81) 330 77 11







Re: Debian's problems, Debian's future

2002-04-09 Thread Martijn van Oosterhout
On Tue, Apr 09, 2002 at 03:24:42PM +0200, Tomas Pospisek's Mailing Lists wrote:
 On Tue, 9 Apr 2002, Martijn van Oosterhout wrote:
 
  On Tue, Apr 09, 2002 at 10:58:24AM +0200, Michael Bramer wrote:
   On Tue, Apr 09, 2002 at 06:39:19PM +1000, Martijn van Oosterhout wrote:
I believe this method is patented by somebody, which is why it's not in
use/supported.
 
 Possibly it was only patented in the non-free united companies of america.
 So it might well go into non-free (the inversion of the meaning comes 
 straight out of 1984).

Well, a lot of patents are recognised across borders. And someone could
write it in a country that doesn't recognise software patents, but the DeCSS
stuff showed that that's not safe either.

Software patents are just plain irritating.
-- 
Martijn van Oosterhout kleptog@svana.org   http://svana.org/kleptog/
 Ignorance continues to thrive when intelligent people choose to do
 nothing.  Speaking out against censorship and ignorance is the imperative
 of all intelligent people.






Re: Debian's problems, Debian's future

2002-04-09 Thread Jeroen Dekkers
On Tue, Apr 09, 2002 at 09:09:39AM +0200, Michael Bramer wrote:
 hello
 
 we should stop this and start again after woody...
 
 On Thu, Mar 28, 2002 at 08:17:46PM +0100, Jeroen Dekkers wrote:
  On Thu, Mar 28, 2002 at 04:55:17PM +0100, Otto Wyss wrote:
   Sorry, diffs are simply silly! Use rsync with the uncompressed Packages
   file and diffs aren't necessary. Or use a packer which doesn't hinder
   rsync from saving (gzip --rsyncable). 
  
  This isn't server friendly.
 
 no. sorry. I must say this:
 
  We can use rsync on the client site. 
   - get a rsync-checksum file (use a fix block size)
   - make the check on the client site and
   - download the file partly per ftp/http 
   - make the new file with the old and downloaded parts
 
 With this the server need only extra rsync-checksum files.

IMHO it's better to just make diffs, instead of extra rsync-checksum
files plus having to download the needed parts of those files.

Jeroen Dekkers
-- 
Jabber supporter - http://www.jabber.org Jabber ID: [EMAIL PROTECTED]
Debian GNU supporter - http://www.debian.org http://www.gnu.org
IRC: [EMAIL PROTECTED]




Re: Debian's problems, Debian's future

2002-04-09 Thread Michael Bramer
On Tue, Apr 09, 2002 at 08:02:14AM -0500, Steve Langasek wrote:
 On Tue, Apr 09, 2002 at 09:53:44AM +0200, Michael Bramer wrote:
  On Sat, Mar 30, 2002 at 02:11:00AM +0100, Wichert Akkerman wrote:
   - I would like to have templates with substitution fields.
 
 Already exists.

Any references?
   
   How about the debconf manual?
 
  but sorry, we have some outdated translations in debconf templates
  files. No translator knows when someone changes the English template.
  Can we please use gettext or something else that avoids 'outdated
  translations'? Joey?
 
 If you are concerned that translators receive automatic notification 
 when a source debconf template has changed, that's an infrastructure 
 problem.  Neither debconf nor gettext has automatic translator 
 notifications built-in, and debconf's templates are not an inferior 
 solution for not providing this.

I know this. And as infrastructure we can use the DDTP.

I have already worked on this, but in the last weeks I haven't had real
time and I paused this sub-project.

 Debconf, if used correctly, does correctly handle merging of outdated
 translations.  See debconf-mergetemplate(1).

Ok, thanks. I didn't know this. Maybe I must RTFM...


Gruss
Grisu
-- 
Michael Bramer  -  a Debian Linux Developer  http://www.debsupport.de
PGP: finger [EMAIL PROTECTED]  -- Linux Sysadmin   -- Use Debian Linux
Yes, but the boot process is so pretty, with the clouds and all. In my
opinion that's not annoying at all. (Martin Heinz, on rebooting M$-W)




Re: Debian's problems, Debian's future

2002-04-09 Thread Michael Bramer
On Tue, Apr 09, 2002 at 04:34:43PM +0200, Jeroen Dekkers wrote:
 On Tue, Apr 09, 2002 at 09:09:39AM +0200, Michael Bramer wrote:
  no. sorry. I must say this:
  
   We can use rsync on the client site. 
- get a rsync-checksum file (use a fix block size)
- make the check on the client site and
- download the file partly per ftp/http 
- make the new file with the old and downloaded parts
  
  With this the server need only extra rsync-checksum files.
 
 IMHO it's better to make just diffs instead of extra rsync-checksum
 files and then having to download all parts of those files.

You propose to add 'some' diff files for all files on ftp-master.d.o?

With rsync we need only one rsync-checksum file per normal file, and
every apt only needs to download the needed parts.

You get the point?

Gruss
Grisu
-- 
Michael Bramer  -  a Debian Linux Developer  http://www.debsupport.de
PGP: finger [EMAIL PROTECTED]  -- Linux Sysadmin   -- Use Debian Linux
Like sex in high school, everyone's talking about Linux, but is anyone 
 doing it?  -- Computer Currents




Re: Debian's problems, Debian's future

2002-04-09 Thread Michael Bramer
On Tue, Apr 09, 2002 at 10:25:04PM +1000, Martijn van Oosterhout wrote:
 On Tue, Apr 09, 2002 at 10:58:24AM +0200, Michael Bramer wrote:
  On Tue, Apr 09, 2002 at 06:39:19PM +1000, Martijn van Oosterhout wrote:
   I believe this method is patented by somebody, which is why it's not in
   use/supported.
   
   Other than that, it's a very nice idea. I believe there may be some
   semi-implementations around somewhere. The concept is no different from
   normal rsync.
  
  Does anyone have a pointer?
  
  This is rsync, only the server is the client and the client works as the
  server...
 
 Unfortunatly no. I just remember it as a passing comment while talking with
 Andrew Tridgell (creator of rsync).
 
 A google search turns up oblique references at:
 
 http://rproxy.samba.org/doc/notes/server-generated-signatures.txt

 'The current RProxy specifications at sourceforge.net do not have
  the client calculating the signature. Instead, the client gets the
  signature from the server when it first downloads the file, and saves
  this signature (just like an ETag) for use when re-loading the file.
  This mechanism was chosen only because of possible patent problems
  with client calculation of signature. These patent problems may need
  to be investigated.'

 Read it: the checksum file is _downloaded_ from the server and
 _not_ calculated by the client!

 http://www.sharemation.com/~milele/public/rsync-specification.htm (near 
 bottom)

the same...

 http://pserver.samba.org/cgi-bin/cvsweb/rproxy/doc/calu_paper/calu_paper.tex?annotate=1.1

the same...

 http://olstrans.sourceforge.net/release/OLS2000-rsync/OLS2000-rsync.html

the opposite point.

Maybe I don't understand some points...
Three times it's server-generated checksums that are forbidden by patent,
and one time it's client-generated checksums that are forbidden...

Gruss
Grisu
-- 
Michael Bramer  -  a Debian Linux Developer  http://www.debsupport.de
PGP: finger [EMAIL PROTECTED]  -- Linux Sysadmin   -- Use Debian Linux
GNU does not eliminate all the world's problems, only some of them.
- Richard Stallman - The GNU Manifesto, 1985




Re: Debian's problems, Debian's future

2002-04-09 Thread Jason Gunthorpe

On Tue, 9 Apr 2002, Michael Bramer wrote:

   - make the check on the client site and
   - download the file partly per ftp/http 
   - make the new file with the old and downloaded parts
 
 With this the server need only extra rsync-checksum files.

Rumor around rsync circles is that this is patented.

Jason






Re: Debian's problems, Debian's future

2002-04-09 Thread Josselin Mouette
Le mar 09/04/2002 à 20:13, Jason Gunthorpe a écrit :

- make the check on the client site and
- download the file partly per ftp/http 
- make the new file with the old and downloaded parts
  
  With this the server need only extra rsync-checksum files.
 
 Rumor around rsync circles is that this is patented.

Then it is still possible to implement this on the mirrors outside the
US. That would already save a lot of bandwidth...

-- 
 .''`.   Josselin Mouette/\./\
: :' :   [EMAIL PROTECTED]
`. `'
  `-  Debian GNU/Linux -- The power of freedom




Re: Debian's problems, Debian's future

2002-04-09 Thread Otto Wyss
 http://lists.debian.org/debian-devel/2001/debian-devel-200111/msg00757.html

Thanks for this pointer.

My debiansynch script never runs into problem 1 (rsync -r), since it
always does single-file transfers. And for problem 2 (rsync of nearly
identical files), it's not astonishing that it causes a high CPU load for
a short period; an ftp transfer simply distributes its CPU load over a
longer period.

O. Wyss

-- 
Author of Debian partial mirror synch script
(http://dpartialmirror.sourceforge.net/)






Re: Debian's problems, Debian's future

2002-04-09 Thread Martijn van Oosterhout
On Tue, Apr 09, 2002 at 05:02:34PM +0200, Michael Bramer wrote:
 you propose to add 'some' diff files for all files on ftp-master.d.o? 
 
 With rsync we need only one rsync-checksum file per normal file and
 all apt's need only download the neededs parts.
 
 You get the point?

With the standard rsync algorithm, the rsync checksum files would actually
be 8 times larger than the original file (you need to store a checksum for
a block starting at every byte offset in the file).

What you are suggesting is that the server store checksums for
precalculated, fixed blocks. This would be 4 bytes per 1k of the original
file or so. The transaction proceeds as follows:

1. Client asks for checksum list off server
2. Client calculates checksums for local file
3. Client compares list of server with list of client
4. Client downloads changed regions.

Note, this is not the rsync algorithm, but the one that is possibly
patented.
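A back-of-envelope comparison of the two signature sizes discussed above
(the 8 bytes per checksum, covering a 4-byte weak plus a 4-byte strong
sum, are an assumption for illustration; real rsync strong sums are
larger):

```python
def signature_sizes(file_bytes, block=1024, sum_bytes=8):
    # Per-offset scheme: a checksum for a block starting at every byte
    # offset -- roughly sum_bytes times the file size.
    per_offset = (file_bytes - block + 1) * sum_bytes
    # Per-block scheme: one checksum per fixed block -- sum_bytes per
    # `block` bytes of file, i.e. under 1% at 1k blocks.
    per_block = -(-file_bytes // block) * sum_bytes  # ceil division
    return per_offset, per_block

per_offset, per_block = signature_sizes(1_000_000)
```

For a 1 MB Packages file this gives roughly 8 MB of per-offset signatures
versus about 8 KB of per-block signatures, which is why only the per-block
variant is worth precalculating and publishing.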

-- 
Martijn van Oosterhout kleptog@svana.org   http://svana.org/kleptog/
 Ignorance continues to thrive when intelligent people choose to do
 nothing.  Speaking out against censorship and ignorance is the imperative
 of all intelligent people.






Re: Debian's problems, Debian's future

2002-04-08 Thread Tollef Fog Heen
* Jeroen Dekkers 

| It does also other things, like making distribution creation more
| flexible. I'm thinking of having some kind of package file for every
| source package. That would include the current information and maybe a
| lot more things like URL of upstream, license, etc. This file would be
| stored in every package pool directory
| (i.e. pool/main/f/foobar/Packages). 
| 
| Then we create a lot of bigger Packages files, only including the
| packagename, version number and some other things which might be
| useful (but not too much). Those bigger Packages files can be a lot
| more flexible, for example we could have a different Package file for
| different licenses, different upstream projects (gnome, kde, gnu, X,
| etc), different use of machines (server, desktop), etc.

(I know, old mail, but I am catching up)

It seems like you want to put the control file outside the deb package
and add more information to it.  (And have apt-ftparchive not include
all the information from the control file into the packages file.)

Is this about correct?

-- 
Tollef Fog Heen
Unix _IS_ user friendly... It's just selective about who its friends are.






Re: Debian's problems, Debian's future

2002-04-08 Thread Jeroen Dekkers
On Sun, Apr 07, 2002 at 10:28:12PM +0200, Tollef Fog Heen wrote:
 * Jeroen Dekkers 
 
 | It does also other things, like making distribution creation more
 | flexible. I'm thinking of having some kind of package file for every
 | source package. That would include the current information and maybe a
 | lot more things like URL of upstream, license, etc. This file would be
 | stored in every package pool directory
 | (i.e. pool/main/f/foobar/Packages). 
 | 
 | Then we create a lot of bigger Packages files, only including the
 | packagename, version number and some other things which might be
 | useful (but not too much). Those bigger Packages files can be a lot
 | more flexible, for example we could have a different Package file for
 | different licenses, different upstream projects (gnome, kde, gnu, X,
 | etc), different use of machines (server, desktop), etc.
 
 (I know, old mail, but I am catching up)
 
 It seems like you want to put the control file outside the deb package
 and add more information to it.  (And have apt-ftparchive not include
 all the information from the control file into the packages file.)
 
 Is this about correct?

Yes, at least adding everything which is now in the normal Packages
file. The normal Packages file would just be an index then. I think
this is the best way to do the things I want.

Jeroen Dekkers
-- 
Jabber supporter - http://www.jabber.org Jabber ID: [EMAIL PROTECTED]
Debian GNU supporter - http://www.debian.org http://www.gnu.org
IRC: [EMAIL PROTECTED]




Re: Debian's problems, Debian's future

2002-04-06 Thread Peter Cordes
Adam Majer wrote:
 On Wed, Mar 27, 2002 at 01:53:00PM +0100, Eduard Bloch wrote:
  1) Large packages files
  [... 3 level idea ...]
 
 I would suggest a solution that is much easier to manage. That is, packages
 should be sorted according to the date that the package was modified.
 This could be accomplished by adding a Last-Update field to
 Packages that would indicate when the package was last updated.

 This way, we could implement a partial update for Packages by the server
 simply parsing the cream from the top of the milk :) This would make
 fetching Packages a lot faster.

 This would require a small CGI on the server that would support this type
 of fetch, but it could save a lot of bandwidth for the server and for
 the user.

 Here's how to make it possible without a CGI script; just support for
fetching the last bit of a file.

 If you don't store the last update times in the Packages file, you can
download just the last-update info, which should be a lot smaller than the
packages file.  Once you have this info, you know which part of the Packages
file is the same as the one on the server.  Then, you fetch from that point
to the end of the server's file.

 You can make the dates file small by storing the dates only to the day
accuracy, maybe as 32bit ints instead of text, or something.  It should be
pretty small after gzipping.  (high accuracy dates aren't needed because not
many packages are updated in a day, and downloading a few extra package
descriptions is no problem.)
 
 I think this all works :)  The only hard part is finding the right offset
in a gzipped file, given that you know how much of the beginning of two
uncompressed files match.
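 A sketch of the tail-fetch idea, assuming the uncompressed Packages file
keeps its entries ordered by last-update date so that changes cluster at
the end (the index format, offsets list and fetch_range helper are
invented for illustration, and the gzip-offset problem mentioned above is
ignored):

```python
def common_prefix_len(local_index, remote_index):
    # Compare the per-package (name, last-update-day) lists in file
    # order; the first mismatch marks where the local copy goes stale.
    n = 0
    for a, b in zip(local_index, remote_index):
        if a != b:
            break
        n += 1
    return n

def update_tail(local_data, local_index, remote_index, offsets, fetch_range):
    # offsets[i] is the byte offset of entry i in the remote file, with
    # a final sentinel equal to the file length; fetch_range(off)
    # simulates an HTTP Range request from `off` to end-of-file.
    keep = common_prefix_len(local_index, remote_index)
    cut = offsets[keep]
    return local_data[:cut] + fetch_range(cut)
```

 The small dates index is what the client actually downloads every time;
the single ranged request then replaces everything from the first changed
entry onward.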

---
#define X(x,y) x##y
Peter Cordes ;  e-mail: X([EMAIL PROTECTED] , ns.ca)

The gods confound the man who first found out how to distinguish the hours!
 Confound him, too, who in this place set up a sundial, to cut and hack
 my day so wretchedly into small pieces! -- Plautus, 200 BCE

