Send Devl mailing list submissions to
        devl at freenetproject.org

To subscribe or unsubscribe via the World Wide Web, visit
        http://www.uprizer.com/mailman/listinfo/devl
or, via email, send a message with subject or body 'help' to
        devl-request at freenetproject.org

You can reach the person managing the list at
        devl-admin at freenetproject.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Devl digest..."


Today's Topics:

   1. Re: Proposal: algorithm for forgetting documents in datastore (Peter Todd)
   2. Re: Proposal: algorithm for forgetting documents in datastore (Oskar Sandberg)
   3. Re: Proposal: algorithm for forgetting documents in datastore (Tavin Cole)
   4. Re: Proposal: algorithm for forgetting documents in datastore (Tavin Cole)
   5. Re: Proposal: algorithm for forgetting documents in datastore (Oskar Sandberg)
   6. Re: Proposal: algorithm for forgetting documents in datastore (Scott G. Miller)
   7. Re: Proposal: algorithm for forgetting documents in datastore (Scott G. Miller)
   8. Re: Proposal: algorithm for forgetting documents in datastore (Ian Clarke)
   9. Using DOM for FProxy security (Ian Clarke)
  10. More on using DOM for FProxy security (Ian Clarke)
  11. Re: Proposal: algorithm for forgetting documents in datastore (L. Antonides)
  12. Re: Proposal: algorithm for forgetting documents in datastore (Theodore Hong)

--__--__--

Message: 1
From: Peter Todd <[email protected]>
To: devl at freenetproject.org
Subject: Re: [freenet-devl] Proposal: algorithm for forgetting documents in 
datastore
Date: Sun, 28 Jan 2001 20:33:18 -0500
Reply-To: devl at freenetproject.org

On Sun, 28 Jan 2001, you wrote:
> > Continuous or discrete file sizes (ie only powers of 2, or any size).
> 
> Definitely only powers of 2.  Not only good for defeating traffic analysis,
> but it will make it a lot easier to analyze & simulate caching/forgetting
> algorithms.

What exactly does a power-of-2 file size mean? I.e., is the file size
specified as a power of two, so you can only have 2, 4, 8, 16, 32, 64,
etc., or do you mean powers of two added together (i.e. 64 + 32, 128 +
64, etc.)?

The former seems to imply a file size limit of about 2^23 or 2^24,
any higher and the wastage gets to be astronomical.

-- 
retep at penguinpowered.com http://retep.tripod.com 


--__--__--

Message: 2
Date: Mon, 29 Jan 2001 03:13:43 +0100
From: Oskar Sandberg <[email protected]>
To: devl at freenetproject.org
Subject: Re: [freenet-devl] Proposal: algorithm for forgetting documents in 
datastore
Reply-To: devl at freenetproject.org

On Sun, Jan 28, 2001 at 08:33:18PM -0500, Peter Todd wrote:
> On Sun, 28 Jan 2001, you wrote:
> > > Continuous or discrete file sizes (ie only powers of 2, or any size).
> > 
> > Definitely only powers of 2.  Not only good for defeating traffic analysis,
> > but it will make it a lot easier to analyze & simulate caching/forgetting
> > algorithms.
> 
> What exactly does a powers of 2 file size mean? IE is the file size
> specified as a power of two, so you can only have 2,4, 8, 16, 32, 64
> etc. or do you mean powers of two added together? (ie 64 + 32, 128 +
> 64 etc.)

The former, but probably with 2^10 as the minimum.

> The former seems to imply a file size limit of about 2^23 or 2^24,
> any higher and the wastage gets to be astronomical.

With today's capacity, yes, and 8 or 16 megs seems about right for a
reasonable max size today. The good thing about this is that we avoid
making any arbitrary decision about the maximum file size: as technology
improves and the cost of throwing away 2^x bytes becomes manageable, the
effective max size increases to 2^(x+1). (Remember, Moore's law is also
exponential...)
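
[Ed.: the padding cost under discussion can be sketched as follows -- a
hypothetical helper, not actual Freenet code; the 2^10 floor is Oskar's
suggested minimum:]

```javascript
// Round a byte count up to the next power of 2, with a 2^10 (1 KiB) floor,
// per the scheme discussed above. Hypothetical helper, not Freenet code.
function paddedSize(bytes) {
  var size = 1024; // 2^10 minimum
  while (size < bytes) {
    size *= 2;
  }
  return size;
}

// Worst-case waste approaches 50%: one byte over 2^x pads to 2^(x+1),
// which is where the "astronomical wastage" worry above comes from.
```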

-- 
'DeCSS would be fine. Where is it?'
'Here,' Montag touched his head.
'Ah,' Granger smiled and nodded.

Oskar Sandberg
md98-osa at nada.kth.se


--__--__--

Message: 3
Date: Sun, 28 Jan 2001 21:12:36 -0500
From: Tavin Cole <[email protected]>
To: devl at freenetproject.org
Subject: Re: [freenet-devl] Proposal: algorithm for forgetting documents in 
datastore
Reply-To: devl at freenetproject.org

On Mon, Jan 29, 2001 at 02:04:40AM +0100, Oskar Sandberg wrote:
> > Okay, this makes sense, thanks.  However, it seems very difficult to account
> > for the importance of popularity when you look at things this way.  Our
> > phase space should include popularity as well as bytesize and time.
> > > E[T_i]*size_i = k assumes every file is equally popular.
> 
> Yeah, I meant that all along. Ie, k is not constant but a function of the
> number of requests that is shared by all i. Sorry.

That still assumes each file is equally popular.  You've got to work P_i into
the equation somehow.

--

// Tavin Cole


--__--__--

Message: 4
Date: Sun, 28 Jan 2001 21:16:42 -0500
From: Tavin Cole <[email protected]>
To: devl at freenetproject.org
Subject: Re: [freenet-devl] Proposal: algorithm for forgetting documents in 
datastore
Reply-To: devl at freenetproject.org

On Sun, Jan 28, 2001 at 08:33:18PM -0500, Peter Todd wrote:
> On Sun, 28 Jan 2001, you wrote:
> > > Continuous or discrete file sizes (ie only powers of 2, or any size).
> > 
> > Definitely only powers of 2.  Not only good for defeating traffic analysis,
> > but it will make it a lot easier to analyze & simulate caching/forgetting
> > algorithms.
> 
> What exactly does a powers of 2 file size mean? IE is the file size
> specified as a power of two, so you can only have 2,4, 8, 16, 32, 64
> etc. or do you mean powers of two added together? (ie 64 + 32, 128 +
> 64 etc.)

Strictly filesize = 2^n.

> The former seems to imply a file size limit of about 2^23 or 2^24,
> any higher and the wastage gets to be astronomical.

No, if the file gets so big that you're dealing with astronomical
waste, it should be split into smaller pieces.

-- 

// Tavin Cole


--__--__--

Message: 5
Date: Mon, 29 Jan 2001 03:39:55 +0100
From: Oskar Sandberg <[email protected]>
To: devl at freenetproject.org
Subject: Re: [freenet-devl] Proposal: algorithm for forgetting documents in 
datastore
Reply-To: devl at freenetproject.org

On Sun, Jan 28, 2001 at 09:12:36PM -0500, Tavin Cole wrote:
> On Mon, Jan 29, 2001 at 02:04:40AM +0100, Oskar Sandberg wrote:
> > > Okay, this makes sense, thanks.  However, it seems very difficult to account
> > > for the importance of popularity when you look at things this way.  Our
> > > phase space should include popularity as well as bytesize and time.
> > > E[T_i]*size_i = k assumes every file is equally popular.
> > 
> > Yeah, I meant that all along. Ie, k is not constant but a function of the
> > number of requests that is shared by all i. Sorry.
> 
> That still assumes each file is equally popular.  You've got to work P_i into
> the equation somehow.

The number of requests for a file _is_ how popular it is...

-- 
'DeCSS would be fine. Where is it?'
'Here,' Montag touched his head.
'Ah,' Granger smiled and nodded.

Oskar Sandberg
md98-osa at nada.kth.se


--__--__--

Message: 6
Date: Sun, 28 Jan 2001 22:47:56 -0500
To: devl at freenetproject.org
Subject: Re: [freenet-devl] Proposal: algorithm for forgetting documents in 
datastore
From: "Scott G. Miller" <[email protected]>
Reply-To: devl at freenetproject.org



On Mon, Jan 29, 2001 at 03:13:43AM +0100, Oskar Sandberg wrote:
> On Sun, Jan 28, 2001 at 08:33:18PM -0500, Peter Todd wrote:
> > On Sun, 28 Jan 2001, you wrote:
> > > > Continuous or discrete file sizes (ie only powers of 2, or any size).
> > > 
> > > Definitely only powers of 2.  Not only good for defeating traffic analysis,
> > > but it will make it a lot easier to analyze & simulate caching/forgetting
> > > algorithms.
> > 
> > What exactly does a powers of 2 file size mean? IE is the file size
> > specified as a power of two, so you can only have 2,4, 8, 16, 32, 64
> > etc. or do you mean powers of two added together? (ie 64 + 32, 128 +
> > 64 etc.)
> 
> The former, but probably with 2^10 as the minimum.
> 
> > The former seems to imply a file size limit of about 2^23 or 2^24,
> > any higher and the wastage gets to be astronomical.
> 
> With todays capacity, yes, and 8 or 16 megs seems about right for a
> reasonable max size today. The good thing about this is that we avoid
> making any arbitrary decision of the maximum filesize, as technology
> improves, and the cost of throwing away 2^x becomes manageable the
> effective max size is increased to 2^x+1. (Remember, Moore's law is also
> exponential...)
> 
Plus you get to do file splitting.  You split the file into the largest
power-of-2 chunk that fits, then make the next part cover the remaining
bytes.  An 1100k file, for example, would be split into a 1024k file and
a padded 128k file (or, if you're very space-conscious, into 1024k + 64k
+ 32k (padded)).
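
[Ed.: the greedy split described here can be sketched as follows -- a
hypothetical helper (names and the 32k minimum chunk are assumptions for
illustration, not actual Freenet code):]

```javascript
// Greedily split a size into power-of-2 chunks (largest first), padding
// only the tail up to the minimum chunk, per the scheme described above.
// Hypothetical helper; sizes here are in the same unit (e.g. KiB).
function splitPowersOf2(size, minChunk) {
  var chunks = [];
  var remaining = size;
  var chunk = 1;
  while (chunk * 2 <= remaining) chunk *= 2; // largest power of 2 that fits
  while (remaining > 0 && chunk >= minChunk) {
    if (chunk <= remaining) {
      chunks.push(chunk);
      remaining -= chunk;
    }
    chunk /= 2;
  }
  if (remaining > 0) chunks.push(minChunk); // pad the leftover into minChunk
  return chunks;
}

// 1100k with a 32k minimum chunk splits into 1024k + 64k + 32k,
// with the final 12k padded out to 32k.
```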

        Scott




--__--__--

Message: 7
Date: Sun, 28 Jan 2001 22:52:10 -0500
To: devl at freenetproject.org
Subject: Re: [freenet-devl] Proposal: algorithm for forgetting documents in 
datastore
From: "Scott G. Miller" <[email protected]>
Reply-To: devl at freenetproject.org



On Sun, Jan 28, 2001 at 07:33:25PM -0500, Tavin Cole wrote:
> On Sun, Jan 28, 2001 at 04:38:49PM +0000, Theodore Hong wrote:
> > Tavin Cole <tavin at mailandnews.com> wrote:
> > > It's clear to me now that the only sane mechanism for deciding what files
> > > to keep in the document store is one that consistently preserves the
> > > files that receive the greatest number of requests.  So let's define P as
> > > the measure of a file's request history.  Simplistically,
> > >
> > >  P = total no. of requests for this file / time since file was stored
> > >
> > > In practice, though, I think we'd want to limit the past time interval to
> > > what is "recent".  Also, the units in the denominator should not be too
> > > granular, to keep P from blowing out of proportion for brand new files.
> > > Something in the neighborhood of hours or minutes, probably.
> > 
> > Limiting the past time interval would require picking an arbitrary cutoff.
> > The best way is to apply a decay factor to old requests, something like:
> 
> Yes, I was going to suggest that :)   But any decay model is just as arbitrary
> as an arbitrary cutoff...
> 
> 
> > P = #requests in the past hour
> >     + (#requests in the previous 2 hrs)/2
> >     + (#requests in the previous 4 hrs)/4
> >     + (#requests in the previous 8 hrs)/8
> >     + ...
> > 
> > (I'm sure there's a nicer way to express this as a continuous function
> > using an integral, but I can't think of it at the moment.  Anyway, you
> > get the idea.)
> 
> Sure, although I don't think we should be decaying P until something is
> at least a day or two old.

Why are we decaying by real time anyway?  Shouldn't we ditch time in favor
of a 'time' measured by requests and inserts passing through the node?
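
[Ed.: Theodore's halving scheme is an exponential decay, which can be
sketched as follows. The one-unit half-life is an assumption for
illustration, and per Scott's suggestion the "time" unit could be ticks
of node traffic rather than wall-clock hours:]

```javascript
// Exponentially decayed request count: the weight of a request halves
// for every unit of age, matching the /2, /4, /8 series quoted above.
// The unit could be hours or node-traffic ticks; the half-life of one
// unit is an assumption for illustration.
function decayedP(requestTimes, now) {
  var p = 0;
  for (var i = 0; i < requestTimes.length; i++) {
    var age = now - requestTimes[i];  // time since this request
    p += Math.pow(0.5, age);          // weight halves per unit of age
  }
  return p;
}

// Three requests right now contribute 3; a request one unit old contributes 0.5.
```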




--__--__--

Message: 8
Date: Sun, 28 Jan 2001 20:03:32 -0800
From: Ian Clarke <[email protected]>
To: devl at freenetproject.org
Subject: Re: [freenet-devl] Proposal: algorithm for forgetting documents in 
datastore
Reply-To: devl at freenetproject.org



On Sun, Jan 28, 2001 at 08:04:27PM -0500, Tavin Cole wrote:
> On Sun, Jan 28, 2001 at 12:17:11PM -0800, Ian Clarke wrote:
> > The choice of units is essentially arbitrary but *will* affect the
> > behaviour of the system.  The need to make arbitrary decisions is a
> > symptom of a flawed design.  This was one of the advantages of using an
> > LRU cache.
> Choosing an inferior algorithm to run away from the arbitrary decision
> necessitated by a better algorithm is equally flawed.

Of course, but there is no solid evidence that the size-biased LRU cache
is inferior.  It probably has problems, but nobody has offered any proof
yet.

> > Before we start proposing replacements for the current mechanism,
> > someone should actually figure out if there is anything wrong with it!
> > If it isn't broken, don't fix it.
> 
> I am going to run some simulations..  I think there's adequate reason to
> be suspicious of the current mechanism :)

I am sure that you are right, but I just want to make sure that we are
not jumping from the saucepan into the frying-pan.

Ian.



--__--__--

Message: 9
Date: Sun, 28 Jan 2001 23:32:39 -0800
From: Ian Clarke <[email protected]>
To: devl at freenetproject.org
Subject: [freenet-devl] Using DOM for FProxy security
Reply-To: devl at freenetproject.org



Anyone familiar with DOM - as supported by IE5, Mozilla, and Netscape 6?

If not, find out about it here:

http://www.scottandrew.com/index.php?dom/index.html

It is possible that it could be used to have the client web browser test
documents before displaying them and make changes on the fly, all while
offloading the tough stuff from FProxy!

Anyone fancy learning a new language (actually it is just JavaScript)?

Ian.



--__--__--

Message: 10
Date: Sun, 28 Jan 2001 23:41:50 -0800
From: Ian Clarke <[email protected]>
To: devl at freenetproject.org
Subject: [freenet-devl] More on using DOM for FProxy security
Reply-To: devl at freenetproject.org



So I have been looking through this and it is quite exciting - for
example:

x = document.getElementsByTagName("IMG")

Places a collection of all the images in x; you could then iterate
through them like this:

for (var i = 0; i < x.length; i++)
{
  var src = x.item(i).getAttribute("SRC");
  // check src for correctness/security and modify if necessary
  x.item(i).setAttribute("SRC", src);
}

Similar strategies could be adopted to make warning dialog boxes pop up
if someone clicks on an off-freenet hyperlink etc.
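
[Ed.: a sketch of that link-warning idea. The 127.0.0.1 FProxy address
and the helper names are assumptions for illustration, not the real
configuration:]

```javascript
// Decide whether a link target leaves Freenet. Relative links and
// /-prefixed links stay on the local FProxy; an absolute http(s) URL to
// any other host is "off-Freenet". The 127.0.0.1 address is an assumption.
function isOffFreenet(href) {
  return /^https?:\/\//.test(href) &&
         href.indexOf("http://127.0.0.1") !== 0;
}

// Browser wiring (guarded so the helper above also runs standalone):
if (typeof document !== "undefined") {
  var links = document.getElementsByTagName("A");
  for (var i = 0; i < links.length; i++) {
    links.item(i).onclick = function () {
      var href = this.getAttribute("HREF");
      if (isOffFreenet(href)) {
        // Returning false from onclick cancels the navigation.
        return confirm("This link leaves Freenet:\n" + href + "\nFollow it?");
      }
      return true; // follow on-Freenet links without fuss
    };
  }
}
```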

We could have great fun with this ;-)

Ian.



--__--__--

Message: 11
Date: Mon, 29 Jan 2001 09:09:57 +0100
From: "L. Antonides" <[email protected]>
To: devl at freenetproject.org
Subject: Re: [freenet-devl] Proposal: algorithm for forgetting documents in 
datastore
Reply-To: devl at freenetproject.org

On Sat, Jan 27, 2001 at 10:52:07 -0500, Tavin Cole wrote:
> On Sat, Jan 27, 2001 at 04:34:10PM +0100, Sebastian Spaeth wrote:
> > Tavin Cole wrote:
> > [snip]
> > > It seems totally reasonable for a node to reject caching documents greater
> > > in size than a certain fraction of the total datastore size (1% or less
> > > I should think).  That prevents your worst-case example, but saves us from
> > > making relative value judgments using file size as the metric..
> > 
> > That sounds like an elegant proposal. I like it. When forwarding
> > documents, would each node decide on its own whether to cache a file or
> > not?
> > Like using a max file size for: a)% of cache size or b)absolute max size
> > (e.g. 10MB)?
> > It doesn't really solve the problem though, as it might just encourage
> > filesplitting as well.
> 
> Encouraging file splitting may be a good thing.
> 

I've got one question. If people insert files without breaking them up,
isn't it possible to guess what kind of file it is, thereby giving up
security? I have a problem with forcing people to split up files, so I
was wondering whether it would instead be possible to give a file a fake
size?

-- 

Lodewijk Antonides      |   Cistron Telecom 
lantonides at cistron.nl   |   www.cistron-telecom.nl


--__--__--

Message: 12
From: Theodore Hong <[email protected]>
Subject: Re: [freenet-devl] Proposal: algorithm for forgetting documents in 
datastore
To: devl at freenetproject.org
Date: Mon, 29 Jan 2001 11:46:02 +0000 (GMT)
Reply-To: devl at freenetproject.org

Tavin Cole <tavin at mailandnews.com> wrote:
> On Sun, Jan 28, 2001 at 04:38:49PM +0000, Theodore Hong wrote:
> > Tavin Cole <tavin at mailandnews.com> wrote:
> > > When a new file is added to the store, as a result of either a request or
> > > an insert, it could be allocated 1 request in the above equation.  When
> > > the store is full, you basically want to discard the entries with lowest
> > > P.
> > >
> > > This is trivial if all files are the same size.  The question is, if a
> > > file is going to displace >1 smaller files, do we compare its P to the
> > > summed P of the smaller files?  Do we compare its P to some function of
> > > the P of the smaller files, where f(P1..n) < P1 + .. + Pn ?  Or do we
> > > ignore the size difference and just look at the highest P of the smaller
> > > files?
> > 
> > I think the most straightforward way is simply to compare the one file's P
> > to the sum of all the P's it would displace.  This maximizes the number of
> > successful requests.  For example, one 100k file getting 11 requests/day
> > should be ranked higher than ten 10k files getting 1 request/day each.
> 
> Perhaps...  but we don't want to make larger files non-viable either.  They
> can only be split into so many pieces.
> 
> Suppose the choice is between a big file with P=100 and 5 small files with
> P=25.  I'd rather keep the big one with P=100.. the other files can find
> homes in other caches..  otherwise this super-popular big file wouldn't
> ever be cached anywhere, since you could probably always find enough small
> files with P=1 to add up to more than 100!

Sure, you could probably find such a high-P collection of small files, but
that's actually the opposite of what you want.  What you want to do is to
minimize the P that you kick out -- that is, find a set of small files
whose size adds up to the size of your big file and whose combined P is as
low as possible.  That's the set to evict.
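
[Ed.: one hedged sketch of that rule, as a greedy lowest-P-first
approximation. Finding the truly minimal-P set is a knapsack-style
problem, and every name here is hypothetical:]

```javascript
// Greedy approximation of Theodore's eviction rule: to make room for a
// newcomer, gather the lowest-P files until enough bytes are freed, and
// evict them only if their combined P is below the newcomer's P.
// Otherwise return null: don't cache the newcomer, and let it find a
// home in a more spam-ridden datastore. Hypothetical helper.
function chooseEvictions(store, newFile) {
  // store: array of {size, p}; newFile: {size, p}
  var byP = store.slice().sort(function (a, b) { return a.p - b.p; });
  var evict = [], freed = 0, evictedP = 0;
  for (var i = 0; i < byP.length && freed < newFile.size; i++) {
    evict.push(byP[i]);
    freed += byP[i].size;
    evictedP += byP[i].p;
  }
  if (freed < newFile.size || evictedP >= newFile.p) {
    return null; // the set we'd kick out is worth more than the newcomer
  }
  return evict;
}
```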

I would guess it would be quite rare that you can't find any such set with
combined P lower than the P of the big file -- that means you don't have
any spam in your datastore at all!  And if that's the case, then no, I
don't think you should evict the small files -- the big file can find a
home somewhere else in a more spam-ridden datastore, increasing the overall
P value stored in Freenet.

theo




--__--__--

_______________________________________________
Devl mailing list
Devl at freenetproject.org
http://www.uprizer.com/mailman/listinfo/devl


End of Devl Digest
