Send Devl mailing list submissions to
devl at freenetproject.org
To subscribe or unsubscribe via the World Wide Web, visit
http://www.uprizer.com/mailman/listinfo/devl
or, via email, send a message with subject or body 'help' to
devl-request at freenetproject.org
You can reach the person managing the list at
devl-admin at freenetproject.org
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Devl digest..."
Today's Topics:
  1. Re: Proposal: algorithm for forgetting documents in datastore (Chris Anderson)
  2. Re: Proposal: algorithm for forgetting documents in datastore (Oskar Sandberg)
  3. Re: Problems! -> 0.3.7 any day now (Ian Clarke)
  4. Re: Proposal: algorithm for forgetting documents in datastore (Scott G. Miller)
  5. Re: Proposal: algorithm for forgetting documents in datastore (Chris Anderson)
  6. Re: Proposal: algorithm for forgetting documents in datastore (Tavin Cole)
  7. Re: Proposal: algorithm for forgetting documents in datastore (Tavin Cole)
  8. Re: Proposal: algorithm for forgetting documents in datastore (Tavin Cole)
  9. Re: Proposal: algorithm for forgetting documents in datastore (Tavin Cole)
 10. Re: Proposal: algorithm for forgetting documents in datastore (Tavin Cole)
 11. Re: Suggestion for solving the small vs. large file debate (Dev Random)
 12. Re: Proposal: algorithm for forgetting documents in datastore (Ian Clarke)
 13. Re: Suggestion for solving the small vs. large file debate (Ian Clarke)
--__--__--
Message: 1
Date: Sat, 27 Jan 2001 17:49:49 -0500 (EST)
From: Chris Anderson <[email protected]>
To: devl at freenetproject.org
Subject: Re: [freenet-devl] Proposal: algorithm for forgetting documents in
datastore
Reply-To: devl at freenetproject.org
On Sat, Jan 27, 2001, Mark J. Roberts wrote:
>
> Of course. What prompted me to ask is that dataStoreSize is set quite
> low by default (1000). A more efficient system could raise that value,
> which would presumably improve routing.
Well, you could treat a routing entry as a data item... That way you would
have a maximum of dataCache / sizeof(RoutingEntry) elements.
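(As a back-of-the-envelope sketch in Java -- the 10 MB cache and the 512-byte
entry cost are assumptions, not actual Freenet code:)

    // Treat each routing entry as a cache item: the table then holds at
    // most dataCache / sizeof(RoutingEntry) entries. The 512-byte entry
    // size is assumed from Oskar's "around half a kB" estimate below.
    public class RoutingCapacity {
        public static void main(String[] args) {
            long dataCacheBytes = 10L * 1024 * 1024; // hypothetical 10 MB cache
            long routingEntryBytes = 512;            // assumed per-entry cost
            long maxEntries = dataCacheBytes / routingEntryBytes;
            System.out.println("max routing entries: " + maxEntries); // 20480
        }
    }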
At some point you are going to have to consider the efficiencies of
caching data vs routing requests, which is ultimately what this thread is
about.
Onward Tavin "Statistician" Cole, onward.
--__--__--
Message: 2
Date: Sat, 27 Jan 2001 23:53:24 +0100
From: Oskar Sandberg <[email protected]>
To: devl at freenetproject.org
Subject: Re: [freenet-devl] Proposal: algorithm for forgetting documents in
datastore
Reply-To: devl at freenetproject.org
On Sat, Jan 27, 2001 at 04:17:30PM -0600, Mark J. Roberts wrote:
> On Sat, 27 Jan 2001, Oskar Sandberg wrote:
> <>
> > Um, the reason we have a dataStoreSize var is that bigger routing tables
> > are good, we don't want to get rid of the entries from the routing table
> > just because we don't have space to store the data.
>
> Of course. What prompted me to ask is that dataStoreSize is set quite low
> by default (1000). A more efficient system could raise that value, which
> would presumably improve routing.
The current DataStore is quite lousy (linear search), so I don't really
feel comfortable making it much bigger. Tavin is working on a more
efficient routing table for 0.4, so when that is in we may wish to make it
larger, but there are also memory limits (a NodeReference may be around
half a kB, but not every entry will be to a unique NodeReference of
course).
It is probably not as simple as bigger = better though...
--
'DeCSS would be fine. Where is it?'
'Here,' Montag touched his head.
'Ah,' Granger smiled and nodded.
Oskar Sandberg
md98-osa at nada.kth.se
--__--__--
Message: 3
Date: Sat, 27 Jan 2001 19:25:39 -0800
From: Ian Clarke <[email protected]>
To: devl at freenetproject.org
Subject: Re: [freenet-devl] Problems! -> 0.3.7 any day now
Reply-To: devl at freenetproject.org
On Sat, Jan 27, 2001 at 02:56:25PM -0500, Timm Murray wrote:
>
> Ian Clarke wrote on 1/22/01 3:24 pm:
>
> >When I want to use FProxy with a browser, or a version of a browser,
> >for which someone hasn't written a freenet: plugin (perhaps it is a
> >closed source browser and writing such a plugin is impossible).
>
> Get a real browser and junk the closed browser. Even IE allows
> plugins for this.
It is just this kind of attitude which causes open source projects to
fail - "we don't care about people who are too stupid to use our
software/write their own plugin/don't use our favourite web browser...".
Ian.
--__--__--
Message: 4
Date: Sat, 27 Jan 2001 23:08:46 -0500
To: devl at freenetproject.org
Subject: Re: [freenet-devl] Proposal: algorithm for forgetting documents in
datastore
From: "Scott G. Miller" <[email protected]>
Reply-To: devl at freenetproject.org
> > the bizarre situation where all the insert nodes pass on the request and
> > return success, even though none of them store it. Maybe nodes should
> > return failure instead of passing through an insert they can't cache...
>
> I haven't really been able to make up my mind on this. Should I reply
> with a RequestFailed on Inserts for data larger than whatever the nodes
> limit is?
No, because won't the downstream node try its next choice? If it does,
then there's no way the request will follow that same path.
--__--__--
Message: 5
Date: Sun, 28 Jan 2001 00:45:45 -0500 (EST)
From: Chris Anderson <[email protected]>
To: devl at freenetproject.org
Subject: Re: [freenet-devl] Proposal: algorithm for forgetting documents in
datastore
Reply-To: devl at freenetproject.org
On Sat, 27 Jan 2001, Scott G. Miller wrote:
> >
> > > the bizarre situation where all the insert nodes pass on the
> > > request and return success, even though none of them store it.
> > > Maybe nodes should return failure instead of passing through an
> > > insert they can't cache...
> >
> > I haven't really been able to make up my mind on this. Should I
> > reply with a RequestFailed on Inserts for data larger than
> > whatever the nodes limit is?
>
> No, because won't the downstream node try its next choice? If it
> does, then there's no way the request will follow that same path.
Damn those small files. Heck, maybe different routing tables should
be used for different file sizes... the size is almost encoded in the
key.
--__--__--
Message: 6
Date: Sun, 28 Jan 2001 02:25:01 -0500
From: Tavin Cole <[email protected]>
To: devl at freenetproject.org
Subject: Re: [freenet-devl] Proposal: algorithm for forgetting documents in
datastore
Reply-To: devl at freenetproject.org
On Sat, Jan 27, 2001 at 11:53:24PM +0100, Oskar Sandberg wrote:
> On Sat, Jan 27, 2001 at 04:17:30PM -0600, Mark J. Roberts wrote:
> > Of course. What prompted me to ask is that dataStoreSize is set quite low
> > by default (1000). A more efficient system could raise that value, which
> > would presumably improve routing.
>
> The current DataStore is quite lousy (linear search), so I don't really
> feel comfortable making it much bigger. Tavin is working on a more
> efficient routing table for 0.4, so when that is in we may wish to make it
> larger, but there are also memory limits (a NodeReference may be around
> half a kB, but not every entry will be to a unique NodeReference of
> course).
>
> It is probably not as simple as bigger = better though...
What I'm planning to do for 0.4 is logically separate the routing table
from the document store. The document store will have an upper limit in
disk space but no (practical) limit on the number of files stored.
I'm not sure how the routing table will be constrained -- as Oskar was
saying it's not so simple as just restricting the number of entries b/c
the memory and disk space usage will be determined by the number of
unique NodeReferences as well as the number of keys in the routing
table. However it will be a red-black BST so it can definitely be
larger than 1000.
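(To illustrate the shape of that -- a minimal sketch using java.util.TreeMap,
which is a red-black tree underneath; the class and method names here are
hypothetical, not the actual 0.4 code:)

    import java.util.Map;
    import java.util.TreeMap;

    class RoutingTable {
        // key -> reference to the node believed closest to that key
        private final TreeMap<String, NodeRef> entries = new TreeMap<>();

        void add(String key, NodeRef ref) { entries.put(key, ref); }

        // O(log n) nearest-key lookup instead of the current linear scan
        NodeRef closest(String key) {
            Map.Entry<String, NodeRef> floor = entries.floorEntry(key);
            Map.Entry<String, NodeRef> ceil = entries.ceilingEntry(key);
            if (floor == null) return ceil == null ? null : ceil.getValue();
            if (ceil == null) return floor.getValue();
            // crude tie-break; real routing would compare key distances
            return floor.getValue();
        }
    }

    class NodeRef { /* address, last-seen time, etc. -- placeholder */ }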
--
// Tavin Cole
--__--__--
Message: 7
Date: Sun, 28 Jan 2001 02:28:26 -0500
From: Tavin Cole <[email protected]>
To: devl at freenetproject.org
Subject: Re: [freenet-devl] Proposal: algorithm for forgetting documents in
datastore
Reply-To: devl at freenetproject.org
On Sat, Jan 27, 2001 at 05:14:55PM -0500, Benjamin Coates wrote:
> >It seems totally reasonable for a node to reject caching documents greater
> >in size than a certain fraction of the total datastore size (1% or less
> >I should think). That prevents your worst-case example, but saves us from
> >making relative value judgments using file size as the metric..
>
> 1% is pretty tiny, that's 1MB on the default datastore size, and also leads
> to
> the bizarre situation where all the insert nodes pass on the request and
> return success, even though none of them store it. Maybe nodes should return
> failure instead of passing through an insert they can't cache...
OTOH, it's probably not good if a datastore can only hold a handful of keys
b/c they are all a considerable fraction of its max size. 1% may not be
right, tho.
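(In code, the fraction cap is just an admission check -- a sketch, with
maxFraction standing in for whatever value wins this debate:)

    class CachePolicy {
        // e.g. maxFraction = 0.01 rejects any file over 1% of the store
        static boolean canCache(long fileSizeBytes, long storeSizeBytes,
                                double maxFraction) {
            return fileSizeBytes <= (long) (storeSizeBytes * maxFraction);
        }
    }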
--
// Tavin Cole
--__--__--
Message: 8
Date: Sun, 28 Jan 2001 02:31:20 -0500
From: Tavin Cole <[email protected]>
To: devl at freenetproject.org
Subject: Re: [freenet-devl] Proposal: algorithm for forgetting documents in
datastore
Reply-To: devl at freenetproject.org
On Sat, Jan 27, 2001 at 11:48:11PM +0100, Oskar Sandberg wrote:
> On Sat, Jan 27, 2001 at 05:14:55PM -0500, Benjamin Coates wrote:
> <>
> > 1% is pretty tiny, that's 1MB on the default datastore size, and also leads
> > to
> > the bizarre situation where all the insert nodes pass on the request and
> > return success, even though none of them store it. Maybe nodes should
> > return
> > failure instead of passing through an insert they can't cache...
>
> I haven't really been able to make up my mind on this. Should I reply
> with a RequestFailed on Inserts for data larger than whatever the nodes
> limit is?
But even if the node can't store the data b/c it is too large, it could
stream it to an upstream node which can store the data, right?
How would we address this?
--
// Tavin Cole
--__--__--
Message: 9
Date: Sun, 28 Jan 2001 02:51:41 -0500
From: Tavin Cole <[email protected]>
To: devl at freenetproject.org
Subject: Re: [freenet-devl] Proposal: algorithm for forgetting documents in
datastore
Reply-To: devl at freenetproject.org
On Sat, Jan 27, 2001 at 05:23:52PM +0100, Oskar Sandberg wrote:
> > > Even a model that perfectly matched the size integral over expected time
> > > would be based on a flawed premise.
> >
> > It's also based on a pseudomath unless someone can formulate this as an
> > equation. For example, what are the units of this thing?
> > megabytes/sec * dt doesn't make a heck of a lot of sense..
>
> I think a lot of probability theorists would be rather insulted at
> calling their field pseudo math (I, OTOH...).
Hey, that's a compliment. It's almost math ;^p -- actually what makes
you think I was referring to probability theorists anyway?
> In theory the equation is very simple: E[T_i] * size_i = k, where k is a
> constant, T_i is the random variable giving the length of time a
> document i stays in the cache, and size_i is the size of document i.
> Figuring out the formula for T_i and thus the expected value from an
> algorithm (or in this situation, actually the opposite - finding an
> algorithm that gives E[T_i] = k / size_i) is of course the difficult part,
> but it's neither impossible nor fuzzy in any way.
So, I'm guessing E[] denotes summation and you are meaning to write
E[T_i * size_i] = k .. am I right? Is this summation based on a snapshot
of the datastore at a given moment, where the Ti is the time for each
document to live from that moment? Is k a function of the total
number of documents (I hope)?
> What units we deal with doesn't really matter since you are only looking
> at ratios anyways.
No, the units don't matter for computations, but they are important when
you try to assess the *meaning* of the math. For example, I'm pretty
skeptical of a quantity measured in byte-seconds, and don't see any a
priori reason why it deserves a principle of least action... if you'll
excuse the metaphor :) But I'll table that argument until you can
set me straight on the specifics of the above equation.
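(For reference, Oskar's condition written out, with E[] as the expectation
operator and s_i for size_i -- a restatement of his formula, not new math:)

    \[
      E[T_i] \cdot s_i = k
      \quad\Longleftrightarrow\quad
      E[T_i] = \frac{k}{s_i},
      \qquad [k] = \text{byte-seconds}
    \]
    % Example: with k = 10^9 byte-seconds, a 1 MB document is expected
    % to stay cached ~1000 s, a 10 MB document only ~100 s.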
--
// Tavin Cole
--__--__--
Message: 10
Date: Sun, 28 Jan 2001 03:14:30 -0500
From: Tavin Cole <[email protected]>
To: devl at freenetproject.org
Subject: Re: [freenet-devl] Proposal: algorithm for forgetting documents in
datastore
Reply-To: devl at freenetproject.org
The only legitimate metric we have for the value of any file on Freenet is
its request history. To my understanding, even Ian was not arguing that
smaller files are intrinsically more valuable, but rather that we need to
be concerned about how the value of a large file compares to the aggregate
value of the number of small files it potentially replaces.
It's clear to me now that the only sane mechanism for deciding what files
to keep in the document store is one that consistently preserves the files
that receive the greatest number of requests. So let's define P as the
measure of a file's request history. Simplistically,
P = total no. of requests for this file / time since file was stored
In practice, though, I think we'd want to limit the past time interval
to what is "recent". Also, the units in the denominator should not
be too granular, to keep P from blowing out of proportion for brand
new files. Something in the neighborhood of hours or minutes, probably.
When a new file is added to the store, as a result of either a request
or an insert, it could be allocated 1 request in the above equation.
When the store is full, you basically want to discard the entries
with lowest P.
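(A minimal sketch of that bookkeeping in Java -- StoreEntry and the
hour granularity are illustrative assumptions, not Freenet code:)

    import java.util.Collection;
    import java.util.Collections;
    import java.util.Comparator;

    class StoreEntry {
        final String key;
        final long sizeBytes;
        long requests = 1;  // a brand-new entry is credited with 1 request
        final long storedAtMillis = System.currentTimeMillis();

        StoreEntry(String key, long sizeBytes) {
            this.key = key;
            this.sizeBytes = sizeBytes;
        }

        // P = requests / hours since stored; coarse units keep P from
        // blowing out of proportion for brand-new files.
        double p() {
            double hours = (System.currentTimeMillis() - storedAtMillis) / 3.6e6;
            return requests / Math.max(hours, 1.0);
        }
    }

    class EvictionPolicy {
        // When the store is full, discard the entry with the lowest P.
        static StoreEntry victim(Collection<StoreEntry> store) {
            return Collections.min(store,
                Comparator.comparingDouble(StoreEntry::p));
        }
    }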
This is trivial if all files are the same size. The question is, if
a file is going to displace >1 smaller files, do we compare its P
to the summed P of the smaller files? Do we compare its P to some
function of the P of the smaller files, where f(P_1, ..., P_n) < P_1 + ... + P_n?
Or do we ignore the size difference and just look at the highest P
of the smaller files?
--
// Tavin Cole
--__--__--
Message: 11
Date: Sun, 28 Jan 2001 01:11:00 -0800
From: Dev Random <[email protected]>
To: devl at freenetproject.org
Subject: Re: [freenet-devl] Suggestion for solving the small vs. large file
debate
Reply-To: devl at freenetproject.org
Yeah, I proposed this a while ago, with each bucket storing files
twice as large as the previous bucket. The algorithm for retiring files
was to retire based on age in seconds multiplied by the size of the file.
The file with the largest such "weight" would be retired if the datastore
was out of space.
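(The retirement rule, sketched in Java -- hypothetical names, with the
bucketing itself elided:)

    import java.util.Collection;
    import java.util.Collections;
    import java.util.Comparator;

    class BucketedFile {
        final long sizeBytes;
        final long storedAtMillis = System.currentTimeMillis();
        BucketedFile(long sizeBytes) { this.sizeBytes = sizeBytes; }

        // "weight" = age in seconds * size in bytes (byte-seconds)
        long weight() {
            long ageSeconds =
                (System.currentTimeMillis() - storedAtMillis) / 1000;
            return ageSeconds * sizeBytes;
        }
    }

    class Retirement {
        // Retire the largest-weight file when the store is out of space.
        static BucketedFile victim(Collection<BucketedFile> store) {
            return Collections.max(store,
                Comparator.comparingLong(BucketedFile::weight));
        }
    }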
On Sat, Jan 27, 2001 at 04:19:20PM +0000, Alex Barnell wrote:
> Sorry if this has been mentioned before.
>
> DataStore size (diskspace) = ds
> Document filesize upper limit for a node = ul (ds / 4 or something)
>
> Divide the datastore into n sub-datastores of size ds/n, call
> these sub-dataStores SDS1, SDS2, ..., SDSn.
>
> For any freenet document on the node with filesize s, it should be
> stored in SDSx, where (x - 1)ul/n < s < xul/n.
>
> Each sub-dataStore maintains its own stack, references etc. I have no
> idea what a good value for n is. SDSs could be allowed to grow to a
> size > ds/n if other SDSs are under-utilised, but must give the diskspace
> back if needed by other SDSs.
>
> This method lets the node best represent the variety of files on Freenet,
> making sure there is always room for popular files both large and small.
>
> --
> Regards,
> Alex
--
Dev Random
Fingerprint: 3ABC FCEF 1BCE 4528 E4FD 15EB 173A 76D2 6959 DAF1
--__--__--
Message: 12
Date: Sun, 28 Jan 2001 01:27:54 -0800
From: Ian Clarke <[email protected]>
To: devl at freenetproject.org
Subject: Re: [freenet-devl] Proposal: algorithm for forgetting documents in
datastore
Reply-To: devl at freenetproject.org
On Sun, Jan 28, 2001 at 03:14:30AM -0500, Tavin Cole wrote:
> The only legitimate metric we have for the value of any file on Freenet is
> its request history. To my understanding, even Ian was not arguing that
> smaller files are intrinsically more valuable, but rather that we need to
> be concerned about how the value of a large file compares to the aggregate
> value of the number of small files it potentially replaces.
What do you mean *EVEN* Ian? ;-)
> It's clear to me now that the only sane mechanism for deciding what files
> to keep in the document store is one that consistently preserves the files
> that receive the greatest number of requests. So let's define P as the
> measure of a file's request history. Simplistically,
>
> P = total no. of requests for this file / time since file was stored
>
> In practice, though, I think we'd want to limit the past time interval
> to what is "recent". Also, the units in the denominator should not
> be too granular, to keep P from blowing out of proportion for brand
> new files. Something in the neighborhood of hours or minutes, probably.
The problem here, and the issue that led me to initially choose a Least
Recently Used (LRU) cache, is what units should be used for the time
since the file was last accessed/stored/whatever. Deciding on this
requires making an arbitrary decision on how quickly requests for a file
should be "forgotten", and I don't like arbitrary decisions. Employing
an LRU cache avoided this issue since the actual times involved are not
relevant, just the order in which events happened.
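(That order-only behaviour is exactly what, e.g., java.util.LinkedHashMap
gives you -- a minimal sketch, not the actual DataStore:)

    import java.util.LinkedHashMap;
    import java.util.Map;

    class LruStore<K, V> extends LinkedHashMap<K, V> {
        private final int capacity;

        LruStore(int capacity) {
            super(16, 0.75f, true); // accessOrder = true: get() refreshes entries
            this.capacity = capacity;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > capacity; // evict the least recently used entry
        }
    }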
Ian.
--__--__--
Message: 13
Date: Sun, 28 Jan 2001 01:29:41 -0800
From: Ian Clarke <[email protected]>
To: devl at freenetproject.org
Subject: Re: [freenet-devl] Suggestion for solving the small vs. large file
debate
Reply-To: devl at freenetproject.org
On Sun, Jan 28, 2001 at 01:11:00AM -0800, Dev Random wrote:
> Yeah, I have proposed this awhile ago with each bucket storing files
> twice as large as the previous bucket. The algorithm for retiring files
> was to retire based on age in seconds multiplied by the size of the file.
> The file with the largest such "weight" would be retired if the datastore
> was out of space.
But why seconds, and what units would you choose for the size of the
file? This requires making an arbitrary decision, and arbitrary
decisions are bad ("hey, let's have a memory limit of 640k, nobody will
ever need more than that!").
Ian.
--__--__--
_______________________________________________
Devl mailing list
Devl at freenetproject.org
http://www.uprizer.com/mailman/listinfo/devl
End of Devl Digest