On Tue, Sep 2, 2014 at 8:35 PM, David Given <d...@cowlark.com> wrote:

> I have no idea whether this is feasible or not. I don't really know how
> different SQLite's SQL dialect is from other databases --- since
> discovering SQLite I haven't really felt a need to get into MySQL or
> Postgres --- and there may also be killer non-API assumptions: e.g.,
> with SQLite, local processing is cheap. With a remote DBMS, local
> processing is expensive, because you have to push all the data over the
> wire. So if Fossil's expecting to be able to cheaply enumerate big
> chunks of the version graph, that's unlikely to adapt well to a remote
> database.
>

See my previous reply for more on that. Note, also, that the C APIs are
_exceedingly_ different. The MySQL C API is an absolute nightmare to work
with. Postgres' C API (insofar as i can determine) cannot figure out on its
own how many bound parameters a statement contains - the caller has to pass
that count himself, which makes it impossible to use in general-purpose APIs
without writing code which parses the SQL and figures out how many bound
parameters it contains. (Maybe there's a C API for this somewhere - i
couldn't find it.)
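
For the curious, here's a minimal, purely illustrative sketch (not Fossil
code) of what the sqlite3 side of that looks like - sqlite3_prepare_v2()
and sqlite3_bind_parameter_count() are the real APIs, the wrapper function
is made up:

  /* Illustrative only: sqlite3 can tell a generic wrapper how many bound
  ** parameters a prepared statement has - no SQL parsing on the client. */
  #include <sqlite3.h>

  int count_params(sqlite3 *db, const char *zSql){
    sqlite3_stmt *pStmt = 0;
    int n = -1;
    if( sqlite3_prepare_v2(db, zSql, -1, &pStmt, 0)==SQLITE_OK ){
      n = sqlite3_bind_parameter_count(pStmt); /* e.g. 2 for "?1 ... :name" */
    }
    sqlite3_finalize(pStmt);
    return n;
  }

With libpq, by contrast, PQprepare()/PQexecParams() want an explicit
nParams from the caller, and there appears to be no counterpart to
sqlite3_bind_parameter_count(), so a generic layer is left parsing the
SQL itself.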

sqlite's API, OTOH, is an absolute dream to work with in terms of
client-side effort vs sqlite effort: you type a little bit, and in exchange
sqlite does a lot. MySQL requires that you know, in advance, how big each
result column will be, so that you can allocate the memory for it yourself
(and then manage it yourself). It's truly painful.
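
To illustrate (a rough, non-Fossil sketch; the statement is assumed to be
already prepared and executed, and the 256-byte buffer is exactly the
size-it-in-advance guess described above):

  /* Illustrative only: the MySQL C API makes the caller allocate and size
  ** every result buffer before fetching. (Pre-8.0 API, hence my_bool.) */
  #include <mysql/mysql.h>
  #include <stdio.h>
  #include <string.h>

  void dump_first_column(MYSQL_STMT *pStmt){
    char buf[256];                /* caller-managed buffer, sized up front */
    unsigned long len = 0;
    my_bool isNull = 0;
    MYSQL_BIND bind[1];
    memset(bind, 0, sizeof(bind));
    bind[0].buffer_type = MYSQL_TYPE_STRING;
    bind[0].buffer = buf;
    bind[0].buffer_length = sizeof(buf);
    bind[0].length = &len;
    bind[0].is_null = &isNull;
    if( mysql_stmt_bind_result(pStmt, bind)!=0 ) return;
    while( mysql_stmt_fetch(pStmt)==0 ){      /* 0 = a full row fetched OK */
      if( !isNull ) printf("%.*s\n", (int)len, buf);
    }
  }

The sqlite3 equivalent is a sqlite3_step() loop plus sqlite3_column_text(),
with sqlite sizing and owning the column buffer for you.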


> I believe NetBSD has one of the bigger Fossil repositories, and they've
> complained that some Fossil operations are slow (commits and updates ---
> http://2011.eurobsdcon.org/papers/sonnenberger/fossilizing.pdf claims
> 8-10 seconds for an operation which git and hg do in 1).
>

If the pkgsrc repo can do _anything_ in 8-10 seconds, i'd be impressed ;).
AFAIK, it's the biggest repo out there, in terms of # of files/size. There
are several quite massive ones, though (TCL core is the biggest repo i've
played with).


> I have a vague memory of a discussion here coming to the conclusion that
> this is because of large chains of deltas having to be resolved?
>

Partially, yes, and partially due to the repo-cksum which came up earlier
(a.k.a. "the R-card").

> (I myself have a 250MB Fossil repository with a number of large files in
> it (multi-megabyte jpegs) and have noticed that operations are
> noticeably slower there than on small repositories.)
>

Applying fossil deltas requires, in general, about 2.x copies in memory at
once: the original, the delta, and the output version. That gets expensive
quickly, in terms of memory, for large files, and it would be really easy
to run out of memory on smaller systems (e.g. an RPi). fossil's
out-of-memory policy is to abort(), which, as horrible as that initially
sounds, cuts the amount of implementation code tremendously because we
never have to check for allocation errors - many of the libfossil
counterparts of fossil core algos are 2x as long solely because of the
additional error checking needed for the lib-style API.
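
To make the memory math concrete, here's a small, purely illustrative
sketch (names made up, not libfossil's actual code) of that abort-on-OOM
allocation style, and of why rebuilding a big artifact from a delta keeps
roughly two full copies resident at once:

  /* Illustrative only: an abort-on-OOM allocator means callers never have
  ** to check for NULL, which is what keeps the downstream code shorter. */
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>

  static void *xmalloc(size_t n){
    void *p = malloc(n);
    if( p==0 ){
      fprintf(stderr, "out of memory (%zu bytes)\n", n);
      abort();                 /* fail hard instead of returning NULL */
    }
    return p;
  }

  int main(void){
    /* Rebuilding a ~100MB artifact: the original, the (comparatively
    ** small) delta, and the output version are all live at once. */
    size_t nSrc = 100u*1024*1024;
    size_t nOut = 100u*1024*1024;
    char *zSrc = xmalloc(nSrc);
    char *zOut = xmalloc(nOut);  /* ~200MB resident before the delta runs */
    memset(zSrc, 0, nSrc);
    memset(zOut, 0, nOut);
    free(zOut);
    free(zSrc);
    return 0;
  }

That's the trade-off: a hard abort() on small-RAM boxes in exchange for
core algorithms roughly half as long as their error-checked libfossil
counterparts.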


-- 
----- stephan beal
http://wanderinghorse.net/home/stephan/
http://gplus.to/sgbeal
"Freedom is sloppy. But since tyranny's the only guaranteed byproduct of
those who insist on a perfect world, freedom will have to do." -- Bigby Wolf