Re: [pacman-dev] pacman cold caches performance, too much stat()ing

2009-12-13 Thread Dimitrios Apostolou

On Fri, 11 Dec 2009, Nagy Gabor wrote:

-Su is even worse: We have to read all desc files in sync (to get
%REPLACES%)


I changed the pacman DB using the following script, and applied the attached 
patch to pacman so that "pacman -Su" is possible by reading only the 
"depends" files, not "desc".


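for d in "$PACDB"/sync/*/* "$PACDB"/local/*
do
    # append the %REPLACES% block (header through the next blank line) to depends
    sed -ne '/%REPLACES%/,/^$/p' "$d/desc" >> "$d/depends"
    # then delete that block from desc
    sed -i -e '/%REPLACES%/,/^$/d' "$d/desc"
done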
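
For readers who prefer to see the same move in C, here is a minimal sketch of what the two sed calls do to a single desc file held in memory. The helper name split_section is hypothetical and not part of libalpm, and unlike sed it matches the header line exactly rather than as a substring.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not libalpm code): copy the block that starts at the
 * line equal to `header` and runs through the next blank line into `section`,
 * and every other line into `rest`. Both output buffers must hold at least
 * strlen(desc) + 1 bytes. Returns 1 if the header was found, 0 otherwise. */
static int split_section(const char *desc, const char *header,
                         char *section, char *rest)
{
	int in_block = 0, found = 0;
	const char *p = desc;
	size_t hlen;

	assert(desc && header && section && rest);
	hlen = strlen(header);
	section[0] = '\0';
	rest[0] = '\0';
	while(*p) {
		const char *eol = strchr(p, '\n');
		size_t len = eol ? (size_t)(eol - p) + 1 : strlen(p); /* with '\n' */
		size_t text = eol ? len - 1 : len;                    /* without it */

		if(!in_block && text == hlen && strncmp(p, header, hlen) == 0) {
			in_block = 1;
			found = 1;
		}
		strncat(in_block ? section : rest, p, len);
		if(in_block && text == 0) {
			in_block = 0; /* the blank line closes the block */
		}
		p += len;
	}
	return found;
}
```

The `section` output corresponds to what the script appends to depends, and `rest` to what is left in desc.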


The timings I got for the command "pacman -Sup -b $PACDB" are the 
following.


Before:
2m15s (cold caches)
0.98s (hot caches)

After:
2m11s (cold caches)
0.78s (hot caches)

Unfortunately it didn't provide as big an improvement as I expected. I 
can now explain why: desc files in sync/ were being read *instead* 
of depends, not in addition to them. So I _replaced_ the read("*/desc") 
with read("*/depends"), but disk seeks are still happening, even for these 
smaller files.


I believe bigger differences will show when people have many upgrades 
pending, in which case the already cached depends files will come in handy. 
Unfortunately I only had 6 upgrades pending.


But I think it's vital that not all descriptions and stuff are read during 
a common operation like -Su, and with the provided script (run only for 
local, since sync should be upgraded on the server) the transition should 
be smooth for end-users. What's your opinion?



Dimitris

diff --git a/lib/libalpm/be_files.c b/lib/libalpm/be_files.c
index d09d72a..a6087bd 100644
--- a/lib/libalpm/be_files.c
+++ b/lib/libalpm/be_files.c
@@ -460,12 +460,6 @@ int _alpm_db_read(pmdb_t *db, pmpkg_t *info, pmdbinfrq_t inforeq)
goto error;
}
STRDUP(info->md5sum, _alpm_strtrim(line), goto error);
-   } else if(strcmp(line, "%REPLACES%") == 0) {
-   while(fgets(line, sline, fp) && strlen(_alpm_strtrim(line))) {
-   char *linedup;
-   STRDUP(linedup, _alpm_strtrim(line), goto error);
-   info->replaces = alpm_list_add(info->replaces, linedup);
-   }
} else if(strcmp(line, "%FORCE%") == 0) {
info->force = 1;
}
@@ -528,6 +522,12 @@ int _alpm_db_read(pmdb_t *db, pmpkg_t *info, pmdbinfrq_t inforeq)
STRDUP(linedup, _alpm_strtrim(line), goto error);
info->conflicts = alpm_list_add(info->conflicts, linedup);
}
+   } else if(strcmp(line, "%REPLACES%") == 0) {
+   while(fgets(line, sline, fp) && strlen(_alpm_strtrim(line))) {
+   char *linedup;
+   STRDUP(linedup, _alpm_strtrim(line), goto error);
+   info->replaces = alpm_list_add(info->replaces, linedup);
+   }
} else if(strcmp(line, "%PROVIDES%") == 0) {
while(fgets(line, sline, fp) && strlen(_alpm_strtrim(line))) {
char *linedup;
diff --git a/lib/libalpm/package.c b/lib/libalpm/package.c
index 83a2fb8..387cb1b 100644
--- a/lib/libalpm/package.c
+++ b/lib/libalpm/package.c
@@ -387,8 +387,8 @@ alpm_list_t SYMEXPORT *alpm_pkg_get_replaces(pmpkg_t *pkg)
ASSERT(handle != NULL, return(NULL));
ASSERT(pkg != NULL, return(NULL));
 
-   if(pkg->origin == PKG_FROM_CACHE && !(pkg->infolevel & INFRQ_DESC)) {
-   _alpm_db_read(pkg->origin_data.db, pkg, INFRQ_DESC);
+   if(pkg->origin == PKG_FROM_CACHE && !(pkg->infolevel & INFRQ_DEPENDS)) {
+   _alpm_db_read(pkg->origin_data.db, pkg, INFRQ_DEPENDS);
}
return pkg->replaces;
 }



Re: [pacman-dev] pacman cold caches performance, too much stat()ing

2009-12-13 Thread Sebastian Nowicki


On 13/12/2009, at 2:31 AM, Nagy Gabor wrote:


On Sat, 12 Dec 2009, Dimitrios Apostolou wrote:

Hello list,

I have been investigating the slow performance of pacman regarding
the cold caches scenario and I'm trying to write some proof of
concept code that improves things a lot for some cases. However I
need your help regarding some facts I might have misunderstood, and
any pointers to the source code you can give me would also help a
lot. I wouldn't like to lose time changing stuff that would break
current functionality. So here are some first questions that come
to mind, just by using strace:

When doing "pacman -Q blah" I can see that besides the getdents()
syscalls in /var/lib/pacman/local (probably caused by readdir()),
there are also stat() and access() calls for every single
subdirectory. Why are the last ones necessary? Isn't readdir enough?

The same goes when doing "pacman -S blah". But in that case it
stat()'s both 'local' and 'sync' directories, so worst case is
really bad, it will stat() all contents of local, core, extra and
community...


Regarding the stat() and access() operations I finally found out why
they happen exactly:

In case of corrupted db the sync, for example, directory might
contain files, not subdirectories. So in that case
_alpm_db_populate() just makes sure it's a directory. However
stat()ing thousands of files is too much of a price to pay.
Similarly, access() checks it is accessible by the user.

In the attached patch I have just removed the relevant lines, with
the following rationale: In the rare case of corrupted db, even if we
do open("sync/not_a_dir/depends") it will still fail and we'll catch
the failure there, no need to investigate the cause further, just
write a message like "couldn't access sync/not_a_dir/depends".

By dropping caches ("echo 3 > /proc/sys/vm/drop_caches") before
running, I measure a nice performance boost on my old laptop: "pacman
-Q gdb" time is reduced from about 7s to 2.5s.


Hm. This is a nice time boost... Did you test this with other
operations, too?


What do you think? Is it possible to remove those checks?
Dimitris


The best solution would be to rewrite our whole database crap as Dan
said. I am pretty sure that this patch would not cause any harm irl, but
our code would become a little bit more dangerous: As I see,
db_read(INFRQ_BASE) would become a ~NOP function and db_populate would
become a simple "ls" function (the only remaining sanity check is
splitname).


It occurs to me time and time again, that it would be a good idea to  
try and abstract the database functions, to allow different database  
backends to be "plugged in." This would make experimentation with  
backends a lot easier, since you just compile a different file in (the  
interface remains the same). A library called libpkg[1] does something  
along these lines, by leveraging function pointers. Unfortunately I  
don't have a lot of time to look into it further, but it's an  
interesting idea.


[1]: http://libpkg.berlios.de/doc/trunk/html/pkg__db_8h_source.html




Re: [pacman-dev] pacman cold caches performance, too much stat()ing

2009-12-13 Thread Dimitrios Apostolou

On Sat, 12 Dec 2009, Nagy Gabor wrote:

(the only remaining sanity check is splitname).


Correct, and it has just caught an error for me:

$ ./src/pacman/.libs/lt-pacman -Sup
:: Starting full system upgrade...
error: invalid name for database entry '.lastupdate'
error: invalid name for database entry '.lastupdate'
error: invalid name for database entry '.lastupdate'
 local database is up to date


It seems there is a non-directory file inside the sync db, I missed this 
one among the heaps of strace output. Is this intentional? Is this 
actually used? If it is, and since _alpm_db_populate() knows about the db 
structure, it should avoid that specific case.
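
To illustrate why a name check alone rejects '.lastupdate', here is a minimal sketch of a splitname-style test. It assumes db entries are named "name-version-release"; the function split_entry_name is a hypothetical stand-in and not the actual libalpm code.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the splitname check (not libalpm's code):
 * split "name-version-release" at the second-to-last hyphen.
 * Returns 0 on success, -1 if the entry name does not have that shape. */
static int split_entry_name(const char *entry, char *name, size_t namesz,
                            char *version, size_t versz)
{
	const char *rel, *ver;

	assert(entry && name && version);
	rel = strrchr(entry, '-'); /* hyphen before the release */
	if(rel == NULL || rel == entry) {
		return -1;
	}
	/* walk back to the hyphen before the version */
	for(ver = rel - 1; ver > entry && *ver != '-'; ver--) {
		/* nothing to do */
	}
	if(*ver != '-' || ver == entry) {
		return -1; /* fewer than two hyphens, or empty package name */
	}
	if((size_t)(ver - entry) >= namesz || strlen(ver + 1) >= versz) {
		return -1; /* would not fit in the caller's buffers */
	}
	memcpy(name, entry, (size_t)(ver - entry));
	name[ver - entry] = '\0';
	strcpy(version, ver + 1);
	return 0;
}
```

An entry such as "gdb-7.0-1" splits into "gdb" and "7.0-1", while ".lastupdate" contains no hyphen at all and is rejected, which matches the error shown above.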


The following patch takes care of that special case and also reserves all 
hidden files for special database purposes, not db entries:


--- a/lib/libalpm/be_files.c
+++ b/lib/libalpm/be_files.c
@@ -238,7 +238,8 @@ int _alpm_db_populate(pmdb_t *db)
const char *name = ent->d_name;
pmpkg_t *pkg;

-   if(strcmp(name, ".") == 0 || strcmp(name, "..") == 0) {
+   /* skip hidden files and '.' and '..' subdirectories */
+   if (name[0] == '.') {
continue;
}
pkg = _alpm_pkg_new();



Dimitris



Re: [pacman-dev] pacman cold caches performance, too much stat()ing

2009-12-12 Thread Dimitrios Apostolou

On Sat, 12 Dec 2009, Nagy Gabor wrote:

On Sat, 12 Dec 2009, Dimitrios Apostolou wrote:
Regarding the stat() and access() operations I finally found out why
they happen exactly:

In case of corrupted db the sync, for example, directory might
contain files, not subdirectories. So in that case
_alpm_db_populate() just makes sure it's a directory. However
stat()ing thousands of files is too much of a price to pay.
Similarly, access() checks it is accessible by the user.

In the attached patch I have just removed the relevant lines, with
the following rationale: In the rare case of corrupted db, even if we
do open("sync/not_a_dir/depends") it will still fail and we'll catch
the failure there, no need to investigate the cause further, just
write a message like "couldn't access sync/not_a_dir/depends".

By dropping caches ("echo 3 > /proc/sys/vm/drop_caches") before
running, I measure a nice performance boost on my old laptop: "pacman
-Q gdb" time is reduced from about 7s to 2.5s.


Hm. This is a nice time boost... Did you test this with other
operations, too?


I didn't time it, but strace shows this improvement applies to -Qi, -Si, 
-Su as well. It doesn't show that much however because all these 
operations actually read() thousands of files (depends, desc) which is 
much worse than stat(). :-)





What do you think? Is it possible to remove those checks?
Dimitris


The best solution would be to rewrite our whole database crap as Dan 
said. I am pretty sure that this patch would not cause any harm irl, but


Because I really like the ease of use of the current format, I'll try 
improving things with minimum changes to it. If we can avoid a complete 
backend rewrite with minor changes, that is a good thing, isn't it?


our code would become a little bit more dangerous: As I see, 
db_read(INFRQ_BASE) would become a ~NOP function and db_populate would 
become a simple "ls" function (the only remaining sanity check is 
splitname).


Exactly! Just a simple ls should be necessary, that was my initial 
motivation. And I have thought of a way to even avoid that readdir(), but 
I should get some measurements first.



Dimitris



Re: [pacman-dev] pacman cold caches performance, too much stat()ing

2009-12-12 Thread Nagy Gabor
> On Sat, 12 Dec 2009, Dimitrios Apostolou wrote:
> > Hello list,
> >
> > I have been investigating the slow performance of pacman regarding
> > the cold caches scenario and I'm trying to write some proof of
> > concept code that improves things a lot for some cases. However I
> > need your help regarding some facts I might have misunderstood, and
> > any pointers to the source code you can give me would also help a
> > lot. I wouldn't like to lose time changing stuff that would break
> > current functionality. So here are some first questions that come
> > to mind, just by using strace:
> >
> > When doing "pacman -Q blah" I can see that besides the getdents()
> > syscalls in /var/lib/pacman/local (probably caused by readdir()),
> > there are also stat() and access() calls for every single
> > subdirectory. Why are the last ones necessary? Isn't readdir enough?
> >
> > The same goes when doing "pacman -S blah". But in that case it
> > stat()'s both 'local' and 'sync' directories, so worst case is
> > really bad, it will stat() all contents of local, core, extra and
> > community...
> 
> Regarding the stat() and access() operations I finally found out why
> they happen exactly:
> 
> In case of corrupted db the sync, for example, directory might
> contain files, not subdirectories. So in that case
> _alpm_db_populate() just makes sure it's a directory. However
> stat()ing thousands of files is too much of a price to pay.
> Similarly, access() checks it is accessible by the user.
> 
> In the attached patch I have just removed the relevant lines, with
> the following rationale: In the rare case of corrupted db, even if we
> do open("sync/not_a_dir/depends") it will still fail and we'll catch
> the failure there, no need to investigate the cause further, just
> write a message like "couldn't access sync/not_a_dir/depends".
> 
> By dropping caches ("echo 3 > /proc/sys/vm/drop_caches") before
> running, I measure a nice performance boost on my old laptop: "pacman
> -Q gdb" time is reduced from about 7s to 2.5s.

Hm. This is a nice time boost... Did you test this with other
operations, too?
 
> What do you think? Is it possible to remove those checks?
> Dimitris
 
The best solution would be to rewrite our whole database crap as Dan
said. I am pretty sure that this patch would not cause any harm irl, but
our code would become a little bit more dangerous: As I see,
db_read(INFRQ_BASE) would become a ~NOP function and db_populate would
become a simple "ls" function (the only remaining sanity check is
splitname).

Bye



Re: [pacman-dev] pacman cold caches performance, too much stat()ing

2009-12-12 Thread Dimitrios Apostolou

On Sat, 12 Dec 2009, Dimitrios Apostolou wrote:

Hello list,

I have been investigating the slow performance of pacman regarding the cold 
caches scenario and I'm trying to write some proof of concept code that 
improves things a lot for some cases. However I need your help regarding some 
facts I might have misunderstood, and any pointers to the source code you 
can give me would also help a lot. I wouldn't like to lose time changing 
stuff that would break current functionality. So here are some first 
questions that come to mind, just by using strace:


When doing "pacman -Q blah" I can see that besides the getdents() syscalls in 
/var/lib/pacman/local (probably caused by readdir()), there are also stat() 
and access() calls for every single subdirectory. Why are the last ones 
necessary? Isn't readdir enough?


The same goes when doing "pacman -S blah". But in that case it stat()'s both 
'local' and 'sync' directories, so worst case is really bad, it will stat() 
all contents of local, core, extra and community...


Regarding the stat() and access() operations I finally found out why they 
happen exactly:


If the db is corrupted, a directory such as sync might contain plain 
files instead of subdirectories, so _alpm_db_populate() stat()s each entry 
to make sure it is a directory. However, stat()ing thousands of entries is 
too high a price to pay. Similarly, access(pkgpath, F_OK) in 
_alpm_db_read() checks that the package directory exists.


In the attached patch I have just removed the relevant lines, with the 
following rationale: In the rare case of corrupted db, even if we do 
open("sync/not_a_dir/depends") it will still fail and we'll catch the 
failure there, no need to investigate the cause further, just write a 
message like "couldn't access sync/not_a_dir/depends".
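
The rationale can be checked in isolation with a small sketch: open the file directly, without any prior stat() or access(), and report whatever errno the open itself returns. The helper try_open_depends is hypothetical and only mirrors the path layout discussed here; it is not the patched libalpm code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: try to open <dbpath>/<entry>/depends with no prior
 * stat() or access(). Returns 0 on success, otherwise the errno reported by
 * fopen(): ENOTDIR when <entry> is not a directory, ENOENT when the path
 * does not exist. */
static int try_open_depends(const char *dbpath, const char *entry)
{
	char path[PATH_MAX];
	FILE *fp;

	assert(dbpath && entry);
	snprintf(path, sizeof(path), "%s/%s/depends", dbpath, entry);
	fp = fopen(path, "r");
	if(fp == NULL) {
		int err = errno; /* save before fprintf can clobber it */
		fprintf(stderr, "couldn't access %s: %s\n", path, strerror(err));
		return err;
	}
	fclose(fp);
	return 0;
}
```

Calling try_open_depends("/dev", "null") fails with ENOTDIR because /dev/null is not a directory, so the corrupted-db case is still caught at open time, just one call later.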


By dropping caches ("echo 3 > /proc/sys/vm/drop_caches") before running, I 
measure a nice performance boost on my old laptop: "pacman -Q gdb" time is 
reduced from about 7s to 2.5s.



What do you think? Is it possible to remove those checks?
Dimitris


P.S. Now all that remains is the depends/conflicts/requiredby stuff which 
is by far the hardest... I'm still trying to decipher the patch 
implementing REQUIREDBY that was posted earlier.

diff --git a/lib/libalpm/be_files.c b/lib/libalpm/be_files.c
index 90e97a5..7d80ea7 100644
--- a/lib/libalpm/be_files.c
+++ b/lib/libalpm/be_files.c
@@ -222,8 +222,6 @@ int _alpm_db_populate(pmdb_t *db)
 {
int count = 0;
struct dirent *ent = NULL;
-   struct stat sbuf;
-   char path[PATH_MAX];
const char *dbpath;
DIR *dbdir;
 
@@ -243,12 +241,6 @@ int _alpm_db_populate(pmdb_t *db)
if(strcmp(name, ".") == 0 || strcmp(name, "..") == 0) {
continue;
}
-   /* stat the entry, make sure it's a directory */
-   snprintf(path, PATH_MAX, "%s%s", dbpath, name);
-   if(stat(path, &sbuf) != 0 || !S_ISDIR(sbuf.st_mode)) {
-   continue;
-   }
-
pkg = _alpm_pkg_new();
if(pkg == NULL) {
closedir(dbdir);
@@ -337,13 +329,6 @@ int _alpm_db_read(pmdb_t *db, pmpkg_t *info, pmdbinfrq_t inforeq)
 
pkgpath = get_pkgpath(db, info);
 
-   if(access(pkgpath, F_OK)) {
-   /* directory doesn't exist or can't be opened */
-   _alpm_log(PM_LOG_DEBUG, "cannot find '%s-%s' in db '%s'\n",
-   info->name, info->version, db->treename);
-   goto error;
-   }
-
/* DESC */
if(inforeq & INFRQ_DESC) {
snprintf(path, PATH_MAX, "%sdesc", pkgpath);



Re: [pacman-dev] pacman cold caches performance, too much stat()ing

2009-12-12 Thread Dimitrios Apostolou

On Sat, 12 Dec 2009, Nagy Gabor wrote:

What do you think about Xav's --print patch:
http://code.toofishes.net/cgit/xavier/pacman.git/log/?h=print


Looks ideal! Will it be included in the master branch? If only I had known 
about that some hours ago...



Dimitris




Re: [pacman-dev] pacman cold caches performance, too much stat()ing

2009-12-12 Thread Nagy Gabor
> On Sat, 12 Dec 2009, Dimitrios Apostolou wrote:
> > P.S. Is there some option --pretend I might have missed? What I
> > need is to get exactly the same actions of "pacman -S blah" or
> > "pacman -Su" until the Y/N prompt, as non-root user.
> 
> Regarding this issue, it's the first thing I tried to fix since I didn't
> want to run my random changes as root. Fortunately I accidentally
> found out about the -p option (--print-uris) which does exactly what I
> need, so I quickly hacked a -P (--pretend) version for sync operations,
> which runs as user and only outputs package list and sizes.
> 
> I /think/ that this functionality is required by packagekit, so that
> it can automatically notify the user when updates become available. I 
> remember on fedora something similar is "yum check-update".
> 
> Of course it would be much nicer if it worked for all operations, not
> only sync. For example in a recursive remove it could notify the user
> of all packages that would be removed, the size to be freed etc
> without needing root or locking the db. But I'll skip that part,
> since my focus is elsewhere.
> 
> 
> What do you think?
> Dimitris

What do you think about Xav's --print patch:
http://code.toofishes.net/cgit/xavier/pacman.git/log/?h=print

Bye



Re: [pacman-dev] pacman cold caches performance, too much stat()ing

2009-12-12 Thread Dimitrios Apostolou

On Sat, 12 Dec 2009, Dimitrios Apostolou wrote:
P.S. Is there some option --pretend I might have missed? What I need is to 
get exactly the same actions of "pacman -S blah" or "pacman -Su" until the 
Y/N prompt, as non-root user.


Regarding this issue, it's the first thing I tried to fix since I didn't 
want to run my random changes as root. Fortunately I accidentally found out 
about the -p option (--print-uris) which does exactly what I need, so I 
quickly hacked a -P (--pretend) version for sync operations, which runs as 
user and only outputs package list and sizes.


I /think/ that this functionality is required by packagekit, so that it 
can automatically notify the user when updates become available. I 
remember on fedora something similar is "yum check-update".


Of course it would be much nicer if it worked for all operations, not only 
sync. For example in a recursive remove it could notify the user of all 
packages that would be removed, the size to be freed etc without needing 
root or locking the db. But I'll skip that part, since my focus is 
elsewhere.



What do you think?
Dimitris
diff --git a/src/pacman/conf.h b/src/pacman/conf.h
index c97e5d7..26a4e37 100644
--- a/src/pacman/conf.h
+++ b/src/pacman/conf.h
@@ -60,6 +60,7 @@ typedef struct __config_t {
unsigned short op_s_search;
unsigned short op_s_upgrade;
unsigned short op_s_printuris;
+   unsigned short op_s_pretend;
 
unsigned short group;
pmtransflag_t flags;
diff --git a/src/pacman/pacman.c b/src/pacman/pacman.c
index ff6ef5c..02c93d0 100644
--- a/src/pacman/pacman.c
+++ b/src/pacman/pacman.c
@@ -130,6 +130,7 @@ static void usage(int op, const char * const myname)
printf(_("  -i, --info   view package information\n"));
printf(_("  -l, --list view a list of packages in a repo\n"));
printf(_("  -p, --print-uris print out URIs for given packages and their dependencies\n"));
+   printf(_("  -P, --pretend    just print what it /would/ do\n"));
printf(_("  -s, --search  search remote repositories for matching strings\n"));
printf(_("  -u, --sysupgrade upgrade installed packages (-uu allows downgrade)\n"));
printf(_("  -w, --downloadonly   download packages but do not install/upgrade anything\n"));
@@ -370,6 +371,7 @@ static int parseargs(int argc, char *argv[])
{"owns",   no_argument,   0, 'o'},
{"file",   no_argument,   0, 'p'},
{"print-uris", no_argument,   0, 'p'},
+   {"pretend",no_argument,   0, 'P'},
{"quiet",  no_argument,   0, 'q'},
{"root",   required_argument, 0, 'r'},
{"recursive",  no_argument,   0, 's'},
@@ -398,7 +400,7 @@ static int parseargs(int argc, char *argv[])
{0, 0, 0, 0}
};
 
-   while((opt = getopt_long(argc, argv, "RUQSTr:b:vkhscVfmnoldepqituwygz", opts, &option_index))) {
+   while((opt = getopt_long(argc, argv, "RUQSTr:b:vkhscVfmnoldepPqituwygz", opts, &option_index))) {
alpm_list_t *list = NULL, *item = NULL; /* lists for splitting strings */
 
if(opt < 0) {
@@ -512,6 +514,11 @@ static int parseargs(int argc, char *argv[])
config->flags |= PM_TRANS_FLAG_NOCONFLICTS;
config->flags |= PM_TRANS_FLAG_NOLOCK;
break;
+   case 'P':
+   config->op_s_pretend = 1;
+   config->flags |= PM_TRANS_FLAG_NOCONFLICTS;
+   config->flags |= PM_TRANS_FLAG_NOLOCK;
+   break; /* without this, -P falls through to 'q' and sets quiet */
case 'q':
config->quiet = 1;
break;
diff --git a/src/pacman/sync.c b/src/pacman/sync.c
index a2ef616..d32d30d 100644
--- a/src/pacman/sync.c
+++ b/src/pacman/sync.c
@@ -690,6 +690,9 @@ static int sync_trans(alpm_list_t *targets)
 
display_targets(alpm_trans_get_remove(), 0);
display_targets(alpm_trans_get_add(), 1);
+   if(config->op_s_pretend) {
+   goto cleanup;
+   }
printf("\n");
 
int confirm;
@@ -757,8 +760,8 @@ int pacman_sync(alpm_list_t *targets)
 {
alpm_list_t *sync_dbs = NULL;
 
-   /* Display only errors with -Sp and -Sw operations */
-   if((config->flags & PM_TRANS_FLAG_DOWNLOADONLY) || config->op_s_printuris) {
+   /* Display only errors with -SP, -Sp and -Sw operations */
+   if((config->flags & PM_TRANS_FLAG_DOWNLOADONLY) || config->op_s_printuris || config->op_s_pretend) {
config->logmask &= ~PM_LOG_WARNING;
}
 
@@ -831,7 +834,7 @@ int pacman_sync(alpm_list_t *targets)
}
 
alpm_list_t *targs = alpm_list_strdup(targets);

Re: [pacman-dev] pacman cold caches performance, too much stat()ing

2009-12-12 Thread Dan McGee
On Sat, Dec 12, 2009 at 5:46 AM, Allan McRae wrote:
> Dimitrios Apostolou wrote:
>>
>> On Fri, 11 Dec 2009, Nagy Gabor wrote:
>>>
>>> depends files are read in order to ensure that the upgraded package
>>> won't break any "old" dependencies.
>>>
>>> Example: local foo requires bar=2.0 (which is installed)
>>> Then "pacman -S bar" is not allowed (if bar in sync has different
>>> version).
>>
>> I see now, thanks! So if we somehow had an %RDEPENDS% field
>> (reverse-dependencies) for every package in local that would not be
>> necessary. I will see if this is doable during every install.
>
> pacman used to do such a thing, but from my understanding it caused more
> issues than it solved so it was removed.

This was the former %REQUIREDBY% field
(http://code.toofishes.net/cgit/dan/pacman.git/commit/?id=7219326dd4d01d7e49b8a40746f5495c1c329c9c).
It did end up causing more problems than it solved, as they never
seemed to be stored quite right. Instead, we switched to computing
them on the fly. This is the reason for the delay on -Qi in the cold
cache case, for instance.

I think most of this thread is the wrong approach to the problem.
Rather than try to meld the DB to fit pacman, we should just swap out
DBs so it doesn't have these bad worst-case conditions.

$ du -sh --apparent-size local/ sync/*
15M     local/
8.4M    sync/community
31K     sync/community-testing
812K    sync/core
11M     sync/extra
302K    sync/testing

With those numbers in mind, we're talking about ~30 to ~35 MB of raw
text data here. That is not a lot of data; most hard drives have at
least ~30 MB/sec performance so this whole mess could be read in under
a second if it was stored in a single file. So that would be one way
of thinking about the issue differently.
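
As a rough check of that estimate, the sketch below sums the du figures above and divides by an assumed sequential throughput. The 30 MB/s rate is the figure quoted in this message; the du values are rounded, and the function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Sizes from the du output above, in KiB (du -h uses powers of 1024). */
static const double db_kib[] = {
	15 * 1024.0,  /* local/ */
	8.4 * 1024.0, /* sync/community */
	31.0,         /* sync/community-testing */
	812.0,        /* sync/core */
	11 * 1024.0,  /* sync/extra */
	302.0         /* sync/testing */
};

/* Total database size in MiB. */
static double total_mib(void)
{
	double sum = 0.0;
	size_t i;

	for(i = 0; i < sizeof(db_kib) / sizeof(db_kib[0]); i++) {
		sum += db_kib[i];
	}
	return sum / 1024.0;
}

/* Time for one sequential read at the given throughput. */
static double read_seconds(double mib, double mib_per_sec)
{
	assert(mib_per_sec > 0.0);
	return mib / mib_per_sec;
}
```

With these inputs the total comes to about 35.5 MiB, so a single sequential read at 30 MB/s would take roughly 1.2 seconds, and less than a second on anything faster than about 36 MB/s. Either way it is two orders of magnitude below the cold-cache timings reported in this thread.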

Other ways of course have come up many times on this list. BDB,
Sqlite, reading the compressed DB directly, etc.

-Dan



Re: [pacman-dev] pacman cold caches performance, too much stat()ing

2009-12-12 Thread Nagy Gabor
> On Fri, 11 Dec 2009, Nagy Gabor wrote:
> > depends files are read in order to ensure that the upgraded package
> > won't break any "old" dependencies.
> >
> > Example: local foo requires bar=2.0 (which is installed)
> > Then "pacman -S bar" is not allowed (if bar in sync has different
> > version).
> 
> I just noticed that local depends are read even when installing a new 
> package (not upgrading an old one). What is this for?

By debugging I found one more reason: Conflict checking. %CONFLICTS%
field is also stored in depends file.

Bye



Re: [pacman-dev] pacman cold caches performance, too much stat()ing

2009-12-12 Thread Dimitrios Apostolou

On Sat, 12 Dec 2009, Allan McRae wrote:
> Dimitrios Apostolou wrote:
> > On Fri, 11 Dec 2009, Nagy Gabor wrote:
> > > depends files are read in order to ensure that the upgraded package
> > > won't break any "old" dependencies.
> > >
> > > Example: local foo requires bar=2.0 (which is installed)
> > > Then "pacman -S bar" is not allowed (if bar in sync has different
> > > version).
> >
> > I just noticed that local depends are read even when installing a new
> > package (not upgrading an old one). What is this for?
>
> How about this example:
>
> pacman -S foo, foo replaces=bar and provides=bar, installed package baz
> depends on foo>=2.0.


If already installed package baz depends on foo, then foo is already 
installed. Perhaps you mean it depends on bar>=2?


> This should fail as bar only provides foo and so it is unknown whether the
> foo>=2.0 dependency for baz is solved.


And here you probably mean that "foo" only provides "bar" without version. 
So doing a pacman -S foo should remove "bar" because of /replaces/, but 
should also keep it because of "baz" requiring the specific version 
already installed.


Anyway I think I get it, it always needs to read local depends. It is 
getting far more complex than I initially thought...


But even for this complex case, an RDEPENDS field for local packages would 
help significantly. Do you perhaps remember when it was discussed, by date 
or pacman version?



Dimitris




Re: [pacman-dev] pacman cold caches performance, too much stat()ing

2009-12-12 Thread Allan McRae

Dimitrios Apostolou wrote:

On Fri, 11 Dec 2009, Nagy Gabor wrote:

depends files are read in order to ensure that the upgraded package
won't break any "old" dependencies.

Example: local foo requires bar=2.0 (which is installed)
Then "pacman -S bar" is not allowed (if bar in sync has different
version).


I just noticed that local depends are read even when installing a new 
package (not upgrading an old one). What is this for?




How about this example:

pacman -S foo, foo replaces=bar and provides=bar, installed package baz 
depends on foo>=2.0.


This should fail as bar only provides foo and so it is unknown whether 
the foo>=2.0 dependency for baz is solved.


Allan



Re: [pacman-dev] pacman cold caches performance, too much stat()ing

2009-12-12 Thread Dimitrios Apostolou

On Fri, 11 Dec 2009, Nagy Gabor wrote:

depends files are read in order to ensure that the upgraded package
won't break any "old" dependencies.

Example: local foo requires bar=2.0 (which is installed)
Then "pacman -S bar" is not allowed (if bar in sync has different
version).


I just noticed that local depends are read even when installing a new 
package (not upgrading an old one). What is this for?



Dimitris




Re: [pacman-dev] pacman cold caches performance, too much stat()ing

2009-12-12 Thread Allan McRae

Dimitrios Apostolou wrote:

On Fri, 11 Dec 2009, Nagy Gabor wrote:

depends files are read in order to ensure that the upgraded package
won't break any "old" dependencies.

Example: local foo requires bar=2.0 (which is installed)
Then "pacman -S bar" is not allowed (if bar in sync has different
version).


I see now, thanks! So if we somehow had an %RDEPENDS% field
(reverse-dependencies) for every package in local that would not be 
necessary. I will see if this is doable during every install.


pacman used to do such a thing, but from my understanding it caused more 
issues than it solved so it was removed.


Allan




Re: [pacman-dev] pacman cold caches performance, too much stat()ing

2009-12-12 Thread Dimitrios Apostolou

On Fri, 11 Dec 2009, Nagy Gabor wrote:

depends files are read in order to ensure that the upgraded package
won't break any "old" dependencies.

Example: local foo requires bar=2.0 (which is installed)
Then "pacman -S bar" is not allowed (if bar in sync has different
version).


I see now, thanks! So if we somehow had an %RDEPENDS% field
(reverse-dependencies) for every package in local that would not be 
necessary. I will see if this is doable during every install.



P.S. Is there some option --pretend I might have missed? What I need
is to get exactly the same actions of "pacman -S blah" or "pacman
-Su" until the Y/N prompt, as non-root user.


-Su is even worse: We have to read all desc files in sync (to get
%REPLACES%)


Ah you are right, all desc files are being read. It really takes a lot of 
time on my old laptop. If this is only for the %REPLACES% field then a 
first action would be to put that info in a separate file or in the 
"depends" file. I'll try that too, thanks for your help!



Dimitris



Re: [pacman-dev] pacman cold caches performance, too much stat()ing

2009-12-11 Thread Nagy Gabor
> Hello list,
> 
> I have been investigating the slow performance of pacman regarding
> the cold caches scenario and I'm trying to write some proof of
> concept code that improves things a lot for some cases. However I
> need your help regarding some facts I might have misunderstood, and
> any pointers to the source code you can give me would also help a
> lot. I wouldn't like to lose time changing stuff that would break
> current functionality. So here are some first questions that come to
> mind, just by using strace:
> 
> When doing "pacman -Q blah" I can see that besides the getdents() 
> syscalls in /var/lib/pacman/local (probably caused by readdir()),
> there are also stat() and access() calls for every single
> subdirectory. Why are the last ones necessary? Isn't readdir enough?
> 
> The same goes when doing "pacman -S blah". But in that case it
> stat()'s both 'local' and 'sync' directories, so worst case is really
> bad, it will stat() all contents of local, core, extra and
> community...
> 
> In the case of "pacman -S" I measured that a great deal of time goes
> also to reading the "depends" files of all packages in local, please
> enlighten me what this is for. I have thought of a new way to store
> dependencies that should improve things, but I should first be sure
> it doesn't break anything and get some measurements myself.
 
depends files are read in order to ensure that the upgraded package
won't break any "old" dependencies.

Example: local foo requires bar=2.0 (which is installed)
Then "pacman -S bar" is not allowed (if bar in sync has different
version).
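
A minimal sketch of that check, under a deliberately simplified model: each local dependency is either unversioned or an exact "name=version" requirement. The struct and function names are hypothetical; real dependencies also allow operators such as >= and are resolved through provides.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified dependency: an exact requirement like bar=2.0. */
struct dep {
	const char *name;
	const char *version; /* NULL means any version satisfies it */
};

/* Returns 1 if replacing installed package `name` with `new_version`
 * would leave `d` unsatisfied, 0 otherwise. */
static int upgrade_breaks_dep(const struct dep *d, const char *name,
                              const char *new_version)
{
	assert(d && name && new_version);
	if(strcmp(d->name, name) != 0) {
		return 0; /* the dependency is on some other package */
	}
	if(d->version == NULL) {
		return 0; /* unversioned: any version will do */
	}
	return strcmp(d->version, new_version) != 0;
}

/* Count the dependencies, gathered from all installed packages, that the
 * upgrade would break. Every local depends file feeds this list. */
static size_t count_broken(const struct dep *deps, size_t n,
                           const char *name, const char *new_version)
{
	size_t i, broken = 0;

	for(i = 0; i < n; i++) {
		broken += (size_t)upgrade_breaks_dep(&deps[i], name, new_version);
	}
	return broken;
}
```

With foo's requirement bar=2.0 in the list, an upgrade of bar to any other version reports one broken dependency, which is the situation in which "pacman -S bar" is refused.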

> 
> Thanks in advance,
> Dimitris
> 
> 
> P.S. Is there some option --pretend I might have missed? What I need
> is to get exactly the same actions of "pacman -S blah" or "pacman
> -Su" until the Y/N prompt, as non-root user.

-Su is even worse: We have to read all desc files in sync (to get
%REPLACES%)

Bye