Carl Lowenstein wrote:
On 8/18/07, Carl Lowenstein <[EMAIL PROTECTED]> wrote:
On 8/18/07, Carl Lowenstein <[EMAIL PROTECTED]> wrote:
On 8/18/07, James G. Sack (jim) <[EMAIL PROTECTED]> wrote:
Carl Lowenstein wrote:
On 8/18/07, James G. Sack (jim) <[EMAIL PROTECTED]> wrote:
Carl Lowenstein wrote:
On 8/17/07, Gus Wirth <[EMAIL PROTECTED]> wrote:
Fedora 5 has mlocate, which is different from both but tries to be
compatible. It is written by Miloslav Trmac of Redhat. It doesn't seem
to have any file size issues.

I have confirmed (in ubu 7.04 32bit) the behavior you describe. Running
an strace is informative: It shows calls to lstat64 and access.
Are you tracing locate or updatedb?  In other words, is the problem
in using the database or in building it?
locate.
  sudo strace -o /tmp/s.out locate '*DVD*iso'

I know the leading * is redundant, but maybe only in some locates.  It
does seem to behave differently in {s,m}locate on ubu704/f7.

On a 64-bit f7, there are calls to access, but no lstat at all.
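
A narrower trace makes the 32-bit/64-bit comparison easier to eyeball.
A sketch, assuming GNU strace's "-e trace=file" class and the same
pattern on both boxes:

$ sudo strace -e trace=file -o /tmp/s32.out locate '*DVD*iso'   # 32-bit ubu704
$ sudo strace -e trace=file -o /tmp/s64.out locate '*DVD*iso'   # 64-bit f7
$ grep -c lstat /tmp/s32.out /tmp/s64.out   # count of lstat-family calls in each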

I bet a look at the code in the vicinity of the lstat64 call might show
a variable type mismatch.
So it is fixed there.  Now I will look for slocate sources.
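
Before digging through the source, a quick way to see whether the
installed 32-bit binary was even built with large-file support is to
look at which stat symbols it imports.  A sketch; the paths are the
usual Fedora ones, and on slocate systems updatedb is often just a
wrapper around the slocate binary:

$ nm -D /usr/bin/slocate | grep -i stat
    # __lxstat64/__xstat64 imports mean 64-bit file sizes; plain
    # __lxstat/__xstat means lstat() tops out at 2GB
$ file /usr/bin/updatedb    # script wrapper or separate binary?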

Slocate was replaced by mlocate in FC4.  I had to download 4 CDs' worth
of FC3 sources because I couldn't find them here online, and couldn't
find burned CDs either.  Fortunately RoadRunner is working well today,
and I got 4 parallel streams at 200kB/sec from the RedHat FTP site.

Now I have source for the slocate that matches the one I am using.  Or
"amusing", as those two words came out on first typing.  I built it
from source and tried running it on a copy of the resident slocate.db.
The sample >2GB files were not found by "slocate", same as usual.
Tried strace, and learned that the lstat64 calls are all for reading
system things like ld.so.cache and libc.so.6 and nsswitch.conf etc.
etc., plus one for reading slocate.db.
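
A way to cut the library-loading noise out of that trace (sketch only,
and assuming the locally built binary takes slocate's usual -d flag to
point it at the copied database):

$ strace -e trace=file ./slocate -d ./slocate.db '*DVD*iso' 2>&1 | \
      grep -v -e '\.so' -e ld.so.cache -e nsswitch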

All the hard work is done reading slocate.db in 4kb chunks.  It is a
funny compressed ASCII thing, with common leading strings such as
directory names suppressed.  If, for instance, I do

$ sudo strings slocate.db | grep -i centos
I find in the output the string "CentOS-5.0-i386-bin-DVD" which is the
name of the directory where I expect to find one of the >2GB files.
But I don't find the name of the file there.  So maybe it never got
into the database.  Now I have to look around some more.
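
If anyone wants to see the front-compression directly, od gives a
better view than strings, since strings throws away the non-printable
bytes that encode how much of the previous path each entry reuses.
Sketch:

$ sudo od -c /var/lib/slocate/slocate.db | head -40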

strings is a powerful tool.  Grepping for the names of the two corner
cases, the file just under 2GB and the file just over 2GB, I find that
the first one is present in slocate.db and the second one is not.  So
I have been looking in the wrong place for the problem.
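
For the record, that check is just the same strings-plus-grep trick
with the two file names (the names below are placeholders, not the
real ones):

$ sudo strings /var/lib/slocate/slocate.db | grep -i 'just-under-2GB-name'
$ sudo strings /var/lib/slocate/slocate.db | grep -i 'just-over-2GB-name'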

Names of files of size >2GB do not make it into the slocate database.

Stay tuned for more information, as I go look at updatedb.

The smoking gun just lost its smoke; I wish I knew why.  When I run
updatedb on a subset of the whole directory structure, the phenomenon
(large files not appearing in the database) does not seem to happen.

One can restrict the set of files that updatedb indexes, and specify
where the output goes.

$ updatedb -U /data1/Torrents -o /tmp/Torrents.db   # the directory where I keep stuff
$ strings /tmp/Torrents.db > /tmp/Torrents_strings
# also get strings from the total system database:
$ sudo strings /var/lib/slocate/slocate.db > /tmp/slocate_strings
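
As a blunter cross-check, anything that shows up in the subset dump
but not in the system-wide one can be listed directly.  A sketch,
using bash process substitution; it is noisy because the
front-compression fragments differ between the two databases, but big
differences stand out:

$ comm -13 <(sort /tmp/slocate_strings) <(sort /tmp/Torrents_strings)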

Now look in these "strings" files for evidence of what was indexed.

$ grep -i centos slocate_strings | grep -i "dvd.*iso"
centos-4.3-alpha-bindvd.iso
$ grep -i centos Torrents_strings | grep -i "dvd.*iso"
/CentOS-4.4-i386-binDVD.iso
centos-4.3-alpha-bindvd.iso
/CentOS-5.0-i386-bin-DVD.iso

There are two more files found in the subset database than in the
whole system database.  By some strange coincidence, both of these
files are >2GB in size.  But I don't know why they get excluded.
Certainly not from any exclusion rule in /etc/updatedb.conf.
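
One test that might put the smoke back in the gun: if the system-wide
run is being done by a binary built without large-file support (no
-D_FILE_OFFSET_BITS=64), its lstat() fails with EOVERFLOW ("Value too
large for defined data type") on anything over 2GB, and a failed
lstat() is a plausible way for a name to get silently dropped.  A
controlled test with sparse files on either side of the 2GB line
(sketch only; the paths are made up, and whether the call shows up as
lstat or lstat64 depends on how the binary was built):

$ mkdir /tmp/lfs-test
$ dd if=/dev/zero of=/tmp/lfs-test/under2g bs=1M count=1 seek=2046  # ~2047 MiB
$ dd if=/dev/zero of=/tmp/lfs-test/over2g  bs=1M count=1 seek=2049  # ~2050 MiB
$ sudo strace -f -e trace=file -o /tmp/u.out \
      updatedb -U /tmp/lfs-test -o /tmp/lfs-test.db
$ grep -i EOVERFLOW /tmp/u.out          # any hit means no large-file support
$ strings /tmp/lfs-test.db | grep 2g    # which of the two names made it in?

If that comes out clean, the next suspect would be whatever the
nightly cron job actually invokes, which may not be the same binary or
options as a run by hand.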

I have no knowledge here, just a thought, perhaps crazy, but here goes.
Could it be some kind of timeout problem?  When the CPU is bogged down
with a large set of files to be filed into the db, could it be that the
large files are just taking too long to respond, and thereby end up
getting skipped?

Like I said before, just an uninformed thought.


Oh, yes, using slocate on these truncated databases finds all the
files it should.

Answer to your posting of 3:52 pm coming in next installment.

    carl


--
Ralph

--------------------
How do you test an uncooperative intelligence when it's smarter than you? 
--Stewart Stremler

