Re: File search
On Friday, January 21st, 2022 at 9:03 AM, Ludovic Courtès wrote:
> The database for 18K packages is quite big:
>
> --8<---cut here---start->8---
> $ du -h /tmp/db*
> 389M /tmp/db
> 82M /tmp/db.gz
> 61M /tmp/db.zst
> --8<---cut here---end--->8---
>
> [snip]
>
> In terms of privacy, I think it’s better if we can avoid making
> one request per file searched for. Off-line operation would be
> sweet, and it comes with responsiveness; fast off-line search is
> necessary for things like ‘command-not-found’ (where the shell
> tells you what package to install when a command is not found).

Offline operation is crucial, and I don't think it's desirable to download tens or hundreds of megabytes.

What about creating and distributing a Bloom filter per package, with file names as its members? This would allow us to dramatically reduce the size of the data we distribute, at the cost of not giving 100% reliable answers: a Bloom filter can wrongly claim that a package contains a file (a false positive), but it never misses a file the package does contain. We've established, though, that some information is better than none, and the uncertainty can be resolved by querying a web service or by building the package locally and searching its directory.
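To make the per-package Bloom filter idea concrete, here is a minimal sketch in plain Guile. Everything in it is an assumption for illustration, not something from the thread: the procedure names, the filter size, the number of hash functions, and the "salt the name with an index" trick for deriving several hashes from Guile's `string-hash`. A filter meant to be distributed would also need a hash function that is stable across machines and Guile versions, which this sketch does not try to guarantee.

```scheme
;; Sketch of a per-package Bloom filter whose members are file names.
;; Assumptions: the hash count, the filter size and the "i:" salting scheme
;; are illustrative; a vector of booleans stands in for a packed bit array.

(define bloom-hash-count 3)   ; k, the number of hash functions (assumed)

(define (make-bloom-filter size)
  "Return an empty Bloom filter with SIZE slots, all unset."
  (make-vector size #f))

(define (bloom-indices filter name)
  "Return the slot indices for NAME, one per salted variant of NAME."
  (let ((size (vector-length filter)))
    (map (lambda (i)
           ;; With a bound, string-hash returns an integer in [0, SIZE).
           (string-hash (string-append (number->string i) ":" name) size))
         (iota bloom-hash-count))))

(define (bloom-add! filter name)
  "Record NAME in FILTER by setting each of its slots."
  (for-each (lambda (index) (vector-set! filter index #t))
            (bloom-indices filter name)))

(define (bloom-may-contain? filter name)
  "Return #f if NAME is definitely absent, #t if it may be present."
  (let loop ((indices (bloom-indices filter name)))
    (cond ((null? indices) #t)
          ((vector-ref filter (car indices)) (loop (cdr indices)))
          (else #f))))

;; Hypothetical usage: one filter per package, built from its file list.
(define example-filter (make-bloom-filter 1024))
(for-each (lambda (file) (bloom-add! example-filter file))
          '("bin/ls" "bin/cat" "bin/sort"))

(bloom-may-contain? example-filter "bin/ls")   ; always #t: no false negatives
(bloom-may-contain? example-filter "bin/grep") ; usually #f, occasionally #t
```

A "maybe" answer is exactly the case the message proposes to settle by asking a web service or building the package; a "no" answer needs no further work. A real implementation would pack the slots into bits (Guile has bitvectors) to get the size savings the idea is after.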
Re: WSDD Service Module
Hello, thanks for your reply.

Ludovic Courtès writes:

> My understanding is that you intend the ‘interface’ field to be either
> #f or a string, is that right?

I think it should rather be a list of strings, since wsdd takes the list of interfaces to listen on. So it should expand to --interface eth0 --interface eth1, etc.

> When you write:
>
> (interface)
>
> that means: “call the procedure bound to ‘interface’, passing it zero
> arguments”. However, if ‘interface’ is a string, you cannot call it, so
> you get a wrong-type-to-apply error.
>
> Likewise, ‘for-each’ expects its second argument to be a list. But
> here, ‘interface’ is supposedly a string, not a list, so if you do:
>
> (for-each (lambda …) interface)
>
> you’ll get a wrong-type-argument error.

So I changed it so that interface is an empty list by default now, and I'd like to have it expanded with for-each. The good news is that I've gotten at least a step further, but only after hard-coding the list as an argument in the for-each expression. So it should work? It still doesn't, and I still don't understand why it is somehow not passed as a list properly.

One thing I noticed: after hard-coding the argument, the procedure is not properly expanded in the constructor. How come? This is the output in the service file:

--8<---cut here---start->8---
(make-forkexec-constructor
 (list "/gnu/store/6jpn21wnnyz59ii634hfbk34yy48nxrq-wsdd-0.6.4/bin/wsdd"
       "--hoplimit" "1" for-each
       (lambda (arg) (format #t "--interface ~s " (arg)))
       (interface)
       "--workgroup" "WORKGROUP")
 #:user "wsdd" #:group "wsdd" #:log-file "/var/log/wsdd.log")
--8<---cut here---end--->8---

My for-each procedure and list work in my REPL, but fail in Guix. It might have to do with me trying to get on with make-forkexec-constructor. So this constructor needs a list of strings? I put the interfaces into a let* and used it in the constructor. Unfortunately this results in an invalid G-expression.

Thanks for your help.
It is taking me some time to get comfortable with Guile. I've attached the current state of the service too.

[Attachment: wsdd.scm]

Kind regards,
Simon
Re: WSDD Service Module
Ludovic Courtès writes:

> (+Cc: guix-devel.)

Hope I get it right this time.

> Then I recommend calling it ‘interfaces’ (plural).

Will do.

> #~(make-forkexec-constructor
>    … #$@(map (lambda …) interfaces)
>    …)
>
> which would expand to:
>
> (make-forkexec-constructor
>   … "--interface=eth0" "--interface=eth1"
>   …)

Yes, this makes more sense, and it works too! I got too focused on trying to use for-each. Time to complete this service.

Thanks a lot!
Simon
Re: WSDD Service Module
Hi,

(+Cc: guix-devel.)

Simon Streit skribis:

> Ludovic Courtès writes:
>> My understanding is that you intend the ‘interface’ field to be either
>> #f or a string, is that right?
>
> I think it should rather be a list of strings,

Then I recommend calling it ‘interfaces’ (plural).

>> When you write:
>>
>> (interface)
>>
>> that means: “call the procedure bound to ‘interface’, passing it zero
>> arguments”. However, if ‘interface’ is a string, you cannot call it, so
>> you get a wrong-type-to-apply error.
>>
>> Likewise, ‘for-each’ expects its second argument to be a list. But
>> here, ‘interface’ is supposedly a string, not a list, so if you do:
>>
>> (for-each (lambda …) interface)
>>
>> you’ll get a wrong-type-argument error.
>
> So I changed it so that interface is an empty list by default now, and
> I'd like to have it expanded with for-each. The good news is that I've
> gotten at least a step further, but only after hard-coding the list as
> an argument in the for-each expression. So it should work? It still
> doesn't, and I still don't understand why it is somehow not passed as a
> list properly.
>
> One thing I noticed: after hard-coding the argument, the procedure is
> not properly expanded in the constructor. How come? This is the output
> in the service file:
>
> (make-forkexec-constructor
>  (list "/gnu/store/6jpn21wnnyz59ii634hfbk34yy48nxrq-wsdd-0.6.4/bin/wsdd"
>        "--hoplimit" "1" for-each
>        (lambda (arg) (format #t "--interface ~s " (arg)))
>        (interface)
>        "--workgroup" "WORKGROUP")
>  #:user "wsdd" #:group "wsdd" #:log-file "/var/log/wsdd.log")

This reads “(interface)”, meaning that (1) ‘interface’ must be a procedure, since you’re calling it, and (2) ‘interface’ must be bound (the variable must be defined there). However, ‘interface’ is unbound here.
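The two mistakes discussed above (calling a value that is not a procedure, and using ‘for-each’ where a list of results is needed) can be reproduced in plain Guile, outside any gexp. This is a small sketch; the interface names are example values and the variable names are made up for illustration.

```scheme
;; Example value for the (renamed) ‘interfaces’ field: a list of strings.
(define interfaces '("eth0" "eth1"))

;; ‘map’ returns a new list holding one result per element, which is what
;; a command line needs.  ‘for-each’ only runs the procedure for its side
;; effects and its return value is unspecified, so it cannot build a list.
(define interface-arguments
  (map (lambda (interface)
         (string-append "--interface=" interface))
       interfaces))

;; Writing (interfaces) tries to *call* the list as a procedure, which
;; raises an error at run time; catch it here to show that it fails.
(define calling-the-list-fails?
  (catch #t
    (lambda () (interfaces) #f)
    (lambda (key . args) #t)))
```

Here `interface-arguments` evaluates to `("--interface=eth0" "--interface=eth1")`, a list of strings that can go straight into a command line.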
You probably meant to write, within your gexp:

  #~(make-forkexec-constructor
     … #$@(map (lambda …) interfaces)
     …)

which would expand to:

  (make-forkexec-constructor
    … "--interface=eth0" "--interface=eth1"
    …)

The #$@ bit (‘ungexp-splicing’) means that the (map …) bit executes beforehand and that its results are spliced, element by element, into the generated Shepherd file.

HTH,
Ludo’.
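`#$@` is the gexp counterpart of `,@` (unquote-splicing) in ordinary quasiquotation, so the difference between inserting a list and splicing it can be sketched without Guix. In this plain-Guile sketch the string "wsdd" is a placeholder for the store file name of the wsdd executable, and the procedure names are hypothetical. The command passed to make-forkexec-constructor must be a flat list of strings, which is what the spliced version produces.

```scheme
;; Plain-Scheme analogue of #$@ inside a gexp: quasiquote with ,@.
;; "wsdd" is a placeholder for the wsdd executable's store file name.

(define (interfaces->arguments interfaces)
  "Return one --interface=NAME argument per element of INTERFACES."
  (map (lambda (interface) (string-append "--interface=" interface))
       interfaces))

(define (wsdd-command interfaces)
  "Return the wsdd command line as a flat list of strings."
  `("wsdd" "--hoplimit" "1"
    ,@(interfaces->arguments interfaces)   ; spliced: each string in place
    "--workgroup" "WORKGROUP"))

(define (wsdd-command/nested interfaces)
  "Same, but with , instead of ,@: the list becomes one nested element."
  `("wsdd" "--hoplimit" "1"
    ,(interfaces->arguments interfaces)
    "--workgroup" "WORKGROUP"))

(wsdd-command '("eth0" "eth1"))
;; => ("wsdd" "--hoplimit" "1" "--interface=eth0" "--interface=eth1"
;;     "--workgroup" "WORKGROUP")
(wsdd-command '())
;; => ("wsdd" "--hoplimit" "1" "--workgroup" "WORKGROUP")
```

Splicing also handles the empty default gracefully: an empty ‘interfaces’ list adds no arguments at all, whereas the nested variant would leave an empty list inside the command.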
Re: File search
On 2022-01-25 12:20, Oliver Propst wrote:
> On 2022-01-25 12:15, Ludovic Courtès wrote:
>
> I'm also not an expert at SQLite but I can state that the effort looks
> very nice and promising Ludovic :)

And definitely a step up from the current implementation (obviously).

--
Kind regards
Oliver Propst
https://twitter.com/Opropst
Re: File search
On 2022-01-25 12:15, Ludovic Courtès wrote:

I'm also not an expert at SQLite but I can state that the effort looks very nice and promising Ludovic :)

--
Kind regards
Oliver Propst
https://twitter.com/Opropst
Re: File search
Maxim Cournoyer skribis:

> I also had the idea of making it a package... this way only the people
> who opt to install the database locally would incur the cost (in
> bandwidth).
>
> Perhaps a question for Vagrant: talking about size, is this SQLite
> database file comparable or smaller in size to the apt-file database
> that needs to be downloaded? With the Debian software catalog being
> about 30% bigger, I'd expect a similarly bigger file size.
>
> If Debian is doing better in terms of database file size, we could look
> at how they're doing it.

As a back-of-the-envelope estimate, here’s the amount of text that needs to be available in the database:

--8<---cut here---start->8---
ludo@berlin ~/src$ sqlite3 -csv /tmp/db 'select name,version from packages; select name from directories;select name from files;'|wc -c
197689978
ludo@berlin ~/src$ guile -c '(pk (/ 197689978 (expt 2. 20)))'
;;; (188.5318546295166)
ludo@berlin ~/src$ du -h /tmp/db
389M /tmp/db
--8<---cut here---end--->8---

So roughly, SQLite with this particular schema ends up taking twice as much space as the lower bound.

We can do a bit better (I’m not an expert, so I’m just trying things naively) by dropping the index and cleaning up the database:

--8<---cut here---start->8---
ludo@berlin ~/src$ cp /tmp/db{,.without-index}
ludo@berlin ~/src$ sqlite3 /tmp/db.without-index
SQLite version 3.32.3 2020-06-18 14:00:33
Enter ".help" for usage hints.
sqlite> drop index IndexFiles;
sqlite> .quit
ludo@berlin ~/src$ du -h /tmp/db.without-index
389M /tmp/db.without-index
ludo@berlin ~/src$ sqlite3 /tmp/db.without-index
SQLite version 3.32.3 2020-06-18 14:00:33
Enter ".help" for usage hints.
sqlite> vacuum;
sqlite> .quit
ludo@berlin ~/src$ du -h /tmp/db.without-index
290M /tmp/db.without-index
--8<---cut here---end--->8---

With compression:

--8<---cut here---start->8---
ludo@berlin ~/src$ zstd -19 < /tmp/db.without-index > /tmp/db.without-index.zst
ludo@berlin ~/src$ du -h /tmp/db.without-index.zst
37M /tmp/db.without-index.zst
--8<---cut here---end--->8---

(Down from 61M for the compressed database with the index.)

For comparison, this is smaller than guile, perl, gtk+, and roughly the same as glibc:out.

For the record, with compression, the lower bound is about 12 MiB:

--8<---cut here---start->8---
ludo@berlin ~/src$ sqlite3 -csv /tmp/db 'select name,version from packages; select name from directories;select name from files;'|zstd -19|wc -c
12128674
ludo@berlin ~/src$ guile -c '(pk (/ 12128674 (expt 2. 20)))'
;;; (11.566804885864258)
--8<---cut here---end--->8---

All this to say that we could distribute the database in a form that gets closer to the optimal size, at the expense of extra processing on the client side upon reception to put it into shape (creating an index, etc.).

Ludo’.
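The "roughly twice as much space" estimate, and how far each variant sits from its lower bound, can be re-derived with the same kind of Guile arithmetic used in the session above. This sketch only reuses figures printed there; the variable names are made up, and since `du -h` rounds to whole MiB the ratios are approximate.

```scheme
;; Re-deriving the size ratios from the session output.
;; du -h sizes are in MiB; wc -c counts are in bytes.
(define mib (expt 2. 20))

(define text-lower-bound       (/ 197689978 mib)) ; about 188.5 MiB of raw text
(define compressed-lower-bound (/ 12128674 mib))  ; about 11.6 MiB with zstd -19

;; Size of each database form divided by the matching lower bound.
(define ratio-with-index          (/ 389 text-lower-bound))
(define ratio-vacuumed            (/ 290 text-lower-bound))
(define ratio-vacuumed-compressed (/ 37 compressed-lower-bound))
```

This gives roughly 2.06 for the original database, 1.54 after dropping the index and vacuuming, and about 3.2 for the compressed file relative to the compressed lower bound, which is the remaining headroom a leaner distribution format could aim for.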