Re: File search

2022-01-25 Thread Ryan Prior
On Friday, January 21st, 2022 at 9:03 AM, Ludovic Courtès wrote:

> The database for 18K packages is quite big:
>
> --8<---cut here---start->8---
> $ du -h /tmp/db*
> 389M /tmp/db
> 82M /tmp/db.gz
> 61M /tmp/db.zst
> --8<---cut here---end--->8---
> [snip]
> In terms of privacy, I think it’s better if we can avoid making
> one request per file searched for. Off-line operation would be
> sweet, and it comes with responsiveness; fast off-line search is
> necessary for things like ‘command-not-found’ (where the shell
> tells you what package to install when a command is not found).

Offline operation is crucial, and I don't think it's desirable to download tens
or hundreds of megabytes. What about creating and distributing a Bloom filter
per package, with file names as members? This would dramatically reduce the
size of the data we distribute, at the cost of occasional false positives: a
filter may claim that a package contains a file it does not, although it never
misses a file that is there. We've established, though, that some information
is better than none, and the uncertainty can be resolved by querying a web
service or by building the package locally and searching its directory.
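
A minimal sketch of the per-package Bloom filter idea, in Python for brevity. Everything here is an assumption made for illustration: the class and function names, the 1% false-positive target, and the SHA-256-based double hashing are not from any Guix code. It only shows the trade-off described above, namely compact per-package filters that can report false positives but never miss a file that was added.

```python
import hashlib
import math


class BloomFilter:
    """Fixed-size Bloom filter over strings: false positives possible, no false negatives."""

    def __init__(self, expected_items, false_positive_rate=0.01):
        n = max(1, expected_items)
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hash functions.
        self.num_bits = max(8, math.ceil(-n * math.log(false_positive_rate) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.num_bits / n * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item):
        # Double hashing: derive all bit positions from two halves of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def build_filters(package_files):
    """Map each package name to a Bloom filter holding its file names."""
    filters = {}
    for package, files in package_files.items():
        bloom = BloomFilter(len(files))
        for name in files:
            bloom.add(name)
        filters[package] = bloom
    return filters


def candidate_packages(filters, file_name):
    """Return packages that may contain file_name (may include false positives)."""
    return sorted(pkg for pkg, bloom in filters.items() if file_name in bloom)
```

With hypothetical data such as `build_filters({"wsdd": ["bin/wsdd"]})`, a lookup for `"bin/wsdd"` always lists `wsdd` among the candidates; any extra candidates would then be confirmed or ruled out by the web service or a local build, as suggested above.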



Re: WSDD Service Module

2022-01-25 Thread Simon Streit

Hello, thanks for your reply.

Ludovic Courtès writes:
> My understanding is that you intend the ‘interface’ field to be either
> #f or a string, is that right?

I think it should rather be a list of strings, since wsdd takes a list of
interfaces to listen on.  So it should expand to --interface eth0
--interface eth1, etc.

> When you write:
>
>   (interface)
>
> that means: “call the procedure bound to ‘interface’, passing it zero
> arguments”.  However, if ‘interface’ is a string, you cannot call it, so
> you get a wrong-type-to-apply error.
>
> Likewise, ‘for-each’ expects its second argument to be a list.  But
> here, ‘interface’ is supposedly a string, not a list, so if you do:
>
>   (for-each (lambda …) interface)
>
> you’ll get a wrong-type-argument error.

So I changed it so that interface is now an empty list by default, and
I'd like to expand it with for-each.  The good news is that I got at
least a step further, but only after hard-coding the list as an argument
in the for-each expression.  So it should work?  It still doesn't, and
I still don't understand why it is not passed as a list properly.

One thing I noticed after hard-coding the argument: the procedure is
not expanded properly in the constructor.  How come?  This is the output
in the service file:
--8<---cut here---start->8---
(make-forkexec-constructor
 (list "/gnu/store/6jpn21wnnyz59ii634hfbk34yy48nxrq-wsdd-0.6.4/bin/wsdd" 
"--hoplimit" "1" for-each
   (lambda
   (arg)
 (format #t "--interface ~s "
 (arg)))
   (interface)
   "--workgroup" "WORKGROUP")
 #:user "wsdd" #:group "wsdd" #:log-file "/var/log/wsdd.log")
--8<---cut here---end--->8---

My for-each procedure and list work in my REPL, but they fail in Guix.
It might have to do with how I'm using make-forkexec-constructor.  So
this constructor needs a list of strings?  I put the interfaces into a
let* and tried to reference it in the constructor.  Unfortunately this
results in an invalid G-expression.


Thanks for your help.  It is taking me some time to get comfortable with
Guile.

I've attached the current state of the service too.



wsdd.scm
Description: Binary data


Kind regards
Simon


Re: WSDD Service Module

2022-01-25 Thread Simon Streit
Ludovic Courtès writes:

> (+Cc: guix-devel.)

Hope I get it right this time.

> Then I recommend calling it ‘interfaces’ (plural).

Will do.

>   #~(make-forkexec-constructor
>       … #$@(map (lambda …) interfaces)
>       …)
>
> which would expand to:
>
>   (make-forkexec-constructor
>     … "--interface=eth0" "--interface=eth1"
>     …)

Yes, this makes more sense, and it works too!  I got too focused
on trying to use for-each.

Time to complete this service.


Thanks a lot!
Simon



Re: WSDD Service Module

2022-01-25 Thread Ludovic Courtès
Hi,

(+Cc: guix-devel.)

Simon Streit skribis:

> Ludovic Courtès writes:
>> My understanding is that you intend the ‘interface’ field to be either
>> #f or a string, is that right?
>
> I think it should rather be a list of strings,

Then I recommend calling it ‘interfaces’ (plural).

>> When you write:
>>
>>   (interface)
>>
>> that means: “call the procedure bound to ‘interface’, passing it zero
>> arguments”.  However, if ‘interface’ is a string, you cannot call it, so
>> you get a wrong-type-to-apply error.
>>
>> Likewise, ‘for-each’ expects its second argument to be a list.  But
>> here, ‘interface’ is supposedly a string, not a list, so if you do:
>>
>>   (for-each (lambda …) interface)
>>
>> you’ll get a wrong-type-argument error.
>
> So I changed it so that interface is now an empty list by default, and
> I'd like to expand it with for-each.  The good news is that I got at
> least a step further, but only after hard-coding the list as an argument
> in the for-each expression.  So it should work?  It still doesn't, and
> I still don't understand why it is not passed as a list properly.
>
> One thing I noticed after hard-coding the argument: the procedure is
> not expanded properly in the constructor.  How come?  This is the output
> in the service file:
>
> (make-forkexec-constructor
>  (list "/gnu/store/6jpn21wnnyz59ii634hfbk34yy48nxrq-wsdd-0.6.4/bin/wsdd" 
> "--hoplimit" "1" for-each
>    (lambda
>    (arg)
>  (format #t "--interface ~s "
>  (arg)))
>    (interface)
>    "--workgroup" "WORKGROUP")
>  #:user "wsdd" #:group "wsdd" #:log-file "/var/log/wsdd.log")

This reads “(interface)”, meaning that (1) ‘interface’ must be a
procedure, since you’re calling it, and (2) ‘interface’ must be bound
(the variable must be defined there).

However, ‘interface’ is unbound here.  You probably meant to write,
within your gexp:

  #~(make-forkexec-constructor
      … #$@(map (lambda …) interfaces)
      …)

which would expand to:

  (make-forkexec-constructor
    … "--interface=eth0" "--interface=eth1"
    …)

The #$@ bit (‘ungexp-splicing’) means that the (map …) expression is
evaluated beforehand, on the host side, and that its result (a list) is
spliced into the code staged in the generated shepherd file.

HTH,
Ludo’.



Re: File search

2022-01-25 Thread Oliver Propst

On 2022-01-25 12:20, Oliver Propst wrote:
> On 2022-01-25 12:15, Ludovic Courtès wrote:
> I'm also not an expert at SQLite, but I can say that the effort
> looks very nice and promising, Ludovic :)

And definitely a step up from the current implementation (obviously).

--
Kind regards, Oliver Propst
https://twitter.com/Opropst



Re: File search

2022-01-25 Thread Oliver Propst

On 2022-01-25 12:15, Ludovic Courtès wrote:
I'm also not an expert at SQLite, but I can say that the effort looks
very nice and promising, Ludovic :)


--
Kind regards, Oliver Propst
https://twitter.com/Opropst



Re: File search

2022-01-25 Thread Ludovic Courtès
Maxim Cournoyer skribis:

> I also had the idea of making it a package... this way only the people
> who opt to install the database locally would incur the cost (in
> bandwidth).
>
> Perhaps a question for Vagrant: talking about size, is this SQLite
> database file comparable to, or smaller than, the apt-file database
> that needs to be downloaded?  With the Debian software catalog being
> about 30% bigger, I'd expect a similarly bigger file size.
>
> If Debian is doing better in terms of database file size, we could look
> at how they're doing it.

As a back-of-the-envelope estimate, here’s the amount of text that needs
to be available in the database:

--8<---cut here---start->8---
ludo@berlin ~/src$ sqlite3 -csv /tmp/db 'select name,version from packages; select name from directories; select name from files;' | wc -c
197689978
ludo@berlin ~/src$ guile -c '(pk (/ 197689978 (expt 2. 20)))'

;;; (188.5318546295166)
ludo@berlin ~/src$ du -h /tmp/db
389M /tmp/db
--8<---cut here---end--->8---

So roughly, SQLite with this particular schema ends up taking twice as
much space as the lower bound.

We can do a bit better (I’m not an expert, so I’m just trying things
naively) by dropping the index and cleaning up the database:

--8<---cut here---start->8---
ludo@berlin ~/src$ cp /tmp/db{,.without-index}
ludo@berlin ~/src$ sqlite3  /tmp/db.without-index
SQLite version 3.32.3 2020-06-18 14:00:33
Enter ".help" for usage hints.
sqlite> drop index IndexFiles;
sqlite> .quit
ludo@berlin ~/src$ du -h /tmp/db.without-index 
389M /tmp/db.without-index
ludo@berlin ~/src$ sqlite3  /tmp/db.without-index 
SQLite version 3.32.3 2020-06-18 14:00:33
Enter ".help" for usage hints.
sqlite> vacuum;
sqlite> .quit
ludo@berlin ~/src$ du -h /tmp/db.without-index 
290M /tmp/db.without-index
--8<---cut here---end--->8---

With compression:

--8<---cut here---start->8---
ludo@berlin ~/src$ zstd -19 < /tmp/db.without-index > /tmp/db.without-index.zst
ludo@berlin ~/src$ du -h /tmp/db.without-index.zst 
37M /tmp/db.without-index.zst
--8<---cut here---end--->8---

(Down from 61M.)  For comparison, this is smaller than the guile, perl,
and gtk+ packages, and roughly the same size as glibc:out.

For the record, with compression, the lower bound is about 12 MiB:

--8<---cut here---start->8---
ludo@berlin ~/src$ sqlite3 -csv /tmp/db 'select name,version from packages; select name from directories; select name from files;' | zstd -19 | wc -c
12128674
ludo@berlin ~/src$ guile -c '(pk (/ 12128674 (expt 2. 20)))'

;;; (11.566804885864258)
--8<---cut here---end--->8---

All this to say that we could distribute the database in a form that
gets closer to the optimal size, at the expense of extra processing on
the client side upon reception to put it into shape (creating an index,
etc.).

Ludo’.
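
The "strip before shipping, rebuild on reception" idea from the message above could look roughly like the following Python sketch. It is only an illustration under stated assumptions: the index name IndexFiles and the `files` table with a `name` column appear in the transcripts above, but which column the index actually covers is a guess, and the function names are hypothetical. Compression (for example with zstd) is left to an external step.

```python
import sqlite3


def strip_for_distribution(path):
    """Server side: drop the index and compact the file before compressing it."""
    # Autocommit mode, because VACUUM cannot run inside a transaction.
    db = sqlite3.connect(path, isolation_level=None)
    try:
        db.execute("DROP INDEX IF EXISTS IndexFiles")
        db.execute("VACUUM")
    finally:
        db.close()


def prepare_after_download(path):
    """Client side: recreate the index once the file has been decompressed."""
    db = sqlite3.connect(path, isolation_level=None)
    try:
        # Assumption: the index covers files(name); the real schema may differ.
        db.execute("CREATE INDEX IF NOT EXISTS IndexFiles ON files (name)")
    finally:
        db.close()


def index_names(path):
    """List the indexes currently present in the database."""
    db = sqlite3.connect(path)
    try:
        rows = db.execute("SELECT name FROM sqlite_master WHERE type = 'index'").fetchall()
        return [name for (name,) in rows]
    finally:
        db.close()
```

The client pays the cost of `prepare_after_download` once per download, in exchange for the smaller file on the wire.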