Re: LibreJS seems to ignore query strings

Jacob K Tue, 29 Nov 2022 15:23:49 -0800

On 11/17/22 07:40, Yuchen Pei wrote:

On Fri 2022-11-11 15:27:41 -0600, Jacob K wrote:

Hello, thanks for the explanation.
On 11/9/22 19:44, Yuchen Pei wrote:

Hello,

Thanks for the detailed report.
On Tue 2022-11-08 17:07:18 -0600, Jacob K wrote:

[...]


LibreJS removes the query part of a script URL as a preprocessing step in
most (if not all) functions handling scripts.  This means that if you
whitelist https://foo.com/bar.js, https://foo.com/bar.js?blah is also
let through.  OTOH, without such whitelisting,
https://foo.com/bar.js?blah is blocked as usual if it is not labelled.
This is because the response processor checks the external script and
rewrites it to /* LibreJS: script blocked ... */.
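To make the whitelisting behaviour concrete, here is a minimal Python sketch. It assumes a plain set of whitelisted URLs, and the names `strip_query` and `is_whitelisted` are hypothetical. This is not LibreJS's actual code, which is JavaScript.

```python
from urllib.parse import urlsplit

def strip_query(url: str) -> str:
    """Drop the query part of a URL (hypothetical helper, not LibreJS code)."""
    return urlsplit(url)._replace(query="").geturl()

# Hypothetical user whitelist, keyed by query-less URLs.
whitelist = {"https://foo.com/bar.js"}

def is_whitelisted(script_url: str) -> bool:
    # The query string is discarded before the lookup, so every
    # ?variant of a whitelisted URL matches as well.
    return strip_query(script_url) in whitelist

print(is_whitelisted("https://foo.com/bar.js?blah"))  # True
print(is_whitelisted("https://foo.com/baz.js?blah"))  # False
```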

I suspect the reason for discarding the query part is to avoid having to
whitelist all possible query strings which can be tedious.  Perhaps a
better approach is to refine the whitelisting facility to allow patterns
like globbing and regexes.
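A pattern-based whitelist could look roughly like the following sketch. The glob and regex entries are made-up examples (example.org is a reserved domain), and `matches_whitelist` is a hypothetical name. Note that in `fnmatch` a bare `?` is a single-character wildcard, so a literal `?` has to be written as `[?]`.

```python
import fnmatch
import re

# Hypothetical glob entries; "[?]" matches a literal "?" in fnmatch.
glob_patterns = ["https://foo.com/bar.js[?]v=*"]

# Hypothetical regex entries, matched against the whole URL.
regex_patterns = [re.compile(r"https://cdn\.example\.org/lib-\d+\.\d+\.js(\?.*)?")]

def matches_whitelist(script_url: str) -> bool:
    """Return True if the full URL (query included) matches any pattern."""
    if any(fnmatch.fnmatchcase(script_url, pat) for pat in glob_patterns):
        return True
    return any(rx.fullmatch(script_url) for rx in regex_patterns)
```

Unlike stripping the query outright, this lets a user allow only the query strings they expect.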

Would it make sense to generally keep handling query strings the same,
but make the link the user clicks on go to the version with the query
string included (possibly with a warning that there is a query string
and that whitelisting the script will whitelist all query strings)? That
way clicking "Show" next to a script will always take the user to the
currently blocked or running script.
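The proposal could be sketched as follows: the "Show" link keeps the exact URL, and a warning is attached when a query string is present. The function name `show_link` and the warning wording are hypothetical.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

def show_link(script_url: str) -> Tuple[str, Optional[str]]:
    """Return the URL the "Show" link should open, plus an optional warning."""
    query = urlsplit(script_url).query
    warning = None
    if query:
        warning = (f"This script was loaded with the query string '?{query}'. "
                   "Whitelisting it will whitelist all query strings for this URL.")
    # Link to the exact URL that is currently blocked or running, query included.
    return script_url, warning
```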


Definitely.  Patches welcome, otherwise I'll work on it when I get time.

Thank you. I'm not sure whether I'll send a patch, as I'm not familiar with LibreJS development, but if I happen to have extra time I may look into understanding LibreJS so I can contribute. (I might have more free time this month, but then again, there are other things I want to work on, so maybe not.)


Ideally, I think LibreJS should store checksums of scripts, but it seems
like it only does this for inline scripts currently?


LibreJS does use hashes of scripts, but only in the built-in whitelist
(see /utilities/hash_script/whitelist).
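Checking a script against a hash whitelist works roughly like this sketch. It assumes SHA-256 over the UTF-8 script text and an in-memory set. The hash function and format of LibreJS's real built-in list may differ, and the names here are hypothetical.

```python
import hashlib

def script_hash(source: str) -> str:
    # Assumption: SHA-256 of the UTF-8 encoded script text.
    return hashlib.sha256(source.encode("utf-8")).hexdigest()

# Hypothetical hash whitelist; the real list lives under
# /utilities/hash_script/whitelist and may use a different format.
HASH_WHITELIST = {script_hash("console.log('hello');")}

def is_known_free(source: str) -> bool:
    """True only if this exact script content has been whitelisted."""
    return script_hash(source) in HASH_WHITELIST
```

Because the hash covers the content rather than the URL, query strings stop mattering. Any change to the script, however small, produces a different hash.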

Best,
Yuchen


Slightly off-topic, but is there a good system set up to add new scripts
to the internal whitelist? I often see free libraries that are not
recognized by LibreJS, and it seems like a group of motivated users
might be better at labeling them than the library developers, at least
when the library developers do not care about LibreJS.


There isn't one yet, but I've been thinking about how to improve
script recognition.  One idea is to set up a server program that
maintains a database of webpages and the external scripts used in these
webpages.  Users can submit the URL of a page containing only free JS,
and the server will run the headless compliance check on the page,
display the check results to the user, and record the results (LibreJS
version, webpage URL, script URLs, script hashes, the status of each
script, accepted or rejected, and the reason for acceptance (which
licenses) or rejection).
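One possible shape for a recorded result is sketched below. The class and field names are hypothetical and simply mirror the list of recorded items above.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ScriptResult:
    url: str
    sha256: str                      # hash of the script content
    accepted: bool
    licenses: List[str] = field(default_factory=list)  # reason for acceptance
    rejection_reason: Optional[str] = None

@dataclass
class ComplianceRecord:
    librejs_version: str
    page_url: str
    scripts: List[ScriptResult]

    def fully_compliant(self) -> bool:
        """A page is fully compliant when every script on it was accepted."""
        return all(s.accepted for s in self.scripts)
```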

The server will provide API endpoints for listing fully compliant URLs
and for statistics on scripts (e.g. counts, which indicate how well
known / popular the scripts are).  The former can be used by users to
discover nice websites.  The latter can be used by LibreJS users to
whitelist scripts by hash / name, and by LibreJS developers to decide
which recognition mechanisms to add (for example, if 99% of the
unrecognised scripts are annotated using SPDX, then maybe it makes sense
to add a user option in LibreJS to enable SPDX, despite the problems
with the lack of license headers in SPDX annotations).  LibreJS can also
simply download the database from the server and provide user options
to auto-whitelist scripts by hash (e.g. by setting a threshold for the
counts).
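The count-threshold option could be as simple as the following sketch. It assumes the downloaded database yields one hash per page on which an accepted script was seen. The function name and input format are hypothetical.

```python
from collections import Counter
from typing import Iterable, Set

def auto_whitelist(accepted_hashes: Iterable[str], threshold: int) -> Set[str]:
    """Return hashes of accepted scripts seen at least `threshold` times."""
    counts = Counter(accepted_hashes)
    return {h for h, n in counts.items() if n >= threshold}
```

A higher threshold trusts only widely seen scripts. A lower one recognises more scripts at the cost of relying on fewer independent reports.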

The tricky part is how to make sure the server only contains free
scripts.  The FSD (Free Software Directory) has a review process, but
we probably want something faster.

I have participated in some FSD meetings, and although we often see really big software with lots of dependencies that takes a long time to verify, we also see lots of simple software with no dependencies (or only known-free dependencies) that is verified and added to the FSD rather quickly. If the review process does not need to look at most dependencies, it could be quite fast, since LibreJS will usually block nonfree dependencies anyway. (An exception would be something like an emulator, where a free script might, e.g., download a nonfree game not written in JavaScript.)

I don't think there is an automatic way to detect whether JavaScript is really source code or not, but there are lots of automatic ways to detect licenses (REUSE, SPDX, scancode-toolkit, etc.).
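The simplest form of such license detection is scanning for SPDX tags, as in this sketch. It only extracts the `SPDX-License-Identifier:` expressions and does not validate them. It also says nothing about whether the code is really source. Tools like REUSE and scancode-toolkit do far more, and `spdx_expressions` is a hypothetical name.

```python
import re
from typing import List

# Match the tag and capture the rest of the line (no newline crossing).
SPDX_RE = re.compile(r"SPDX-License-Identifier:[ \t]*(.+)")

def spdx_expressions(source: str) -> List[str]:
    """Return the license expressions found in SPDX tags, in order."""
    found = []
    for m in SPDX_RE.finditer(source):
        expr = m.group(1).strip()
        if expr.endswith("*/"):          # tag written inside a /* ... */ comment
            expr = expr[:-2].rstrip()
        if expr:
            found.append(expr)
    return found
```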

One problem is that the server is basically SaaSS.  The server program
will be free and easy to self-host, but we'll probably want one
central server with THE database.  The server runs the LibreJS headless
compliance check, which is computation users can do on their own
computers.  Alternatively, the server can simply take user input for
compliance results, but then users may make mistakes, and this opens the
door to more spam and inaccuracies.

Best,
Yuchen

I think that if the server is basically SaaSS, then it should be possible for everyone to run the server locally, perhaps even integrated into the extension. But if the server also shows things like popularity and manual reviews by trusted reviewers, then I think it is not SaaSS, as it wouldn't make sense to run it on each individual's computer.
