Thanks, I did not realize that Coverage -> Match[n] could be that useful! However, the Match.Name field is a license name, not a file name I can pass to os.Open(). How can I directly access the known license texts?
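For context, here is a minimal sketch of the approach Dan suggests in the quoted message below: slice the input with the match offsets and compare the length of that portion with the length of a reference license text. The `match` type is a hypothetical stand-in that models only the fields mentioned in this thread, not the real licensecheck.Match, and the reference text is assumed to come from a copy the caller already holds, since how to obtain the package's own texts is exactly the open question.

```go
package main

import (
	"errors"
	"fmt"
)

// match is a hypothetical stand-in for licensecheck.Match. Only the
// fields discussed in this thread are modeled.
type match struct {
	Name       string
	Start, End int // byte offsets into the scanned input
}

// extract returns the portion of input covered by m, or an error if
// the offsets do not describe a valid range within input.
func extract(input []byte, m match) ([]byte, error) {
	if m.Start < 0 || m.End > len(input) || m.Start > m.End {
		return nil, errors.New("match offsets out of range")
	}
	return input[m.Start:m.End], nil
}

// lengthRatio compares the length of the matched portion with the
// length of a reference license text supplied by the caller.
func lengthRatio(portion, reference []byte) float64 {
	if len(reference) == 0 {
		return 0
	}
	return float64(len(portion)) / float64(len(reference))
}

func main() {
	// Placeholder data: a stand-in "license" followed by Go code.
	reference := []byte("LICENSE TEXT")
	input := []byte("LICENSE TEXT\npackage main\n")
	m := match{Name: "Example", Start: 0, End: len(reference)}

	portion, err := extract(input, m)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s: %.0f%% of reference length\n", m.Name, 100*lengthRatio(portion, reference))
}
```

A length comparison is only a rough signal. The other option Dan mentions, running the check again on just the extracted portion, would compare content rather than length.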
On Thu, Nov 14, 2019 at 10:42 AM Dan Kortschak <d...@kortschak.io> wrote:
>
> The licensecheck.Match type holds the start and end offsets in the
> file. Can't you use that to extract the license portion and either
> check it's length against the length of the license or repeat the Check
> with only that portion of the file?
>
> On Thu, 2019-11-14 at 10:24 +0100, fge...@gmail.com wrote:
> > Sorry if I was not clear: on walking the file system, that's clear, I
> > did not intend to talk about that, only about matching and reporting
> > on matching. The example I gave was just to put in context why I
> > believe I'd need a different api.
> >
> > Using the Options field is good enough in the first example. (That's
> > how I used licensecheck first.)
> > Although for the second example Cover() does not report what I'd
> > need.
> >
> > As far as I've seen currently using func Cover(INPUT []byte, opts
> > Options) (Coverage, bool) reports 100% MIT if INPUT matches byte for
> > byte 100% MIT. If INPUT has more text than the complete 100% matching
> > text of MIT license, for example the MIT license is only in the
> > beginning of INPUT and the rest of INPUT is for example Go code, than
> > Coverage will report len(INPUT)/len(MIT license) which is less than
> > 100%.
> >
> > In this case, the new api would report 100%, since input contains
> > 100% MIT license text (and some programming code, which is not
> > relevant here).
> >
> > If I understand correctly the current api is for checking _already_
> > identified license files, which contain _only_ the license text.
> > I believe to look for files containing - complete or possibly broken -
> > license references a different matching is needed.
> >
> >
> > On 11/14/19, Rob Pike <r...@golang.org> wrote:
> > > As I understand what you're trying to do, you just need to write a
> > > tree walker, perhaps using filepath.Walk, that opens each file and
> > > calls Cover on it.
> > > You can set the Options field to control the threshold for
> > > reporting, and use the result of that to choose which licenses to
> > > report.
> > >
> > > I don't believe an API change is called for.
> > >
> > > -rob
> > >
> > >
> > > On Thu, Nov 14, 2019 at 6:14 PM <fge...@gmail.com> wrote:
> > >
> > > > func Cover(input []byte, opts Options) (Coverage, bool) in
> > > > licensecheck currently reports len(input)/len(one of the
> > > > licenses) for each known license. I'd need for all known
> > > > licenses len(known license)/len(license reference in input).
> > > >
> > > > I'd like to scan >100000 files (possibly a lot more), where some
> > > > of them (<0.1%) contain full or partial known license texts.
> > > >
> > > > An example scenario for an example /src, containing >100000
> > > > files:
> > > > $ listlicenses /src # to get an overview of 100% matching license references
> > > > LGPL-2.1
> > > > MIT
> > > > $ listlicenses -details /src # same tree, more detailed output, to see the details
> > > > /src/license refers 100% MIT # the bytes in /src/license correspond one for one for the MIT license
> > > > /src/fonts/LICENSE refers 100% MIT # the bytes in /src/fonts/LICENSE correspond one for one for the MIT license
> > > > /src/a/Notice refers 100% LGPL-2.1 # same as above with LGPL-2.1
> > > > /src/a/b/whatever.go refers 94% GPL2 # most probably a broken license reference in whatever.go, maybe someone inadvertently deleted the last word from the lines containing the GPL2 license text.
> > > > Needs human inspection to check what's the license situation with whatever.go
> > > > /src/c/ConfusingLicenseReferences.c refers 7% ZLIB # ConfusingLicenseReferences.c has most probably a false positive report for reference to ZLIB
> > > > /src/c/ConfusingLicenseReferences.c refers 65% MIT # ConfusingLicenseReferences.c has only 65% of MIT, the author intended to refer to MIT, but some inadvertent edit later broke the license reference in ConfusingLicenseReferences.c
> > > >
> > > > Command listlicenses iterates over all files in the subtree,
> > > > gathering all full or partial (broken) license references.
> > > > Command listlicenses uses the functionality similar to
> > > > github.com/google/licensecheck to check the files in the file
> > > > system.
> > > >
> > > > thanks!
> > > >
> > > > On 11/13/19, Rob Pike <r...@golang.org> wrote:
> > > > > Can you please explain in more detail what you're asking for? I
> > > > > don't understand the problem you have or why the current
> > > > > package cannot handle it.
> > > > >
> > > > > -rob
> > > > >
> > > > >
> > > > > On Wed, Nov 13, 2019 at 7:05 PM <fge...@gmail.com> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > "licensecheck classifies license files and heuristically
> > > > > > determines how well they correspond to known open source
> > > > > > licenses."
> > > > > >
> > > > > > I'd like to identify license references in the file system.
> > > > > > If I understand correctly package licensecheck in it's
> > > > > > current form is not useful to help with this.
> > > > > > If it's still possible, could you please share a hint how to
> > > > > > do that?
> > > > > > (input: byte array, output: license references in the byte
> > > > > > array)
> > > > > > If I understand correctly and I can't use licensecheck in
> > > > > > it's current form, which one is preferred:
> > > > > > extend current api, (maybe: func Refers(input []byte)
> > > > > > (References, bool) or fork+rename the package?
> > > > > > (References{...} being similar to Coverage{...})
> > > > > >
> > > > > > thanks,
> > > > > > Gergely Födémesi
> > > > > >
> > > > > > --
> > > > > > You received this message because you are subscribed to the
> > > > > > Google Groups "golang-nuts" group.
> > > > > > To unsubscribe from this group and stop receiving emails from
> > > > > > it, send an email to golang-nuts+unsubscr...@googlegroups.com.
> > > > > > To view this discussion on the web visit
> > > > > > https://groups.google.com/d/msgid/golang-nuts/CA%2BctqrqKKUPTHihMLhLTH5O-tBm1qENQV6y41Qwde4jHp1kNmA%40mail.gmail.com.