> On Oct 31, 2016, at 12:41 PM, Jan Grashöfer <[email protected]> wrote:
> 
>> Theoretically, if the package was just temporarily unavailable, the next 
>> time the aggregation process runs, it would get listed again
> 
> How, if it is completely removed?

Oh, duh, I see what you mean.  I guess the answer is related to something we 
haven’t yet spec’d out: how should the structure of a package source’s index 
files change to adapt to the new scheme of aggregating metadata?

A package source could look like:

https://github.com/bro/packages
        0xxon/
                packages.index
                bro-sumstats-counttable.meta
        sethhall/
                packages.index
                credit-card-exposure.meta
                ssn-exposure.meta
                domain-tld.meta

Contents of sethhall/packages.index:

        https://github.com/sethhall/credit-card-exposure
        https://github.com/sethhall/ssn-exposure
        https://github.com/sethhall/domain-tld

Contents of sethhall/ssn-exposure.meta:

        # Automatically generated, do not edit.
        [master]
        url = https://github.com/sethhall/ssn-exposure
        tags = file analysis, social security number, ssn, dlp, data loss
        description = Detect and log US Social Security numbers.
        script_dir = scripts

        [1.0.0]
        …

        [2.0.0]
        …
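
Since the *.meta layout above is INI-like, with one section per branch or version, a reader could probably be as small as the sketch below.  This is only an illustration: parse_meta and SAMPLE_META are hypothetical names, the use of Python's configparser is an assumption, and so is splitting "tags" on commas.

```python
import configparser

# Sample content modeled on the ssn-exposure.meta example above
# (only the [master] section, since the other sections are elided).
SAMPLE_META = """\
# Automatically generated, do not edit.
[master]
url = https://github.com/sethhall/ssn-exposure
tags = file analysis, social security number, ssn, dlp, data loss
description = Detect and log US Social Security numbers.
script_dir = scripts
"""


def parse_meta(text):
    """Return {version: {field: value}} for one *.meta file (hypothetical helper)."""
    # Disable interpolation so a literal '%' in a description is not special.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    result = {}
    for version in parser.sections():
        fields = dict(parser[version])
        # Assumption: tags are a comma-separated list.
        raw_tags = fields.get("tags", "")
        fields["tags"] = [t.strip() for t in raw_tags.split(",") if t.strip()]
        result[version] = fields
    return result


meta = parse_meta(SAMPLE_META)
print(sorted(meta))
print(meta["master"]["tags"])
```

Keying the result by section name keeps per-version metadata separate, which only matters if we decide to collect it per version at all.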

The packages.index files are manually modified by users during the act of 
package registration.  The *.meta files are automatically created by the 
metadata aggregation process as it crawls the URLs listed in packages.index.

If a package is in packages.index, we say that its state is “registered”.  
Then, once it has a *.meta file, we say that its state is “listed”.  If a 
package is “listed”, then bro-pkg users can see it show up from “search” and 
“list” commands.  If the metadata aggregation process finds an 
invalid/unreachable package, it removes that package’s *.meta file, but keeps 
it “registered” in packages.index, so the next crawl will still attempt to 
list the package in case it was just temporarily unavailable.
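
To make the two states concrete, here is a minimal sketch of one crawl pass over a single directory of the package source.  None of this is existing bro-pkg code: the function names are made up, and fetch_metadata is a hypothetical callback that returns the metadata text for a URL, or None when the package is invalid or unreachable.

```python
import os


def read_index(index_path):
    """Return the package URLs listed in a packages.index file."""
    with open(index_path) as f:
        return [line.strip() for line in f if line.strip()]


def meta_path(source_dir, url):
    """Map a package URL to its *.meta file next to packages.index."""
    name = url.rstrip("/").split("/")[-1]
    return os.path.join(source_dir, name + ".meta")


def package_state(source_dir, url):
    """A package is 'listed' once it has a *.meta file, else 'registered'."""
    if os.path.exists(meta_path(source_dir, url)):
        return "listed"
    return "registered"


def crawl(source_dir, fetch_metadata):
    """Run one aggregation pass; packages.index itself is never modified."""
    index_path = os.path.join(source_dir, "packages.index")
    for url in read_index(index_path):
        path = meta_path(source_dir, url)
        text = fetch_metadata(url)  # hypothetical callback, None on failure
        if text is None:
            # Unlist the package, but leave it registered for the next crawl.
            if os.path.exists(path):
                os.remove(path)
        else:
            with open(path, "w") as f:
                f.write(text)
```

Because the crawl only ever touches *.meta files, a package that was temporarily unavailable gets listed again on the next pass without anyone re-registering it.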

Thoughts?  Is it useful to collect metadata for each version or just the 
latest?  “Latest” here would mean the latest release version tag or, if none 
exist, the latest master branch commit.
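
One possible reading of that “latest” rule, as a sketch.  It assumes release tags look like x.y.z (optionally prefixed with "v") and that any other tag is ignored; both the tag format and the latest_version name are assumptions, not something we have decided.

```python
import re

# Assumption: release version tags look like "1.0.0" or "v1.0.0".
_VERSION_TAG = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def latest_version(tags, default="master"):
    """Return the highest release tag, or the default branch if none exist."""
    versions = []
    for tag in tags:
        match = _VERSION_TAG.match(tag)
        if match:
            # Compare numerically so that 1.10.0 sorts above 1.9.0.
            key = tuple(int(part) for part in match.groups())
            versions.append((key, tag))
    if not versions:
        return default
    return max(versions)[1]


print(latest_version(["1.0.0", "2.0.0"]))
print(latest_version([]))
```

Returning "master" here stands for "use the latest master branch commit"; resolving that to an actual commit is left to whatever does the git checkout.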

If per-version metadata collection isn’t needed, the structure outlined above 
still works, but the existing structure would also work: just stick the latest 
metadata directly into bro-pkg.index (mixing autogenerated data with 
user-entered data).

- Jon

_______________________________________________
bro-dev mailing list
[email protected]
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev
