> On Oct 31, 2016, at 12:41 PM, Jan Grashöfer <[email protected]> wrote:
>
>> Theoretically, if the package was just temporarily unavailable, the next
>> time the aggregation process runs, it would get listed again
>
> How, if it is completely removed?

Oh, duh, I see what you mean. I guess the answer is related to something we haven’t yet spec’d out: how should the structure of a package source’s index files change to adapt to the new scheme of aggregating metadata?

A package source could look like:

    https://github.com/bro/packages
        0xxon/
            packages.index
            bro-sumstats-counttable.meta
        sethhall/
            packages.index
            credit-card-exposure.meta
            ssn-exposure.meta
            domain-tld.meta

Contents of sethhall/packages.index:

    https://github.com/sethhall/credit-card-exposure
    https://github.com/sethhall/ssn-exposure
    https://github.com/sethhall/domain-tld

Contents of sethhall/ssn-exposure.meta:

    # Automatically generated, do not edit.
    [master]
    url = https://github.com/sethhall/ssn-exposure
    tags = file analysis, social security number, ssn, dlp, data loss
    description = Detect and log US Social Security numbers.
    script_dir = scripts
    [1.0.0]
    …
    [2.0.0]
    …

The packages.index files are manually modified by users during the act of package registration. The *.meta files are automatically created by the metadata aggregation process as it crawls the URLs listed in packages.index.

If a package is in packages.index, we say that its state is “registered”. Then, once it has a *.meta file, we say that its state is “listed”. If a package is “listed”, then bro-pkg users can see it show up from “search” and “list” commands.

If the metadata aggregation process finds an invalid/unreachable package, it removes its *.meta file, but keeps it “registered” in packages.index, so the next crawl will still attempt to list the package in case it was just temporarily unavailable.

Thoughts?

Is it useful to collect metadata for each version or just the latest? “Latest” here would mean the latest release version tag or, if none exist, the latest master branch commit.
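
A rough sketch of one crawl pass under the registered/listed scheme described above, in case it helps the discussion. This is only an illustration, not how the aggregator is or will be implemented: `crawl_once` and `fetch_metadata` are hypothetical names, and a plain dict stands in for the set of *.meta files.

```python
def crawl_once(registered_urls, fetch_metadata):
    """Run one aggregation pass and return the "listed" packages.

    registered_urls: URLs read from a packages.index file. This function
        never modifies them, so a package stays "registered" regardless
        of the outcome of the crawl.
    fetch_metadata: hypothetical callable that returns a metadata dict for
        a URL, or raises an exception if the package is invalid/unreachable.

    The returned dict (URL -> metadata) stands in for the *.meta files:
    a URL that is absent from it has no *.meta file and is not "listed".
    """
    listed = {}
    for url in registered_urls:
        try:
            listed[url] = fetch_metadata(url)
        except Exception:
            # Invalid or unreachable: no *.meta entry this time around.
            # The URL is still registered, so the next pass retries it.
            continue
    return listed
```

Because the registered URLs are the only input, a package that was dropped from the listing in one pass is picked up again by the next pass as soon as it becomes reachable.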

If per-version metadata collection isn’t needed, the structure outlined above still works, but the existing structure would also work: just stick the latest metadata directly into bro-pkg.index (mixing autogenerated data w/ user-entered data).

- Jon

_______________________________________________
bro-dev mailing list
[email protected]
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev
