+1 to Apache Legal advice

On Mon, Jan 13, 2020 at 3:30 PM Otto Fowler <ottobackwa...@gmail.com> wrote:
> Justin,
>
> I have not read the new license, but the idea, I believe, is yes: whoever
> downloads and accepts the new license is then responsible for adherence
> to the applicable law, which is why the Apache Foundation cannot be that
> entity, I would think.
>
> We may want to send this past Apache Legal?
>
> On January 13, 2020 at 17:22:21, Justin Leet (justinjl...@gmail.com)
> wrote:
>
> On the whole, I agree. I think the immediate focus should be to rip out
> maxmind from default usage in master, even if it's done a bit roughly.
> Getting master at least building for people would probably be a good
> first step.
>
> Couple further thoughts:
>
> - JUnit 5 supports a variety of conditions for conditional testing. It
>   might be possible to do "Test is enabled if property
>   maxmind.geo.database.location is set", and just provide instructions to
>   users on actually doing it instead of having to manually enable/disable
>   the test via code.
> - Remove DB download/install from Ambari. I agree with removing the
>   actual dl/load, but still keeping the default location setup.
> - I'm inclined to think it should be removed from the default demo
>   enrichments. We might be able to replace the GEO_GET with some dummy
>   Stellar and just produce similar outputs, since our demo data has a
>   limited set of IPs, iirc. Obviously, we'd want to document that fairly
>   well and also let users manually set it up for non-demo data.
> - CLI tool should be fine for letting users load, it's more or less why
>   it existed in the first place. Maybe we also add a check if someone
>   tries using the old URL and let them know.
> - We need to document that these changes occurred for CCPA reasons, so
>   that users can evaluate the consequences.
> - Given that I am not even slightly a lawyer, my understanding here is
>   likely wrong, but are there further implications here? Say a user runs
>   a bunch of data through Metron (both enrichments and profiler) using
>   these IPs.
>   A subset of those IPs is removed as it shouldn't be used. Would a user
>   then have to remove all data derived from these IPs (e.g. anything in
>   ES/Solr, anything in HDFS, any profiles using the data)? If they do
>   have to remove it, I assume that's not directly our problem (since they
>   have to do cleanup), but we're probably not making it easy on them in
>   terms of providing the ability to clean up that sort of information and
>   rerun profiles and such.
>
> On Mon, Jan 13, 2020 at 5:20 PM Otto Fowler <ottobackwa...@gmail.com>
> wrote:
>
> > I agree with all of that, my only thinking on contrib is, if that
> > component is not tested and has no coverage, and needs manual steps,
> > then we may want to separate it.
> > We would have to have some handle on full dev as well, right?
> >
> > On January 13, 2020 at 16:57:13, Michael Miklavcic
> > (michael.miklav...@gmail.com) wrote:
> >
> > Hey Otto,
> >
> > As I mentioned above, we have had this issue with other components
> > before, e.g. mysql. I don't see a compelling reason to discontinue or
> > push this component to contrib just yet - it's a type of enrichment
> > that happens to require an additional manual step. Per the article
> > (https://blog.maxmind.com/2019/12/18/significant-changes-to-accessing-and-using-geolite2-databases/),
> > Maxmind is not requiring purchase, merely requiring users to register
> > due to GDPR and CCPA requirements. That being said, it does cause
> > issues for the integration tests. I think we should minimally add an
> > @Ignore annotation to the testLoadGeoIpDatabase test in
> > MaxmindDbEnrichmentLoaderTest and provide manual instructions for
> > running the integration test if/when someone submits a PR affecting
> > this code.
> >
> > Users can still use the geolite DB by uploading it to HDFS.
> > (Incidentally, this even provides a crude mechanism for versioning,
> > should a user choose to use the config path that way).
> > We separated the deployment side of the geolite DB from the
> > consumption side:
> > https://github.com/apache/metron/blob/master/metron-platform/metron-enrichment/metron-enrichment-common/src/main/java/org/apache/metron/enrichment/adapters/maxmind/geo/GeoLiteCityDatabase.java#L135
> >
> > All that needs to happen is the user would drop the file in HDFS. This
> > had previously happened via Ambari - see here -
> > https://github.com/apache/metron/blob/master/metron-deployment/packaging/ambari/metron-mpack/src/main/resources/common-services/METRON/CURRENT/package/scripts/enrichment_commands.py#L95
> >
> > We should probably do a couple things:
> >
> > 1. Minimally, remove the DB download and install from Ambari - we
> >    might choose to keep around the HDFS path creation so that we have
> >    a reasonable default prepped and ready.
> > 2. Add documentation to manually install the geolite DB (register
> >    with maxmind) OR remove it from our default demo enrichments
> >    shipped with the platform.
> >
> > Manual DB file loading would be managed using the CLI tool. You should
> > be able to load it via local file URL
> > (https://en.wikipedia.org/wiki/File_URI_scheme) or via a custom hosted
> > solution via the --geo_url option. See more details here -
> > https://github.com/apache/metron/tree/master/metron-platform/metron-data-management#geolite2-loader
> >
> > Anything else I'm missing? Probably worth some feedback from Justin
> > Leet on this as well.
> >
> > Thanks,
> > Mike
> >
> > On Mon, Jan 13, 2020 at 1:59 PM Otto Fowler <ottobackwa...@gmail.com>
> > wrote:
> >
> > > Hi Tom, that is true, and I think that is the only viable approach.
> > > We do however use the database for testing during build, and we do
> > > however set up the components that use the database in the 'sample'
> > > flow with the simulated sensors for our vagrant deploy… and our
> > > contrib/docker deploy etc.
> > >
> > > So, having the user download their own properly licensed version
> > > (and having the user responsible for the privacy law issues) is
> > > fine, but I think we need to talk through all the ways we are going
> > > to change the build, what it means for testing that component (does
> > > it move to contrib?), and the default deployment to
> > > vagrant/topology.
> > >
> > > On January 13, 2020 at 13:41:01, Yerex, Tom (tom.ye...@ubc.ca) wrote:
> > >
> > > Hi Otto,
> > >
> > > Thank you for raising this in the discussion.
> > >
> > > It seems to me that Maxmind is proactive about providing
> > > instructions and code to deliver updates to the local system. I can
> > > recall being surprised that the current Metron solution seemed to do
> > > more than I expected, i.e., I thought I would need to get Maxmind
> > > files into the local file system where Metron would pick those up
> > > and load them into HDFS, and instead Metron did it all.
> > >
> > > Perhaps the approach is to simplify Metron and have it load files
> > > from the local file system into HDFS; how you get the files to the
> > > local file system is up to you?
> > >
> > > On 2020-01-13, 3:52 AM, "Otto Fowler" <ottobackwa...@gmail.com>
> > > wrote:
> > >
> > > https://issues.apache.org/jira/browse/METRON-2340
> > >
> > > https://blog.maxmind.com/2019/12/18/significant-changes-to-accessing-and-using-geolite2-databases/
> > >
> > > Maxmind has changed the way they distribute and license the geolite2
> > > database that we use in our builds and distribution.
> > >
> > > Master build is broken, and users are having issues setting up
> > > metron
> > > (https://the-asf.slack.com/archives/CB7Q6AN3T/p1578556024012200)
> > >
> > > We need to fix the build and figure out how we are going to move on
> > > from this.
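
The JUnit 5 idea raised in the thread (run the Maxmind test only when the
property maxmind.geo.database.location is set) comes down to a small gating
decision. Below is a minimal plain-Java sketch of that decision, not Metron
code: the class GeoDbTestGate and its methods are hypothetical names, and the
extra "file must exist" check is an assumption that goes beyond what the
thread proposes. It avoids a JUnit dependency so it can run on its own; in
JUnit 5 itself the same condition would more likely be declared with a
conditional-execution annotation such as @EnabledIfSystemProperty than
written by hand.

```java
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical helper (not part of Metron): decides whether GeoLite2-dependent tests should run. */
public class GeoDbTestGate {

    /** Property name quoted in the thread; how Metron would wire it up is an assumption. */
    public static final String LOCATION_PROPERTY = "maxmind.geo.database.location";

    /** True when a location value is present, i.e. non-null and not blank. */
    public static boolean isConfigured(String location) {
        return location != null && !location.trim().isEmpty();
    }

    /** True when the system property is set and points at an existing regular file. */
    public static boolean shouldRunGeoTests() {
        String location = System.getProperty(LOCATION_PROPERTY);
        return isConfigured(location)
                && Files.isRegularFile(Paths.get(location.trim()));
    }

    public static void main(String[] args) {
        if (shouldRunGeoTests()) {
            System.out.println("GeoLite2 database found; geo tests would run.");
        } else {
            // Tell the user how to opt in instead of failing the build.
            System.out.println("Skipping geo tests: pass -D" + LOCATION_PROPERTY
                    + "=<path to a locally downloaded database> to enable them.");
        }
    }
}
```

With a scheme like this, a default build skips the test, and a developer who
has registered with Maxmind and downloaded the database locally opts in by
passing the property on the command line, for example
java -Dmaxmind.geo.database.location=/path/to/db GeoDbTestGate.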