> On Jun 1, 2018, at 11:37 AM, Azoff, Justin S <jaz...@illinois.edu> wrote:
>
> I could never figure out what was causing the problem, and it's possible that
> &synchronized not doing anything anymore is why it's better now. I'm mostly
> using &synchronized for syncing input files across all the workers and one of
> them does have 300k entries in it. That file is fairly constant though, only
> a few k changes every 5 minutes and nothing that should use 20G of ram.
FWIW, I figured out what was causing this problem. While the file wasn't changing that much, I was downloading it with something like

    curl -o file.new $URL && mv file.new file.csv

and it turns out that unless you pass -f to curl, it doesn't actually exit with a non-zero status code on server errors. This was causing a server error page to be written to the csv file every now and then. When this happened:

* the input reader would throw a warning that the file couldn't be parsed, and clear out the set
* Bro would then clear the set, triggering a removal of 300k items across all nodes (56 in the case of the test cluster)
* 5 minutes later the next download would work
* Bro would then fill the set back in, triggering another 300k items to be synced to all 56 nodes

So within 5 minutes, 300,000 * 56 * 2 = ~33.6 million updates would be kicked off. This seemed to max out the proxies for 30 minutes. The raw size of the data is only ~4M, or 261M total, which makes it a little crazy that memory usage would blow up by dozens of gigabytes of RAM.

&synchronized not having an effect in master made this problem go away, and adding -f to curl on our pre-broker clusters fixed those too (a hardened version of the command is sketched below). All the more reason to port the method of distributing this data off of &synchronized. I think I will just run the curl command on the worker nodes too, effectively replacing &synchronized with curl (an example cron entry is also sketched below).

— 
Justin Azoff
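A minimal sketch of the hardened download step, assuming the same file names and $URL placeholder as the original command; -f (long form: --fail) is the only substantive change:

    #!/bin/sh
    # Without -f, curl exits 0 even when the server returns an error page,
    # so the && guard passes and mv overwrites file.csv with garbage.
    # With -f, curl exits non-zero on HTTP 4xx/5xx and file.csv is untouched.
    curl -f -o file.new "$URL" && mv file.new file.csv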
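And a sketch of what fetching the file on each worker could look like; the schedule, paths, and URL here are illustrative assumptions, not the actual cluster config:

    # Hypothetical crontab entry on every worker node: each node pulls its
    # own copy of the feed every 5 minutes, so no set updates ever need to
    # be synced through the proxies.
    */5 * * * * curl -f -o /opt/bro/feeds/file.new https://example.com/feed.csv && mv /opt/bro/feeds/file.new /opt/bro/feeds/file.csv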