> On Jun 1, 2018, at 11:37 AM, Azoff, Justin S <[email protected]> wrote:
>
> I could never figure out what was causing the problem, and it's possible that
> &synchronized not doing anything anymore is why it's better now. I'm mostly
> using &synchronized for syncing input files across all the workers and one of
> them does have 300k entries in it. That file is fairly constant though, only
> a few k changes every 5 minutes and nothing that should use 20G of ram.
FWIW, I figured out what was causing this problem. While the file wasn't
changing that much, I was using something like
curl -o file.new $URL && mv file.new file.csv
to download the file, and apparently unless you pass -f (--fail) to curl, it doesn't
exit with a non-zero status code on HTTP server errors. The && mv would then still
run, so every now and then the csv file ended up containing a server error page
instead of data. When this happened:
* the input reader would throw a warning that the file couldn't be parsed, and
clear out the set
* Bro would then propagate that clear, triggering the removal of 300k items across
all nodes (56 in the case of the test cluster)
* 5 minutes later the next download would work
* Bro would then fill the set back in, triggering another 300k items to be synced
to all 56 nodes
So within 5 minutes, 300,000 * 56 * 2 updates would be kicked off, which is about
33.6 million updates. This seemed to max out the proxies for 30 minutes.
The raw size of the data is only ~4M, or ~261M total across all the nodes, which
makes it a little crazy that memory usage would blow up by dozens of gigabytes of RAM.
&synchronized no longer having an effect in master made this problem go away, and
adding -f to the curl command on our pre-broker clusters fixed those too.
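For reference, the fixed download is just the original command with -f added, something
like:

curl -f -o file.new $URL && mv file.new file.csv

With -f/--fail, curl suppresses the error page and exits non-zero on HTTP errors, so the
&& short-circuits and the old file.csv stays in place.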
All the more reason to move the method of distributing this data off of
&synchronized. I think I will just run the curl command on the worker nodes too,
effectively replacing &synchronized with curl.
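Concretely, that would be something like the following cron entry on each worker
(the 5 minute schedule, paths, and $URL are just placeholders for whatever the real
setup uses), with the input framework on every node then reading its own local copy:

# placeholder schedule/paths/URL; $URL would need to be defined in the crontab
*/5 * * * * curl -fsS -o /path/to/file.new "$URL" && mv /path/to/file.new /path/to/file.csv

A failed download then just leaves the previous file.csv in place on that node
instead of clearing the table cluster-wide.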
—
Justin Azoff