Some of the back and forth with rstory happened on the admin list; some of
it was unicast.  I put together all of the patches, even though there are
now at least 2 fixes for the problem, and published them at
https://github.com/fenner/net-snmp/commits/rstory-getbulk-patches .

I can't imagine that the changes are useful as they are (other than the
added debugging, perhaps) but may be useful to see what kinds of things we
ran into.

And my opinion remains that the fundamental problem here is that the
reverse encoding knows how to stop encoding in time, but doesn't know about
the overhead that usm adds, and for some reason the changes in 5.8 try to
address this by switching to forward encoding, which relies on the state
that was freed (and really probably only helps because forward encoding
uses a fixed-sized 1472-byte buffer).  (It's possible that the delayed free
that Sam introduced helps the "state that was freed" part.  With 3
different solutions for the problem it's hard to figure out exactly how to
proceed.)

  Bill


On Mon, May 20, 2019 at 9:28 AM Bill Fenner <fen...@gmail.com> wrote:

> On Tue, May 14, 2019 at 1:38 PM Bart Van Assche <bvanass...@acm.org>
> wrote:
>
>> On 5/14/19 4:01 PM, Bill Fenner wrote:
>> > Perhaps getbulk no longer dumps core, but I can not get it to return
>> > anything but GENERR any more, and, it seems to leak memory.
>> >
>> > Any "large enough" request seems to fail in this way, e.g.,
>> > snmpbulkget -v 3 ... -Cn 5 -Cr 50 sysUpTime sysUpTime sysUpTime
>> > sysUpTime sysUpTime .1
>> >
>> > This is particularly frustrating because code was added to 5.8 to
>> > rebuild a getbulk reply if it's too big.  But there was already code
>> > to not build the PDU too big, it's just not taking the v3 header into
>> > account properly (that's my best guess, at least).  So now there are
>> > two different mechanisms to send a "right-size" reply and they both
>> > don't work.  Around 5.8 release time I was working with Robert Story
>> > to fix this, but that effort kind of petered out and Robert's work
>> > didn't make it into git.
>> >
>> > Can anyone get getbulk to work in the current 5.8-patches with SNMPv3?
>> >
>> > I've attached a test script with 504 failing test cases when I run it
>> > against an unmodified V5-8-patches branch, and net-snmp leaks about a
>> > meg of RAM during the test.  This is an adaptation of my internal test
>> > to the net-snmp test framework; the complete test would use all
>> > supported values for -l, -a, -x but for now this is the simplest one
>> > using nanp.
>>
>>
>> Hi Bill,
>>
>> A new test has been added
>> (testing/fulltests/default/T0221snmpbulkget_large_simple). That test
>> passes on my setup. Can you have a look whether that test covers the
>> issue you ran into?
>>
>
> It doesn't - the problems arise when the agent decides that the message is
> too big and doesn't fit in the buffer.  (E.g., the usage
> of --sendMessageMaxSize in my test, or, the repetition of .1 .1 .1 .1 on
> the snmpbulkget command line).
>
> Regarding the "memory leak": RSS is not a reliable source of information
>> to detect memory leaks. If you want to verify whether or not a new
>> memory leak has been introduced please use Valgrind.
>>
>
> Yes, my internal test uses valgrind and shows the leaks, I was using RSS
> growth as a proxy.  I also used a script something like
> http://www.net-snmp.org/wiki/index.php/Running_Net-SNMP_Regression_Tests_under_Valgrind
>  and
> it showed leaks when I ran the test that I sent, e.g.,
>
> ==32438== 31,584 (8,736 direct, 22,848 indirect) bytes in 168 blocks are
> definitely lost in loss record 756 of 758
> ==32438==    at 0x402D05E: calloc (vg_replace_malloc.c:711)
> ==32438==    by 0x430EA7A: usm_malloc_usmStateReference (snmpusm.c:288)
> ==32438==    by 0x43117A7: usm_process_in_msg (snmpusm.c:2431)
> ==32438==    by 0x4312B44: usm_secmod_process_in_msg (snmpusm.c:2335)
> ==32438==    by 0x42D45EF: snmpv3_parse (snmp_api.c:3938)
>
> When I run your new test, I agree it passes:
> ~/src/test-for-bart/testing @fenner-test-for-bart.sjc% ./RUNFULLTESTS -r
> T0221
> large SNMPv3 bulkget .. ok
>
> If I add "--sendMessageMaxSize=1472", then it fails in the same way as my
> test fails: returning genError.
>
> Your subsequent patch may address this problem if the manager does not
> specify a max response size, but as this is part of the SNMPv3 protocol and
> we do not necessarily control what the manager does, it's not sufficient.
>
>   Bill
>
>
_______________________________________________
Net-snmp-coders mailing list
Net-snmp-coders@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/net-snmp-coders

Reply via email to