Re: help in tracking down 1481 memory leak (with reproduction steps)

2018-10-10 Thread DC*
Hello,

There's a tentative fix at [1].
If anyone could take a look at it and try it out, that would speed things
up.

I've been running on that branch for 24+ hours with no issues so far.

Best regards,

  [1]: https://github.com/freenet/fred/pull/640


Re: help in tracking down 1481 memory leak (with reproduction steps)

2018-10-08 Thread DC*
This is somewhat preliminary, but I got to this point: [1] (`always use
BouncyCastle in KeyGenUtils`).

I found some (old) posts about memory leaks caused by dynamically
installed providers [2][3][4]. [2] mentions a static way to install
providers.
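
The static registration described there looks roughly like this (the
priority number and the java.security path are only examples, and the
BouncyCastle jar has to be on the classpath; the alternative is a
one-time Security.addProvider() call at startup):

  # add one line to the JRE's java.security file instead of passing a
  # fresh provider instance to every getInstance() call
  security.provider.10=org.bouncycastle.jce.provider.BouncyCastleProvider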

I'm still running tests against both commits: [1] and the previous
one [5].

Best regards,

  [1]: https://github.com/freenet/fred/commit/abad64d133ed9a674d5f666f48db178a85652b9e
  [2]: http://www.bouncycastle.org/wiki/display/JA1/Provider+Installation
  [3]: http://disq.us/p/198y23e
  [4]: http://bouncy-castle.1462172.n4.nabble.com/Re-memory-leak-td4655694.html
  [5]: https://github.com/freenet/fred/commit/8988466283ee43f8ac35c308d4af3fc59172472f


Re: help in tracking down 1481 memory leak (with reproduction steps)

2018-10-08 Thread DC*
On 2018-10-08 18:00, Arne Babenhauserheide wrote:
> DC*  writes:
>
> Do you have experience with profiling Java for memory leaks?
> 

No, I have no experience, but I'm trying to triangulate the issue commit
by commit to see where it was introduced and narrow the scope.

> The only lead I have right now is that something with threading might go
> wrong, since we now have native thread priorities and these might be
> stalling something which would release references to objects.

I saw some commits related to threading, so that may be the case.

I tried to look at the changes between 1480 and 1481, but there are no
tags/branches to compare against easily.
I'm moving between commits by cherry-picking; is there a better way?
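
What I had in mind was something like a bisect between the two builds; a
rough sketch (the build01480/build01481 refs are a guess on my part and
may not exist, so substitute whatever commits correspond to the two
builds):

  git clone https://github.com/freenet/fred.git && cd fred
  # show what changed between the two builds
  git log --oneline build01480..build01481
  # let git walk the history instead of cherry-picking by hand
  git bisect start build01481 build01480
  # build and run the checked-out commit, watch for the leak, then mark it
  git bisect good    # or: git bisect bad
  # repeat until git names the first bad commit, then clean up
  git bisect reset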

Best regards,


Re: help in tracking down 1481 memory leak (with reproduction steps)

2018-10-08 Thread Arne Babenhauserheide

DC*  writes:
> Here are my logs (log.level DEBUG). My node restarted several times at
> 15m, 20m, 30m. The log named `check-alive.log` is the output from the
> gist (it's cut off but shows enough information).

Thank you! Yours is the first reproduction outside my own machines. I
was close to concluding that it’s just something borked here, but it
seems there’s an actual (and serious) problem with 1481.

> If there is anything else I could help with, let me know.

Do you have experience with profiling Java for memory leaks?

The only lead I have right now is that something with threading might go
wrong, since we now have native thread priorities and these might be
stalling something which would release references to objects.
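
If you want to poke at it without a full profiler, a heap dump already
shows what piles up. A rough sketch, assuming a JDK with jmap on the
PATH (the pgrep pattern is just a guess at how the node shows up in the
process list):

  # find the node's Java process; adjust the pattern to your setup
  PID=$(pgrep -f freenet.node | head -n 1)
  # dump the live heap, then open the file in VisualVM or Eclipse MAT
  jmap -dump:live,format=b,file=/tmp/freenet-heap.hprof "$PID"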

Best wishes,
Arne
-- 
To be unpolitical
means to be political
without noticing it




Re: help in tracking down 1481 memory leak (with reproduction steps)

2018-10-08 Thread Arne Babenhauserheide

DC*  writes:
> Are there any debug/logging/stack trace settings we could enable to see
> where it died?

You can set the log level in wrapper.conf; see the
wrapper.logfile.loglevel and wrapper.console.loglevel lines.
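
For reference, the two lines look roughly like this (DEBUG is only an
example level):

  # wrapper.conf in the Freenet install directory
  wrapper.logfile.loglevel=DEBUG
  wrapper.console.loglevel=DEBUG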

> I'm going to set up a container to try this out.

Thank you!

Best wishes,
Arne
-- 
To be unpolitical
means to be political
without noticing it




help in tracking down 1481 memory leak (with reproduction steps)

2018-10-07 Thread Arne Babenhauserheide
Hi,

For the past few weeks I’ve been stumped trying to track down a severe
memory leak which is blocking the release of 1481.

I’m having a hard time pinning down where and why exactly it happens,
so I’d be very grateful for your help.

The following shows how to reproduce the problem on GNU/Linux: getting
Freenet 1481 to crash with an out-of-memory error in less than 30
minutes. The gist is: upload a file.

tee freenet-1481-OOM-reproduction.sh << 'EOF'  # quoted EOF so the script is written verbatim, not expanded now

wget https://github.com/freenet/fred/releases/download/build01481/new_installer_offline_1481.jar
java -jar new_installer_offline_1481.jar
# click through the setup wizard and the in-browser first-run wizard,
# give Freenet high upload bandwidth (e.g. 164 KiB/s)

# give freenet time to start the FCP server
sleep 180

# prepare a file to upload
INSERTFILE="$(mktemp /tmp/insert.temp.XXXXXX)"
head -c 100M < /dev/urandom > "$INSERTFILE"
IDENT=testupload"${INSERTFILE##*.}"

# prepare the command to connect to freenet and upload the file
# connect with HELLO
TEMPFILE="$(mktemp /tmp/insert.temp.XXXXXX)"
echo ClientHello > $TEMPFILE
echo "Name=Upload-Test${INSERTFILE##*.}" >> $TEMPFILE
echo ExpectedVersion=2 >> $TEMPFILE
echo End >> $TEMPFILE
echo >> $TEMPFILE

# upload with ClientPut
echo ClientPut >> $TEMPFILE
echo "DontCompress=true" >> $TEMPFILE
echo "URI=CHK@/testupload" >> $TEMPFILE
echo "Identifier=$IDENT" >> $TEMPFILE
echo MaxRetries=-1 >> $TEMPFILE
echo UploadFrom=direct >> $TEMPFILE
echo DataLength=$(stat -c %s "$INSERTFILE") >> $TEMPFILE
echo Persistence=forever >> $TEMPFILE
echo Global=true >> $TEMPFILE
echo End >> $TEMPFILE
cat $INSERTFILE >> $TEMPFILE

# run the insert
(cat $TEMPFILE | nc 127.0.0.1 9481) &

# watch how long the node lives
# 8888 is the default FProxy (web interface) port; adjust it if you chose another
FPROXY_PORT=8888
for i in {1..100}; do
    curl "http://127.0.0.1:${FPROXY_PORT}/stats/?fproxyAdvancedMode=2" 2>/dev/null \
        | grep -io 'nodeUptimeSession.*<' | grep -io '[^;]*s<' | grep -io '.*s'
    curl "http://127.0.0.1:${FPROXY_PORT}/stats/?fproxyAdvancedMode=2" 2>/dev/null \
        | grep -io '[^>]* java memory.*&' | grep -io '[^&]*'
    sleep 5
done

EOF

I hope this allows you to reproduce the problem — and I would be very
happy if you could find and fix the source of the problem! This has been
blocking the release of 1481 for far too long.

With 33 peers as target (but only up to 25 actually connected, my
connection isn’t that fast), this gets Freenet to die with an OOM in
less than 15 minutes (the last successful stats page load was at 14m17s).

Best wishes,
Arne
--
To be unpolitical
means to be political
without noticing it

