Mark, I like the jar task. I have a hint for any student who wants to take it on. Harmony's jar-reading code has numerous limitations and assumptions (e.g. Harmony limits the size of a jar file). It is important to keep most of these limitations as they are and to resist the urge to eliminate them all at once. Otherwise, instead of a performance gain, one may find that popular applications slow down.
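
To make the hint concrete, here is a rough sketch of the open/mmap/close sequence from the quoted message, with a size limit kept in place rather than removed. It is only an illustration under assumptions: the class name JarMapSketch, the method load, and the 64 MB limit are all hypothetical and are not taken from Harmony's code or from the RI.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class JarMapSketch {
    // Hypothetical limit for illustration only; not Harmony's actual value.
    static final long MAX_JAR_SIZE = 64L * 1024 * 1024;

    /**
     * Maps a jar file read-only and closes the channel straight away,
     * so no file descriptor stays open while the contents are in use.
     * Files larger than maxSize are still rejected: the limit is kept.
     */
    static ByteBuffer load(Path jar, long maxSize) throws IOException {
        // try-with-resources closes the channel (the "close" step)
        try (FileChannel ch = FileChannel.open(jar, StandardOpenOption.READ)) {
            long size = ch.size();
            if (size > maxSize) {
                throw new IOException("jar too large: " + size + " bytes");
            }
            if (size == 0) {
                // Nothing to map; return an empty buffer instead.
                return ByteBuffer.allocate(0);
            }
            // The mapping stays valid after the channel is closed.
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, size);
        }
    }

    static ByteBuffer load(Path jar) throws IOException {
        return load(jar, MAX_JAR_SIZE);
    }
}
```

The returned buffer can then be read by absolute index, e.g. load(path).get(0), in place of a seek followed by a read. Whether this is actually faster for popular applications would have to be measured.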
Thanks.

On Sun, Mar 1, 2009 at 5:05 PM, Mark Hindess <[email protected]> wrote:
>
> In message <[email protected]>, Mark Hindess writes:
>>
>>
>> In message <[email protected]>,
>> Sian January writes:
>> >
>> > Hi everyone,
>> >
>> > Do we want to propose any projects for Google Summer of Code 2009? It
>> > was quite successful last year for Harmony, with two students
>> > completing the programme, so definitely worth doing in my opinion.
>> >
>> > http://code.google.com/soc/
>> >
>> > Thanks,
>> > Sian
>>
>> I've a couple of items on my todo list that might make an interesting
>> GSoC project. While looking at file descriptor usage between Harmony
>> and RI I noticed that the RI typically reads jar files with an
>> open/mmap/close sequence and then uses the mapped memory to access the
>> file. Harmony uses open and uses seek/read to access the file. There
>> are a couple of issues here:
>>
>> * some applications that use lots of jar files will not work on Harmony
>> because they will run out of file descriptors even though they will
>> work on the RI
>
> I notice while looking at the strace from the latest "trival" test case
> in the "Problems with NIO" thread that on the RI the client connect
> socket is always fd=4 whereas on DRLVM it is fd=110 so the difference
> is quite significant. This got me wondering what the difference would
> be when running something like Eclipse with lots of plugin jars. Just
> loading a fairly trivial workspace on Sun and DRLVM results in using
> 586 and 674 file descriptors respectively. So it looks like not all
> jars are loaded using the mmap trick but DRLVM would still run out of
> descriptors roughly 100 sooner than the RI.
>
> -Mark
>
>> * code with memory access rather than seek/read will be a lot simpler
>> to read/maintain
>>
>> * what are the performance implications?
>>
>> I'd quite like to investigate this but don't seem to be finding the time.
>>
>> It might also be interesting to explore the possibility of exploiting
>> parallelism (compare gzip/pigz).
>>
>> It might also be worth seeing if there is any performance benefit to using
>> the inflateBack api (compare gzip/gun - gun is in the zlib source examples
>> directory).
>>
>> If people think these ideas are concrete enough to explore then I'll add
>> an item to the wiki.
>>
>> Regards,
>> Mark.
>>
>
>

--
With best regards,
Alexei Fedotov,
http://people.apache.org/~aaf/
