Ha! didn't realize you had already done all that, will read/answer the other email.
Cheers, Albert El dissabte, 3 de març de 2018, a les 20:25:32 CET, Albert Astals Cid va escriure: > El dimecres, 21 de febrer de 2018, a les 7:13:57 CET, Adam Reichold va > > escriure: > > Hello again, > > > > Am 21.02.2018 um 00:31 schrieb Albert Astals Cid: > > > El dimarts, 20 de febrer de 2018, a les 8:58:24 CET, Adam Reichold va > > > > > > escriure: > > >> Hello again, > > >> > > >> Am 18.02.2018 um 23:23 schrieb Adam Reichold: > > >>> Am 18.02.2018 um 23:08 schrieb Albert Astals Cid: > > >>>> El diumenge, 18 de febrer de 2018, a les 16:55:37 CET, Adam Reichold > > >>>> va > > >>>> > > >>>> escriure: > > >>>>> Hello, > > >>>>> > > >>>>> the attached patch replaced Poppler's internal hash table > > >>>>> implementation > > >>>>> GooHash by std::unordered_map found in the C++ standard library > > >>>>> since > > >>>>> C++11. This continues Poppler's slow drift towards standard library > > >>>>> containers and removes one of the home-grown data structures with > > >>>>> the > > >>>>> main goals of reducing code size and improving the long term > > >>>>> maintainability of the code base. > > >>>> > > >>>> Do you have any benchmarks on "rendering" speed and code size? > > >>> > > >>> Sorry, not at hand. I will try to prepare them during the week. > > >> > > >> I did run Splash rendering benchmarks of this branch against master > > >> with > > >> the result of rendering the circa 2400 documents of the TeXLive > > > > > >> documentation present on my machine being: > > > I'm wondering if those 2400 documents are diverse enough, which they may > > > not be given they are all coming from "the same place". > > > > They seem pretty diverse w.r.t. content, some being text heavy and some > > graphics rich. But I guess they are definitely not diverse technically > > as all are prepared using TeX itself. > > > > The main problem on my side is that I have failed to find my DVD copy of > > the Poppler regtest document collection until now. :-\ In any case, I am > > reasonably confident on the zero sum result since GooHash does not seem > > to live in any performance critical code path. > > > > >> Cumulative run time: > > >> Result: 90.95 min ∓ 1.1 % > > >> Reference: 91.57 min ∓ 1.2 % > > >> Deviation: -0.0 % > > >> > > >> Cumulative memory usage: > > >> Result: 37.2 MB ∓ 0.7 % > > >> Reference: 37.0 MB ∓ 0.7 % > > >> Deviation: +0.0 % > > >> > > >> (Where result is this patch and the reference is master.) (The > > >> measurement was taken using the perftest script which I proposed here > > >> some time ago for which I'll attach the patch again for > > >> reproduceability.) > > > > > > Did you really attach this before? i really don't remember it :D > > > > This was a very long-winded thread ending on 2016-01-01 centered around > > the regtest framework. It went from custom driver using QTest's > > benchmark functionality to custom backend in the regtest framework to > > separate Python script. The script still has problems to reliably > > measure really short things like running "pdftotext" for which a custom > > C++ driver might be needed, but for long-running stuff like "pdftoppm", > > the results seem reasonable. > > > > > Anyhow it seems the change is mostly a zero-sum runtime wise. > > > > Indeed. (Although thinking really really long term, I think a Poppler > > that is using std::vector, std::string and friends everywhere, could > > loose a lot of distributed fat in the form of a lot of memory > > indirections and benefit from the highly optimized standard library. But > > it just is not a single patch thing to reap any of these benefits.) > > > > > Which bring us to the code side. First i know it sucks but your spacing > > > is > > > broken, please check the whole patch > > > > > > - nameToGID->removeInt(macGlyphNames[j]); > > > - nameToGID->add(new GooString(macGlyphNames[j]), i); > > > + nameToGID[macGlyphNames[j]] = i; > > > > > > i could fix it, but it's better (for me) if you do :D > > > > Definitely for me to fix. The main problem is that the spacing in those > > files was already broken and I am unsure whether I should fix the > > surrounding spacing even if I am not touching it otherwise. Due to that, > > there sometimes is no correct way in the current patch. If you do not > > say otherwise, I will try to make at least the touched blocks of code > > consistent. > > Are you sure the spacing is broken? I'd say it's just xpdf weird spacing > rules of using 2 soft spaces and then hard tabs at 8. > > > > Now onto the code, > > > > > > const auto emplaceRangeMap = [&](const char* encodingName, GBool > > > unicodeOut,> > > > > > > UnicodeMapRange* ranges, int len) { > > > > > > residentUnicodeMaps.emplace( > > > > > > std::piecewise_construct, > > > std::forward_as_tuple(encodingName), > > > std::forward_as_tuple(encodingName, unicodeOut, ranges, len) > > > > > > ); > > > > > > }; > > > > > > It's the first time in my life i see std::piecewise_construct and > > > std::forward_as_tuple, yes that probably makes me a bad C++ developer, > > > but > > > given there's lots of us around, it doesn't make me happy now i don't > > > understand what the code does. > > > > The problem is that most internal Poppler types lack proper construction > > and assignment operators, hence I need to work harder to construct those > > UnicodeMap instances in-place inside the map (GooHash just stored a > > pointer to it, so there was no problem.) > > > > The whole piecewise_construct and forward_as_tuple dance just ensures, > > that those parameters to emplace are properly grouped before being > > unpacked to become the parameters of std::string and UnicodeMap > > construction again. If UnicodeMap was movable, this would probably look > > like > > > > residentUnicodeMaps.emplace( > > > > encodingName, > > UnicodeMap{encodingName, unicodeOut, ranges, len} > > > > ); > > > > If you like, I can try to make Unicode a move-only type and simplify the > > mentioned helper functions? > > you can give it a try, not sure how easy that's going to be. > > > > Then there's the benefit/risk ratio. The code using GooHash is mostly > > > rocksolid and we haven't probably touched any line in it for a long time > > > and we have probably neither written new code that uses GooHash. > > > > > > In that regard we risk breaking working code. > > > > > > The benefit is not more speed nor less memory usage as your measurements > > > show. > > > > > > Mostly the benefit is "removing code from maintainership", which i agree > > > is a good thing, but as mentioned before it's some code "we mostly > > > ignore", so it's not that much of a good thing. > > > > I very much agree with the risk assessment. > > > > But I also think the code will ossify (or maybe already is?) due to > > those custom data structures and the less than idiomatic C++ usage. > > Hence I think, Poppler would not just loose code, but the remaining code > > should become easier to maintain. (Of course, the piecewise_construct > > fiasco shows that this has intermediate costs. But again, I think this > > is just an incentive to provide types with the usual C++ semantics which > > should make all code using those types better.) > > Yeah, i guess you're right. > > Cheers, > Albert > > > Best regards, Adam. > > > > > So i'm hesitant of what to do. It really sounds like a nice thing to not > > > have custom structures, but on the other hand it's not like they get > > > much > > > in the way of coding. > > > > > > I'd really appreciate other people's comments on this. > > > > > > Cheers, > > > > > > Albert > > >> > > >> I'll also attach the detailed comparison, but the gist seems to be that > > >> if there are significant changes, the run time is reduced but the > > >> memory > > >> usage is increased in the majority of cases. But this does not seem to > > >> show up in the cumulative results. > > >> > > >> Best regards, Adam. > > >> > > >> P.S.: One could try to improve the memory usage by tuning the load > > >> factor or calling shrink_to_fit where appropriate. Would you like me to > > >> try to do this? > > >> > > >> P.P.S.: One obvious area for improvement would be better > > >> interoperability between GooString and std::string, i.e. not converting > > >> them as C strings so that the length information does not need to be > > >> recomputed. I will try to prepare this as a separate patch on top of > > >> this one or should I include this here? > > >> > > >> Best regards, Adam. > > >> > > >>> Concerning code size, a release build of libpoppler.so goes from > > >>> > > >>> text data bss dec hex filename > > >>> > > >>> 2464034 288852 360 2753246 2a02de libpoppler.so.73.0.0 > > >>> > > >>> for the current master to > > >>> > > >>> text data bss dec hex filename > > >>> > > >>> 2482129 288756 360 2771245 2a492d libpoppler.so.73.0.0 > > >>> > > >>> with the patch applied, i.e. a 0.65% increase in binary size. > > >>> > > >>> > > >>> Please note that in my original message, I was referring only to > > >>> source > > >>> code size, i.e. > > >>> > > >>> git diff --stat master...remove-goo-hash > > >>> > > >>> 18 files changed, 168 insertions(+), 803 deletions(-) > > >>> > > >>>> Cheers, > > >>>> > > >>>> Albert > > >>> > > >>> Best regards, Adam. > > >>> > > >>>>> Best regards, Adam. > > >>>> > > >>>> _______________________________________________ > > >>>> poppler mailing list > > >>>> poppler@lists.freedesktop.org > > >>>> https://lists.freedesktop.org/mailman/listinfo/poppler > > >>> > > >>> _______________________________________________ > > >>> poppler mailing list > > >>> poppler@lists.freedesktop.org > > >>> https://lists.freedesktop.org/mailman/listinfo/poppler > > > > > > _______________________________________________ > > > poppler mailing list > > > poppler@lists.freedesktop.org > > > https://lists.freedesktop.org/mailman/listinfo/poppler > > _______________________________________________ > poppler mailing list > poppler@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/poppler _______________________________________________ poppler mailing list poppler@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/poppler