El dimecres, 2 de maig de 2018, a les 19:11:19 CEST, Adam Reichold va escriure: > Hello, > > Am 01.05.2018 um 23:57 schrieb Albert Astals Cid: > > El diumenge, 4 de març de 2018, a les 22:59:20 CEST, Albert Astals Cid va > > > > escriure: > >> El dissabte, 3 de març de 2018, a les 20:51:55 CET, Adam Reichold va > > > > escriure: > >>> Hello Albert, > >>> > >>> please note that in the previous patch, I made a mistake with the move > >>> constructor (the ref count should not be moved/swapped since it is tied > >>> to the instance, not its value). Attached is a fixed patch. > >>> > >>> I found the issue during further performance tests which however only > >>> continued to strengthen to zero sum result. In any case, I extend the > >>> Python-based perftest script with a C++-based driver to be able to > >>> reliably measure single-page runtimes for this, which I will also attach > >>> if anyone is interested in this. > >> > >> Ok, so it seems there's a mild agreement (or at least no strong > >> disagreement) that this may be the way forward. I will first wait to get > >> poppler 0.63 out of the door (that is already late several months, > >> between > >> freedesktop shutting down ssh access, them forgetting to tell us it was > >> open again, my disk dying and then me procastinating as usual) and once > >> it's out I'll come back and reevaluate this. > > > > I was trying to apply the patch for running a quick regtest but running > > git am over it gave me some warning/error I wasn't sure how to get out of > > correctly.
regtest results are good, i guess i'll have a second quick look at the patches and unless someone disagrees i'll push them. Cheers, Albert > > > > Could you try rebasing the patch on top of current master? > > > > Cheers, > > > > Albert > > Attached is a version rebased against the current master. > > Best regards, > Adam > > >> Cheers, > >> > >> Albert > >>> > >>> Best regards, Adam. > >>> > >>> Am 03.03.2018 um 20:29 schrieb Albert Astals Cid: > >>>> Ha! didn't realize you had already done all that, will read/answer the > >>>> other email. > >>>> > >>>> Cheers, > >>>> > >>>> Albert > >>>> > >>>> El dissabte, 3 de març de 2018, a les 20:25:32 CET, Albert Astals Cid > >>>> va > >>>> > >>>> escriure: > >>>>> El dimecres, 21 de febrer de 2018, a les 7:13:57 CET, Adam Reichold va > >>>>> > >>>>> escriure: > >>>>>> Hello again, > >>>>>> > >>>>>> Am 21.02.2018 um 00:31 schrieb Albert Astals Cid: > >>>>>>> El dimarts, 20 de febrer de 2018, a les 8:58:24 CET, Adam Reichold > >>>>>>> va > >>>>>>> > >>>>>>> escriure: > >>>>>>>> Hello again, > >>>>>>>> > >>>>>>>> Am 18.02.2018 um 23:23 schrieb Adam Reichold: > >>>>>>>>> Am 18.02.2018 um 23:08 schrieb Albert Astals Cid: > >>>>>>>>>> El diumenge, 18 de febrer de 2018, a les 16:55:37 CET, Adam > >>>>>>>>>> Reichold > >>>>>>>>>> va > >>>>>>>>>> > >>>>>>>>>> escriure: > >>>>>>>>>>> Hello, > >>>>>>>>>>> > >>>>>>>>>>> the attached patch replaced Poppler's internal hash table > >>>>>>>>>>> implementation > >>>>>>>>>>> GooHash by std::unordered_map found in the C++ standard library > >>>>>>>>>>> since > >>>>>>>>>>> C++11. This continues Poppler's slow drift towards standard > >>>>>>>>>>> library > >>>>>>>>>>> containers and removes one of the home-grown data structures > >>>>>>>>>>> with > >>>>>>>>>>> the > >>>>>>>>>>> main goals of reducing code size and improving the long term > >>>>>>>>>>> maintainability of the code base. > >>>>>>>>>> > >>>>>>>>>> Do you have any benchmarks on "rendering" speed and code size? > >>>>>>>>> > >>>>>>>>> Sorry, not at hand. I will try to prepare them during the week. > >>>>>>>> > >>>>>>>> I did run Splash rendering benchmarks of this branch against master > >>>>>>>> with > >>>>>>>> the result of rendering the circa 2400 documents of the TeXLive > >>>>>>> > >>>>>>>> documentation present on my machine being: > >>>>>>> I'm wondering if those 2400 documents are diverse enough, which they > >>>>>>> may > >>>>>>> not be given they are all coming from "the same place". > >>>>>> > >>>>>> They seem pretty diverse w.r.t. content, some being text heavy and > >>>>>> some > >>>>>> graphics rich. But I guess they are definitely not diverse > >>>>>> technically > >>>>>> as all are prepared using TeX itself. > >>>>>> > >>>>>> The main problem on my side is that I have failed to find my DVD copy > >>>>>> of > >>>>>> the Poppler regtest document collection until now. :-\ In any case, I > >>>>>> am > >>>>>> reasonably confident on the zero sum result since GooHash does not > >>>>>> seem > >>>>>> to live in any performance critical code path. > >>>>>> > >>>>>>>> Cumulative run time: > >>>>>>>> Result: 90.95 min ∓ 1.1 % > >>>>>>>> Reference: 91.57 min ∓ 1.2 % > >>>>>>>> Deviation: -0.0 % > >>>>>>>> > >>>>>>>> Cumulative memory usage: > >>>>>>>> Result: 37.2 MB ∓ 0.7 % > >>>>>>>> Reference: 37.0 MB ∓ 0.7 % > >>>>>>>> Deviation: +0.0 % > >>>>>>>> > >>>>>>>> (Where result is this patch and the reference is master.) (The > >>>>>>>> measurement was taken using the perftest script which I proposed > >>>>>>>> here > >>>>>>>> some time ago for which I'll attach the patch again for > >>>>>>>> reproduceability.) > >>>>>>> > >>>>>>> Did you really attach this before? i really don't remember it :D > >>>>>> > >>>>>> This was a very long-winded thread ending on 2016-01-01 centered > >>>>>> around > >>>>>> the regtest framework. It went from custom driver using QTest's > >>>>>> benchmark functionality to custom backend in the regtest framework to > >>>>>> separate Python script. The script still has problems to reliably > >>>>>> measure really short things like running "pdftotext" for which a > >>>>>> custom > >>>>>> C++ driver might be needed, but for long-running stuff like > >>>>>> "pdftoppm", > >>>>>> the results seem reasonable. > >>>>>> > >>>>>>> Anyhow it seems the change is mostly a zero-sum runtime wise. > >>>>>> > >>>>>> Indeed. (Although thinking really really long term, I think a Poppler > >>>>>> that is using std::vector, std::string and friends everywhere, could > >>>>>> loose a lot of distributed fat in the form of a lot of memory > >>>>>> indirections and benefit from the highly optimized standard library. > >>>>>> But > >>>>>> it just is not a single patch thing to reap any of these benefits.) > >>>>>> > >>>>>>> Which bring us to the code side. First i know it sucks but your > >>>>>>> spacing > >>>>>>> is > >>>>>>> broken, please check the whole patch > >>>>>>> > >>>>>>> - nameToGID->removeInt(macGlyphNames[j]); > >>>>>>> - nameToGID->add(new GooString(macGlyphNames[j]), i); > >>>>>>> + nameToGID[macGlyphNames[j]] = i; > >>>>>>> > >>>>>>> i could fix it, but it's better (for me) if you do :D > >>>>>> > >>>>>> Definitely for me to fix. The main problem is that the spacing in > >>>>>> those > >>>>>> files was already broken and I am unsure whether I should fix the > >>>>>> surrounding spacing even if I am not touching it otherwise. Due to > >>>>>> that, > >>>>>> there sometimes is no correct way in the current patch. If you do not > >>>>>> say otherwise, I will try to make at least the touched blocks of code > >>>>>> consistent. > >>>>> > >>>>> Are you sure the spacing is broken? I'd say it's just xpdf weird > >>>>> spacing > >>>>> rules of using 2 soft spaces and then hard tabs at 8. > >>>>> > >>>>>>> Now onto the code, > >>>>>>> > >>>>>>> const auto emplaceRangeMap = [&](const char* encodingName, GBool > >>>>>>> unicodeOut,> > >>>>>>> > >>>>>>> UnicodeMapRange* ranges, int len) { > >>>>>>> > >>>>>>> residentUnicodeMaps.emplace( > >>>>>>> > >>>>>>> std::piecewise_construct, > >>>>>>> std::forward_as_tuple(encodingName), > >>>>>>> std::forward_as_tuple(encodingName, unicodeOut, ranges, len) > >>>>>>> > >>>>>>> ); > >>>>>>> > >>>>>>> }; > >>>>>>> > >>>>>>> It's the first time in my life i see std::piecewise_construct and > >>>>>>> std::forward_as_tuple, yes that probably makes me a bad C++ > >>>>>>> developer, > >>>>>>> but > >>>>>>> given there's lots of us around, it doesn't make me happy now i > >>>>>>> don't > >>>>>>> understand what the code does. > >>>>>> > >>>>>> The problem is that most internal Poppler types lack proper > >>>>>> construction > >>>>>> and assignment operators, hence I need to work harder to construct > >>>>>> those > >>>>>> UnicodeMap instances in-place inside the map (GooHash just stored a > >>>>>> pointer to it, so there was no problem.) > >>>>>> > >>>>>> The whole piecewise_construct and forward_as_tuple dance just > >>>>>> ensures, > >>>>>> that those parameters to emplace are properly grouped before being > >>>>>> unpacked to become the parameters of std::string and UnicodeMap > >>>>>> construction again. If UnicodeMap was movable, this would probably > >>>>>> look > >>>>>> like > >>>>>> > >>>>>> residentUnicodeMaps.emplace( > >>>>>> > >>>>>> encodingName, > >>>>>> UnicodeMap{encodingName, unicodeOut, ranges, len} > >>>>>> > >>>>>> ); > >>>>>> > >>>>>> If you like, I can try to make Unicode a move-only type and simplify > >>>>>> the > >>>>>> mentioned helper functions? > >>>>> > >>>>> you can give it a try, not sure how easy that's going to be. > >>>>> > >>>>>>> Then there's the benefit/risk ratio. The code using GooHash is > >>>>>>> mostly > >>>>>>> rocksolid and we haven't probably touched any line in it for a long > >>>>>>> time > >>>>>>> and we have probably neither written new code that uses GooHash. > >>>>>>> > >>>>>>> In that regard we risk breaking working code. > >>>>>>> > >>>>>>> The benefit is not more speed nor less memory usage as your > >>>>>>> measurements > >>>>>>> show. > >>>>>>> > >>>>>>> Mostly the benefit is "removing code from maintainership", which i > >>>>>>> agree > >>>>>>> is a good thing, but as mentioned before it's some code "we mostly > >>>>>>> ignore", so it's not that much of a good thing. > >>>>>> > >>>>>> I very much agree with the risk assessment. > >>>>>> > >>>>>> But I also think the code will ossify (or maybe already is?) due to > >>>>>> those custom data structures and the less than idiomatic C++ usage. > >>>>>> Hence I think, Poppler would not just loose code, but the remaining > >>>>>> code > >>>>>> should become easier to maintain. (Of course, the piecewise_construct > >>>>>> fiasco shows that this has intermediate costs. But again, I think > >>>>>> this > >>>>>> is just an incentive to provide types with the usual C++ semantics > >>>>>> which > >>>>>> should make all code using those types better.) > >>>>> > >>>>> Yeah, i guess you're right. > >>>>> > >>>>> Cheers, > >>>>> > >>>>> Albert > >>>>>> > >>>>>> Best regards, Adam. > >>>>>> > >>>>>>> So i'm hesitant of what to do. It really sounds like a nice thing to > >>>>>>> not > >>>>>>> have custom structures, but on the other hand it's not like they get > >>>>>>> much > >>>>>>> in the way of coding. > >>>>>>> > >>>>>>> I'd really appreciate other people's comments on this. > >>>>>>> > >>>>>>> Cheers, > >>>>>>> > >>>>>>> Albert > >>>>>>>> > >>>>>>>> I'll also attach the detailed comparison, but the gist seems to be > >>>>>>>> that > >>>>>>>> if there are significant changes, the run time is reduced but the > >>>>>>>> memory > >>>>>>>> usage is increased in the majority of cases. But this does not seem > >>>>>>>> to > >>>>>>>> show up in the cumulative results. > >>>>>>>> > >>>>>>>> Best regards, Adam. > >>>>>>>> > >>>>>>>> P.S.: One could try to improve the memory usage by tuning the load > >>>>>>>> factor or calling shrink_to_fit where appropriate. Would you like > >>>>>>>> me > >>>>>>>> to > >>>>>>>> try to do this? > >>>>>>>> > >>>>>>>> P.P.S.: One obvious area for improvement would be better > >>>>>>>> interoperability between GooString and std::string, i.e. not > >>>>>>>> converting > >>>>>>>> them as C strings so that the length information does not need to > >>>>>>>> be > >>>>>>>> recomputed. I will try to prepare this as a separate patch on top > >>>>>>>> of > >>>>>>>> this one or should I include this here? > >>>>>>>> > >>>>>>>> Best regards, Adam. > >>>>>>>> > >>>>>>>>> Concerning code size, a release build of libpoppler.so goes from > >>>>>>>>> > >>>>>>>>> text data bss dec hex filename > >>>>>>>>> > >>>>>>>>> 2464034 288852 360 2753246 2a02de libpoppler.so.73.0.0 > >>>>>>>>> > >>>>>>>>> for the current master to > >>>>>>>>> > >>>>>>>>> text data bss dec hex filename > >>>>>>>>> > >>>>>>>>> 2482129 288756 360 2771245 2a492d libpoppler.so.73.0.0 > >>>>>>>>> > >>>>>>>>> with the patch applied, i.e. a 0.65% increase in binary size. > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> Please note that in my original message, I was referring only to > >>>>>>>>> source > >>>>>>>>> code size, i.e. > >>>>>>>>> > >>>>>>>>> git diff --stat master...remove-goo-hash > >>>>>>>>> > >>>>>>>>> 18 files changed, 168 insertions(+), 803 deletions(-) > >>>>>>>>> > >>>>>>>>>> Cheers, > >>>>>>>>>> > >>>>>>>>>> Albert > >>>>>>>>> > >>>>>>>>> Best regards, Adam. > >>>>>>>>> > >>>>>>>>>>> Best regards, Adam. > >>>>>>>>>> > >>>>>>>>>> _______________________________________________ > >>>>>>>>>> poppler mailing list > >>>>>>>>>> poppler@lists.freedesktop.org > >>>>>>>>>> https://lists.freedesktop.org/mailman/listinfo/poppler > >>>>>>>>> > >>>>>>>>> _______________________________________________ > >>>>>>>>> poppler mailing list > >>>>>>>>> poppler@lists.freedesktop.org > >>>>>>>>> https://lists.freedesktop.org/mailman/listinfo/poppler > >>>>>>> > >>>>>>> _______________________________________________ > >>>>>>> poppler mailing list > >>>>>>> poppler@lists.freedesktop.org > >>>>>>> https://lists.freedesktop.org/mailman/listinfo/poppler > >>>>> > >>>>> _______________________________________________ > >>>>> poppler mailing list > >>>>> poppler@lists.freedesktop.org > >>>>> https://lists.freedesktop.org/mailman/listinfo/poppler > >>>> > >>>> _______________________________________________ > >>>> poppler mailing list > >>>> poppler@lists.freedesktop.org > >>>> https://lists.freedesktop.org/mailman/listinfo/poppler > >> > >> _______________________________________________ > >> poppler mailing list > >> poppler@lists.freedesktop.org > >> https://lists.freedesktop.org/mailman/listinfo/poppler > > > > _______________________________________________ > > poppler mailing list > > poppler@lists.freedesktop.org > > https://lists.freedesktop.org/mailman/listinfo/poppler _______________________________________________ poppler mailing list poppler@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/poppler