Re: [poppler] [RFC] Replace GooHash by std::unordered_map
El dijous, 3 de maig de 2018, a les 0:57:21 CEST, Albert Astals Cid va escriure: > El dimecres, 2 de maig de 2018, a les 19:11:19 CEST, Adam Reichold va > > escriure: > > Hello, > > > > Am 01.05.2018 um 23:57 schrieb Albert Astals Cid: > > > El diumenge, 4 de març de 2018, a les 22:59:20 CEST, Albert Astals Cid > > > va > > > > > > escriure: > > >> El dissabte, 3 de març de 2018, a les 20:51:55 CET, Adam Reichold va > > > > > > escriure: > > >>> Hello Albert, > > >>> > > >>> please note that in the previous patch, I made a mistake with the move > > >>> constructor (the ref count should not be moved/swapped since it is > > >>> tied > > >>> to the instance, not its value). Attached is a fixed patch. > > >>> > > >>> I found the issue during further performance tests which however only > > >>> continued to strengthen to zero sum result. In any case, I extend the > > >>> Python-based perftest script with a C++-based driver to be able to > > >>> reliably measure single-page runtimes for this, which I will also > > >>> attach > > >>> if anyone is interested in this. > > >> > > >> Ok, so it seems there's a mild agreement (or at least no strong > > >> disagreement) that this may be the way forward. I will first wait to > > >> get > > >> poppler 0.63 out of the door (that is already late several months, > > >> between > > >> freedesktop shutting down ssh access, them forgetting to tell us it was > > >> open again, my disk dying and then me procastinating as usual) and once > > >> it's out I'll come back and reevaluate this. > > > > > > I was trying to apply the patch for running a quick regtest but running > > > git am over it gave me some warning/error I wasn't sure how to get out > > > of > > > correctly. > > regtest results are good, i guess i'll have a second quick look at the > patches and unless someone disagrees i'll push them. It's in :) Cheers, Albert ___ poppler mailing list poppler@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/poppler
Re: [poppler] [RFC] Replace GooHash by std::unordered_map
El dimecres, 2 de maig de 2018, a les 19:11:19 CEST, Adam Reichold va escriure: > Hello, > > Am 01.05.2018 um 23:57 schrieb Albert Astals Cid: > > El diumenge, 4 de març de 2018, a les 22:59:20 CEST, Albert Astals Cid va > > > > escriure: > >> El dissabte, 3 de març de 2018, a les 20:51:55 CET, Adam Reichold va > > > > escriure: > >>> Hello Albert, > >>> > >>> please note that in the previous patch, I made a mistake with the move > >>> constructor (the ref count should not be moved/swapped since it is tied > >>> to the instance, not its value). Attached is a fixed patch. > >>> > >>> I found the issue during further performance tests which however only > >>> continued to strengthen to zero sum result. In any case, I extend the > >>> Python-based perftest script with a C++-based driver to be able to > >>> reliably measure single-page runtimes for this, which I will also attach > >>> if anyone is interested in this. > >> > >> Ok, so it seems there's a mild agreement (or at least no strong > >> disagreement) that this may be the way forward. I will first wait to get > >> poppler 0.63 out of the door (that is already late several months, > >> between > >> freedesktop shutting down ssh access, them forgetting to tell us it was > >> open again, my disk dying and then me procastinating as usual) and once > >> it's out I'll come back and reevaluate this. > > > > I was trying to apply the patch for running a quick regtest but running > > git am over it gave me some warning/error I wasn't sure how to get out of > > correctly. regtest results are good, i guess i'll have a second quick look at the patches and unless someone disagrees i'll push them. Cheers, Albert > > > > Could you try rebasing the patch on top of current master? > > > > Cheers, > > > > Albert > > Attached is a version rebased against the current master. > > Best regards, > Adam > > >> Cheers, > >> > >> Albert > >>> > >>> Best regards, Adam. > >>> > >>> Am 03.03.2018 um 20:29 schrieb Albert Astals Cid: > Ha! didn't realize you had already done all that, will read/answer the > other email. > > Cheers, > > Albert > > El dissabte, 3 de març de 2018, a les 20:25:32 CET, Albert Astals Cid > va > > escriure: > > El dimecres, 21 de febrer de 2018, a les 7:13:57 CET, Adam Reichold va > > > > escriure: > >> Hello again, > >> > >> Am 21.02.2018 um 00:31 schrieb Albert Astals Cid: > >>> El dimarts, 20 de febrer de 2018, a les 8:58:24 CET, Adam Reichold > >>> va > >>> > >>> escriure: > Hello again, > > Am 18.02.2018 um 23:23 schrieb Adam Reichold: > > Am 18.02.2018 um 23:08 schrieb Albert Astals Cid: > >> El diumenge, 18 de febrer de 2018, a les 16:55:37 CET, Adam > >> Reichold > >> va > >> > >> escriure: > >>> Hello, > >>> > >>> the attached patch replaced Poppler's internal hash table > >>> implementation > >>> GooHash by std::unordered_map found in the C++ standard library > >>> since > >>> C++11. This continues Poppler's slow drift towards standard > >>> library > >>> containers and removes one of the home-grown data structures > >>> with > >>> the > >>> main goals of reducing code size and improving the long term > >>> maintainability of the code base. > >> > >> Do you have any benchmarks on "rendering" speed and code size? > > > > Sorry, not at hand. I will try to prepare them during the week. > > I did run Splash rendering benchmarks of this branch against master > with > the result of rendering the circa 2400 documents of the TeXLive > >>> > documentation present on my machine being: > >>> I'm wondering if those 2400 documents are diverse enough, which they > >>> may > >>> not be given they are all coming from "the same place". > >> > >> They seem pretty diverse w.r.t. content, some being text heavy and > >> some > >> graphics rich. But I guess they are definitely not diverse > >> technically > >> as all are prepared using TeX itself. > >> > >> The main problem on my side is that I have failed to find my DVD copy > >> of > >> the Poppler regtest document collection until now. :-\ In any case, I > >> am > >> reasonably confident on the zero sum result since GooHash does not > >> seem > >> to live in any performance critical code path. > >> > Cumulative run time: > Result: 90.95 min ∓ 1.1 % > Reference: 91.57 min ∓ 1.2 % > Deviation: -0.0 % > > Cumulative memory usage: > Result: 37.2 MB ∓ 0.7 % > Reference: 37.0 MB ∓ 0.7 % > Deviation: +0.0 % > > (
Re: [poppler] [RFC] Replace GooHash by std::unordered_map
Hello, Am 01.05.2018 um 23:57 schrieb Albert Astals Cid: > El diumenge, 4 de març de 2018, a les 22:59:20 CEST, Albert Astals Cid va > escriure: >> El dissabte, 3 de març de 2018, a les 20:51:55 CET, Adam Reichold va > escriure: >>> Hello Albert, >>> >>> please note that in the previous patch, I made a mistake with the move >>> constructor (the ref count should not be moved/swapped since it is tied >>> to the instance, not its value). Attached is a fixed patch. >>> >>> I found the issue during further performance tests which however only >>> continued to strengthen to zero sum result. In any case, I extend the >>> Python-based perftest script with a C++-based driver to be able to >>> reliably measure single-page runtimes for this, which I will also attach >>> if anyone is interested in this. >> >> Ok, so it seems there's a mild agreement (or at least no strong >> disagreement) that this may be the way forward. I will first wait to get >> poppler 0.63 out of the door (that is already late several months, between >> freedesktop shutting down ssh access, them forgetting to tell us it was >> open again, my disk dying and then me procastinating as usual) and once >> it's out I'll come back and reevaluate this. > > I was trying to apply the patch for running a quick regtest but running git > am > over it gave me some warning/error I wasn't sure how to get out of correctly. > > Could you try rebasing the patch on top of current master? > > Cheers, > Albert Attached is a version rebased against the current master. Best regards, Adam >> >> Cheers, >> Albert >> >>> Best regards, Adam. >>> >>> Am 03.03.2018 um 20:29 schrieb Albert Astals Cid: Ha! didn't realize you had already done all that, will read/answer the other email. Cheers, Albert El dissabte, 3 de març de 2018, a les 20:25:32 CET, Albert Astals Cid va escriure: > El dimecres, 21 de febrer de 2018, a les 7:13:57 CET, Adam Reichold va > > escriure: >> Hello again, >> >> Am 21.02.2018 um 00:31 schrieb Albert Astals Cid: >>> El dimarts, 20 de febrer de 2018, a les 8:58:24 CET, Adam Reichold va >>> >>> escriure: Hello again, Am 18.02.2018 um 23:23 schrieb Adam Reichold: > Am 18.02.2018 um 23:08 schrieb Albert Astals Cid: >> El diumenge, 18 de febrer de 2018, a les 16:55:37 CET, Adam >> Reichold >> va >> >> escriure: >>> Hello, >>> >>> the attached patch replaced Poppler's internal hash table >>> implementation >>> GooHash by std::unordered_map found in the C++ standard library >>> since >>> C++11. This continues Poppler's slow drift towards standard >>> library >>> containers and removes one of the home-grown data structures with >>> the >>> main goals of reducing code size and improving the long term >>> maintainability of the code base. >> >> Do you have any benchmarks on "rendering" speed and code size? > > Sorry, not at hand. I will try to prepare them during the week. I did run Splash rendering benchmarks of this branch against master with the result of rendering the circa 2400 documents of the TeXLive >>> documentation present on my machine being: >>> I'm wondering if those 2400 documents are diverse enough, which they >>> may >>> not be given they are all coming from "the same place". >> >> They seem pretty diverse w.r.t. content, some being text heavy and >> some >> graphics rich. But I guess they are definitely not diverse technically >> as all are prepared using TeX itself. >> >> The main problem on my side is that I have failed to find my DVD copy >> of >> the Poppler regtest document collection until now. :-\ In any case, I >> am >> reasonably confident on the zero sum result since GooHash does not >> seem >> to live in any performance critical code path. >> Cumulative run time: Result: 90.95 min ∓ 1.1 % Reference: 91.57 min ∓ 1.2 % Deviation: -0.0 % Cumulative memory usage: Result: 37.2 MB ∓ 0.7 % Reference: 37.0 MB ∓ 0.7 % Deviation: +0.0 % (Where result is this patch and the reference is master.) (The measurement was taken using the perftest script which I proposed here some time ago for which I'll attach the patch again for reproduceability.) >>> >>> Did you really attach this before? i really don't remember it :D >> >> This was a very long-winded thread ending on 2016-01-01 centered >> around >> the regtest framework. It went from custom driver using QTest's >> benchmark functionality to custom backend in the regtest framew
Re: [poppler] [RFC] Replace GooHash by std::unordered_map
El diumenge, 4 de març de 2018, a les 22:59:20 CEST, Albert Astals Cid va escriure: > El dissabte, 3 de març de 2018, a les 20:51:55 CET, Adam Reichold va escriure: > > Hello Albert, > > > > please note that in the previous patch, I made a mistake with the move > > constructor (the ref count should not be moved/swapped since it is tied > > to the instance, not its value). Attached is a fixed patch. > > > > I found the issue during further performance tests which however only > > continued to strengthen to zero sum result. In any case, I extend the > > Python-based perftest script with a C++-based driver to be able to > > reliably measure single-page runtimes for this, which I will also attach > > if anyone is interested in this. > > Ok, so it seems there's a mild agreement (or at least no strong > disagreement) that this may be the way forward. I will first wait to get > poppler 0.63 out of the door (that is already late several months, between > freedesktop shutting down ssh access, them forgetting to tell us it was > open again, my disk dying and then me procastinating as usual) and once > it's out I'll come back and reevaluate this. I was trying to apply the patch for running a quick regtest but running git am over it gave me some warning/error I wasn't sure how to get out of correctly. Could you try rebasing the patch on top of current master? Cheers, Albert > > Cheers, > Albert > > > Best regards, Adam. > > > > Am 03.03.2018 um 20:29 schrieb Albert Astals Cid: > > > Ha! didn't realize you had already done all that, will read/answer the > > > other email. > > > > > > Cheers, > > > > > > Albert > > > > > > El dissabte, 3 de març de 2018, a les 20:25:32 CET, Albert Astals Cid va > > > > > > escriure: > > >> El dimecres, 21 de febrer de 2018, a les 7:13:57 CET, Adam Reichold va > > >> > > >> escriure: > > >>> Hello again, > > >>> > > >>> Am 21.02.2018 um 00:31 schrieb Albert Astals Cid: > > El dimarts, 20 de febrer de 2018, a les 8:58:24 CET, Adam Reichold va > > > > escriure: > > > Hello again, > > > > > > Am 18.02.2018 um 23:23 schrieb Adam Reichold: > > >> Am 18.02.2018 um 23:08 schrieb Albert Astals Cid: > > >>> El diumenge, 18 de febrer de 2018, a les 16:55:37 CET, Adam > > >>> Reichold > > >>> va > > >>> > > >>> escriure: > > Hello, > > > > the attached patch replaced Poppler's internal hash table > > implementation > > GooHash by std::unordered_map found in the C++ standard library > > since > > C++11. This continues Poppler's slow drift towards standard > > library > > containers and removes one of the home-grown data structures with > > the > > main goals of reducing code size and improving the long term > > maintainability of the code base. > > >>> > > >>> Do you have any benchmarks on "rendering" speed and code size? > > >> > > >> Sorry, not at hand. I will try to prepare them during the week. > > > > > > I did run Splash rendering benchmarks of this branch against master > > > with > > > the result of rendering the circa 2400 documents of the TeXLive > > > > > documentation present on my machine being: > > I'm wondering if those 2400 documents are diverse enough, which they > > may > > not be given they are all coming from "the same place". > > >>> > > >>> They seem pretty diverse w.r.t. content, some being text heavy and > > >>> some > > >>> graphics rich. But I guess they are definitely not diverse technically > > >>> as all are prepared using TeX itself. > > >>> > > >>> The main problem on my side is that I have failed to find my DVD copy > > >>> of > > >>> the Poppler regtest document collection until now. :-\ In any case, I > > >>> am > > >>> reasonably confident on the zero sum result since GooHash does not > > >>> seem > > >>> to live in any performance critical code path. > > >>> > > > Cumulative run time: > > > Result: 90.95 min ∓ 1.1 % > > > Reference: 91.57 min ∓ 1.2 % > > > Deviation: -0.0 % > > > > > > Cumulative memory usage: > > > Result: 37.2 MB ∓ 0.7 % > > > Reference: 37.0 MB ∓ 0.7 % > > > Deviation: +0.0 % > > > > > > (Where result is this patch and the reference is master.) (The > > > measurement was taken using the perftest script which I proposed > > > here > > > some time ago for which I'll attach the patch again for > > > reproduceability.) > > > > Did you really attach this before? i really don't remember it :D > > >>> > > >>> This was a very long-winded thread ending on 2016-01-01 centered > > >>> around > > >>> the regtest framework. It went from custom driver using QTest's > > >>> benchmark functionality to custom backend in the regtest framework to > > >>> separate Python script. The script still ha
Re: [poppler] [RFC] Replace GooHash by std::unordered_map
El dissabte, 3 de març de 2018, a les 20:51:55 CET, Adam Reichold va escriure: > Hello Albert, > > please note that in the previous patch, I made a mistake with the move > constructor (the ref count should not be moved/swapped since it is tied > to the instance, not its value). Attached is a fixed patch. > > I found the issue during further performance tests which however only > continued to strengthen to zero sum result. In any case, I extend the > Python-based perftest script with a C++-based driver to be able to > reliably measure single-page runtimes for this, which I will also attach > if anyone is interested in this. Ok, so it seems there's a mild agreement (or at least no strong disagreement) that this may be the way forward. I will first wait to get poppler 0.63 out of the door (that is already late several months, between freedesktop shutting down ssh access, them forgetting to tell us it was open again, my disk dying and then me procastinating as usual) and once it's out I'll come back and reevaluate this. Cheers, Albert > > Best regards, Adam. > > Am 03.03.2018 um 20:29 schrieb Albert Astals Cid: > > Ha! didn't realize you had already done all that, will read/answer the > > other email. > > > > Cheers, > > > > Albert > > > > El dissabte, 3 de març de 2018, a les 20:25:32 CET, Albert Astals Cid va > > > > escriure: > >> El dimecres, 21 de febrer de 2018, a les 7:13:57 CET, Adam Reichold va > >> > >> escriure: > >>> Hello again, > >>> > >>> Am 21.02.2018 um 00:31 schrieb Albert Astals Cid: > El dimarts, 20 de febrer de 2018, a les 8:58:24 CET, Adam Reichold va > > escriure: > > Hello again, > > > > Am 18.02.2018 um 23:23 schrieb Adam Reichold: > >> Am 18.02.2018 um 23:08 schrieb Albert Astals Cid: > >>> El diumenge, 18 de febrer de 2018, a les 16:55:37 CET, Adam Reichold > >>> va > >>> > >>> escriure: > Hello, > > the attached patch replaced Poppler's internal hash table > implementation > GooHash by std::unordered_map found in the C++ standard library > since > C++11. This continues Poppler's slow drift towards standard library > containers and removes one of the home-grown data structures with > the > main goals of reducing code size and improving the long term > maintainability of the code base. > >>> > >>> Do you have any benchmarks on "rendering" speed and code size? > >> > >> Sorry, not at hand. I will try to prepare them during the week. > > > > I did run Splash rendering benchmarks of this branch against master > > with > > the result of rendering the circa 2400 documents of the TeXLive > > > documentation present on my machine being: > I'm wondering if those 2400 documents are diverse enough, which they > may > not be given they are all coming from "the same place". > >>> > >>> They seem pretty diverse w.r.t. content, some being text heavy and some > >>> graphics rich. But I guess they are definitely not diverse technically > >>> as all are prepared using TeX itself. > >>> > >>> The main problem on my side is that I have failed to find my DVD copy of > >>> the Poppler regtest document collection until now. :-\ In any case, I am > >>> reasonably confident on the zero sum result since GooHash does not seem > >>> to live in any performance critical code path. > >>> > > Cumulative run time: > > Result: 90.95 min ∓ 1.1 % > > Reference: 91.57 min ∓ 1.2 % > > Deviation: -0.0 % > > > > Cumulative memory usage: > > Result: 37.2 MB ∓ 0.7 % > > Reference: 37.0 MB ∓ 0.7 % > > Deviation: +0.0 % > > > > (Where result is this patch and the reference is master.) (The > > measurement was taken using the perftest script which I proposed here > > some time ago for which I'll attach the patch again for > > reproduceability.) > > Did you really attach this before? i really don't remember it :D > >>> > >>> This was a very long-winded thread ending on 2016-01-01 centered around > >>> the regtest framework. It went from custom driver using QTest's > >>> benchmark functionality to custom backend in the regtest framework to > >>> separate Python script. The script still has problems to reliably > >>> measure really short things like running "pdftotext" for which a custom > >>> C++ driver might be needed, but for long-running stuff like "pdftoppm", > >>> the results seem reasonable. > >>> > Anyhow it seems the change is mostly a zero-sum runtime wise. > >>> > >>> Indeed. (Although thinking really really long term, I think a Poppler > >>> that is using std::vector, std::string and friends everywhere, could > >>> loose a lot of distributed fat in the form of a lot of memory > >>> indirections and benefit from the highly optimized standard library. But > >>> it jus
Re: [poppler] [RFC] Replace GooHash by std::unordered_map
Hello Albert, please note that in the previous patch, I made a mistake with the move constructor (the ref count should not be moved/swapped since it is tied to the instance, not its value). Attached is a fixed patch. I found the issue during further performance tests which however only continued to strengthen to zero sum result. In any case, I extend the Python-based perftest script with a C++-based driver to be able to reliably measure single-page runtimes for this, which I will also attach if anyone is interested in this. Best regards, Adam. Am 03.03.2018 um 20:29 schrieb Albert Astals Cid: > Ha! didn't realize you had already done all that, will read/answer the other > email. > > Cheers, > Albert > > El dissabte, 3 de març de 2018, a les 20:25:32 CET, Albert Astals Cid va > escriure: >> El dimecres, 21 de febrer de 2018, a les 7:13:57 CET, Adam Reichold va >> >> escriure: >>> Hello again, >>> >>> Am 21.02.2018 um 00:31 schrieb Albert Astals Cid: El dimarts, 20 de febrer de 2018, a les 8:58:24 CET, Adam Reichold va escriure: > Hello again, > > Am 18.02.2018 um 23:23 schrieb Adam Reichold: >> Am 18.02.2018 um 23:08 schrieb Albert Astals Cid: >>> El diumenge, 18 de febrer de 2018, a les 16:55:37 CET, Adam Reichold >>> va >>> >>> escriure: Hello, the attached patch replaced Poppler's internal hash table implementation GooHash by std::unordered_map found in the C++ standard library since C++11. This continues Poppler's slow drift towards standard library containers and removes one of the home-grown data structures with the main goals of reducing code size and improving the long term maintainability of the code base. >>> >>> Do you have any benchmarks on "rendering" speed and code size? >> >> Sorry, not at hand. I will try to prepare them during the week. > > I did run Splash rendering benchmarks of this branch against master > with > the result of rendering the circa 2400 documents of the TeXLive > documentation present on my machine being: I'm wondering if those 2400 documents are diverse enough, which they may not be given they are all coming from "the same place". >>> >>> They seem pretty diverse w.r.t. content, some being text heavy and some >>> graphics rich. But I guess they are definitely not diverse technically >>> as all are prepared using TeX itself. >>> >>> The main problem on my side is that I have failed to find my DVD copy of >>> the Poppler regtest document collection until now. :-\ In any case, I am >>> reasonably confident on the zero sum result since GooHash does not seem >>> to live in any performance critical code path. >>> > Cumulative run time: > Result: 90.95 min ∓ 1.1 % > Reference: 91.57 min ∓ 1.2 % > Deviation: -0.0 % > > Cumulative memory usage: > Result: 37.2 MB ∓ 0.7 % > Reference: 37.0 MB ∓ 0.7 % > Deviation: +0.0 % > > (Where result is this patch and the reference is master.) (The > measurement was taken using the perftest script which I proposed here > some time ago for which I'll attach the patch again for > reproduceability.) Did you really attach this before? i really don't remember it :D >>> >>> This was a very long-winded thread ending on 2016-01-01 centered around >>> the regtest framework. It went from custom driver using QTest's >>> benchmark functionality to custom backend in the regtest framework to >>> separate Python script. The script still has problems to reliably >>> measure really short things like running "pdftotext" for which a custom >>> C++ driver might be needed, but for long-running stuff like "pdftoppm", >>> the results seem reasonable. >>> Anyhow it seems the change is mostly a zero-sum runtime wise. >>> >>> Indeed. (Although thinking really really long term, I think a Poppler >>> that is using std::vector, std::string and friends everywhere, could >>> loose a lot of distributed fat in the form of a lot of memory >>> indirections and benefit from the highly optimized standard library. But >>> it just is not a single patch thing to reap any of these benefits.) >>> Which bring us to the code side. First i know it sucks but your spacing is broken, please check the whole patch - nameToGID->removeInt(macGlyphNames[j]); - nameToGID->add(new GooString(macGlyphNames[j]), i); + nameToGID[macGlyphNames[j]] = i; i could fix it, but it's better (for me) if you do :D >>> >>> Definitely for me to fix. The main problem is that the spacing in those >>> files was already broken and I am unsure whether I should fix the >>> surrounding spacing even if I am not touching it otherwise. Due to that, >>> there sometimes is no correct way in the current patch. If you do not >>> say otherwise, I will try
Re: [poppler] [RFC] Replace GooHash by std::unordered_map
Ha! didn't realize you had already done all that, will read/answer the other email. Cheers, Albert El dissabte, 3 de març de 2018, a les 20:25:32 CET, Albert Astals Cid va escriure: > El dimecres, 21 de febrer de 2018, a les 7:13:57 CET, Adam Reichold va > > escriure: > > Hello again, > > > > Am 21.02.2018 um 00:31 schrieb Albert Astals Cid: > > > El dimarts, 20 de febrer de 2018, a les 8:58:24 CET, Adam Reichold va > > > > > > escriure: > > >> Hello again, > > >> > > >> Am 18.02.2018 um 23:23 schrieb Adam Reichold: > > >>> Am 18.02.2018 um 23:08 schrieb Albert Astals Cid: > > El diumenge, 18 de febrer de 2018, a les 16:55:37 CET, Adam Reichold > > va > > > > escriure: > > > Hello, > > > > > > the attached patch replaced Poppler's internal hash table > > > implementation > > > GooHash by std::unordered_map found in the C++ standard library > > > since > > > C++11. This continues Poppler's slow drift towards standard library > > > containers and removes one of the home-grown data structures with > > > the > > > main goals of reducing code size and improving the long term > > > maintainability of the code base. > > > > Do you have any benchmarks on "rendering" speed and code size? > > >>> > > >>> Sorry, not at hand. I will try to prepare them during the week. > > >> > > >> I did run Splash rendering benchmarks of this branch against master > > >> with > > >> the result of rendering the circa 2400 documents of the TeXLive > > > > > >> documentation present on my machine being: > > > I'm wondering if those 2400 documents are diverse enough, which they may > > > not be given they are all coming from "the same place". > > > > They seem pretty diverse w.r.t. content, some being text heavy and some > > graphics rich. But I guess they are definitely not diverse technically > > as all are prepared using TeX itself. > > > > The main problem on my side is that I have failed to find my DVD copy of > > the Poppler regtest document collection until now. :-\ In any case, I am > > reasonably confident on the zero sum result since GooHash does not seem > > to live in any performance critical code path. > > > > >> Cumulative run time: > > >> Result: 90.95 min ∓ 1.1 % > > >> Reference: 91.57 min ∓ 1.2 % > > >> Deviation: -0.0 % > > >> > > >> Cumulative memory usage: > > >> Result: 37.2 MB ∓ 0.7 % > > >> Reference: 37.0 MB ∓ 0.7 % > > >> Deviation: +0.0 % > > >> > > >> (Where result is this patch and the reference is master.) (The > > >> measurement was taken using the perftest script which I proposed here > > >> some time ago for which I'll attach the patch again for > > >> reproduceability.) > > > > > > Did you really attach this before? i really don't remember it :D > > > > This was a very long-winded thread ending on 2016-01-01 centered around > > the regtest framework. It went from custom driver using QTest's > > benchmark functionality to custom backend in the regtest framework to > > separate Python script. The script still has problems to reliably > > measure really short things like running "pdftotext" for which a custom > > C++ driver might be needed, but for long-running stuff like "pdftoppm", > > the results seem reasonable. > > > > > Anyhow it seems the change is mostly a zero-sum runtime wise. > > > > Indeed. (Although thinking really really long term, I think a Poppler > > that is using std::vector, std::string and friends everywhere, could > > loose a lot of distributed fat in the form of a lot of memory > > indirections and benefit from the highly optimized standard library. But > > it just is not a single patch thing to reap any of these benefits.) > > > > > Which bring us to the code side. First i know it sucks but your spacing > > > is > > > broken, please check the whole patch > > > > > > - nameToGID->removeInt(macGlyphNames[j]); > > > - nameToGID->add(new GooString(macGlyphNames[j]), i); > > > + nameToGID[macGlyphNames[j]] = i; > > > > > > i could fix it, but it's better (for me) if you do :D > > > > Definitely for me to fix. The main problem is that the spacing in those > > files was already broken and I am unsure whether I should fix the > > surrounding spacing even if I am not touching it otherwise. Due to that, > > there sometimes is no correct way in the current patch. If you do not > > say otherwise, I will try to make at least the touched blocks of code > > consistent. > > Are you sure the spacing is broken? I'd say it's just xpdf weird spacing > rules of using 2 soft spaces and then hard tabs at 8. > > > > Now onto the code, > > > > > > const auto emplaceRangeMap = [&](const char* encodingName, GBool > > > unicodeOut,> > > > > > > UnicodeMapRange* ranges, int len) { > > > > > > residentUnicodeMaps.emplace( > > > > > > std::piecewise_construct, > > > std::forward_as_tuple(encodingName), > > > std::forward_
Re: [poppler] [RFC] Replace GooHash by std::unordered_map
El dimecres, 21 de febrer de 2018, a les 7:13:57 CET, Adam Reichold va escriure: > Hello again, > > Am 21.02.2018 um 00:31 schrieb Albert Astals Cid: > > El dimarts, 20 de febrer de 2018, a les 8:58:24 CET, Adam Reichold va > > > > escriure: > >> Hello again, > >> > >> Am 18.02.2018 um 23:23 schrieb Adam Reichold: > >>> Am 18.02.2018 um 23:08 schrieb Albert Astals Cid: > El diumenge, 18 de febrer de 2018, a les 16:55:37 CET, Adam Reichold va > > escriure: > > Hello, > > > > the attached patch replaced Poppler's internal hash table > > implementation > > GooHash by std::unordered_map found in the C++ standard library since > > C++11. This continues Poppler's slow drift towards standard library > > containers and removes one of the home-grown data structures with the > > main goals of reducing code size and improving the long term > > maintainability of the code base. > > Do you have any benchmarks on "rendering" speed and code size? > >>> > >>> Sorry, not at hand. I will try to prepare them during the week. > >> > >> I did run Splash rendering benchmarks of this branch against master with > >> the result of rendering the circa 2400 documents of the TeXLive > > > >> documentation present on my machine being: > > I'm wondering if those 2400 documents are diverse enough, which they may > > not be given they are all coming from "the same place". > > They seem pretty diverse w.r.t. content, some being text heavy and some > graphics rich. But I guess they are definitely not diverse technically > as all are prepared using TeX itself. > > The main problem on my side is that I have failed to find my DVD copy of > the Poppler regtest document collection until now. :-\ In any case, I am > reasonably confident on the zero sum result since GooHash does not seem > to live in any performance critical code path. > > >> Cumulative run time: > >> Result: 90.95 min ∓ 1.1 % > >> Reference: 91.57 min ∓ 1.2 % > >> Deviation: -0.0 % > >> > >> Cumulative memory usage: > >> Result: 37.2 MB ∓ 0.7 % > >> Reference: 37.0 MB ∓ 0.7 % > >> Deviation: +0.0 % > >> > >> (Where result is this patch and the reference is master.) (The > >> measurement was taken using the perftest script which I proposed here > >> some time ago for which I'll attach the patch again for > >> reproduceability.) > > > > Did you really attach this before? i really don't remember it :D > > This was a very long-winded thread ending on 2016-01-01 centered around > the regtest framework. It went from custom driver using QTest's > benchmark functionality to custom backend in the regtest framework to > separate Python script. The script still has problems to reliably > measure really short things like running "pdftotext" for which a custom > C++ driver might be needed, but for long-running stuff like "pdftoppm", > the results seem reasonable. > > > Anyhow it seems the change is mostly a zero-sum runtime wise. > > Indeed. (Although thinking really really long term, I think a Poppler > that is using std::vector, std::string and friends everywhere, could > loose a lot of distributed fat in the form of a lot of memory > indirections and benefit from the highly optimized standard library. But > it just is not a single patch thing to reap any of these benefits.) > > > Which bring us to the code side. First i know it sucks but your spacing is > > broken, please check the whole patch > > > > - nameToGID->removeInt(macGlyphNames[j]); > > - nameToGID->add(new GooString(macGlyphNames[j]), i); > > + nameToGID[macGlyphNames[j]] = i; > > > > i could fix it, but it's better (for me) if you do :D > > Definitely for me to fix. The main problem is that the spacing in those > files was already broken and I am unsure whether I should fix the > surrounding spacing even if I am not touching it otherwise. Due to that, > there sometimes is no correct way in the current patch. If you do not > say otherwise, I will try to make at least the touched blocks of code > consistent. Are you sure the spacing is broken? I'd say it's just xpdf weird spacing rules of using 2 soft spaces and then hard tabs at 8. > > > Now onto the code, > > > > const auto emplaceRangeMap = [&](const char* encodingName, GBool > > unicodeOut,> > > UnicodeMapRange* ranges, int len) { > > > > residentUnicodeMaps.emplace( > > > > std::piecewise_construct, > > std::forward_as_tuple(encodingName), > > std::forward_as_tuple(encodingName, unicodeOut, ranges, len) > > > > ); > > > > }; > > > > It's the first time in my life i see std::piecewise_construct and > > std::forward_as_tuple, yes that probably makes me a bad C++ developer, but > > given there's lots of us around, it doesn't make me happy now i don't > > understand what the code does. > > The problem is that most internal Poppler types lack proper construction > and assignment oper
Re: [poppler] [RFC] Replace GooHash by std::unordered_map
El dimecres, 28 de febrer de 2018, a les 11:02:11 CET, Oliver Sander va escriure: > On 21.02.2018 00:31, Albert Astals Cid wrote: > > I'd really appreciate other people's comments on this. > > I am generally in favour of changes like this, provided there is no > significant increase in run-time or binary code size. I think such changes > do decrease long-term maintenance costs, and they make it easier for new > developers to understand the code. The current GooHash implementation may > be a reasonable implementation of a hash table (I didn't check), but it is > completely undocumented, and if I ever need to know what it does, I will > have to read the code. On the other hand, while that emplaceRangeMap lambda > looks strange to me on first sight too, all this modern C++ stuff is well > described in books, online articles and blog posts. > BTW: I always thought that changes like this couldn't be made because it > makes backporting patches from xpdf more difficult/impossible. Is this not > an issue anymore? we never got around to merge xpdf 3.04 changes and given the relatively big change i did regarding Object i guess we can say that porting xpdf code isn't really a big motivation anymore for not making improving changes. Cheers, Albert > > Best, > Oliver ___ poppler mailing list poppler@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/poppler
Re: [poppler] [RFC] Replace GooHash by std::unordered_map
On 21.02.2018 00:31, Albert Astals Cid wrote: > > I'd really appreciate other people's comments on this. I am generally in favour of changes like this, provided there is no significant increase in run-time or binary code size. I think such changes do decrease long-term maintenance costs, and they make it easier for new developers to understand the code. The current GooHash implementation may be a reasonable implementation of a hash table (I didn't check), but it is completely undocumented, and if I ever need to know what it does, I will have to read the code. On the other hand, while that emplaceRangeMap lambda looks strange to me on first sight too, all this modern C++ stuff is well described in books, online articles and blog posts. BTW: I always thought that changes like this couldn't be made because it makes backporting patches from xpdf more difficult/impossible. Is this not an issue anymore? Best, Oliver smime.p7s Description: S/MIME Cryptographic Signature ___ poppler mailing list poppler@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/poppler
Re: [poppler] [RFC] Replace GooHash by std::unordered_map
Hello again, attached is an updated version of the patch that: * Improves the conversions between GooString and std::string. * Fixes the spacing (And yes, the spacing wasn't broken. I was broken. I just can't get my head around 2 spaces indentation with an 8 spaces tab. Sorry for that.) * Extends UnicodeMap to become a move-only type so that the initialization can stay basically the same as when using GooHash storing pointers to UnicodeMap. Best regards, ADam. Am 21.02.2018 um 07:13 schrieb Adam Reichold: > Hello again, > > Am 21.02.2018 um 00:31 schrieb Albert Astals Cid: >> El dimarts, 20 de febrer de 2018, a les 8:58:24 CET, Adam Reichold va >> escriure: >>> Hello again, >>> >>> Am 18.02.2018 um 23:23 schrieb Adam Reichold: Am 18.02.2018 um 23:08 schrieb Albert Astals Cid: > El diumenge, 18 de febrer de 2018, a les 16:55:37 CET, Adam Reichold va > > escriure: >> Hello, >> >> the attached patch replaced Poppler's internal hash table implementation >> GooHash by std::unordered_map found in the C++ standard library since >> C++11. This continues Poppler's slow drift towards standard library >> containers and removes one of the home-grown data structures with the >> main goals of reducing code size and improving the long term >> maintainability of the code base. > > Do you have any benchmarks on "rendering" speed and code size? Sorry, not at hand. I will try to prepare them during the week. >>> >>> I did run Splash rendering benchmarks of this branch against master with >>> the result of rendering the circa 2400 documents of the TeXLive >>> documentation present on my machine being: >> >> I'm wondering if those 2400 documents are diverse enough, which they may not >> be given they are all coming from "the same place". > > They seem pretty diverse w.r.t. content, some being text heavy and some > graphics rich. But I guess they are definitely not diverse technically > as all are prepared using TeX itself. > > The main problem on my side is that I have failed to find my DVD copy of > the Poppler regtest document collection until now. :-\ In any case, I am > reasonably confident on the zero sum result since GooHash does not seem > to live in any performance critical code path. > >>> >>> Cumulative run time: >>> Result: 90.95 min ∓ 1.1 % >>> Reference: 91.57 min ∓ 1.2 % >>> Deviation: -0.0 % >>> Cumulative memory usage: >>> Result: 37.2 MB ∓ 0.7 % >>> Reference: 37.0 MB ∓ 0.7 % >>> Deviation: +0.0 % >>> >>> (Where result is this patch and the reference is master.) (The >>> measurement was taken using the perftest script which I proposed here >>> some time ago for which I'll attach the patch again for reproduceability.) >> >> Did you really attach this before? i really don't remember it :D > > This was a very long-winded thread ending on 2016-01-01 centered around > the regtest framework. It went from custom driver using QTest's > benchmark functionality to custom backend in the regtest framework to > separate Python script. The script still has problems to reliably > measure really short things like running "pdftotext" for which a custom > C++ driver might be needed, but for long-running stuff like "pdftoppm", > the results seem reasonable. > >> Anyhow it seems the change is mostly a zero-sum runtime wise. > > Indeed. (Although thinking really really long term, I think a Poppler > that is using std::vector, std::string and friends everywhere, could > loose a lot of distributed fat in the form of a lot of memory > indirections and benefit from the highly optimized standard library. But > it just is not a single patch thing to reap any of these benefits.) > >> Which bring us to the code side. First i know it sucks but your spacing is >> broken, please check the whole patch >> >> -nameToGID->removeInt(macGlyphNames[j]); >> -nameToGID->add(new GooString(macGlyphNames[j]), i); >> + nameToGID[macGlyphNames[j]] = i; >> >> i could fix it, but it's better (for me) if you do :D > > Definitely for me to fix. The main problem is that the spacing in those > files was already broken and I am unsure whether I should fix the > surrounding spacing even if I am not touching it otherwise. Due to that, > there sometimes is no correct way in the current patch. If you do not > say otherwise, I will try to make at least the touched blocks of code > consistent. > >> Now onto the code, >> >> const auto emplaceRangeMap = [&](const char* encodingName, GBool >> unicodeOut, >> UnicodeMapRange* ranges, int len) { >> residentUnicodeMaps.emplace( >> std::piecewise_construct, >> std::forward_as_tuple(encodingName), >> std::forward_as_tuple(encodingName, unicodeOut, ranges, len) >> ); >> }; >> >> It's the first time in my life i see std::piecewise_construct and >> std::forward_as_tuple, yes that probably makes me a bad C++ developer, but >> given there's lots of
Re: [poppler] [RFC] Replace GooHash by std::unordered_map
Hello again, Am 21.02.2018 um 00:31 schrieb Albert Astals Cid: > El dimarts, 20 de febrer de 2018, a les 8:58:24 CET, Adam Reichold va > escriure: >> Hello again, >> >> Am 18.02.2018 um 23:23 schrieb Adam Reichold: >>> Am 18.02.2018 um 23:08 schrieb Albert Astals Cid: El diumenge, 18 de febrer de 2018, a les 16:55:37 CET, Adam Reichold va escriure: > Hello, > > the attached patch replaced Poppler's internal hash table implementation > GooHash by std::unordered_map found in the C++ standard library since > C++11. This continues Poppler's slow drift towards standard library > containers and removes one of the home-grown data structures with the > main goals of reducing code size and improving the long term > maintainability of the code base. Do you have any benchmarks on "rendering" speed and code size? >>> >>> Sorry, not at hand. I will try to prepare them during the week. >> >> I did run Splash rendering benchmarks of this branch against master with >> the result of rendering the circa 2400 documents of the TeXLive >> documentation present on my machine being: > > I'm wondering if those 2400 documents are diverse enough, which they may not > be given they are all coming from "the same place". They seem pretty diverse w.r.t. content, some being text heavy and some graphics rich. But I guess they are definitely not diverse technically as all are prepared using TeX itself. The main problem on my side is that I have failed to find my DVD copy of the Poppler regtest document collection until now. :-\ In any case, I am reasonably confident on the zero sum result since GooHash does not seem to live in any performance critical code path. >> >> Cumulative run time: >> Result: 90.95 min ∓ 1.1 % >> Reference: 91.57 min ∓ 1.2 % >> Deviation: -0.0 % >> Cumulative memory usage: >> Result: 37.2 MB ∓ 0.7 % >> Reference: 37.0 MB ∓ 0.7 % >> Deviation: +0.0 % >> >> (Where result is this patch and the reference is master.) (The >> measurement was taken using the perftest script which I proposed here >> some time ago for which I'll attach the patch again for reproduceability.) > > Did you really attach this before? i really don't remember it :D This was a very long-winded thread ending on 2016-01-01 centered around the regtest framework. It went from custom driver using QTest's benchmark functionality to custom backend in the regtest framework to separate Python script. The script still has problems to reliably measure really short things like running "pdftotext" for which a custom C++ driver might be needed, but for long-running stuff like "pdftoppm", the results seem reasonable. > Anyhow it seems the change is mostly a zero-sum runtime wise. Indeed. (Although thinking really really long term, I think a Poppler that is using std::vector, std::string and friends everywhere, could loose a lot of distributed fat in the form of a lot of memory indirections and benefit from the highly optimized standard library. But it just is not a single patch thing to reap any of these benefits.) > Which bring us to the code side. First i know it sucks but your spacing is > broken, please check the whole patch > > - nameToGID->removeInt(macGlyphNames[j]); > - nameToGID->add(new GooString(macGlyphNames[j]), i); > + nameToGID[macGlyphNames[j]] = i; > > i could fix it, but it's better (for me) if you do :D Definitely for me to fix. The main problem is that the spacing in those files was already broken and I am unsure whether I should fix the surrounding spacing even if I am not touching it otherwise. Due to that, there sometimes is no correct way in the current patch. If you do not say otherwise, I will try to make at least the touched blocks of code consistent. > Now onto the code, > > const auto emplaceRangeMap = [&](const char* encodingName, GBool > unicodeOut, > UnicodeMapRange* ranges, int len) { > residentUnicodeMaps.emplace( > std::piecewise_construct, > std::forward_as_tuple(encodingName), > std::forward_as_tuple(encodingName, unicodeOut, ranges, len) > ); > }; > > It's the first time in my life i see std::piecewise_construct and > std::forward_as_tuple, yes that probably makes me a bad C++ developer, but > given there's lots of us around, it doesn't make me happy now i don't > understand what the code does. The problem is that most internal Poppler types lack proper construction and assignment operators, hence I need to work harder to construct those UnicodeMap instances in-place inside the map (GooHash just stored a pointer to it, so there was no problem.) The whole piecewise_construct and forward_as_tuple dance just ensures, that those parameters to emplace are properly grouped before being unpacked to become the parameters of std::string and UnicodeMap construction again. If UnicodeMap was movable, this would probably look like residentUnicodeMaps.emplace(
Re: [poppler] [RFC] Replace GooHash by std::unordered_map
El dimarts, 20 de febrer de 2018, a les 8:58:24 CET, Adam Reichold va escriure: > Hello again, > > Am 18.02.2018 um 23:23 schrieb Adam Reichold: > > Am 18.02.2018 um 23:08 schrieb Albert Astals Cid: > >> El diumenge, 18 de febrer de 2018, a les 16:55:37 CET, Adam Reichold va > >> > >> escriure: > >>> Hello, > >>> > >>> the attached patch replaced Poppler's internal hash table implementation > >>> GooHash by std::unordered_map found in the C++ standard library since > >>> C++11. This continues Poppler's slow drift towards standard library > >>> containers and removes one of the home-grown data structures with the > >>> main goals of reducing code size and improving the long term > >>> maintainability of the code base. > >> > >> Do you have any benchmarks on "rendering" speed and code size? > > > > Sorry, not at hand. I will try to prepare them during the week. > > I did run Splash rendering benchmarks of this branch against master with > the result of rendering the circa 2400 documents of the TeXLive > documentation present on my machine being: I'm wondering if those 2400 documents are diverse enough, which they may not be given they are all coming from "the same place". > > Cumulative run time: > Result: 90.95 min ∓ 1.1 % > Reference: 91.57 min ∓ 1.2 % > Deviation: -0.0 % > Cumulative memory usage: > Result: 37.2 MB ∓ 0.7 % > Reference: 37.0 MB ∓ 0.7 % > Deviation: +0.0 % > > (Where result is this patch and the reference is master.) (The > measurement was taken using the perftest script which I proposed here > some time ago for which I'll attach the patch again for reproduceability.) Did you really attach this before? i really don't remember it :D Anyhow it seems the change is mostly a zero-sum runtime wise. Which bring us to the code side. First i know it sucks but your spacing is broken, please check the whole patch - nameToGID->removeInt(macGlyphNames[j]); - nameToGID->add(new GooString(macGlyphNames[j]), i); + nameToGID[macGlyphNames[j]] = i; i could fix it, but it's better (for me) if you do :D Now onto the code, const auto emplaceRangeMap = [&](const char* encodingName, GBool unicodeOut, UnicodeMapRange* ranges, int len) { residentUnicodeMaps.emplace( std::piecewise_construct, std::forward_as_tuple(encodingName), std::forward_as_tuple(encodingName, unicodeOut, ranges, len) ); }; It's the first time in my life i see std::piecewise_construct and std::forward_as_tuple, yes that probably makes me a bad C++ developer, but given there's lots of us around, it doesn't make me happy now i don't understand what the code does. Then there's the benefit/risk ratio. The code using GooHash is mostly rocksolid and we haven't probably touched any line in it for a long time and we have probably neither written new code that uses GooHash. In that regard we risk breaking working code. The benefit is not more speed nor less memory usage as your measurements show. Mostly the benefit is "removing code from maintainership", which i agree is a good thing, but as mentioned before it's some code "we mostly ignore", so it's not that much of a good thing. So i'm hesitant of what to do. It really sounds like a nice thing to not have custom structures, but on the other hand it's not like they get much in the way of coding. I'd really appreciate other people's comments on this. Cheers, Albert > > I'll also attach the detailed comparison, but the gist seems to be that > if there are significant changes, the run time is reduced but the memory > usage is increased in the majority of cases. But this does not seem to > show up in the cumulative results. > > Best regards, Adam. > > P.S.: One could try to improve the memory usage by tuning the load > factor or calling shrink_to_fit where appropriate. Would you like me to > try to do this? > > P.P.S.: One obvious area for improvement would be better > interoperability between GooString and std::string, i.e. not converting > them as C strings so that the length information does not need to be > recomputed. I will try to prepare this as a separate patch on top of > this one or should I include this here? > > Best regards, Adam. > > > Concerning code size, a release build of libpoppler.so goes from > > > >textdata bss dec hex filename > > > > 2464034 288852 360 2753246 2a02de libpoppler.so.73.0.0 > > > > for the current master to > > > >textdata bss dec hex filename > > > > 2482129 288756 360 2771245 2a492d libpoppler.so.73.0.0 > > > > with the patch applied, i.e. a 0.65% increase in binary size. > > > > > > Please note that in my original message, I was referring only to source > > code size, i.e. > > > > git diff --stat master...remove-goo-hash > > > > 18 files changed, 168 insertions(+), 803 deletions(-) > > > >> Cheers, > >> > >> Albert > > > > Best regards, Adam. > > > >>>
Re: [poppler] [RFC] Replace GooHash by std::unordered_map
Hello again again, I obviously forgot the attachments to the previous mail. Doh... Regards, Adam. Am 20.02.2018 um 08:58 schrieb Adam Reichold: > Hello again, > > Am 18.02.2018 um 23:23 schrieb Adam Reichold: >> >> >> Am 18.02.2018 um 23:08 schrieb Albert Astals Cid: >>> El diumenge, 18 de febrer de 2018, a les 16:55:37 CET, Adam Reichold va >>> escriure: Hello, the attached patch replaced Poppler's internal hash table implementation GooHash by std::unordered_map found in the C++ standard library since C++11. This continues Poppler's slow drift towards standard library containers and removes one of the home-grown data structures with the main goals of reducing code size and improving the long term maintainability of the code base. >>> >>> Do you have any benchmarks on "rendering" speed and code size? >> >> Sorry, not at hand. I will try to prepare them during the week. > > I did run Splash rendering benchmarks of this branch against master with > the result of rendering the circa 2400 documents of the TeXLive > documentation present on my machine being: > > Cumulative run time: > Result: 90.95 min ∓ 1.1 % > Reference: 91.57 min ∓ 1.2 % > Deviation: -0.0 % > Cumulative memory usage: > Result: 37.2 MB ∓ 0.7 % > Reference: 37.0 MB ∓ 0.7 % > Deviation: +0.0 % > > (Where result is this patch and the reference is master.) (The > measurement was taken using the perftest script which I proposed here > some time ago for which I'll attach the patch again for reproduceability.) > > I'll also attach the detailed comparison, but the gist seems to be that > if there are significant changes, the run time is reduced but the memory > usage is increased in the majority of cases. But this does not seem to > show up in the cumulative results. > > Best regards, Adam. > > P.S.: One could try to improve the memory usage by tuning the load > factor or calling shrink_to_fit where appropriate. Would you like me to > try to do this? > > P.P.S.: One obvious area for improvement would be better > interoperability between GooString and std::string, i.e. not converting > them as C strings so that the length information does not need to be > recomputed. I will try to prepare this as a separate patch on top of > this one or should I include this here? > > Best regards, Adam. > >> Concerning code size, a release build of libpoppler.so goes from >> >>textdata bss dec hex filename >> 2464034 288852 360 2753246 2a02de libpoppler.so.73.0.0 >> >> for the current master to >> >>textdata bss dec hex filename >> 2482129 288756 360 2771245 2a492d libpoppler.so.73.0.0 >> >> with the patch applied, i.e. a 0.65% increase in binary size. >> >> >> Please note that in my original message, I was referring only to source >> code size, i.e. >> >> git diff --stat master...remove-goo-hash >> 18 files changed, 168 insertions(+), 803 deletions(-) >> >>> Cheers, >>> Albert >> >> Best regards, Adam. >> Best regards, Adam. >>> >>> >>> >>> >>> ___ >>> poppler mailing list >>> poppler@lists.freedesktop.org >>> https://lists.freedesktop.org/mailman/listinfo/poppler >>> >> >> >> >> ___ >> poppler mailing list >> poppler@lists.freedesktop.org >> https://lists.freedesktop.org/mailman/listinfo/poppler >> > > > > ___ > poppler mailing list > poppler@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/poppler > From f080d40f1dba236a2d317fc7c9d9c77b5db8654c Mon Sep 17 00:00:00 2001 From: Adam Reichold Date: Tue, 20 Feb 2018 08:41:04 +0100 Subject: [PATCH] Add simple Python script to measure and comprare rendering performance of Poppler builds. --- perftest/compare.py | 94 +++ perftest/measure.py | 139 ++ perftest/poppler-perftest | 40 + perftest/util.py | 29 ++ 4 files changed, 302 insertions(+) create mode 100644 perftest/compare.py create mode 100644 perftest/measure.py create mode 100755 perftest/poppler-perftest create mode 100644 perftest/util.py diff --git a/perftest/compare.py b/perftest/compare.py new file mode 100644 index ..c19f1cab --- /dev/null +++ b/perftest/compare.py @@ -0,0 +1,94 @@ +import pickle +import zlib + +from util import reference, reldev, maxabs + +def collect_stats3(stats, entry): +if stats is None: +stats = (0, 0) + +sum, acc = stats +mean, stdev = entry +stats = (sum + mean, acc + abs(stdev / mean)) + +return stats + +def collect_stats2(stats, entry): +if stats is None: +stats = { 'run_time': None, 'memory_usage': None } + +stats['run_time'] = collect_stats3(stats['run_time'], entry['run_time']) +stats['memory_usage'] = collect_stats3(stats['memory
Re: [poppler] [RFC] Replace GooHash by std::unordered_map
Hello again, Am 18.02.2018 um 23:23 schrieb Adam Reichold: > > > Am 18.02.2018 um 23:08 schrieb Albert Astals Cid: >> El diumenge, 18 de febrer de 2018, a les 16:55:37 CET, Adam Reichold va >> escriure: >>> Hello, >>> >>> the attached patch replaced Poppler's internal hash table implementation >>> GooHash by std::unordered_map found in the C++ standard library since >>> C++11. This continues Poppler's slow drift towards standard library >>> containers and removes one of the home-grown data structures with the >>> main goals of reducing code size and improving the long term >>> maintainability of the code base. >> >> Do you have any benchmarks on "rendering" speed and code size? > > Sorry, not at hand. I will try to prepare them during the week. I did run Splash rendering benchmarks of this branch against master with the result of rendering the circa 2400 documents of the TeXLive documentation present on my machine being: Cumulative run time: Result: 90.95 min ∓ 1.1 % Reference: 91.57 min ∓ 1.2 % Deviation: -0.0 % Cumulative memory usage: Result: 37.2 MB ∓ 0.7 % Reference: 37.0 MB ∓ 0.7 % Deviation: +0.0 % (Where result is this patch and the reference is master.) (The measurement was taken using the perftest script which I proposed here some time ago for which I'll attach the patch again for reproduceability.) I'll also attach the detailed comparison, but the gist seems to be that if there are significant changes, the run time is reduced but the memory usage is increased in the majority of cases. But this does not seem to show up in the cumulative results. Best regards, Adam. P.S.: One could try to improve the memory usage by tuning the load factor or calling shrink_to_fit where appropriate. Would you like me to try to do this? P.P.S.: One obvious area for improvement would be better interoperability between GooString and std::string, i.e. not converting them as C strings so that the length information does not need to be recomputed. I will try to prepare this as a separate patch on top of this one or should I include this here? Best regards, Adam. > Concerning code size, a release build of libpoppler.so goes from > >textdata bss dec hex filename > 2464034 288852 360 2753246 2a02de libpoppler.so.73.0.0 > > for the current master to > >textdata bss dec hex filename > 2482129 288756 360 2771245 2a492d libpoppler.so.73.0.0 > > with the patch applied, i.e. a 0.65% increase in binary size. > > > Please note that in my original message, I was referring only to source > code size, i.e. > > git diff --stat master...remove-goo-hash > 18 files changed, 168 insertions(+), 803 deletions(-) > >> Cheers, >> Albert > > Best regards, Adam. > >>> >>> Best regards, Adam. >> >> >> >> >> ___ >> poppler mailing list >> poppler@lists.freedesktop.org >> https://lists.freedesktop.org/mailman/listinfo/poppler >> > > > > ___ > poppler mailing list > poppler@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/poppler > signature.asc Description: OpenPGP digital signature ___ poppler mailing list poppler@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/poppler
Re: [poppler] [RFC] Replace GooHash by std::unordered_map
Am 18.02.2018 um 23:08 schrieb Albert Astals Cid: > El diumenge, 18 de febrer de 2018, a les 16:55:37 CET, Adam Reichold va > escriure: >> Hello, >> >> the attached patch replaced Poppler's internal hash table implementation >> GooHash by std::unordered_map found in the C++ standard library since >> C++11. This continues Poppler's slow drift towards standard library >> containers and removes one of the home-grown data structures with the >> main goals of reducing code size and improving the long term >> maintainability of the code base. > > Do you have any benchmarks on "rendering" speed and code size? Sorry, not at hand. I will try to prepare them during the week. Concerning code size, a release build of libpoppler.so goes from textdata bss dec hex filename 2464034 288852 360 2753246 2a02de libpoppler.so.73.0.0 for the current master to textdata bss dec hex filename 2482129 288756 360 2771245 2a492d libpoppler.so.73.0.0 with the patch applied, i.e. a 0.65% increase in binary size. Please note that in my original message, I was referring only to source code size, i.e. git diff --stat master...remove-goo-hash 18 files changed, 168 insertions(+), 803 deletions(-) > Cheers, > Albert Best regards, Adam. >> >> Best regards, Adam. > > > > > ___ > poppler mailing list > poppler@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/poppler > signature.asc Description: OpenPGP digital signature ___ poppler mailing list poppler@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/poppler
Re: [poppler] [RFC] Replace GooHash by std::unordered_map
El diumenge, 18 de febrer de 2018, a les 16:55:37 CET, Adam Reichold va escriure: > Hello, > > the attached patch replaced Poppler's internal hash table implementation > GooHash by std::unordered_map found in the C++ standard library since > C++11. This continues Poppler's slow drift towards standard library > containers and removes one of the home-grown data structures with the > main goals of reducing code size and improving the long term > maintainability of the code base. Do you have any benchmarks on "rendering" speed and code size? Cheers, Albert > > Best regards, Adam. ___ poppler mailing list poppler@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/poppler