Re: Rust extensions: the next step
On Thu, 18 Oct 2018 19:15:06 +0200, Georges Racinet wrote:
> On 10/18/2018 04:09 PM, Yuya Nishihara wrote:
> > I expect "rustext" (or its upper layer) to be a shim over Rust-based modules
> > and cexts. So if you do policy.importmod('parsers'), it will return
> > cext.parsers, whereas policy.importmod('ancestor') will return
> > rustext.ancestor,
> > though I have no idea if there will be cext/pure.ancestor.
> Yes, it's quite possible to add a new module policy this way. After all,
> from mercurial.policy, it behaves in the same way as the cext package
> does and the fact that we have a single shared library instead of
> several ones is an implementation detail, hidden by Python's import
> machinery.
>
> But this opens another, longer term, question: currently what I have in
> mercurial.rustext.ancestor has only a fragment of what
> mercurial.ancestor provides. Therefore to have mercurial.policy handle
> it, we'll need either to take such partial cases into account, or decide
> to translate the whole Python module into Rust. For the time being, I'm
> simply doing an import and catching the error to fall back to the Python
> version.

That could be handled by policy._modredirects, e.g.

  _modredirects = {
      ('rustext', 'parsers'): ('cext', 'parsers'),
      ('cext', 'ancestor'): ('pure', 'ancestor'),
      # and move pure-python implementation to pure/ancestor.py
  }

But yeah, it will depend on the number of redirects whether doing that will
make things clearer or not. We can decide that later.
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
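A minimal sketch of how such a redirect table could be consulted, assuming a
single dictionary lookup per request. The names MOD_REDIRECTS and resolve are
hypothetical and only illustrate the idea; this is not the actual
mercurial.policy code.

```python
# Hypothetical redirect table, modelled on the _modredirects idea above.
# Keys and values are (package, module) pairs.
MOD_REDIRECTS = {
    ("rustext", "parsers"): ("cext", "parsers"),
    ("cext", "ancestor"): ("pure", "ancestor"),
}


def resolve(package, module, redirects=MOD_REDIRECTS):
    """Return the (package, module) pair to import for a request.

    Assumption: one lookup only, redirects are not chained.
    """
    key = (package, module)
    return redirects.get(key, key)


# A Rust policy asking for 'parsers' is sent to the C extension,
# while 'ancestor' has no entry and stays with rustext.
print(resolve("rustext", "parsers"))
print(resolve("rustext", "ancestor"))
```

With one lookup per request the table stays easy to read; whether chained
redirects would ever be wanted is left open here, as in the thread.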
Re: Rust extensions: the next step
On 10/18/2018 12:22 PM, Gregory Szorc wrote:
> One open item is full PyPy/cffi support. Ideally we’d only write the native
> code interface once. But I think that means cffi everywhere and last I
> looked, CPython into cffi was a bit slower compared to native extensions. I’m
> willing to ignore cffi support for now (PyPy can use pure Python and rely on
> JIT for faster execution). Maybe something like milksnake can help us here?
> But I’m content with using the cpython crate to maintain a Rust-based
> extension: that’s little different from what we do today and we shouldn’t let
> perfect be the enemy of good.

One nice thing with the cpython crate is that it's just using the CPython
ABI. Therefore, there's nothing we can't do, only things that are less
practical. It's not very intuitive, but it should be ok with a bit of
practice.

About cffi, if milksnake can automate it, that's an easy win to be added
later (for now I still need to call into the C modules from the Rust code).

In both cases, we need to tighten it with comprehensive integration tests.

Cheers,

--
Georges Racinet
Anybox SAS, http://anybox.fr
Phone: +33 6 51 32 07 27
GPG: B59E 22AB B842 CAED 77F7 7A7F C34F A519 33AB 0A35, on public servers
Re: Rust extensions: the next step
On 10/18/2018 04:09 PM, Yuya Nishihara wrote:
> On Thu, 18 Oct 2018 08:58:04 -0400, Josef 'Jeff' Sipek wrote:
>> On Thu, Oct 18, 2018 at 12:22:16 +0200, Gregory Szorc wrote:
>> ...
>>> Something else we may want to consider is a single Python module exposing
>>> the Rust code instead of N. Rust’s more aggressive cross function
>>> compilation optimization could result in better performance if everything
>>> is linked/exposed in a single shared library/module/extension. Maybe this
>>> is what you are proposing? It is unclear if Rust code is linked into the
>>> Python extension or loaded from a shared library.
>> (Warning: I suck at Python, am not an expert on Rust, but have more
>> knowledge about ELF linking/loading/etc. than is healthy.)
>>
>> Isn't there also a distinction between code layout (separate crates) and the
>> actual binary that cargo/rustc builds? IOW, the code could (and probably
>> should) be nicely separated but rustc can combine all the crates' code into
>> one big binary for loading into python. Since it would see all the code, it
>> can do its fancy optimizations without impacting code readability.
> IIUC, it is. Perhaps, the rustext is a single binary exporting multiple
> submodules?

Yes totally, it's exactly as Josef writes. To demonstrate, here's what I
have:

  $ ls mercurial/*.so
  mercurial/rustext.so  mercurial/zstd.so
  $ python
  Python 2.7.13 (default, Nov 24 2017, 17:33:09)
  [GCC 6.3.0 20170516] on linux2
  Type "help", "copyright", "credits" or "license" for more information.
  >>> from mercurial import rustext
  >>> dir(rustext)
  ['GraphError', '__doc__', '__file__', '__name__', '__package__', 'ancestors']
  >>> from mercurial.rustext import ancestors
  >>> ancestors is rustext.ancestors
  True
  >>> dir(ancestors)
  ['AncestorsIterator', '__doc__', '__name__', '__package__']

So, in short, it's a single shared library that can hold a bunch of modules.
The submodules are themselves initialized from the Rust code.
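
The same arrangement can be imitated in pure Python, which may help to see why
the session above behaves as it does. This is only an analogy: rustext_demo,
init_ancestors and init_rustext_demo are hypothetical names, and the module
objects are built with types.ModuleType instead of coming from a shared
library.

```python
import sys
import types


def init_ancestors():
    """Build the submodule object, as a native init function would."""
    mod = types.ModuleType("rustext_demo.ancestors")
    mod.AncestorsIterator = type("AncestorsIterator", (object,), {})
    return mod


def init_rustext_demo():
    """Build the top-level module and attach the submodule as an attribute."""
    top = types.ModuleType("rustext_demo")
    top.ancestors = init_ancestors()
    return top


# Stand-in for the import machinery registering the one loaded library.
sys.modules["rustext_demo"] = init_rustext_demo()

import rustext_demo
from rustext_demo import ancestors

# The submodule is just an attribute of the single top-level module.
print(ancestors is rustext_demo.ancestors)
```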
Here's the definition of 'rustext' itself. It follows the pattern expected
by Josef.

  $ tail rust/hg-cpython/src/lib.rs
  mod ancestors; // corresponds to src/ancestors.rs
  mod exceptions;

  py_module_initializer!(rustext, initrustext, PyInit_rustext, |py, m| {
      m.add(py, "__doc__", "Mercurial core concepts - Rust implementation")?;
      m.add(py, "ancestors", ancestors::init_module(py)?)?;
      m.add(py, "GraphError", py.get_type::<exceptions::GraphError>())?;
      Ok(())
  });

(Mark confirmed to me during the sprint that adding submodules on the fly
was doable.) Indeed I hope the Rust compiler can do lots of optimizations in
that single shared library object.

> I expect "rustext" (or its upper layer) to be a shim over Rust-based modules
> and cexts. So if you do policy.importmod('parsers'), it will return
> cext.parsers, whereas policy.importmod('ancestor') will return
> rustext.ancestor,
> though I have no idea if there will be cext/pure.ancestor.

Yes, it's quite possible to add a new module policy this way. After all,
from mercurial.policy, it behaves in the same way as the cext package
does and the fact that we have a single shared library instead of
several ones is an implementation detail, hidden by Python's import
machinery.

But this opens another, longer term, question: currently what I have in
mercurial.rustext.ancestor has only a fragment of what
mercurial.ancestor provides. Therefore to have mercurial.policy handle
it, we'll need either to take such partial cases into account, or decide
to translate the whole Python module into Rust. For the time being, I'm
simply doing an import and catching the error to fall back to the Python
version.

Regards,

--
Georges Racinet
Anybox SAS, http://anybox.fr
Phone: +33 6 51 32 07 27
GPG: B59E 22AB B842 CAED 77F7 7A7F C34F A519 33AB 0A35, on public servers
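
The import-and-fall-back approach mentioned at the end of this message could
look roughly like the sketch below. It assumes that a missing attribute should
be treated like a missing module, which is one way to cope with a native
module that covers only part of the Python one. PureAncestorsIterator,
pick_implementation and the module name no_such_rustext are hypothetical.

```python
import importlib


class PureAncestorsIterator(object):
    """Placeholder for a pure-Python implementation."""


def pick_implementation(native_module, attr, fallback):
    """Return (object, origin), preferring the native module when usable."""
    try:
        mod = importlib.import_module(native_module)
        return getattr(mod, attr), "native"
    except (ImportError, AttributeError):
        # Extension not built, or it only covers part of the Python module.
        return fallback, "pure"


# The module name below does not exist, so the fallback is selected.
impl, origin = pick_implementation(
    "no_such_rustext.ancestors", "AncestorsIterator", PureAncestorsIterator)
print(origin)
```

Catching AttributeError as well as ImportError is a design choice of this
sketch, not something stated in the thread.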
Re: Rust extensions: the next step
On Thu, 18 Oct 2018 08:58:04 -0400, Josef 'Jeff' Sipek wrote:
> On Thu, Oct 18, 2018 at 12:22:16 +0200, Gregory Szorc wrote:
> ...
> > Something else we may want to consider is a single Python module exposing
> > the Rust code instead of N. Rust’s more aggressive cross function
> > compilation optimization could result in better performance if everything
> > is linked/exposed in a single shared library/module/extension. Maybe this
> > is what you are proposing? It is unclear if Rust code is linked into the
> > Python extension or loaded from a shared library.
>
> (Warning: I suck at Python, am not an expert on Rust, but have more
> knowledge about ELF linking/loading/etc. than is healthy.)
>
> Isn't there also a distinction between code layout (separate crates) and the
> actual binary that cargo/rustc builds? IOW, the code could (and probably
> should) be nicely separated but rustc can combine all the crates' code into
> one big binary for loading into python. Since it would see all the code, it
> can do its fancy optimizations without impacting code readability.

IIUC, it is. Perhaps, the rustext is a single binary exporting multiple
submodules?

I expect "rustext" (or its upper layer) to be a shim over Rust-based modules
and cexts. So if you do policy.importmod('parsers'), it will return
cext.parsers, whereas policy.importmod('ancestor') will return
rustext.ancestor, though I have no idea if there will be cext/pure.ancestor.
Re: Rust extensions: the next step
On Thu, Oct 18, 2018 at 12:22:16 +0200, Gregory Szorc wrote:
...
> Something else we may want to consider is a single Python module exposing
> the Rust code instead of N. Rust’s more aggressive cross function
> compilation optimization could result in better performance if everything
> is linked/exposed in a single shared library/module/extension. Maybe this
> is what you are proposing? It is unclear if Rust code is linked into the
> Python extension or loaded from a shared library.

(Warning: I suck at Python, am not an expert on Rust, but have more
knowledge about ELF linking/loading/etc. than is healthy.)

Isn't there also a distinction between code layout (separate crates) and the
actual binary that cargo/rustc builds? IOW, the code could (and probably
should) be nicely separated but rustc can combine all the crates' code into
one big binary for loading into python. Since it would see all the code, it
can do its fancy optimizations without impacting code readability.

Jeff.

--
C is quirky, flawed, and an enormous success.
		- Dennis M. Ritchie.
Re: Rust extensions: the next step
> On Oct 17, 2018, at 18:45, Georges Racinet wrote:
>
> Hi all,
>
> first, many thanks for the Stockholm sprint, it was my first interaction
> with the Mercurial community, and it's been very welcoming to me.
>
> I've been pursuing some experiments I started then to convert the Rust
> bindings I've done in the patch series about ancestry iteration (now
> landed) to a proper Python extension, using the cpython crate and Python
> capsules. In short, it works.
>
> Early benchmarking shows that it's a few percent slower than the direct
> bindings through C code, which I think is acceptable compared to the
> other benefits (clearer integration, easier to generalise, no C code at
> all).
>
> The end result is a unique shared library importable as
> 'mercurial.rustext', which is itself made of several submodules, ie, one
> can do:
>
>    from mercurial.rustext.ancestor import AncestorsIterator

This all sounds very reasonable to me.

One open item is full PyPy/cffi support. Ideally we’d only write the native
code interface once. But I think that means cffi everywhere and last I
looked, CPython into cffi was a bit slower compared to native extensions. I’m
willing to ignore cffi support for now (PyPy can use pure Python and rely on
JIT for faster execution). Maybe something like milksnake can help us here?
But I’m content with using the cpython crate to maintain a Rust-based
extension: that’s little different from what we do today and we shouldn’t let
perfect be the enemy of good.

Something else we may want to consider is a single Python module exposing
the Rust code instead of N. Rust’s more aggressive cross function
compilation optimization could result in better performance if everything
is linked/exposed in a single shared library/module/extension. Maybe this
is what you are proposing? It is unclear if Rust code is linked into the
Python extension or loaded from a shared library.
> It will take me some more time, though, to get that experiment into a
> reviewable state (have to switch soon to other, unrelated, works) and
> we're too close to the freeze anyway, but if someone wants to see it, I
> can share it right away.
>
> Also, I could summarize some of these thoughts on the Oxidation wiki
> page. Greg, are you okay with that ?

Yes, please update the Oxidation wiki! I’ve been meaning to update it with
results of discussions at the sprint. I’ve just been busy trying to finish my
patches for 4.8...

> Regards,
>
> --
> Georges Racinet