Dear Vincent, thanks for your reply.

To sum up my reply: Should I try to replace bitsets completely by 
RaoringBitmap and see how it performs? (I need some help with the 
benchmarking, as I don't know good benchmarks for other use cases.)


to 1. Okay. I wasn't aware of that. I just copied the behaviour of 
bitset.pxi/pyx/pxd. I didn't know that this kind of usage is deprecated.

to 2. I was looking at roaring bitmaps for a while now 
https://github.com/Ezibenroc/PyRoaringBitMap
They claim it's a simple pip install. So it might be suitable as a standard 
package in sage.

It only supports values up to 2**32. But seriously, our current 
implementation requests 512 MB continous memory in this case. So this 
performs awful anyway.

I was reluctant to use it, because I thought, you cannot avoid allocating 
memory over and over.
Only after you and Travis insisted to give up our private implemtation, I 
looked up, if they have all the features I want.
In fact the only thing I need is this pull request 
https://github.com/Ezibenroc/PyRoaringBitMap/pull/59 for exposing a 
definition.
(And we can always include this definition ourselves from the header.)

With this to perform `dest = A & B`, I need to copy `A` first and to do the 
operation inplace then. This is a bit slower than optimal, but avoids 
allocation, if possible.
So after a couple of initial reallocs there shouldn't be any more memory 
allocations.
The optimized subset check is much more important to me, which appears to 
be at least as good as my implementation.

Depending on what you and other people think of, I can try to make use of 
RoaringBitMap or any other library that likely performs well.
https://trac.sagemath.org/ticket/30549 is still a good step towards this, 
as it makes it a lot easier to just replace `bitset_t` by something else.

I could try to create a branch that completely rips out bitset.pxi and 
replaces it by a different implementation (with same names as to not change 
anything else). Then many people could try to come up with benchmarks in 
different parts of sage.

E.g. this might make dense graphs much better in comparison.

to 3. I didn't try many things yet. For my use case the only thing I need 
(at the moment) is fast intersection and fast subset check. Dense bitsets 
perform pretty good with that as it is hard to beat 256 bits per CPU-step 
(whatever this means). I could perform a bit better however,
by keeping an array of the significant chunks. I wanted to try different 
things. This is the main reason for https://trac.sagemath.org/ticket/30549, 
because my design choice is awful, if you want to change one implemtation 
detail. With this ticket I can pretty much tell the face-structure that it 
consists of a roaring bitmap of atoms and coatoms and adapt a few basic 
functions to the name scheme of that (I also need to take care of freeing 
the memory, but hey, that's just writing a couple of __dealloc__ methods).



vdelecroix schrieb am Mittwoch, 16. September 2020 um 14:32:28 UTC+2:

> Dear Jonathan,
>
> 0. It would be good to benefit from extended CPU instructions
> for having faster bitsets.
>
> 1. As mentioned in the Cython documentation [1], pxi files are
> not the advised way to deal with Cython code. inline functions
> are perfectly usable when written in pxd headers. See for
> example the header file sage/libs/linbox/conversion.pxd which
> has no pyx implementation since everything is inlined.
>
> 2. Many people care about efficient bitsets and I doubt that
> there is no finely tuned library around. I would rather try
> to find an actively maintained open source project focused
> on bitsets rather making it part of Sage internals.
>
> 3. Depending on the application you have in mind you might
> want to distinguish between sparse and dense bitsets. Sparse
> could be implemented as sets via has tables or binary trees,
> etc
>
> [1] 
>
> https://cython.readthedocs.io/en/latest/src/userguide/language_basics.html?highlight=pxi#cython-file-types
>
> Le 16/09/2020 à 11:59, 'jonatha...@googlemail.com' via sage-devel a 
> écrit :
> > Dear all, I want to redesign the bitset structure of combinatorial
> > polyhedron and move it to `data_structures/bitset.pxi`. This includes 
> some
> > changes to bitset.pxi. Please comment, whether the proposed design 
> changes
> > on the ticket are ok. Mostly they are the following:
> > 
> > 1. Define most of the functions in bitset.pxi for fuzed types.
> > 
> > 2. Move functions that can be optimized by intrinsics to a seperate file.
> > 
> > 3. (Not yet done, but also important.) Optimize some functions in
> > bitset.pxi by intrinsics. This includes an overalignment condition. One
> > disadvantage of overalignment is that it makes realloc much more
> > complicated. Small bitsets also need more memory.
> > 
> > 4. Indirect typecast of functions in `bitset.pxi` will no longer be
> > possible. E.g. you cannot call `bitset_add` with signature `(bitset_t,
> > PyObject)`, but `(bitset_t, int)` or `(bitset_t, size_t)` or similar are
> > ok. See https://trac.sagemath.org/ticket/30572.
> > 
> > Eventually, this should go into smaller tickets. But until then it would 
> be
> > good to know, if the general direction is acceptable or which part is 
> not.
> > 
> > See https://trac.sagemath.org/ticket/30549 for more detail.
> > 
>

-- 
You received this message because you are subscribed to the Google Groups 
"sage-devel" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to sage-devel+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/sage-devel/f8f74e72-64a5-48f6-8838-e0a7a71662dan%40googlegroups.com.

Reply via email to