Hi,
I hope I'm not flooding the list with this topic. I did some research and
couldn't find anything relevant.
My team is developing a large-scale CAD application with a large
memory footprint, which requires powerful machines to run.
The application uses pointers heavily.
During one of our optimization cycles I noticed that since most
objects are aligned on 8-byte boundaries, it's possible to drop the
lower 3 bits of the address and reconstruct the full address later.
Basically, as long as the application stays within a 32 GB range (2^32 * 2^3
bytes), it's possible to represent aligned pointers using a 4-byte
unsigned int.
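
A minimal sketch of the packing described above, assuming 8-byte alignment and addresses below 32 GB. The names compress, decompress and kAlignBits are hypothetical and are not part of the attached header:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration only.
const unsigned kAlignBits = 3; // assumption: every object is 8-byte aligned

// Pack a 64-bit address into 32 bits by dropping the 3 low bits,
// which are always zero for an 8-byte-aligned address.
inline std::uint32_t compress(std::uint64_t addr) {
    assert((addr & ((1u << kAlignBits) - 1)) == 0); // must be aligned
    assert((addr >> kAlignBits) <= 0xFFFFFFFFull);  // must be below 32 GB
    return static_cast<std::uint32_t>(addr >> kAlignBits);
}

// Rebuild the full address: widen to 64 bits first, then shift left.
inline std::uint64_t decompress(std::uint32_t packed) {
    return static_cast<std::uint64_t>(packed) << kAlignBits;
}
```

The widening cast in decompress matters: shifting the 32-bit value directly would discard the upper address bits.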
Since obtaining the address from the compressed representation
only costs a left shift (which is very cheap), this trick should have an
insignificant performance impact in highly polymorphic code (and perhaps
even in general; I am not sure how well the compiler will optimize multiple
accesses through the same pointer using this mechanism).
After giving this some thought, I realized 3 points:
1. The virtual function table pointer cannot be compressed without
modifying the compiler.
2. Spreading this (if the idea actually makes sense ;)) will also be
easier through the compiler.
3. The 32 GB of addressable range may not start at 0 and may be dispersed
(although in reality sbrk usually starts a bit above 0 and grows
continuously), so this may require some base address shifting/loader
changes (??)
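
Point 3 could be handled along these lines. This is only a sketch under assumptions: the heap is one contiguous region starting at a known, 8-byte-aligned base, and the stored value 0 is reserved for the null pointer. CompressedHeap, encode and decode are hypothetical names:

```cpp
#include <cassert>
#include <cstdint>

const unsigned kHeapAlignBits = 3; // assumption: 8-byte alignment

// Hypothetical sketch: encode addresses relative to a heap base.
struct CompressedHeap {
    std::uint64_t base; // lowest heap address, assumed 8-byte aligned

    // Stored value 0 means null; real objects use slot numbers from 1 up.
    std::uint32_t encode(std::uint64_t addr) const {
        if (addr == 0) return 0;
        assert(addr >= base);                         // inside the heap
        std::uint64_t off = addr - base;
        assert((off & ((1u << kHeapAlignBits) - 1)) == 0); // aligned
        std::uint64_t slot = (off >> kHeapAlignBits) + 1;
        assert(slot <= 0xFFFFFFFFull);                // fits in 32 bits
        return static_cast<std::uint32_t>(slot);
    }

    std::uint64_t decode(std::uint32_t v) const {
        if (v == 0) return 0;
        return base + ((static_cast<std::uint64_t>(v) - 1) << kHeapAlignBits);
    }
};
```

Reserving 0 for null costs one slot of range and a compare on every decode, so it is a trade-off rather than a free fix. A dispersed address range would need something more elaborate than a single base.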
Would be nice to get your opinions on it.
If it makes any sense and you can give me some basic hints, I may be
able to tailor it into the gcc development branch.
In case the above doesn't make sense, I'm attaching a simple
implementation I wrote just to check the concept. :)
Thanks for your time,
Yair
---------------------------------------------------------------------------------------------------------------------------------
#ifndef __ALIGNED_PTR_H__
#define __ALIGNED_PTR_H__
// A memory-efficient pointer implementation.
//
// Generally, pointers enable access at byte-level granularity.
// 64-bit pointers are useful to enable access to 2^64 unique byte addresses,
// which is useful for applications with a large memory foot-print.
//
// When byte-level granularity is not needed (example: some allocators return
// addresses aligned to sizeof(void*)), it is possible to address >4GB using
// a 32-bit value.
//
// The implementation below assumes the allocator's alignment is 2^ALIGNMENT_BITS,
// and thus access to pointers only requires shifting the stored unsigned integer left,
// which is faster than multiplication which would otherwise be necessary.
//
// For example, if ALIGNMENT_BITS = 3, the actual alignment is 2^3 (8),
// which provides access to addresses up to 2^35 (32GB).
//
// Note that if unaligned addresses, or addresses farther than the allowed limit,
// are sent to aligned_ptr it will assert, and so while the user will have to
// rerun with full 64-bit pointers, there is no risk of memory corruption.
//
// Yair Lifshitz, June 2008 :)
#include "assertions.h"
extern unsigned long __aligned_ptr_malloc_base__;
template <class T, unsigned int ALIGNMENT_BITS = 3>
class aligned_ptr
{
public:
typedef aligned_ptr<T, ALIGNMENT_BITS> self_type;
aligned_ptr(): m_ptr(0) {}
aligned_ptr(T val)
{
assert(is_aligned_ptr(val));
m_ptr = remove_base(val) >> ALIGNMENT_BITS;
}
T operator-> () {
return ptr();
}
const T operator->() const {
return ptr();
}
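T ptr() {
// Widen to unsigned long before shifting so the upper address bits are
// not lost, then add back the base that remove_base() subtracted.
// Note: with a nonzero base, m_ptr == 0 decodes to the base address,
// not to a null pointer.
unsigned long ptr = ((unsigned long)m_ptr << ALIGNMENT_BITS) + __aligned_ptr_malloc_base__;
return reinterpret_cast<T>(ptr);
}
T ptr() const {
unsigned long ptr = ((unsigned long)m_ptr << ALIGNMENT_BITS) + __aligned_ptr_malloc_base__;
return reinterpret_cast<T>(ptr);
}
operator T () const {return ptr();}
self_type& operator= (T val)
{
*this = self_type(val);
return *this;
}
static bool is_aligned_ptr(T val)
{
unsigned long val_reffed_to_base = remove_base(val);
unsigned long ALIGNMENT_MASK = (1UL << ALIGNMENT_BITS) - 1;
if ((val_reffed_to_base & ALIGNMENT_MASK) != 0) return false;
// The stored 32-bit value is the offset shifted right by ALIGNMENT_BITS,
// so offsets below 2^(32 + ALIGNMENT_BITS) are representable.
// Assumes unsigned long is 64 bits wide.
if (val_reffed_to_base >= (1UL << (sizeof(unsigned int)*8 + ALIGNMENT_BITS))) return false;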
return true;
}
private:
static unsigned long remove_base(const T val)
{
return (unsigned long)val - __aligned_ptr_malloc_base__;
}
unsigned int m_ptr;
};
#endif
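
To illustrate where the footprint saving would come from, here is a hypothetical pair of tree nodes (not taken from the application). On a typical LP64 platform the raw-pointer node occupies 24 bytes and the packed one 12:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical tree nodes, for illustration only.
struct RawNode {
    RawNode* left;        // 8 bytes each on a 64-bit target
    RawNode* right;
    std::uint32_t value;  // followed by padding up to pointer alignment
};

struct PackedNode {
    std::uint32_t left;   // compressed link: (address - base) >> 3
    std::uint32_t right;
    std::uint32_t value;
};

// Bytes saved per node by storing the two links as 32-bit values.
inline std::size_t bytes_saved() {
    assert(sizeof(PackedNode) <= sizeof(RawNode));
    return sizeof(RawNode) - sizeof(PackedNode);
}
```

The actual saving depends on the target ABI and on how many pointer members each object has.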