Along similar lines... (sorry for hijacking the thread) I assume this applies even more to the choice of row keys, given the way keys participate in indexes? I have been using UUIDs, but they are way overkill for my needs. What are others using? Is there a convenient way of generating (e.g.) 8-character strings?
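One possible approach to the 8-character question, sketched in Java under stated assumptions: `ShortKeys` and its methods are hypothetical names, not part of HBase. It draws 6 random bytes (48 bits) from `SecureRandom`. Base64url packs 6 bits per character, so those bytes become exactly 8 URL-safe characters with no padding. Storing the raw 6 bytes as the key is more compact still, compared with 16 bytes for a binary UUID or 36 characters for its usual string form.

```java
import java.security.SecureRandom;
import java.util.Base64;

// Hypothetical helper (not an HBase API): compact random row keys
// as a lighter-weight alternative to UUIDs.
final class ShortKeys {
    private static final SecureRandom RANDOM = new SecureRandom();

    // 6 bytes = 48 bits; base64url encodes 6 bits per character,
    // so 6 bytes always encode to exactly 8 characters, no padding.
    private static final int KEY_BYTES = 6;

    /** Raw 6-byte key: the most compact form to store as a row key. */
    static byte[] randomKeyBytes() {
        byte[] key = new byte[KEY_BYTES];
        RANDOM.nextBytes(key);
        return key;
    }

    /** The same key rendered as an 8-character, URL-safe string. */
    static String toKeyString(byte[] key) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(key);
    }

    /** Inverse of toKeyString. */
    static byte[] fromKeyString(String s) {
        return Base64.getUrlDecoder().decode(s);
    }

    public static void main(String[] args) {
        byte[] key = randomKeyBytes();
        String s = toKeyString(key);
        System.out.println(s + " (" + s.length() + " chars, " + key.length + " bytes raw)");
    }
}
```

With only 48 random bits, collisions become likely once a table holds tens of millions of keys (the birthday bound), so either check for an existing row before writing or use a longer key. Random keys also give up any meaningful ordering for row-key scans.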
On Fri, Mar 12, 2010 at 9:15 PM, Kay Kay <kaykay.uni...@gmail.com> wrote:

> Some of our current experiences go along similar lines, where we saw
> ~20-30% RAM savings by using abbreviations in the key space.
>
> But the biggest advantage actually came from defining the right schema and
> column families, as per the query pattern of the jobs. We keep no more than
> 5 column families and have relatively *thin* columns, but revisit the
> schema with more tables if that gets stretched, as applicable of course.
>
> On 3/12/10 12:02 PM, Lars Francke wrote:
>
>>> Will I save a lot of space (especially if I have many small columns)?
>>
>> I don't have any hard numbers for you but I tested it and I remember
>> that on a dataset of about 10-20 GB I could save about 200-500 MB
>> (this was with compression enabled) just by not using descriptive
>> string qualifiers that weren't data by themselves. A lot of small columns
>> for me too (mostly counters). I use a simple mapping of short byte
>> arrays to strings so that it is still very easy to use in the client.
>>
>> I asked that very same question a few months ago on IRC but I think
>> nobody answered, so I'd be interested in what others do as well.
>>
>> Cheers,
>> Lars
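Lars's "simple mapping of short byte arrays to strings" might look something like the following Java sketch. The class name, the qualifier names, and the one-byte codes are all hypothetical. The idea is that HBase stores the full key (row, column family, qualifier, timestamp) with every cell, so writing a 1-byte qualifier instead of a 14-byte name such as `uniqueVisitors` saves those bytes on each cell before compression, while client code keeps working with readable names.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not an HBase API): maps descriptive qualifier names
// to one-byte codes so only the short code is written into each cell.
final class QualifierCodec {
    private static final Map<String, Byte> NAME_TO_CODE = new HashMap<>();
    private static final Map<Byte, String> CODE_TO_NAME = new HashMap<>();

    static {
        // Example mapping; codes end up in stored data, so only append new ones.
        register("pageViews", (byte) 0x01);
        register("uniqueVisitors", (byte) 0x02);
        register("clickThroughs", (byte) 0x03);
    }

    private static void register(String name, byte code) {
        if (NAME_TO_CODE.containsKey(name) || CODE_TO_NAME.containsKey(code)) {
            throw new IllegalStateException("duplicate mapping for " + name);
        }
        NAME_TO_CODE.put(name, code);
        CODE_TO_NAME.put(code, name);
    }

    /** Short qualifier to write for a descriptive name, e.g. "pageViews" -> {0x01}. */
    static byte[] encode(String name) {
        Byte code = NAME_TO_CODE.get(name);
        if (code == null) {
            throw new IllegalArgumentException("unknown qualifier: " + name);
        }
        return new byte[] { code.byteValue() };
    }

    /** Maps a stored one-byte qualifier back to its descriptive name. */
    static String decode(byte[] qualifier) {
        if (qualifier.length != 1) {
            throw new IllegalArgumentException("expected a 1-byte qualifier");
        }
        String name = CODE_TO_NAME.get(qualifier[0]);
        if (name == null) {
            throw new IllegalArgumentException("unknown code: " + qualifier[0]);
        }
        return name;
    }

    public static void main(String[] args) {
        String name = "uniqueVisitors";
        byte[] q = encode(name);
        int descriptiveLen = name.getBytes(StandardCharsets.UTF_8).length;
        System.out.println(descriptiveLen + " bytes -> " + q.length + " byte; decodes to " + decode(q));
    }
}
```

Because the codes are persisted in every cell, a mapping like this should only ever be extended, never renumbered; one byte allows 256 qualifiers, and a wider code would be needed beyond that.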