Along similar lines... (sorry for hijacking the thread)

I assume this applies even more to key choice, given the way keys
participate in indexes?  I have been using UUIDs, but they are overkill for
my needs.  What are others using?  Is there a convenient way of generating
(e.g.) 8-character strings?
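
One possible way to get 8-character keys, as a minimal sketch rather than a
recommendation: base64-encode 6 random bytes, which yields exactly 8 URL-safe
characters. The function name short_key is made up for illustration, and the
sketch assumes random (not sequential) keys are acceptable.

```python
import base64
import os


def short_key(num_bytes: int = 6) -> str:
    """Return a random URL-safe key; 6 random bytes encode to 8 characters."""
    # base64 maps every 3 bytes to 4 characters, so a multiple of 3
    # produces no '=' padding and the length is exactly num_bytes * 4 / 3.
    if num_bytes <= 0 or num_bytes % 3 != 0:
        raise ValueError("num_bytes must be a positive multiple of 3")
    return base64.urlsafe_b64encode(os.urandom(num_bytes)).decode("ascii")


if __name__ == "__main__":
    key = short_key()
    print(key, len(key))
```

Note that this carries 48 bits of randomness against 122 for a random
(version 4) UUID, so collisions become likely somewhere in the tens of
millions of keys. If that matters, check for an existing row before inserting.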




On Fri, Mar 12, 2010 at 9:15 PM, Kay Kay <kaykay.uni...@gmail.com> wrote:

> Some of our current experiences go along similar lines: we saw
> ~20-30% RAM savings by using abbreviations in the key space.
>
> But the biggest advantage actually came from defining the right schema and
> column families for the query pattern of the jobs. We keep the column
> families to no more than 5 and have relatively *thin* columns, but revisit the
> schema with more tables if that gets stretched, as applicable of course.
>
>
>
>
> On 3/12/10 12:02 PM, Lars Francke wrote:
>
>> Will I save a lot of space (especially if I have many small columns)?
>>>
>>>
>> I don't have any hard numbers for you but I tested it and I remember
>> that on a dataset of about 10-20 GB I could save about 200-500 MB
>> (this was with compression enabled) just by not using descriptive
>> string qualifiers that weren't data themselves. A lot of small columns
>> for me too (mostly counters). I use a simple mapping of short byte
>> arrays to strings so that it is still very easy to use in the client.
>>
>> I asked that very same question a few months ago on IRC but I think
>> nobody answered so I'd be interested in what others do as well.
>>
>> Cheers,
>> Lars
>>
>>
>
>
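
The mapping Lars mentions (short byte arrays in the table, descriptive names
in the client) might look roughly like the sketch below. The qualifier names
and two-byte codes are invented for illustration, and no HBase client is
involved: a row is represented as a plain dict.

```python
# Hypothetical qualifier names; only the short byte strings would be
# written to the table, the descriptive names stay in client code.
QUALIFIER_TO_BYTES = {
    "page_views": b"pv",
    "unique_visitors": b"uv",
    "error_count": b"ec",
}

# Reverse lookup for decoding rows read back from the table.
BYTES_TO_QUALIFIER = {code: name for name, code in QUALIFIER_TO_BYTES.items()}

# Two names sharing one code would silently overwrite each other's data.
if len(BYTES_TO_QUALIFIER) != len(QUALIFIER_TO_BYTES):
    raise ValueError("short qualifier codes must be unique")


def encode_row(row: dict) -> dict:
    """Replace descriptive names with their short byte codes."""
    return {QUALIFIER_TO_BYTES[name]: value for name, value in row.items()}


def decode_row(stored: dict) -> dict:
    """Replace short byte codes with their descriptive names."""
    return {BYTES_TO_QUALIFIER[code]: value for code, value in stored.items()}


if __name__ == "__main__":
    stored = encode_row({"page_views": 10, "error_count": 2})
    print(stored)
    print(decode_row(stored))
```

Keeping the table in one place means the rest of the client only ever sees
the descriptive names, while the stored qualifiers stay a couple of bytes long.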
