Jeremy,

I was looking at the merge index code and I think the issue is that the
way segments are chosen for compaction can be very slow to reach the
larger segments.

1. Merge Index only schedules merging when a buffer is rolled over to a
segment.  This means there will _always_ be at least one small segment in
the list of potential segments to merge.

2. To determine which segments to merge, the mean of all segment sizes is
taken and only segments below that mean are considered candidates.

Over time the mean will skew left of the bulk of the distribution.  This
means most compactions will touch only recent, smaller segments and it will
take many iterations before one of the larger ones is included.  To help
verify this, you could list all your segment sizes again and compare them
with the last run.  My guess is you'll have about the same number of
segments but the smallest one will have grown a bit.  It depends on how
much unique data you re-indexed.
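
To make the mean-skew point concrete, here's a toy calculation you can
paste into any Erlang shell.  It's only an illustration of the selection
idea above, not merge_index's actual code, and the sizes are made up:

    %% Many small recent segments plus a few big old ones (sizes in MB).
    Sizes = lists:duplicate(10, 1) ++ [500, 500, 500].
    %% The mean lands around 116 MB, far below the big segments.
    Mean = lists:sum(Sizes) / length(Sizes).
    %% Only the ten 1 MB segments fall under the mean, so only they merge.
    ToMerge = [S || S <- Sizes, S < Mean].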

Depending on the distribution of your segment sizes I think it might be
possible to reclaim some of this space via repeated compaction calls.  It
turns out there is a way to manually invoke compaction.  It's just not easy
to get to.  Run the following gist on one of your nodes:
https://gist.github.com/3996286.  Then call merge_index:compact over and
over again, checking for changes in the segment file sizes each time.
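
If it helps, here's a quick way to snapshot the total segment size from
the Riak console so you can compare between compact calls.  This is just a
sketch using stdlib calls; it assumes the default data dir
/var/lib/riak/merge_index, so adjust the path if yours differs:

    %% Sum the sizes of all segment files under the merge_index data dir.
    filelib:fold_files("/var/lib/riak/merge_index", "^segment\\.", true,
                       fun(F, Acc) -> Acc + filelib:file_size(F) end, 0).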


-Z

On Thu, Nov 1, 2012 at 11:25 AM, Jeremy Raymond <jeraym...@gmail.com> wrote:

> I reindexed a bunch of items that are still in the search index but no
> disk space was reclaimed. Is there any Riak console Erlang voodoo I
> can do to convince Riak Search that now would be a good time to
> compact the merge_index?
>
> --
> Jeremy
>
>
> On Tue, Oct 30, 2012 at 4:26 PM, Jeremy Raymond <jeraym...@gmail.com>
> wrote:
> > I've posted the list of buffer files [1] and segment files [2].
> >
> > The current data set I have in Riak is static, so no new items are
> > being written. So this looks like the reason as to why compaction
> > isn't happening since there is no time based trigger on the merge
> > index. To get compaction to kick in, I should be able to just
> > reindex (by reading and rewriting) some of the existing items in
> > buckets that are still indexed? Earlier today I upgraded to Riak 1.2
> > and ran a Search read repair [3] in an attempt to kick off compaction.
> > Compaction didn't kick in, but instead disk consumption increased
> > again. Should a Search repair trigger compaction, or is it only writing
> > objects to the KV store that does?
> >
> > [1]:https://gist.github.com/3982718
> > [2]:https://gist.github.com/3982730
> > [3]: http://docs.basho.com/riak/latest/cookbooks/Repairing-Search-Indexes/#Running-a-Repair
> >
> > --
> > Jeremy
> >
> >
> > On Tue, Oct 30, 2012 at 3:47 PM, Ryan Zezeski <rzeze...@basho.com>
> wrote:
> >> find /var/lib/riak/merge_index -name 'buffer.*' | xargs ls -lah
> >>
> >> find /var/lib/riak/merge_index -name 'segment.*' | xargs ls -lah
>
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
