Re: Deleting items from search index increases disk usage

Jeremy Raymond Mon, 05 Nov 2012 10:06:58 -0800

I gave this approach a try on the test cluster and it worked. I disabled
search on all buckets, then dropped the merge index on each node via the MI
Pid. This freed up the space (it didn't remove all the subfolders in
merge_index, but the disk usage went down to a few hundred K). Then I
re-enabled search on the buckets and reindexed the items I wanted to be
searchable.
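
For anyone repeating this, the disk usage check on each node can be
scripted. The following is only a rough sketch, not something from Riak
itself: dir_size_bytes is a name made up for this example, and the
merge_index path has to be passed in because its location depends on how
the node is installed.

```python
import os
import sys


def dir_size_bytes(root):
    """Return the total size in bytes of all regular files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                # Do not count symlink targets.
                continue
            try:
                total += os.path.getsize(path)
            except OSError:
                # The file may vanish while a drop or merge is in progress.
                continue
    return total


# Hypothetical usage: python mi_size.py <path to the node's merge_index dir>
if __name__ == "__main__" and len(sys.argv) == 2:
    size = dir_size_bytes(sys.argv[1])
    print(f"{sys.argv[1]}: {size} bytes ({size / 1024:.1f} KiB)")
```

Running it before and after the drop shows whether the segment files
really went away, even if some empty subfolders remain.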


--
Jeremy


On Fri, Nov 2, 2012 at 1:09 PM, Ryan Zezeski <rzeze...@basho.com> wrote:

> Jeremy,
>
> On Fri, Nov 2, 2012 at 12:31 PM, Jeremy Raymond <jeraym...@gmail.com> wrote:
>
>> I cycled through the compaction on another node. Again after 3 rounds
>> compaction has stopped. On one node the merge index is 26 GB on the other
>> 21 GB. So it looks like I've hit the 5 segment compaction no-op condition
>> on both nodes.
>
>
> I concur.  This condition seems arbitrary to me and I'm not sure if there
> is a good reason for it to exist.  But it's there and the only way we could
> remove it for you is to hot-load a new beam.
>
>
>> What would account for the difference in merge_index size? Shouldn't
>> these be relatively the same? There must still be tombstones in there...
>>
>
> Riak Search uses term-based partitioning.  It could be that you have some
> terms that are more frequent than others which would account for some of
> the difference.
>
>
>>
>> On my production cluster the merge_index is ~44GB. I estimate that
>> approximately 90 - 95% of the index data belongs to the bucket I no longer
>> want indexed. Manually deleting items from the index and then manually
>> triggering compaction doesn't look like it will scale. Will this workflow
>> work to rebuild the search index? I need to keep the cluster available for
>> writes while doing this:
>>
>> 1. In a rolling fashion, disable Riak Search one node at a time.
>> 2. Delete the contents of the merge_index on each node.
>> 3. In a rolling fashion, re-enable Riak Search on each node.
>> 4. Reindex the items to be included in the search index.
>>
>
> No, instead of disabling Riak Search you'll want to take the nodes down
> one at a time, remove the merge index data, and restart.  After doing this
> for all nodes, re-index your data.
>
>
>>
>> This should do the trick, right? Do I need to disable search before
>> clearing out the merge_index folders or would disabling the search index on
>> the buckets via search-cmd be enough (and then re-enabling) before
>> re-indexing?
>>
>
> Again, don't bother disabling search.  The key is to take the nodes down
> because merge index caches stuff in memory.
>
> Actually, I thought of another way to achieve the same result without
> taking the nodes down.  If you have a non-production cluster to test this
> on that would be a good precaution.  I'm 99% sure this should work without
> issue.
>
> 1. Make sure no index operations are incoming; do this either at your
> client or by uninstalling all search hooks
>
> For each node:
>
> 2. Get a list of the MI Pids like in the manual compaction example
> 3. For each MI Pid call merge_index:drop(MIPid)
> 3a. Verify the data files were removed on disk
>
> After performing steps 2 & 3 on each node:
>
> 4. Re-write the objects you want indexed (of course remember to re-install
> the hooks if you removed them in step 1)
>
> -Z
>
>

_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
