It's probably not a big enough use case to justify another piece of
architecture, and the distributed nature and availability of Riak are why
it's great for this data and the rest of the app's functionality.  I'm going
to try implementing a lightweight transaction with Ruby procs to wrap the
key rotation.  As long as we cover the failure scenarios between the
secondary index and KV, with at least a rollback and a reference log for
mishaps, it should be fine, especially since we'll do the local updates and
verification before discarding anything permanently. Thanks again.


On Mon, Nov 11, 2013 at 1:58 PM, Mark A. Basil, Jr.
<[email protected]>wrote:

> At its very core, Riak is meant to provide the alternative benefits of
> availability and speed.  Transactions are out of scope for its use case.  If
> you’re still thinking in terms of transactions - and have a justified need
> for them - you might consider standing up a relational DB alongside it for
> the crypto storage.
>
>
>
> *From:* Ron Pastore [mailto:[email protected]]
> *Sent:* Monday, November 11, 2013 11:10 AM
> *To:* Mark A. Basil, Jr.
> *Cc:* [email protected]
> *Subject:* Re: aggregate query
>
>
>
> Thanks Mark.  Yeah, I like the idea of operating locally and ensuring
> everything is intact before I remove the key.
>
>
>
> My fears about data loss mainly pertain to mid-operation failures leading
> to a discrepancy between my encrypted values and whatever secondary method
> I have of storing their usages.  So yeah, I guess my issue is somewhat the
> same regardless of the secondary storage method.
>
>
>
> I think with link walking, as with the search index, it comes down to the
> need to transactionally manage updates to the secondary place where key
> usages are stored, with the primary being on the value itself (inside
> meta).  Essentially, I'd have to manually handle rollback procedures for
> cases like when an item is stored but storing the key usage (search or
> link walk/bucket) fails.  This gets tricky because what if part of the
> rollback then fails?  Without persisting some sort of transaction log
> (which may be fine if that's the only way), it's hard to guarantee my
> secondary store isn't missing something when I go to discard a key.
>
>
>
>
>
>
>
>
>
> On Mon, Nov 11, 2013 at 10:20 AM, Mark A. Basil, Jr. <
> [email protected]> wrote:
>
> Operate on the data locally, validating the decryption process as a final
> step after the re-encrypted value is put back into the DB.
>
>
>
> Also, you don’t have to do it all in one step.  Pull a list of keys down,
> break them up, and test your batch job on a small portion.  If you’re
> concerned with data loss, ensure you haven’t lost any before you delete the
> updated value locally.
>
>
>
> The most efficient way to have set up your map would be a bucket that maps
> the key name to the thing it encrypted.
>
> Alternatively, you could have added a keys bucket that uses Links, which
> would relate the key name to the thing that was encrypted by it.
>
>
>
> Lastly, it seems strange that your concerns about data loss are related to
> how you'll be pulling the list of keys that need updating.  They really
> shouldn't be related.
>
>
>
> -m
>
>
>
>
>
>
>
> *From:* riak-users [mailto:[email protected]] *On Behalf
> Of *Ron Pastore
> *Sent:* Monday, November 11, 2013 9:42 AM
> *To:* [email protected]
> *Subject:* aggregate query
>
>
>
> Hi All,
>
> I posted this question to Stack Overflow a few days back but not much
> luck.  Hoping someone here has some thoughts.
>
> I have a use case for an aggregate query across the entire DB and all
> buckets, and I'm wondering what the best query method would be; I'm leaning
> toward multiple secondary index calls. This won't be a frequently used
> feature, possibly invoked once a week or so via a scheduled job.
>
> Some records have a value in their meta attribute that I'd like to
> match/target for the selection.  After the selection I'll need to update
> those records.
>
> From what I've read, secondary indexes look great, but they're limited to a
> single bucket? I also saw "list buckets", which has warnings about
> production use, though I'm not sure whether those apply to such
> infrequently used functionality. I thought maybe I could list the buckets
> and then perform a secondary index query on each.
>
> Is there a better way? MapReduce seems heavy, since it has to load every KV
> off the file system. Search seems possible too, but index setup/maintenance
> seems like overkill if there's an easier way.
>
> UPDATE:  I went ahead with a Search index but am now second-guessing that.
>  This lookup will be part of an encryption key rotation, where we'll be
> finding certain values in Riak that are encrypted with a given key and then
> re-encrypting them with a new key.  So, if there are discrepancies or
> failed operations between the actual encrypted values and the search index,
> there is a potential for data loss, as we'll be discarding keys once
> rotated.
>
>
>
> Sorry for the long-winded description.  Any help would be greatly
> appreciated.
>
>
>
>
>
>
>
_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
