Re: Custom data-types
Alex,

In short, no, you can't create custom types through schemas. Schemas currently refer only to Riak Search 2. We would love that too, but it hasn't happened yet.

The problem is not conceiving of a data type but making its behavior both sensible and convergent in the face of concurrent activity or network partitions. For instance, say that two tweets come in at around the same time. Which goes first in the stack you described? How can multiple independent copies agree on which entries to drop from the bottom of the stack to keep it bounded at 100? If a replica is separated from the others for a while and holds really stale entries, is it valid to serve those to a user? What happens when one replica pushes an element while another pops it at the same time?

These sound like trivial problems, but they are incredibly hard to reason about in the general case. You have to reason about the ordering of events and the scope of their effects, and settle on a least-surprising behavior to expose to the user. Although we have given a fairly familiar, friendly interface to the data types shipping in 2.0, their behavior is strictly different from the types you would use in a single-threaded program with local memory.

On Thu, Aug 28, 2014 at 4:47 PM, Alex De la rosa <alex.rosa@gmail.com> wrote:

> Hi there,
>
> Correct me if I'm wrong, but I think I read somewhere that custom data-types can be created through schemas or something like that. So, apart from COUNTERS, SETS and MAPS, we could have some custom-defined ones.
>
> I would love to have a STACKS data-type that would work like a FIFO stack, so I could save the last 100 objects for some action. Imagine we are building Twitter, where millions of tweets are sent all the time, but we want to quickly know the last 100 tweets for a user. Imagine something like:
>
>     obj.stacks['last_tweets'].add(id_of_last_tweet)
>
>     IN: last_tweet --- STACK_OF_100_TWEETS --- OUT: older than the 100th goes out
>
> Is this possible? If so, how to do it?
> Thanks and Best Regards,
> Alex

_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

--
Sean Cribbs <s...@basho.com>
Software Engineer
Basho Technologies, Inc.
http://basho.com/
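To make Sean's convergence problem concrete, here is a minimal sketch (not a Riak data type, and not how the 2.0 CRDTs work) of a bounded "last-N" collection that merges replicas by timestamp. It only converges because it assumes globally comparable timestamps, which is exactly the assumption a real distributed system cannot make:

```python
# Hypothetical sketch: a bounded "last-N" collection merged by timestamp.
# NOT a Riak data type; it illustrates why convergence is hard, because it
# assumes perfectly synchronized clocks, which real clusters lack.

N = 5  # bound (100 in the thread's example)

def push(replica, ts, tweet_id):
    """Add an entry locally, keeping only the newest N."""
    replica.append((ts, tweet_id))
    replica.sort()            # order by timestamp
    del replica[:-N]          # drop the oldest entries beyond the bound

def merge(a, b):
    """Merge two replicas: union, order by timestamp, truncate to N."""
    merged = sorted(set(a) | set(b))
    return merged[-N:]

# Two replicas diverge during a partition...
r1, r2 = [], []
for ts in range(1, 7):
    push(r1, ts, f"tweet-{ts}")      # r1 saw tweets at ts = 1..6
push(r2, 4, "tweet-4b")              # r2 saw a concurrent tweet at ts = 4

# After healing, merging in either order yields the same bounded list.
print(merge(r1, r2))
```

Even this toy breaks down under the questions above: concurrent pops, timestamp ties, and stale partitioned replicas all need explicit, agreed-upon semantics.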
Re: Riak Secondary Index Limits
Hi Bryan,

Index entries are just keys in LevelDB, like normal values. So performance is relatively constant at write time but O(N) at read time (because you are scanning the index). The high-cardinality term will definitely be expensive to enumerate; the low-cardinality terms much less so.

The index entries in LevelDB look roughly like this:

    {i, IndexName, IndexTerm} = PrimaryKey

Since the keys are encoded with sext, we encode the start key of the range (or the equality query), open an iterator, seek to that start key, and start reading values. When a key is reached that exceeds the range or equality query, we stop iterating. Of course, that is an oversimplification; there are issues with backpressure from the request-coordinator process against the iterators, streaming merge sort if you want the results back in order or paginated, etc. Also, all 2i queries are run via coverage, which ensures that the entire keyspace is covered but hits most if not all nodes in the cluster.

On Thu, Aug 28, 2014 at 9:18 PM, Bryan <br...@go-factory.net> wrote:

> Hi Everyone,
>
> Apologies, as this has probably been asked before. Unfortunately I have not been able to parse through the list archive to find a reasonable answer, and the Basho wiki docs seem to be missing this information.
>
> I have read up on the secondary index docs. I am interested to better understand how secondary indexes perform when there is a very low distribution of indexed values. For example, let's say I have a bucket with 1 million objects that I create a secondary index on. Now let's say the index is on a value with an uneven distribution, where one of the values is not selective while the others are, such that 60% of the objects fall into a single indexed value while the remaining 40% have a good distribution.
>
> For example, I have a record (i.e. object) where the indexed field is 'foobar_bin'. I have 1 million objects in the bucket with 100 unique 'foobar' values distributed over them. One of the values repeats for 60% of the records (600K) and the rest have an even distribution of about 4%. How will the secondary indexes perform with this, and is this an appropriate use of secondary indexes? Finally, what I have read is not completely clear on what happens if the indexed value is updated when the value has such a low degree of selectivity?
>
> We have less than 512 partitions and are using the Erlang client.
>
> Thanks in advance - any insights will be much appreciated!
>
> Cheers,
> Bryan
>
> Bryan Hughes
> Go Factory
> http://www.go-factory.net
Re: Riak Secondary Index Limits
I made a minor mistake in my example: the PrimaryKey is part of the index key, whereas the value contains nothing. It's more like this:

    {i, IndexName, IndexTerm, PrimaryKey} =

So for the initial seek, we construct a key like so:

    {i, foobar_bin, baz, }

On Fri, Aug 29, 2014 at 8:44 AM, Sean Cribbs <s...@basho.com> wrote:

> Index entries are just keys in LevelDB like normal values are. So, performance is relatively constant at write time but is O(N) at read (because you are scanning the index). [...]
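As an illustration of the scan described here (a sketch, not Riak's actual code), the sorted-key layout and the seek-then-iterate behavior can be emulated with a sorted Python list standing in for sext-encoded LevelDB keys:

```python
# Illustrative sketch (not Riak source): 2i entries as flat sorted keys of the
# form ("i", IndexName, IndexTerm, PrimaryKey), scanned with seek + iterate.
import bisect

# Sorted like sext-encoded keys in LevelDB; the value side is empty,
# the key carries everything. Index name and terms are made up.
entries = sorted([
    ("i", "foobar_bin", "bar", "key1"),
    ("i", "foobar_bin", "baz", "key2"),
    ("i", "foobar_bin", "baz", "key3"),
    ("i", "foobar_bin", "qux", "key4"),
])

def range_query(index, start_term, end_term):
    """Seek to the first key >= start, iterate until the term leaves the range."""
    results = []
    i = bisect.bisect_left(entries, ("i", index, start_term, ""))
    while i < len(entries):
        _, name, term, pk = entries[i]
        if name != index or term > end_term:
            break                      # past the range: stop iterating
        results.append(pk)
        i += 1
    return results

print(range_query("foobar_bin", "baz", "baz"))  # equality query -> ['key2', 'key3']
```

An equality query is just a range where start and end are the same term, which matches the "construct a start key, seek, read until past it" description above.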
Re: Custom data-types
Hi Sean,

It seems I was wrong; that makes total sense now that you've explained it. It looked like too good a feature to be true, but apparently it's not that easy.

By the way, how do schemas actually work for Riak Search? I went back and read the documentation but didn't see a real difference from using the default schema.

Thanks!
Alex

On Fri, Aug 29, 2014 at 3:36 PM, Sean Cribbs <s...@basho.com> wrote:

> In short, no, you can't create custom types through schemas. Schemas currently only refer to Riak Search 2. We would love that too, but it hasn't happened yet. [...]
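For context on the schema question: Riak Search 2 schemas are standard Solr schemas, so the practical difference from the default is that a custom schema declares your own typed fields and analyzers instead of relying on the default's catch-all dynamic fields. A minimal sketch with assumed field names follows; a real schema must also carry over the required internal `_yz_*` fields from the default schema, elided here:

```xml
<!-- Hypothetical custom schema sketch (field names are assumptions).
     The required _yz_* internal fields from the default schema must
     also be present; they are elided here for brevity. -->
<schema name="tweets" version="1.5">
  <fields>
    <field name="username"   type="string"       indexed="true" stored="true"/>
    <field name="tweet_text" type="text_general" indexed="true" stored="false"/>
    <!-- ... required _yz_* fields from the default schema go here ... -->
  </fields>
  <uniqueKey>_yz_id</uniqueKey>
  <types>
    <fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
    <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
      <analyzer><tokenizer class="solr.StandardTokenizerFactory"/></analyzer>
    </fieldType>
  </types>
</schema>
```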
Riak VS Graph Databases
Hi there,

For some time I have had in mind building a kind of social network myself. It's a pretty ambitious project, although it is not meant to be a new Facebook; still, the data will be quite big and complex.

I like Riak and have been following it since version 0.14, and the new additions in Riak 2.0 seem to help a lot with modeling the data, although relationships will be unavoidable. Some friends suggested I use graph databases instead.

How would Riak compare to graph databases for this use case? Is it doable to create a social network entirely on Riak, or is that not recommended?

Thanks!
Alex
Re: Riak VS Graph Databases
Maybe what you are looking for is a combination of both: your KV data in Riak, with background processes that build the necessary search graphs in Neo4j. That way your data is safe in a Riak cluster and searchable on several Neo4j servers.

That's just an idea, and it might not be doable, but I hope it helps,

Guido.

On 29/08/14 15:54, Alex De la rosa wrote:

> For some time already I have in mind building a kind of social network myself. Is pretty ambitious project although it doesn't have in mind to be a new facebook; but still data will be quite big and complex. [...]
Re: Riak VS Graph Databases
Hi Guido,

This could be a solution, although I would try to do it in a homogeneous system with only one NoSQL DB around, if possible :)

Thanks!
Alex

On Fri, Aug 29, 2014 at 5:03 PM, Guido Medina <guido.med...@temetra.com> wrote:

> Maybe what you are looking for is a combination of both, say, your KV data in Riak with a combination of background processes able to build the necessary searching graphs in Neo4J. [...]
Re: Riak VS Graph Databases
Hi Alex,

Did you have a look at OrientDB? It might fit the need you describe.

Pedro.

On Aug 29, 2014 5:07 PM, Alex De la rosa <alex.rosa@gmail.com> wrote:

> This could be a solution; although I would try to do it in an homogeneous system where only one NoSQL DB would be around if possible :) [...]
First Time Post, am I in the right place for this Post
Hello,

I love the idea of an S3 alternative and was wondering if there is any way to use my S3 access and secret keys with Riak CS. I tried substituting my keys for the generated ones in both the riak-cs and stanchion app.config files, to no avail.

Thanks for any help you can provide, and if I posted in the wrong place please point me in the right direction.

Kindly,
Spiro

This message and any attachments are intended only for the use of the addressee and may contain information that is privileged and confidential. If the reader of the message is not the intended recipient or an authorized representative of the intended recipient, you are hereby notified that any dissemination of this communication is strictly prohibited. If you have received this communication in error, notify the sender immediately by return email and delete the message and any attachments from your system.
Re: Riak VS Graph Databases
In a dream world, my friend. We have Riak, PostgreSQL and Solr, and might have to include a sort of queryable Bigtable implementation like Cassandra in the future (we will try to avoid that last one for as long as we can).

A graph DB will have trade-offs versus KV fetches in general. I don't think Riak gives you the tools to find the relationships per category, and a graph DB won't give you Riak's advantages for quick KV operations (data-storage-wise) or cluster replication. It won't be that simple to build a homogeneous system without many trade-offs.

Guido.

On 29/08/14 16:06, Alex De la rosa wrote:

> This could be a solution; although I would try to do it in an homogeneous system where only one NoSQL DB would be around if possible :) [...]
Re: Riak VS Graph Databases
Yeah, I know it might be hard to use only Riak, but I want to see how much I can do with just one system. If later I have to add more complexity, so be it :) But I will squeeze my brain as much as I can to model the data so that not many relationships are required, and Riak will probably be enough... we will see :)

Thanks!
Alex

On Fri, Aug 29, 2014 at 5:20 PM, Guido Medina <guido.med...@temetra.com> wrote:

> In a dream world my friend, we have Riak, PostgreSQL and Solr and might have to include a sort of query-able Big Table implementation in the future like Cassandra. [...]
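For what it's worth, relationship modeling can go a fair way in plain KV if each user's edges are stored as a set (the kind of thing Riak 2.0's SET data type holds). A hypothetical sketch, with a dict standing in for a bucket, which also shows the cost a graph database would hide: every hop is a separate KV fetch:

```python
# Hypothetical modeling sketch: social-graph edges as KV sets, emulated with a
# plain dict standing in for a Riak bucket of SET data types. Key point: each
# traversal hop is a separate KV fetch, unlike a graph database's native edges.
follows = {}  # "bucket": user -> set of users they follow

def follow(who, whom):
    """Add an edge: who follows whom."""
    follows.setdefault(who, set()).add(whom)

def friends_of_friends(user):
    """Two-hop traversal: one KV fetch per neighbor (N+1 reads total)."""
    direct = follows.get(user, set())
    fof = set()
    for friend in direct:
        fof |= follows.get(friend, set())   # one KV fetch per friend
    return fof - direct - {user}            # exclude self and direct follows

# Made-up data for illustration:
follow("alex", "guido"); follow("alex", "pedro")
follow("guido", "sean"); follow("pedro", "sean"); follow("pedro", "alex")
print(sorted(friends_of_friends("alex")))  # ['sean']
```

The trade-off Guido describes shows up directly here: deep traversals multiply KV round-trips, so this works best when queries stay one or two hops deep.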
Re: Riak Secondary Index Limits
Hi Sean,

Sweet! Thanks for the explanation - much appreciated and very helpful.

Just a bit more clarification: on an equality lookup, where the 'foobar' index has a value 'barfoo' with very low selectivity, are those indexed objects individually stored as key/value terms that are then enumerated over, versus a single key/list_of_values term? In other words, if I have 600K records where foobar=barfoo, will this be 600K reads, or 1 read returning a list of 600K entries that is then enumerated over in memory?

Cheers,
Bryan

Bryan Hughes
Go Factory
http://www.go-factory.net

On Aug 29, 2014, at 6:46 AM, Sean Cribbs <s...@basho.com> wrote:

> I made a minor mistake in my example, the PrimaryKey is part of the index key, whereas the value contains nothing. [...]
Re: Riak Secondary Index Limits
Correct: there is a key in LevelDB for each Riak key that has the index term attached. This is somewhat mitigated by Snappy compression (600K records might very well compress into a single block), but it is nowhere near the storage efficiency of something like Solr's indexes. It still has to scan.

On Fri, Aug 29, 2014 at 11:01 AM, Bryan <br...@go-factory.net> wrote:

> Just a bit more clarification, on an equality lookup, where the 'foobar' key has a value 'barfoo' that is the very low-cardinality, are those indexed objects individually stored as a key/value term which then is enumerated over, versus a key/list_of_values term? [...]
Buckets
Hello,

Is there a practical limit to the number of buckets defined in a given Riak installation?

E.g.: I could have a bucket called "people" with records for each person. Or, I could have a bucket for each person with records related to that person. But clearly, in the second case, the number of buckets could get quite large.

Thanks,
LRP
Re: Buckets
Hi Lloyd,

As long as the buckets use the default bucket properties (Riak 1.x series) or share a bucket type (Riak 2.x series), there is no real limit to how many you can create outside of server capacity limits.

--
Luke Bakken
CSE
lbak...@basho.com

On Fri, Aug 29, 2014 at 10:08 AM, <ll...@writersglen.com> wrote:

> Is there a practical limit to the number of buckets defined in a given Riak installation? [...]
Re: Buckets
Good to know. Many thanks, Luke.

Lloyd

-----Original Message-----
From: Luke Bakken <lbak...@basho.com>
Sent: Friday, August 29, 2014 1:18pm
To: ll...@writersglen.com
Cc: riak-users <riak-users@lists.basho.com>
Subject: Re: Buckets

> As long as the buckets use the default bucket properties (Riak 1.X series) or share a bucket type (Riak 2.X series), there is no real limit to how many you can create outside of server capacity limits. [...]
Re: First Time Post, am I in the right place for this Post
Hi Spiro,

The keys are generated by Riak CS and stored in the database. You'll want to use the Riak CS-generated ones.

On Fri, Aug 29, 2014 at 8:12 AM, Spiro N <sp...@greenvirtualsolutions.com> wrote:

> I love the idea of an s3 alternative and was wondering if there was anyway to use my s3 acces and secret keys with riak-cs. I tried substituting my keys for the generated ones in both riak-cs and stanchion app.config files to no avail. [...]