Re: Custom data-types

2014-08-29 Thread Sean Cribbs
Alex,

In short, no, you can't create custom types through schemas. Schemas
currently only refer to Riak Search 2.

We would love that too, but it hasn't happened yet. The problem is not
conceiving of a data type but making its behavior both sensible and
convergent in the face of concurrent activity or network partitions.
For instance, say that two tweets come in around the same time. Who
goes first in the stack you described? How can multiple independent
copies reason about which ones to drop from the bottom of the stack to
keep it bounded to 100? What happens if a replica is separated from
the others for a while and has really stale entries? Is it valid to
serve those to a user? What happens when one replica pushes an element
and another one pops it at the same time?

These sound like they might be trivial problems, but they are
incredibly hard to reason about in the general case. You have to
reason about the ordering of events, the scope of their effects, and
decide on a least-surprising behavior to expose to the user. Although
we have given a pretty familiar/friendly interface to the data types
shipping in 2.0, their behavior is strictly different from the types
you would use in a single-threaded program in local memory.
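
To make the convergence problem concrete, here is a minimal Python sketch of one possible "last N" structure. It is not a Riak data type and not something that ships in 2.0. The entry layout (timestamp, replica id, tweet id), the names trim, push and merge, and the use of client-supplied timestamps for ordering are all assumptions made for illustration. Each replica keeps a set; merging takes the union and keeps the N greatest entries, which makes merge commutative, associative and idempotent.

```python
from typing import FrozenSet, Tuple

# Hypothetical entry layout: (timestamp, replica_id, tweet_id).
# The replica id only serves to break ties between equal timestamps.
Entry = Tuple[int, str, str]
State = FrozenSet[Entry]

LIMIT = 100  # the bound from the question in this thread


def trim(entries, limit: int = LIMIT) -> State:
    """Keep the `limit` greatest entries under plain tuple ordering."""
    return frozenset(sorted(entries, reverse=True)[:limit])


def push(state: State, entry: Entry, limit: int = LIMIT) -> State:
    """Add one entry on a single replica, then re-apply the bound."""
    return trim(state | {entry}, limit)


def merge(a: State, b: State, limit: int = LIMIT) -> State:
    """Combine two replica states: union first, then re-apply the bound."""
    return trim(a | b, limit)


if __name__ == "__main__":
    r1 = push(push(frozenset(), (1, "r1", "t1"), 2), (3, "r1", "t3"), 2)
    r2 = push(push(frozenset(), (2, "r2", "t2"), 2), (3, "r2", "t4"), 2)
    # Both merge orders give the same state.
    print(sorted(merge(r1, r2, 2)) == sorted(merge(r2, r1, 2)))
```

Note what the sketch leaves out. It settles "who goes first" only by trusting timestamps, so clock skew between writers changes the result, and it has no pop operation at all, which sidesteps rather than solves the concurrent push/pop case raised above.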

On Thu, Aug 28, 2014 at 4:47 PM, Alex De la rosa
alex.rosa@gmail.com wrote:
 Hi there,

 Correct me if I'm wrong, but I think I read somewhere that custom data-types
 can be created through schemas or something like that. So, apart from
 COUNTERS, SETS and MAPS we could have some custom defined ones.

 I would love to have a STACKS data-type that would work like a FIFO stack,
 so I could save the last 100 objects for some action. Imagine we are
 building Twitter where millions of tweets are sent all the time, but we want
 to quickly know the last 100 tweets for a user. Imagine something like:

 obj.stacks['last_tweets'].add(id_of_last_tweet)

 IN: last_tweet --- STACK_OF_100_TWEETS --- OUT: older than the 100th goes
 out

 Is this possible? If so, how to do it?

 Thanks and Best Regards,
 Alex

 ___
 riak-users mailing list
 riak-users@lists.basho.com
 http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com




-- 
Sean Cribbs s...@basho.com
Software Engineer
Basho Technologies, Inc.
http://basho.com/



Re: Riak Secondary Index Limits

2014-08-29 Thread Sean Cribbs
Hi Bryan,

Index entries are just keys in LevelDB, like normal values are. So
performance is relatively constant at write time but is O(N) at read
time, where N is the number of index entries scanned. The term that
matches 60% of your objects will definitely be expensive to enumerate,
but the terms that match few objects will be much less so. The index
entries in LevelDB look roughly like this:

{i, IndexName, IndexTerm} = PrimaryKey

Since the keys are encoded with sext, we encode the start key of the
range (or the equality query), start an iterator and then seek to that
start key and start reading values. When a key is reached that exceeds
the range or equality query, we stop iterating.

Of course, that is an oversimplification; there are issues with
backpressure from the request coordinator process against the
iterators, streaming merge sort if you want the results back in order
or paginated, etc. Also, all 2I queries are run via coverage, which
ensures that the entire keyspace is covered but hits most if not all
nodes in the cluster.

On Thu, Aug 28, 2014 at 9:18 PM, Bryan br...@go-factory.net wrote:
 Hi Everyone,

 Apologies, as this has probably been asked before. Unfortunately I have not
 been able to search through the listserv to find a reasonable answer, and
 the Basho wiki docs seem to be missing this information. I have read up on
 the secondary index docs.

 I would like to better understand how secondary indexes perform when
 the indexed values are very unevenly distributed. For example,
 let's say I have a bucket with 1 million objects that I create a secondary
 index on. Now let's say the index is on a value that has an uneven
 distribution where one of the values is not selective while the others are,
 such that 60% of the objects fall into a single indexed value, while the
 remaining 40% have a good distribution.

 For example, I have a record (i.e. object) where the indexed field is
 ‘foobar_bin’. I have 1 million objects in the bucket that have 100 unique
 ‘foobar’ values distributed over the 1 million objects. One of the values
 repeats for 60% of the records (600K) and the rest are evenly distributed
 at about 0.4% each.

 How will the secondary indexes perform with this, and is this an appropriate
 use of secondary indexes? Finally, what I have read is not completely
 clear on what happens if the indexed value is updated when the value has
 such a low degree of selectivity.

 We have fewer than 512 partitions and are using the Erlang client.

 Thanks in advance - any insights will be much appreciated!


 Cheers,
 Bryan

 

 Bryan Hughes
 Go Factory
 http://www.go-factory.net





-- 
Sean Cribbs s...@basho.com
Software Engineer
Basho Technologies, Inc.
http://basho.com/



Re: Riak Secondary Index Limits

2014-08-29 Thread Sean Cribbs
I made a minor mistake in my example: the PrimaryKey is part of the
index key, whereas the value contains nothing. It's more like this:

{i, IndexName, IndexTerm, PrimaryKey} = <<>>

So for the initial seek, we construct a key like so:

{i, foobar_bin, baz, <<>>}
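
The seek-and-scan described in this thread can be sketched in Python as follows. This is a toy model, not Riak's code: a sorted list stands in for the LevelDB keyspace, plain tuples stand in for sext-encoded keys, and the names index_keys and equality_query are hypothetical. It also assumes that the empty last slot of the seek key is an empty primary key, which sorts before every real key for the same term.

```python
import bisect

# Hypothetical stand-in for one partition's sorted index keys.
# Each tuple mirrors the corrected layout: (tag, index name, index term, primary key).
index_keys = sorted([
    ("i", "foobar_bin", "bar", b"k4"),
    ("i", "foobar_bin", "baz", b"k1"),
    ("i", "foobar_bin", "baz", b"k3"),
    ("i", "foobar_bin", "qux", b"k2"),
])


def equality_query(keys, index_name, term):
    """Seek to the first possible key for `term`, then read until the term changes."""
    # An empty primary key sorts before every real key with the same prefix.
    start = ("i", index_name, term, b"")
    pos = bisect.bisect_left(keys, start)
    matches = []
    # Stop as soon as a key falls outside the (tag, index name, term) prefix.
    while pos < len(keys) and keys[pos][:3] == start[:3]:
        matches.append(keys[pos][3])
        pos += 1
    return matches


if __name__ == "__main__":
    print(equality_query(index_keys, "foobar_bin", "baz"))
```

The loop visits one entry per matching object, which is where the O(N) read cost comes from. The sketch ignores coverage across partitions, backpressure, and result ordering.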


-- 
Sean Cribbs s...@basho.com
Software Engineer
Basho Technologies, Inc.
http://basho.com/



Re: Custom data-types

2014-08-29 Thread Alex De la rosa
Hi Sean,

It seems I was wrong. That makes total sense now that you have explained it;
it looked like too good a feature to me, and it seems it is not that easy.

By the way, how do schemas really work for Riak Search? I went back and
read the documentation but didn't see a real difference from using the
default schema.

Thanks!
Alex




Riak VS Graph Databases

2014-08-29 Thread Alex De la rosa
Hi there,

For some time I have had in mind building a kind of social network
myself. It is a pretty ambitious project, although it isn't meant to be
a new Facebook; still, the data will be quite big and complex.

I like Riak and have been following it since version 0.14, and the new
additions in Riak 2.0 seem to help a lot in how to model the data, although
relationships will be unavoidable.

Some friends suggested that I use a graph database instead. How would Riak
compare to graph databases for this use case? Is it doable to create a
social network entirely in Riak, or is that not recommended?

Thanks!
Alex


Re: Riak VS Graph Databases

2014-08-29 Thread Guido Medina
Maybe what you are looking for is a combination of both: say, your KV
data in Riak, with background processes that build the necessary
search graphs in Neo4j, so that your data is secure in a Riak cluster
and searchable on several Neo4j servers.


That's just an idea, which might not be doable. Hope it helps,

Guido.



Re: Riak VS Graph Databases

2014-08-29 Thread Alex De la rosa
Hi Guido,

This could be a solution, although I would try to do it in a homogeneous
system with only one NoSQL DB around, if possible :)

Thanks!
Alex




Re: Riak VS Graph Databases

2014-08-29 Thread Pedro Larroy
Hi Alex

Did you have a look at OrientDB? It might fit the need that you describe.

Pedro.


First Time Post, am I in the right place for this Post

2014-08-29 Thread Spiro N
Hello, I love the idea of an S3 alternative and was wondering if there is
any way to use my S3 access and secret keys with riak-cs. I tried
substituting my keys for the generated ones in both the riak-cs and stanchion
app.config files, to no avail.

Thanks for any help you can provide, and if I posted in the wrong
place please point me in the right direction.

Kindly


Spiro



Re: Riak VS Graph Databases

2014-08-29 Thread Guido Medina
In a dream world, my friend. We have Riak, PostgreSQL and Solr, and might
have to add a sort of queryable Bigtable-style implementation such as
Cassandra in the future (we will try to avoid this last one for as long
as we can).

A graph DB will have trade-offs versus KV fetches in general. I don't
think Riak gives you the tools you need to find relationships per
category, nor does a graph DB give you Riak's advantages for quick KV
operations (data-storage-wise) or cluster replication.

Building a homogeneous system won't be simple and will involve many
trade-offs.


Guido.



Re: Riak VS Graph Databases

2014-08-29 Thread Alex De la rosa
Yeah, I know it might be hard to use only Riak, but I want to try to see how
much I can do with only one system. If I later have to add more complexity to
the system, so be it :) But I will squeeze my brain as much as I can to
model the data in a way that doesn't require many relationships, and then
Riak might be enough... but we will see :)

Thanks!
Alex




Re: Riak Secondary Index Limits

2014-08-29 Thread Bryan
Hi Sean,

Sweet! Thanks for the explanation. Much appreciated and very helpful. Just a
bit more clarification: on an equality lookup where the ‘foobar’ index has a
term ‘barfoo’ that matches a very large share of the objects, are the index
entries stored individually as key/value pairs that are then enumerated over,
or as a single key/list_of_values entry? In other words, if I have 600K records
where foobar=barfoo, will this be 600K reads, or 1 read with a list of 600K
entries which are then enumerated over in memory?

Cheers,
Bryan


Bryan Hughes
Go Factory

http://www.go-factory.net



Re: Riak Secondary Index Limits

2014-08-29 Thread Sean Cribbs
Correct, it is the former: there is a key in LevelDB for each Riak key
that has the index term attached. This is somewhat mitigated by Snappy
compression (600K records might very well compress into a single block),
but it is nowhere near the storage efficiency of something like Solr's
indexes. It still has to scan.
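
As a rough illustration of what one-key-per-entry implies, here is a small Python sketch. It is a toy model under stated assumptions, not Riak's implementation: the class ToyIndex and its methods are hypothetical, a Python set stands in for the index keys of one partition, and the update path (delete the old index key, write the new one) is inferred from the key layout discussed in this thread.

```python
class ToyIndex:
    """Hypothetical stand-in for the index keys held by one partition."""

    def __init__(self):
        # Each element is one index key: (index_name, term, primary_key).
        self.keys = set()

    def reindex(self, index_name, primary_key, old_term, new_term):
        """Move one object to new_term; return how many index keys were touched."""
        touched = 0
        if old_term is not None:
            # Remove the single key that tied this object to its old term.
            self.keys.discard((index_name, old_term, primary_key))
            touched += 1
        self.keys.add((index_name, new_term, primary_key))
        return touched + 1

    def entries_for(self, index_name, term):
        """Count the keys an equality query on `term` would have to visit."""
        return sum(1 for (name, t, _) in self.keys if name == index_name and t == term)


if __name__ == "__main__":
    idx = ToyIndex()
    for n in range(600):
        idx.reindex("foobar_bin", "key%d" % n, None, "barfoo")
    print(idx.entries_for("foobar_bin", "barfoo"))
```

In this model, changing the term on one object touches two keys no matter how many other objects share the term, while an equality query on the popular term still has to visit every one of its entries.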


-- 
Sean Cribbs s...@basho.com
Software Engineer
Basho Technologies, Inc.
http://basho.com/



Buckets

2014-08-29 Thread lloyd
Hello,

Is there a practical limit to the number of buckets defined in a given Riak 
installation?

E.g.: I could have a bucket called people with records for each person. Or, I 
could have a bucket for each person with records related to that person. But 
clearly, in the second case, the number of buckets could get quite large.

Thanks,

LRP






Re: Buckets

2014-08-29 Thread Luke Bakken
Hi Lloyd,

As long as the buckets use the default bucket properties (Riak 1.X
series) or share a bucket type (Riak 2.X series), there is no real
limit to how many you can create outside of server capacity limits.

--
Luke Bakken
CSE
lbak...@basho.com





Re: Buckets

2014-08-29 Thread lloyd
Good to know.

Many thanks, Luke.

Lloyd



Re: First Time Post, am I in the right place for this Post

2014-08-29 Thread Luke Bakken
Hi Spiro,

The keys are generated by Riak CS and stored in the database. You'll want
to use the Riak CS generated ones.

--
Luke Bakken
CSE
lbak...@basho.com


On Fri, Aug 29, 2014 at 8:12 AM, Spiro N sp...@greenvirtualsolutions.com
wrote:

 Hello, I love the idea of an S3 alternative and was wondering if there is
 any way to use my S3 access and secret keys with riak-cs. I tried
 substituting my keys for the generated ones in both the riak-cs and stanchion
 app.config files, to no avail.

 Thanks for any help you can provide, and if I posted in the wrong
 place, please point me in the right direction.

 Kindly


 Spiro


