efficient way to store 8-bit or 16-bit value?
What do people recommend I do to store a small binary value in a column? I'd rather not simply use a 32-bit int for a single-byte value. Can I have a one-byte blob? Or should I store it as a single-character ASCII string? I imagine each is going to have the overhead of storing the length (or null termination in the case of a string). That overhead may be worse than simply using a 32-bit int.

Also, is it possible to partition on a single character or substring of characters from a string (or a portion of a blob)? Something like:

CREATE TABLE test ( id text, value blob, PRIMARY KEY (string[0:1]) )
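Whatever column type the table ends up using, the client can at least guarantee the stored payload is exactly one byte by packing the value itself. A minimal sketch, assuming a Python client; the helper names are made up for illustration, and this only shows the packing, not any claim about Cassandra's per-column storage overhead:

```python
import struct

def pack_u8(value):
    """Pack a 0-255 integer into a single byte, suitable for a blob column."""
    if not 0 <= value <= 255:
        raise ValueError("value must fit in one unsigned byte")
    return struct.pack("B", value)

def unpack_u8(blob):
    """Recover the integer from a one-byte blob."""
    return struct.unpack("B", blob)[0]

b = pack_u8(200)
assert len(b) == 1          # the payload really is one byte
assert unpack_u8(b) == 200  # and it round-trips
```

The same idea extends to 16-bit values with the `"H"` format code (two bytes).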
user / password authentication advice
Hi, I'm using Cassandra in an environment where many users can log in to use an application I'm developing. I'm curious if anyone has any advice or links to documentation / blogs that discuss common implementations or best practices for user and password authentication. My cursory search online didn't bring much up on the subject. I suppose the information needn't even be specific to Cassandra. I imagine a few basic steps will be as follows:

1. The user types in a username (e.g. email address) and password.
2. This is verified against a table storing usernames and passwords (encrypted in some way).
3. A token is returned to the app / web browser to allow further transactions using a secure token (e.g. a cookie).

Obviously I'm only scratching the surface, and it's the details and best practices of implementing this user / password authentication that I'm curious about. Thank you, Ben
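The three steps above are framework-independent, so here is a minimal sketch of the verify-and-token part using only the Python standard library. The function names are hypothetical; what is stored per user would be the salt, iteration count, and digest, never the password itself:

```python
import hashlib
import os
import secrets

def hash_password(password, salt=None, iterations=100_000):
    """Derive a salted PBKDF2 digest; store (salt, iterations, digest)."""
    salt = salt or os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return salt, iterations, digest

def verify_password(password, salt, iterations, stored_digest):
    """Re-derive the digest and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return secrets.compare_digest(candidate, stored_digest)

def issue_token():
    """Opaque random session token to hand back as a secure cookie."""
    return secrets.token_urlsafe(32)

salt, iters, digest = hash_password("hunter2")
assert verify_password("hunter2", salt, iters, digest)
assert not verify_password("wrong", salt, iters, digest)
```

A framework like Shiro (mentioned below) wraps this same flow; the Cassandra-specific part is only where the (salt, iterations, digest, token) tuple gets persisted.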
Re: user / password authentication advice
OK, thanks for getting me going in the right direction. I imagine most people would store password and tokenized authentication information in a single table, using the username (e.g. email address) as the key?

On Dec 11, 2013, at 10:44 PM, Janne Jalkanen janne.jalka...@ecyrd.com wrote:

Hi! You're right, this isn't really Cassandra-specific. Most languages/web frameworks have their own way of doing user authentication, and then you typically write a plugin that stores whatever data the system needs in Cassandra. For example, if you're using Java (or Scala or Groovy or anything else JVM-based), Apache Shiro is a good way of doing user authentication and authorization: http://shiro.apache.org/. Just implement a custom Realm for Cassandra and you should be set. /Janne

On Dec 12, 2013, at 05:31, onlinespending onlinespend...@gmail.com wrote:

Hi, I'm using Cassandra in an environment where many users can log in to use an application I'm developing. I'm curious if anyone has any advice or links to documentation / blogs that discuss common implementations or best practices for user and password authentication. My cursory search online didn't bring much up on the subject. I suppose the information needn't even be specific to Cassandra. I imagine a few basic steps will be as follows: the user types in a username (e.g. email address) and password; this is verified against a table storing usernames and passwords (encrypted in some way); a token is returned to the app / web browser to allow further transactions using a secure token (e.g. a cookie). Obviously I'm only scratching the surface, and it's the details and best practices of implementing this user / password authentication that I'm curious about. Thank you, Ben
Re: Exactly one wide row per node for a given CF?
Comments below.

On Dec 9, 2013, at 11:33 PM, Aaron Morton aa...@thelastpickle.com wrote:

But this becomes troublesome if I add or remove nodes. What effectively I want is to partition on the unique id of the record modulus N (id % N; where N is the number of nodes).

This is exactly the problem consistent hashing (used by Cassandra) is designed to solve. If you hash the key and modulo the number of nodes, adding and removing nodes requires a lot of data to move.

I want to be able to randomly distribute a large set of records but keep them clustered in one wide row per node.

Sounds like you should revisit your data modelling; this is a pretty well known anti-pattern. When rows get above a few tens of MB things can slow down, when they get above 50 MB they can be a pain, and when they get above 100 MB it's a warning sign. And when they get above 1 GB, well, you don't want to know what happens then. It's a bad idea and you should take another look at the data model. If you have to do it, you can try the ByteOrderedPartitioner, which uses the row key as a token, giving you total control of the row placement.

You should re-read my last paragraph as an example that would clearly benefit from such an approach. If one understands how paging works, then you'll see how you'd benefit from grouping probabilistically similar data within each node, while also wanting to split data across nodes to reduce hot spotting.

Regardless, I no longer think it's necessary to have a single wide row per node. Several wide rows per node is just as good, since for all practical purposes paging in the first N key/values per M rows on a node is the same as reading in the first N*M key/values from a single row. So I'm going to do what I alluded to before: treat the LSB of a record's id as the partition key, and then cluster on something meaningful (such as geohash) and the prefix of the id.
create table test (
    id_prefix int,
    id_suffix int,
    geohash text,
    value text,
    primary key (id_suffix, geohash, id_prefix)
);

So if this was for a collection of users, they would be randomly distributed across nodes to increase parallelism and reduce hotspots, but within each wide row they'd be meaningfully clustered by geographic region, so as to increase the probability that paged-in data is going to contain more of the working set.

Cheers
- Aaron Morton
New Zealand
@aaronmorton
Co-Founder
Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 4/12/2013, at 8:32 pm, Vivek Mishra mishra.v...@gmail.com wrote:

So basically you want to create a cluster of multiple unique keys, but data which belongs to one unique key should be colocated. Correct? -Vivek

On Tue, Dec 3, 2013 at 10:39 AM, onlinespending onlinespend...@gmail.com wrote:

Subject says it all. I want to be able to randomly distribute a large set of records but keep them clustered in one wide row per node. As an example, let's say I've got a collection of about 1 million records, each with a unique id. If I just go ahead and set the primary key (and therefore the partition key) as the unique id, I'll get very good random distribution across my server cluster. However, each record will be its own row. I'd like to have each record belong to one large wide row (per server node) so I can have them sorted or clustered on some other column. If I, say, have 5 nodes in my cluster, I could randomly assign a value of 1 - 5 at the time of creation and have the partition key set to this value. But this becomes troublesome if I add or remove nodes. What effectively I want is to partition on the unique id of the record modulus N (id % N; where N is the number of nodes). I have to imagine there's a mechanism in Cassandra to simply randomize the partitioning without even using a key (and then clustering on some column). Thanks for any help.
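Aaron's point that "hash the key and modulo the number of nodes" forces lots of data movement can be checked directly. Below is a toy sketch (not Cassandra's actual partitioner, which uses Murmur3/MD5 tokens and vnodes) comparing naive modulo placement against a consistent-hash ring when a sixth node joins a five-node cluster:

```python
import bisect
import hashlib

def h(key):
    # Stable hash so results are reproducible across runs.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

keys = [f"user{i}" for i in range(10_000)]

# Naive placement: hash the key and take it modulo the node count.
before = {k: h(k) % 5 for k in keys}
after = {k: h(k) % 6 for k in keys}
moved_naive = sum(before[k] != after[k] for k in keys)

# Toy consistent-hash ring: each node owns many points on the ring,
# and a key belongs to the first node point at or after the key's hash.
def build_ring(nodes, vnodes=100):
    return sorted((h(f"{n}-{v}"), n) for n in nodes for v in range(vnodes))

def lookup(ring, key):
    tokens = [t for t, _ in ring]
    i = bisect.bisect(tokens, h(key)) % len(ring)
    return ring[i][1]

ring5 = build_ring(range(5))
ring6 = build_ring(range(6))
moved_ring = sum(lookup(ring5, k) != lookup(ring6, k) for k in keys)

# Under modulo placement most keys change owner when a node is added;
# on the ring, only the keys the new node takes over have to move.
assert moved_ring < moved_naive
```

With modulo placement, roughly 5/6 of all keys change owner; on the ring only about 1/6 do, which is why Cassandra can rebalance incrementally.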
Re: Exactly one wide row per node for a given CF?
Where you'll run into trouble is with compaction. It looks as if pord is some sequentially increasing value. Try your experiment again with a clustering key which is a bit more random at the time of insertion.

On Dec 10, 2013, at 5:41 AM, Robert Wille rwi...@fold3.com wrote:

I have a question about this statement:

When rows get above a few tens of MB things can slow down, when they get above 50 MB they can be a pain, when they get above 100 MB it's a warning sign. And when they get above 1 GB, well, you don't want to know what happens then.

I tested a data model that I created. Here's the schema for the table in question:

CREATE TABLE bdn_index_pub (
    tree INT,
    pord INT,
    hpath VARCHAR,
    PRIMARY KEY (tree, pord)
);

As a test, I inserted 100 million records. tree had the same value for every record, and I had 100 million values for pord. hpath averaged about 50 characters in length. My understanding is that all 100 million strings would have been stored in a single row, since they all had the same value for the first component of the primary key. I didn't look at the size of the table, but it had to be several gigs (uncompressed).

Contrary to what Aaron says, I do want to know what happens, because I didn't experience any issues with this table during my test. Inserting was fast. The last batch of records inserted in approximately the same amount of time as the first batch. Querying the table was fast. What I didn't do was test the table under load, nor did I try this in a multi-node cluster.

If this is bad, can somebody suggest a better pattern? This table was designed to support a query like this: select hpath from bdn_index_pub where tree = :tree and pord >= :start and pord <= :end. In my application, most trees will have less than a million records. A handful will have tens of millions, and one of them will have 100 million.
If I need to break up my rows, my first instinct would be to divide each tree into blocks of, say, 10,000 and change tree to a string that contains the tree and the block number. Something like this:

17:0, 0, '/'
…
17:0, , '/a/b/c'
17:1, 1, '/a/b/d'
…

I'd then need to issue an extra query for ranges that crossed block boundaries. Any suggestions on a better pattern?

Thanks
Robert

From: Aaron Morton aa...@thelastpickle.com
Reply-To: user@cassandra.apache.org
Date: Tuesday, December 10, 2013 at 12:33 AM
To: Cassandra User user@cassandra.apache.org
Subject: Re: Exactly one wide row per node for a given CF?

But this becomes troublesome if I add or remove nodes. What effectively I want is to partition on the unique id of the record modulus N (id % N; where N is the number of nodes).

This is exactly the problem consistent hashing (used by Cassandra) is designed to solve. If you hash the key and modulo the number of nodes, adding and removing nodes requires a lot of data to move.

I want to be able to randomly distribute a large set of records but keep them clustered in one wide row per node.

Sounds like you should revisit your data modelling; this is a pretty well known anti-pattern. When rows get above a few tens of MB things can slow down, when they get above 50 MB they can be a pain, and when they get above 100 MB it's a warning sign. And when they get above 1 GB, well, you don't want to know what happens then. It's a bad idea and you should take another look at the data model. If you have to do it, you can try the ByteOrderedPartitioner, which uses the row key as a token, giving you total control of the row placement.

Cheers
- Aaron Morton
New Zealand
@aaronmorton
Co-Founder
Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 4/12/2013, at 8:32 pm, Vivek Mishra mishra.v...@gmail.com wrote:

So basically you want to create a cluster of multiple unique keys, but data which belongs to one unique key should be colocated. Correct?
-Vivek

On Tue, Dec 3, 2013 at 10:39 AM, onlinespending onlinespend...@gmail.com wrote:

Subject says it all. I want to be able to randomly distribute a large set of records but keep them clustered in one wide row per node. As an example, let's say I've got a collection of about 1 million records, each with a unique id. If I just go ahead and set the primary key (and therefore the partition key) as the unique id, I'll get very good random distribution across my server cluster. However, each record will be its own row. I'd like to have each record belong to one large wide row (per server node) so I can have them sorted or clustered on some other column. If I, say, have 5 nodes in my cluster, I could randomly assign a value of 1 - 5 at the time of creation and have the partition key set to this value. But this becomes troublesome if I add or remove nodes. What effectively I want is to partition on the unique id
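Robert's blocking scheme above is easy to sketch. The helper names here are hypothetical; the idea is just that a composite "tree:block" partition key caps partition size, at the cost of one extra query per block boundary a range crosses:

```python
BLOCK = 10_000  # rows per block, the size Robert suggests

def partition_key(tree, pord):
    """Composite 'tree:block' key so no partition grows past BLOCK rows."""
    return f"{tree}:{pord // BLOCK}"

def blocks_for_range(tree, start, end):
    """Partition keys a range query on [start, end] must visit; the
    client issues one query per key and concatenates the results."""
    return [f"{tree}:{b}" for b in range(start // BLOCK, end // BLOCK + 1)]

assert partition_key(17, 0) == "17:0"
assert partition_key(17, 10_000) == "17:1"
# A range crossing one block boundary needs two queries:
assert blocks_for_range(17, 9_500, 10_500) == ["17:0", "17:1"]
```

Since pord ordering is preserved within and across blocks, concatenating the per-block results in block order yields the same answer as the original single-partition range query.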
Re: Exactly one wide row per node for a given CF?
Pretty much, yes. Although I think it'd be nice if Cassandra handled such a case, I've resigned myself to the fact that it cannot at the moment. The workaround will be to partition on the LSB portion of the id (giving 256 rows spread amongst my nodes), which allows room for scaling, and then cluster each row on geohash or something else.

Basically this desire all stems from wanting efficient use of memory. Frequently accessed keys' values are kept in RAM through the OS page cache. But the page size is 4KB. This is a problem if you are accessing several small records of data (say 200 bytes), since each record only occupies a small % of a page. This is why it's important to increase the probability that neighboring data on the disk is relevant. The worst case would be to read a full 4KB page into RAM of which you're only accessing one record that's a couple hundred bytes; all of the other unused data in the page is wastefully occupying RAM. Now project this problem onto a collection of millions of small records, all indiscriminately and randomly scattered on the disk, and you can easily see how inefficient your memory usage will become. That's why it's best to cluster data in some meaningful way, all in an effort to increase the probability that when one record is accessed in that 4KB block, its neighboring records will also be accessed.

This brings me back to the question of this thread. I want to randomly distribute the data amongst the nodes to avoid hot spotting, but within each node I want to cluster the data meaningfully, such that the probability that neighboring data is relevant is increased. An example of this would be having a huge collection of small records that store basic user information. If you partition on the unique user id, then you'll get nice random distribution but with no ability to cluster (each record would occupy its own row).
You could partition on say geographical region, but then you’ll end up with hot spotting when one region is more active than another. So ideally you’d like to randomly assign a node to each record to increase parallelism, but then cluster all records on a node by say geohash since it is more likely (however small that may be) that when one user from a geographical region is accessed other users from the same region will also need to be accessed. It’s certainly better than having some random user record next to the one you are accessing at the moment. On Dec 3, 2013, at 11:32 PM, Vivek Mishra mishra.v...@gmail.com wrote: So Basically you want to create a cluster of multiple unique keys, but data which belongs to one unique should be colocated. correct? -Vivek On Tue, Dec 3, 2013 at 10:39 AM, onlinespending onlinespend...@gmail.com wrote: Subject says it all. I want to be able to randomly distribute a large set of records but keep them clustered in one wide row per node. As an example, lets say I’ve got a collection of about 1 million records each with a unique id. If I just go ahead and set the primary key (and therefore the partition key) as the unique id, I’ll get very good random distribution across my server cluster. However, each record will be its own row. I’d like to have each record belong to one large wide row (per server node) so I can have them sorted or clustered on some other column. If I say have 5 nodes in my cluster, I could randomly assign a value of 1 - 5 at the time of creation and have the partition key set to this value. But this becomes troublesome if I add or remove nodes. What effectively I want is to partition on the unique id of the record modulus N (id % N; where N is the number of nodes). I have to imagine there’s a mechanism in Cassandra to simply randomize the partitioning without even using a key (and then clustering on some column). Thanks for any help.
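The LSB workaround described in this thread can be sketched in a few lines. The helper name is hypothetical; the point is that the bucket count is fixed at 256, so a record's partition key never changes when nodes are added or removed:

```python
NUM_BUCKETS = 256  # fixed number of partitions, independent of node count

def bucket_for(record_id):
    """Partition key derived from the low byte of the id. The bucket a
    record lands in never changes when the cluster grows or shrinks;
    the partitioner just redistributes whole buckets across nodes."""
    return record_id & (NUM_BUCKETS - 1)

assert bucket_for(0x12AB) == 0xAB
assert all(0 <= bucket_for(i) < NUM_BUCKETS for i in range(1000))
```

Within each bucket, rows would then be clustered by a meaningful key such as geohash, as discussed above.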
Inefficiency with large set of small documents?
I'm trying to decide what NoSQL database to use, and I've certainly decided against mongodb due to its use of mmap. I'm wondering if Cassandra would also suffer from a similar inefficiency with small documents. In mongodb, if you have a large set of small documents (each much less than the 4KB page size), you will require far more RAM to fit your working set into memory, since a large percentage of a 4KB chunk could very easily include infrequently accessed data outside of your working set.

Cassandra doesn't use mmap, but it would still have to intelligently discard the excess data that does not pertain to a small document that exists in the same allocation unit on the hard disk when reading it into RAM. As an example, let's say your cluster size is 4KB as well, and you have 1000 small 256-byte documents scattered on the disk that you want to fetch on a given query (the total number of documents is over 1 billion). I want to make sure it only consumes roughly 256,000 bytes for those 1000 documents and not 4,096,000 bytes. When it first fetches a cluster from disk it may consume 4KB of cache, but ultimately it should ideally only consume the relevant number of bytes in RAM.

If Cassandra just indiscriminately uses RAM in 4KB blocks, then that is unacceptable to me, because if my working set at any given time is just 20% of my huge collection of small documents, I don't want to have to use servers with 5X as much RAM. That's a huge expense.

Thanks, Ben

P.S. Here's a detailed post I made this morning in the mongodb user group about this topic.

People have often complained that because mongodb memory-maps everything and leaves memory management to the OS's virtual memory system, the swapping algorithm isn't optimized for database usage. I disagree with this. For the most part, the swapping or paging algorithm itself can't be much better than the sophisticated algorithms (such as LRU-based ones) that OSes have refined over many years. Why reinvent the wheel?
Yes, you could potentially ensure that certain data (such as the indexes) never gets swapped out to disk, because even if it hasn't been accessed recently, the cost of reading it back into memory will be too high when it is in fact needed. But that's not the bigger issue.

It breaks down with documents smaller than the page size

This is where using virtual memory for everything really becomes an issue. Suppose you've got a bunch of really tiny documents (e.g. ~256 bytes) that are much smaller than the virtual memory page size (e.g. 4KB). Now let's say that you've determined your working set (e.g. those documents in your collection that constitute, say, 99% of those accessed in a given hour) to be 20GB. But your entire collection size is actually 100GB (it's just that 20% of your documents are much more likely to be accessed in a given time period; it's not uncommon that a small minority of documents will be accessed a large majority of the time).

If your collection is randomly distributed (as would happen if you simply inserted new documents into your collection), then in this example only about 20% of the documents that fit onto a 4KB page will be part of the working set (i.e. the data that you need frequent access to at the moment). The rest of the data will be made up of much less frequently accessed documents that should ideally be sitting on disk. So there's a huge inefficiency here: 80% of the data that is in RAM is not even something I need to frequently access. In this example, I would need 5X the amount of RAM to accommodate my working set.

Now, as a solution to this problem, you could separate your documents into two (or even a few) collections, with the grouping done by access frequency. The problem with this is that your working set can often change as a function of time of day and day of week. If your application is global, your working set will be far different during 12pm local in NY vs 12pm local in Tokyo.
But even more likely is that the working set is constantly changing as new data is inserted into the database. Popularity of a document is often viral. As an example, a photo that's posted on a social network may start off infrequently accessed, but after hundreds of likes it could quickly become very frequently accessed and part of your working set. You'd need to actively monitor your documents and manually move a document from one collection to the other, which is very inefficient. Quite frankly, this is not a burden that should be placed on the user anyway. By punting the problem of memory management to the OS, mongodb requires the user to essentially do its job and group data in a way that patches the inefficiencies in its memory management. As far as I'm concerned, not until mongodb steps up and takes control of memory management can it be taken seriously for very large datasets that often require many small documents with ever-changing working sets.
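The arithmetic running through Ben's example can be checked directly. A quick sketch using the numbers from the posts above (1000 records of 256 bytes against 4KB pages, and a 20GB working set at 20% page density):

```python
PAGE = 4096     # OS page size in bytes
RECORD = 256    # size of one small document
RECORDS = 1000  # documents fetched by a query

useful = RECORDS * RECORD   # bytes actually needed: 256,000
worst = RECORDS * PAGE      # bytes resident if every record sits in its
                            # own otherwise-cold page: 4,096,000
assert useful == 256_000 and worst == 4_096_000
assert worst // useful == 16  # worst case holds 16x the needed RAM

# The working-set numbers: 20 GB of hot data spread at 20% density
# across pages needs ~100 GB resident to keep it all cached, i.e. 5X.
working_set_gb = 20
density = 0.20
assert working_set_gb / density == 100.0
```

The 16x figure is the per-query worst case (one record per page); the 5X figure is the steady-state cost when hot documents are uniformly mixed with cold ones at 20% density.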