Re: Random Distribution, yet Order Preserving Partitioner
Hi Takenori,

I can't tell for sure without knowing what kind of data you have and how much of it. You can use the random partitioner together with a metadata row that stores the row keys, for example:

    {metadata_row}: key1 | key2 | key3
    key1: column1 | column2

When you do a read you can always query directly by the key, if you already know it. For range queries, first query the metadata row to get the keys you want in ordered fashion, then do a multiget to fetch the actual data. The downside is that you have to do two read queries, and depending on how much data you have you can end up with a wide metadata row. (A rough Astyanax sketch of this two-step read follows at the end of this thread.)

Manoj

On Fri, Aug 23, 2013 at 8:47 AM, Takenori Sato ts...@cloudian.com wrote:

Hi Nick,

> token and key are not the same. it was like this a long time ago (a single MD5 assumed a single key)

True. That reminds me to run a test with the latest 1.2 instead of our current 1.0!

> if you want ordered, you probably can arrange your data in a way so you can get it in ordered fashion.

Yes, we have done that for a long time. That's called a wide row, right? Or a compound primary key. It can handle some millions of columns, but not more, like 10M. I mean, requests for such a row concentrate on a particular node, so performance degrades.

> I also had an idea for a semi-ordered partitioner - instead of a single MD5, to have two MD5's.

Sounds interesting. But we need a fully ordered result. Anyway, I will try with the latest version.

Thanks,
Takenori

On Thu, Aug 22, 2013 at 6:12 PM, Nikolay Mihaylov n...@nmmm.nu wrote:

my five cents -

token and key are not the same. it was like this a long time ago (a single MD5 assumed a single key).

if you want ordering, you probably can arrange your data in a way so you can get it in an ordered fashion. for example, long ago I had a single column family with a single key and about 2-3 M columns - I do not suggest you do it this way, because it is the wrong way, but it is easy to understand the idea.

I also had an idea for a semi-ordered partitioner - instead of a single MD5, to have two MD5's. Then you can get semi-ordered ranges, e.g. you get ordered all cities in Canada, all cities in the US, and so on. However, in this way things may get pretty non-balanced.

Nick

On Thu, Aug 22, 2013 at 11:19 AM, Takenori Sato ts...@cloudian.com wrote:

Hi,

I am trying to implement a custom partitioner that evenly distributes, yet preserves order.

The partitioner returns a token as BigInteger as RandomPartitioner does, while it returns a decorated key as a string as OrderPreservingPartitioner does. * for now, since IPartitioner<T> does not support different types for token and key, BigInteger is simply converted to a string

Then I played around with cassandra-cli. As expected, in my 3-node test cluster, get/set worked, but list (get_range_slices) didn't.

This came from a challenge to overcome wide row scalability, so I want to make it work!

I am aware that some effort is required to make get_range_slices work. But are there any other critical problems? For example, it seems there is an assumption that token and key are the same. If this holds throughout the whole C* code, this partitioner is not practical.

Or have you tried something similar? I would appreciate your feedback!

Thanks,
Takenori
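[Editor's note] In case it helps, here is a rough sketch of the metadata-row approach Manoj describes at the top of this thread, using Astyanax. The column family name "MyCf", the "metadata_row" key, the 1000-column cap, and the already-built Keyspace object are all assumptions for illustration, not part of the original mails:

    import java.util.ArrayList;
    import java.util.List;

    import com.netflix.astyanax.Keyspace;
    import com.netflix.astyanax.connectionpool.exceptions.ConnectionException;
    import com.netflix.astyanax.model.Column;
    import com.netflix.astyanax.model.ColumnFamily;
    import com.netflix.astyanax.model.ColumnList;
    import com.netflix.astyanax.model.Rows;
    import com.netflix.astyanax.serializers.StringSerializer;

    public class MetadataRowRangeRead {

        private static final ColumnFamily<String, String> CF =
                new ColumnFamily<String, String>("MyCf",
                        StringSerializer.get(), StringSerializer.get());

        // Returns the rows whose keys fall between startKey and endKey, in key order.
        static Rows<String, String> rangeRead(Keyspace keyspace, String startKey, String endKey)
                throws ConnectionException {
            // Step 1: read the metadata row; its column names are the row keys,
            // and columns within a row always come back sorted by the comparator.
            ColumnList<String> keyColumns = keyspace.prepareQuery(CF)
                    .getKey("metadata_row")
                    .withColumnRange(startKey, endKey, false, 1000)
                    .execute().getResult();

            List<String> orderedKeys = new ArrayList<String>();
            for (Column<String> c : keyColumns) {
                orderedKeys.add(c.getName());
            }

            // Step 2: multiget the actual rows by key. The result is not ordered,
            // so keep orderedKeys if you need to present the rows in order.
            return keyspace.prepareQuery(CF)
                    .getKeySlice(orderedKeys)
                    .execute().getResult();
        }
    }

For key ranges larger than the cap used above, you would page through the metadata row rather than fetch it in one call.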
Re: Large number of files for Leveled Compaction
Not in the case of LeveledCompaction. Only SizeTieredCompaction merges smaller sstables into larger ones. With LeveledCompaction, the sstables are always of a fixed size, but they are grouped into different levels. You can refer to this page http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra for details of how LeveledCompaction works. (A small cassandra-cli sketch for adjusting the sstable size follows at the end of this thread.)

Cheers
Manoj

On Mon, Jun 17, 2013 at 1:54 PM, Franc Carter franc.car...@sirca.org.au wrote:

On Mon, Jun 17, 2013 at 2:47 PM, Manoj Mainali mainalima...@gmail.com wrote:

> With LeveledCompaction, each sstable's size is fixed and is defined by sstable_size_in_mb in the compaction configuration of the CF definition, and the default value is 5MB. In your case, you may not have defined your own value, which is why each of your sstables is 5MB. And if your dataset is huge, you will see a very large sstable count.

Ok, it seems I have (at least) an incomplete understanding. I realise that the minimum size is 5MB, but I thought compaction would merge these into a smaller number of larger sstables?

thanks

> Cheers
> Manoj

On Fri, Jun 7, 2013 at 1:44 PM, Franc Carter franc.car...@sirca.org.au wrote:

Hi,

We are trialling Cassandra-1.2(.4) with Leveled compaction as it looks like it may be a win for us. The first step of testing was to push a fairly large slab of data into the Column Family - we did this much faster (x100) than we would in a production environment. This has left the Column Family with about 140,000 files in the Column Family directory, which seems way too high.

On two of the nodes the CompactionStats show 2 outstanding tasks, and on a third node there are over 13,000 outstanding tasks. However, from looking at the log activity it looks like compaction has finished on all nodes.

Is this number of files expected/normal?

cheers

--
*Franc Carter* | Systems architect | Sirca Ltd
franc.car...@sirca.org.au | www.sirca.org.au
Tel: +61 2 8355 2514
Level 4, 55 Harrington St, The Rocks NSW 2000
PO Box H58, Australia Square, Sydney NSW 1215
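[Editor's note] If the 5MB default is the source of the huge file count, the sstable_size_in_mb option mentioned above can be raised when (re)defining the compaction strategy. A minimal cassandra-cli sketch, assuming a keyspace/column family named MyKeyspace/MyCf and a purely illustrative 160 MB target:

    use MyKeyspace;
    update column family MyCf
      with compaction_strategy = 'LeveledCompactionStrategy'
      and compaction_strategy_options = {sstable_size_in_mb : 160};

Existing sstables are only rewritten to the new size as compaction picks them up, so the file count should drop gradually rather than immediately.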
PerRowSecondaryIndex uses
I am looking into the C* secondary index feature so that I can query rows based on a column value. In my use case, I want to create an index over several columns, or maybe all columns, of a row (a single row does not have many columns, maybe around 50 - 100), and was looking into PerRowSecondaryIndex, but I could not find a way to create an index using it.

First, C* does not have a default implementation of it, the way KeysIndex exists for PerColumnSecondaryIndex, so I implemented my own. However, I could not find a way to define it from the client side. The only way I was able to use it was to specify the PerRowSecondaryIndex class when creating a column index, as follows (assuming the Astyanax client):

    ColumnDefinition cd = cluster.makeColumnDefinition();
    cd.setName("Index1");
    cd.setIndex(INDEX_NAME, "CUSTOM");
    cd.setValidationClass("AsciiType");
    cd.setOption("class_name", "org.apache.cassandra.db.index.PerRowSecondaryIndexImpl");
    cfDef.addColumnDefinition(cd);

But this approach means that although I am creating an index for the whole row, I am doing it through index creation on a single column. Is there a better way of creating a row-level index? Are there any examples of how to do that?

Best regards,
Manoj
Re: Flushing column families individually in cassandra
In older versions it was possible, but in C* 1.2 the memtable threshold is a global configuration, so you won't be able to configure it on a per-CF basis.

Manoj

On Tue, Jun 11, 2013 at 10:32 AM, Tanya Malik sonichedg...@gmail.com wrote:

Is it possible in C* 1.2 to configure column families to be flushed individually? So, if I have 3 CFs and one of them fills up to 64 MB while the others are only at 5 MB, will only the first, full CF get flushed to an SSTable?

Also, is it possible to configure different-sized memtables for different column families?
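[Editor's note] For reference, the global knob in 1.2 lives in cassandra.yaml; the value below is just a placeholder:

    # cassandra.yaml (C* 1.2): one global memtable threshold shared by all column families
    memtable_total_space_in_mb: 2048

You can still flush a single column family on demand with, for example, nodetool -h 127.0.0.1 flush MyKeyspace MyCf (keyspace and CF names here are placeholders).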
Re: SSDs w/ C* only for commit log?
You can refer to this conversation here: http://comments.gmane.org/gmane.comp.db.cassandra.user/27366

Manoj

On Tue, Jun 11, 2013 at 10:01 AM, Tanya Malik sonichedg...@gmail.com wrote:

If I understand the C* architecture correctly, in order to increase write speed I only need to put the commit log on SSDs. When the memtable gets flushed to the SSTable file later on, that can go to traditional spinning disks, since that happens much later, after success has already been returned to the client.

So, in the YAML file, I can configure commitlog_directory to go to SSD and data_file_directories to go to spinning disks.

Would I get any write speed benefit if I also stored the SSTables on SSD?
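[Editor's note] To make the layout in the question concrete, a minimal cassandra.yaml sketch with the commit log on an SSD mount and the data directories on spinning disks (the mount points are hypothetical):

    commitlog_directory: /mnt/ssd/cassandra/commitlog
    data_file_directories:
        - /mnt/hdd1/cassandra/data
        - /mnt/hdd2/cassandra/data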
Re: Cassandra Evaluation/ Benchmarking: Throughput not scaling as expected neither latency showing good numbers
What kind of client are you using with YCSB? If you want to improve latency, try distributing the requests among the nodes instead of stressing a single node, and try host connection pooling instead of creating a connection for each request. Check out high-level clients like Hector or Astyanax if you are not already using them (a small Hector sketch follows at the end of this thread). Some clients have ring-aware request handling.

You have a 3-node cluster and are using an RF of three, which means every node will get the data. What CL are you using for writes? Latency increases with stronger CLs.

If you want to increase throughput, try increasing the number of clients. Of course, that doesn't mean throughput will always increase. My observation was that it increases up to a certain number of clients, after which it decreases again.

Regards,
Manoj Mainali

On Wednesday, July 18, 2012, Code Box wrote:

The cassandra stress tool gives me values around 2.5 milliseconds for writing. The problem with the Cassandra stress tool is that it only gives the average latency numbers, and the average latency numbers I am getting are comparable in some cases. It is the 95th percentile and 99th percentile numbers that are bad. So a small fraction of requests are really slow while the rest are fast, which keeps the average low. I want the 95th and 99th percentile values to be in single-digit milliseconds, because I have seen people getting those numbers.

These are my conclusions so far from the investigation:

- A three-node cluster with a replication factor of 3 gets me around 10 ms for 100% writes with consistency ONE. The reads are really bad, around 65 ms.
- I thought the network was the issue, so I moved the client onto a local machine. A client on the local machine with a one-node cluster again gives me good average write latencies, but the 99th and 95th percentiles are bad. I am getting around 10 ms for writes and 25 ms for reads.
- Network bandwidth between the client and the server is 1 Gigabit/second. I was able to generate at most 25K requests, so the client could be the bottleneck. I am using YCSB; maybe I should change to some other client. The maximum throughput I got from a client was 35K locally and 17K remotely.

I can try these things now:

- Use a different client and see what numbers I get for the 99th and 95th percentiles. I am not sure if there is any client that gives this level of detail, or whether I have to write one of my own.
- Tweak some hard disk settings (RAID0 and xfs / ext4) and see if that helps.

It could also be that from Cassandra 0.8 to 1.1 the 95th and 99th percentile numbers have gone down; the throughput numbers have also gone down.

Is there any other client I can use besides the Cassandra stress tool and YCSB, and are the numbers I have got good?

--Akshat Vig.

On Tue, Jul 17, 2012 at 9:22 PM, aaron morton aa...@thelastpickle.com wrote:

I would benchmark a default installation, then start tweaking. That way you can see if your changes result in improvements.

To simplify things further, try using the tools/stress utility in the cassandra source distribution first. It's pretty simple to use. Add clients until you see the latency increase and tasks start to back up in nodetool tpstats. If you see it report dropped messages, it is overloaded.

Hope that helps.

- Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 18/07/2012, at 4:48 AM, Code Box wrote:

Thanks a lot for your reply guys.
I was trying fsync = batch and window = 0 ms to see if the disk utilization on my drive was reaching full. I checked the numbers using iostat; they were around 60%, and the CPU usage was also not too high.

Configuration of my setup: I have three m1.xlarge hosts, each with 15 GB RAM and 4 CPUs (8 EC2 Compute Units). I have kept the replication factor equal to 3. The typical write size is 1 KB.

I tried adding more client nodes, each with 200 threads, and the throughput got split in two. If I do it from a single host with fsync set to periodic, a window size of 1000 ms, and using two nodes, I get these numbers:

    [OVERALL], Throughput(ops/sec), 4771
    [INSERT], AverageLatency(us), 18747
    [INSERT], MinLatency(us), 1470
    [INSERT], MaxLatency(us), 446413
    [INSERT], 95thPercentileLatency(ms), 55
    [INSERT], 99thPercentileLatency(ms), 167

    [OVERALL], Throughput(ops/sec), 4678
    [INSERT], AverageLatency(us), 22015
    [INSERT], MinLatency(us), 1439
    [INSERT], MaxLatency(us), 466149
    [INSERT], 95thPercentileLatency(ms), 62
    [INSERT], 99thPercentileLatency(ms), 171

Is there something I am doing wrong in the Cassandra setup? What is the best setup for Cassandra to get high throughput and good write latency numbers?

On Tue, Jul 17, 2012 at 7:02 AM, Sylvain Lebresne sylv...@datastax.com
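[Editor's note] To illustrate the earlier suggestion about spreading requests across nodes and pooling connections, here is a minimal Hector sketch. The cluster name, keyspace name, host list, and pool size are placeholders, not recommendations:

    import me.prettyprint.cassandra.service.CassandraHostConfigurator;
    import me.prettyprint.hector.api.Cluster;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.factory.HFactory;

    public class PooledClient {
        public static void main(String[] args) {
            // List several nodes so requests are spread instead of stressing one host.
            CassandraHostConfigurator hosts =
                    new CassandraHostConfigurator("node1:9160,node2:9160,node3:9160");
            hosts.setMaxActive(50);            // connections pooled per host
            hosts.setAutoDiscoverHosts(true);  // learn about other ring members automatically

            Cluster cluster = HFactory.getOrCreateCluster("TestCluster", hosts);
            Keyspace keyspace = HFactory.createKeyspace("usertable", cluster);
            // ... issue reads/writes through this keyspace; connections are reused from the pool
        }
    }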
Re: Cassandra Evaluation/ Benchmarking: Throughput not scaling as expected neither latency showing good numbers
Is "Threads" in your data the number of clients? How much heap space does each node have?

YCSB has a paper on their benchmark tests. You can try comparing your results with theirs and see if they are similar.

Best regards,
Manoj

On Tuesday, July 17, 2012, Code Box wrote:

I am doing Cassandra benchmarking using YCSB to evaluate the best performance for my application, which will be both read and write intensive. I have set up a three-node cluster on EC2, and I am running YCSB as a client in the same availability region.

I have tried various combinations of tuning Cassandra parameters - fsync (set to batch and periodic), increasing the number of rpc_threads, increasing the number of concurrent reads and concurrent writes, and write consistency ONE and QUORUM - but I am not getting very great results, and I also do not see a linear graph in terms of scalability; that is, if I increase the number of clients I do not see an increase in throughput.

Here are some sample numbers that I got:

*Test 1: Write consistency set to QUORUM, write proportion = 100%, fsync = batch and window = 0 ms*

Threads | Throughput (writes per sec) | Avg Latency (ms) | TP95 (ms) | TP99 (ms) | Min (ms) | Max (ms)
102149 3.1984 51.499291 1004070 23.82870 2.22602004151 45.96571301.7 1242 300419764.68 1154222.09 216

If you look at the numbers, increasing the number of threads does not increase the throughput. Also, the latency values are not that great. I am using fsync set to batch with a 0 ms window.

*Test 2: Write consistency set to QUORUM, write proportion = 100%, fsync = periodic and window = 1000 ms*

1803 1.23712 1.012312.9Q 10015944 5.343925 1.21579.1Q 200196309.047 1970 1.17 1851Q

Are these numbers expected, or does Cassandra perform better? Am I missing something?
Cassandra keeps on logging Finished hinted handoff of 0 rows to endpoint
Hi,

I have been running Cassandra 1.0.7, and in the log file I see entries saying "Finished hinted handoff of 0 rows to endpoint /{ipaddress}".

The above issue can be reproduced by the following steps:

1. Start a cluster with 2 nodes, say node1 and node2
2. Create a keyspace with rf=2, and create a column family
3. Stop node2
4. Insert some rows, say 100, into the cluster with consistency level ONE
5. Restart node2

When node2 is restarted, node1 sends the hints to node2, and from the log I see that 100 rows are sent. But after that, at intervals of approximately 10 minutes, Cassandra logs "Finished hinted handoff of 0 rows to the endpoint ...". When I do list HintsColumnFamily from cassandra-cli, it shows a result of 1 row, but no column data.

A similar issue was raised before, https://issues.apache.org/jira/browse/CASSANDRA-3733, and it says it is fixed in 1.0.7. However, I keep seeing the above log. It seems that Cassandra is trying to send hint messages even when all the hints have been delivered and there are no more hints left.

Is there a way to solve the above issue? Recently, another issue was also raised, https://issues.apache.org/jira/browse/CASSANDRA-3935. They look similar, but I am not sure if they are caused by the same thing.

Does anyone know how to solve the issue?

Thanks,
Manoj
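[Editor's note] For reference, a minimal cassandra-cli sketch of the reproduction steps above; the keyspace/CF names and values are placeholders. Stop node2 before the set, restart it afterwards, and then inspect the hints on node1:

    create keyspace TestKs
        with placement_strategy = 'SimpleStrategy'
        and strategy_options = {replication_factor : 2};
    use TestKs;
    create column family TestCf with comparator = UTF8Type;
    consistencylevel as ONE;
    set TestCf[utf8('row1')][utf8('col1')] = utf8('value1');
    use system;
    list HintsColumnFamily;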
Re: Cassandra keeps on logging Finished hinted handoff of 0 rows to endpoint
Thanks.

On Saturday, February 25, 2012, Brandon Williams dri...@gmail.com wrote:

It's a special case of a single sstable existing for hints: https://issues.apache.org/jira/browse/CASSANDRA-3955