Re: Random Distribution, yet Order Preserving Partitioner

2013-08-27 Thread Manoj Mainali
Hi Takenori,

I can't tell for sure without knowing what kind of data you have and how
much of it. You can use the random partitioner together with the concept of a
metadata row that stores the row keys, for example like below

{metadata_row}: key1 | key2 | key3
key1:column1 | column2

 When you do a read, you can always query directly by the key if you
already know it. For range queries, first query the metadata_row and get the
keys you want in ordered fashion. Then you can do a multi_get to fetch your
actual data.

The downside is that you have to do two read queries, and depending on how
much data you have you will end up with a wide metadata row.
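
As a rough sketch of the read side (assuming the Astyanax client, a Keyspace
instance named keyspace, and a column family CF_DATA with String row keys and
String column names - all names here are placeholders, and exception handling
is omitted):

ColumnFamily<String, String> CF_DATA = new ColumnFamily<String, String>(
        "MyCF", StringSerializer.get(), StringSerializer.get());

// 1) Read the metadata row; its column names are the data row keys, in order.
ColumnList<String> keyColumns = keyspace.prepareQuery(CF_DATA)
        .getKey("metadata_row")
        .execute().getResult();

List<String> orderedKeys = new ArrayList<String>();
for (Column<String> c : keyColumns) {
    orderedKeys.add(c.getName());
}

// 2) Multiget the actual rows, preserving the key order from step 1.
Rows<String, String> rows = keyspace.prepareQuery(CF_DATA)
        .getKeySlice(orderedKeys)
        .execute().getResult();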

Manoj


On Fri, Aug 23, 2013 at 8:47 AM, Takenori Sato ts...@cloudian.com wrote:

 Hi Nick,

  token and key are not the same. it was like this a long time ago (a single
 MD5 assumed a single key)

 True. That reminds me to run a test with the latest 1.2 instead of our
 current 1.0!

  if you want it ordered, you can probably arrange your data in a way that
 lets you get it back in ordered fashion.

 Yeah, we have done that for a long time. That's called a wide row, right? Or
 a compound primary key.

 It can handle some millions of columns, but not much more, say 10M. The
 problem is that requests for such a row concentrate on a particular node, so
 performance degrades.

  I also had an idea for a semi-ordered partitioner - instead of a single
 MD5, to have two MD5's.

 Sounds interesting. But, we need a fully ordered result.

 Anyway, I will try with the latest version.

 Thanks,
 Takenori


 On Thu, Aug 22, 2013 at 6:12 PM, Nikolay Mihaylov n...@nmmm.nu wrote:

 my five cents -
  token and key are not the same. it was like this a long time ago (a single
 MD5 assumed a single key)

  if you want it ordered, you can probably arrange your data in a way that
 lets you get it back in ordered fashion.
 for example, long ago I had a single column family with a single key and
 about 2-3 M columns - I do not suggest you do it this way, because it is the
 wrong way, but it is easy to understand the idea.

  I also had an idea for a semi-ordered partitioner - instead of a single
 MD5, to have two MD5's.
 then you can get semi-ordered ranges, e.g. you get all cities in Canada
 ordered, then all cities in the US, and so on.
 however, this way things may get pretty unbalanced

 Nick





  On Thu, Aug 22, 2013 at 11:19 AM, Takenori Sato ts...@cloudian.com wrote:

 Hi,

 I am trying to implement a custom partitioner that evenly distributes,
 yet preserves order.

  The partitioner returns a BigInteger token as RandomPartitioner does, while
 producing a string decorated key as OrderPreservingPartitioner does.
 * for now, since IPartitioner<T> does not support different types for
 token and key, the BigInteger is simply converted to a string

  Then I played around with cassandra-cli. As expected, in my 3 node test
 cluster, get/set worked, but list (get_range_slices) didn't.

  This came from a challenge to overcome wide row scalability limits. So, I
 want to make it work!

  I am aware that some effort is required to make get_range_slices work.
 But are there any other critical problems? For example, it seems there is
 an assumption that token and key are the same. If this assumption runs
 throughout the whole C* code, this partitioner is not practical.

  Or have you tried something similar?

 I would appreciate your feedback!

 Thanks,
 Takenori






Re: Large number of files for Leveled Compaction

2013-06-16 Thread Manoj Mainali
Not in the case of LeveledCompaction. Only SizeTieredCompaction merges
smaller sstables into larger ones. With LeveledCompaction, the sstables
are always of a fixed size, but they are grouped into different levels.

You can refer to this page
http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra for
details of how LeveledCompaction works.
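
As a rough back-of-the-envelope check (assuming the default sstable_size_in_mb
of 5 MB): 1 TB of data on a node works out to roughly 1,000,000 MB / 5 MB =
200,000 sstables, and each sstable is stored as several component files on
disk (Data, Index, Filter, Statistics, and so on), so a file count in the six
figures is plausible for a large, quickly loaded CF at the default settings.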

Cheers
Manoj


 On Mon, Jun 17, 2013 at 1:54 PM, Franc Carter franc.car...@sirca.org.au wrote:

  On Mon, Jun 17, 2013 at 2:47 PM, Manoj Mainali mainalima...@gmail.com wrote:

  With LeveledCompaction, each sstable size is fixed and is defined by
 sstable_size_in_mb in the compaction configuration of the CF definition; the
 default value is 5MB. In your case, you may not have defined your own value,
 which is why each of your sstables is 5MB. And if your dataset is huge, you
 will see a very high sstable count.



  Ok, it seems I have (at least) an incomplete understanding. I realise
 that the minimum size is 5MB, but I thought compaction would merge these
 into a smaller number of larger sstables?

 thanks


 Cheers

 Manoj


  On Fri, Jun 7, 2013 at 1:44 PM, Franc Carter franc.car...@sirca.org.au wrote:


 Hi,

 We are trialling Cassandra-1.2(.4) with Leveled compaction as it looks
 like it may be a win for us.

  The first step of testing was to push a fairly large slab of data into
 the Column Family - we did this much faster (x100) than we would in a
 production environment. This has left the Column Family with about 140,000
 files in the Column Family directory, which seems way too high. On two of
 the nodes CompactionStats shows 2 outstanding tasks, and on a third node
 there are over 13,000 outstanding tasks. However, from looking at the log
 activity it looks like compaction has finished on all nodes.

  Is this number of files expected/normal?

 cheers

 --

 *Franc Carter* | Systems architect | Sirca Ltd
  marc.zianideferra...@sirca.org.au

 franc.car...@sirca.org.au | www.sirca.org.au

 Tel: +61 2 8355 2514

 Level 4, 55 Harrington St, The Rocks NSW 2000

 PO Box H58, Australia Square, Sydney NSW 1215











PerRowSecondaryIndex uses

2013-06-12 Thread Manoj Mainali
I am looking into the C* secondary index feature so that I can query rows
based on column values. In my use case, I want to create an index on several
columns, or maybe all columns, of a row (a single row does not have many
columns, maybe around 50 - 100), and was looking into PerRowSecondaryIndex,
but I could not find a way to create an index using it.

First, C* does not have a default implementation of it, like KeysIndex for
PerColumnSecondaryIndex, so I implemented my own. However, I could not find
a way to define it on the client side.

The only way I was able to use it was to define the PerRowSecondaryIndex
class when creating a column index as follows (assuming the Astyanax client):

// Attach a CUSTOM index to a column and point its class_name option at the
// custom PerRowSecondaryIndex implementation (string literals quoted here;
// INDEX_NAME is the name given to the index).
ColumnDefinition cd = cluster.makeColumnDefinition();
cd.setName("Index1");
cd.setIndex(INDEX_NAME, "CUSTOM");
cd.setValidationClass("AsciiType");
cd.setOption("class_name",
    "org.apache.cassandra.db.index.PerRowSecondaryIndexImpl");
cfDef.addColumnDefinition(cd);

But this approach would mean that although I am creating an index for the
whole row, I am doing it through the index creation of a single column.

Is there a better way of creating a row-level index? Are there any examples
of how to do that?

Best regards,

Manoj


Re: Flushing column families individually in cassandra

2013-06-10 Thread Manoj Mainali
In older versions it was possible, but in C* 1.2 it is a global
configuration, so you won't be able to configure it on a per-CF basis.

Manoj


On Tue, Jun 11, 2013 at 10:32 AM, Tanya Malik sonichedg...@gmail.com wrote:

 Is it possible in C* 1.2 to configure column families to be flushed
 individually?

 So, if I have 3 CFs, and one of them fills up to 64 MB while the others are
 only at 5 MB, will only the first, full CF get flushed to an SSTable?

 Also, is it possible to configure different sized memtables for different
 column families?





Re: SSDs w/ C* only for commit log?

2013-06-10 Thread Manoj Mainali
You can refer to this conversation here
http://comments.gmane.org/gmane.comp.db.cassandra.user/27366

Manoj


On Tue, Jun 11, 2013 at 10:01 AM, Tanya Malik sonichedg...@gmail.com wrote:

 If I understand the C* architecture correctly, in order to increase write
 speed, I only need to put the commit log on SSDs.

 When the memtable gets flushed to the SSTable file later on, that can go
 to traditional spinning disks, since that happens much later, after the
 write success has already been returned to the client.

 So, in the YAML file, I can configure the commitlog_directory to go to SSD
 and the data_file_directories to go to spinning disks.

 Would I get any write speed benefit if I stored the SSTables on SSD as
 well?




Re: Cassandra Evaluation/ Benchmarking: Throughput not scaling as expected neither latency showing good numbers

2012-07-18 Thread Manoj Mainali
What kind of client are you using in YCSB? If you want to improve latency,
try distributing the requests among the nodes instead of stressing a single
node, and try host connection pooling instead of creating a connection for
each request. Check out high-level clients like Hector or Astyanax if you are
not already using them. Some clients have ring-aware request handling.
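
For example, a minimal Astyanax setup with a token-aware connection pool
spread across all three nodes might look like the sketch below (cluster name,
keyspace name, seeds and pool sizes are placeholders):

AstyanaxContext<Keyspace> context = new AstyanaxContext.Builder()
        .forCluster("TestCluster")
        .forKeyspace("usertable")
        .withAstyanaxConfiguration(new AstyanaxConfigurationImpl()
                .setDiscoveryType(NodeDiscoveryType.RING_DESCRIBE)
                .setConnectionPoolType(ConnectionPoolType.TOKEN_AWARE))
        .withConnectionPoolConfiguration(new ConnectionPoolConfigurationImpl("pool")
                .setPort(9160)
                .setMaxConnsPerHost(10)
                .setSeeds("node1:9160,node2:9160,node3:9160"))
        .withConnectionPoolMonitor(new CountingConnectionPoolMonitor())
        .buildKeyspace(ThriftFamilyFactory.getInstance());
context.start();
Keyspace keyspace = context.getClient(); // getEntity() on older Astyanax versions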

You have a 3 node cluster and are using an RF of three, which means every
node will get the data. What CL are you using for writes? Latency increases
with stronger CL.
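
With Astyanax, for instance, the consistency level can be set per request; a
quick sketch (CF_DATA stands in for a ColumnFamily<String, String> definition,
and exception handling is omitted):

MutationBatch m = keyspace.prepareMutationBatch()
        .setConsistencyLevel(ConsistencyLevel.CL_ONE); // vs. CL_QUORUM
m.withRow(CF_DATA, "rowKey1")
        .putColumn("column1", "value1", null);
m.execute();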

If you want to increase throughput, try increasing the number of clients. Of
course, that doesn't mean throughput will always increase. My observation was
that it increases up to a certain number of clients and then decreases again.

Regards,
Manoj Mainali


On Wednesday, July 18, 2012, Code Box wrote:

 The cassandra stress tool gives me values around 2.5 milliseconds for
 writes. The problem with the Cassandra stress tool is that it only gives the
 average latency numbers, and the average latency numbers that I am getting
 are comparable in some cases. It is the 95th percentile and 99th percentile
 numbers that are bad. That means the worst 5% of requests are really slow
 while the rest are fast, which keeps the average low. I want to make sure
 that the 95th and 99th percentile values are in single-digit milliseconds,
 because I have seen people getting those numbers.

 This is my conclusion so far from all the investigations:

 A three node cluster with a replication factor of 3 gets me around 10 ms for
 100% writes with consistency level ONE. The reads are really bad, at around
 65 ms.

 I thought the network was the issue, so I moved the client to a local
 machine. The client on the local machine with a one node cluster again gives
 me good average write latencies, but the 99%ile and 95%ile are bad. I am
 getting around 10 ms for writes and 25 ms for reads.

 Network bandwidth between the client and server is 1 Gigabit/second. I was
 able to generate at most 25K requests. So the client could be the
 bottleneck. I am using YCSB. Maybe I should switch to some other client.

 The maximum throughput I got from a single client was 35K locally and 17K
 remotely.


 I can try these things now:

 Use a different client and see what numbers I get for the 99% and 95%. I am
 not sure if there is any client that reports this level of detail, or
 whether I have to write one of my own.

 Tweak some hard disk settings (RAID0 and xfs / ext4) and see if that helps.

 It could be that between Cassandra 0.8 and 1.1 the 95% and 99% numbers have
 regressed. The throughput numbers have also gone down.

 Is there any other client I can use besides the cassandra stress tool and
 YCSB, and are the numbers I have gotten so far any good?


 --Akshat Vig.




 On Tue, Jul 17, 2012 at 9:22 PM, aaron morton aa...@thelastpickle.com wrote:

 I would benchmark a default installation, then start tweaking. That way
 you can see if your changes result in improvements.

 To simplify things further try using the tools/stress utility in the
 cassandra source distribution first. It's pretty simple to use.

 Add clients until you see the latency increase and tasks start to back up
 in nodetool tpstats. If you see it report dropped messages, it is
 overloaded.

 Hope that helps.

   -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 18/07/2012, at 4:48 AM, Code Box wrote:

 Thanks a lot for your replies, guys. I was trying fsync = batch and window
 = 0 ms to see if the disk utilization was maxing out my drive. I checked the
 numbers using iostat; utilization was around 60% and the CPU usage was also
 not too high.

 Configuration of my setup:

 I have three m1.xlarge hosts, each with 15 GB RAM and 4 CPUs (8 EC2 Compute
 Units).
 I have kept the replication factor equal to 3. The typical write size is 1
 KB.

 I tried adding nodes, each with 200 threads, and the throughput got split
 into two. If I do it from a single host with fsync set to periodic and a
 window size of 1000 ms, using two nodes, I am getting these numbers:


 [OVERALL], Throughput(ops/sec), 4771
 [INSERT], AverageLatency(us), 18747
 [INSERT], MinLatency(us), 1470
 [INSERT], MaxLatency(us), 446413
 [INSERT], 95thPercentileLatency(ms), 55
 [INSERT], 99thPercentileLatency(ms), 167

 [OVERALL], Throughput(ops/sec), 4678
 [INSERT], AverageLatency(us), 22015
 [INSERT], MinLatency(us), 1439
 [INSERT], MaxLatency(us), 466149
 [INSERT], 95thPercentileLatency(ms), 62
 [INSERT], 99thPercentileLatency(ms), 171

 Is there something I am doing wrong in the Cassandra setup? What is the best
 setup for Cassandra to get high throughput and good write latency numbers?



 On Tue, Jul 17, 2012 at 7:02 AM, Sylvain Lebresne sylv...@datastax.com




Re: Cassandra Evaluation/ Benchmarking: Throughput not scaling as expected neither latency showing good numbers

2012-07-17 Thread Manoj Mainali
Is the Threads in your data the number of clients? How much heap space
does each node have?

YCSB has a paper on their benchmark tests. You can try comparing your
results with theirs and see if they are similar.

Best regards,
Manoj


On Tuesday, July 17, 2012, Code Box wrote:

 I am benchmarking Cassandra using YCSB to evaluate the best performance for
 my application, which will be both read and write intensive. I have set up a
 three node cluster on EC2 and I am using YCSB as a client in the same
 availability region. I have tried various combinations of Cassandra tuning
 parameters, like fsync (set to batch and periodic), increasing the number of
 rpc_threads, increasing the number of concurrent reads and concurrent
 writes, and write consistency ONE and QUORUM, but I am not getting very good
 results, and I also do not see a linear graph in terms of scalability; that
 is, if I increase the number of clients I do not see an increase in
 throughput.

 Here are some sample numbers that I got:

 *Test 1: Write Consistency = Quorum, Write Proportion = 100%, FSync = Batch,
 Window = 0 ms*

 Threads  Throughput (writes/sec)  Avg Latency (ms)  TP95 (ms)  TP99 (ms)  Min (ms)  Max (ms)

 10   2149   3.1984 51.499291
 100  4070   23.82870 2.2260
 200  4151   45.96571301.7 1242
 300  4197   64.68 1154222.09 216


 If you look at the numbers, increasing the number of threads does not
 increase the throughput. Also, the latency values are not that great. I am
 using fsync set to batch with a 0 ms window.

 *Test 2: Write Consistency = Quorum, Write Proportion = 100%, FSync =
 Periodic, Window = 1000 ms*

 1803 1.23712 1.012312.9 Q
 10015944 5.343925 1.21579.1 Q
 200196309.047 1970 1.17 1851 Q
 Are these numbers expected, or does Cassandra perform better? Am I missing
 something?



Cassandra keeps on logging Finished hinted handoff of 0 rows to endpoint

2012-02-24 Thread Manoj Mainali
Hi,

I have been running Cassandra 1.0.7 and in the log file I keep seeing the message:

 Finished hinted handoff of 0 rows to endpoint /{ipaddress}

The above issue can be reproduced by the following steps,

1. Start a cluster with 2 nodes, suppose node1 and node2
2. Create a keyspace with rf=2, and create a column family
3. Stop node2
4. Insert some rows, suppose 100, to cluster with consistency level 1
5. Restart node2

When node2 is restarted, node1 sends the hints to node2, and from the log I
see that 100 rows are sent. But after that, at intervals of approximately 10
minutes, Cassandra logs Finished hinted handoff of 0 rows to the
endpoint ..

When I do list hintscolumnfamily from cassandra-cli, it shows a result of 1
row, but no column data.

There seems to be an issue raised before,
https://issues.apache.org/jira/browse/CASSANDRA-3733, and it says it was
fixed in 1.0.7. However, I keep seeing the above log.

It seems that Cassandra is trying to send hint messages even when all the
hints have been delivered and there are no more hints left. Is there a way to
solve this issue?
Recently, another similar issue was raised,
https://issues.apache.org/jira/browse/CASSANDRA-3935, but I am not sure if
they are caused by the same reason.

Does anyone know how to solve the issue?

Thanks,
Manoj


Re: Cassandra keeps on logging Finished hinted handoff of 0 rows to endpoint

2012-02-24 Thread Manoj Mainali
Thanks.

On Saturday, February 25, 2012, Brandon Williams dri...@gmail.com wrote:
 It's a special case of a single sstable existing for hints:
 https://issues.apache.org/jira/browse/CASSANDRA-3955
