Re: Nodetool doesn't show two nodes

2013-02-17 Thread Jared Biel
This is something that I found while using the multi-region snitch -
it uses public IPs for communication. See the original ticket here:
https://issues.apache.org/jira/browse/CASSANDRA-2452. It'd be nice if
it used the private IPs to communicate with nodes that are in the same
region as itself, but I do not believe this is the case. Be aware that
you will be charged for external data transfer even for nodes in the
same region, because the traffic will not fall under Amazon's free (for
same-AZ) or reduced (for inter-AZ) tiers.

If you continue using this snitch in the meantime, it is not
necessary (or recommended) to have those ports open to 0.0.0.0/0.
You'll simply need to add the public IPs of your C* servers to the
correct security group(s) to allow access.
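
For example, a sketch of that security group change (the group name is
made up and the address is just one of the node IPs from this thread;
shown with the aws CLI, purely as an illustration):

    # allow the storage port only from another node's public IP,
    # instead of opening 7000 to the whole internet
    aws ec2 authorize-security-group-ingress --group-name cassandra \
        --protocol tcp --port 7000 --cidr 23.22.204.201/32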

There's something else that's a little strange about the EC2 snitches:
"us-east-1" is (incorrectly) represented as the datacenter "us-east".
Other regions are recognized and named properly (us-west-2, for
example). This is kind of covered in the ticket here:
https://issues.apache.org/jira/browse/CASSANDRA-4026. I wish it could
be fixed properly.

Good luck!


On 17 February 2013 16:16, Boris Solovyov  wrote:
> OK. I got it. I realized that storage_port wasn't actually open between the
> nodes, because it is using the public IP. (I did find this information in
> the docs, after looking more... it is in the section on "Types of snitches." It
> explains everything I found by trial and error.)
>
> After opening port 7000 to all IP addresses, the cluster boots OK and
> the two nodes see each other. Now I have the happy result. But my nodes are
> wide open to the entire internet on port 7000. This is a serious problem.
> This obviously can't be put into production.
>
> I definitely need cross-continent deployment. Single AZ or single region
> deployment is not going to be enough. How do people solve this in practice?


Re: Size Tiered -> Leveled Compaction

2013-02-17 Thread Wei Zhu
We doubled the SSTable size to 10M. It still generates a lot of SSTables, and we 
don't see much difference in read latency.  We are able to finish the 
compactions after repair within several hours. We will increase the SSTable 
size again if we feel the number of SSTables hurts performance. 
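
(For reference, the size is set through the compaction options; a sketch in
1.2's CQL3 syntax with a placeholder table name — 1.1 clusters would use the
cassandra-cli equivalent:)

    ALTER TABLE my_cf WITH compaction =
        { 'class' : 'LeveledCompactionStrategy', 'sstable_size_in_mb' : 10 };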

- Original Message -
From: "Mike" 
To: user@cassandra.apache.org
Sent: Sunday, February 17, 2013 4:50:40 AM
Subject: Re: Size Tiered -> Leveled Compaction


Hello Wei, 

First, thanks for this response. 

Out of curiosity, what SSTable size did you choose for your use case, and what 
made you decide on that number? 

Thanks, 
-Mike 

On 2/14/2013 3:51 PM, Wei Zhu wrote: 




I haven't tried to switch compaction strategy. We started with LCS. 


For us, after massive data imports (5000 writes/second for 6 days), the first 
repair is painful since there is quite some data inconsistency. For 150G nodes, 
repair brought in about 30G and created thousands of pending compactions. It 
took almost a day to clear those. Just be prepared: LCS is really slow in 1.1.X. 
System performance degrades during that time since reads can hit more 
SSTables; we saw 20 SSTable lookups for one read. (We tried everything we could 
and couldn't speed it up. I think it's single-threaded, and it's not 
recommended to turn on multithreaded compaction. We even tried that; it didn't 
help.) There is parallel LCS in 1.2 which is supposed to alleviate the pain. 
Haven't upgraded yet, hope it works :) 


http://www.datastax.com/dev/blog/performance-improvements-in-cassandra-1-2 





Since our cluster is not write intensive (only 100 writes/second), I don't see 
any pending compactions during regular operation. 


One thing worth mentioning is the size of the SSTable: the default is 5M, which 
is kind of small for a 200G (all in one CF) data set, and we are on SSD. That is 
more than 150K files in one directory (200G/5M = 40K SSTables, and each SSTable 
creates 4 files on disk). You might want to watch that and decide on the SSTable 
size. 


By the way, there is no concept of Major compaction for LCS. Just for fun, you 
can look at a file called $CFName.json in your data directory and it tells you 
the SSTable distribution among different levels. 
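
(A quick way to eyeball it, assuming Python is on the box and the default
data directory layout; the keyspace and CF names are placeholders:)

    python -mjson.tool /var/lib/cassandra/data/MyKeyspace/MyCF.json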


-Wei 





From: Charles Brophy  
To: user@cassandra.apache.org 
Sent: Thursday, February 14, 2013 8:29 AM 
Subject: Re: Size Tiered -> Leveled Compaction 


I second these questions: we've been looking into changing some of our CFs to 
use leveled compaction as well. If anybody here has the wisdom to answer them, 
it would be a wonderful help. 


Thanks 
Charles 


On Wed, Feb 13, 2013 at 7:50 AM, Mike <mthero...@yahoo.com> wrote: 


Hello, 

I'm investigating the transition of some of our column families from Size 
Tiered -> Leveled Compaction. I believe we have some high-read-load column 
families that would benefit tremendously. 

I've stood up a test DB node to investigate the transition. I successfully 
altered the column family, and I immediately noticed a large number (1000+) of 
pending compaction tasks appear, but no compactions get executed. 

I tried running "nodetool sstableupgrade" on the column family, and the 
compaction tasks didn't move. 

I also noticed no changes to the size and distribution of the existing SSTables. 

I then ran a major compaction on the column family. All pending compaction 
tasks were run, and the SSTables ended up with a distribution that I would 
expect from LeveledCompaction (lots and lots of 10MB files). 

Couple of questions: 

1) Is a major compaction required to transition from size-tiered to leveled 
compaction? 
2) Are major compactions as much of a concern for LeveledCompaction as they 
are for Size Tiered? 

All the documentation I found concerning transitioning from Size Tiered to 
Leveled compaction discusses the ALTER TABLE CQL command, but I haven't found 
too much on what else needs to be done after the schema change. 

I did these tests with Cassandra 1.1.9. 

Thanks, 
-Mike 







Re: Nodetool doesn't show two nodes

2013-02-17 Thread Boris Solovyov
OK. I got it. I realized that storage_port wasn't actually open between the
nodes, because it is using the public IP. (I did find this information in
the docs, after looking more... it is in the section on "Types of snitches." It
explains everything I found by trial and error.)

After opening port 7000 to all IP addresses, the cluster boots OK and
the two nodes see each other. Now I have the happy result. But my nodes are
wide open to the entire internet on port 7000. This is a serious problem.
This obviously can't be put into production.

I definitely need cross-continent deployment. Single AZ or single region
deployment is not going to be enough. How do people solve this in practice?


Re: nodetool repair with vnodes

2013-02-17 Thread Marco Matarazzo
>> So, to me, it's like the "nodetool repair" command is running always on the 
>> same single node and repairing everything.
> If you use nodetool repair without the -pr flag in your setup (3 nodes and I 
> assume RF 3) it will repair all token ranges in the cluster. 

That's correct, 3 nodes and RF 3. Sorry for not specifying it in the beginning.


So, running it periodically on just one node is enough for cluster maintenance?
Does this depend on the fact that every vnode's data is replicated to the
previous and next vnodes, and this particular setup makes it enough because it
covers every physical node?
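
In other words, a maintenance schedule like the following (the hostname is
hypothetical):

    # with 3 nodes and RF 3, one non-pr repair covers every range in the
    # cluster, so a daily run on a single node would repair everything
    nodetool -h node1 repair keyspace_test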


Also, running it with -pr outputs:

[2013-02-17 12:29:25,293] Nothing to repair for keyspace 'system'
[2013-02-17 12:29:25,301] Starting repair command #2, repairing 1 ranges for 
keyspace keyspace_test
[2013-02-17 12:29:28,028] Repair session 487d0650-78f5-11e2-a73a-2f5b109ee83c 
for range (-9177680845984855691,-9171525326632276709] finished
[2013-02-17 12:29:28,028] Repair command #2 finished

… which, as far as I can understand, works on the first vnode of the specified 
node, or so it seems from the output range. Am I right? Is there a way to run 
it for all the vnodes on a single physical node only?

Thank you!

--
Marco Matarazzo


Re: Nodetool doesn't show two nodes

2013-02-17 Thread Boris Solovyov
No, it doesn't work, same thing: both nodes seem to just exist solo and I
have 2 single-node clusters :-( OK, so now I am confused, and hope the list
will help me out. To understand what is wrong, I think I need to know what
happens when a node bootstraps and joins the ring. Whom does the node
communicate with, and on which address? What information does it exchange?
What happens then? What does this process look like normally?

I have read all the docs, several times, and don't think I missed it, so it
might not be explained there clearly. I will look again, and look at the
source code next.

- Boris


On Sun, Feb 17, 2013 at 4:48 PM, Boris Solovyov wrote:

> Aha! I think I might have a breakthrough. I tried setting the public
> IP in listen_address (and therefore in broadcast_address, because as I
> understand it, it inherits if it is commented out), and in the seeds list. The node
> fails to start, because Cassandra cannot bind to the public IP address: it does
> not exist on the box. Of course! This is why I cannot see it in ifconfig.
>
> So, my next theory:
>
>- set listen_address to private IP
>- set broadcast_address to public IP, tells other nodes how to connect
>- set seeds to public IP
>
> I will try this next and continue to flood your inbox with my
> stream-of-consciousness trial and error ;-)
>
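
In cassandra.yaml terms, that theory is (a sketch, using the example
addresses that appear elsewhere in this conversation):

    listen_address: 10.145.232.190      # private IP, what the node binds to
    broadcast_address: 107.22.114.19    # public IP, advertised to other nodes
    seed_provider:
        - class_name: org.apache.cassandra.locator.SimpleSeedProvider
          parameters:
              - seeds: "107.22.114.19"  # public IPs of the seed nodes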


Re: Is C* common nickname for Cassandra?

2013-02-17 Thread Boris Solovyov
It is hard to say, really. I guess it just feels not very serious, overly
casual, which means not treating the project with respect? I believe that
if you want something treated with respect you must demonstrate how
seriously you take it yourself. I am sure this is a personal opinion only, but
perhaps it is shared by others. An Enterprise Pointy-Haired Boss might make a
purchase decision on this criterion instead of on technical merits. You know
they make decisions based on how pretty the project logo is half the time :-)

Hope this helps
Boris


On Sun, Feb 17, 2013 at 4:42 PM, Michael Kjellman wrote:

> Why do you feel that link is unprofessional? Just wondering. I actually
> quite like the abbreviation personally.
>
> On Feb 17, 2013, at 1:37 PM, "Boris Solovyov" 
> wrote:
>
> Thanks. I don't know if anyone cares about my opinion, but as a newcomer to the
> community, my feedback is that it is not needed. At best it confuses a
> newbie and makes him feel like an outsider. At worst it just looks totally
> unprofessional, like here:
> http://www.planetcassandra.org/blog/post/calling-all-apache-cassandra-speakers
> It is hard to form a good opinion of the Cassandra project when it is being
> discussed like that.
>
> Hopefully this is helpful constructive criticism and not just useless
> flamebait or trollbait.
>
> Boris
>
>
> On Fri, Feb 8, 2013 at 11:51 AM, Tyler Hobbs  wrote:
>
>> Yes, C* is short for Cassandra.
>>
>>
>> On Fri, Feb 8, 2013 at 10:43 AM, Boris Solovyov wrote:
>>
>>> I see people refer to C* and I assume it means Cassandra, but just wanted
>>> to check for sure. In case it is something else and I missed it :) Do I
>>> understand right?
>>>
>>
>>
>>
>> --
>> Tyler Hobbs
>> DataStax 
>>
>
>


RE: Deleting old items during compaction (WAS: Deleting old items)

2013-02-17 Thread Ilya Grebnov
According to https://issues.apache.org/jira/browse/CASSANDRA-2103, there is
no support for time to live (TTL) on counter columns. Did I miss something?

 

Thanks,

Ilya

From: aaron morton [mailto:aa...@thelastpickle.com] 
Sent: Sunday, February 17, 2013 9:16 AM
To: user@cassandra.apache.org
Subject: Re: Deleting old items during compaction (WAS: Deleting old items)

 

That's what the TTL does. 

 

Manually delete all the older data now, then start using TTL. 
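
For the non-counter columns, the TTL is a write-time option; a minimal CQL
sketch with a made-up table (and, per CASSANDRA-2103 above, TTLs do not
apply to counter columns):

    -- expire new data automatically after 30 days
    INSERT INTO events (key, column1, value)
    VALUES ('1050#12345', 'ts1', 'v1')
    USING TTL 2592000;  -- seconds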

 

Cheers

 

-

Aaron Morton

Freelance Cassandra Developer

New Zealand

 

@aaronmorton

http://www.thelastpickle.com

 

On 13/02/2013, at 11:08 PM, Ilya Grebnov  wrote:





Hi,

 

We are looking for a solution to the same problem. We have a wide column family with
counters and we want to delete old data, like 1 month old. One of the potential
ideas was to implement a hook in the compaction code and drop columns which we
don't need. Is this a viable option?

 

Thanks,

Ilya

From: aaron morton [mailto:aaron@thelastpickle.com] 
Sent: Tuesday, February 12, 2013 9:01 AM
To: user@cassandra.apache.org
Subject: Re: Deleting old items

 

So is it possible to delete all the data inserted in some CF between 2 dates
or data older than 1 month?

No. 

 

You need to issue row level deletes. If you don't know the row key you'll
need to do range scans to locate them. 

 

If you are deleting parts of wide rows consider reducing the
min_compaction_level_threshold on the CF to 2
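
A sketch of that change in cassandra-cli (1.1-era syntax; the CF name is a
placeholder, and note the option is actually spelled min_compaction_threshold):

    update column family MyCF with min_compaction_threshold = 2;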

 

Cheers

 

 

-

Aaron Morton

Freelance Cassandra Developer

New Zealand

 

@aaronmorton

  http://www.thelastpickle.com

 

On 12/02/2013, at 4:21 AM, Alain RODRIGUEZ <arodr...@gmail.com> wrote:






Hi,

 

I would like to know if there is a way to delete old/unused data easily?

 

I know about TTL but there are 2 limitations of TTL:

 

- AFAIK, there is no TTL on counter columns

- TTL needs to be defined at write time, so it's too late for data already
inserted.

 

I also could use a standard "delete", but it seems inappropriate for such a
massive operation.

 

In some cases, I don't know the row key and would like to delete all the
rows starting with, let's say, "1050#..."

 

Even better, I understood that columns are always inserted in C* with (name,
value, timestamp). So is it possible to delete all the data inserted in some
CF between 2 dates or data older than 1 month?

 

Alain

 

 



Re: Is C* common nickname for Cassandra?

2013-02-17 Thread Michael Kjellman
Why do you feel that link is unprofessional? Just wondering. I actually quite 
like the abbreviation personally.

On Feb 17, 2013, at 1:37 PM, "Boris Solovyov" <boris.solov...@gmail.com> wrote:

Thanks. I don't know if anyone cares about my opinion, but as a newcomer to the 
community, my feedback is that it is not needed. At best it confuses a newbie 
and makes him feel like an outsider. At worst it just looks totally 
unprofessional, like here: 
http://www.planetcassandra.org/blog/post/calling-all-apache-cassandra-speakers 
It is hard to form a good opinion of the Cassandra project when it is being 
discussed like that.

Hopefully this is helpful constructive criticism and not just useless flamebait 
or trollbait.

Boris


On Fri, Feb 8, 2013 at 11:51 AM, Tyler Hobbs <ty...@datastax.com> wrote:
Yes, C* is short for Cassandra.


On Fri, Feb 8, 2013 at 10:43 AM, Boris Solovyov <boris.solov...@gmail.com> wrote:
I see people refer to C* and I assume it means Cassandra, but just wanted to 
check for sure. In case it is something else and I missed it :) Do I 
understand right?



--
Tyler Hobbs
DataStax



Re: Is C* common nickname for Cassandra?

2013-02-17 Thread Boris Solovyov
Thanks. I don't know if anyone cares about my opinion, but as a newcomer to the
community, my feedback is that it is not needed. At best it confuses a
newbie and makes him feel like an outsider. At worst it just looks totally
unprofessional, like here:
http://www.planetcassandra.org/blog/post/calling-all-apache-cassandra-speakers
It is hard to form a good opinion of the Cassandra project when it is being
discussed like that.

Hopefully this is helpful constructive criticism and not just useless
flamebait or trollbait.

Boris


On Fri, Feb 8, 2013 at 11:51 AM, Tyler Hobbs  wrote:

> Yes, C* is short for Cassandra.
>
>
> On Fri, Feb 8, 2013 at 10:43 AM, Boris Solovyov 
> wrote:
>
>> I see people refer to C* and I assume it means Cassandra, but just wanted
>> to check for sure. In case it is something else and I missed it :) Do I
>> understand right?
>>
>
>
>
> --
> Tyler Hobbs
> DataStax 
>


Re: Nodetool doesn't show two nodes

2013-02-17 Thread Boris Solovyov
Hi,

I've checked all the things Alain suggested and set up a fresh 2-node cluster,
and I still get the same result: each node lists only itself.

This time I made the following changes:

   - I set listen_address to the public DNS name. Internally, AWS's DNS
   will map this to the 10.x IP, so this should work correctly if I
   understand right. These are new EC2 instances, and I did not trust the
   configured hostname and so on.
   - I opened all ports between the nodes in the security group.
   - I kept the snitch at Ec2MultiRegionSnitch. This cluster is small now
   but it will be very large and nationwide if I succeed and choose Cassandra
   for this purpose. Do I understand right that it is not possible to change
   this later, or at least not easy?
   - I checked all of Alain's suggestions, for example that cluster_name is
   the same on all nodes.
   - I set the seed list to the public DNS name of the first node. This is
   identical on both nodes.
   - I checked Alain's suggestion about auto_bootstrap. The docs say it does
   not need to be set. Are the docs wrong? (I looked at the DataStax 1.2 PDF
   docs.)

Here is some more debugging evidence. On node 1, the seed,

[root@ip-10-113-19-24 ~]# ifconfig | grep inet.addr
  inet addr:10.113.19.24  Bcast:10.113.19.255  Mask:255.255.254.0
[root@ip-10-113-19-24 ~]# nodetool status
Datacenter: us-east
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address   Load   Tokens  Owns   Host ID
  Rack
UN  23.22.204.201 20.97 KB   256 100.0%
 4fadd4fd-c57c-4172-95aa-092368ba5743  1a
[root@ip-10-113-19-24 ~]# netstat -antp
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address   Foreign Address
State   PID/Program name
tcp0  0 0.0.0.0:71990.0.0.0:*
LISTEN  1910/java
tcp0  0 0.0.0.0:47298   0.0.0.0:*
LISTEN  1910/java
tcp0  0 0.0.0.0:57030   0.0.0.0:*
LISTEN  1910/java
tcp0  0 0.0.0.0:91600.0.0.0:*
LISTEN  1910/java
tcp0  0 0.0.0.0:90420.0.0.0:*
LISTEN  1910/java
tcp0  0 0.0.0.0:22  0.0.0.0:*
LISTEN  1231/sshd
tcp0  0 10.113.19.24:7000   0.0.0.0:*
LISTEN  1910/java
tcp0  1 10.113.19.24:38948  54.234.147.60:7000
 SYN_SENT1910/java
tcp0  0 10.113.19.24:7000   10.113.19.24:45328
 ESTABLISHED 1910/java
tcp0  0 10.113.19.24:7000   10.114.205.157:47713
 ESTABLISHED 1910/java
tcp0  1 10.113.19.24:45597  23.22.204.201:7000
 SYN_SENT1910/java
tcp0  0 10.113.19.24:45328  10.113.19.24:7000
ESTABLISHED 1910/java

And in the log,

 INFO 20:58:12,472 Node /23.22.204.201 state jump to normal
 INFO 20:58:12,482 Startup completed! Now serving reads.

Now, this looks similar to the problem before, with the private IP addresses
being used sometimes and the public ones at other times. By the way, the other
node, whose internal IP address is 10.114.205.157, is connected to this seed
node, as you can see.

I think I could understand this problem if I knew which types of
network connections I should expect to see in netstat, and what output
I should expect to see in the log. Can someone with more experience tell me
what is wrong/unexpected above? And am I working against Amazon's
architecture by using IPs the way I do?

While I wait for an answer, I will shut down, delete all data, and reconfigure
with public IP addresses explicitly and not use DNS names :-) I have a
feeling this is the problem. From within an Amazon EC2 server, a DNS request
for a public DNS name returns the private IP address. (However, I still
feel unsure about what the right way to do this is, because I do not know
whether Cassandra will resolve the DNS name and end up trying to connect to a
private IP that Cassandra is not listening on.)
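
A quick check of what a name resolves to (the hostname mirrors EC2's
public-DNS naming for 23.22.204.201 and is illustrative):

    # from inside EC2 this returns the instance's private 10.x address;
    # from outside it returns the public one
    dig +short ec2-23-22-204-201.compute-1.amazonaws.com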

Thanks,
- Boris



On Wed, Feb 13, 2013 at 10:37 AM, Boris Solovyov wrote:

> Thank you Alain. I will check the things you suggest and report my results.
>
> - Boris
>
>
> On Wed, Feb 13, 2013 at 7:54 AM, Alain RODRIGUEZ wrote:
>
>> Hi Boris.
>>
>> "I feel like I have made a beginner's mistake"
>> That's a horrible feeling :D. I'll try to help ;)
>>
>> "cluster_name: 'TS'"
>> Are you sure you used the same name for both nodes?
>>
>> "I can connect to port 7000"
>> You can check all the ports needed there
>> http://www.datastax.com/docs/1.2/install/install_ami and open them in
>> security group once and for all so you won't be wondering this anymore.
>>
>> "listen_address: 10.145.232.190"
>> "INFO 19:36:32,710 Node /107.22.114.19 state jump to normal"
>> There is "10.145.232.190" defined as listen address and you logs says
>> that 107.22.114.19 joined the ring and your second ip seems to be
>> 23.21.11.193... When you stop an EC2 server, its internal ip may change.
>> So I recommend you not to do so, but restart them instead. Anyway

Re: nodetool repair with vnodes

2013-02-17 Thread aaron morton
> …so it seems to me that it is running on all vnodes ranges.
Yes.

> Also, whichever node I launch the command on, only one node's log is 
> "moving", and it is always the same node. 
Not sure what you mean here. 

> So, to me, it's like the "nodetool repair" command is running always on the 
> same single node and repairing everything.
If you use nodetool repair without the -pr flag in your setup (3 nodes and I 
assume RF 3) it will repair all token ranges in the cluster. 

> Is there anything I'm missing ?
Look for messages with "session completed" in the log from the 
AntiEntropyService.
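
For example (assuming the default log location; compactionstats shows the
validation phase and netstats the streaming phase):

    grep 'session completed' /var/log/cassandra/system.log
    nodetool compactionstats
    nodetool netstats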

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 18/02/2013, at 12:51 AM, Marco Matarazzo  wrote:

> Greetings. 
> 
> I'm trying to run "nodetool repair" on a Cassandra 1.2.1 cluster of 3 nodes 
> with 256 vnodes each.
> 
> On a pre-1.2 cluster I used to launch a "nodetool repair" on every node every 
> 24hrs. Now I'm getting a different behavior, and I'm sure I'm missing 
> something.
> 
> What I see on the command line is: 
> 
> [2013-02-17 10:20:15,186] Starting repair command #1, repairing 768 ranges 
> for keyspace goh_master
> [2013-02-17 10:48:13,401] Repair session 3d140e10-78e3-11e2-af53-d344dbdd69f5 
> for range (6556914650761469337,6580337080281832001] finished
> (…repeat the last line 767 times)
> 
> …so it seems to me that it is running on all vnodes ranges.
> 
> Also, whichever node I launch the command on, only one node's log is 
> "moving", and it is always the same node. 
> 
> So, to me, it's like the "nodetool repair" command is running always on the 
> same single node and repairing everything.
> 
> I'm sure I'm making some mistakes, and I just can't find any clue in the 
> documentation about what's wrong with my nodetool usage (if anything is 
> wrong, btw). Is there anything I'm missing?
> 
> --
> Marco Matarazzo
> 
> 



Re: Is there any consolidated literature about Read/Write and Data Consistency in Cassandra ?

2013-02-17 Thread aaron morton
If you want the underlying ideas, try the Dynamo paper, the BigTable paper, and 
the original Cassandra paper from Facebook. 

Start here http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 17/02/2013, at 7:40 AM, mateus  wrote:

> Like articles with tests and conclusions about it, and such, and not like the 
> documentation from DataStax or the Cassandra books.
> 
> Thank you.
> 



Re: Deleting old items

2013-02-17 Thread aaron morton
I'll email the docs people. 

I believe they are saying "use compaction throttling rather than this", not 
"this does nothing".

Although I used this in the last month on a machine with very little RAM to 
limit compaction memory use.

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 17/02/2013, at 7:05 AM, Alain RODRIGUEZ  wrote:

> "Can you point to the docs."
> 
> http://www.datastax.com/docs/1.1/configuration/storage_configuration#max-compaction-threshold
> 
> And thanks for the rest of your answers, once again ;-).
> 
> Alain
> 
> 
> 2013/2/16 aaron morton 
>>  Is that a feature that could possibly be developed one day ?
> No. 
> Timestamps are essentially an internal implementation detail used to resolve 
> different values for the same column. 
> 
>> With "min_compaction_level_threshold" did you mean 
>> "min_compaction_threshold"  ? If so, why should I do that, what are the 
>> advantage/inconvenient of reducing this value ?
> 
> Yes, min_compaction_threshold, my bad. 
> If you have a wide row and delete a lot of values you will end up with a lot 
> of tombstones. These may dramatically reduce the read performance until they 
> are purged. Reducing the compaction threshold makes compaction happen more 
> frequently. 
> 
>> Looking at the doc I saw that: "max_compaction_threshold: Ignored in 
>> Cassandra 1.1 and later." How do I ensure that I'll always keep a small 
>> number of SSTables then?
> AFAIK it's not. 
> There may be some confusion about the location of the settings in CLI vs CQL. 
> Can you point to the docs?
> 
> Cheers
> 
> -
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 13/02/2013, at 10:14 PM, Alain RODRIGUEZ  wrote:
> 
>> Hi Aaron, once again thanks for this answer.
>>> "So is it possible to delete all the data inserted in some CF between 2 
>>> dates or data older than 1 month ?"
>> "No. "
>> 
>> Why is there no way of deleting or getting data using the internal timestamp 
>> stored alongside any inserted column (as described here: 
>> http://www.datastax.com/docs/1.1/ddl/column_family#standard-columns)? Is 
>> that a feature that could possibly be developed one day? It could be useful 
>> for deleting old data, or for bringing just the last week of data to a dev 
>> cluster, for example.
>> 
>> With "min_compaction_level_threshold" did you mean 
>> "min_compaction_threshold"  ? If so, why should I do that, what are the 
>> advantage/inconvenient of reducing this value ?
>> 
>> Looking at the doc I saw that: "max_compaction_threshold: Ignored in 
>> Cassandra 1.1 and later.". How to ensure that I'll always keep a small 
>> amount of SSTables then ? Why is this deprecated ?
>> 
>> Alain
>> 
>> 
>> 2013/2/12 aaron morton 
>>> So is it possible to delete all the data inserted in some CF between 2 
>>> dates or data older than 1 month?
>> No. 
>> 
>> You need to issue row level deletes. If you don't know the row key you'll 
>> need to do range scans to locate them. 
>> 
>> If you are deleting parts of wide rows consider reducing the 
>> min_compaction_level_threshold on the CF to 2
>> 
>> Cheers
>> 
>> 
>> -
>> Aaron Morton
>> Freelance Cassandra Developer
>> New Zealand
>> 
>> @aaronmorton
>> http://www.thelastpickle.com
>> 
>> On 12/02/2013, at 4:21 AM, Alain RODRIGUEZ  wrote:
>> 
>>> Hi,
>>> 
>>> I would like to know if there is a way to delete old/unused data easily?
>>> 
>>> I know about TTL but there are 2 limitations of TTL:
>>> 
>>> - AFAIK, there is no TTL on counter columns
>>> - TTL needs to be defined at write time, so it's too late for data already 
>>> inserted.
>>> 
>>> I also could use a standard "delete", but it seems inappropriate for such a 
>>> massive operation.
>>> 
>>> In some cases, I don't know the row key and would like to delete all the 
>>> rows starting with, let's say, "1050#..." 
>>> 
>>> Even better, I understood that columns are always inserted in C* with 
>>> (name, value, timestamp). So is it possible to delete all the data inserted 
>>> in some CF between 2 dates or data older than 1 month?
>>> 
>>> Alain
>> 
>> 
> 
> 



Re: unsubscribe

2013-02-17 Thread Michael Kjellman
Please see the Mailing Lists section of the home page.

http://cassandra.apache.org

user-unsubscr...@cassandra.apache.org



From: James Wong <jwong...@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Sunday, February 17, 2013 12:06 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: unsubscribe


On Feb 17, 2013 10:27 AM, "puneet loya" <puneetl...@gmail.com> wrote:
>
> unsubscribe me please.
>
> Thank you


unsubscribe

2013-02-17 Thread James Wong
On Feb 17, 2013 10:27 AM, "puneet loya"  wrote:
>
> unsubscribe me please.
>
> Thank you


Re: can we pull rows out compressed from cassandra(lots of rows)?

2013-02-17 Thread aaron morton
No. 
The rows are uncompressed deep down in the IO stack. 

There is compression in the binary protocol 
http://www.datastax.com/dev/blog/binary-protocol 
https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=blob_plain;f=doc/native_protocol.spec;hb=refs/heads/cassandra-1.2

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 16/02/2013, at 9:35 AM, "Hiller, Dean"  wrote:

> 
> Thanks,
> Dean



Re: cassandra vs. mongodb quick question

2013-02-17 Thread aaron morton
If you have spinning disk and 1G networking and no virtual nodes, I would still 
say 300G to 500G is a soft limit. 

If you are using virtual nodes, SSD, JBOD disk configuration or faster 
networking you may go higher. 

The limiting factors are the time it takes to repair, the time it takes to 
replace a node, and the memory considerations for hundreds of millions of rows. 
If the performance of those operations is acceptable to you, then go crazy. 

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 16/02/2013, at 9:05 AM, "Hiller, Dean"  wrote:

> So I found out MongoDB varies their node size from 1T to 42T per node 
> depending on the profile.  So if I was going to be writing a lot but rarely 
> changing rows, could I also use Cassandra with a per-node size of 20T+ or is 
> that not advisable?
> 
> Thanks,
> Dean



Re: odd production issue today 1.1.4

2013-02-17 Thread aaron morton
There is always this old chestnut 
http://wiki.apache.org/cassandra/FAQ#ubuntu_hangs

A
-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 16/02/2013, at 8:22 AM, Edward Capriolo  wrote:

> With hyperthreading a core can show up as two or maybe even four
> logical system processors; this is something the kernel does.
> 
> On Fri, Feb 15, 2013 at 11:41 AM, Hiller, Dean  wrote:
>> We ran into an issue today where the website became around 10 times slower.  We 
>> found out node 5 of our 6 nodes was hitting 2100% CPU (cat /proc/cpuinfo 
>> reveals a 16-processor machine).  I am really not sure how we could hit 2100% 
>> unless we had 21 processors.  It bounces between 300% and 2100%, so I tried 
>> to do a thread dump and had to use -F, at which point HotSpot hit a 
>> NullPointerException :(.
>> 
>> I copied off all my logs after restarting (should have done it before 
>> restarting).  Any ideas what I could even look for as to what went wrong 
>> with this node?
>> 
>> Also, we know our Astyanax for some reason is not set up properly yet, so we 
>> probably would not have seen an issue had we had all nodes in the seed 
>> list (which we changed today), as Astyanax is supposed to measure time 
>> per request and change which nodes it hits, but we know it only hits nodes 
>> in our seed list right now as we have not fixed that yet.  Our Astyanax was 
>> hitting nodes 3, 4, 5, and 6 and did not have 1 and 2 in the seed list (we 
>> roll out a new version next Wed. with the new seed list including the last 
>> two, deferring the dynamic discovery config we need to look at).
>> 
>> Thanks,
>> Dean
>> 
>> Commands I ran with jstack that didn't work out too well….
>> 
>> [cassandra@a5 ~]$ jstack -l 20907 > threads.txt
>> 20907: Unable to open socket file: target process not responding or HotSpot 
>> VM not loaded
>> The -F option can be used when the target process is not responding
>> [cassandra@a5 ~]$ jstack -l -F  20907 > threads.txt
>> Attaching to process ID 20907, please wait...
>> Debugger attached successfully.
>> Server compiler detected.
>> JVM version is 20.7-b02
>> java.lang.NullPointerException
>> at 
>> sun.jvm.hotspot.oops.InstanceKlass.computeSubtypeOf(InstanceKlass.java:426)
>> at sun.jvm.hotspot.oops.Klass.isSubtypeOf(Klass.java:137)
>> at sun.jvm.hotspot.oops.Oop.isA(Oop.java:100)
>> at sun.jvm.hotspot.runtime.DeadlockDetector.print(DeadlockDetector.java:93)
>> at sun.jvm.hotspot.runtime.DeadlockDetector.print(DeadlockDetector.java:39)
>> at sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:52)
>> at sun.jvm.hotspot.tools.StackTrace.run(StackTrace.java:45)
>> at sun.jvm.hotspot.tools.JStack.run(JStack.java:60)
>> at sun.jvm.hotspot.tools.Tool.start(Tool.java:221)
>> at sun.jvm.hotspot.tools.JStack.main(JStack.java:86)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at 
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>> at 
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> at java.lang.reflect.Method.invoke(Method.java:597)
>> at sun.tools.jstack.JStack.runJStackTool(JStack.java:118)
>> at sun.tools.jstack.JStack.main(JStack.java:84)
>> [cassandra@a5 ~]$ java -version
>> java version "1.6.0_32"



Re: unsubscribe

2013-02-17 Thread Dave Brosius

On 02/17/2013 01:26 PM, puneet loya wrote:

unsubscribe me please.

Thank you


if only directions were followed:

http://hadonejob.com/images/full/102.jpg


send to

user-unsubscr...@cassandra.apache.org




RE: NPE in running "ClientOnlyExample"

2013-02-17 Thread Jain Rahul
Thanks Edward,

My bad. I was confused, as it does seem to create the keyspace too, as I 
understand it (although I'm not sure):

    List<CfDef> cfDefList = new ArrayList<CfDef>();
    CfDef columnFamily = new CfDef(KEYSPACE, COLUMN_FAMILY);
    cfDefList.add(columnFamily);
    try
    {
        client.system_add_keyspace(new KsDef(KEYSPACE,
            "org.apache.cassandra.locator.SimpleStrategy", 1, cfDefList));
        int magnitude = client.describe_ring(KEYSPACE).size();

Could you please point me to some examples I can start with? I tried to look 
at some examples from Hector, but they seem to be in line with Cassandra's 
1.1 version.

Regards,
Rahul


-Original Message-
From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
Sent: 17 February 2013 21:49
To: user@cassandra.apache.org
Subject: Re: NPE in running "ClientOnlyExample"

This is a bad example to follow. This is the internal client that the Cassandra 
nodes use to talk to each other (the fat client); usually you do not use this 
unless you want to write some embedded code on the Cassandra server.

Typically clients use the Thrift or native transport. But you are likely getting 
the error you are seeing because the keyspace or column family is not created yet.

On Sat, Feb 16, 2013 at 11:41 PM, Jain Rahul  wrote:
> Hi All,
>
>
>
> I am newbie to Cassandra and trying to run an example program
> "ClientOnlyExample"  taken from
> https://raw.github.com/apache/cassandra/cassandra-1.2/examples/client_only/src/ClientOnlyExample.java.
> But while executing  the program it gives me a null pointer exception.
> Can you guys please help me out what I am missing.
>
>
>
> I am using Cassandra 1.2.1 version. I have pasted the logs at
> http://pastebin.com/pmADWCYe
>
>
>
> Exception in thread "main" java.lang.NullPointerException
>
>   at
> org.apache.cassandra.db.ColumnFamily.create(ColumnFamily.java:71)
>
>   at
> org.apache.cassandra.db.ColumnFamily.create(ColumnFamily.java:66)
>
>   at
> org.apache.cassandra.db.ColumnFamily.create(ColumnFamily.java:61)
>
>   at
> org.apache.cassandra.db.ColumnFamily.create(ColumnFamily.java:56)
>
>   at org.apache.cassandra.db.RowMutation.add(RowMutation.java:183)
>
>   at org.apache.cassandra.db.RowMutation.add(RowMutation.java:204)
>
>   at ClientOnlyExample.testWriting(ClientOnlyExample.java:78)
>
>   at ClientOnlyExample.main(ClientOnlyExample.java:135)
>
>
>
> Regards,
>
> Rahul
>


unsubscribe

2013-02-17 Thread puneet loya
unsubscribe me please.

Thank you


Re: Question on Cassandra Snapshot

2013-02-17 Thread aaron morton
> With incremental_backup turned OFF in cassandra.yaml - Are all SSTables 
> under /data/TestKeySpace/ColumnFamily at all times?
No. 
They are deleted when they are compacted and no internal operations are 
referencing them. 

> With incremental_backup turned ON in cassandra.yaml - Are current SSTables 
> under /data/TestKeySpace/ColumnFamily/ with a hardlink to 
> /data/TestKeySpace/ColumnFamily/backups? 
Yes, sort of. 
*All* SSTables ever created are in the backups directory. 
Not just the ones currently "live".

> Let's say I have taken a snapshot and moved the 
> /data/TestKeySpace/ColumnFamily/snapshots//*.db to tape, at 
> what point should I be backing up *.db files from 
> /data/TestKeySpace/ColumnFamily/backups directory. Also, should I be deleting 
> the *.db files whose inode matches with the files in the snapshot? Is that a 
> correct approach? 
Back up all files in the snapshot. There may be files with non-.db extensions if 
you use levelled compaction.
When you are finished with the snapshot, delete it. If the inode is no longer 
referenced from the live data dir, the file will be deleted. 
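
A sketch of that cycle (the tag, keyspace, and paths are illustrative):

    nodetool snapshot TestKeySpace -t nightly        # hard-links live SSTables
    tar czf /backup/nightly-snapshot.tar.gz \
        /data/TestKeySpace/ColumnFamily/snapshots/nightly/
    nodetool clearsnapshot TestKeySpace -t nightly   # drop the hard links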

> I noticed /data/TestKeySpace/ColumnFamily/snapshots/-ColumnFamily/ 
> what are these  directories?
Probably automatic snapshots from dropping KSs or CFs.

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 16/02/2013, at 4:41 AM, S C  wrote:

> I appreciate any advise or pointers on this.
> 
> Thanks in advance.
> 
> From: as...@outlook.com
> To: user@cassandra.apache.org
> Subject: Question on Cassandra Snapshot
> Date: Thu, 14 Feb 2013 20:47:14 -0600
> 
> I have been looking at incremental backups and snapshots. I have done some 
> experimentation but could not come to a conclusion. Can somebody please help 
> me understand it right?
> 
> /data is my data partition
> 
> With incremental_backup turned OFF in cassandra.yaml - Are all SSTables 
> under /data/TestKeySpace/ColumnFamily at all times?
> With incremental_backup turned ON in cassandra.yaml - Are current SSTables 
> under /data/TestKeySpace/ColumnFamily/ with a hardlink to 
> /data/TestKeySpace/ColumnFamily/backups? 
> Let's say I have taken a snapshot and moved the 
> /data/TestKeySpace/ColumnFamily/snapshots//*.db to tape, at 
> what point should I be backing up *.db files from 
> /data/TestKeySpace/ColumnFamily/backups directory. Also, should I be deleting 
> the *.db files whose inode matches with the files in the snapshot? Is that a 
> correct approach? 
> I noticed /data/TestKeySpace/ColumnFamily/snapshots/-ColumnFamily/ 
> what are these  directories?
> 
> Thanks in advance. 
> SC



Re: [nodetool] repair with vNodes

2013-02-17 Thread aaron morton
I'm a bit late, but for reference. 

Repair runs in two stages: first, differences are detected. You can monitor the 
validation compaction with nodetool compactionstats. 

Then the differences are streamed between the nodes, you can monitor that with 
nodetool netstats. 

> Nodetool repair command has been running for almost 24 hours and I can't see 
> any activity from the logs or JMX.
Grep for "session completed"

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 15/02/2013, at 11:38 PM, Haithem Jarraya  wrote:

> Hi,
>  
> I am new to Cassandra and I would like to hear your thoughts on this.
> We are running our tests with Cassandra 1.2.1, on a relatively small dataset, 
> ~60GB.
> The nodetool repair command has been running for almost 24 hours and I can't 
> see any activity from the logs or JMX.
> What am I missing? Or is there a problem with nodetool repair?
> What other commands can I run to do a sanity check on the cluster?
> Can I run nodetool repair on different nodes at the same time?
>  
>  
> Here is the current test deployment of Cassandra
> $ nodetool status
> Datacenter: ams01 (Replication Factor 2)
> =
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address   Load   Tokens  Owns   Host ID   
> Rack
> UN  10.70.48.23   38.38 GB   256 19.0%  
> 7c5fdfad-63c6-4f37-bb9f-a66271aa3423  RAC1
> UN  10.70.6.7858.13 GB   256 18.3%  
> 94e7f48f-d902-4d4a-9b87-81ccd6aa9e65  RAC1
> UN  10.70.47.126  53.89 GB   256 19.4%  
> f36f1f8c-1956-4850-8040-b58273277d83  RAC1
> Datacenter: wdc01 (Replication Factor 1)
> =
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address   Load   Tokens  Owns   Host ID   
> Rack
> UN  10.24.116.66  65.81 GB   256 22.1%  
> f9dba004-8c3d-4670-94a0-d301a9b775a8  RAC1
> Datacenter: sjc01 (Replication Factor 1)
> =
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address   Load   Tokens  Owns   Host ID   
> Rack
> UN  10.55.104.90  63.31 GB   256 21.2%  
> 4746f1bd-85e1-4071-ae5e-9c5baac79469  RAC1
>  
>  
> Many Thanks,
>  
> Haithem
>  



Re: Mutation dropped

2013-02-17 Thread aaron morton
You are hitting the maximum throughput on the cluster. 

The messages are dropped because the node fails to start processing them before 
rpc_timeout. 

However the request is still a success because the client-requested CL was 
achieved. 

Testing with RF 2 and CL 1 really just tests the disks on one local machine. 
Both nodes replicate each row, and writes are sent to each replica, so the only 
thing the client is waiting on is the local node writing to its commit log. 

Testing with (and running in prod) RF 3 and CL QUORUM is a more real-world 
scenario. 

Cheers
 
-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 15/02/2013, at 9:42 AM, Kanwar Sangha  wrote:

> Hi – Is there a parameter which can be tuned to prevent the mutations from 
> being dropped? Is this logic correct?
>  
> Node A and B with RF=2, CL =1. Load balanced between the two.
>  
> --  Address   Load   Tokens  Owns (effective)  Host ID
>Rack
> UN  10.x.x.x   746.78 GB  256 100.0%
> dbc9e539-f735-4b0b-8067-b97a85522a1a  rack1
> UN  10.x.x.x   880.77 GB  256 100.0%
> 95d59054-be99-455f-90d1-f43981d3d778  rack1
>  
> Once we hit a very high TPS (around 50k/sec of inserts), the nodes start 
> falling behind and we see the mutation dropped messages. But there are no 
> failures on the client. Does that mean the other node is not able to persist 
> the replicated data? Is there some timeout associated with replicated data 
> persistence?
>  
> Thanks,
> Kanwar
>  
> From: Kanwar Sangha [mailto:kan...@mavenir.com] 
> Sent: 14 February 2013 09:08
> To: user@cassandra.apache.org
> Subject: Mutation dropped
>  
> Hi – I am doing a load test using YCSB across 2 nodes in a cluster and seeing 
> a lot of mutation dropped messages.  I understand that this is due to the 
> replica not being written to the other node? RF = 2, CL = 1.
>  
> From the wiki -
> For MUTATION messages this means that the mutation was not applied to all 
> replicas it was sent to. The inconsistency will be repaired by Read Repair or 
> Anti Entropy Repair
>  
> Thanks,
> Kanwar
>  



Re: Deleting old items during compaction (WAS: Deleting old items)

2013-02-17 Thread aaron morton
That's what the TTL does. 

Manually delete all the older data now, then start using TTL. 

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 13/02/2013, at 11:08 PM, Ilya Grebnov  wrote:

> Hi,
>  
> We are looking for a solution to the same problem. We have a wide column 
> family with counters and we want to delete old data, like 1 month old. One of 
> the potential ideas was to implement a hook in the compaction code and drop 
> columns which we don't need. Is this a viable option?
>  
> Thanks,
> Ilya
> From: aaron morton [mailto:aa...@thelastpickle.com] 
> Sent: Tuesday, February 12, 2013 9:01 AM
> To: user@cassandra.apache.org
> Subject: Re: Deleting old items
>  
> So is it possible to delete all the data inserted in some CF between 2 dates 
> or data older than 1 month?
> No. 
>  
> You need to issue row level deletes. If you don't know the row key you'll 
> need to do range scans to locate them. 
>  
> If you are deleting parts of wide rows consider reducing the 
> min_compaction_level_threshold on the CF to 2
>  
> Cheers
>  
>  
> -
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>  
> @aaronmorton
> http://www.thelastpickle.com
>  
> On 12/02/2013, at 4:21 AM, Alain RODRIGUEZ  wrote:
> 
> 
> Hi,
>  
> I would like to know if there is a way to delete old/unused data easily?
>  
> I know about TTL but there are 2 limitations of TTL:
>  
> - AFAIK, there is no TTL on counter columns
> - TTL needs to be defined at write time, so it's too late for data already 
> inserted.
>  
> I also could use a standard "delete", but it seems inappropriate for such a 
> massive operation.
>  
> In some cases, I don't know the row key and would like to delete all the rows 
> starting with, let's say, "1050#..."
>  
> Even better, I understood that columns are always inserted in C* with (name, 
> value, timestamp). So is it possible to delete all the data inserted in some 
> CF between 2 dates or data older than 1 month?
>  
> Alain
>  



Re: NPE in running "ClientOnlyExample"

2013-02-17 Thread Edward Capriolo
This is a bad example to follow. This is the internal client that the
Cassandra nodes use to talk to each other (the fat client); usually you do
not use this unless you want to write some embedded code on the
Cassandra server.

Typically clients use the Thrift or native transport. But you are likely
getting the error you are seeing because the keyspace or column family
is not created yet.
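
For contrast, a normal client connection looks more like this minimal
raw-Thrift sketch (the keyspace name is a placeholder and must already
exist; most people would use a higher-level library such as Hector or
Astyanax instead):

    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TFramedTransport;
    import org.apache.thrift.transport.TSocket;
    import org.apache.thrift.transport.TTransport;

    public class ThriftClientSketch {
        public static void main(String[] args) throws Exception {
            // Thrift rpc port (9160 by default), framed transport
            TTransport transport = new TFramedTransport(new TSocket("127.0.0.1", 9160));
            Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
            transport.open();
            client.set_keyspace("MyKeyspace");             // fails if not created yet
            System.out.println(client.describe_version()); // simple sanity check
            transport.close();
        }
    }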

On Sat, Feb 16, 2013 at 11:41 PM, Jain Rahul  wrote:
> Hi All,
>
>
>
> I am newbie to Cassandra and trying to run an example program
> “ClientOnlyExample”  taken from
> https://raw.github.com/apache/cassandra/cassandra-1.2/examples/client_only/src/ClientOnlyExample.java.
> But while executing  the program it gives me a null pointer exception. Can
> you guys please help me out what I am missing.
>
>
>
> I am using Cassandra 1.2.1 version. I have pasted the logs at
> http://pastebin.com/pmADWCYe
>
>
>
> Exception in thread "main" java.lang.NullPointerException
>
>   at org.apache.cassandra.db.ColumnFamily.create(ColumnFamily.java:71)
>
>   at org.apache.cassandra.db.ColumnFamily.create(ColumnFamily.java:66)
>
>   at org.apache.cassandra.db.ColumnFamily.create(ColumnFamily.java:61)
>
>   at org.apache.cassandra.db.ColumnFamily.create(ColumnFamily.java:56)
>
>   at org.apache.cassandra.db.RowMutation.add(RowMutation.java:183)
>
>   at org.apache.cassandra.db.RowMutation.add(RowMutation.java:204)
>
>   at ClientOnlyExample.testWriting(ClientOnlyExample.java:78)
>
>   at ClientOnlyExample.main(ClientOnlyExample.java:135)
>
>
>
> Regards,
>
> Rahul
>


Re: virtual nodes + map reduce = too many mappers

2013-02-17 Thread cem
Thanks Eric for the appreciation :)

The default split size is 64K rows. ColumnFamilyInputFormat first collects all
tokens and creates a split for each. If you have 256 vnodes per node,
it creates 256 splits per node even if you have no data at all. The current
split size only comes into play if you have a vnode with more than 64K rows.
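
(For reference, that knob lives in org.apache.cassandra.hadoop.ConfigHelper;
a small sketch, with an illustrative value:)

    import org.apache.cassandra.hadoop.ConfigHelper;
    import org.apache.hadoop.conf.Configuration;

    public class SplitSizeSketch {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // rows per split (default 65536); with vnodes this only matters once
            // a single vnode holds more rows than this, since each token range
            // still becomes at least one split
            ConfigHelper.setInputSplitSize(conf, 262144);
        }
    }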

A possible solution that came to my mind: we can simply
extend ColumnFamilySplit by adding a list of token ranges instead of one.
Then there is no need to create a mapper for each token; each mapper can
do multiple range queries. But I don't know how to combine the range
queries, because in a typical range query you need to set a start and end
token, and with virtual nodes I realized that a node's tokens are not contiguous.

Best Regards,
Cem

On Sun, Feb 17, 2013 at 2:47 AM, Edward Capriolo wrote:

> Split size does not have to equal block size.
>
>
> http://hadoop.apache.org/docs/r1.1.1/api/org/apache/hadoop/mapred/lib/CombineFileInputFormat.html
>
> An abstract InputFormat that returns CombineFileSplit's in
> InputFormat.getSplits(JobConf, int) method. Splits are constructed
> from the files under the input paths. A split cannot have files from
> different pools. Each split returned may contain blocks from different
> files. If a maxSplitSize is specified, then blocks on the same node
> are combined to form a single split. Blocks that are left over are
> then combined with other blocks in the same rack. If maxSplitSize is
> not specified, then blocks from the same rack are combined in a single
> split; no attempt is made to create node-local splits. If the
> maxSplitSize is equal to the block size, then this class is similar to
> the default spliting behaviour in Hadoop: each block is a locally
> processed split. Subclasses implement
> InputFormat.getRecordReader(InputSplit, JobConf, Reporter) to
> construct RecordReader's for CombineFileSplit's.
>
> Hive offers a CombinedHiveInputFormat
>
> https://issues.apache.org/jira/browse/HIVE-74
>
> Essentially Combined input formats rock hard. If you have a directory
> with say 2000 files, you do not want 2000 splits, and then the
> overhead of starting stopping 2000 mappers.
>
> If you enable CombineInputFormat you can tune mapred.split.size and
> the number of mappers is based (mostly) on the input size. This gives
> jobs that would create too many map tasks way more throughput, and
> stops them from monopolizing the map slots on the cluster.
>
> It would seem like all the extra splits from the vnode change could be
> combined back together.
>
> On Sat, Feb 16, 2013 at 8:21 PM, Jonathan Ellis  wrote:
> > Wouldn't you have more than 256 splits anyway, given a normal amount of
> data?
> >
> > (Default split size is 64k rows.)
> >
> > On Fri, Feb 15, 2013 at 7:01 PM, Edward Capriolo 
> wrote:
> >> Seems like the hadoop Input format should combine the splits that are
> >> on the same node into the same map task, like Hadoop's
> >> CombinedInputFormat can. I am not sure who recommends vnodes as the
> >> default, because this is now the second problem (that I know of) of
> >> this class where vnodes has extra overhead,
> >> https://issues.apache.org/jira/browse/CASSANDRA-5161
> >>
> >> This seems to be the standard operating practice in c* now, enable
> >> things in the default configuration like new partitioners and newer
> >> features like vnodes, even though they are not heavily tested in the
> >> wild or well understood, then deal with fallout.
> >>
> >>
> >> On Fri, Feb 15, 2013 at 11:52 AM, cem  wrote:
> >>> Hi All,
> >>>
> >>> I have just started to use virtual nodes. I set the number of vnodes
> >>> to 256 as recommended.
> >>>
> >>> The problem that I have is when I run a mapreduce job it creates
> >>> node * 256 mappers. It creates node * 256 splits. This affects the
> >>> performance since the range queries have a lot of overhead.
> >>>
> >>> Any suggestion to improve the performance? It seems like I need to
> >>> lower the number of virtual nodes.
> >>>
> >>> Best Regards,
> >>> Cem
> >>>
> >>>
> >
> >
> >
> > --
> > Jonathan Ellis
> > Project Chair, Apache Cassandra
> > co-founder, http://www.datastax.com
> > @spyced
>


Re: Size Tiered -> Leveled Compaction

2013-02-17 Thread Mike

Hello Wei,

First, thanks for this response.

Out of curiosity, what SSTable size did you choose for your use case, and 
what made you decide on that number?


Thanks,
-Mike

On 2/14/2013 3:51 PM, Wei Zhu wrote:

I haven't tried to switch compaction strategy. We started with LCS.

For us, after massive data imports (5000 writes/second for 6 days), the 
first repair is painful since there is quite some data inconsistency. 
For 150G nodes, repair brought in about 30G and created thousands of 
pending compactions. It took almost a day to clear those. Just be 
prepared: LCS is really slow in 1.1.X. System performance degrades 
during that time since reads can hit more SSTables; we saw 20 
SSTable lookups for one read. (We tried everything we could and couldn't 
speed it up. I think it's single-threaded, and it's not recommended 
to turn on multithreaded compaction. We even tried that; it didn't 
help.) There is parallel LCS in 1.2 which is supposed to alleviate the 
pain. Haven't upgraded yet, hope it works :)


http://www.datastax.com/dev/blog/performance-improvements-in-cassandra-1-2


Since our cluster is not write intensive (only 100 writes/second), I don't 
see any pending compactions during regular operation.


One thing worth mentioning is the size of the SSTable: the default is 5M, 
which is kind of small for a 200G (all in one CF) data set, and we are 
on SSD.  That is more than 150K files in one directory (200G/5M = 40K 
SSTables, and each SSTable creates 4 files on disk).  You might want to 
watch that and decide on the SSTable size.


By the way, there is no concept of Major compaction for LCS. Just for 
fun, you can look at a file called $CFName.json in your data directory 
and it tells you the SSTable distribution among different levels.


-Wei


*From:* Charles Brophy 
*To:* user@cassandra.apache.org
*Sent:* Thursday, February 14, 2013 8:29 AM
*Subject:* Re: Size Tiered -> Leveled Compaction

I second these questions: we've been looking into changing some of our 
CFs to use leveled compaction as well. If anybody here has the wisdom 
to answer them, it would be a wonderful help.


Thanks
Charles

On Wed, Feb 13, 2013 at 7:50 AM, Mike <mthero...@yahoo.com> wrote:


Hello,

I'm investigating the transition of some of our column families
from Size Tiered -> Leveled Compaction.  I believe we have some
high-read-load column families that would benefit tremendously.

I've stood up a test DB node to investigate the transition.  I
successfully altered the column family, and I immediately noticed a
large number (1000+) of pending compaction tasks appear,
but no compactions get executed.

I tried running "nodetool sstableupgrade" on the column family,
and the compaction tasks didn't move.

I also noticed no changes to the size and distribution of the
existing SSTables.

I then ran a major compaction on the column family.  All pending
compaction tasks were run, and the SSTables ended up with a distribution
that I would expect from LeveledCompaction (lots and lots of 10MB
files).

Couple of questions:

1) Is a major compaction required to transition from size-tiered
to leveled compaction?
2) Are major compactions as much of a concern for
LeveledCompaction as they are for Size Tiered?

All the documentation I found concerning transitioning from Size
Tiered to Leveled compaction discusses the ALTER TABLE CQL command,
but I haven't found too much on what else needs to be done after
the schema change.

I did these tests with Cassandra 1.1.9.

Thanks,
-Mike








nodetool repair with vnodes

2013-02-17 Thread Marco Matarazzo
Greetings. 

I'm trying to run "nodetool repair" on a Cassandra 1.2.1 cluster of 3 nodes 
with 256 vnodes each.

On a pre-1.2 cluster I used to launch a "nodetool repair" on every node every 
24hrs. Now I'm getting a different behavior, and I'm sure I'm missing something.

What I see on the command line is: 

[2013-02-17 10:20:15,186] Starting repair command #1, repairing 768 ranges for 
keyspace goh_master
[2013-02-17 10:48:13,401] Repair session 3d140e10-78e3-11e2-af53-d344dbdd69f5 
for range (6556914650761469337,6580337080281832001] finished
(…repeat the last line 767 times)

…so it seems to me that it is running on all vnodes ranges.

Also, whichever node I launch the command on, only one node's log is 
"moving", and it is always the same node. 

So, to me, it's like the "nodetool repair" command is running always on the 
same single node and repairing everything.

I'm sure I'm making some mistakes, and I just can't find any clue in the 
documentation about what's wrong with my nodetool usage (if anything is wrong, 
btw). Is there anything I'm missing?

--
Marco Matarazzo