Re: Help on MMap of SSTables

2012-12-05 Thread Ravikumar Govindarajan
Thanks Aaron,

I found the implementation in the CLibrary.trySkipCache() method, which uses the
fadvise DONTNEED flag, after going through
https://issues.apache.org/jira/browse/CASSANDRA-1470

I also came across the link mentioned in JIRA
http://blog.mikemccandless.com/2010/06/lucene-and-fadvisemadvise.html?showComment=1303235497682#c2572106601600642254

which says that kernel versions 2.6.29 and above implement madvise SEQUENTIAL in a
better manner.

So for memory-mapped files, compaction could issue madvise SEQUENTIAL instead
of the current DONTNEED flag, after detecting an appropriate kernel version. Will
this help?

--
Ravi

On Thu, Dec 6, 2012 at 8:19 AM, aaron morton wrote:

> Background http://en.wikipedia.org/wiki/Memory-mapped_file
>
> Is it going to load only relevant pages per SSTable on read or is it going
> to load an entire SSTable on first access?
>
> It will load what is requested, and maybe some additional data taking into
> account the amount of memory available for caches.
>
> Suppose compaction kicks in. Will it then evict hot MMapped pages for
> read and substitute them with a lot of pages involving full SSTables?
>
> Some file access in cassandra, such as compaction, hints to the OS that
> the reads should not be cached. Technically, it uses posix_fadvise if you
> want to look it up.
>
> Cheers
>
>
> -
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 5/12/2012, at 11:04 PM, Ravikumar Govindarajan <
> ravikumar.govindara...@gmail.com> wrote:
>
> Thanks Aaron,
>
> I am not quite clear on how MMap loads SSTables, other than the fact that
> it kicks in only on first access.
>
> Is it going to load only relevant pages per SSTable on read or is it going
> to load an entire SSTable on first access?
>
> Suppose compaction kicks in. Will it then evict hot MMapped pages for
> read and substitute them with a lot of pages involving full SSTables?
>
> --
> Ravi
>
> On Wed, Dec 5, 2012 at 1:22 AM, aaron morton wrote:
>
>> Will MMapping data files be detrimental for reads, in this case?
>>
>> No.
>>
>> In general, when should we opt for MMap data files and what are the
>> factors that need special attention when enabling the same?
>>
>> mmapping is the default, so I would say use it until you have a reason
>> not to.
>>
>> mmapping will map the entire file, but pages of data are read into memory
>> on demand and purged when space is needed.
>>
>> Cheers
>>
>>-
>> Aaron Morton
>> Freelance Cassandra Developer
>> New Zealand
>>
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 4/12/2012, at 11:59 PM, Ravikumar Govindarajan <
>> ravikumar.govindara...@gmail.com> wrote:
>>
>> Our current SSTable sizes are far greater than RAM. {150 Gigs of data,
>> 32GB RAM}. Currently we run with mlockall and mmap_index_only options and
>> don't experience swapping at all.
>>
>> We use wide rows and size-tiered-compaction, so a given key will
>> definitely be spread across multiple sstables. Will MMapping data files be
>> detrimental for reads, in this case?
>>
>> In general, when should we opt for MMap data files and what are the
>> factors that need special attention when enabling the same?
>>
>> --
>> Ravi
>>
>>
>>
>
>


Re: [BETA RELEASE] Apache Cassandra 1.2.0-beta3 released

2012-12-05 Thread aaron morton
There are two thrift calls, batch_mutate and atomic_batch_mutate. Check with 
your favourite thrift client to see how it handles them. 
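
For example, with the raw generated Thrift client the choice is simply which
method you call (a minimal sketch; the keyspace, column family, host and values
are made up):

import java.nio.ByteBuffer;
import java.util.*;
import org.apache.cassandra.thrift.*;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TFramedTransport;
import org.apache.thrift.transport.TSocket;

public class BatchExample {
    public static void main(String[] args) throws Exception {
        TFramedTransport transport = new TFramedTransport(new TSocket("127.0.0.1", 9160));
        Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
        transport.open();
        client.set_keyspace("demo");

        // one column mutation for row "row1" in CF "Users"
        Column col = new Column(ByteBuffer.wrap("name".getBytes("UTF-8")));
        col.setValue("scott".getBytes("UTF-8"));
        col.setTimestamp(System.currentTimeMillis() * 1000);
        Mutation m = new Mutation();
        m.setColumn_or_supercolumn(new ColumnOrSuperColumn().setColumn(col));

        Map<ByteBuffer, Map<String, List<Mutation>>> mutations =
                new HashMap<ByteBuffer, Map<String, List<Mutation>>>();
        mutations.put(ByteBuffer.wrap("row1".getBytes("UTF-8")),
                Collections.singletonMap("Users", Arrays.asList(m)));

        client.batch_mutate(mutations, ConsistencyLevel.QUORUM);        // non-atomic
        client.atomic_batch_mutate(mutations, ConsistencyLevel.QUORUM); // atomic, goes through the batchlog

        transport.close();
    }
}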

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 6/12/2012, at 7:31 AM, Andrey Ilinykh  wrote:

> Hello, everybody!
> I have read blog about atomic batches in 1.2 
> http://www.datastax.com/dev/blog/atomic-batches-in-cassandra-1-2
> It mentioned that atomic batches are the default starting with 1.2. It also said 
> CQL allows switching this off. How can I manipulate this setting using the thrift 
> API?
> 
> Thank you,
>   Andrey
> 
> 
> On Tue, Dec 4, 2012 at 10:51 AM, Sylvain Lebresne  
> wrote:
> The Cassandra team is pleased to announce the release of the third beta for
> the future Apache Cassandra 1.2.0.
> 
> Let me first stress that this is beta software and as such is *not* ready for
> production use.
> 
> This release is still beta and as such may contain bugs. Any help testing
> this beta would be gladly appreciated, and if you were to encounter any 
> problem
> during your testing, please report[3,4] them. Be sure to have a look at the change
> log[1] and the release notes[2] to see where Cassandra 1.2 differs from the
> previous series.
> 
> Apache Cassandra 1.2.0-beta3[5] is available as usual from the cassandra
> website (http://cassandra.apache.org/download/) and a debian package is
> available using the 12x branch (see 
> http://wiki.apache.org/cassandra/DebianPackaging).
> 
> Thank you for your help in testing and have fun with it.
> 
> [1]: http://goo.gl/LEmPN (CHANGES.txt)
> [2]: http://goo.gl/tI66z (NEWS.txt)
> [3]: https://issues.apache.org/jira/browse/CASSANDRA
> [4]: user@cassandra.apache.org
> [5]: 
> http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/cassandra-1.2.0-beta3
> 



Re: Loading SSTables failing via Cassandra SSTableLoader on multi-node cluster.

2012-12-05 Thread Pradeep Kumar Mantha
Hi,

I followed the configuration section of the blog post. I used 3 nodes
of the cluster which share common space/filesystem.

"sstableloader uses the Cassandra gossip subsystem. It thus requires a
directory containing acassandra.yaml configuration file in the
classpath. (If you use sstableloader from the Cassandra source tree,
thecassandra.yaml file in conf will be used.)"


-bash-3.2$ which cassandra
/global/common/carver/tig/cassandra/dsc-cassandra-1.1.2/bin/cassandra
-bash-3.2$


Node A configuration:

-bash-3.2$ echo $CLASSPATH
/global/project/projectdirs/magellan/hadoop/jesup/cassandra_nodeconfigfiles/c0201:
-bash-3.2$ echo $CASSANDRA_CONF
/global/project/projectdirs/magellan/hadoop/jesup/cassandra_nodeconfigfiles/c0201
-bash-3.2$ ls -ltr
/global/project/projectdirs/magellan/hadoop/jesup/cassandra_nodeconfigfiles/c0201/cassandra.yaml
-rwx-- 1 pmantha pmantha 24947 Dec  4 15:22
/global/project/projectdirs/magellan/hadoop/jesup/cassandra_nodeconfigfiles/c0201/cassandra.yaml
-bash-3.2$


Node B configuration:

-bash-3.2$ echo $CLASSPATH
/global/project/projectdirs/magellan/hadoop/jesup/cassandra_nodeconfigfiles/c0202:
-bash-3.2$ echo $CASSANDRA_CONF
/global/project/projectdirs/magellan/hadoop/jesup/cassandra_nodeconfigfiles/c0202
-bash-3.2$ ls -ltr
/global/project/projectdirs/magellan/hadoop/jesup/cassandra_nodeconfigfiles/c0201/cassandra.yaml
-rwx-- 1 pmantha pmantha 24947 Dec  4 15:22
/global/project/projectdirs/magellan/hadoop/jesup/cassandra_nodeconfigfiles/c0201/cassandra.yaml
-bash-3.2$



"In this config file, the listen_address, storage_port, rpc_address
and rpc_port should be set correctly to communicate with the cluster,
and at least one node of the cluster you want to load data in should
be configured as seed. The rest is ignored for the purposes of
sstableloader."

Node A:

-bash-3.2$  egrep -i
"listen_address|storage_port|rpc_address|rpc_port"
$CASSANDRA_CONF/cassandra.yaml
storage_port: 7000
ssl_storage_port: 7001
listen_address: 128.55.57.85
# Leaving this blank will set it to the same value as listen_address
rpc_address: 128.55.57.85
rpc_port: 9160
#IP as well.) You will need to open the storage_port or
#ssl_storage_port on the public IP firewall.  (For intra-Region
-bash-3.2$



Node B:

-bash-3.2$ egrep -i "listen_address|storage_port|rpc_address|rpc_port"
$CASSANDRA_CONF/cassandra.yaml
storage_port: 7000
ssl_storage_port: 7001
# Leaving this blank will set it to the same value as listen_address
rpc_port: 9160
#IP as well.) You will need to open the storage_port or
#ssl_storage_port on the public IP firewall.  (For intra-Region
rpc_address: 128.55.57.86
listen_address: 128.55.57.86
-bash-3.2$



"Because the sstableloader uses gossip to communicate with other
nodes, if launched on the same machine that a given Cassandra node, it
will need to use a different network interface than the Cassandra
node. "

Took another Node C, which can access both these nodes.

-bash-3.2$ nodetool -host 128.55.57.85 -p 7199 ring
Address         DC          Rack   Status  State   Load      Effective-Ownership  Token
                                                                                   13087783343113017825514407978144931209
128.55.57.85    datacenter1 rack1  Up      Normal  62.1 KB   92.31%               0
128.55.57.86    datacenter1 rack1  Up      Normal  55.21 KB  7.69%                13087783343113017825514407978144931209
-bash-3.2$


Got the same error.

-bash-3.2$ sstableloader -d 128.55.57.85 Blast/Blast_NR/
Streaming revelant part of Blast/Blast_NR/Blast-Blast_NR-hd-1-Data.db
to [/128.55.57.86, /128.55.57.85]

progress: [/128.55.57.86 0/0 (100)] [/128.55.57.85 0/1 (0)] [total: 0
- 0MB/s (avg: 0MB/s)] WARN 03:21:59,519 Failed attempt 1 to connect to
/128.55.57.86 to stream null. Retrying in 4000 ms.
(java.net.ConnectException: Connection timed out)
 WARN 03:21:59,519 Failed attempt 1 to connect to /128.55.57.85 to
stream Blast/Blast_NR/Blast-Blast_NR-hd-1-Data.db sections=1
progress=0/2362 - 0%. Retrying in 4000 ms. (java.net.ConnectException:
Connection timed out)
progress: [/128.55.57.86 0/0 (100)] [/128.55.57.85 0/1 (0)] [total: 0
- 0MB/s (avg: 0MB/s)] WARN 03:22:24,521 Failed attempt 2 to connect to
/128.55.57.86 to stream null. Retrying in 8000 ms.
(java.net.ConnectException: Connection timed out)
 WARN 03:22:24,522 Failed attempt 2 to connect to /128.55.57.85 to
stream Blast/Blast_NR/Blast-Blast_NR-hd-1-Data.db sections=1
progress=0/2362 - 0%. Retrying in 8000 ms. (java.net.ConnectException:
Connection timed out)
progress: [/128.55.57.86 0/0 (100)] [/128.55.57.85 0/1 (0)] [total: 0
- 0MB/s (avg: 0MB/s)] WARN 03:22:53,525 Failed attempt 3 to connect to
/128.55.57.86 to stream null. Retrying in 16000 ms.
(java.net.ConnectException: Connection timed out)
 WARN 03:22:53,525 Failed attempt 3 to connect to /128.55.57.85 to
stream Blast/Blast_NR/Blast-Blast_NR-hd-1-Data.db sections=1
progress=0/2362 - 0%. Retrying in 16000 ms.
(java.net.ConnectException: Connection timed out)
progress: [/128.55.57.86 0/0 (100)] [/128.55.57.85

Re: Freeing up disk space on Cassandra 1.1.5 with Size-Tiered compaction.

2012-12-05 Thread aaron morton
> Basically we were successful on two of the nodes. They both took ~2 days and 
> 11 hours to complete and at the end we saw one very large file ~900GB and the 
> rest much smaller (the overall size decreased). This is what we expected!
I would recommend having up to 300GB to 400GB per node on a regular HDD with 
1Gb networking. 

> But on the 3rd node, we suspect major compaction didn't actually finish its 
> job…
The file list looks odd. Check the time stamps, on the files. You should not 
have files older than when compaction started. 

> 8GB heap 
The default is 4GB max these days. 

> 1) Do you expect problems with the 3rd node during 2 weeks more of 
> operations, in the conditions seen below? 
I cannot answer that. 

> 2) Should we restart with leveled compaction next year? 
I would run some tests to see how it works for your workload. 

> 4) Should we consider increasing the cluster capacity?
IMHO yes.
You may also want to do some experiments with turning compression on, if it is not 
already enabled. 

Having so much data on each node is a potential bad day. If you instead had to 
move or repair one of those nodes, how long would it take for cassandra to 
stream all the data over? (Or to rsync the data over.) How long does it take 
to run nodetool repair on the node?
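
For example, one crude way to get that number (keyspace and column family names
taken from the listing below; -pr restricts the repair to the node's primary
range):

time nodetool -h localhost repair -pr ATLAS Data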

With RF 3, if you lose a node you have lost your redundancy. It's important to 
have a plan about how to get it back and how long it may take.   

Hope that helps. 

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 6/12/2012, at 3:40 AM, Alexandru Sicoe  wrote:

> Hi guys,
> Sorry for the late follow-up but I waited to run major compactions on all 3 
> nodes at a time before replying with my findings.
> 
> Basically we were successful on two of the nodes. They both took ~2 days and 
> 11 hours to complete and at the end we saw one very large file ~900GB and the 
> rest much smaller (the overall size decreased). This is what we expected!
> 
> But on the 3rd node, we suspect major compaction didn't actually finish its 
> job. First of all nodetool compact returned much earlier than the rest - 
> after one day and 15 hrs. Secondly from the 1.4TBs initially on the node only 
> about 36GB were freed up (almost the same size as before). Saw nothing in the 
> server log (debug not enabled). Below I pasted some more details about file 
> sizes before and after compaction on this third node and disk occupancy.
> 
> The situation is maybe not so dramatic for us because in less than 2 weeks we 
> will have a down time till after the new year. During this we can completely 
> delete all the data in the cluster and start fresh with TTLs for 1 month (as 
> suggested by Aaron and 8GB heap as suggested by Alain - thanks).
> 
> Questions:
> 
> 1) Do you expect problems with the 3rd node during 2 weeks more of 
> operations, in the conditions seen below? 
> [Note: we expect the minor compactions to continue building up files but 
> never really getting to compacting the large file and thus not needing much 
> temporarily extra disk space].
> 
> 2) Should we restart with leveled compaction next year? 
> [Note: Aaron was right, we have 1 week rows which get deleted after 1 month 
> which means older rows end up in big files => to free up space with 
> SizeTiered we will have no choice but run major compactions which we don't 
> know if they will work given that we get ~1TB / node / month. You can 
> see we are at the limit!]
> 
> 3) In case we keep SizeTiered:
> 
> - How can we improve the performance of our major compactions? (we left 
> all config parameters as default). Would increasing compactions throughput 
> interfere with writes and reads? What about multi-threaded compactions?
> 
> - Do we still need to run regular repair operations as well? Do these 
> also do a major compaction or are they completely separate operations? 
> 
> [Note: we have 3 nodes with RF=2 and inserting at consistency level 1 and 
> reading at consistency level ALL. We read primarily for exporting reasons - 
> we export 1 week worth of data at a time].
> 
> 4) Should we consider increasing the cluster capacity?
> [We generate ~5million new rows every week which shouldn't come close to the 
> hundreds of millions of rows on a node mentioned by Aaron which are the 
> volumes that would create problems with bloom filters and indexes].
> 
> Cheers,
> Alex
> --
> 
> The situation in the data folder 
> 
> before calling nodetool compact:
> 
> du -csh /data_bst/cassandra/data/ATLAS/Data/*-Data.db
> 444G  /data_bst/cassandra/data/ATLAS/Data/ATLAS-Data-he-24370-Data.db
> 376G  /data_bst/cassandra/data/ATLAS/Data/ATLAS-Data-he-46431-Data.db
> 305G  /data_bst/cassandra/data/ATLAS/Data/ATLAS-Data-he-68959-Data.db
> 39G /data_bst/cassandra/data/ATLAS/Data/ATLAS-Data-he-7352-Data.db
> 78G /data_bst/cassandra/data/ATLAS/Data/ATLAS-Data-he-74076-Data.db
> 81G /data_bst/

Re: entire range of node out of sync -- out of the blue

2012-12-05 Thread aaron morton
> - how do i stop repair before i run out of storage? ( can't let this finish )

To stop the validation part of the repair…

nodetool -h localhost stop VALIDATION 


The only way I know to stop streaming is to restart the node; there may be a 
better way though. 


> INFO [AntiEntropySessions:3] 2012-12-05 02:15:02,301 AntiEntropyService.java 
> (line 666) [repair #7c7665c0-3eab-11e2--dae6667065ff] new session: will 
> sync /X.X.1.113, /X.X.0.71 on range 
> (85070591730234615865843651857942052964,0] for ( .. )
I am assuming this was run on the first node in DC west with -pr, as you said.
The log message is saying this is going to repair the primary range for the 
node. The repair is then actually performed one CF at a time. 

You should also see log messages ending with "range(s) out of sync" which will 
say how out of sync the data is. 
 
> - how do i clean up my sstables ( grew from 6k to 20k since this started, 
> while i shut writes off completely )
Sounds like repair is streaming a lot of differences. 
If you have the space I would give levelled compaction time to take care of 
it. 

Hope that helps.

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 6/12/2012, at 1:32 AM, Andras Szerdahelyi 
 wrote:

> hi list,
> 
> AntiEntropyService started syncing ranges of entire nodes ( ?! ) across my 
> data centers and i'd like to understand why. 
> 
> I see log lines like this on all my nodes in my two ( east/west ) data 
> centres...
> 
> INFO [AntiEntropySessions:3] 2012-12-05 02:15:02,301 AntiEntropyService.java 
> (line 666) [repair #7c7665c0-3eab-11e2--dae6667065ff] new session: will 
> sync /X.X.1.113, /X.X.0.71 on range 
> (85070591730234615865843651857942052964,0] for ( .. )
> 
> ( this is around 80-100 GB of data for a single node. )
> 
> - i did not observe any network failures or nodes falling off the ring
> - good distribution of data ( load is equal on all nodes )
> - hinted handoff is on
> - read repair chance is 0.1 on the CF
> - 2 replicas in each data centre ( which is also the number of nodes in each 
> ) with NetworkTopologyStrategy
> - repair -pr is scheduled to run off-peak hours, daily
> - leveled compaction with sstable max size 256mb ( i have found this to 
> trigger compaction in acceptable intervals while still keeping the sstable 
> count down )
> - i am on 1.1.6
> - java heap 10G
> - max memtables 2G
> - 1G row cache
> - 256M key cache
> 
> my nodes'  ranges are:
> 
> DC west
> 0
> 85070591730234615865843651857942052864
> 
> DC east
> 100
> 85070591730234615865843651857942052964
> 
> symptoms are:
> - logs show sstables being streamed over to other nodes
> - 140k files in data dir of CF on all nodes
> - cfstats reports 20k sstables, up from 6 on all nodes
> - compaction continuously running with no results whatsoever ( number of 
> sstables growing )
> 
> i tried the following:
> - offline scrub ( has gone OOM, i noticed the script in the debian package 
> specifies 256MB heap? )
> - online scrub ( no effect )
> - repair ( no effect )
> - cleanup ( no effect )
> 
> my questions are:
> - how do i stop repair before i run out of storage? ( can't let this finish )
> - how do i clean up my sstables ( grew from 6k to 20k since this started, 
> while i shut writes off completely )
> 
> thanks,
> Andras
> 
> Andras Szerdahelyi
> Solutions Architect, IgnitionOne | 1831 Diegem E.Mommaertslaan 20A
> M: +32 493 05 50 88 | Skype: sandrew84
> 
> 
> 
> 
> 



Re: Help on MMap of SSTables

2012-12-05 Thread aaron morton
Background http://en.wikipedia.org/wiki/Memory-mapped_file

> Is it going to load only relevant pages per SSTable on read or is it going to 
> load an entire SSTable on first access?
It will load what is requested, and maybe some additional data taking into 
account the amount of memory available for caches. 

> Suppose compaction kicks in. Will it then evict hot MMapped pages for 
> read and substitute them with a lot of pages involving full SSTables?

Some file access in cassandra, such as compaction, hints to the OS that the 
reads should not be cached. Technically, it uses posix_fadvise if you want to 
look it up.

Cheers


-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 5/12/2012, at 11:04 PM, Ravikumar Govindarajan 
 wrote:

> Thanks Aaron,
> 
> I am not quite clear on how MMap loads SSTables, other than the fact that it 
> kicks in only on first access.
> 
> Is it going to load only relevant pages per SSTable on read or is it going to 
> load an entire SSTable on first access?
> 
> Suppose compaction kicks in. Will it then evict hot MMapped pages for 
> read and substitute them with a lot of pages involving full SSTables?
> 
> --
> Ravi
> 
> On Wed, Dec 5, 2012 at 1:22 AM, aaron morton  wrote:
>> Will MMapping data files be detrimental for reads, in this case?
> No. 
> 
>> In general, when should we opt for MMap data files and what are the factors 
>> that need special attention when enabling the same?
> mmapping is the default, so I would say use it until you have a reason not 
> to. 
> 
> mmapping will map the entire file, but pages of data are read into memory on 
> demand and purged when space is needed. 
> 
> Cheers
> 
> -
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 4/12/2012, at 11:59 PM, Ravikumar Govindarajan 
>  wrote:
> 
>> Our current SSTable sizes are far greater than RAM. {150 Gigs of data, 32GB 
>> RAM}. Currently we run with mlockall and mmap_index_only options and don't 
>> experience swapping at all.
>> 
>> We use wide rows and size-tiered-compaction, so a given key will definitely 
>> be spread across multiple sstables. Will MMapping data files be detrimental 
>> for reads, in this case?
>> 
>> In general, when should we opt for MMap data files and what are the factors 
>> that need special attention when enabling the same?
>> 
>> --
>> Ravi
> 
> 



Re: reversed=true for CQL 3

2012-12-05 Thread Rob Coli
On Wed, Dec 5, 2012 at 3:59 PM, Shahryar Sedghi  wrote:
> Is there a keyword with the same functionality as reversed=True for CQL 3 on
> Cassandra 1.1.6?

https://issues.apache.org/jira/browse/CASSANDRA-4004
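
For example, CQL 3 expresses a reversed slice with ORDER BY ... DESC (a minimal
sketch; the table and column names are made up):

SELECT * FROM events WHERE id = 'k1' ORDER BY ts DESC LIMIT 10;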

=Rob

-- 
=Robert Coli
AIM>ALK - rc...@palominodb.com
YAHOO - rcoli.palominob
SKYPE - rcoli_palominodb


Re: Loading SSTables failing via Cassandra SSTableLoader on multi-node cluster.

2012-12-05 Thread aaron morton
Have you checked the yaml configuration for the sstableloader? Background: see the 
configuration section here http://www.datastax.com/dev/blog/bulk-loading
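
For example, something along these lines (the config path is a placeholder; the
directory must hold the cassandra.yaml described in that post):

export CASSANDRA_CONF=/path/to/loader-conf
export CLASSPATH=$CASSANDRA_CONF:$CLASSPATH
sstableloader -d 129.56.57.45 Blast/Blast_NR/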

 
Hope that helps. 

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 5/12/2012, at 1:43 PM, Pradeep Kumar Mantha  wrote:

> Hi!
> 
> I am trying to load sstables generated onto a running multi-node
> Cassandra cluster.  But I see problems only with multi-cluster and
> single node works fine.
> 
> Cassandra version used is 1.1.2 .
> The cassandra cluster seems to be active.
> 
> -bash-3.2$ nodetool -host 129.56.57.45 -p 7199 ring
> Address         DC          Rack   Status  State   Load      Effective-Ownership  Token
>                                                                                    13087783343113017825514407978144931209
> 129.56.57.45    datacenter1 rack1  Up      Normal  57.49 KB  92.31%               0
> 129.56.57.46    datacenter1 rack1  Up      Normal  50.6 KB   7.69%                13087783343113017825514407978144931209
> -bash-3.2$
> 
> 
> I tried sstableloader from a cassandra node ( 129.56.57.45 ) and from another
> outside machine. But I get the same error in both cases.
> 
> 
> Error:
> 
> -bash-3.2$ sstableloader -d 129.56.57.45 Blast/Blast_NR/
> log4j:WARN No appenders could be found for logger
> (org.apache.cassandra.io.sstable.SSTableReader).
> log4j:WARN Please initialize the log4j system properly.
> log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig
> for more info.
> Streaming revelant part of Blast/Blast_NR/Blast-Blast_NR-hd-1-Data.db
> to [/129.56.57.46, /129.56.57.45]
> 
> progress: [/129.56.57.46 0/0 (100)] [/129.56.57.45 0/1 (0)] [total: 0
> - 0MB/s (avg: 0MB/s)]Streaming session to /129.56.57.45 failed
> Exception in thread "Streaming to /129.56.57.45:1"
> java.lang.RuntimeException: java.net.ConnectException: Connection
> timed out
>at 
> org.apache.cassandra.utils.FBUtilities.unchecked(FBUtilities.java:628)
>at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
>at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>at java.lang.Thread.run(Thread.java:619)
> Caused by: java.net.ConnectException: Connection timed out
>at java.net.PlainSocketImpl.socketConnect(Native Method)
>at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
>at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
>at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
>at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
>at java.net.Socket.connect(Socket.java:519)
>at java.net.Socket.connect(Socket.java:469)
>at java.net.Socket.<init>(Socket.java:366)Streaming session to
> /129.56.57.46 failed
> 
>at java.net.Socket.<init>(Socket.java:267)
>at 
> org.apache.cassandra.net.OutboundTcpConnectionPool.newSocket(OutboundTcpConnectionPool.java:96)
>at 
> org.apache.cassandra.streaming.FileStreamTask.connectAttempt(FileStreamTask.java:245)
>at 
> org.apache.cassandra.streaming.FileStreamTask.runMayThrow(FileStreamTask.java:91)
>at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>... 3 more
> Exception in thread "Streaming to /129.56.57.46:1"
> java.lang.RuntimeException: java.net.ConnectException: Connection
> timed out
>at 
> org.apache.cassandra.utils.FBUtilities.unchecked(FBUtilities.java:628)
>at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
>at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>at java.lang.Thread.run(Thread.java:619)
> Caused by: java.net.ConnectException: Connection timed out
>at java.net.PlainSocketImpl.socketConnect(Native Method)
>at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
>at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
>at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
>at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
>at java.net.Socket.connect(Socket.java:519)
>at java.net.Socket.connect(Socket.java:469)
>at java.net.Socket.<init>(Socket.java:366)
>at java.net.Socket.<init>(Socket.java:267)
>at 
> org.apache.cassandra.net.OutboundTcpConnectionPool.newSocket(OutboundTcpConnectionPool.java:96)
>at 
> org.apache.cassandra.streaming.FileStreamTask.connectAttempt(FileStreamTask.java:245)
>at 
> org.apache.cassandra.streaming.FileStreamTask.runMayThrow(FileStreamTask.java:91)
>at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>... 3 more
> progress: [/129.56.57.46 0/0

reversed=true for CQL 3

2012-12-05 Thread Shahryar Sedghi
Hi

Is there a keyword with the same functionality as reversed=True for CQL 3 on
Cassandra 1.1.6?

Thanks in advance

Shahryar

-- 
"Life is what happens while you are making other plans." ~ John Lennon


Re: What is substituting keys_cached column family argument

2012-12-05 Thread Rob Coli
On Wed, Dec 5, 2012 at 9:06 AM, Roman Yankin  wrote:
> In Cassandra v 0.7 there was a column family property called keys_cached; now 
> it's gone and I'm struggling to understand which of the below properties has 
> replaced it (if any)?

Key and row caches are global in modern cassandra. You opt CFs out of
the key cache, not opt in, because the default setting is "keys_only"
on a per-CF basis.

http://www.datastax.com/docs/1.1/configuration/node_configuration#row-cache-keys-to-save

http://www.datastax.com/docs/1.1/configuration/node_configuration#key-cache-keys-to-save

http://www.datastax.com/docs/1.1/configuration/storage_configuration#caching
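
For example, opting a CF out of the key cache in cassandra-cli looks something
like this (the CF name is made up; valid values for caching are all, keys_only,
rows_only and none):

[default@demo] update column family Users with caching = 'none';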

=Rob

-- 
=Robert Coli
AIM>ALK - rc...@palominodb.com
YAHOO - rcoli.palominob
SKYPE - rcoli_palominodb


how to take consistant snapshot?

2012-12-05 Thread Andrey Ilinykh
Hello, everybody!
I have a production cluster with incremental backup on and I want to clone it
(create a test one). I don't understand one thing: each column family gets
flushed (and copied to backup storage) independently, which means the total
snapshot is inconsistent. If I restore from such a snapshot I have a totally
useless system. To be more specific, let's say I have two CFs, one serving as
an index for the other. Every time I update one CF I update the index CF. There
is a good chance that all replicas flush the index CF first. Then I move it
into backup storage, restore, and get a CF which has pointers to
non-existent data in the other CF. What is a way to avoid this situation?

Thank you,
  Andrey


Re: [BETA RELEASE] Apache Cassandra 1.2.0-beta3 released

2012-12-05 Thread Andrey Ilinykh
Hello, everybody!
I have read blog about atomic batches in 1.2
http://www.datastax.com/dev/blog/atomic-batches-in-cassandra-1-2
It mentioned that atomic batches are the default starting with 1.2. It also said
CQL allows switching this off. How can I manipulate this setting using the thrift
API?

Thank you,
  Andrey


On Tue, Dec 4, 2012 at 10:51 AM, Sylvain Lebresne wrote:

> The Cassandra team is pleased to announce the release of the third beta for
> the future Apache Cassandra 1.2.0.
>
> Let me first stress that this is beta software and as such is *not* ready
> for
> production use.
>
> This release is still beta and as such may contain bugs. Any help testing
> this beta would be gladly appreciated, and if you were to encounter any
> problem
> during your testing, please report[3,4] them. Be sure to have a look at the
> change
> log[1] and the release notes[2] to see where Cassandra 1.2 differs from the
> previous series.
>
> Apache Cassandra 1.2.0-beta3[5] is available as usual from the cassandra
> website (http://cassandra.apache.org/download/) and a debian package is
> available using the 12x branch (see
> http://wiki.apache.org/cassandra/DebianPackaging).
>
> Thank you for your help in testing and have fun with it.
>
> [1]: http://goo.gl/LEmPN (CHANGES.txt)
> [2]: http://goo.gl/tI66z (NEWS.txt)
> [3]: https://issues.apache.org/jira/browse/CASSANDRA
> [4]: user@cassandra.apache.org
> [5]:
> http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/cassandra-1.2.0-beta3
>


What is substituting keys_cached column family argument

2012-12-05 Thread Roman Yankin
Problem:

If I do something like this in Cassandra 1.1.7, I get an IllegalArgumentException:
create column family travelers with keys_cached=100 and 
column_metadata=[{column_name: t, validation_class: UTF8Type, index_type: 
KEYS}];
java.lang.IllegalArgumentException: No enum const class 
org.apache.cassandra.cli.CliClient$ColumnFamilyArgument.KEYS_CACHED



In Cassandra v 0.7 there was a column family property called keys_cached; now 
it's gone and I'm struggling to understand which of the below properties has 
replaced it (if any)?

The below snippet comes from the org.apache.cassandra.cli.CliClient class in the 
v 1.1.7 source:

protected enum ColumnFamilyArgument
{
    COLUMN_TYPE,
    COMPARATOR,
    SUBCOMPARATOR,
    COMMENT,
    READ_REPAIR_CHANCE,
    DCLOCAL_READ_REPAIR_CHANCE,
    GC_GRACE,
    COLUMN_METADATA,
    MEMTABLE_OPERATIONS,
    MEMTABLE_THROUGHPUT,
    DEFAULT_VALIDATION_CLASS,
    MIN_COMPACTION_THRESHOLD,
    MAX_COMPACTION_THRESHOLD,
    REPLICATE_ON_WRITE,
    KEY_VALIDATION_CLASS,
    COMPACTION_STRATEGY,
    COMPACTION_STRATEGY_OPTIONS,
    COMPRESSION_OPTIONS,
    BLOOM_FILTER_FP_CHANCE,
    CACHING
}
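
My best guess is that CACHING at the end of that list is the replacement, i.e.
the failing statement above would become something like:

create column family travelers with caching = 'keys_only' and 
column_metadata=[{column_name: t, validation_class: UTF8Type, index_type: 
KEYS}];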

Thanks,
Roman


Re: cannot parse 'name' as hex bytes

2012-12-05 Thread Edward Sargisson

You're not casting the types.
Cassandra stores everything as bytes. You either need to set the comparator and 
key_validation_class to UTF8Type, or use the utf8() function to convert.


http://www.datastax.com/docs/1.1/dml/using_cli


On 12-12-05 03:14 AM, Yogesh Dhari wrote:


Hi all,

I am very new to Cassandra,


I am using version 1.1.7 and followed the single-node setup steps 
mentioned in GETTING STARTED.


I have created a keyspace named Demo and then tried to create a column 
family named Work, as follows:



[default@DEMO] create column family Work ;
5c85706f-87fe-38f1-b23f-c6180e45d178
Waiting for schema agreement...
... schemas agree across the cluster

Now if I do..

[default@DEMO] set Work[1234][name] = scott ;

I got this error.

org.apache.cassandra.db.marshal.MarshalException: cannot parse 'name' 
as hex bytes



Please help and suggest.

Thanks & Regards
Yogesh Kumar









Re: Freeing up disk space on Cassandra 1.1.5 with Size-Tiered compaction.

2012-12-05 Thread Alexandru Sicoe
Hi guys,
Sorry for the late follow-up but I waited to run major compactions on all 3
nodes at a time before replying with my findings.

Basically we were successful on two of the nodes. They both took ~2 days
and 11 hours to complete and at the end we saw one very large file ~900GB
and the rest much smaller (the overall size decreased). This is what we
expected!

But on the 3rd node, we suspect major compaction didn't actually finish
its job. First of all nodetool compact returned much earlier than the rest
- after one day and 15 hrs. Secondly from the 1.4TBs initially on the node
only about 36GB were freed up (almost the same size as before). Saw nothing
in the server log (debug not enabled). Below I pasted some more details
about file sizes before and after compaction on this third node and disk
occupancy.

The situation is maybe not so dramatic for us because in less than 2 weeks
we will have a down time till after the new year. During this we can
completely delete all the data in the cluster and start fresh with TTLs for
1 month (as suggested by Aaron and 8GB heap as suggested by Alain - thanks).

Questions:

1) Do you expect problems with the 3rd node during 2 weeks more of
operations, in the conditions seen below?
[Note: we expect the minor compactions to continue building up files but
never really getting to compacting the large file and thus not needing much
temporarily extra disk space].

2) Should we restart with leveled compaction next year?
[Note: Aaron was right, we have 1 week rows which get deleted after 1 month
which means older rows end up in big files => to free up space with
SizeTiered we will have no choice but run major compactions which we don't
know if they will work given that we get ~1TB / node / month. You
can see we are at the limit!]

3) In case we keep SizeTiered:

- How can we improve the performance of our major compactions? (we left
all config parameters as default). Would increasing compactions throughput
interfere with writes and reads? What about multi-threaded compactions?

- Do we still need to run regular repair operations as well? Do these
also do a major compaction or are they completely separate operations?

[Note: we have 3 nodes with RF=2 and inserting at consistency level 1 and
reading at consistency level ALL. We read primarily for exporting reasons -
we export 1 week worth of data at a time].

4) Should we consider increasing the cluster capacity?
[We generate ~5million new rows every week which shouldn't come close to
the hundreds of millions of rows on a node mentioned by Aaron which are the
volumes that would create problems with bloom filters and indexes].

Cheers,
Alex
--

The situation in the data folder

before calling nodetool compact:

du -csh /data_bst/cassandra/data/ATLAS/Data/*-Data.db
444G  /data_bst/cassandra/data/ATLAS/Data/ATLAS-Data-he-24370-Data.db
376G  /data_bst/cassandra/data/ATLAS/Data/ATLAS-Data-he-46431-Data.db
305G  /data_bst/cassandra/data/ATLAS/Data/ATLAS-Data-he-68959-Data.db
39G   /data_bst/cassandra/data/ATLAS/Data/ATLAS-Data-he-7352-Data.db
78G   /data_bst/cassandra/data/ATLAS/Data/ATLAS-Data-he-74076-Data.db
81G   /data_bst/cassandra/data/ATLAS/Data/ATLAS-Data-he-79663-Data.db
205M  /data_bst/cassandra/data/ATLAS/Data/ATLAS-Data-he-80370-Data.db
20G   /data_bst/cassandra/data/ATLAS/Data/ATLAS-Data-he-80968-Data.db
20G   /data_bst/cassandra/data/ATLAS/Data/ATLAS-Data-he-82330-Data.db
20G   /data_bst/cassandra/data/ATLAS/Data/ATLAS-Data-he-83710-Data.db
4.9G  /data_bst/cassandra/data/ATLAS/Data/ATLAS-Data-he-84015-Data.db
4.9G  /data_bst/cassandra/data/ATLAS/Data/ATLAS-Data-he-84356-Data.db
4.9G  /data_bst/cassandra/data/ATLAS/Data/ATLAS-Data-he-84696-Data.db
333M  /data_bst/cassandra/data/ATLAS/Data/ATLAS-Data-he-84707-Data.db
92M   /data_bst/cassandra/data/ATLAS/Data/ATLAS-Data-he-84712-Data.db
92M   /data_bst/cassandra/data/ATLAS/Data/ATLAS-Data-he-84717-Data.db
99M   /data_bst/cassandra/data/ATLAS/Data/ATLAS-Data-he-84722-Data.db
2.5G  /data_bst/cassandra/data/ATLAS/Data/ATLAS-Data-tmp-he-84723-Data.db
1.4T  total

after nodetool compact returned:

du -csh /data_bst/cassandra/data/ATLAS/Data/*-Data.db
444G  /data_bst/cassandra/data/ATLAS/Data/ATLAS-Data-he-24370-Data.db
910G  /data_bst/cassandra/data/ATLAS/Data/ATLAS-Data-he-84723-Data.db
19G   /data_bst/cassandra/data/ATLAS/Data/ATLAS-Data-he-86229-Data.db
19G   /data_bst/cassandra/data/ATLAS/Data/ATLAS-Data-he-87639-Data.db
5.0G  /data_bst/cassandra/data/ATLAS/Data/ATLAS-Data-he-87923-Data.db
4.8G  /data_bst/cassandra/data/ATLAS/Data/ATLAS-Data-he-88261-Data.db
338M  /data_bst/cassandra/data/ATLAS/Data/ATLAS-Data-he-88271-Data.db
339M  /data_bst/cassandra/data/ATLAS/Data/ATLAS-Data-he-88292-Data.db
339M  /data_bst/cassandra/data/ATLAS/Data/ATLAS-Data-he-88312-Data.db
98M


Looking at the disk occupancy for the logical partition where the data
folder resides:

df /data_bst
Filesyst

entire range of node out of sync -- out of the blue

2012-12-05 Thread Andras Szerdahelyi
hi list,

AntiEntropyService started syncing ranges of entire nodes ( ?! ) across my data 
centers and i'd like to understand why.

I see log lines like this on all my nodes in my two ( east/west ) data 
centres...

INFO [AntiEntropySessions:3] 2012-12-05 02:15:02,301 AntiEntropyService.java 
(line 666) [repair #7c7665c0-3eab-11e2--dae6667065ff] new session: will 
sync /X.X.1.113, /X.X.0.71 on range (85070591730234615865843651857942052964,0] 
for ( .. )

( this is around 80-100 GB of data for a single node. )

- i did not observe any network failures or nodes falling off the ring
- good distribution of data ( load is equal on all nodes )
- hinted handoff is on
- read repair chance is 0.1 on the CF
- 2 replicas in each data centre ( which is also the number of nodes in each ) 
with NetworkTopologyStrategy
- repair -pr is scheduled to run off-peak hours, daily
- leveled compaction with sstable max size 256mb ( i have found this to trigger 
compaction in acceptable intervals while still keeping the sstable count down )
- i am on 1.1.6
- java heap 10G
- max memtables 2G
- 1G row cache
- 256M key cache

my nodes' ranges are:

DC west
0
85070591730234615865843651857942052864

DC east
100
85070591730234615865843651857942052964

symptoms are:
- logs show sstables being streamed over to other nodes
- 140k files in data dir of CF on all nodes
- cfstats reports 20k sstables, up from 6 on all nodes
- compaction continuously running with no results whatsoever ( number of 
sstables growing )

i tried the following:
- offline scrub ( has gone OOM, i noticed the script in the debian package 
specifies 256MB heap? )
- online scrub ( no effect )
- repair ( no effect )
- cleanup ( no effect )

my questions are:
- how do i stop repair before i run out of storage? ( can't let this finish )
- how do i clean up my sstables ( grew from 6k to 20k since this started, while 
i shut writes off completely )

thanks,
Andras

Andras Szerdahelyi
Solutions Architect, IgnitionOne | 1831 Diegem E.Mommaertslaan 20A
M: +32 493 05 50 88 | Skype: sandrew84



cannot parse 'name' as hex bytes

2012-12-05 Thread Yogesh Dhari
Hi all,

I am very new to Cassandra,


I am using version 1.1.7 and followed the single-node setup steps
mentioned in GETTING STARTED.

I have created a keyspace named Demo and then tried to create a column
family named Work, as follows:


[default@DEMO] create column family Work ;
5c85706f-87fe-38f1-b23f-c6180e45d178
Waiting for schema agreement...
... schemas agree across the cluster

Now if I do..

[default@DEMO] set Work[1234][name] = scott ;

I got this error.

org.apache.cassandra.db.marshal.MarshalException: cannot parse 'name' as
hex bytes


Please help and suggest.

Thanks & Regards
Yogesh Kumar


Re: Help on MMap of SSTables

2012-12-05 Thread Ravikumar Govindarajan
Thanks Aaron,

I am not quite clear on how MMap loads SSTables, other than the fact that it
kicks in only on first access.

Is it going to load only relevant pages per SSTable on read or is it going
to load an entire SSTable on first access?

Suppose compaction kicks in. Will it then evict hot MMapped pages for
read and substitute them with a lot of pages involving full SSTables?

--
Ravi

On Wed, Dec 5, 2012 at 1:22 AM, aaron morton wrote:

> Will MMapping data files be detrimental for reads, in this case?
>
> No.
>
> In general, when should we opt for MMap data files and what are the
> factors that need special attention when enabling the same?
>
> mmapping is the default, so I would say use it until you have a reason not
> to.
>
> mmapping will map the entire file, but pages of data are read into memory
> on demand and purged when space is needed.
>
> Cheers
>
> -
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 4/12/2012, at 11:59 PM, Ravikumar Govindarajan <
> ravikumar.govindara...@gmail.com> wrote:
>
> Our current SSTable sizes are far greater than RAM. {150 Gigs of data,
> 32GB RAM}. Currently we run with mlockall and mmap_index_only options and
> don't experience swapping at all.
>
> We use wide rows and size-tiered-compaction, so a given key will
> definitely be spread across multiple sstables. Will MMapping data files be
> detrimental for reads, in this case?
>
> In general, when should we opt for MMap data files and what are the
> factors that need special attention when enabling the same?
>
> --
> Ravi
>
>
>


Re: Data not replicating to all datacenters

2012-12-05 Thread Owen Davies
My bad, yes it does say dc2 in our config file (actually it is different ip
addresses and different names, but I wasn't sure whether it
was sensitive information, so I changed them to generic terms).

Owen

On 5 December 2012 08:42, Tomas Nunez  wrote:

>
>
> 2012/12/3 Owen Davies 
>
>> cassandra-topology.properties
>> 
>> 192.168.1.1=dc1:rack1
>> 192.168.1.2=dc1:rack1
>> 192.168.1.3=dc1:rack1
>>
>> 192.168.2.1=dc2:rack1
>> 192.168.2.2=dc2:rack1
>> 192.168.2.3=*dc3*:rack1
>>
>
> This is a typo, right? It says "dc2" in your config file, doesn't it?
>
> --
> www.groupalia.com  Tomàs Núñez, IT-Sysprod
> Tel. +34 93 159 31 00  Fax. +34 93 396 18 52
> Llull, 95-97, 2º planta, 08005 Barcelona
> Skype: tomas.nunez.groupalia  tomas.nu...@groupalia.com
> Twitter | Facebook | Linkedin
>

Re: Data not replicating to all datacenters

2012-12-05 Thread Tomas Nunez
2012/12/3 Owen Davies 

> cassandra-topology.properties
> 
> 192.168.1.1=dc1:rack1
> 192.168.1.2=dc1:rack1
> 192.168.1.3=dc1:rack1
>
> 192.168.2.1=dc2:rack1
> 192.168.2.2=dc2:rack1
> 192.168.2.3=*dc3*:rack1
>

This is a typo, right? It says "dc2" in your config file, doesn't it?

-- 
www.groupalia.com  Tomàs Núñez, IT-Sysprod
Tel. +34 93 159 31 00  Fax. +34 93 396 18 52
Llull, 95-97, 2º planta, 08005 Barcelona
Skype: tomas.nunez.groupalia  tomas.nu...@groupalia.com
Twitter | Facebook | Linkedin