snapshot issue

2012-07-11 Thread Adeel Akbar

Hi,

I am trying to take a snapshot of my data but faced the following error. 
Please help me to resolve this issue.


[root@cassandra1 bin]# ./nodetool -h localhost snapshot 20120711
Exception in thread "main" java.io.IOError: java.io.IOException: Cannot run program "ln": java.io.IOException: error=12, Cannot allocate memory
        at org.apache.cassandra.db.ColumnFamilyStore.snapshotWithoutFlush(ColumnFamilyStore.java:1660)
        at org.apache.cassandra.db.ColumnFamilyStore.snapshot(ColumnFamilyStore.java:1686)
        at org.apache.cassandra.db.Table.snapshot(Table.java:198)
        at org.apache.cassandra.service.StorageService.takeSnapshot(StorageService.java:1393)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:616)
        at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:111)
        at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:45)
        at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:226)
        at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
        at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:251)
        at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:857)
        at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:795)
        at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1450)
        at javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:90)
        at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1285)
        at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1383)
        at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:807)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:616)
        at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322)
        at sun.rmi.transport.Transport$1.run(Transport.java:177)
        at java.security.AccessController.doPrivileged(Native Method)
        at sun.rmi.transport.Transport.serviceCall(Transport.java:173)
        at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:553)
        at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:808)
        at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:667)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:636)
Caused by: java.io.IOException: Cannot run program "ln": java.io.IOException: error=12, Cannot allocate memory
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:475)
        at org.apache.cassandra.utils.CLibrary.createHardLinkWithExec(CLibrary.java:181)
        at org.apache.cassandra.utils.CLibrary.createHardLink(CLibrary.java:147)
        at org.apache.cassandra.io.sstable.SSTableReader.createLinks(SSTableReader.java:730)
        at org.apache.cassandra.db.ColumnFamilyStore.snapshotWithoutFlush(ColumnFamilyStore.java:1653)
        ... 33 more
Caused by: java.io.IOException: java.io.IOException: error=12, Cannot allocate memory
        at java.lang.UNIXProcess.<init>(UNIXProcess.java:164)
        at java.lang.ProcessImpl.start(ProcessImpl.java:81)
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:468)
        ... 37 more

--


Thanks & Regards

Adeel Akbar





RE: snapshot issue

2012-07-11 Thread Samuel CARRIERE
Hello,

The problem is described here: 
http://wiki.apache.org/cassandra/Operations
The recommended way to avoid it is to use JNA.

Cheers,
Samuel
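
For context on why JNA helps: without it, Cassandra shells out to "ln" to create 
the snapshot hard links (CLibrary.createHardLinkWithExec in the trace above), and 
forking a JVM with a large heap can fail with errno 12 when the kernel refuses to 
overcommit for the fork. With JNA on the class path the link is made in-process 
instead. A rough, hypothetical sketch of that in-process approach - not 
Cassandra's actual code, and the file paths are made up:

import com.sun.jna.Native;

public class HardLinkViaJna {
    static { Native.register("c"); }   // direct-map this class's natives onto libc

    // link(2): create newPath as a hard link to existingPath, no fork involved
    private static native int link(String existingPath, String newPath);

    public static void main(String[] args) {
        int rc = link("/var/lib/cassandra/data/ks/cf/ks-cf-hd-1-Data.db",
                      "/var/lib/cassandra/data/ks/cf/snapshots/20120711/ks-cf-hd-1-Data.db");
        if (rc != 0)
            throw new RuntimeException("link() failed, errno=" + Native.getLastError());
    }
}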


 Adeel Akbar adeel.ak...@panasiangroup.com 
 11/07/2012 11:38
 
 
 Hi,
 
 I am trying to taking snapshot of my data but faced following error.
 Please help me to resolve this issue.
 
 [root@cassandra1 bin]# ./nodetool -h localhost snapshot 20120711
 Exception in thread main java.io.IOError: java.io.IOException: 
 Cannot run program ln: java.io.IOException: error=12, Cannot allocate 
memory


Reduced key-cache due to memory pressure and cache size estimate

2012-07-11 Thread Omid Aladini
Hi,

I'm trying to tune memtable size, key cache size and heap size on
Cassandra 1.1.0, but I keep having memory pressure and reduced cache sizes.
With the following settings:

heap size: 10GB (had the same issue with 8GB so I'm testing with increased
heap size)
memtable_total_space_in_mb: 2GB
key_cache_size_in_mb: 2GB   (global key cache capacity)

Still, heap usage hits flush_largest_memtables_at (= 0.75) many times in a
short period of time before hitting reduce_cache_sizes_at (= 0.85) that
reduces the cache size and resolves memory pressure.

In one instance, cache size is reported to be 1450MB before reduction and
~870MB after reduction, but the gain in heap space due to reduction in
cache size is about 3GB.

Could it be that the cache size estimate in megabytes isn't accurate?

Thanks,
Omid


rounded timestamp ?

2012-07-11 Thread Marco Matarazzo
Greetings.

Running (CQL 3) queries like:

   update users set admin = 1 where corporation_id = 
'7a55bc4c-84e7-479c-9ac6-43f7836705b5';

… I see in logs a row like:

StorageProxy.java (line 175) Mutations/ConsistencyLevel are 
[RowMutation(keyspace='goh_test', 
key='37613535626334632d383465372d343739632d396163362d343366373833363730356235', 
modifications=[ColumnFamily(users [admin:false:1@1342006844093000,])])]/ONE

If I understand it correctly, that 1342006844093000 is the timestamp in 
microseconds, getting rounded to milliseconds.

If I modify queries in this way:

   update users using timestamp 1342006844106123 set admin = 1 where 
corporation_id = '7a55bc4c-84e7-479c-9ac6-43f7836705b5';

… the log row becomes:

StorageProxy.java (line 175) Mutations/ConsistencyLevel are 
[RowMutation(keyspace='goh_test', 
key='37613535626334632d383465372d343739632d396163362d343366373833363730356235', 
modifications=[ColumnFamily(users [admin:false:1@1342006844106123,])])]/ONE

…and what I see is that the timestamp gets through NOT rounded, with 
microsecond precision.


We see this behavior using cqlsh, C++ thrift bindings and phpcassa. I guess 
they all use thrift, and so the rounding happens there.

One of the problems is that sometimes it gets rounded up, so it's in the 
future. But that's just a side effect of rounding, and I can't understand why 
there is rounding in the first place. I guess that the second case is just 
Cassandra correcting the timestamp with data found in the CQL, and maybe 
thrift is still sending a milliseconds-rounded timestamp, but I still can't see 
a reason for thrift doing this.

Could someone enlighten me a bit on this matter ?

--
Marco Matarazzo
== Hex Keep ==

“You can learn more about a man
  in one hour of play
  than in one year of conversation.” - Plato






RE: help using org.apache.cassandra.cql3

2012-07-11 Thread Leonid Ilyevsky
I see.
The reason I looked at that package was that I need to use the batch feature, and I 
could not make it work using Thrift with a CF having a composite key. It worked 
fine with a simple key, but not with a composite one; I was getting an error while 
trying to do the update.
Sylvain suggested (in reply to my other posting) that I use a CQL3 batch 
statement, but I am not sure how to do it efficiently from Java. Can a batch 
statement be prepared? Is it OK to put 1 of update statements in one batch, 
with 5 question marks in it? Then set that many variables?
Maybe I can try small example first, just to see if it works at all.


From: Derek Williams [mailto:de...@fyrie.net]
Sent: Tuesday, July 10, 2012 7:19 PM
To: user@cassandra.apache.org
Subject: Re: help using org.apache.cassandra.cql3

On Tue, Jul 10, 2012 at 3:04 PM, Leonid Ilyevsky 
lilyev...@mooncapital.commailto:lilyev...@mooncapital.com wrote:
I am trying to use the org.apache.cassandra.cql3 package. Having problem 
connecting to the server using ClientState.
I was not sure what to put in the credentials map (I did not set any 
users/passwords on my server), so I tried setting empty strings for “username” 
and “password”, setting them to bogus values, passing null to the login method 
– there was no difference.
It does not complain at the login(), but then it complains about 
setKeyspace(my keyspace), saying that the specified keyspace does not exist 
(it obviously does exist).
The configuration was loaded from cassandra.yaml used by the server.

I did not have any problem like this when I used 
org.apache.cassandra.thrift.Cassandra.Client .

What am I doing wrong?

I think that package just contains server classes. Everything you need should 
be in org.apache.cassandra.thrift.

To use cql3 I just use the client methods 'execute_cql_query', 
'prepare_cql_query' and 'execute_prepared_cql_query', after setting cql version 
to '3.0.0'.


--
Derek Williams



This email, along with any attachments, is confidential and may be legally 
privileged or otherwise protected from disclosure. Any unauthorized 
dissemination, copying or use of the contents of this email is strictly 
prohibited and may be in violation of law. If you are not the intended 
recipient, any disclosure, copying, forwarding or distribution of this email is 
strictly prohibited and this email and any attachments should be deleted 
immediately. This email and any attachments do not constitute an offer to sell 
or a solicitation of an offer to purchase any interest in any investment 
vehicle sponsored by Moon Capital Management LP (“Moon Capital”). Moon Capital 
does not provide legal, accounting or tax advice. Any statement regarding 
legal, accounting or tax matters was not intended or written to be relied upon 
by any person as advice. Moon Capital does not waive confidentiality or 
privilege as a result of this email.


Re: rounded timestamp ?

2012-07-11 Thread Sylvain Lebresne
There is no rounding or correction whatsoever. It just happens that if
you don't give a timestamp in CQL, the timestamp is generated server
side using Java's System.currentTimeMillis(), which only provides
millisecond precision. If you provide your own timestamp however we
use it without doing anything to it.
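
A small illustration of the two cases (the exact server-side call is an
assumption based on the description above, but it matches the trailing zeros in
the logged value):

// No timestamp in the query: generated server side from System.currentTimeMillis(),
// expressed in microseconds, so the last three digits are always 000
// (e.g. 1342006844093000).
long serverDefaultMicros = System.currentTimeMillis() * 1000L;

// "UPDATE ... USING TIMESTAMP 1342006844106123 ...": the client-supplied value is
// stored exactly as given, with full microsecond precision preserved.
long clientSuppliedMicros = 1342006844106123L;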

--
Sylvain

On Wed, Jul 11, 2012 at 1:56 PM, Marco Matarazzo
marco.matara...@hexkeep.com wrote:
 Greetings.

 Running (CQL 3) queries like:

update users set admin = 1 where corporation_id = 
 '7a55bc4c-84e7-479c-9ac6-43f7836705b5';

 … I see in logs a row like:

 StorageProxy.java (line 175) Mutations/ConsistencyLevel are 
 [RowMutation(keyspace='goh_test', 
 key='37613535626334632d383465372d343739632d396163362d343366373833363730356235',
  modifications=[ColumnFamily(users [admin:false:1@1342006844093000,])])]/ONE

 If I understand it correctly, that 1342006844093000 is the timestamp in 
 microseconds, getting rounded to milliseconds.

 If I modify queries in this way:

update users using timestamp 1342006844106123 set admin = 1 where 
 corporation_id = '7a55bc4c-84e7-479c-9ac6-43f7836705b5';

 … the log row becomes:

 StorageProxy.java (line 175) Mutations/ConsistencyLevel are 
 [RowMutation(keyspace='goh_test', 
 key='37613535626334632d383465372d343739632d396163362d343366373833363730356235',
  modifications=[ColumnFamily(users [admin:false:1@1342006844106123,])])]/ONE

 …and what I see is that the timestamp get through NOT rounded, with 
 microseconds precision.


 We see this behavior using cqlsh, C++ thrift bindings and phpcassa. I guess 
 they all use thrift, and so the rounding happens there.

 One of the problems is that sometimes it gets rounded up, so it's in the 
 future. But that's just a side effect of rounding, and I can't understand why 
 in the first place there is a rounding. I guess that the second case is just 
 Cassandra correcting the timestamp with data found in the CQL, and maybe 
 thrift is still sending a milliseconds-rounded timestamp, but I still can't 
 see a reason for thrift doing this.

 Could someone enlighten me a bit on this matter ?

 --
 Marco Matarazzo
 == Hex Keep ==

 You can learn more about a man
   in one hour of play
   than in one year of conversation.” - Plato






Re: help using org.apache.cassandra.cql3

2012-07-11 Thread Sylvain Lebresne
When I said to use the BATCH statement I meant using a query that is
a BATCH statement, so something like:
  BEGIN BATCH
INSERT ...;
INSERT ...;
...
  APPLY BATCH;

If you want to do that from Java, you will want to look at the JDBC
driver (http://code.google.com/a/apache-extras.org/p/cassandra-jdbc/),
though I don't know what the status of CQL3 support is.

On Wed, Jul 11, 2012 at 2:18 PM, Leonid Ilyevsky
lilyev...@mooncapital.com wrote:
 Is it OK to put 1 of update statements in one
 batch, with 5 question marks in it? The set that many variables?

Yes, a batch statement can be prepared, and in theory there isn't much of a
limit on the number of update statements (or question marks) you can
put in one batch. However, C* works best if you do
reasonably sized batches. That's even more true for CQL, in the sense
that with a huge batch statement you'll pay for the parsing. So you
probably want to prepare one batch statement with a reasonable number
of statements in it (you'll have to test to find the number that gives the
best performance, but I would typically start with, say, 50-100 and see
if the performance is good enough) and reuse that to insert the data.

The other reason why breaking the insert into smallish batches is a
good idea is that it allows you to parallelize the insert using
multiple threads. And you need to parallelize if you want to get the
best out of C*.

--
Sylvain
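
A minimal Java sketch of the prepared-batch route over Thrift, combining Derek's
earlier pointers (set_cql_version / prepare_cql_query / execute_prepared_cql_query)
with the advice above. The keyspace, table and column names are hypothetical, and
the bound columns are assumed to be text (values must be serialized according to
the actual column types):

import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.Compression;
import org.apache.cassandra.thrift.CqlPreparedResult;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TFramedTransport;
import org.apache.thrift.transport.TSocket;

public class PreparedBatchSketch {
    public static void main(String[] args) throws Exception {
        TFramedTransport transport = new TFramedTransport(new TSocket("localhost", 9160));
        Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
        transport.open();
        client.set_cql_version("3.0.0");
        client.set_keyspace("my_ks");

        // A reusable batch; only 2 statements here, but ~50-100 is the suggested starting point.
        String batch = "BEGIN BATCH "
                     + "INSERT INTO events (id, payload) VALUES (?, ?); "
                     + "INSERT INTO events (id, payload) VALUES (?, ?); "
                     + "APPLY BATCH;";
        CqlPreparedResult prepared =
                client.prepare_cql_query(ByteBuffer.wrap(batch.getBytes("UTF-8")), Compression.NONE);

        // One ByteBuffer per question mark, in order.
        List<ByteBuffer> values = new ArrayList<ByteBuffer>();
        for (String v : new String[] { "row1", "payload1", "row2", "payload2" })
            values.add(ByteBuffer.wrap(v.getBytes("UTF-8")));

        client.execute_prepared_cql_query(prepared.getItemId(), values);
        transport.close();
    }
}

Splitting the data over several such batches also makes it easy to run the
inserts from multiple threads, as suggested above.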


 Maybe I can try small example first, just to see if it works at all.





 From: Derek Williams [mailto:de...@fyrie.net]
 Sent: Tuesday, July 10, 2012 7:19 PM
 To: user@cassandra.apache.org
 Subject: Re: help using org.apache.cassandra.cql3



 On Tue, Jul 10, 2012 at 3:04 PM, Leonid Ilyevsky lilyev...@mooncapital.com
 wrote:

 I am trying to use the org.apache.cassandra.cql3 package. Having problem
 connecting to the server using ClientState.

 I was not sure what to put in the credentials map (I did not set any
 users/passwords on my server), so I tried setting empty strings for
 “username” and “password”, setting them to bogus values, passing null to the
 login method – there was no difference.

 It does not complain at the login(), but then it complains about
 setKeyspace(my keyspace), saying that the specified keyspace does not
 exist (it obviously does exist).

 The configuration was loaded from cassandra.yaml used by the server.



 I did not have any problem like this when I used
 org.apache.cassandra.thrift.Cassandra.Client .



 What am I doing wrong?



 I think that package just contains server classes. Everything you need
 should be in org.apache.cassandra.thrift.



 To use cql3 I just use the client methods 'execute_cql_query',
 'prepare_cql_query' and 'execute_prepared_cql_query', after setting cql
 version to '3.0.0'.





 --

 Derek Williams




 
 This email, along with any attachments, is confidential and may be legally
 privileged or otherwise protected from disclosure. Any unauthorized
 dissemination, copying or use of the contents of this email is strictly
 prohibited and may be in violation of law. If you are not the intended
 recipient, any disclosure, copying, forwarding or distribution of this email
 is strictly prohibited and this email and any attachments should be deleted
 immediately. This email and any attachments do not constitute an offer to
 sell or a solicitation of an offer to purchase any interest in any
 investment vehicle sponsored by Moon Capital Management LP (“Moon Capital”).
 Moon Capital does not provide legal, accounting or tax advice. Any statement
 regarding legal, accounting or tax matters was not intended or written to be
 relied upon by any person as advice. Moon Capital does not waive
 confidentiality or privilege as a result of this email.


RE: help using org.apache.cassandra.cql3

2012-07-11 Thread Leonid Ilyevsky
Thanks Sylvain, I actually tried the prepared batch and it works fine. I did 1000 
rows in one batch, 20 columns each, and it was good. Then I tried 1, and it 
still works; I am going to measure which way is faster overall.

-Original Message-
From: Sylvain Lebresne [mailto:sylv...@datastax.com]
Sent: Wednesday, July 11, 2012 9:32 AM
To: user@cassandra.apache.org
Subject: Re: help using org.apache.cassandra.cql3

When I said to use the BATCH statement I mean't using a query that is
a BATCH statement, so something like:
  BEGIN BATCH
INSERT ...;
INSERT ...;
...
  APPLY BATCH;

If you want to that from java, you will want to look at the jdbc
driver (http://code.google.com/a/apache-extras.org/p/cassandra-jdbc/),
though I don't know what is the status of the support for CQL3.

On Wed, Jul 11, 2012 at 2:18 PM, Leonid Ilyevsky
lilyev...@mooncapital.com wrote:
 Is it OK to put 1 of update statements in one
 batch, with 5 question marks in it? The set that many variables?

Yes batch statement can be prepared and in theory there isn't much
limit on the number of update statement (nor question marks) you can
put in one batch. However, the way C* work best is if you do
reasonably sized batches. It's even more true for CQL in the sense
that by using a huge batch statement you'll pay the parsing. So you
probably want to prepare one batch statement with a reasonable number
of statement in it (you'll have to test to find number that give the
best performances, but I would typically start with say 50-100 and see
if the performance are good enough) and reuse that to insert the data.

The other reason why breaking the insert into smallish batches is a
good idea is that it allows you to parallelize the insert using
multiple threads. And you need to parallelize if you want to get the
best out of C*.

--
Sylvain


 Maybe I can try small example first, just to see if it works at all.





 From: Derek Williams [mailto:de...@fyrie.net]
 Sent: Tuesday, July 10, 2012 7:19 PM
 To: user@cassandra.apache.org
 Subject: Re: help using org.apache.cassandra.cql3



 On Tue, Jul 10, 2012 at 3:04 PM, Leonid Ilyevsky lilyev...@mooncapital.com
 wrote:

 I am trying to use the org.apache.cassandra.cql3 package. Having problem
 connecting to the server using ClientState.

 I was not sure what to put in the credentials map (I did not set any
 users/passwords on my server), so I tried setting empty strings for
 username and password, setting them to bogus values, passing null to the
 login method - there was no difference.

 It does not complain at the login(), but then it complains about
 setKeyspace(my keyspace), saying that the specified keyspace does not
 exist (it obviously does exist).

 The configuration was loaded from cassandra.yaml used by the server.



 I did not have any problem like this when I used
 org.apache.cassandra.thrift.Cassandra.Client .



 What am I doing wrong?



 I think that package just contains server classes. Everything you need
 should be in org.apache.cassandra.thrift.



 To use cql3 I just use the client methods 'execute_cql_query',
 'prepare_cql_query' and 'execute_prepared_cql_query', after setting cql
 version to '3.0.0'.





 --

 Derek Williams




 
 This email, along with any attachments, is confidential and may be legally
 privileged or otherwise protected from disclosure. Any unauthorized
 dissemination, copying or use of the contents of this email is strictly
 prohibited and may be in violation of law. If you are not the intended
 recipient, any disclosure, copying, forwarding or distribution of this email
 is strictly prohibited and this email and any attachments should be deleted
 immediately. This email and any attachments do not constitute an offer to
 sell or a solicitation of an offer to purchase any interest in any
 investment vehicle sponsored by Moon Capital Management LP (Moon Capital).
 Moon Capital does not provide legal, accounting or tax advice. Any statement
 regarding legal, accounting or tax matters was not intended or written to be
 relied upon by any person as advice. Moon Capital does not waive
 confidentiality or privilege as a result of this email.


Re: Zurich / Swiss / Alps meetup

2012-07-11 Thread Benoit Perroud
Coming back on this thread, we are proud to announce we opened a Swiss
BigData UserGroup.

http://www.bigdata-usergroup.ch/

Next meetup is July 16, with the topic "NoSQL Storage: War Stories and
Best Practices".

Hope to meet you there !

Benoit.


2012/5/17 Sasha Dolgy sdo...@gmail.com:
 All,

 A year ago I made a simple query to see if there were any users based in and
 around Zurich, Switzerland or the Alps region, interested in participating
 in some form of Cassandra User Group / Meetup.  At the time, 1-2 replies
 happened.  I didn't do much with that.

 Let's try this again.  Who all is interested?  I often am jealous about all
 the fun I miss out on with the regular meetups that happen stateside ...

 Regards,
 -sd

 --
 Sasha Dolgy
 sasha.do...@gmail.com


Connected file list in Cassandra

2012-07-11 Thread Tomek Hankus
Hi,
at the moment I'm doing research about keeping a linked/connected file list
in Cassandra - e.g. a PDF file cut into pages (multiple PDFs) where the first page
is connected to the second, the second to the third, etc.
The way the files are connected/linked is not specified. The main goal is to be
able to get all linked files (the whole PDF / all pages) while having only the key
to the first file (page).

Is there any Cassandra tool/feature which could help me do that, or is the only
way to create some wrapper holding the key relations?


Tom H


Re: Connected file list in Cassandra

2012-07-11 Thread David McNelis
I would use something other than the page itself as the key.  Maybe a
filename, something smaller.

Then you could use a LongType comparator for the columns and use the page
number for the column name, the value being the contents of the files.

On Wed, Jul 11, 2012 at 1:34 PM, Tomek Hankus tom...@gmail.com wrote:

 Hi,
 at the moment I'm doing research about keeping linked/connected file
 list in Cassandra- e.g. PDF file cut into pages (multiple PDFs) where
 first page is connected to second, second to third etc.
 This files connection/link is not specified. Main goal is to be able to
 get all linked files (the whole PDF/ all pages) while having only key to
 first file (page).

 Is there any Cassandra tool/feature which could help me to do that or the
 only way is to create some wrapper holding keys relations?


 Tom H





Re: Connected file list in Cassandra

2012-07-11 Thread David Brosius
 why not just hold the pages as different columns in the same row? columns are 
automatically sorted such that if the column name was associated with the page 
number it would automatically flow the way you wanted.  - Original Message 
-From: quot;Tomek Hankusquot; ;tom...@gmail.com 

Why is our range query failing in Cassandra 0.8.10 Client

2012-07-11 Thread JohnB
Hi: 

We are currently using Cassandra 0.8.10 and have run into some strange issues 
surrounding querying for a range of data.


I ran a couple of get statements via the Cassandra client and found some 
interesting results:


Consider the following Column Family Definition:

ColumnFamily: events
  Key Validation Class: org.apache.cassandra.db.marshal.BytesType
  Default column value validator: org.apache.cassandra.db.marshal.BytesType
  Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
  Row cache size / save period in seconds: 0.0/0
  Row Cache Provider: 
org.apache.cassandra.cache.ConcurrentLinkedHashCacheProvider
  Key cache size / save period in seconds: 20.0/14400
  Memtable thresholds: 0.2953125/1440/63 (millions of ops/minutes/MB)
  GC grace seconds: 864000
  Compaction min/max thresholds: 4/32
  Read repair chance: 1.0
  Replicate on write: true
  Built indexes: [events.events_Firm_idx, events.events_OrdType_idx, 
events.events_OrderID_idx
 , events.events_OrderQty_idx, events.events_Price_idx, 
events.events_Symbol_idx, events.events_ds_timestamp_idx]
  Column Metadata:
Column Name: Firm
  Validation Class: org.apache.cassandra.db.marshal.BytesType
  Index Name: events_Firm_idx
  Index Type: KEYS
Column Name: OrdType
  Validation Class: org.apache.cassandra.db.marshal.BytesType
  Index Name: events_OrdType_idx
  Index Type: KEYS
Column Name: OrderID
  Validation Class: org.apache.cassandra.db.marshal.BytesType
  Index Name: events_OrderID_idx
  Index Type: KEYS
Column Name: OrderQty
  Validation Class: org.apache.cassandra.db.marshal.LongType
  Index Name: events_OrderQty_idx
  Index Type: KEYS
Column Name: Price
  Validation Class: org.apache.cassandra.db.marshal.LongType
  Index Name: events_Price_idx
  Index Type: KEYS
Column Name: Symbol
  Validation Class: org.apache.cassandra.db.marshal.BytesType
  Index Name: events_Symbol_idx
Column Name: ds_timestamp
  Validation Class: org.apache.cassandra.db.marshal.LongType
  Index Name: events_ds_timestamp_idx
  Index Type: KEYS
 

get events WHERE Firm=434550 AND ds_timestamp=1341955958200;


…and the results are pretty much instantaneous.

1 Row Returned.

[default@FIX] get events WHERE Firm=434550 AND ds_timestamp=1341955958200;

---

RowKey: 64326430363362302d636164362d313165312d626637622d333836303737306639303133
= (column=ClOrdID, value=32323833, timestamp=1341955980651010)
= (column=Firm, value=434550, timestamp=1341955980651026)
= (column=OrdType, value=31, timestamp=1341955980651008)
= (column=OrderQty, value=8200, timestamp=1341955980651013)
= (column=Price, value=433561, timestamp=1341955980651019)
= (column=Symbol, value=544e54, timestamp=1341955980651018)
= (column=ds_timestamp, value=1341955958200, timestamp=1341955980651020)


If I run the following query:

get events WHERE Firm=434550 AND ds_timestamp>=1341955958200 AND 
ds_timestamp<=1341955958200;

(which in theory should return the same 1 row result)

 

It runs for around 12 seconds,

 

And I get:

TimedOutException()

 


If I run:

get events WHERE Firm=434550 AND ds_timestamp>=1341955958200; 

or

get events WHERE Firm=434550 AND ds_timestamp<=1341955958200;

The results return quickly.



Curious, I also ran a similar set of queries against the price field:

get events WHERE Firm=434550 AND Price=433561;
get events WHERE Firm=434550 AND Price>=433561;
get events WHERE Firm=434550 AND Price<=433561;

These all work fine.


While, 

get events WHERE Firm=434550 AND Price>=433561 AND Price <= 433561;

returns an IO Exception.



This feels like it’s attempting to do a full table scan here….

What is going on here?

Am I doing something incorrectly?

We also see similar behavior when submit the query through our app via the 
Thrift API.

 
Thanks,
JohnB



Re: Java heap space on Cassandra start up version 1.0.10

2012-07-11 Thread Jason Hill
Thanks Jonathan that did the trick. I deleted the Statistics.db files
for the offending column family and was able to get Cassandra to
start.

Thank you,
Jason


RE: How to come up with a predefined topology

2012-07-11 Thread Richard Lowe
Using PropertyFileSnitch you can fine tune the topology of the cluster. 

What you tell Cassandra about your DC and rack doesn't have to match how 
they are in real life. You can create virtual DCs for Cassandra and even treat 
each node as a separate rack.

For example, in cassandra-topology.properties:

# Format is Node IP=DC Name:Rack Name
192.168.0.11=DC1_realtime:node_1
192.168.0.12=DC1_realtime:node_2
192.168.0.13=DC1_analytics:node_3
192.168.1.11=DC2_realtime:node_1

If you then specify the parameters for the keyspace to use these, you can 
control exactly which set of nodes replicas end up on. 

For example, in cassandra-cli:

create keyspace ks1 with placement_strategy = 
'org.apache.cassandra.locator.NetworkTopologyStrategy' and strategy_options = { 
DC1_realtime: 2, DC1_analytics: 1, DC2_realtime: 1 };
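
Applied to the layout asked about in the original message (three replicas in one 
DC split across two racks, plus one replica in a second DC), a hypothetical 
configuration might look like this (addresses and names made up):

# Format is Node IP=DC Name:Rack Name
192.168.0.21=DC1:RAC1
192.168.0.22=DC1:RAC1
192.168.0.23=DC1:RAC2
192.168.1.21=DC2:RAC1

create keyspace ks2 with placement_strategy = 
'org.apache.cassandra.locator.NetworkTopologyStrategy' and strategy_options = { 
DC1: 3, DC2: 1 };

As I understand it, NetworkTopologyStrategy tries to put replicas within a DC on 
distinct racks first, so with only two racks in DC1 the three replicas end up 
split 2+1 across them.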

As far as I know there isn't any way to use the rack name in the 
strategy_options for a keyspace. You might want to look at the code to dig into 
that, perhaps.

Whichever snitch you use, the nodes are sorted in order of proximity to the 
client node. How this is determined depends on the snitch that's used but most 
(the ones that ship with Cassandra) will use the default ordering of same-node 
> same-rack > same-datacenter > different-datacenter. Each snitch has methods 
to tell Cassandra which rack and DC a node is in, so it always knows which node 
is closest. Used with the Bloom filters this can tell us where the nearest 
replica is.



-Original Message-
From: prasenjit mukherjee [mailto:prasen@gmail.com] 
Sent: 11 July 2012 06:33
To: user
Subject: How to come up with a predefined topology

Quoting from 
http://www.datastax.com/docs/0.8/cluster_architecture/replication#networktopologystrategy
:

Asymmetrical replication groupings are also possible depending on your use 
case. For example, you may want to have three replicas per data center to serve 
real-time application requests, and then have a single replica in a separate 
data center designated to running analytics.

Have 2 questions :
1. Any example of how to configure a topology with 3 replicas in one DC (with 2 
in 1 rack + 1 in another rack) and one replica in another DC?
 The default NetworkTopologyStrategy with RackInferringSnitch will only give me 
equal distribution (2+2).

2. I am assuming the reads can go to any of the replicas. Is there a client 
which will send a query to a node (in the Cassandra ring) which is closest to the 
client?

-Thanks,
Prasenjit




Re: is this something to be concerned about - MUTATION message dropped

2012-07-11 Thread Frank Hsueh
out of curiosity, is there a way that Cassandra can communicate that it's
close to being overloaded?


On Sun, Jun 17, 2012 at 6:29 PM, aaron morton aa...@thelastpickle.comwrote:

 http://wiki.apache.org/cassandra/FAQ#dropped_messages

 https://www.google.com/#q=cassandra+dropped+messages

 Cheers


 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 15/06/2012, at 12:54 AM, Poziombka, Wade L wrote:

 INFO [ScheduledTasks:1] 2012-06-14 07:49:54,355 MessagingService.java
 (line 615) 15 MUTATION message dropped in last 5000ms
 It is at INFO level so I'm inclined to think not, but it seems like
 whenever messages are dropped there may be some issue?





-- 
Frank Hsueh | frank.hs...@gmail.com


Concerns about Cassandra upgrade from 1.0.6 to 1.1.X

2012-07-11 Thread Roshan
Hello

Currently we are using Cassandra 1.0.6 in our production system but suffer
from CASSANDRA-3616 (it is already fixed in the 1.0.7 version).

We are thinking of upgrading Cassandra to the 1.1.X versions to get its new
features, but have some concerns about the upgrade; expert advice is most welcome.

1. Can Cassandra 1.1.X identify 1.0.X configurations like SSTables, commit
logs, etc. without any issue? And vice versa? Because if something happens to
1.1.X after it is deployed to production, we want to downgrade to the 1.0.6 version
(because that's the version we tested with our applications). 

2. How do we need to do the upgrade process?  Currently we have a 3 node 1.0.6
cluster in production. Can we upgrade node by node? If we upgrade node by
node, will the other 1.0.6 nodes identify 1.1.X nodes without any issue?

Appreciate experts comments on this. Many Thanks.

/Roshan 

--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Concerns-about-Cassandra-upgrade-from-1-0-6-to-1-1-X-tp7581197.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: How to come up with a predefined topology

2012-07-11 Thread prasenjit mukherjee
 As far as I know there isn't any way to use the rack name in the 
 strategy_options for a keyspace. You
 might want to look at the code to dig into that, perhaps.

Aha, I was wondering if I could do that as well ( specify rack options ) :)

Thanks for the pointer, I will dig into the code.

-Thanks,
Prasenjit

On Thu, Jul 12, 2012 at 5:33 AM, Richard Lowe richard.l...@arkivum.com wrote:
 If you then specify the parameters for the keyspace to use these, you can 
 control exactly which set of nodes replicas end up on.

 For example, in cassandra-cli:

 create keyspace ks1 with placement_strategy = 
 'org.apache.cassandra.locator.NetworkTopologyStrategy' and strategy_options = 
 { DC1_realtime: 2, DC1_analytics: 1, DC2_realtime: 1 };

 As far as I know there isn't any way to use the rack name in the 
 strategy_options for a keyspace. You might want to look at the code to dig 
 into that, perhaps.

 Whichever snitch you use, the nodes are sorted in order of proximity to the 
 client node. How this is determined depends on the snitch that's used but 
 most (the ones that ship with Cassandra) will use the default ordering of 
 same-node  same-rack  same-datacenter  different-datacenter. Each snitch 
 has methods to tell Cassandra which rack and DC a node is in, so it always 
 knows which node is closest. Used with the Bloom filters this can tell us 
 where the nearest replica is.



 -Original Message-
 From: prasenjit mukherjee [mailto:prasen@gmail.com]
 Sent: 11 July 2012 06:33
 To: user
 Subject: How to come up with a predefined topology

 Quoting from 
 http://www.datastax.com/docs/0.8/cluster_architecture/replication#networktopologystrategy
 :

 Asymmetrical replication groupings are also possible depending on your use 
 case. For example, you may want to have three replicas per data center to 
 serve real-time application requests, and then have a single replica in a 
 separate data center designated to running analytics.

 Have 2 questions :
 1. Any example how to configure a topology with 3 replicas in one DC ( with 2 
 in 1 rack + 1 in another rack ) and one replica in another DC ?
  The default networktopologystrategy with rackinferringsnitch will only give 
 me equal distribution ( 2+2 )

 2. I am assuming the reads can go to any of the replicas. Is there a client 
 which will send query to a node ( in cassandra ring ) which is closest to the 
 client ?

 -Thanks,
 Prasenjit




Re: Using a node in separate cluster without decommissioning.

2012-07-11 Thread aaron morton
 Since replication factor is 2 in first cluster, I
 won't lose any data.
Assuming you have been running repair or working at CL QUORUM (which is the 
same as CL ALL for RF 2)

 Is it advisable and safe to go ahead?
um, so the plan is to turn off 2 nodes in the first cluster, retask them into 
the new cluster and then reverse the process?

If you simply turn two nodes off in the first cluster you will reduce the 
availability for a portion of the ring. 25% of the keys will now have at best 1 
node they can be stored on. If a node is having any sort of problems, and it is 
a replica for one of the down nodes, the cluster will appear down for 12.5% 
of the keyspace.

If you work at QUORUM you will not have enough nodes available to write / read 
25% of the keys. 

If you decommission the nodes, you will still have 2 replicas available for each 
key range. This is the path I would recommend.

If you _really_ need to do it what you suggest will probably work. Some tips:

* do safe shutdowns - nodetool disablegossip, disablethrift, drain (see the 
command sketch after this list)
* don't forget to copy the yaml file. 
* in the first cluster the other nodes will collect hints for the first hour 
the nodes are down. You are not going to want these, so disable HH. 
* get the nodes back into the first cluster before gc_grace_seconds expires. 
* bring them back and repair them.
* when you bring them back, reading at CL ONE will give inconsistent results. 
Reading at QUORUM may result in a lot of repair activity.
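
A rough sketch of that shutdown sequence, using the 1.x nodetool subcommands 
named in the first tip (host is illustrative):

  ./nodetool -h localhost disablegossip
  ./nodetool -h localhost disablethrift
  ./nodetool -h localhost drain
  # then stop the Cassandra process and copy off the data, commitlog and yaml files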

Hope that helps. 
 
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 11/07/2012, at 6:35 AM, rohit bhatia wrote:

 Hi
 
 I want to take out 2 nodes from a 8 node cluster and use in another
 cluster, but can't afford the overhead of streaming the data and
 rebalance cluster. Since replication factor is 2 in first cluster, I
 won't lose any data.
 
 I'm planning to save my commit_log and data directories and
 bootstrapping the node in the second cluster. Afterwards I'll just
 replace both the directories and join the node back to the original
 cluster.  This should work since cassandra saves all the cluster and
 schema info in the system keyspace.
 
 Is it advisable and safe to go ahead?
 
 Thanks
 Rohit



Re: failed to delete commitlog, cassandra can't accept writes

2012-07-11 Thread aaron morton
I don't think it's related to 4337. 

There is an explicit close call just before the deletion attempt. 

Can you create a ticket on https://issues.apache.org/jira/browse/CASSANDRA with 
all of the information you've got here (including the full JVM vendor, version, 
build). Can you also check if the file it tries to delete exists ? (I assume it 
does, otherwise it would be a different error). 

Thanks for digging into this. 

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 11/07/2012, at 9:36 AM, Frank Hsueh wrote:

 oops; I missed log line:
 
 
 ERROR [COMMIT-LOG-ALLOCATOR] 2012-07-10 14:19:39,776 
 AbstractCassandraDaemon.java (line 134) Exception in thread 
 Thread[COMMIT-LOG-ALLOCATOR,5,main]
 java.io.IOError: java.io.IOException: Failed to delete 
 C:\var\lib\cassandra\commitlog\CommitLog-948695923996466.log
   at 
 org.apache.cassandra.db.commitlog.CommitLogSegment.discard(CommitLogSegment.java:176)
   at 
 org.apache.cassandra.db.commitlog.CommitLogAllocator$4.run(CommitLogAllocator.java:223)
   at 
 org.apache.cassandra.db.commitlog.CommitLogAllocator$1.runMayThrow(CommitLogAllocator.java:95)
   at 
 org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
   at java.lang.Thread.run(Thread.java:662)
 Caused by: java.io.IOException: Failed to delete 
 C:\var\lib\cassandra\commitlog\CommitLog-948695923996466.log
   at 
 org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:54)
   at 
 org.apache.cassandra.db.commitlog.CommitLogSegment.discard(CommitLogSegment.java:172)
   ... 4 more
 
 
 
 On Tue, Jul 10, 2012 at 2:35 PM, Frank Hsueh frank.hs...@gmail.com wrote:
 after reading the JIRA, I decided to use Java 6.
 
 with Cassandra 1.1.2 on Java 6 x64 on Win7 sp1 x64 (all latest versions), 
 after several minutes of sustained writes, I see:
 
 from system.log:
 
 java.io.IOError: java.io.IOException: Failed to delete 
 C:\var\lib\cassandra\commitlog\CommitLog-948695923996466.log
   at 
 org.apache.cassandra.db.commitlog.CommitLogSegment.discard(CommitLogSegment.java:176)
   at 
 org.apache.cassandra.db.commitlog.CommitLogAllocator$4.run(CommitLogAllocator.java:223)
   at 
 org.apache.cassandra.db.commitlog.CommitLogAllocator$1.runMayThrow(CommitLogAllocator.java:95)
   at 
 org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
   at java.lang.Thread.run(Thread.java:662)
 Caused by: java.io.IOException: Failed to delete 
 C:\var\lib\cassandra\commitlog\CommitLog-948695923996466.log
   at 
 org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:54)
   at 
 org.apache.cassandra.db.commitlog.CommitLogSegment.discard(CommitLogSegment.java:172)
   ... 4 more
 
 
 anybody seen this before?  is this related to 4337 ?
 
 
 
 
 On Sat, Jul 7, 2012 at 6:36 PM, Frank Hsueh frank.hs...@gmail.com wrote:
 bug already reported:
 
 https://issues.apache.org/jira/browse/CASSANDRA-4337
 
 
 
 On Sat, Jul 7, 2012 at 6:26 PM, Frank Hsueh frank.hs...@gmail.com wrote:
 Hi,
 
 I'm running Cassandra 1.1.2 on Java 7 x64 on Win7 sp1 x64 (all latest 
 versions).  If it matters, I'm using a recent version of Astyanax as my client.
 
 I'm using 4 threads to write a lot of data into a single CF.
 
 After several minutes of load (~ 30m at last incident), Cassandra stops 
 accepting writes (client reports an OperationTimeoutException).  I looked at 
 the logs and I see on the Cassandra server:
 
 
 ERROR 18:00:42,807 Exception in thread Thread[COMMIT-LOG-ALLOCATOR,5,main]
 java.io.IOError: java.io.IOException: Rename from 
 \var\lib\cassandra\commitlog\CommitLog-701533048437587.log to 703272597990002 
 failed
 at 
 org.apache.cassandra.db.commitlog.CommitLogSegment.init(CommitLogSegment.java:127)
 at 
 org.apache.cassandra.db.commitlog.CommitLogSegment.recycle(CommitLogSegment.java:204)
 at 
 org.apache.cassandra.db.commitlog.CommitLogAllocator$2.run(CommitLogAllocator.java:166)
 at 
 org.apache.cassandra.db.commitlog.CommitLogAllocator$1.runMayThrow(CommitLogAllocator.java:95)
 at 
 org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
 at java.lang.Thread.run(Thread.java:722)
 Caused by: java.io.IOException: Rename from 
 \var\lib\cassandra\commitlog\CommitLog-701533048437587.log to 703272597990002 
 failed
 at 
 org.apache.cassandra.db.commitlog.CommitLogSegment.init(CommitLogSegment.java:105)
 ... 5 more
 
 
 Anybody else seen this before ?
 
 
 -- 
 Frank Hsueh | frank.hs...@gmail.com
 
 
 
 -- 
 Frank Hsueh | frank.hs...@gmail.com
 
 
 
 -- 
 Frank Hsueh | frank.hs...@gmail.com
 
 
 
 -- 
 Frank Hsueh | frank.hs...@gmail.com



Re: snapshot issue

2012-07-11 Thread aaron morton
Make sure JNA is in the class path http://wiki.apache.org/cassandra/FAQ#jna

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 11/07/2012, at 9:38 PM, Adeel Akbar wrote:

 Hi,
 
 I am trying to taking snapshot of my data but faced following error. Please 
 help me to resolve this issue.
 
 [root@cassandra1 bin]# ./nodetool -h localhost snapshot 20120711
 Exception in thread main java.io.IOError: java.io.IOException: Cannot run 
 program ln: java.io.IOException: error=12, Cannot allocate memory
at 
 org.apache.cassandra.db.ColumnFamilyStore.snapshotWithoutFlush(ColumnFamilyStore.java:1660)
at 
 org.apache.cassandra.db.ColumnFamilyStore.snapshot(ColumnFamilyStore.java:1686)
at org.apache.cassandra.db.Table.snapshot(Table.java:198)
at 
 org.apache.cassandra.service.StorageService.takeSnapshot(StorageService.java:1393)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at 
 com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:111)
at 
 com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:45)
at 
 com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:226)
at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:251)
at 
 com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:857)
at 
 com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:795)
at 
 javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1450)
at 
 javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:90)
at 
 javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1285)
at 
 javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1383)
at 
 javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:807)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322)
at sun.rmi.transport.Transport$1.run(Transport.java:177)
at java.security.AccessController.doPrivileged(Native Method)
at sun.rmi.transport.Transport.serviceCall(Transport.java:173)
at 
 sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:553)
at 
 sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:808)
at 
 sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:667)
at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:636)
 Caused by: java.io.IOException: Cannot run program ln: java.io.IOException: 
 error=12, Cannot allocate memory
at java.lang.ProcessBuilder.start(ProcessBuilder.java:475)
at 
 org.apache.cassandra.utils.CLibrary.createHardLinkWithExec(CLibrary.java:181)
at 
 org.apache.cassandra.utils.CLibrary.createHardLink(CLibrary.java:147)
at 
 org.apache.cassandra.io.sstable.SSTableReader.createLinks(SSTableReader.java:730)
at 
 org.apache.cassandra.db.ColumnFamilyStore.snapshotWithoutFlush(ColumnFamilyStore.java:1653)
... 33 more
 Caused by: java.io.IOException: java.io.IOException: error=12, Cannot 
 allocate memory
at java.lang.UNIXProcess.init(UNIXProcess.java:164)
at java.lang.ProcessImpl.start(ProcessImpl.java:81)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:468)
... 37 more
 
 -- 
 
 
 Thanks  Regards
 
 *Adeel**Akbar*
 
 
 



Re: is this something to be concerned about - MUTATION message dropped

2012-07-11 Thread Tyler Hobbs
JMX is really the only way it exposes that kind of information.  I
recommend setting up mx4j if you want to check on the server stats
programmatically.
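
For example, a plain-JMX check of the dropped-message counters. The MBean and
attribute names below are from memory for the 1.x line and should be verified in
jconsole (or exposed over HTTP via mx4j) before relying on them:

import java.util.Map;

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class DroppedMessageCheck {
    public static void main(String[] args) throws Exception {
        // Default Cassandra JMX port is 7199.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi");
        JMXConnector jmxc = JMXConnectorFactory.connect(url, null);
        try {
            MBeanServerConnection mbs = jmxc.getMBeanServerConnection();
            ObjectName msgService = new ObjectName("org.apache.cassandra.net:type=MessagingService");

            // Cumulative dropped counts per verb (MUTATION, READ, ...) since startup.
            @SuppressWarnings("unchecked")
            Map<String, Integer> dropped =
                    (Map<String, Integer>) mbs.getAttribute(msgService, "DroppedMessages");
            System.out.println("Dropped messages by verb: " + dropped);
        } finally {
            jmxc.close();
        }
    }
}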

On Wed, Jul 11, 2012 at 8:17 PM, Frank Hsueh frank.hs...@gmail.com wrote:

 out of curiosity, is there a way that Cassandra can communicate that it's
 close to the being overloaded ?


 On Sun, Jun 17, 2012 at 6:29 PM, aaron morton aa...@thelastpickle.comwrote:

 http://wiki.apache.org/cassandra/FAQ#dropped_messages

 https://www.google.com/#q=cassandra+dropped+messages

 Cheers


   -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 15/06/2012, at 12:54 AM, Poziombka, Wade L wrote:

 INFO [ScheduledTasks:1] 2012-06-14 07:49:54,355 MessagingService.java
 (line 615) 15 MUTATION message dropped in last 5000ms
  It is at INFO level so I'm inclined to think not, but it seems like
  whenever messages are dropped there may be some issue?





 --
 Frank Hsueh | frank.hs...@gmail.com




-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: Concerns about Cassandra upgrade from 1.0.6 to 1.1.X

2012-07-11 Thread Tyler Hobbs
On Wed, Jul 11, 2012 at 8:38 PM, Roshan codeva...@gmail.com wrote:



 Currently we are using Cassandra 1.0.6 in our production system but suffer
 with the CASSANDRA-3616 (it is already fixed in 1.0.7 version).

 We thought to upgrade the Cassandra to 1.1.X versions, to get it's new
 features, but having some concerns about the upgrade and expert advices are
 mostly welcome.

 1. Can Cassandra 1.1.X identify 1.0.X configurations like SSTables, commit
 logs, etc without ant issue? And vise versa. Because if something happens
 to
 1.1.X after deployed to production, we want to downgrade to 1.0.6 version
 (because that's the versions we tested with our applications).


1.1 can handle 1.0 data/schemas/etc without a problem, but the reverse is
not necessarily true.  I don't know what in particular might break if you
downgrade from 1.1 to 1.0, but in general, Cassandra does not handle
downgrading gracefully; typically the SSTable formats have changed during
major releases.  If you snapshot prior to upgrading, you can always roll
back to that, but you will have lost anything written since the upgrade.



 2. How do we need to do upgrade process?  Currently we have 3 node 1.0.6
 cluster in production. Can we upgrade node by node? If we upgrade node by
 node, will the other 1.0.6 nodes identify 1.1.X nodes without any issue?


Yes, you can do a rolling upgrade to 1.1, one node at a time.  It's usually
fine to leave the cluster in a mixed state for a short while as long as you
don't do things like repairs, decommissions, or bootstraps, but I wouldn't
stay in a mixed state any longer than you have to.

It's best to test major upgrades with a second, non-production cluster if
that's an option.
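
A rough per-node sketch of that rolling procedure (the snapshot name is made up,
and the install step depends on whether you use packages or tarballs):

  ./nodetool -h node1 snapshot pre-1.1-upgrade
  ./nodetool -h node1 drain
  # stop Cassandra on node1, install the 1.1.x binaries, merge your cassandra.yaml changes
  # restart node1, check the logs and nodetool ring, then repeat on the next node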

-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: How to come up with a predefined topology

2012-07-11 Thread Tyler Hobbs
I highly recommend specifying the same rack for all nodes (using
cassandra-topology.properties) unless you really have a good reason not to
(and you probably don't).  The way that replicas are chosen when multiple
racks are in play can be fairly confusing and lead to a data imbalance if
you don't catch it.
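
For example, a cassandra-topology.properties that keeps every node on one logical
rack (addresses are made up):

  192.168.0.11=DC1:RAC1
  192.168.0.12=DC1:RAC1
  192.168.0.13=DC1:RAC1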

On Wed, Jul 11, 2012 at 10:53 PM, prasenjit mukherjee
prasen@gmail.comwrote:

  As far as I know there isn't any way to use the rack name in the
 strategy_options for a keyspace. You
  might want to look at the code to dig into that, perhaps.

 Aha, I was wondering if I could do that as well ( specify rack options ) :)

 Thanks for the pointer, I will dig into the code.

 -Thanks,
 Prasenjit

 On Thu, Jul 12, 2012 at 5:33 AM, Richard Lowe richard.l...@arkivum.com
 wrote:
  If you then specify the parameters for the keyspace to use these, you
 can control exactly which set of nodes replicas end up on.
 
  For example, in cassandra-cli:
 
  create keyspace ks1 with placement_strategy =
 'org.apache.cassandra.locator.NetworkTopologyStrategy' and strategy_options
 = { DC1_realtime: 2, DC1_analytics: 1, DC2_realtime: 1 };
 
  As far as I know there isn't any way to use the rack name in the
 strategy_options for a keyspace. You might want to look at the code to dig
 into that, perhaps.
 
  Whichever snitch you use, the nodes are sorted in order of proximity to
 the client node. How this is determined depends on the snitch that's used
 but most (the ones that ship with Cassandra) will use the default ordering
 of same-node  same-rack  same-datacenter  different-datacenter. Each
 snitch has methods to tell Cassandra which rack and DC a node is in, so it
 always knows which node is closest. Used with the Bloom filters this can
 tell us where the nearest replica is.
 
 
 
  -Original Message-
  From: prasenjit mukherjee [mailto:prasen@gmail.com]
  Sent: 11 July 2012 06:33
  To: user
  Subject: How to come up with a predefined topology
 
  Quoting from
 http://www.datastax.com/docs/0.8/cluster_architecture/replication#networktopologystrategy
  :
 
  Asymmetrical replication groupings are also possible depending on your
 use case. For example, you may want to have three replicas per data center
 to serve real-time application requests, and then have a single replica in
 a separate data center designated to running analytics.
 
  Have 2 questions :
  1. Any example how to configure a topology with 3 replicas in one DC (
 with 2 in 1 rack + 1 in another rack ) and one replica in another DC ?
   The default networktopologystrategy with rackinferringsnitch will only
 give me equal distribution ( 2+2 )
 
  2. I am assuming the reads can go to any of the replicas. Is there a
 client which will send query to a node ( in cassandra ring ) which is
 closest to the client ?
 
  -Thanks,
  Prasenjit
 
 




-- 
Tyler Hobbs
DataStax http://datastax.com/