Re: Linux Filesystem for Cassandra

2012-04-05 Thread Radim Kolar



What OS are you using

FreeBSD 8.3-PRERELEASE, 64-bit


Re: Is there a way to update column's TTL only?

2012-04-05 Thread aaron morton
You cannot set the TTL without also setting the column value. 

Could you keep a record of future deletes in a CF and then action them as a 
batch process ? 
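To make the suggestion concrete, the pattern could be sketched as below. This is a hypothetical in-memory simulation of the "future deletes" CF (scheduled deletes bucketed by minute, so the batch job only reads the rows that are due), not real client code; all names are made up.

```python
class DeleteQueue:
    """Simulated 'future deletes' CF: one bucket (row) per minute of expiry."""
    def __init__(self):
        self.buckets = {}  # minute bucket -> set of (cf, row_key, column)

    def schedule(self, cf, row_key, column, delete_at):
        # In a real CF the bucket would be the row key, the entries its columns.
        self.buckets.setdefault(int(delete_at // 60), set()).add((cf, row_key, column))

    def due(self, now):
        """Return and drop every entry whose bucket has passed; the batch
        job would then issue real deletes for these."""
        ready = []
        for bucket in sorted(b for b in self.buckets if b <= now // 60):
            ready.extend(sorted(self.buckets.pop(bucket)))
        return ready

q = DeleteQueue()
q.schedule("Messages", "user42", "body", delete_at=3600)   # delete after 1 hour
q.schedule("Messages", "user42", "title", delete_at=7200)  # delete after 2 hours
print(q.due(now=3700))  # -> [('Messages', 'user42', 'body')]
```

The batch job only touches the rows that are due, so nothing needs to be re-read just to refresh a TTL.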

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 5/04/2012, at 2:00 PM, 金剑 wrote:

 Hi,
 
 We would like to leverage Cassandra's expiration to manage our data's 
 lifecycle. We need to delete data after a period (e.g. 1 hour) once the user 
 has clicked the Delete button. 
 
 We would need to read and re-insert the column in order to update the TTL, but 
 this is unacceptable in our system, which might need to read out gigabytes of data.
 
 Is there a way to do this?
 
 Best Regards!
 
 Jian Jin
 



Re: server down

2012-04-05 Thread aaron morton
Sun or OpenJDK? 

Either way I would suggest upgrading to the latest JDK, upgrading Cassandra to 
1.0.8, and running nodetool upgradesstables. 

If the fault persists after that I would look at IO or memory issues. 

Hope that helps. 

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 5/04/2012, at 4:31 PM, Michael Vaknin wrote:

 I am using Cassandra 1.0.3 with Java 6.0.17 on Ubuntu 10.04;
 this has been a stable setup for 6 months now.
 
 On Wed, Apr 4, 2012 at 9:51 PM, aaron morton aa...@thelastpickle.com wrote:
 What version of cassandra are you using ? 
 
 What java vendor / version ? 
 
 What OS vendor / version ? 
 
 Cheers
 
 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com
 
 On 4/04/2012, at 11:33 PM, Michael Vaknine wrote:
 
 I have recently been having problems with one of my 4 cluster servers.
 I would appreciate any help
  
 ERROR 11:23:14,331 Fatal exception in thread Thread[MutationStage:19,5,main]
 java.lang.InternalError: a fault occurred in a recent unsafe memory access operation in compiled Java code
 	at org.apache.cassandra.io.util.AbstractDataInput.readInt(AbstractDataInput.java:196)
 	at org.apache.cassandra.utils.ByteBufferUtil.readWithLength(ByteBufferUtil.java:348)
 	at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:120)
 	at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:83)
 	at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:73)
 	at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:37)
 	at org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:79)
 	at org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:40)
 	at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
 	at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
 	at org.apache.cassandra.db.columniterator.SSTableSliceIterator.hasNext(SSTableSliceIterator.java:107)
 	at org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:145)
 	at org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:124)
 	at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:98)
 	at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
 	at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
 	at org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:116)
 	at org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:144)
 	at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:227)
 	at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:62)
 	at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1274)
 	at org.apache.cassandra.db.ColumnFamilyStore.cacheRow(ColumnFamilyStore.java:1141)
 	at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1170)
 	at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1122)
 	at org.apache.cassandra.db.Table.readCurrentIndexedColumns(Table.java:504)
 	at org.apache.cassandra.db.Table.apply(Table.java:441)
 	at org.apache.cassandra.db.commitlog.CommitLog$2.runMayThrow(CommitLog.java:338)
 	at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
 	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
 	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
 	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 	at java.lang.Thread.run(Thread.java:619)
  INFO 11:23:17,640 Finished reading /var/lib/cassandra/commitlog/CommitLog-1333465745624.log
 ERROR 11:23:17,641 Exception encountered during startup
 java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.InternalError: a fault occurred in a recent unsafe memory access operation in compiled Java code
 	at org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:508)
  
 Thanks
 Michael
 
 



Bulk loading errors with 1.0.8

2012-04-05 Thread Benoit Perroud
Hi All,

I'm experiencing the following errors while bulk loading data into a cluster:

ERROR [Thread-23] 2012-04-05 09:58:12,252 AbstractCassandraDaemon.java
(line 139) Fatal exception in thread Thread[Thread-23,5,main]
java.lang.RuntimeException: Insufficient disk space to flush
7813594056494754913 bytes
	at org.apache.cassandra.db.ColumnFamilyStore.getFlushPath(ColumnFamilyStore.java:635)
	at org.apache.cassandra.streaming.StreamIn.getContextMapping(StreamIn.java:92)
	at org.apache.cassandra.streaming.IncomingStreamReader.init(IncomingStreamReader.java:68)
	at org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:185)
	at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:81)

Here I'm fairly sure I did not actually generate 7 exabytes of data ;)


ERROR [Thread-46] 2012-04-05 09:58:14,453 AbstractCassandraDaemon.java
(line 139) Fatal exception in thread Thread[Thread-46,5,main]
java.lang.NullPointerException
	at org.apache.cassandra.io.sstable.SSTable.getMinimalKey(SSTable.java:156)
	at org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTableWriter.java:334)
	at org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTableWriter.java:302)
	at org.apache.cassandra.streaming.IncomingStreamReader.streamIn(IncomingStreamReader.java:155)
	at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:89)
	at org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:185)
	at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:81)

This one looks like a null key was added to the SSTable at some point,
but I'm rather confident I'm checking keys for nullity.

The errors are seen on different nodes; some nodes succeeded. All 5
cluster nodes run 1.0.8 with JNA enabled.

Basically I'm generating SSTables in a Hadoop Reducer and storing them in
HDFS. After the job finishes, I download them back onto a single node
and stream the files into Cassandra (yes, I definitely need to try
1.1).

Does someone have a hint, or a pointer on where to start looking?

Thanks,

Benoit.


Re: Will Cassandra balance load across replicas?

2012-04-05 Thread zhiming shen
Thanks for your reply. My question is about the impact of replication on
load balancing. Say we have nodes ABCD... in the ring. The replication factor is
3, so the data on A will also have replicas on B and C. If we are reading
data owned by A, and A is already very busy, will the requests be forwarded
to B and C? How about update requests?


Thanks,


Zhiming

On Thu, Apr 5, 2012 at 12:33 AM, Watanabe Maki watanabe.m...@gmail.comwrote:

 I assume you are talking about nodes, rather than replicas.
 The data distribution over ring depends on Partitioner and Replica
 placement strategy you use.
 If you are using Random Partitioner and Simple Strategy, your data will be
 automatically distributed over the nodes in the ring.

 maki


 On 2012/04/05, at 12:31, zhiming shen zhiming.s...@gmail.com wrote:

  Hi,
 
  Can any one tell me whether Cassandra can do load balancing across
 replicas? How to configure it for this purpose? Thanks very much.
 
 
  Best Regards,
 
  Zhiming



Re: Will Cassandra balance load across replicas?

2012-04-05 Thread Rob Coli
On Thu, Apr 5, 2012 at 9:22 AM, zhiming shen zhiming.s...@gmail.com wrote:
 Thanks for your reply. My question is about the impact of replication on
 load balancing. Say we have nodes ABCD... in the ring. The replication factor is
 3, so the data on A will also have replicas on B and C. If we are reading
 data owned by A, and A is already very busy, will the requests be forwarded to
 B and C? How about update requests?

Google "cassandra dynamic snitch".
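For reference, the dynamic snitch is controlled by a few cassandra.yaml settings. This is a sketch; the names and defaults below are from memory of the 1.0 line, so check the yaml shipped with your version:

```yaml
# Route reads to the best-performing replica based on recent latency.
dynamic_snitch: true
# How often replica scores are recalculated.
dynamic_snitch_update_interval_in_ms: 100
# How often scores are reset, so recovered nodes get traffic again.
dynamic_snitch_reset_interval_in_ms: 600000
# How much worse a node must score before reads are routed away from it.
dynamic_snitch_badness_threshold: 0.0
```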

-- 
=Robert Coli
AIMGTALK - rc...@palominodb.com
YAHOO - rcoli.palominob
SKYPE - rcoli_palominodb


Request timeout and host marked down

2012-04-05 Thread Daning Wang
Hi all,

We are using Hector, and often we see lots of timeout exceptions in the log.
I know that Hector can fail over to another node, but I want to reduce the
number of timeouts.

Is there any Hector parameter I should change to reduce this error?

Also, on the server side, is there any tuning needed for the timeout?
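On the server side the main knob is rpc_timeout_in_ms in cassandra.yaml. A sketch (the 1.0-era default is 10000, i.e. 10 seconds); note that raising it can mask overload rather than fix it, so it is worth checking GC pauses and disk I/O first, and the Hector thrift socket timeout should be at least as large as the server-side value:

```yaml
# cassandra.yaml: how long the coordinator waits for replicas before
# returning TimedOutException to the client (milliseconds).
rpc_timeout_in_ms: 20000
```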


Thanks in advance.


12/04/04 15:13:20 ERROR
com.netseer.services.keywordstat.io.KeywordServiceImpl: Timout 1 ms
12/04/04 15:13:25 ERROR
me.prettyprint.cassandra.connection.HConnectionManager: MARK HOST AS DOWN
TRIGGERED for host 10.28.78.123(10.28.78.123):9160
12/04/04 15:13:25 ERROR
me.prettyprint.cassandra.connection.HConnectionManager: Pool state on
shutdown:
ConcurrentCassandraClientPoolByHost:{10.28.78.123(10.28.78.123):9160};
IsActive?: true; Active: 1; Blocked: 0; Idle: 5; NumBeforeExhausted: 19
12/04/04 15:13:44 ERROR
me.prettyprint.cassandra.connection.HConnectionManager: MARK HOST AS DOWN
TRIGGERED for host 10.240.113.171(10.240.113.171):9160
12/04/04 15:13:44 ERROR
me.prettyprint.cassandra.connection.HConnectionManager: Pool state on
shutdown:
ConcurrentCassandraClientPoolByHost:{10.240.113.171(10.240.113.171):9160};
IsActive?: true; Active: 1; Blocked: 0; Idle: 5; NumBeforeExhausted: 19
12/04/04 15:13:46 ERROR
me.prettyprint.cassandra.connection.HConnectionManager: MARK HOST AS DOWN
TRIGGERED for host 10.28.78.123(10.28.78.123):9160
12/04/04 15:13:46 ERROR
me.prettyprint.cassandra.connection.HConnectionManager: Pool state on
shutdown:
ConcurrentCassandraClientPoolByHost:{10.28.78.123(10.28.78.123):9160};
IsActive?: true; Active: 1; Blocked: 0; Idle: 5; NumBeforeExhausted: 19
12/04/04 15:13:46 ERROR
me.prettyprint.cassandra.connection.HConnectionManager: MARK HOST AS DOWN
TRIGGERED for host 10.123.83.114(10.123.83.114):9160
12/04/04 15:13:46 ERROR
me.prettyprint.cassandra.connection.HConnectionManager: Pool state on
shutdown:
ConcurrentCassandraClientPoolByHost:{10.123.83.114(10.123.83.114):9160};
IsActive?: true; Active: 1; Blocked: 0; Idle: 5; NumBeforeExhausted: 19
12/04/04 15:13:46 ERROR
me.prettyprint.cassandra.connection.HConnectionManager: MARK HOST AS DOWN
TRIGGERED for host 10.6.115.239(10.6.115.239):9160
12/04/04 15:13:46 ERROR
me.prettyprint.cassandra.connection.HConnectionManager: Pool state on
shutdown:
ConcurrentCassandraClientPoolByHost:{10.6.115.239(10.6.115.239):9160};
IsActive?: true; Active: 1; Blocked: 0; Idle: 5; NumBeforeExhausted: 19
12/04/04 15:13:49 ERROR
com.netseer.services.keywordstat.io.KeywordServiceImpl: Timout 1 ms
12/04/04 15:13:49 ERROR
me.prettyprint.cassandra.connection.HConnectionManager: MARK HOST AS DOWN
TRIGGERED for host 10.120.205.48(10.120.205.48):9160
12/04/04 15:13:49 ERROR
me.prettyprint.cassandra.connection.HConnectionManager: Pool state on
shutdown:
ConcurrentCassandraClientPoolByHost:{10.120.205.48(10.120.205.48):9160};
IsActive?: true; Active: 3; Blocked: 0; Idle: 3; NumBeforeExhausted: 17
12/04/04 15:13:50 ERROR
me.prettyprint.cassandra.connection.HConnectionManager: MARK HOST AS DOWN
TRIGGERED for host 10.28.20.200(10.28.20.200):9160
12/04/04 15:13:50 ERROR
me.prettyprint.cassandra.connection.HConnectionManager: Pool state on
shutdown:
ConcurrentCassandraClientPoolByHost:{10.28.20.200(10.28.20.200):9160};
IsActive?: true; Active: 2; Blocked: 0; Idle: 4; NumBeforeExhausted: 18
12/04/04 15:13:51 ERROR
com.netseer.services.keywordstat.io.KeywordServiceImpl: Timout 1 ms


Re: Server side scripting support in Cassandra - go Python !

2012-04-05 Thread Data Craftsman
Just like an Oracle stored procedure.

2012/3/26 Data Craftsman database.crafts...@gmail.com:
 Howdy,

 Some Polyglot Persistence (NoSQL) products have started to support server-side
 scripting, similar to RDBMS stored procedures.
 E.g. Redis Lua scripting.

 I hope it is Python when Cassandra gets a server-side scripting feature.

 FYI,

 http://antirez.com/post/250

 http://nosql.mypopescu.com/post/19949274021/alchemydb-an-integrated-graphdb-rdbms-kv-store

 "Server side scripting support is an extremely powerful tool. Having
 processing close to data (i.e. data locality) is a well known
 advantage, ..., it can open the doors to completely new features."

 Thanks,

 Charlie (@mujiang) 一个 木匠
 ===
 Data Architect Developer
 http://mujiang.blogspot.com


Resident size growth

2012-04-05 Thread Omid Aladini
Hi,

I'm experiencing steady growth in the resident size of the JVM running
Cassandra 1.0.7. I disabled JNA and the off-heap row cache, tested with
and without mlockall (which disables paging), and upgraded to JRE 1.6.0_31 to
prevent this bug [1] from leaking memory. Still, the JVM's resident set size
grows steadily. A process with Xmx=2048M has grown to 6 GB resident
size, and one with Xmx=8192M to 16 GB, in a few hours and still increasing. Has
anyone experienced this? Any idea how to deal with this issue?

Thanks,
Omid

[1] http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7066129


Re: Materialized Views or Index CF - data model question

2012-04-05 Thread Radim Kolar
Will a 1500-byte row size be large or small for Cassandra, from your
understanding?


Performance degradation starts at 500 MB rows; it's very slow if you hit
this limit.


upgrade from cassandra 0.8 to 1.0

2012-04-05 Thread ruslan usifov
Hello

It looks like Cassandra 1.0.x is stable and has interesting things like
off-heap memtables and row caches, so we want to upgrade from 0.8 to version
1.0. Is it possible to do this without cluster downtime (while we upgrade all
nodes)? I mean the following: at some point during the upgrade, the working
cluster will be a mix of 0.8 nodes (not yet upgraded) and 1.0 nodes (already
upgraded), so I am concerned that communication between nodes could break
because of version incompatibilities in the communication protocol.


Re: upgrade from cassandra 0.8 to 1.0

2012-04-05 Thread William Wichgers

On 04/05/2012 03:19 PM, ruslan usifov wrote:

Hello

It looks like Cassandra 1.0.x is stable and has interesting things 
like off-heap memtables and row caches, so we want to upgrade from 0.8 
to version 1.0. Is it possible to do this without cluster downtime (while 
we upgrade all nodes)? I mean the following: at some point during the 
upgrade, the working cluster will be a mix of 0.8 nodes (not yet 
upgraded) and 1.0 nodes (already upgraded), so I am concerned that 
communication between nodes could break because of version 
incompatibilities in the communication protocol.


This is from the NEWS file for the 1.0 upgrade:

1.0
===

Upgrading
-
- Upgrading from version 0.7.1+ or 0.8.2+ can be done with a rolling
  restart, one node at a time.  (0.8.0 or 0.8.1 are NOT network-compatible
  with 1.0: upgrade to the most recent 0.8 release first.)
  You do not need to bring down the whole cluster at once.
- After upgrading, run nodetool scrub against each node before running
  repair, moving nodes, or adding new ones.
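Per node, the NEWS excerpt boils down to a loop roughly like the one below. This is a dry-run sketch that only echoes the steps (the node names are placeholders, and the exact stop/start commands depend on your packaging):

```shell
for node in node1 node2 node3; do
  echo "$node: nodetool -h $node drain"          # flush memtables, quiesce writes
  echo "$node: stop cassandra, install 1.0.x, start cassandra"
  echo "$node: nodetool -h $node scrub"          # before any repair/move/bootstrap
done
```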



Re: really bad select performance

2012-04-05 Thread Chris Hart
Thanks for all the help everyone.  The values were meant to be binary.  I ended 
up making the possible values between 0 and 50 instead of just 0 or 1.  That way 
no single index row gets that wide.  I now run queries for everything from 1 to 
50 to get 'queued' items and set the value to 0 when I'm done (I will never 
query for row_loaded = 0).  It's unfortunate that Cassandra doesn't delegate the 
query execution to a node that has the index row on it, but rather tries to 
move the entire index row to the node that is queried.
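The bucketing trick can be sketched as a toy simulation. This is entirely hypothetical: the dict stands in for the row_loaded secondary index, with one index row per value.

```python
import random

NUM_BUCKETS = 50
# bucket -> set of row keys; bucket 0 plays the role of row_loaded = 0 ("done")
index = {b: set() for b in range(NUM_BUCKETS + 1)}

def mark_queued(row_key):
    """New items land in a random bucket 1..50, so no index row gets huge."""
    index[random.randint(1, NUM_BUCKETS)].add(row_key)

def fetch_queued(limit=1):
    """Poll every bucket, like running 'where row_loaded = N limit 1' for N in 1..50."""
    out = []
    for bucket in range(1, NUM_BUCKETS + 1):
        out.extend(sorted(index[bucket])[:limit])
    return out

def mark_done(row_key):
    for bucket in range(1, NUM_BUCKETS + 1):
        if row_key in index[bucket]:
            index[bucket].discard(row_key)
            index[0].add(row_key)  # value 0 is never queried
            return

for key in ("a", "b", "c"):
    mark_queued(key)
queued = fetch_queued()
mark_done(queued[0])
print(sorted(fetch_queued(limit=3)))
```

Note the value-0 set still grows without bound; deleting the indexed column for completed rows avoids that.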

-Chris

- Original Message -
From: David Leimbach leim...@gmail.com
To: user@cassandra.apache.org
Sent: Monday, April 2, 2012 8:51:46 AM
Subject: Re: really bad select performance


This is all very hypothetical, but I've been bitten by this before. 

Does row_loaded happen to be a binary or boolean value? If so the secondary 
index generated by Cassandra will have at most 2 rows, and they'll be REALLY 
wide if you have a lot of entries. Since Cassandra doesn't distribute columns 
over rows, those potentially very wide index rows, and their replicas, must 
live in SSTables in their entirety on the nodes that own them (and their 
replicas). 


Even though you LIMIT 1, I'm not sure what Cassandra does behind the scenes. 
I've received advice to avoid the built-in secondary indexes in Cassandra 
for some of these reasons. Also, if row_loaded is meant to implement some kind 
of queuing behavior, it could be the wrong problem space for Cassandra as a 
result of all of the above. 









On Sat, Mar 31, 2012 at 12:22 PM, aaron morton  aa...@thelastpickle.com  
wrote: 




Is there anything in the logs when you run the queries ? 


Try turning the logging up to DEBUG on the node that fails to return and see 
what happens. You will see it send messages to other nodes and do work itself. 

One thing to note, a query that uses secondary indexes runs on a node for each 
token range. So it will use more than CL number of nodes. 


Cheers 







- 
Aaron Morton 
Freelance Developer 
@aaronmorton 
http://www.thelastpickle.com 


On 30/03/2012, at 11:52 AM, Chris Hart wrote: 



Hi, 

I have the following cluster: 

136112946768375385385349842972707284580 
ip address MountainView RAC1 Up Normal 1.86 GB 20.00% 0 
ip address MountainView RAC1 Up Normal 2.17 GB 33.33% 
56713727820156410577229101238628035242 
ip address MountainView RAC1 Up Normal 2.41 GB 33.33% 
113427455640312821154458202477256070485 
ip address Rackspace RAC1 Up Normal 3.9 GB 13.33% 
136112946768375385385349842972707284580 

The following query runs quickly on all nodes except 1 MountainView node: 

select * from Access_Log where row_loaded = 0 limit 1; 

There is a secondary index on row_loaded. The query usually doesn't complete 
(but sometimes does) on the bad node and returns very quickly on all other 
nodes. I've upped the rpc timeout to a full minute (rpc_timeout_in_ms: 6) 
in the yaml, but it still often doesn't complete in a minute. It seems just as 
likely to complete, and takes about the same amount of time, whether the limit is 
1, 100 or 1000. 


Thanks for any help, 
Chris 




Re: Will Cassandra balance load across replicas?

2012-04-05 Thread Watanabe Maki
In your case, Cassandra will read the data from the nearest replica, and read 
digests from the other two nodes.
When those reads meet the requested consistency level, Cassandra will return 
the result.
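That read path can be sketched as a toy simulation. This is entirely hypothetical: the proximity scores and values are made up, and Cassandra actually computes MD5 digests over the resolved columns rather than a single value.

```python
import hashlib

# node -> (proximity score, stored value); lower score = "nearer" to the coordinator
replicas = {"A": (1, b"v1"), "B": (2, b"v1"), "C": (3, b"v1")}

def digest(value):
    return hashlib.md5(value).hexdigest()

def read(consistency_level):
    # Full data from the nearest replica, digests only from the others.
    nearest = min(replicas, key=lambda n: replicas[n][0])
    data = replicas[nearest][1]
    others = [digest(replicas[n][1]) for n in replicas if n != nearest]
    # The read is satisfied once CL replicas agree (the data read counts as one).
    matches = 1 + sum(1 for d in others[:consistency_level - 1] if d == digest(data))
    if matches >= consistency_level:
        return data
    raise RuntimeError("digest mismatch: read repair needed")

print(read(consistency_level=2))  # -> b'v1'
```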

maki

From iPhone


On 2012/04/06, at 1:22, zhiming shen zhiming.s...@gmail.com wrote:

 Thanks for your reply. My question is about the impact of replication on load 
 balancing. Say we have nodes ABCD... in the ring. The replication factor is 3, so 
 the data on A will also have replicas on B and C. If we are reading data owned 
 by A, and A is already very busy, will the requests be forwarded to B and C? 
 How about update requests?
 
 
 Thanks,
 
 
 Zhiming
 
 On Thu, Apr 5, 2012 at 12:33 AM, Watanabe Maki watanabe.m...@gmail.com 
 wrote:
 I assume you are talking about nodes, rather than replicas.
 The data distribution over ring depends on Partitioner and Replica placement 
 strategy you use.
 If you are using Random Partitioner and Simple Strategy, your data will be 
 automatically distributed over the nodes in the ring.
 
 maki
 
 
 On 2012/04/05, at 12:31, zhiming shen zhiming.s...@gmail.com wrote:
 
  Hi,
 
  Can any one tell me whether Cassandra can do load balancing across 
  replicas? How to configure it for this purpose? Thanks very much.
 
 
  Best Regards,
 
  Zhiming
 


leveled compaction - improve log message

2012-04-05 Thread Radim Kolar

It would be really helpful if leveled compaction printed the level in the log.

Demo:

INFO [CompactionExecutor:891] 2012-04-05 22:39:27,043 
CompactionTask.java (line 113) Compacting ***LEVEL 1*** 
[SSTableReader(path='/var/lib/cassandra/data/rapidshare/querycache-hc-19690-Data.db'), 
SSTableReader(path='/var/lib/cassandra/data/rapidshare/querycache-hc-19688-Data.db'), 
SSTableReader(path='/var/lib/cassandra/data/rapidshare/querycache-hc-19691-Data.db'), 
SSTableReader(path='/var/lib/cassandra/data/rapidshare/querycache-hc-19700-Data.db'), 
SSTableReader(path='/var/lib/cassandra/data/rapidshare/querycache-hc-19686-Data.db'), 
SSTableReader(path='/var/lib/cassandra/data/rapidshare/querycache-hc-19696-Data.db'), 
SSTableReader(path='/var/lib/cassandra/data/rapidshare/querycache-hc-19687-Data.db'), 
SSTableReader(path='/var/lib/cassandra/data/rapidshare/querycache-hc-19695-Data.db'), 
SSTableReader(path='/var/lib/cassandra/data/rapidshare/querycache-hc-19689-Data.db'), 
SSTableReader(path='/var/lib/cassandra/data/rapidshare/querycache-hc-19694-Data.db'), 
SSTableReader(path='/var/lib/cassandra/data/rapidshare/querycache-hc-19693-Data.db')]


 INFO [CompactionExecutor:891] 2012-04-05 22:39:57,299 
CompactionTask.java (line 221) *** LEVEL 1 *** Compacted to 
[/var/lib/cassandra/data/rapidshare/querycache-hc-19701-Data.db,/var/lib/cassandra/data/rapidshare/querycache-hc-19702-Data.db,/var/lib/cassandra/data/rapidshare/querycache-hc-19703-Data.db,/var/lib/cassandra/data/rapidshare/querycache-hc-19704-Data.db,/var/lib/cassandra/data/rapidshare/querycache-hc-19705-Data.db,/var/lib/cassandra/data/rapidshare/querycache-hc-19706-Data.db,/var/lib/cassandra/data/rapidshare/querycache-hc-19707-Data.db,/var/lib/cassandra/data/rapidshare/querycache-hc-19708-Data.db,/var/lib/cassandra/data/rapidshare/querycache-hc-19709-Data.db,/var/lib/cassandra/data/rapidshare/querycache-hc-19710-Data.db,/var/lib/cassandra/data/rapidshare/querycache-hc-19711-Data.db,].  
59,643,011 to 57,564,216 (~96% of original) bytes for 590,909 keys at 
1.814434MB/s.  Time: 30,256ms.





Re: Is there a way to update column's TTL only?

2012-04-05 Thread 金剑
We do not have a central point to do such work yet, but it seems this is the
only way to do it a little more efficiently. Thanks.

Best Regards!

Jian Jin



2012/4/5 aaron morton aa...@thelastpickle.com

 You cannot set the TTL without also setting the column value.

 Could you keep a record of future deletes in a CF and then action them as
 a batch process ?

 Cheers

   -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 5/04/2012, at 2:00 PM, 金剑 wrote:

 Hi,

 We would like to leverage Cassandra's expiration to manage our data's
 lifecycle. We need to delete data after a period (e.g. 1 hour) once the user
 has clicked the Delete button.

 We would need to read and re-insert the column in order to update the TTL, but
 this is unacceptable in our system, which might need to read out gigabytes of data.

 Is there a way to do this?

 Best Regards!

 Jian Jin





Re: really bad select performance

2012-04-05 Thread David Leimbach
But now, when you set the value to 0, that index row will get very wide as it
collects everything completed.  You may want to consider deleting the indexed
column for completed rows when done.

Cassandra is not a great queue to use with built-in indexes.  You could write
your own index here and potentially do better.

On Thursday, April 5, 2012, Chris Hart wrote:

 Thanks for all the help everyone.  The values were meant to be binary.  I
 ended up making the possible values between 0 and 50 instead of just 0 or 1.
  That way no single index row gets that wide.  I now run queries for
 everything from 1 to 50 to get 'queued' items and set the value to 0 when
 I'm done (I will never query for row_loaded = 0).  It's unfortunate that
 Cassandra doesn't delegate the query execution to a node that has the index
 row on it, but rather tries to move the entire index row to the node that
 is queried.

 -Chris

 - Original Message -
 From: David Leimbach leim...@gmail.com javascript:;
 To: user@cassandra.apache.org javascript:;
 Sent: Monday, April 2, 2012 8:51:46 AM
 Subject: Re: really bad select performance


 This is all very hypothetical, but I've been bitten by this before.

 Does row_loaded happen to be a binary or boolean value? If so the
 secondary index generated by Cassandra will have at most 2 rows, and
 they'll be REALLY wide if you have a lot of entries. Since Cassandra
 doesn't distribute columns over rows, those potentially very wide index
 rows, and their replicas, must live in SSTables in their entirety on the
 nodes that own them (and their replicas).


 Even though you LIMIT 1, I'm not sure what Cassandra does
 behind the scenes. I've received advice to avoid the built-in secondary
 indexes in Cassandra for some of these reasons. Also, if row_loaded is meant
 to implement some kind of queuing behavior, it could be the wrong problem
 space for Cassandra as a result of all of the above.









 On Sat, Mar 31, 2012 at 12:22 PM, aaron morton  
 aa...@thelastpickle.comjavascript:; wrote:




 Is there anything in the logs when you run the queries ?


 Try turning the logging up to DEBUG on the node that fails to return and
 see what happens. You will see it send messages to other nodes and do work
 itself.

 One thing to note, a query that uses secondary indexes runs on a node for
 each token range. So it will use more than CL number of nodes.


 Cheers







 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com


 On 30/03/2012, at 11:52 AM, Chris Hart wrote:



 Hi,

 I have the following cluster:

 136112946768375385385349842972707284580
 ip address MountainView RAC1 Up Normal 1.86 GB 20.00% 0
 ip address MountainView RAC1 Up Normal 2.17 GB 33.33%
 56713727820156410577229101238628035242
 ip address MountainView RAC1 Up Normal 2.41 GB 33.33%
 113427455640312821154458202477256070485
 ip address Rackspace RAC1 Up Normal 3.9 GB 13.33%
 136112946768375385385349842972707284580

 The following query runs quickly on all nodes except 1 MountainView node:

 select * from Access_Log where row_loaded = 0 limit 1;

 There is a secondary index on row_loaded. The query usually doesn't
 complete (but sometimes does) on the bad node and returns very quickly on
 all other nodes. I've upped the rpc timeout to a full minute
 (rpc_timeout_in_ms: 6) in the yaml, but it still often doesn't complete
 in a minute. It seems just as likely to complete, and takes about the same
 amount of time, whether the limit is 1, 100 or 1000.


 Thanks for any help,
 Chris