Re: 1000's of CF's.

2012-10-10 Thread Vanger
The main problem is that this "sweet spot" is very narrow. We can't have lots 
of CFs and we can't have long rows, so we end up with an enormous number of 
huge composite row keys and stored metadata about those keys (keep in 
mind the overhead of such a scheme, though it looks like nobody really cares 
about it anymore). This approach is also bad for running Hadoop jobs 
(which is my main problem right now) and for creating secondary indexes 
(lots of rows means high cardinality, right?), and any 'per-CF option' can 
become a limiting factor. The worst part is that it just doesn't look 
extensible: you inevitably end up with 'not-so-many' big CFs, and that's a 
dead end. Maybe it wouldn't look that bad if you stopped associating a CF 
with any real entity and just called them a 'random stuff store'.
I just hope that I'm wrong and there's a good compromise between the 
three ways of storing data: long rows, many 'very-composite' rows, and 
partitioning by CF. Which way is preferable for running complicated analytics 
queries on top of it in a fair amount of time? How do people handle this?
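To make the 'very-composite rows' option concrete: I mean collapsing many 
per-entity CFs into one shared CF whose row key is prefixed with the entity 
name. A rough Hector sketch, assuming a CompositeType row key; cluster, 
keyspace, CF, entity, and key names below are made up for illustration:

import me.prettyprint.cassandra.serializers.CompositeSerializer;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.Composite;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;

public class SharedCfSketch {
    public static void main(String[] args) {
        // Placeholder cluster/keyspace; the shared CF "SharedEntities" is
        // assumed to have been created with a CompositeType row key.
        Cluster cluster = HFactory.getOrCreateCluster("TestCluster", "localhost:9160");
        Keyspace keyspace = HFactory.createKeyspace("MyKeyspace", cluster);

        // Row key = (virtual CF name, original row key) packed into one composite,
        // so thousands of logical "tables" share a single physical CF.
        Composite rowKey = new Composite();
        rowKey.addComponent("AppUser", StringSerializer.get()); // entity / virtual CF name
        rowKey.addComponent("user-42", StringSerializer.get()); // original row key

        Mutator<Composite> mutator = HFactory.createMutator(keyspace, CompositeSerializer.get());
        mutator.addInsertion(rowKey, "SharedEntities",
                HFactory.createStringColumn("name", "some value"));
        mutator.execute();
    }
}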


--
W/ best regards,
Sergey.

On 10.10.2012 2:15, Ben Hood wrote:

I'm not a Cassandra dev, so take what I say with a lot of salt, but
AFAICT, there is a certain amount of overhead in maintaining a CF, so
when you have large numbers of CFs, this adds up. From a layperson's
perspective, this observation sounds reasonable, since zero-cost CFs
would be tantamount to being able to implement secondary indexes by
just adding CFs. So instead of paying for the overhead (or the
ineffectiveness of high-cardinality secondary indexes, whichever way
you want to look at it), you are expecting a free lunch by just
scaling out in terms of new CFs. I would imagine that under the
covers, the layout of Cassandra has a sweet spot of a smallish number
of CFs (i.e. 10s), but these can practically have as many rows as you
like.

On Mon, Oct 8, 2012 at 11:02 AM, Vanger  wrote:

So what should the Cassandra architecture solution be when we need to run
Hadoop M/R jobs and not be restricted by the number of CFs?
What we have now is a fair number of CFs (> 2K), and this number is slowly
growing, so we are already planning to merge the partitioned CFs. But our next
goal is to run Hadoop tasks on those CFs. All we have is plain Hector and a
custom ORM on top of it. As far as I understand, VirtualKeyspace doesn't help
in our case.
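(To be concrete about the Hadoop part: as far as I can tell, each M/R job over
Cassandra is bound to exactly one input CF, roughly like the sketch below, so
2K+ CFs would mean 2K+ job configurations. Keyspace and CF names are just
examples.)

import java.nio.ByteBuffer;

import org.apache.cassandra.hadoop.ColumnFamilyInputFormat;
import org.apache.cassandra.hadoop.ConfigHelper;
import org.apache.cassandra.thrift.SlicePredicate;
import org.apache.cassandra.thrift.SliceRange;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class OneCfJob {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "scan-one-cf");
        job.setInputFormatClass(ColumnFamilyInputFormat.class);

        // The input is bound to a single keyspace + column family per job.
        ConfigHelper.setInputInitialAddress(job.getConfiguration(), "127.0.0.1");
        ConfigHelper.setInputRpcPort(job.getConfiguration(), "9160");
        ConfigHelper.setInputPartitioner(job.getConfiguration(),
                "org.apache.cassandra.dht.RandomPartitioner");
        ConfigHelper.setInputColumnFamily(job.getConfiguration(), "MyKeyspace", "SomePartitionedCf");

        // Read all columns of every row in that one CF.
        SlicePredicate predicate = new SlicePredicate().setSlice_range(
                new SliceRange(ByteBuffer.wrap(new byte[0]), ByteBuffer.wrap(new byte[0]), false, 1000));
        ConfigHelper.setInputSlicePredicate(job.getConfiguration(), predicate);

        // Mapper/reducer classes omitted; the point is the per-CF binding above.
        // job.waitForCompletion(true);
    }
}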
Also, I don't understand why support for many CFs (or built-in
partitioning) isn't implemented on the Cassandra side. Can anybody explain why
this can or cannot be done in Cassandra?

Just in case:
We're using Cassandra 1.0.11 on 30 nodes (planning to upgrade to 1.1.* soon).

--
W/ best regards,
Sergey.

On 04.10.2012 0:10, Hiller, Dean wrote:

Okay, so it only took me two solid days, not a week.  PlayOrm in the master
branch now supports virtual CFs, or virtual tables in ONE CF, so you can
have thousands or millions of virtual CFs in one CF now.  It works with all
the Scalable-SQL, works with the joins, and works with the PlayOrm command
line tool.

There are two ways to do it. If you are using the ORM half, you just annotate:

@NoSqlEntity("MyVirtualCfName")
@NoSqlVirtualCf(storedInCf="sharedCf")

So it's stored in sharedCf with the table name MyVirtualCfName (in the command
line tool, use MyVirtualCfName to query the table).
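A minimal entity along those lines might look like the sketch below (the field
names, the @NoSqlId annotation, and the package path are assumptions here, so
check them against the PlayOrm source):

import com.alvazan.orm.api.base.anno.NoSqlEntity;
import com.alvazan.orm.api.base.anno.NoSqlId;
import com.alvazan.orm.api.base.anno.NoSqlVirtualCf;

// Sketch only: the two annotations are as quoted above; @NoSqlId and the
// package path are assumed, and the fields are made up for illustration.
@NoSqlEntity("MyVirtualCfName")
@NoSqlVirtualCf(storedInCf="sharedCf")
public class MyVirtualCfName {
    @NoSqlId
    private String id;        // row key within the virtual CF

    private String someValue; // an ordinary column

    public String getId() { return id; }
    public void setId(String id) { this.id = id; }
    public String getSomeValue() { return someValue; }
    public void setSomeValue(String someValue) { this.someValue = someValue; }
}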

Then, if you don't know your metadata ahead of time, you need to create
DboTableMeta and DboColumnMeta objects and save them for every table you
create; you can then use TypedRow to read and persist (which is what one of
our projects is doing).

If you try it out, let me know.  We usually get bug fixes in pretty fast if
you run into anything.  (More and more questions are forming on Stack
Overflow as well ;) ).

Later,
Dean








OOM when applying migrations

2012-09-20 Thread Vanger

Hello,
We are trying to add new nodes to our *6-node* Cassandra cluster with
RF=3, Cassandra version 1.0.11. We are *adding 18 new nodes* one by one.


The first strange thing I've noticed is that the number of completed
MigrationStage operations in nodetool tpstats grows for every new node, even
though the schema is not changing. Now, with a 21-node ring, the final join
shows 184683 migrations, while with 7 nodes it was about 50k migrations.
In fact, it seems this number is not the number of applied migrations.
When I grep the log file with

grep "Applying migration" /var/log/cassandra/system.log -c

the result is pretty much the same for each new node: around 7500 "Applying
migration" entries found in the log.


The real problem is that new nodes now fail with an OutOfMemoryError while
building the schema from migrations. In the logs we find the following:


WARN [ScheduledTasks:1] 2012-09-19 18:51:22,497 GCInspector.java (line 
145) Heap is 0.7712290960125684 full.  You may need to reduce memtable 
and/or cache sizes. Cassandra will now flush up to the two largest 
memtables to free up memory.  Adjust flush_largest_memtables_at 
threshold in cassandra.yaml if you don't want Cassandra to do this 
automatically
 INFO [ScheduledTasks:1] 2012-09-19 18:51:22,498 StorageService.java 
(line 2658) Unable to reduce heap usage since there are no dirty column 
families


 WARN [ScheduledTasks:1] 2012-09-19 18:51:29,500 GCInspector.java (line 
139) Heap is 0.853078131310858 full. You may need to reduce memtable 
and/or cache sizes. Cassandra is now reducing cache sizes to free up 
memory. Adjust reduce_cache_sizes_at threshold in cassandra.yaml if you 
don't want Cassandra to do this automatically
 WARN [ScheduledTasks:1] 2012-09-19 18:51:29,500 AutoSavingCache.java 
(line 187) Reducing AppUser RowCache capacity from 10 to 0 to reduce 
memory pressure
 WARN [ScheduledTasks:1] 2012-09-19 18:51:29,500 AutoSavingCache.java 
(line 187) Reducing AppUser KeyCache capacity from 10 to 0 to reduce 
memory pressure
 WARN [ScheduledTasks:1] 2012-09-19 18:51:29,500 AutoSavingCache.java 
(line 187) Reducing PaymentClaim KeyCache capacity from 5 to 0 to 
reduce memory pressure
 WARN [ScheduledTasks:1] 2012-09-19 18:51:29,500 AutoSavingCache.java 
(line 187) Reducing Organization RowCache capacity from 1000 to 0 to 
reduce memory pressure

 .
 INFO [main] 2012-09-19 18:57:14,181 StorageService.java (line 668) 
JOINING: waiting for schema information to complete
ERROR [Thread-28] 2012-09-19 18:57:14,198 AbstractCassandraDaemon.java 
(line 139) Fatal exception in thread Thread[Thread-28,5,main]

java.lang.OutOfMemoryError: Java heap space
at 
org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:140)
at 
org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:115)

...
ERROR [ReadStage:353] 2012-09-19 18:57:20,453 
AbstractCassandraDaemon.java (line 139) Fatal exception in thread 
Thread[ReadStage:353,5,main]

java.lang.OutOfMemoryError: Java heap space
at 
org.apache.cassandra.service.MigrationManager.makeColumns(MigrationManager.java:256)
at 
org.apache.cassandra.db.DefinitionsUpdateVerbHandler.doVerb(DefinitionsUpdateVerbHandler.java:51)
at 
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)



Originally "max heap size" was set to 6G. Then we increased heap size 
limit to 8G and it works. But warnings still present


 WARN [ScheduledTasks:1] 2012-09-20 11:39:11,373 GCInspector.java (line 
145) Heap is 0.7760745735786222 full. You may need to reduce memtable 
and/or cache sizes.  Cassandra will now flush up to the two largest 
memtables to free up memory.  Adjust flush_largest_memtables_at 
threshold in cassandra.yaml if you don't want Cassandra to do this 
automatically
 INFO [ScheduledTasks:1] 2012-09-20 11:39:11,374 StorageService.java 
(line 2658) Unable to reduce heap usage since there are no dirty column 
families
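
For reference, the heap change itself was just the standard setting in
conf/cassandra-env.sh (the values below are ours and need tuning per machine;
shown only as a sketch):

# conf/cassandra-env.sh
MAX_HEAP_SIZE="8G"
HEAP_NEWSIZE="800M"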


This is probably a bug in applying migrations.
Could anyone explain why Cassandra behaves this way? Could you please
recommend something to cope with this situation?

Thank you in advance.

--
W/ best regards,
Sergey B.



Moving to 1.1

2012-05-30 Thread Vanger
I haven't been tracking the mailing list since the 1.1 RC came out, and now I
have several questions.


1) We want to upgrade from 1.0.9. How stable is 1.1? I mean working under
high load, running compactions and cleanups. Is it faster than 1.0.9?


2) If I want to use Hector as the Cassandra client, which version is better
for 1.1? Is it OK to use "0.8.0-3"?
We're kind of stuck on this Hector release because newer versions added native
serialization of Doubles (and some other types, but doubles are 50% of our
data). So we can't read the old data: the double values were serialized as
Java objects and can't be deserialized as plain doubles.
We can override the default serializer with its older version and keep
working with serialized objects (a sketch of what I mean is below)... but it
looks rather clumsy. Did anyone run into this problem?
And I didn't find any change logs for Hector, so this backward
incompatibility was quite a surprise. Does anybody know of other breaking
changes since 0.8.0-3?
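Roughly, the override is to pin the value serializer back to ObjectSerializer
for the legacy columns, since those doubles were written as serialized
java.lang.Double objects. CF, key, and column names below are illustrative:

import me.prettyprint.cassandra.serializers.ObjectSerializer;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.HColumn;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.query.ColumnQuery;

public class LegacyDoubleRead {
    // Reads one legacy column whose value was written as a serialized java.lang.Double.
    public static Double readLegacyDouble(Keyspace keyspace, String rowKey) {
        ColumnQuery<String, String, Object> q = HFactory.createColumnQuery(
                keyspace,
                StringSerializer.get(),   // row key
                StringSerializer.get(),   // column name
                ObjectSerializer.get());  // value: java serialization, as old Hector did
        q.setColumnFamily("Measurements"); // placeholder CF name
        q.setKey(rowKey);
        q.setName("value");               // placeholder column name
        HColumn<String, Object> col = q.execute().get();
        return col == null ? null : (Double) col.getValue();
    }
}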


3) Java 7 is now recommended for use by Oracle. We have several developers
who have been running local Cassandra instances on it for a while without
problems. Has anybody tried it in production? Some time ago Java 7 wasn't
recommended for use with Cassandra; what is the situation now?



p.s. sorry for my 'english'

Thanks,
Sergey B.


Re: Adding node to Cassandra

2012-03-12 Thread Vanger

Cassandra v1.0.8.
Once again: a 4-node cluster, RF = 3.


On 12.03.2012 16:18, Rustam Aliyev wrote:

What version of Cassandra do you have?

On 12/03/2012 11:38, Vanger wrote:
We were aware of the compaction overhead, but we still don't understand why
this should happen: node 'D' was in a stable condition, had been working for
at least a month, had all the data for its token range, and was comfortable
with that amount of disk space.
Why does the node suddenly need 2x more space for data it already has? Why
doesn't decreasing the token range lead to decreasing disk usage?


On 12.03.2012 15:14, Rustam Aliyev wrote:

Hi,

If you use SizeTieredCompactionStrategy, you should have 2x disk
space to be on the safe side. So if you want to store 2TB of data, you
need a partition size of at least 4TB.  LeveledCompactionStrategy is
available in 1.x and is supposed to require less free disk space (but
comes at the price of more I/O).


--
Rustam.

On 12/03/2012 09:23, Vanger wrote:
*We have a 4-node Cassandra cluster* with RF = 3 (nodes named from
'A' to 'D', initial tokens:

*A (25%)*: 20543402371996174596346065790779111550,
*B (25%)*: 63454860067234500516210522518260948578,
*C (25%)*: 106715317233367107622067286720208938865,
*D (25%)*: 150141183460469231731687303715884105728),
*and we want to add a 5th node* ('E') with initial token
164163260474281062972548100673162157075, and then rebalance nodes
A, D, and E so that they own equal percentages of data. All nodes have
~400GB of data and around ~300GB of free disk space.

What we did:
1. 'Join' the new Cassandra instance (node 'E') to the cluster and wait
till it loads the data for its token range.


2. Move node 'D' initial token down from 150... to 130...
Here we ran into a problem. When the "move" started, disk usage on node
'C' grew from 400 to 750GB; we saw compactions running on node 'D',
but some compactions failed with "WARN [CompactionExecutor:580]
2012-03-11 16:57:56,036 CompactionTask.java (line 87) insufficient
space to compact all requested files SSTableReader". After that we
killed the "move" process to avoid an "out of disk space" error (when 5GB
of free space was left). After a restart it freed 100GB of space, and now
we have a total of 105GB of free disk space on node 'D'. We also noticed
disk usage increase by ~150GB on node 'B', but it stopped growing
before we stopped the "move token".
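(For clarity, the commands behind these steps were essentially the standard
ones; a rough sketch, with host names as placeholders and the new token left
truncated as above:)

# Step 2 sketch: shrink node D's range by moving its token down
# (<new_token> is the 130... value mentioned above)
nodetool -h node-D move <new_token>

# Planned afterwards on each remaining node: drop data that no longer
# belongs to its range
nodetool -h node-A cleanup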



So now we have 5 nodes in the cluster, with status like this:
Node  Owns%  Load   Init. token
A     16%    400GB  020...
B     25%    520GB  063...
C     25%    400GB  106...
D     25%    640GB  150...
E      9%    300GB  164...

We'll add disk space on all nodes and run some cleanups, but there are
still some questions left:


What is the best next step for us from this point?
What is the correct procedure, after all, and what should we expect when
adding a node to a Cassandra cluster?
We expected the used disk space on node 'D' to decrease because we shrank
its token range, but we saw the opposite. Why did that happen, and is it
normal behavior?
What if we have 2TB of data on a 2.5TB disk and we want to add
another node and move tokens?
Is it possible to automate adding a node to the cluster and be sure we
won't run out of space?


Thanks.







Division by zero

2012-03-05 Thread Vanger

After upgrading from version 1.0.1 to 1.0.8 we started to get this exception:

ERROR [http-8095-1 WideEntityServiceImpl.java:142] - get: key1 - 
{type=RANGE, start=0, end=9223372036854775807, orderDesc=false, limit=1}
me.prettyprint.hector.api.exceptions.HCassandraInternalException: 
Cassandra encountered an internal error processing this request: 
TApplicationError type: 6 message:Internal error processing get_slice
at 
me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:31)
at 
me.prettyprint.cassandra.service.KeyspaceServiceImpl$7.execute(KeyspaceServiceImpl.java:285)
at 
me.prettyprint.cassandra.service.KeyspaceServiceImpl$7.execute(KeyspaceServiceImpl.java:268)
at 
me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:101)
at 
me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:233)
at 
me.prettyprint.cassandra.service.KeyspaceServiceImpl.operateWithFailover(KeyspaceServiceImpl.java:131)
at 
me.prettyprint.cassandra.service.KeyspaceServiceImpl.getSlice(KeyspaceServiceImpl.java:289)
at 
me.prettyprint.cassandra.model.thrift.ThriftSliceQuery$1.doInKeyspace(ThriftSliceQuery.java:53)
at 
me.prettyprint.cassandra.model.thrift.ThriftSliceQuery$1.doInKeyspace(ThriftSliceQuery.java:49)
at 
me.prettyprint.cassandra.model.KeyspaceOperationCallback.doInKeyspaceAndMeasure(KeyspaceOperationCallback.java:20)
at 
me.prettyprint.cassandra.model.ExecutingKeyspace.doExecute(ExecutingKeyspace.java:85)
at 
me.prettyprint.cassandra.model.thrift.ThriftSliceQuery.execute(ThriftSliceQuery.java:48)
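
The call on our side is just a plain slice query over the whole column range
of one row, limit 1. A rough Hector sketch of the same request shape (cluster,
keyspace, and CF names are placeholders, and the long column names are an
assumption based on the start/end values in the error above):

import me.prettyprint.cassandra.serializers.BytesArraySerializer;
import me.prettyprint.cassandra.serializers.LongSerializer;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.ColumnSlice;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.query.SliceQuery;

public class WideEntitySliceSketch {
    public static void main(String[] args) {
        Cluster cluster = HFactory.getOrCreateCluster("TestCluster", "localhost:9160");
        Keyspace keyspace = HFactory.createKeyspace("MyKeyspace", cluster);

        // Range slice over one row: start=0, end=Long.MAX_VALUE, ascending, limit 1,
        // i.e. the shape of the get_slice that triggers the internal error above.
        SliceQuery<String, Long, byte[]> query = HFactory.createSliceQuery(
                keyspace, StringSerializer.get(), LongSerializer.get(), BytesArraySerializer.get());
        query.setColumnFamily("WideEntity"); // placeholder CF name
        query.setKey("key1");
        query.setRange(0L, Long.MAX_VALUE, false, 1);
        ColumnSlice<Long, byte[]> slice = query.execute().get();
        System.out.println("columns returned: " + slice.getColumns().size());
    }
}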



I have already created an issue in JIRA (none too soon, perhaps?) with a more
detailed description:

https://issues.apache.org/jira/browse/CASSANDRA-4000

Any ideas?

Thanks.