RE: can't seem to figure out secondary index definition

2011-02-15 Thread Roland Gude
Thanks, it works.

roland

From: Michal Augustýn [mailto:augustyn.mic...@gmail.com]
Sent: Tuesday, 15 February 2011 16:22
To: user@cassandra.apache.org
Subject: Re: can't seem to figure out secondary index definition

Ah, ok. I checked that in the source, and the problem is that you wrote 
"validation_class" but you should write "validator_class".
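
In other words, the fragment from the original message needs only that one key renamed. A sketch of the corrected yaml (the column name is a placeholder, since the actual TimeUUID value was elided in the archive):

```yaml
  - column_metadata: [{name: <your-timeuuid-column>,
      validator_class: UTF8Type, index_name: MyIndex, index_type: KEYS}]
```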

Augi
2011/2/15 Roland Gude <roland.g...@yoochoose.com>
Yeah, I know about that, but the definition I have is for a cluster that is 
started/stopped from a unit test with Hector's EmbeddedServerHelper, which takes 
its definitions from the yaml.
So I'd still like to define the index in the yaml file (it should very well be 
possible, I guess).


From: Michal Augustýn 
[mailto:augustyn.mic...@gmail.com]
Sent: Tuesday, 15 February 2011 15:53
To: user@cassandra.apache.org
Subject: Re: can't seem to figure out secondary index definition

Hi,

If you download Cassandra and look into "conf/cassandra.yaml", you will see 
this:

"this keyspace definition is for demonstration purposes only. Cassandra will 
not load these definitions during startup. See 
http://wiki.apache.org/cassandra/FAQ#no_keyspaces for an explanation."

So you should make all schema-related operations via the Thrift/Avro API, or 
you can use the Cassandra CLI.

Augi

2011/2/15 Roland Gude <roland.g...@yoochoose.com>
Hi,

I am a little puzzled by the creation of secondary indexes, and the docs in that 
area are still very sparse.
What I am trying to do: in a column family with a TimeUUID comparator, I want 
the "special" timeuuid --1000-- to be indexed, the 
value being some UTF8 string on which I want to perform equality checks.

What do I need to put in my cassandra.yaml file?
Something like this?

  - column_metadata: [{name: --1000--, 
validation_class: UTF8Type, index_name: MyIndex, index_type: KEYS}]

This gives me that error:

15:05:12.492 [pool-1-thread-1] ERROR o.a.c.config.DatabaseDescriptor - Fatal 
error: null; Can't construct a java object for 
tag:yaml.org,2002:org.apache.cassandra.config.Config; 
exception=Cannot create property=keyspaces for 
JavaBean=org.apache.cassandra.config.Config@7eb6e2; Cannot create 
property=column_families for 
JavaBean=org.apache.cassandra.config.RawKeyspace@987a33; Cannot create 
property=column_metadata for 
JavaBean=org.apache.cassandra.config.RawColumnFamily@716cb7; Cannot create 
property=validation_class for 
JavaBean=org.apache.cassandra.config.RawColumnDefinition@e29820; Unable to find 
property 'validation_class' on class: 
org.apache.cassandra.config.RawColumnDefinition
Bad configuration; unable to start server


I am furthermore uncertain whether the column name will be correctly used if given 
like this. Should I put the byte representation of the UUID there?

Greetings,
roland
--
YOOCHOOSE GmbH

Roland Gude
Software Engineer

Im Mediapark 8, 50670 Köln

+49 221 4544151 (Tel)
+49 221 4544159 (Fax)
+49 171 7894057 (Mobil)


Email: roland.g...@yoochoose.com
WWW: www.yoochoose.com

YOOCHOOSE GmbH
Managing Directors: Dr. Uwe Alkemper, Michael Friedmann
Commercial register: Amtsgericht Köln HRB 65275
VAT ID: DE 264 773 520
Registered office: Köln





Re: latest rows

2011-02-15 Thread Tyler Hobbs
>
> But wouldn't using timestamps as row keys cause conflicts?
>

Depending on client behavior, yes.  If that's an issue for you, make your
own UUIDs by appending something random or client-specific to the timestamp.
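
One way to realize that suggestion, as a hypothetical Python sketch (the suffix layout, the `client_id` parameter, and the padding widths are all assumptions, not an established scheme):

```python
import random

MAX = 2 ** 64  # upper bound used for the timestamp inversion

def unique_inverted_key(ts_micros, client_id):
    """Inverted-timestamp key with a client-specific, randomized suffix."""
    inverted = str(MAX - ts_micros).zfill(20)  # newest-first ordering part
    # The suffix only breaks ties; keys still sort by the inverted timestamp.
    return "%s-%s-%04d" % (inverted, client_id, random.randrange(10000))

k1 = unique_inverted_key(1_000_000, "web01")
k2 = unique_inverted_key(1_000_000, "web02")
assert k1 != k2  # same timestamp, different clients: no collision
```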

-- 
Tyler Hobbs
Software Engineer, DataStax 
Maintainer of the pycassa  Cassandra
Python client library


Hinted Handoff/GC Tuning Headache

2011-02-15 Thread Chris Baron
Recently upgraded my 8-node cluster from 0.6.6 to 0.7.0 (even more recently 
0.7.1) for ExpiringColumn, among the many other spectacular improvements.
 
Retuned the GC settings based on experience from 0.6.6 and new defaults.
 
After about a week, two of the nodes were very far behind on minor compactions 
(2k+ SSTables per CF and growing, 20k+ pending compactions).  The SSTable 
switch rate on these two nodes was about 10x higher than the other nodes.  I 
also observed rolling long pause deaths (Gossip saying node X is dead), 
seemingly every three minutes one of the nodes would long pause GC.  I saw this 
behavior also when I upgraded from 0.6.6 to 0.6.8, but I rolled back to 0.6.6 
because time did not allow for a deeper observation at that time. (found this: 
https://issues.apache.org/jira/browse/CASSANDRA-1656)
 
I eventually traced this behavior back to a nasty interaction between Hinted 
Handoff and GC tuned for normal operating conditions.  
 
If I understand the code correctly, when a node replays a hint it reads the 
hinted data directly from the application tables (read: my ColumnFamily).  If 
the replaying node happens to also be a replica, it will resend the entire 
row, even if only one column was mutated.  Because of the rolling GC pause 
deaths, the HHs rarely succeeded, and if they did, it wasn't long before a new set 
of hints was recorded.
 
Disabling Hinted Handoffs has fixed this problem, for me.
 
Looking into intermittent GC issues further, the verbose gc log showed ParNew 
promotion failures, so I conservatively lowered CMSInitiatingOccupancyFraction, 
MAX_NEWSIZE, and in_memory_compaction_limit_in_mb.  I’m now seeing long CMS 
times (8000ms+) but no failures, which leads me to believe a 6G heap may be too 
large given the current tuning.
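
For anyone retracing this, the knobs Chris names live in two places on 0.7.x: the JVM flags in conf/cassandra-env.sh and the compaction limit in conf/cassandra.yaml. A sketch with illustrative values only (these are assumptions, not Chris's actual settings):

```shell
# conf/cassandra-env.sh -- shrink the young generation and start CMS earlier
MAX_HEAP_SIZE="6G"
HEAP_NEWSIZE="400M"   # plays the role of MAX_NEWSIZE from the 0.6 scripts
JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=65"
JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"

# conf/cassandra.yaml -- lower the in-memory compaction threshold
#   in_memory_compaction_limit_in_mb: 64
```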
 
It’s worth noting that I saw no increase in ColumnFamily WriteCount or 
StorageProxy.WriteOperations, only ColumnFamily MemtableColumnsCount and 
MemtableDataSize were increasing very rapidly on the target node while 
HintedHandoffs were replaying.

--
Chris

Re: Unavailable Exception

2011-02-15 Thread ruslan usifov
Can this be a result of compaction?

2011/2/16 ruslan usifov 

>
>
> 2011/2/5 Jonathan Ellis 
>
> Start with "grep -i down system.log" on each machine
>>
>> I grepped all machines but found nothing
>


Re: Patterns for writing enterprise applications on cassandra

2011-02-15 Thread buddhasystem

FWIW,

we'll keep RDBMS for transactional data, and Cassandra will be used for
referential data (browsing history and data mining). Horses for courses.

-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Patterns-for-writing-enterprise-applications-on-cassandra-tp6030077p6030436.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Another EOFException

2011-02-15 Thread Dan Washusen
I'm seeing this as well; several column families with keys_cached = 0 on 0.7.1.

Debug level logs: http://pastebin.com/qvujKDth

-- 
Dan Washusen
On Wednesday, 16 February 2011 at 1:12 PM, Jonathan Ellis wrote: 
> Created https://issues.apache.org/jira/browse/CASSANDRA-2172.
> 
> On Tue, Feb 15, 2011 at 3:34 PM, B. Todd Burruss  wrote:
> > it happens when I start the node. Just tried it again. Here's the
> > saved_caches directory:
> > 
> > 
> > [cassandra@kv-app02 ~]$ ls -l /data/cassandra-data/saved_caches/
> > total 12
> > -rw-rw-r-- 1 cassandra cassandra 0 Feb 15 10:36
> > NotificationSystem-Events-KeyCache
> > -rw-rw-r-- 1 cassandra cassandra 0 Feb 15 10:36
> > NotificationSystem-Msgs-KeyCache
> > -rw-rw-r-- 1 cassandra cassandra 0 Feb 15 10:36
> > NotificationSystem-Rendered-KeyCache
> > -rw-rw-r-- 1 cassandra cassandra 0 Feb 15 10:36
> > NotificationSystem-ScheduledMsgs-KeyCache
> > -rw-rw-r-- 1 cassandra cassandra 0 Feb 15 10:36
> > NotificationSystem-ScheduledTimes-KeyCache
> > -rw-rw-r-- 1 cassandra cassandra 0 Feb 15 10:36
> > NotificationSystem-SystemState-KeyCache
> > -rw-rw-r-- 1 cassandra cassandra 0 Feb 15 10:36
> > NotificationSystem-Templates-KeyCache
> > -rw-rw-r-- 1 cassandra cassandra 0 Feb 15 10:36
> > NotificationSystem-Transports-KeyCache
> > -rw-rw-r-- 1 cassandra cassandra 0 Feb 15 10:36
> > Queues-EmailTransport_Pending-KeyCache
> > -rw-rw-r-- 1 cassandra cassandra 0 Feb 15 10:36
> > Queues-EmailTransport_Waiting-KeyCache
> > -rw-rw-r-- 1 cassandra cassandra 0 Feb 15 10:36
> > Queues-Errors_Pending-KeyCache
> > -rw-rw-r-- 1 cassandra cassandra 0 Feb 15 10:36
> > Queues-Errors_Waiting-KeyCache
> > -rw-rw-r-- 1 cassandra cassandra 0 Feb 15 10:36
> > Queues-MessageDescriptors-KeyCache
> > -rw-rw-r-- 1 cassandra cassandra 0 Feb 15 10:36
> > Queues-PipeDescriptors-KeyCache
> > -rw-rw-r-- 1 cassandra cassandra 0 Feb 15 10:36
> > Queues-Processing_Pending-KeyCache
> > -rw-rw-r-- 1 cassandra cassandra 0 Feb 15 10:36
> > Queues-Processing_Waiting-KeyCache
> > -rw-rw-r-- 1 cassandra cassandra 0 Feb 15 10:36
> > Queues-QueueDescriptors-KeyCache
> > -rw-rw-r-- 1 cassandra cassandra 0 Feb 15 10:36
> > Queues-QueuePipeCnxn-KeyCache
> > -rw-rw-r-- 1 cassandra cassandra 0 Feb 15 10:36 Queues-QueueStats-KeyCache
> > -rw-rw-r-- 1 cassandra cassandra 38 Feb 15 09:36
> > system-HintsColumnFamily-KeyCache
> > -rw-rw-r-- 1 cassandra cassandra 0 Feb 15 09:36 system-IndexInfo-KeyCache
> > -rw-rw-r-- 1 cassandra cassandra 5 Feb 15 09:36
> > system-LocationInfo-KeyCache
> > -rw-rw-r-- 1 cassandra cassandra 0 Feb 15 09:36 system-Migrations-KeyCache
> > -rw-rw-r-- 1 cassandra cassandra 18 Feb 15 09:36 system-Schema-KeyCache
> > -rw-rw-r-- 1 cassandra cassandra 0 Feb 15 09:36
> > UDS4Profile-ProfileDefinitions-KeyCache
> > -rw-rw-r-- 1 cassandra cassandra 0 Feb 15 09:36
> > UDS4Profile-ProfileNamespaces-KeyCache
> > -rw-rw-r-- 1 cassandra cassandra 0 Feb 15 09:36
> > UDS4Profile-Profiles_40229-KeyCache
> > -rw-rw-r-- 1 cassandra cassandra 0 Feb 15 09:36
> > UDS4Profile-Profiles_RN_test-KeyCache
> > 
> > 
> > 
> > On 02/15/2011 01:01 PM, Jonathan Ellis wrote:
> > > 
> > > Is this reproducible or just "I happened to kill the server while it
> > > was in the middle of writing out the cache keys?"
> > > 
> > > On Tue, Feb 15, 2011 at 1:10 PM, B. Todd Burruss
> > > wrote:
> > > > 
> > > > the following exception seems to be about loading saved caches, but I
> > > > don't
> > > > really care about the cache, so maybe it isn't a big deal. Anyway, this is
> > > > with
> > > > patched 0.7.1 (0001-Fix-bad-signed-conversion-from-byte-to-int.patch)
> > > > 
> > > > 
> > > > WARN 11:07:59,800 error reading saved cache
> > > > /data/cassandra-data/saved_caches/UDS4Profile-Profiles_40229-KeyCache
> > > > java.io.EOFException
> > > > at
> > > > 
> > > > java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2281)
> > > > at
> > > > 
> > > > java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2750)
> > > > at
> > > > java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:780)
> > > > at java.io.ObjectInputStream.<init>(ObjectInputStream.java:280)
> > > > at
> > > > 
> > > > org.apache.cassandra.db.ColumnFamilyStore.readSavedCache(ColumnFamilyStore.java:255)
> > > > at
> > > > 
> > > > org.apache.cassandra.db.ColumnFamilyStore.<init>(ColumnFamilyStore.java:198)
> > > > at
> > > > 
> > > > org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:451)
> > > > at
> > > > 
> > > > org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:432)
> > > > at org.apache.cassandra.db.Table.initCf(Table.java:360)
> > > > at org.apache.cassandra.db.Table.<init>(Table.java:290)
> > > > at org.apache.cassandra.db.Table.open(Table.java:107)
> > > > at
> > > > 
> > > > org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:162)
> > > > at
> > > > 
> > > > org.apache.cassandra.service.AbstractCassandraDaemon.acti

Re: Patterns for writing enterprise applications on cassandra

2011-02-15 Thread Zhongwei Sun
Is there any Python implementation for transaction?


2011/2/16 Gaurav Sharma :
> "Enterprise applications" is a very broad topic. There's no one answer for 
> every type.
>
> You specifically mention a transactional scenario. For that, I can recommend 
> you look at Cages (http://code.google.com/p/cages) if you haven't already.
>
> On Feb 15, 2011, at 19:45, Ritesh Tijoriwala  
> wrote:
>
>> Hi,
>> I have general questions on writing enterprise applications on cassandra. I 
>> come from a background which involves writing enterprise applications using 
>> DBMS.
>>
>> What are the general patterns people follow in Cassandra world when 
>> migrating a code that is within transaction boundaries in a traditional DBMS 
>> application? E.g., transfer $5 from account A to account B. The code 
>> would normally look like:
>>
>>         beginXT
>>         try {
>>                   A = A - $5;
>>                   B = B + $5;
>>                   commitXT;
>>         } catch () {
>>                   rollbackXT;
>>         }
>>
>> The effect of this is that either both statements execute, or none. The sum 
>> of account balances remain constant. How does one deal with this type of 
>> code when writing on top of Cassandra? I understand that consistency will be 
>> eventual and its fine that eventually, sum of both account balances remain 
>> constant but how to detect that a transaction failed and only step "A = A - 
>> $5" has executed and the later step has not been executed?
>>
>> Are there any sample applications out there where I can browse code and see 
>> how it is written? For e.g. customer purchase order application, etc. which 
>> at least involves some concept of a transaction and has code to keep things 
>> consistent.
>>
>> Thanks,
>> Ritesh
>


Re: Unavailable Exception

2011-02-15 Thread ruslan usifov
2011/2/5 Jonathan Ellis 

> Start with "grep -i down system.log" on each machine
>
> I grepped all machines but found nothing


Re: Patterns for writing enterprise applications on cassandra

2011-02-15 Thread Gaurav Sharma
"Enterprise applications" is a very broad topic. There's no one answer for every 
type.

You specifically mention a transactional scenario. For that, I can recommend 
you look at Cages (http://code.google.com/p/cages) if you haven't already.
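
To make the shape of the Cages approach concrete, here is a minimal local sketch in Python; `threading.Lock` stands in for the ZooKeeper-backed distributed lock Cages provides, and the dict stands in for the account rows, so everything here except the lock-around-both-writes pattern is a placeholder:

```python
import threading

# Stand-in for a Cages-style distributed lock; in a real deployment this
# would coordinate through ZooKeeper rather than a local mutex.
accounts_lock = threading.Lock()
accounts = {"A": 100, "B": 50}  # stand-in for rows in a column family

def transfer(src, dst, amount):
    """Move `amount` under the lock so no writer interleaves with us."""
    with accounts_lock:
        if accounts[src] < amount:
            raise ValueError("insufficient funds")
        accounts[src] -= amount
        accounts[dst] += amount

transfer("A", "B", 5)
assert accounts == {"A": 95, "B": 55}  # balance sum is preserved
```

Note that the lock gives mutual exclusion, not rollback: if a client dies between the two column writes, the data is still inconsistent until repaired, which is exactly the gap Ritesh asks about and which Cassandra itself does not close.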

On Feb 15, 2011, at 19:45, Ritesh Tijoriwala  
wrote:

> Hi,
> I have general questions on writing enterprise applications on cassandra. I 
> come from a background which involves writing enterprise applications using 
> DBMS.
> 
> What are the general patterns people follow in Cassandra world when migrating 
> a code that is within transaction boundaries in a traditional DBMS 
> application? E.g., transfer $5 from account A to account B. The code would 
> normally look like:
> 
> beginXT
> try {
>   A = A - $5;
>   B = B + $5;
>   commitXT;
> } catch () {
>   rollbackXT;
> }
> 
> The effect of this is that either both statements execute, or none. The sum 
> of account balances remain constant. How does one deal with this type of code 
> when writing on top of Cassandra? I understand that consistency will be 
> eventual and its fine that eventually, sum of both account balances remain 
> constant but how to detect that a transaction failed and only step "A = A - 
> $5" has executed and the later step has not been executed? 
> 
> Are there any sample applications out there where I can browse code and see 
> how it is written? For e.g. customer purchase order application, etc. which 
> at least involves some concept of a transaction and has code to keep things 
> consistent.
> 
> Thanks,
> Ritesh


Re: Dropping & Creating Column Families Never Returns

2011-02-15 Thread William R Speirs
What would/could take so long for the nodes to agree? It's a small cluster (7 
nodes), all on the local LAN, and not being used by anything else.


I think a delete & refresh might be in order...

Thanks!

Bill-

On 02/15/2011 09:13 PM, Jonathan Ellis wrote:

"command never returns" means "it's waiting for the nodes to agree on
the new schema version."  Bad Mojo will ensue if you issue more schema
updates anyway.

On Tue, Feb 15, 2011 at 3:46 PM, Bill Speirs  wrote:

Has anyone ever tried to drop a column family and/or create one and
have the command not return from the cli? I'm using 0.7.1 and I tried
to drop a column family and the command never returned. However, on
another node it showed it was gone. I Ctrl-C out of the command, then
issued a create for a column family of the same name, different
schema. That command never returned, but again in other host it showed
it was there. I went to describe and list this column family and got
this:

[default@Logging] describe keyspace Logging;
Keyspace: Logging:
  Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
Replication Factor: 3
  Column Families:
ColumnFamily: Messages
  Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
  Row cache size / save period: 0.0/0
  Key cache size / save period: 20.0/14400
  Memtable thresholds: 0.5953125/127/60
  GC grace seconds: 864000
  Compaction min/max thresholds: 4/32
  Read repair chance: 1.0
  Built indexes: []
[default@Logging] list Messages;
Messages not found in current keyspace.


Any ideas?

Bill-







Re: Dropping & Creating Column Families Never Returns

2011-02-15 Thread Jonathan Ellis
"command never returns" means "it's waiting for the nodes to agree on
the new schema version."  Bad Mojo will ensue if you issue more schema
updates anyway.

On Tue, Feb 15, 2011 at 3:46 PM, Bill Speirs  wrote:
> Has anyone ever tried to drop a column family and/or create one and
> have the command not return from the cli? I'm using 0.7.1 and I tried
> to drop a column family and the command never returned. However, on
> another node it showed it was gone. I Ctrl-C out of the command, then
> issued a create for a column family of the same name, different
> schema. That command never returned, but again in other host it showed
> it was there. I went to describe and list this column family and got
> this:
>
> [default@Logging] describe keyspace Logging;
> Keyspace: Logging:
>  Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
>    Replication Factor: 3
>  Column Families:
>    ColumnFamily: Messages
>      Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
>      Row cache size / save period: 0.0/0
>      Key cache size / save period: 20.0/14400
>      Memtable thresholds: 0.5953125/127/60
>      GC grace seconds: 864000
>      Compaction min/max thresholds: 4/32
>      Read repair chance: 1.0
>      Built indexes: []
> [default@Logging] list Messages;
> Messages not found in current keyspace.
>
>
> Any ideas?
>
> Bill-
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Another EOFException

2011-02-15 Thread Jonathan Ellis
Created https://issues.apache.org/jira/browse/CASSANDRA-2172.

On Tue, Feb 15, 2011 at 3:34 PM, B. Todd Burruss  wrote:
> it happens when I start the node.  Just tried it again.  Here's the
> saved_caches directory:
>
>
> [cassandra@kv-app02 ~]$ ls -l /data/cassandra-data/saved_caches/
> total 12
> -rw-rw-r-- 1 cassandra cassandra  0 Feb 15 10:36
> NotificationSystem-Events-KeyCache
> -rw-rw-r-- 1 cassandra cassandra  0 Feb 15 10:36
> NotificationSystem-Msgs-KeyCache
> -rw-rw-r-- 1 cassandra cassandra  0 Feb 15 10:36
> NotificationSystem-Rendered-KeyCache
> -rw-rw-r-- 1 cassandra cassandra  0 Feb 15 10:36
> NotificationSystem-ScheduledMsgs-KeyCache
> -rw-rw-r-- 1 cassandra cassandra  0 Feb 15 10:36
> NotificationSystem-ScheduledTimes-KeyCache
> -rw-rw-r-- 1 cassandra cassandra  0 Feb 15 10:36
> NotificationSystem-SystemState-KeyCache
> -rw-rw-r-- 1 cassandra cassandra  0 Feb 15 10:36
> NotificationSystem-Templates-KeyCache
> -rw-rw-r-- 1 cassandra cassandra  0 Feb 15 10:36
> NotificationSystem-Transports-KeyCache
> -rw-rw-r-- 1 cassandra cassandra  0 Feb 15 10:36
> Queues-EmailTransport_Pending-KeyCache
> -rw-rw-r-- 1 cassandra cassandra  0 Feb 15 10:36
> Queues-EmailTransport_Waiting-KeyCache
> -rw-rw-r-- 1 cassandra cassandra  0 Feb 15 10:36
> Queues-Errors_Pending-KeyCache
> -rw-rw-r-- 1 cassandra cassandra  0 Feb 15 10:36
> Queues-Errors_Waiting-KeyCache
> -rw-rw-r-- 1 cassandra cassandra  0 Feb 15 10:36
> Queues-MessageDescriptors-KeyCache
> -rw-rw-r-- 1 cassandra cassandra  0 Feb 15 10:36
> Queues-PipeDescriptors-KeyCache
> -rw-rw-r-- 1 cassandra cassandra  0 Feb 15 10:36
> Queues-Processing_Pending-KeyCache
> -rw-rw-r-- 1 cassandra cassandra  0 Feb 15 10:36
> Queues-Processing_Waiting-KeyCache
> -rw-rw-r-- 1 cassandra cassandra  0 Feb 15 10:36
> Queues-QueueDescriptors-KeyCache
> -rw-rw-r-- 1 cassandra cassandra  0 Feb 15 10:36
> Queues-QueuePipeCnxn-KeyCache
> -rw-rw-r-- 1 cassandra cassandra  0 Feb 15 10:36 Queues-QueueStats-KeyCache
> -rw-rw-r-- 1 cassandra cassandra 38 Feb 15 09:36
> system-HintsColumnFamily-KeyCache
> -rw-rw-r-- 1 cassandra cassandra  0 Feb 15 09:36 system-IndexInfo-KeyCache
> -rw-rw-r-- 1 cassandra cassandra  5 Feb 15 09:36
> system-LocationInfo-KeyCache
> -rw-rw-r-- 1 cassandra cassandra  0 Feb 15 09:36 system-Migrations-KeyCache
> -rw-rw-r-- 1 cassandra cassandra 18 Feb 15 09:36 system-Schema-KeyCache
> -rw-rw-r-- 1 cassandra cassandra  0 Feb 15 09:36
> UDS4Profile-ProfileDefinitions-KeyCache
> -rw-rw-r-- 1 cassandra cassandra  0 Feb 15 09:36
> UDS4Profile-ProfileNamespaces-KeyCache
> -rw-rw-r-- 1 cassandra cassandra  0 Feb 15 09:36
> UDS4Profile-Profiles_40229-KeyCache
> -rw-rw-r-- 1 cassandra cassandra  0 Feb 15 09:36
> UDS4Profile-Profiles_RN_test-KeyCache
>
>
>
> On 02/15/2011 01:01 PM, Jonathan Ellis wrote:
>>
>> Is this reproducible or just "I happened to kill the server while it
>> was in the middle of writing out the cache keys?"
>>
>> On Tue, Feb 15, 2011 at 1:10 PM, B. Todd Burruss
>>  wrote:
>>>
>>> the following exception seems to be about loading saved caches, but I
>>> don't
>>> really care about the cache, so maybe it isn't a big deal.  Anyway, this is
>>> with
>>> patched 0.7.1 (0001-Fix-bad-signed-conversion-from-byte-to-int.patch)
>>>
>>>
>>> WARN 11:07:59,800 error reading saved cache
>>> /data/cassandra-data/saved_caches/UDS4Profile-Profiles_40229-KeyCache
>>> java.io.EOFException
>>>    at
>>>
>>> java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2281)
>>>    at
>>>
>>> java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2750)
>>>    at
>>> java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:780)
>>>    at java.io.ObjectInputStream.<init>(ObjectInputStream.java:280)
>>>    at
>>>
>>> org.apache.cassandra.db.ColumnFamilyStore.readSavedCache(ColumnFamilyStore.java:255)
>>>    at
>>>
>>> org.apache.cassandra.db.ColumnFamilyStore.<init>(ColumnFamilyStore.java:198)
>>>    at
>>>
>>> org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:451)
>>>    at
>>>
>>> org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:432)
>>>    at org.apache.cassandra.db.Table.initCf(Table.java:360)
>>>    at org.apache.cassandra.db.Table.<init>(Table.java:290)
>>>    at org.apache.cassandra.db.Table.open(Table.java:107)
>>>    at
>>>
>>> org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:162)
>>>    at
>>>
>>> org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:316)
>>>    at
>>> org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:79)
>>>
>>>
>>
>>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: latest rows

2011-02-15 Thread Tan Yeh Zheng
But wouldn't using timestamps as row keys cause conflicts?
On Tue, 2011-02-15 at 19:11 -0600, Tyler Hobbs wrote:
> 
> What is the best way to retrieve the latest rows from a CF
> with OPP.
> 
> Use inverted timestamps (for example, 2^64 - timestamp) with zeros for
> padding as the row keys.
> 
> This way you can do a normal forward range scan and get the N latest
> rows.
> 
> -- 
> Tyler Hobbs
> Software Engineer, DataStax
> Maintainer of the pycassa Cassandra Python client library
> 

-- 
Best Regards,

Tan Yeh Zheng
Software Programmer

 ChartNexus® :: Chart Your Success 

ChartNexus Pte. Ltd.

15 Enggor Street #10-01
Realty Center
Singapore 079716
Tel:  (65) 6491 1456
Website: www.chartnexus.com

Disclaimer:
This email is confidential and intended only for the use of the
individual or individuals named above and may contain information that
is privileged. If you are not the intended recipient, you are notified
that any dissemination, distribution or copying of this email is
strictly prohibited.



Re: Subscribe

2011-02-15 Thread Victor Kabdebon
Looks like your wish has been granted.

2011/2/15 Chris Goffinet 

> I would like to subscribe to your newsletter.
>
> On Tue, Feb 15, 2011 at 8:04 AM, A J  wrote:
>
>>
>>
>


Re: Subscribe

2011-02-15 Thread Chris Goffinet
I would like to subscribe to your newsletter.

On Tue, Feb 15, 2011 at 8:04 AM, A J  wrote:

>
>


Re: Data distribution

2011-02-15 Thread mcasandra

HH is one aspect; the other aspect is that when a new node joins, some
balancing needs to occur, and this may take time as well.

But I also understand it would add a lot of complexity to the code.

Is there any place where I can read about other things of concern that one should
be aware of?
-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Data-distribution-tp6025869p6030157.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: What is the most solid version of Cassandra? No secondary indexes needed.

2011-02-15 Thread Robert Coli
On Tue, Feb 15, 2011 at 11:40 AM, buddhasystem  wrote:
> So, if I don't need indexes, what is the most stable, reliable version of
> Cassandra that I can put in production? I'm seeing bug reports here and some
> sound quite serious, I just want something that works day in, day out.

Note : the following is my opinion only, and likely does not represent
the view of the Apache Cassandra project.

The most stable/production ready version of Cassandra is :

0.6.8

I have to say 0.6.8 instead of 0.6.6, because 0.6.8 contains the
(0.6.7-era) patch from CASSANDRA-1676, without which streaming is
broken. However, 0.6.7 and 0.6.8 contain non-bugfix patches, and in
0.6.7's case there is a regression in that non-bugfix patch. Versions
of the 0.6 branch above 0.6.9 contain still more non-bugfix patches
and regressions. All extant 0.7 releases (0.7.0, 0.7.1) contain major
bugs. The most stable/safe version is, therefore, likely to be
"0.6.6+1676". If you are uncomfortable patching 1676 into 0.6.6
yourself, use 0.6.8.

https://issues.apache.org/jira/browse/CASSANDRA-1676

=Rob


Re: latest rows

2011-02-15 Thread Tyler Hobbs
> What is the best way to retrieve the latest rows from a CF with OPP.
>

Use inverted timestamps (for example, 2^64 - timestamp) with zeros for
padding as the row keys.

This way you can do a normal forward range scan and get the N latest rows.
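
A minimal sketch of the key construction in Python (the 20-digit padding width is an assumption sized to fit 2^64; any fixed width that holds the largest key works):

```python
MAX = 2 ** 64  # upper bound used for the inversion

def inverted_key(ts_micros):
    """Zero-padded string key that sorts newest-first under OPP."""
    # Newer (larger) timestamps yield smaller numbers, and the fixed-width
    # zero padding makes lexicographic order match numeric order.
    return str(MAX - ts_micros).zfill(20)

older = inverted_key(1_000_000)
newer = inverted_key(2_000_000)
assert newer < older  # a forward range scan returns newest rows first
```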

-- 
Tyler Hobbs
Software Engineer, DataStax 
Maintainer of the pycassa  Cassandra
Python client library


latest rows

2011-02-15 Thread Alaa Zubaidi

Hi,

What is the best way to retrieve the latest rows from a CF with OPP.

We are using OPP and key range queries but I cannot find an easy way to 
get the latest 10 keys for example from a column family with 1000s of keys.
I really don't want to create another CF to store row key names as 
columns and then retrieve the latest columns from this CF and use the 
row keys to retrieve the latest data.


Regards and Thanks,
Alaa



Patterns for writing enterprise applications on cassandra

2011-02-15 Thread Ritesh Tijoriwala
Hi,
I have general questions on writing enterprise applications on cassandra. I
come from a background which involves writing enterprise applications using
DBMS.

What are the general patterns people follow in the Cassandra world when
migrating code that sits within transaction boundaries in a traditional DBMS
application? E.g., transfer $5 from account A to account B. The code
would normally look like:

beginXT
try {
  A = A - $5;
  B = B + $5;
  commitXT;
} catch () {
  rollbackXT;
}

The effect of this is that either both statements execute, or neither. The sum
of the account balances remains constant. How does one deal with this type of
code when writing on top of Cassandra? I understand that consistency will be
eventual, and it's fine that eventually the sum of both account balances remains
constant, but how do I detect that a transaction failed and only the step "A = A -
$5" has executed, while the later step has not?

Are there any sample applications out there where I can browse code and see
how they are written? E.g. a customer purchase order application, etc., which
at least involves some concept of a transaction and has code to keep things
consistent.

Thanks,
Ritesh


Keyspace additions are not replicated to one node in the cluster

2011-02-15 Thread Shu Zhang
Hi, a node in my Cassandra cluster will not accept keyspace additions applied 
to other nodes. In its logs, it says:

DEBUG [MigrationStage:1] 2011-02-15 15:39:57,995 
DefinitionsUpdateResponseVerbHandler.java (line 71) Applying AddKeyspace from 
{X}
DEBUG [MigrationStage:1] 2011-02-15 15:39:57,995 
DefinitionsUpdateResponseVerbHandler.java (line 79) Migration not applied 
Previous version mismatch. cannot apply.

My cassandra nodes' version is 0.7-rc4.

Does anyone know how I can recover from this problem? I'm fine with this node 
being synced to whatever data definition is defined on the rest of the cluster.

Thanks,
Shu
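
A recovery that was commonly suggested for 0.7-era schema disagreements (described on the Cassandra wiki FAQ) is to have the out-of-sync node discard its local schema history and re-learn it from the ring. A sketch, assuming the default data directory; check data_file_directories in cassandra.yaml before deleting anything:

```shell
# On the out-of-sync node only:
# 1. Stop the Cassandra process.
# 2. Remove the locally stored schema and migration history
#    (the path below is an assumption -- verify it first):
rm /var/lib/cassandra/data/system/Schema*
rm /var/lib/cassandra/data/system/Migrations*
# 3. Restart; the node pulls the current schema from the rest of the cluster.
```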

Re: Data distribution

2011-02-15 Thread Robert Coli
On Tue, Feb 15, 2011 at 3:05 PM, mcasandra  wrote:
>
> Is there a way to let the new node join cluster in the background and make it
> live to clients only after it has finished with node repair, syncing data
> etc. and in the end sync keys or trees that's needed before it's come to
> life. I know it can be tricky since it needs to be live as soon as it steals
> the keys.

In general, no. This sort of thing has been proposed a few times, in
different contexts, and has not been implemented.

https://issues.apache.org/jira/browse/CASSANDRA-768

=Rob


Re: Data distribution

2011-02-15 Thread mcasandra

Thanks! Would Hector take care of not load balancing to the new node until
it's ready?

Also, when repair is occurring in the background, is there a status that I can
look at to see that repair is occurring for key ABC?
-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Data-distribution-tp6025869p6029882.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Data distribution

2011-02-15 Thread Matthew Dennis
Assuming you aren't changing the RF, the normal bootstrap process takes care
of all the problems like that, making sure things work correctly.

Most importantly, if something fails (either the new node or any of the
existing nodes) you can recover from it.

Just don't connect clients directly to that new node until it's fully in the
ring.

On Tue, Feb 15, 2011 at 5:05 PM, mcasandra  wrote:

>
> Is there a way to let the new node join cluster in the background and make
> it
> live to clients only after it has finished with node repair, syncing data
> etc. and in the end sync keys or trees that's needed before it's come to
> life. I know it can be tricky since it needs to be live as soon as it
> steals
> the keys.
>
> This way we know we are adding nodes only when we think it's all ready.
> --
> View this message in context:
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Data-distribution-tp6025869p6029708.html
> Sent from the cassandra-u...@incubator.apache.org mailing list archive at
> Nabble.com.
>


Re: Data distribution

2011-02-15 Thread mcasandra

Is there a way to let the new node join the cluster in the background and make it
live to clients only after it has finished with node repair, syncing data,
etc., and at the end sync whatever keys or trees are needed before it comes to
life? I know it can be tricky since it needs to be live as soon as it steals
the keys.

This way we know we are adding nodes only when we think it's all ready.
-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Data-distribution-tp6025869p6029708.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Coordinator node

2011-02-15 Thread Attila Babo
There is a single point of failure for sure, as there is a single proxy
in front, but that pays off as the load is even between nodes. Another
plus is that when a machine is out of the cluster for maintenance, the proxy
handles it automatically. Originally I started this as an experiment:
there is a large number of long-running clients, and using a proxy was
an easy way to reduce configuration.

On Tue, Feb 15, 2011 at 11:45 PM, Matthew Dennis  wrote:
> You have a single HAProxy node in front of the cluster or you have a HAProxy
> node on each machine that is a client of Cassandra that points at all the
> nodes in the cluster?
>
> The former has a SPOF and bottleneck (the HAProxy instance), the latter does
> not (and is somewhat common, especially for things like Apache+PHP).


Re: Coordinator node

2011-02-15 Thread Matthew Dennis
You have a single HAProxy node in front of the cluster or you have a HAProxy
node on each machine that is a client of Cassandra that points at all the
nodes in the cluster?

The former has a SPOF and bottleneck (the HAProxy instance), the latter does
not (and is somewhat common, especially for things like Apache+PHP).

On Tue, Feb 15, 2011 at 4:41 PM, Attila Babo  wrote:

> We are using HAProxy in TCP mode for round-robin with great success.
> It's a bit unorthodox but has some real added value, like logging.
>
> Here is the relevant config for HAProxy:
>
> #
>
> global
>log 127.0.0.1 local0
>log 127.0.0.1 local1 notice
>maxconn 4096
>user haproxy
>group haproxy
>daemon
>
> defaults
>log global
>mode tcp
>maxconn 2000
>contimeout 5000
>clitimeout 5
>srvtimeout 5
>
> listen cassandra 0.0.0.0:9160
>balance roundrobin
>server db1 ec2-XXX.compute-1.amazonaws.com:9160 check observe layer4
>server db2 ec2-YYY.compute-1.amazonaws.com:9160 check observe layer4
>server db3 ec2-ZZZ.compute-1.amazonaws.com:9160 check observe layer4
>


Re: What is the most solid version of Cassandra? No secondary indexes needed.

2011-02-15 Thread buddhasystem

Thank you Attila!

We will indeed have a few months of "breaking in". I suppose I'll
keep my fingers crossed and hope that 0.7.X turns out very stable. So I'll
deploy 0.7.1 -- I will need to apply all the patches myself, since there is no
cumulative download, is that correct?


Attila Babo wrote:
> 
> 0.6.8 is stable and production ready; the later versions of the 0.6
> branch have "issues". No offense, but the 0.7 branch is fairly unstable
> in my experience. I have reproduced all the open bugs with a
> production dataset, even when I tried to rebuild it from scratch after a
> complete loss.
> 
> If you have a few months before going to production, your best bet is
> still 0.7.1 as it will stabilize, but the switch between versions is
> painful.
> 
> /Attila
> 
> 

-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/What-is-the-most-solid-version-of-Cassandra-No-secondary-indexes-needed-tp6028966p6029622.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Coordinator node

2011-02-15 Thread Attila Babo
We are using HAProxy in TCP mode for round-robin with great success.
It's a bit unorthodox but has some real added value, like logging.

Here is the relevant config for HAProxy:

#

global
log 127.0.0.1 local0
log 127.0.0.1 local1 notice
maxconn 4096
user haproxy
group haproxy
daemon

defaults
log global
mode tcp
maxconn 2000
contimeout 5000
clitimeout 5
srvtimeout 5

listen cassandra 0.0.0.0:9160
balance roundrobin
server db1 ec2-XXX.compute-1.amazonaws.com:9160 check observe layer4
server db2 ec2-YYY.compute-1.amazonaws.com:9160 check observe layer4
server db3 ec2-ZZZ.compute-1.amazonaws.com:9160 check observe layer4


Re: What is the most solid version of Cassandra? No secondary indexes needed.

2011-02-15 Thread Attila Babo
0.6.8 is stable and production ready; the later versions of the 0.6
branch have "issues". No offense, but the 0.7 branch is fairly unstable
in my experience. I have reproduced all the open bugs with a
production dataset, even when I tried to rebuild it from scratch after a
complete loss.

If you have a few months before going to production, your best bet is
still 0.7.1 as it will stabilize, but the switch between versions is
painful.

/Attila


Re: Coordinator node

2011-02-15 Thread A J
Makes sense ! Thanks.
Just a quick follow-up:
Now I understand the write is not made to the coordinator (unless it is part of
the replica set for that key). But does the write column traffic 'flow' through
the coordinator node? For a 2G column write, will I see 2G of network traffic
on the coordinator node, or just a few bytes of traffic from it reading the key
and talking to the nodes/client, etc.?

This will be a factor for us, so we need to know exactly.


On Tue, Feb 15, 2011 at 5:02 PM, Matthew Dennis wrote:

> It doesn't write anything to the coordinator node, it just forwards it to
> nodes in the replica set for that row key.
>
> write goes to some node (coordinator, i.e. whatever node you connected to).
> coordinator looks at key, determines which nodes are responsible for it.
> in parallel it forwards the requests to those nodes (in the case it is in
> the replica set for that key, it will write it locally in parallel with the
> writes that were forwarded).
> the coordinator waits until it has the appropriate number of responses to
> meet your consistency level from the nodes in the replica set for the key
> (possibly including itself).
> the coordinator determines the correct value to send to the client based on
> the responses it receives and then sends it.
>
>
> On Tue, Feb 15, 2011 at 3:55 PM, A J  wrote:
>
>> Thanks.
>> 1. That is somewhat disappointing. Wish the redundancy of write on the
>> coordinator node could have been avoided somehow.
>> Does the write on the coordinator node (in case it is not part of the N
>> replica nodes for that key) get deleted before response of the write is
>> returned back to the client ?
>>
>>
>> On Tue, Feb 15, 2011 at 4:40 PM, Matthew Dennis wrote:
>>
>>> 1. Yes, the coordinator node propagates requests to the correct nodes.
>>>
>>> 2. most (all?) higher level clients (pycassa, hector, etc) load balance
>>> for you.  In general your client and/or the caller of the client needs to
>>> catch exceptions and retry.  If you're using RRDNS and some of the nodes are
>>> temporarily down, you wouldn't bother to update DNS; your client would just
>>> route to some other node that is up after noticing the first node is down.
>>>
>>> In general you don't want a load balancer in front of the nodes as the
>>> load balancer itself becomes a SPOF as well as a performance bottleneck (not
>>> to mention the extra cost and complexity).  By far the most common setup is
>>> to have the clients load balance for you, coupled with retry logic in your
>>> application.
>>>
>>>
>>> On Tue, Feb 15, 2011 at 2:45 PM, A J  wrote:
>>>
 From my reading it seems like the node that the client connects to
 becomes the coordinator node. Questions:

 1. Is it true that the write first happens on the coordinator node and
 then the coordinator node propagates it to the right primary node and the
 replicas ? In other words if I have a 2G write, would the 2G be transferred
 first to the coordinator node or is it just a witness and just waits for 
 the
 transfer to happen directly between the client and required right nodes ?

 2. How do you load-balance between the different nodes to give all equal
 chance to become co-ordinator node ? Does the client need a sort of
 round-robin DNS balancer ? if so, what if some of the nodes drop off. How 
 to
 inform the DNS balancer  ?
 Or do I need a proper load balancer in front that looks at the traffic
 on each node and accordingly selects a co-ordinator node ? What is more
 prevalent?

 Thanks.



>>>
>>
>


Re: Benchmarking Cassandra with YCSB

2011-02-15 Thread Markus Klems
Good point. When we looked at the EC2 nodes, we measured 120% CPU utilization 
or so. We interpreted this as a false representation of CPU utilization on a 
multi-core machine. Our EC2 nodes have 8 virtual cores each.

Maybe Cassandra 0.6.5 is not so good with execution on multi-core systems?

On 15.02.2011, at 20:59, Thibaut Britz  wrote:

> Cassandra is very CPU hungry so you might be hitting a CPU bottleneck.
> What's your CPU usage during these tests?
> 
> 
> On Tue, Feb 15, 2011 at 8:45 PM, Markus Klems  wrote:
>> Hi there,
>> 
>> we are currently benchmarking a Cassandra 0.6.5 cluster with 3
>> High-Mem Quadruple Extra Large EC2 nodes
>> (http://aws.amazon.com/ec2/#instance) using Yahoo's YCSB tool
>> (replication factor is 3, random partitioner). We assigned 32 GB RAM
>> to the JVM and left 32 GB RAM for the Ubuntu Linux filesystem buffer.
>> We also set the user count to a very large number via ulimit -u
>> 99.
>> 
>> Our goal is to achieve max throughput by increasing YCSB's threadcount
>> parameter (i.e. the number of parallel benchmarking client threads).
>> However, this only improves Cassandra throughput for low numbers
>> of threads. If we move to higher threadcounts, throughput does not
>> increase and even decreases. Do you have any idea why this is
>> happening, and possibly suggestions for how to scale throughput to much
>> higher numbers? Why is throughput hitting a wall, anyway? And where
>> does the latency/throughput tradeoff come from?
>> 
>> Here is our YCSB configuration:
>> recordcount=30
>> operationcount=100
>> workload=com.yahoo.ycsb.workloads.CoreWorkload
>> readallfields=true
>> readproportion=0.5
>> updateproportion=0.5
>> scanproportion=0
>> insertproportion=0
>> threadcount= 500
>> target = 1
>> hosts=EC2-1,EC2-2,EC2-3
>> requestdistribution=uniform
>> 
>> These are typical results for threadcount=1:
>> Loading workload...
>> Starting test.
>>  0 sec: 0 operations;
>>  10 sec: 11733 operations; 1168.28 current ops/sec; [UPDATE
>> AverageLatency(ms)=0.64] [READ AverageLatency(ms)=1.03]
>>  20 sec: 24246 operations; 1251.68 current ops/sec; [UPDATE
>> AverageLatency(ms)=0.48] [READ AverageLatency(ms)=1.11]
>> 
>> These are typical results for threadcount=10:
>> 10 sec: 30428 operations; 3029.77 current ops/sec; [UPDATE
>> AverageLatency(ms)=2.11] [READ AverageLatency(ms)=4.32]
>>  20 sec: 60838 operations; 3041.91 current ops/sec; [UPDATE
>> AverageLatency(ms)=2.15] [READ AverageLatency(ms)=4.37]
>> 
>> These are typical results for threadcount=100:
>> 10 sec: 29070 operations; 2895.42 current ops/sec; [UPDATE
>> AverageLatency(ms)=20.53] [READ AverageLatency(ms)=44.91]
>>  20 sec: 53621 operations; 2455.84 current ops/sec; [UPDATE
>> AverageLatency(ms)=23.11] [READ AverageLatency(ms)=55.39]
>> 
>> These are typical results for threadcount=500:
>> 10 sec: 30655 operations; 3053.59 current ops/sec; [UPDATE
>> AverageLatency(ms)=72.71] [READ AverageLatency(ms)=187.19]
>>  20 sec: 68846 operations; 3814.14 current ops/sec; [UPDATE
>> AverageLatency(ms)=65.36] [READ AverageLatency(ms)=191.75]
>> 
>> We never measured more than ~6000 ops/sec. Are there ways to tune
>> Cassandra that we are not aware of? We made some modifications to the
>> Cassandra 0.6.5 core for experimental reasons, so it's not easy to
>> switch to 0.7.x or 0.8.x. However, if it might solve the scaling
>> issues, we might consider porting our modifications to a newer
>> Cassandra version...
>> 
>> Thanks,
>> 
>> Markus Klems
>> 
>> Karlsruhe Institute of Technology, Germany
>> 
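The wall described above is roughly what Little's law predicts once the cluster saturates: throughput ≈ concurrent requests / average latency, so beyond the saturation point extra client threads only add latency. A quick sanity check against the latencies reported in the post (plain Python; the 50/50 read/update averaging is an assumption based on the workload mix):

```python
# Little's law sanity check: throughput ~= threads / avg_latency.
# Latencies are the measured UPDATE/READ averages quoted above.

def predicted_ops(threads, avg_latency_ms):
    return threads / (avg_latency_ms / 1000.0)

# threadcount=10: 50/50 mix of updates (2.15 ms) and reads (4.37 ms)
avg = 0.5 * 2.15 + 0.5 * 4.37            # 3.26 ms per op
print(round(predicted_ops(10, avg)))      # 3067 -- close to the ~3040 measured

# threadcount=500: updates 65.36 ms, reads 191.75 ms
avg = 0.5 * 65.36 + 0.5 * 191.75          # 128.555 ms per op
print(round(predicted_ops(500, avg)))     # 3889 -- close to the ~3814 measured
```

In both cases the predicted and measured throughput agree, which suggests the servers are saturated and adding threads just queues requests.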


Re: Coordinator node

2011-02-15 Thread Matthew Dennis
It doesn't write anything to the coordinator node, it just forwards it to
nodes in the replica set for that row key.

write goes to some node (coordinator, i.e. whatever node you connected to).
coordinator looks at key, determines which nodes are responsible for it.
in parallel it forwards the requests to those nodes (in the case it is in
the replica set for that key, it will write it locally in parallel with the
writes that were forwarded).
the coordinator waits until it has the appropriate number of responses to
meet your consistency level from the nodes in the replica set for the key
(possibly including itself).
the coordinator determines the correct value to send to the client based on
the responses it receives and then sends it.
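The steps above can be sketched as a toy simulation (pure Python, not Cassandra code; node names and the ack model are illustrative only):

```python
# Simulated coordinator write path: forward to the replica set in
# parallel, then wait for enough acks to satisfy the consistency level.

def coordinate_write(coordinator, replicas, acks_received, cl_required):
    # The coordinator only stores the value itself if it is a replica.
    stores_locally = coordinator in replicas
    # The write succeeds once CL replicas (possibly including the
    # coordinator) have acknowledged it.
    success = acks_received >= cl_required
    return stores_locally, success

# RF=3, QUORUM=2; coordinator "n1" is not in the replica set.
stores, ok = coordinate_write("n1", ["n2", "n3", "n4"],
                              acks_received=2, cl_required=2)
assert stores is False      # nothing is written on the coordinator
assert ok is True           # two replica acks satisfy QUORUM

# Same write through a coordinator that IS a replica.
stores, ok = coordinate_write("n2", ["n2", "n3", "n4"],
                              acks_received=1, cl_required=2)
assert stores is True       # local write happens in parallel
assert ok is False          # only one ack so far -> keep waiting
```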

On Tue, Feb 15, 2011 at 3:55 PM, A J  wrote:

> Thanks.
> 1. That is somewhat disappointing. Wish the redundancy of write on the
> coordinator node could have been avoided somehow.
> Does the write on the coordinator node (in case it is not part of the N
> replica nodes for that key) get deleted before response of the write is
> returned back to the client ?
>
>
> On Tue, Feb 15, 2011 at 4:40 PM, Matthew Dennis wrote:
>
>> 1. Yes, the coordinator node propagates requests to the correct nodes.
>>
>> 2. most (all?) higher level clients (pycassa, hector, etc) load balance
>> for you.  In general your client and/or the caller of the client needs to
>> catch exceptions and retry.  If you're using RRDNS and some of the nodes are
>> temporarily down, you wouldn't bother to update DNS; your client would just
>> route to some other node that is up after noticing the first node is down.
>>
>> In general you don't want a load balancer in front of the nodes as the
>> load balancer itself becomes a SPOF as well as a performance bottleneck (not
>> to mention the extra cost and complexity).  By far the most common setup is
>> to have the clients load balance for you, coupled with retry logic in your
>> application.
>>
>>
>> On Tue, Feb 15, 2011 at 2:45 PM, A J  wrote:
>>
>>> From my reading it seems like the node that the client connects to
>>> becomes the coordinator node. Questions:
>>>
>>> 1. Is it true that the write first happens on the coordinator node and
>>> then the coordinator node propagates it to the right primary node and the
>>> replicas ? In other words if I have a 2G write, would the 2G be transferred
>>> first to the coordinator node or is it just a witness and just waits for the
>>> transfer to happen directly between the client and required right nodes ?
>>>
>>> 2. How do you load-balance between the different nodes to give all equal
>>> chance to become co-ordinator node ? Does the client need a sort of
>>> round-robin DNS balancer ? if so, what if some of the nodes drop off. How to
>>> inform the DNS balancer  ?
>>> Or do I need a proper load balancer in front that looks at the traffic on
>>> each node and accordingly selects a co-ordinator node ? What is more
>>> prevalent?
>>>
>>> Thanks.
>>>
>>>
>>>
>>
>


Re: Coordinator node

2011-02-15 Thread A J
Thanks.
1. That is somewhat disappointing. Wish the redundancy of the write on the
coordinator node could have been avoided somehow.
Does the write on the coordinator node (in case it is not part of the N
replica nodes for that key) get deleted before the response of the write is
returned to the client?

On Tue, Feb 15, 2011 at 4:40 PM, Matthew Dennis wrote:

> 1. Yes, the coordinator node propagates requests to the correct nodes.
>
> 2. most (all?) higher level clients (pycassa, hector, etc) load balance for
> you.  In general your client and/or the caller of the client needs to catch
> exceptions and retry.  If you're using RRDNS and some of the nodes are
> temporarily down, you wouldn't bother to update DNS; your client would just
> route to some other node that is up after noticing the first node is down.
>
> In general you don't want a load balancer in front of the nodes as the load
> balancer itself becomes a SPOF as well as a performance bottleneck (not to
> mention the extra cost and complexity).  By far the most common setup is to
> have the clients load balance for you, coupled with retry logic in your
> application.
>
>
> On Tue, Feb 15, 2011 at 2:45 PM, A J  wrote:
>
>> From my reading it seems like the node that the client connects to becomes
>> the coordinator node. Questions:
>>
>> 1. Is it true that the write first happens on the coordinator node and
>> then the coordinator node propagates it to the right primary node and the
>> replicas ? In other words if I have a 2G write, would the 2G be transferred
>> first to the coordinator node or is it just a witness and just waits for the
>> transfer to happen directly between the client and required right nodes ?
>>
>> 2. How do you load-balance between the different nodes to give all equal
>> chance to become co-ordinator node ? Does the client need a sort of
>> round-robin DNS balancer ? if so, what if some of the nodes drop off. How to
>> inform the DNS balancer  ?
>> Or do I need a proper load balancer in front that looks at the traffic on
>> each node and accordingly selects a co-ordinator node ? What is more
>> prevalent?
>>
>> Thanks.
>>
>>
>>
>


Dropping & Creating Column Families Never Returns

2011-02-15 Thread Bill Speirs
Has anyone ever tried to drop a column family and/or create one and
had the command not return from the CLI? I'm using 0.7.1 and I tried
to drop a column family and the command never returned. However, on
another node it showed as gone. I Ctrl-C'd out of the command, then
issued a create for a column family of the same name with a different
schema. That command never returned either, but again another host showed
it was there. I went to describe and list this column family and got
this:

[default@Logging] describe keyspace Logging;
Keyspace: Logging:
  Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
Replication Factor: 3
  Column Families:
ColumnFamily: Messages
  Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
  Row cache size / save period: 0.0/0
  Key cache size / save period: 20.0/14400
  Memtable thresholds: 0.5953125/127/60
  GC grace seconds: 864000
  Compaction min/max thresholds: 4/32
  Read repair chance: 1.0
  Built indexes: []
[default@Logging] list Messages;
Messages not found in current keyspace.


Any ideas?

Bill-


Re: Coordinator node

2011-02-15 Thread Matthew Dennis
1. Yes, the coordinator node propagates requests to the correct nodes.

2. most (all?) higher level clients (pycassa, hector, etc) load balance for
you.  In general your client and/or the caller of the client needs to catch
exceptions and retry.  If you're using RRDNS and some of the nodes are
temporarily down, you wouldn't bother to update DNS; your client would just
route to some other node that is up after noticing the first node is down.

In general you don't want a load balancer in front of the nodes as the load
balancer itself becomes a SPOF as well as a performance bottleneck (not to
mention the extra cost and complexity).  By far the most common setup is to
have the clients load balance for you, coupled with retry logic in your
application.
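A minimal version of that client-side pattern -- round-robin over the node list plus retry on failure -- looks like this (pure Python; `send` is a stand-in for a real Thrift call and is assumed, not part of any actual client API):

```python
import itertools

# Client-side round-robin with retry: the usual substitute for putting
# a load balancer in front of Cassandra. `send` is a hypothetical
# callable standing in for a real client request.

def request_with_retry(nodes, send, max_attempts=3):
    ring = itertools.cycle(nodes)
    last_error = None
    for _ in range(max_attempts):
        node = next(ring)
        try:
            return send(node)          # this node acts as coordinator
        except ConnectionError as e:   # node down: try the next one
            last_error = e
    raise last_error

# Demo: first node is "down", second one answers.
def fake_send(node):
    if node == "db1:9160":
        raise ConnectionError("db1 down")
    return "ok from " + node

print(request_with_retry(["db1:9160", "db2:9160"], fake_send))
# -> ok from db2:9160
```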

On Tue, Feb 15, 2011 at 2:45 PM, A J  wrote:

> From my reading it seems like the node that the client connects to becomes
> the coordinator node. Questions:
>
> 1. Is it true that the write first happens on the coordinator node and then
> the coordinator node propagates it to the right primary node and the
> replicas ? In other words if I have a 2G write, would the 2G be transferred
> first to the coordinator node or is it just a witness and just waits for the
> transfer to happen directly between the client and required right nodes ?
>
> 2. How do you load-balance between the different nodes to give all equal
> chance to become co-ordinator node ? Does the client need a sort of
> round-robin DNS balancer ? if so, what if some of the nodes drop off. How to
> inform the DNS balancer  ?
> Or do I need a proper load balancer in front that looks at the traffic on
> each node and accordingly selects a co-ordinator node ? What is more
> prevalent?
>
> Thanks.
>
>
>


Re: Another EOFException

2011-02-15 Thread B. Todd Burruss
It happens when I start the node. I just tried it again; here's the
saved_caches directory:



[cassandra@kv-app02 ~]$ ls -l /data/cassandra-data/saved_caches/
total 12
-rw-rw-r-- 1 cassandra cassandra  0 Feb 15 10:36 
NotificationSystem-Events-KeyCache
-rw-rw-r-- 1 cassandra cassandra  0 Feb 15 10:36 
NotificationSystem-Msgs-KeyCache
-rw-rw-r-- 1 cassandra cassandra  0 Feb 15 10:36 
NotificationSystem-Rendered-KeyCache
-rw-rw-r-- 1 cassandra cassandra  0 Feb 15 10:36 
NotificationSystem-ScheduledMsgs-KeyCache
-rw-rw-r-- 1 cassandra cassandra  0 Feb 15 10:36 
NotificationSystem-ScheduledTimes-KeyCache
-rw-rw-r-- 1 cassandra cassandra  0 Feb 15 10:36 
NotificationSystem-SystemState-KeyCache
-rw-rw-r-- 1 cassandra cassandra  0 Feb 15 10:36 
NotificationSystem-Templates-KeyCache
-rw-rw-r-- 1 cassandra cassandra  0 Feb 15 10:36 
NotificationSystem-Transports-KeyCache
-rw-rw-r-- 1 cassandra cassandra  0 Feb 15 10:36 
Queues-EmailTransport_Pending-KeyCache
-rw-rw-r-- 1 cassandra cassandra  0 Feb 15 10:36 
Queues-EmailTransport_Waiting-KeyCache
-rw-rw-r-- 1 cassandra cassandra  0 Feb 15 10:36 
Queues-Errors_Pending-KeyCache
-rw-rw-r-- 1 cassandra cassandra  0 Feb 15 10:36 
Queues-Errors_Waiting-KeyCache
-rw-rw-r-- 1 cassandra cassandra  0 Feb 15 10:36 
Queues-MessageDescriptors-KeyCache
-rw-rw-r-- 1 cassandra cassandra  0 Feb 15 10:36 
Queues-PipeDescriptors-KeyCache
-rw-rw-r-- 1 cassandra cassandra  0 Feb 15 10:36 
Queues-Processing_Pending-KeyCache
-rw-rw-r-- 1 cassandra cassandra  0 Feb 15 10:36 
Queues-Processing_Waiting-KeyCache
-rw-rw-r-- 1 cassandra cassandra  0 Feb 15 10:36 
Queues-QueueDescriptors-KeyCache
-rw-rw-r-- 1 cassandra cassandra  0 Feb 15 10:36 
Queues-QueuePipeCnxn-KeyCache

-rw-rw-r-- 1 cassandra cassandra  0 Feb 15 10:36 Queues-QueueStats-KeyCache
-rw-rw-r-- 1 cassandra cassandra 38 Feb 15 09:36 
system-HintsColumnFamily-KeyCache

-rw-rw-r-- 1 cassandra cassandra  0 Feb 15 09:36 system-IndexInfo-KeyCache
-rw-rw-r-- 1 cassandra cassandra  5 Feb 15 09:36 
system-LocationInfo-KeyCache

-rw-rw-r-- 1 cassandra cassandra  0 Feb 15 09:36 system-Migrations-KeyCache
-rw-rw-r-- 1 cassandra cassandra 18 Feb 15 09:36 system-Schema-KeyCache
-rw-rw-r-- 1 cassandra cassandra  0 Feb 15 09:36 
UDS4Profile-ProfileDefinitions-KeyCache
-rw-rw-r-- 1 cassandra cassandra  0 Feb 15 09:36 
UDS4Profile-ProfileNamespaces-KeyCache
-rw-rw-r-- 1 cassandra cassandra  0 Feb 15 09:36 
UDS4Profile-Profiles_40229-KeyCache
-rw-rw-r-- 1 cassandra cassandra  0 Feb 15 09:36 
UDS4Profile-Profiles_RN_test-KeyCache




On 02/15/2011 01:01 PM, Jonathan Ellis wrote:

Is this reproducible or just "I happened to kill the server while it
was in the middle of writing out the cache keys?"

On Tue, Feb 15, 2011 at 1:10 PM, B. Todd Burruss  wrote:

the following exception seems to be about loading saved caches, but i don't
really care about the cache so maybe isn't a big deal.  anyway, this is with
patched 0.7.1 (0001-Fix-bad-signed-conversion-from-byte-to-int.patch)


WARN 11:07:59,800 error reading saved cache
/data/cassandra-data/saved_caches/UDS4Profile-Profiles_40229-KeyCache
java.io.EOFException
at
java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2281)
at
java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2750)
at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:780)
at java.io.ObjectInputStream.<init>(ObjectInputStream.java:280)
at
org.apache.cassandra.db.ColumnFamilyStore.readSavedCache(ColumnFamilyStore.java:255)
at
org.apache.cassandra.db.ColumnFamilyStore.<init>(ColumnFamilyStore.java:198)
at
org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:451)
at
org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:432)
at org.apache.cassandra.db.Table.initCf(Table.java:360)
at org.apache.cassandra.db.Table.<init>(Table.java:290)
at org.apache.cassandra.db.Table.open(Table.java:107)
at
org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:162)
at
org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:316)
at
org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:79)







Re: Possible EOFException regression in 0.7.1

2011-02-15 Thread Jonathan Ellis
Note that this is a read-time bug; there is no data loss involved.

Patch is committed with a new test to prevent future regressions.

I've asked Hudson (https://hudson.apache.org/hudson/job/Cassandra) to
create a new binary build with the patch included but the backlog is
long enough that I don't expect to see it before our next "nightly"
build.

We'll get 0.7.2 out quickly.
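"Read-time bug" here means the bytes on disk are intact; the reader just fails to deserialize them at the expected position, which is why the stack trace ends in EOFException from the bloom filter header read. A toy illustration of that failure mode (Python `struct` standing in for Java's `DataInputStream`; this is an analogy, not Cassandra's actual on-disk format):

```python
import io
import struct

# The bloom-filter reader expects a 4-byte int header; if the reader is
# positioned at the wrong offset, the stream "ends" early and the header
# read fails -- the analogue of DataInputStream.readInt -> EOFException.

def read_bloom_header(stream):
    raw = stream.read(4)
    if len(raw) < 4:
        raise EOFError("truncated bloom filter header")
    return struct.unpack(">i", raw)[0]

good = io.BytesIO(struct.pack(">i", 42) + b"filter-bits")
assert read_bloom_header(good) == 42       # data on disk is fine

bad = io.BytesIO(b"\x00")                  # reader seeked past the data
try:
    read_bloom_header(bad)
except EOFError:
    pass                                   # read fails; nothing was lost
```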

On Tue, Feb 15, 2011 at 12:15 PM, Sylvain Lebresne  wrote:
> On Tue, Feb 15, 2011 at 7:10 PM, ruslan usifov 
> wrote:
>>
>> It will be great if patch appear very quick
>
> patch attached here: https://issues.apache.org/jira/browse/CASSANDRA-2165
> Hoping this is quick enough.
>
>>
>> 2011/2/15 Jonathan Ellis 
>>>
>>> I can reproduce with your script.  Thanks!
>>>
>>> 2011/2/15 Jonas Borgström :
>>> > Hi all,
>>> >
>>> > While testing the new 0.7.1 release I got the following exception:
>>> >
>>> > ERROR [ReadStage:11] 2011-02-15 16:39:18,105
>>> > DebuggableThreadPoolExecutor.java (line 103) Error in
>>> > ThreadPoolExecutor
>>> > java.io.IOError: java.io.EOFException
>>> >        at
>>> >
>>> > org.apache.cassandra.db.columniterator.SSTableNamesIterator.<init>(SSTableNamesIterator.java:75)
>>> >        at
>>> >
>>> > org.apache.cassandra.db.filter.NamesQueryFilter.getSSTableColumnIterator(NamesQueryFilter.java:59)
>>> >        at
>>> >
>>> > org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:80)
>>> >        at
>>> >
>>> > org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1274)
>>> >        at
>>> >
>>> > org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1166)
>>> >        at
>>> >
>>> > org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1095)
>>> >        at org.apache.cassandra.db.Table.getRow(Table.java:384)
>>> >        at
>>> >
>>> > org.apache.cassandra.db.SliceByNamesReadCommand.getRow(SliceByNamesReadCommand.java:60)
>>> >        at
>>> >
>>> > org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:473)
>>> >        at
>>> > org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>>> >        at
>>> >
>>> > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>> >        at
>>> >
>>> > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>> >        at java.lang.Thread.run(Thread.java:636)
>>> > Caused by: java.io.EOFException
>>> >        at java.io.DataInputStream.readInt(DataInputStream.java:392)
>>> >        at
>>> >
>>> > org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:48)
>>> >        at
>>> >
>>> > org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:30)
>>> >        at
>>> >
>>> > org.apache.cassandra.io.sstable.IndexHelper.defreezeBloomFilter(IndexHelper.java:108)
>>> >        at
>>> >
>>> > org.apache.cassandra.db.columniterator.SSTableNamesIterator.read(SSTableNamesIterator.java:106)
>>> >        at
>>> >
>>> > org.apache.cassandra.db.columniterator.SSTableNamesIterator.<init>(SSTableNamesIterator.java:71)
>>> >        ... 12 more
>>> >
>>> > I'm able reliably reproduce this using the following one node cluster:
>>> > - apache-cassandra-0.7.1-bin.tar.gz
>>> > - Fedora 14
>>> > - java version "1.6.0_20".
>>> >  OpenJDK 64-Bit Server VM (build 19.0-b09, mixed mode)
>>> > - Default cassandra.yaml
>>> > - cassandra-env.sh: MAX_HEAP_SIZE="1G"; HEAP_NEWSIZE="200M"
>>> >
>>> > cassandra-cli initialization:
>>> > - create keyspace foo;
>>> > - use foo;
>>> > - create column family datasets;
>>> >
>>> > $ python dataset_check.py (attached)
>>> > Inserting row 0 of 10
>>> > Inserting row 1 of 10
>>> > Inserting row 2 of 10
>>> > Inserting row 3 of 10
>>> > Inserting row 4 of 10
>>> > Inserting row 5 of 10
>>> > Inserting row 6 of 10
>>> > Inserting row 7 of 10
>>> > Inserting row 8 of 10
>>> > Inserting row 9 of 10
>>> > Attempting to fetch key 0
>>> > Traceback (most recent call last):
>>> > ...
>>> > pycassa.pool.MaximumRetryException: Retried 6 times
>>> >
>>> > After this I have 6 EOFExceptions in system.log.
>>> > Running "get datasets[0]['name'];" using cassandra-cli also triggers
>>> > the
>>> > same exception.
>>> > I've not been able to reproduce this with cassandra 0.7.0.
>>> >
>>> > Regards,
>>> > Jonas
>>> >
>>> >
>>> >
>>>
>>>
>>>
>>> --
>>> Jonathan Ellis
>>> Project Chair, Apache Cassandra
>>> co-founder of DataStax, the source for professional Cassandra support
>>> http://www.datastax.com
>>
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Another EOFException

2011-02-15 Thread Jonathan Ellis
Is this reproducible or just "I happened to kill the server while it
was in the middle of writing out the cache keys?"

On Tue, Feb 15, 2011 at 1:10 PM, B. Todd Burruss  wrote:
> the following exception seems to be about loading saved caches, but i don't
> really care about the cache so maybe isn't a big deal.  anyway, this is with
> patched 0.7.1 (0001-Fix-bad-signed-conversion-from-byte-to-int.patch)
>
>
> WARN 11:07:59,800 error reading saved cache
> /data/cassandra-data/saved_caches/UDS4Profile-Profiles_40229-KeyCache
> java.io.EOFException
>    at
> java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2281)
>    at
> java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2750)
>    at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:780)
>    at java.io.ObjectInputStream.<init>(ObjectInputStream.java:280)
>    at
> org.apache.cassandra.db.ColumnFamilyStore.readSavedCache(ColumnFamilyStore.java:255)
>    at
> org.apache.cassandra.db.ColumnFamilyStore.<init>(ColumnFamilyStore.java:198)
>    at
> org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:451)
>    at
> org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:432)
>    at org.apache.cassandra.db.Table.initCf(Table.java:360)
>    at org.apache.cassandra.db.Table.<init>(Table.java:290)
>    at org.apache.cassandra.db.Table.open(Table.java:107)
>    at
> org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:162)
>    at
> org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:316)
>    at
> org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:79)
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: What if write consistency level cannot be met ?

2011-02-15 Thread Aaron Morton
Here's my understandingThe request will not start if CL nodes are not up from the point of view of the coordinator (when considering a single mutation). I the case described where the CL is ALL, the write would not start and UnavailableException would be thrown This comes from IWriteResponseHandler.assureSufficientNodes() . Same thing happens on reads, we're not going to start a request we know cannot succeed.if CL nodes are up and the write starts, but less then CL nodes return by rpc_timeout the AbstractWriteResponseHandler ( or ReadCallback for reads) will raise a TimeoutException,  oa.c.t.CassandraServer is will catch this and turn it into the thrift TimedoutException. So in general:- UnavailableException means the request could not start (insufficient CL, or node bootstrapping or any node down when doing schema ops)- TimedoutException means we started and timed out. In both cases you can retry. AaronOn 16 Feb, 2011,at 09:11 AM, Matthew Dennis  wrote:But you can not depend on such behavior.  If you do a write and you get an unavailable exception, the only thing you know is at that time it was not able to be placed on all the nodes required to meet your CL.  It may eventually end up on all those nodes, it may not be on any of the nodes or at the time you read the reply it could be on some subset of the nodes.  Regardless of the outcome, it is safe to retry.
On Tue, Feb 15, 2011 at 2:00 PM, Aaron Morton  wrote:
The write will not start if there are insufficient nodes up. In this case (All cl) you would get an error and nothing would be committed to disk. You would get an Unavailable exception.

Aaron

On 16/02/2011, at 7:46 AM, Thibaut Britz  wrote:

> Your write will fail. But if the write has reached  at least one node,
> it will eventually reach all the other nodes as well. So it won't
> rollback.
>
>
> On Tue, Feb 15, 2011 at 7:38 PM, A J  wrote:
>> Say I set write consistency level to ALL and all but one node are down. What
>> happens to writes ? Does it rollback from the live node before returning
>> failure to client ?
>> Thanks.
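The distinction drawn in this thread (UnavailableException means the request never started; TimedOutException means it started but did not get CL acks in time) translates directly into client-side retry logic. Below is a minimal Python sketch of that logic, using stand-in exception classes rather than any particular client library's types; pycassa and Hector expose their own equivalents.

```python
import time

# Stand-ins for the Thrift exceptions; with a real client you would catch
# the library's own UnavailableException / TimedOutException types instead.
class UnavailableException(Exception):
    """Raised before the write starts: fewer than CL replica nodes were up."""

class TimedOutException(Exception):
    """Raised after the write started: fewer than CL replicas acked in time."""

def write_with_retry(do_write, retries=3, backoff_s=0.1):
    """Retry a write on either exception. Both are safe to retry because
    Cassandra writes are idempotent: replaying the same mutation with the
    same timestamp cannot make the data wrong."""
    for attempt in range(retries):
        try:
            return do_write()
        except (UnavailableException, TimedOutException):
            if attempt == retries - 1:
                raise
            time.sleep(backoff_s * (2 ** attempt))  # simple exponential backoff
```

Note that after a TimedOutException the write may already be on some subset of the replicas, as pointed out above; the retry is still safe because it just re-asserts the same value.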



Re: Possible EOFException regression in 0.7.1

2011-02-15 Thread Jonathan Ellis
This bug was not in 0.7.0, but it's certainly possible that other
ByteBuffer-related bugs were.

On Tue, Feb 15, 2011 at 1:00 PM, Dan Hendry  wrote:
> I have been having plenty of problems (on 0.7.0,
> http://www.mail-archive.com/user@cassandra.apache.org/msg09341.html,
> http://www.mail-archive.com/user@cassandra.apache.org/msg09230.html,
> http://www.mail-archive.com/user@cassandra.apache.org/msg09122.html,
> http://www.mail-archive.com/dev@cassandra.apache.org/msg01746.html, and from
> others:
> http://www.mail-archive.com/user@cassandra.apache.org/msg09838.html,) which
> are very similar to what was reported and apparently fixed for this case. In
> my instance, I have not been able to find a reproducible case, but it's not
> all that feasible to log what is going into my nodes. Could this bug have
> existed in 0.7.0 in another form or could this problem occur elsewhere in
> the code?
>
>
>
> Dan
>
>
>
> From: Sylvain Lebresne [mailto:sylv...@datastax.com]
> Sent: February-15-11 13:15
> To: user@cassandra.apache.org
> Subject: Re: Possible EOFException regression in 0.7.1
>
>
>
> On Tue, Feb 15, 2011 at 7:10 PM, ruslan usifov 
> wrote:
>
> It would be great if the patch appeared very quickly
>
>
>
> patch attached here: https://issues.apache.org/jira/browse/CASSANDRA-2165
>
>
>
> Hoping this is quick enough.
>
>
>
>
>
> 2011/2/15 Jonathan Ellis 
>
>
>
> I can reproduce with your script.  Thanks!
>
> 2011/2/15 Jonas Borgström :
>
>> Hi all,
>>
>> While testing the new 0.7.1 release I got the following exception:
>>
>> ERROR [ReadStage:11] 2011-02-15 16:39:18,105
>> DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
>> java.io.IOError: java.io.EOFException
>>        at
>>
>> org.apache.cassandra.db.columniterator.SSTableNamesIterator.<init>(SSTableNamesIterator.java:75)
>>        at
>>
>> org.apache.cassandra.db.filter.NamesQueryFilter.getSSTableColumnIterator(NamesQueryFilter.java:59)
>>        at
>>
>> org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:80)
>>        at
>>
>> org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1274)
>>        at
>>
>> org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1166)
>>        at
>>
>> org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1095)
>>        at org.apache.cassandra.db.Table.getRow(Table.java:384)
>>        at
>>
>> org.apache.cassandra.db.SliceByNamesReadCommand.getRow(SliceByNamesReadCommand.java:60)
>>        at
>>
>> org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:473)
>>        at
>> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>>        at
>>
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>        at
>>
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>        at java.lang.Thread.run(Thread.java:636)
>> Caused by: java.io.EOFException
>>        at java.io.DataInputStream.readInt(DataInputStream.java:392)
>>        at
>>
>> org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:48)
>>        at
>>
>> org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:30)
>>        at
>>
>> org.apache.cassandra.io.sstable.IndexHelper.defreezeBloomFilter(IndexHelper.java:108)
>>        at
>>
>> org.apache.cassandra.db.columniterator.SSTableNamesIterator.read(SSTableNamesIterator.java:106)
>>        at
>>
>> org.apache.cassandra.db.columniterator.SSTableNamesIterator.<init>(SSTableNamesIterator.java:71)
>>        ... 12 more
>>
>> I'm able to reliably reproduce this using the following one-node cluster:
>> - apache-cassandra-0.7.1-bin.tar.gz
>> - Fedora 14
>> - java version "1.6.0_20".
>>  OpenJDK 64-Bit Server VM (build 19.0-b09, mixed mode)
>> - Default cassandra.yaml
>> - cassandra-env.sh: MAX_HEAP_SIZE="1G"; HEAP_NEWSIZE="200M"
>>
>> cassandra-cli initialization:
>> - create keyspace foo;
>> - use foo;
>> - create column family datasets;
>>
>> $ python dataset_check.py (attached)
>> Inserting row 0 of 10
>> Inserting row 1 of 10
>> Inserting row 2 of 10
>> Inserting row 3 of 10
>> Inserting row 4 of 10
>> Inserting row 5 of 10
>> Inserting row 6 of 10
>> Inserting row 7 of 10
>> Inserting row 8 of 10
>> Inserting row 9 of 10
>> Attempting to fetch key 0
>> Traceback (most recent call last):
>> ...
>> pycassa.pool.MaximumRetryException: Retried 6 times
>>
>> After this I have 6 EOFExceptions in system.log.
>> Running "get datasets[0]['name'];" using cassandra-cli also triggers the
>> same exception.
>> I've not been able to reproduce this with cassandra 0.7.0.
>>
>> Regards,
>> Jonas
>>
>>
>>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>
>
>
>
>

Coordinator node

2011-02-15 Thread A J
From my reading it seems like the node that the client connects to becomes
the coordinator node. Questions:

1. Is it true that the write first happens on the coordinator node and then
the coordinator node propagates it to the right primary node and the
replicas ? In other words, if I have a 2G write, would the 2G be transferred
first to the coordinator node, or is it just a witness that waits for the
transfer to happen directly between the client and the required nodes ?

2. How do you load-balance between the different nodes to give them all an equal
chance to become the coordinator node ? Does the client need a sort of
round-robin DNS balancer ? If so, what if some of the nodes drop off ? How do you
inform the DNS balancer ?
Or do I need a proper load balancer in front that looks at the traffic on
each node and accordingly selects a coordinator node ? What is more
prevalent ?

Thanks.
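On question 2, the common approach is client-side balancing rather than a DNS or hardware balancer: the client keeps the full host list, rotates through it, and skips dead nodes. A rough sketch of the idea follows; this is not any particular client's API, and Hector and pycassa ship their own connection pooling that works along these lines.

```python
import itertools

class RoundRobinPool:
    """Minimal sketch of client-side coordinator selection: rotate through
    the known nodes, skipping ones that are down. Whichever node accepts
    the request acts as the coordinator for it."""

    def __init__(self, hosts):
        self._cycle = itertools.cycle(hosts)
        self._count = len(hosts)

    def execute(self, request):
        last_err = None
        for _ in range(self._count):      # try each node at most once
            host = next(self._cycle)
            try:
                return request(host)
            except ConnectionError as e:  # node down: move on to the next one
                last_err = e
        raise last_err                    # every node failed
```

On question 1: with the Thrift protocol the client sends the full mutation to the coordinator, which then forwards it to the replicas, so a 2G write would pass through the coordinator rather than go directly to the replica nodes (and a value that large is well beyond what a single Thrift call is designed for).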


Re: Update of value for a given name

2011-02-15 Thread Tyler Hobbs
This may help clear things up for you:

http://wiki.apache.org/cassandra/MemtableSSTable

-- 
Tyler Hobbs
Software Engineer, DataStax 
Maintainer of the pycassa  Cassandra
Python client library


Update of value for a given name

2011-02-15 Thread A J
If I update a column (i.e. change the value contents for a given name in a
given key), is the physical disk operation equivalent to a delete followed by
an insert?
Or is it just an insert that somehow marks the last value as stale ?

In the definite guide, it says the following about SSTable:
*All writes are sequential, which is the primary reason that writes perform
so well in Cassandra. No reads or seeks of any kind are required for writing
a value to Cassandra because all writes are append operations.*
It made me wonder how updates work, if it is append only.

I am new to cassandra so please let me know if I am missing something basic.

Regards,
-AJ
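To make the append-only point concrete: an update is not a delete-plus-insert on disk. It is simply another append of (value, timestamp); the read path reconciles duplicate versions by taking the one with the highest timestamp, and compaction later discards the obsolete copies. Here is a toy model of that behavior, an illustration only and not Cassandra's actual code:

```python
class AppendOnlyStore:
    """Toy model of last-write-wins over append-only storage."""

    def __init__(self):
        self._log = []  # append-only: (key, column, value, timestamp)

    def write(self, key, column, value, timestamp):
        # An "update" is just another append; nothing is sought or deleted.
        self._log.append((key, column, value, timestamp))

    def read(self, key, column):
        # The read path reconciles: the highest timestamp wins.
        versions = [(ts, v) for k, c, v, ts in self._log
                    if k == key and c == column]
        return max(versions)[1] if versions else None

    def compact(self):
        # Compaction discards obsolete versions, reclaiming the space.
        latest = {}
        for k, c, v, ts in self._log:
            if (k, c) not in latest or ts > latest[(k, c)][0]:
                latest[(k, c)] = (ts, v)
        self._log = [(k, c, v, ts) for (k, c), (ts, v) in latest.items()]
```

So writes stay sequential appends; the cost of reconciliation is paid at read and compaction time instead.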


Re: What if write consistency level cannot be met ?

2011-02-15 Thread Matthew Dennis
But you can not depend on such behavior.  If you do a write and you get an
unavailable exception, the only thing you know is at that time it was not
able to be placed on all the nodes required to meet your CL.  It may
eventually end up on all those nodes, it may not be on any of the nodes or
at the time you read the reply it could be on some subset of the nodes.
Regardless of the outcome, it is safe to retry.

On Tue, Feb 15, 2011 at 2:00 PM, Aaron Morton wrote:

> The write will not start if there are insufficient nodes up. In this case
> (All cl) you would get an error and nothing would be committed to disk. You
> would get an Unavailable exception.
>
> Aaron
>
> On 16/02/2011, at 7:46 AM, Thibaut Britz 
> wrote:
>
> > Your write will fail. But if the write has reached  at least one node,
> > it will eventually reach all the other nodes as well. So it won't
> > rollback.
> >
> >
> > On Tue, Feb 15, 2011 at 7:38 PM, A J  wrote:
> >> Say I set write consistency level to ALL and all but one node are down.
> What
> >> happens to writes ? Does it rollback from the live node before returning
> >> failure to client ?
> >> Thanks.
>


Re: What is the most solid version of Cassandra? No secondary indexes needed.

2011-02-15 Thread Edward Capriolo
On Tue, Feb 15, 2011 at 3:03 PM, buddhasystem  wrote:
>
> Thank you! It's just that 0.7.1 seems to be the bleeding edge now (a serious bug
> fixed today). Would you still trust it as a production-level service? I'm
> just slightly concerned. I don't want to create a perception among our IT
> that the product is not ready for prime time.
> --
> View this message in context: 
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/What-is-the-most-solid-version-of-Cassandra-No-secondary-indexes-needed-tp6028966p6029047.html
> Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
> Nabble.com.
>

You are not going to want to go through the 0.6.X API to 0.7.0 API
migration. I am still happily running 0.6.8, but I know I need the
features in 0.7.X. If I were starting today I would go with the 0.7.X
branch and be ready to do some minor updates in the next couple of
months.


Re: Benchmarking Cassandra with YCSB

2011-02-15 Thread Aaron Morton
Initial thoughts are that you are overloading the cluster; are there any log lines 
about dropped messages?

What is the schema, what settings do you have in cassandra.yaml, and what are 
the CF stats telling you? E.g. are you switching Memtables too quickly? What are 
the write latency numbers?

Also 0.7 is much faster.

Aaron

On 16/02/2011, at 8:59 AM, Thibaut Britz  wrote:

> Cassandra is very CPU hungry so you might be hitting a CPU bottleneck.
> What's your CPU usage during these tests?
> 
> 
> On Tue, Feb 15, 2011 at 8:45 PM, Markus Klems  wrote:
>> Hi there,
>> 
>> we are currently benchmarking a Cassandra 0.6.5 cluster with 3
>> High-Mem Quadruple Extra Large EC2 nodes
>> (http://aws.amazon.com/ec2/#instance) using Yahoo's YCSB tool
>> (replication factor is 3, random partitioner). We assigned 32 GB RAM
>> to the JVM and left 32 GB RAM for the Ubuntu Linux filesystem buffer.
>> We also set the user count to a very large number via ulimit -u
>> 99.
>> 
>> Our goal is to achieve max throughput by increasing YCSB's threadcount
>> parameter (i.e. the number of parallel benchmarking client threads).
>> However, this does only improve Cassandra throughput for low numbers
>> of threads. If we move to higher threadcounts, throughput does not
>> increase and even  decreases. Do you have any idea why this is
>> happening and possibly suggestions how to scale throughput to much
>> higher numbers? Why is throughput hitting a wall, anyways? And where
>> does the latency/throughput tradeoff come from?
>> 
>> Here is our YCSB configuration:
>> recordcount=30
>> operationcount=100
>> workload=com.yahoo.ycsb.workloads.CoreWorkload
>> readallfields=true
>> readproportion=0.5
>> updateproportion=0.5
>> scanproportion=0
>> insertproportion=0
>> threadcount= 500
>> target = 1
>> hosts=EC2-1,EC2-2,EC2-3
>> requestdistribution=uniform
>> 
>> These are typical results for threadcount=1:
>> Loading workload...
>> Starting test.
>>  0 sec: 0 operations;
>>  10 sec: 11733 operations; 1168.28 current ops/sec; [UPDATE
>> AverageLatency(ms)=0.64] [READ AverageLatency(ms)=1.03]
>>  20 sec: 24246 operations; 1251.68 current ops/sec; [UPDATE
>> AverageLatency(ms)=0.48] [READ AverageLatency(ms)=1.11]
>> 
>> These are typical results for threadcount=10:
>> 10 sec: 30428 operations; 3029.77 current ops/sec; [UPDATE
>> AverageLatency(ms)=2.11] [READ AverageLatency(ms)=4.32]
>>  20 sec: 60838 operations; 3041.91 current ops/sec; [UPDATE
>> AverageLatency(ms)=2.15] [READ AverageLatency(ms)=4.37]
>> 
>> These are typical results for threadcount=100:
>> 10 sec: 29070 operations; 2895.42 current ops/sec; [UPDATE
>> AverageLatency(ms)=20.53] [READ AverageLatency(ms)=44.91]
>>  20 sec: 53621 operations; 2455.84 current ops/sec; [UPDATE
>> AverageLatency(ms)=23.11] [READ AverageLatency(ms)=55.39]
>> 
>> These are typical results for threadcount=500:
>> 10 sec: 30655 operations; 3053.59 current ops/sec; [UPDATE
>> AverageLatency(ms)=72.71] [READ AverageLatency(ms)=187.19]
>>  20 sec: 68846 operations; 3814.14 current ops/sec; [UPDATE
>> AverageLatency(ms)=65.36] [READ AverageLatency(ms)=191.75]
>> 
>> We never measured more than ~6000 ops/sec. Are there ways to tune
>> Cassandra that we are not aware of? We made some modification to the
>> Cassandra 0.6.5 core for experimental reasons, so it's not easy to
>> switch to 0.7x or 0.8x. However, if this might solve the scaling
>> issues, we might consider to port our modifications to a newer
>> Cassandra version...
>> 
>> Thanks,
>> 
>> Markus Klems
>> 
>> Karlsruhe Institute of Technology, Germany
>> 
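The throughput wall in the numbers above is what a saturated server looks like through Little's law: throughput = concurrent requests / mean latency. Once the cluster is at capacity, adding client threads only inflates latency while throughput stays flat. A quick back-of-envelope check against the figures quoted in the thread, assuming the 50/50 read/update mix so mean latency is the average of the two reported latencies:

```python
def predicted_ops(threads, read_ms, update_ms):
    """Little's law: throughput = concurrency / mean latency (50/50 mix)."""
    mean_latency_s = (read_ms + update_ms) / 2.0 / 1000.0
    return threads / mean_latency_s

# (threads, read ms, update ms, measured ops/sec) taken from the post
runs = [
    (10, 4.37, 2.15, 3041.91),
    (100, 55.39, 23.11, 2455.84),
    (500, 191.75, 65.36, 3814.14),
]
for threads, read_ms, update_ms, measured in runs:
    predicted = predicted_ops(threads, read_ms, update_ms)
    print(threads, round(predicted), measured)
```

The predictions land within a few percent of the measured throughput at every thread count, which says the extra threads are just queueing: the bottleneck is the cluster (check CPU, dropped messages, and GC), not the client.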


Re: What is the most solid version of Cassandra? No secondary indexes needed.

2011-02-15 Thread Javier Canillas
We have been running a 0.6.3 with some custom features for more than 1 month
and it has been running fine. We are planning on moving to 0.7.1 in about 1
month from now, if it passes our stress tests.

If you are really going from scratch to a production environment, I would
definitely go with 0.7.1 after some heavy stress testing; it will give you a
better migration path to any new version.

Javier Canillas

On Tue, Feb 15, 2011 at 5:03 PM, buddhasystem  wrote:

>
> Thank you! It's just that 0.7.1 seems to be the bleeding edge now (a serious bug
> fixed today). Would you still trust it as a production-level service? I'm
> just slightly concerned. I don't want to create a perception among our IT
> that the product is not ready for prime time.
> --
> View this message in context:
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/What-is-the-most-solid-version-of-Cassandra-No-secondary-indexes-needed-tp6028966p6029047.html
> Sent from the cassandra-u...@incubator.apache.org mailing list archive at
> Nabble.com.
>


Re: What if write consistency level cannot be met ?

2011-02-15 Thread Aaron Morton
The write will not start if there are insufficient nodes up. In this case (CL 
ALL) you would get an error and nothing would be committed to disk. You would 
get an UnavailableException.

Aaron

On 16/02/2011, at 7:46 AM, Thibaut Britz  wrote:

> Your write will fail. But if the write has reached  at least one node,
> it will eventually reach all the other nodes as well. So it won't
> rollback.
> 
> 
> On Tue, Feb 15, 2011 at 7:38 PM, A J  wrote:
>> Say I set write consistency level to ALL and all but one node are down. What
>> happens to writes ? Does it rollback from the live node before returning
>> failure to client ?
>> Thanks.


Re: What is the most solid version of Cassandra? No secondary indexes needed.

2011-02-15 Thread buddhasystem

Thank you! It's just that 0.7.1 seems to be the bleeding edge now (a serious bug
fixed today). Would you still trust it as a production-level service? I'm
just slightly concerned. I don't want to create a perception among our IT
that the product is not ready for prime time.
-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/What-is-the-most-solid-version-of-Cassandra-No-secondary-indexes-needed-tp6028966p6029047.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: What is the most solid version of Cassandra? No secondary indexes needed.

2011-02-15 Thread Matthew Dennis
0.7.1 is what I would go with right now.  It's likely you'll eventually have
to upgrade that as well, but moving to other 0.7.x releases should be fairly
painless.  Most development is happening on the 0.7 releases, which already
have lots of fixes over the 0.6 series (not to mention performance
improvements and better logging in general).

On Tue, Feb 15, 2011 at 1:40 PM, buddhasystem  wrote:

>
> Hello,
>
> we are acquiring new hardware for our cluster and will be installing it
> soon. It's likely that I won't need to rely on secondary index
> functionality, as data will be write-once read-many and I can get away with
> inverse index creation at load time, plus I have some more complex indexing
> in mind than comes packaged (too much to explain here).
>
> So, if I don't need indexes, what is the most stable, reliable version of
> Cassandra that I can put in production? I'm seeing bug reports here and
> some
> sound quite serious, I just want something that works day in, day out.
>
> Thank you,
>
> Maxim
>
> --
> View this message in context:
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/What-is-the-most-solid-version-of-Cassandra-No-secondary-indexes-needed-tp6028966p6028966.html
> Sent from the cassandra-u...@incubator.apache.org mailing list archive at
> Nabble.com.
>


Re: Benchmarking Cassandra with YCSB

2011-02-15 Thread Thibaut Britz
Cassandra is very CPU hungry so you might be hitting a CPU bottleneck.
What's your CPU usage during these tests?


On Tue, Feb 15, 2011 at 8:45 PM, Markus Klems  wrote:
> Hi there,
>
> we are currently benchmarking a Cassandra 0.6.5 cluster with 3
> High-Mem Quadruple Extra Large EC2 nodes
> (http://aws.amazon.com/ec2/#instance) using Yahoo's YCSB tool
> (replication factor is 3, random partitioner). We assigned 32 GB RAM
> to the JVM and left 32 GB RAM for the Ubuntu Linux filesystem buffer.
> We also set the user count to a very large number via ulimit -u
> 99.
>
> Our goal is to achieve max throughput by increasing YCSB's threadcount
> parameter (i.e. the number of parallel benchmarking client threads).
> However, this does only improve Cassandra throughput for low numbers
> of threads. If we move to higher threadcounts, throughput does not
> increase and even  decreases. Do you have any idea why this is
> happening and possibly suggestions how to scale throughput to much
> higher numbers? Why is throughput hitting a wall, anyways? And where
> does the latency/throughput tradeoff come from?
>
> Here is our YCSB configuration:
> recordcount=30
> operationcount=100
> workload=com.yahoo.ycsb.workloads.CoreWorkload
> readallfields=true
> readproportion=0.5
> updateproportion=0.5
> scanproportion=0
> insertproportion=0
> threadcount= 500
> target = 1
> hosts=EC2-1,EC2-2,EC2-3
> requestdistribution=uniform
>
> These are typical results for threadcount=1:
> Loading workload...
> Starting test.
>  0 sec: 0 operations;
>  10 sec: 11733 operations; 1168.28 current ops/sec; [UPDATE
> AverageLatency(ms)=0.64] [READ AverageLatency(ms)=1.03]
>  20 sec: 24246 operations; 1251.68 current ops/sec; [UPDATE
> AverageLatency(ms)=0.48] [READ AverageLatency(ms)=1.11]
>
> These are typical results for threadcount=10:
> 10 sec: 30428 operations; 3029.77 current ops/sec; [UPDATE
> AverageLatency(ms)=2.11] [READ AverageLatency(ms)=4.32]
>  20 sec: 60838 operations; 3041.91 current ops/sec; [UPDATE
> AverageLatency(ms)=2.15] [READ AverageLatency(ms)=4.37]
>
> These are typical results for threadcount=100:
> 10 sec: 29070 operations; 2895.42 current ops/sec; [UPDATE
> AverageLatency(ms)=20.53] [READ AverageLatency(ms)=44.91]
>  20 sec: 53621 operations; 2455.84 current ops/sec; [UPDATE
> AverageLatency(ms)=23.11] [READ AverageLatency(ms)=55.39]
>
> These are typical results for threadcount=500:
> 10 sec: 30655 operations; 3053.59 current ops/sec; [UPDATE
> AverageLatency(ms)=72.71] [READ AverageLatency(ms)=187.19]
>  20 sec: 68846 operations; 3814.14 current ops/sec; [UPDATE
> AverageLatency(ms)=65.36] [READ AverageLatency(ms)=191.75]
>
> We never measured more than ~6000 ops/sec. Are there ways to tune
> Cassandra that we are not aware of? We made some modification to the
> Cassandra 0.6.5 core for experimental reasons, so it's not easy to
> switch to 0.7x or 0.8x. However, if this might solve the scaling
> issues, we might consider to port our modifications to a newer
> Cassandra version...
>
> Thanks,
>
> Markus Klems
>
> Karlsruhe Institute of Technology, Germany
>


Re: online chat scenario

2011-02-15 Thread Sasha Dolgy
Hi Aaron,

I did come across this:

http://www.juhonkoti.net/2010/09/25/example-how-to-model-your-data-into-nosql-with-cassandra

Was this what you were referring to?  I found this one interesting, and keep
coming back to it, but have some concerns about whether this is the best way to
achieve the same result.

-sd

On Tue, Feb 15, 2011 at 8:50 PM, Aaron Morton wrote:

> There was a guy here last year who did something similar and did a nice
> write-up. Cannot find it right now; some googling may help.
>
> Aaron
>
>
> On 16/02/2011, at 2:56 AM, Victor Kabdebon 
> wrote:
>
> Hello Sasha.
>
> In this sort of real time application the way you insert (QUORUM, ONE,
> etc..) and  the way you retrieve is extremely important because your data
> may not have had the time to propagate to all your nodes. Be sure to use
> adequate policies to do that : insert to a certain number of nodes but don't
> sacrifice too much time doing that to keep the real-time component.
> Here is a presentation of how the chat is made in Facebook, it may be
> useful to you :
>
>
> 
> http://www.erlang-factory.com/upload/presentations/31/EugeneLetuchy-ErlangatFacebook.pdf
>
> It's more focused on erlang, but it might give you ideas on how to deal
> with that problem (I am not sure that DB are the best way to deal with
> that... but it's just my opinion).
>
> Victor Kabdebon
> http://www.voxnucleus.fr
>
>
>
> 2011/2/15 Sasha Dolgy < sdo...@gmail.com>
>
>> thanks for the response.  thinking about this, this would not allow for
>> the sorting of messages into a chronological order for end user display.  i
>> had thought about having each message as its own column against the room or
>> the user, but i have had some inconsistencies in retrieving the data.
>> sometimes i get 3 columns, sometimes i get 50...( i think this is because of
>> the random partitioner)
>>
>> i had thought about this structure:
>>
>> [messages][nickname][message id => message data]
>> [chatrooms][room_name][message id]
>>
>> this way i can pull all messages a user ever posted, not specific to a
>> room.  what i haven't been able to do so far is print the timestamp on the
>> row or column.  does this have to be explicitly added somewhere or can it be
>> returned as part of a 'get' request?
>>
>> -sd
>>
>>
>> On Tue, Feb 15, 2011 at 2:12 PM, Michal Augustýn <
>> augustyn.mic...@gmail.com> wrote:
>>
>>> The schema design depends on chatrooms/users/messages numbers. I.e. you
>>> can have one CF, where key is chatroom, column name is username, column
>>> value is the message and message time is the same as column timestamp.
>>> You can add day-timestamp to the chatroom name to avoid large rows.
>>>
>>> Augi
>>>
>>> 2011/2/15 Andrey V. Panov < panov.a...@gmail.com>
>>>
>>> I never did it. But I suppose you can use "chatroom name" as key and
>>> store messages & nicks as columns in JSON and timestamp as columnName.
>>>
>>>
>>
>>
>> --
>> Sasha Dolgy
>> sasha.do...@gmail.com
>>
>
>


-- 
Sasha Dolgy
sasha.do...@gmail.com
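On the timestamp question in this thread: if the message-id column names are time-based (version 1) UUIDs under a TimeUUIDType comparator, the columns sort chronologically and the wall-clock time can be recovered from the UUID itself, so it never has to be stored or fetched separately. A standard-library Python sketch of the client-side part (a real client library may offer equivalent helpers):

```python
import uuid

# Offset between the UUID epoch (1582-10-15) and the Unix epoch (1970-01-01),
# measured in 100-nanosecond intervals.
UUID_EPOCH_OFFSET = 0x01B21DD213814000

def message_column_name():
    """A version-1 (time-based) UUID; under a TimeUUIDType comparator,
    columns named this way stay in chronological order."""
    return uuid.uuid1()

def timeuuid_to_unix(u):
    """Recover the Unix timestamp embedded in a v1 UUID, so the message
    time need not be stored in a separate column."""
    return (u.time - UUID_EPOCH_OFFSET) / 1e7

# e.g. per chat message: columns[message_column_name()] = json.dumps(message)
```

This fits the [chatrooms][room_name][message id] layout above: a slice of a room row comes back in time order, and each column name carries its own timestamp.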


Re: Partitioning

2011-02-15 Thread Aaron Morton
You can, using the NetworkTopologyStrategy; see http://wiki.apache.org/cassandra/Operations?highlight=(topology)|(network)#Network_topology and NetworkTopologyStrategy in the conf/cassandra.yaml file. You can control the number of replicas in each DC.

Also look at conf/cassandra-topology.properties for information on how to tell Cassandra about your network topology.

Aaron

On 16 Feb, 2011, at 05:10 AM, "RW>N"  wrote:
Hi, 
I am new to Cassandra and am evaluating it. 

Following diagram is how my setup will be: http://bit.ly/gJZlhw
Here each oval represents one data center. I want to keep N=4. i.e. four
copies of every Column Family. I want one copy in each data-center.  In
other words, COMPLETE database must be contained in each of the data
centers. 

Question: 
1. Is this possible ? If so, how do I configure (partitioner, replica etc) ? 

Thanks 

AJ

P.S excuse my multiple posting of the same. I am unable to subscribe for
some reason. 
-- 
View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Partitioning-tp6028132p6028132.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
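For reference, a keyspace definition matching the setup asked about here (N=4, one replica per data center) would look roughly like the following in the 0.7-era cassandra-cli. This is a sketch, not a tested command: the keyspace and DC names are placeholders, and the DC names must match what the snitch reports for your nodes (e.g. the DC:RACK assignments in conf/cassandra-topology.properties, which maps node IPs with lines like 192.168.1.1=DC1:RAC1).

```
create keyspace MyKeyspace
    with placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy'
    and strategy_options = [{DC1:1, DC2:1, DC3:1, DC4:1}];
```

With one replica in each of the four DCs, the total replication factor is 4 and each data center holds a complete copy of the data.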


Re: online chat scenario

2011-02-15 Thread Aaron Morton
There was a guy here last year who did something similar and did a nice 
write-up. Cannot find it right now; some googling may help.

Aaron


On 16/02/2011, at 2:56 AM, Victor Kabdebon  wrote:

> Hello Sasha.
> 
> In this sort of real time application the way you insert (QUORUM, ONE, etc..) 
> and  the way you retrieve is extremely important because your data may not 
> have had the time to propagate to all your nodes. Be sure to use adequate 
> policies to do that : insert to a certain number of nodes but don't sacrifice 
> too much time doing that to keep the real-time component.
> Here is a presentation of how the chat is made in Facebook, it may be useful 
> to you :
> 
> http://www.erlang-factory.com/upload/presentations/31/EugeneLetuchy-ErlangatFacebook.pdf
> 
> It's more focused on erlang, but it might give you ideas on how to deal with 
> that problem (I am not sure that DB are the best way to deal with that... but 
> it's just my opinion).
> 
> Victor Kabdebon
> http://www.voxnucleus.fr
> 
> 
> 
> 2011/2/15 Sasha Dolgy 
> thanks for the response.  thinking about this, this would not allow for the 
> sorting of messages into a chronological order for end user display.  i had 
> thought about having each message as its own column against the room or the 
> user, but i have had some inconsistencies in retrieving the data.  sometimes 
> i get 3 columns, sometimes i get 50...( i think this is because of the random 
> partitioner)
>  
> i had thought about this structure:
>  
> [messages][nickname][message id => message data]
> [chatrooms][room_name][message id]
>  
> this way i can pull all messages a user ever posted, not specific to a room.  
> what i haven't been able to do so far is print the timestamp on the row or 
> column.  does this have to be explicitly added somewhere or can it be 
> returned as part of a 'get' request? 
>  
> -sd
>  
>  
> On Tue, Feb 15, 2011 at 2:12 PM, Michal Augustýn  
> wrote:
> The schema design depends on chatrooms/users/messages numbers. I.e. you can 
> have one CF, where key is chatroom, column name is username, column value is 
> the message and message time is the same as column timestamp.
> You can add day-timestamp to the chatroom name to avoid large rows.
> 
> Augi
> 
> 2011/2/15 Andrey V. Panov 
> 
> I never did it. But I suppose you can use "chatroom name" as key and store 
> messages & nicks as columns in JSON and timestamp as columnName.
> 
> 
> 
> 
> -- 
> Sasha Dolgy
> sasha.do...@gmail.com
> 


Re: Backend application for Cassandra

2011-02-15 Thread Aaron Morton
Jaspersoft.com makes reporting tools that claim to work with Cassandra. Have not 
used them myself.

It will depend on what the reports are and how big your data is, though Pig may 
be the best bet.

A

On 15/02/2011, at 8:18 PM, Michal Augustýn  wrote:

> Hi,
> 
> it depends on your queries complexity - maybe secondary indexes would be 
> sufficient for you - 
> http://www.datastax.com/dev/blog/whats-new-cassandra-07-secondary-indexes
> 
> If your queries are too complex then you could use Pig (over Hadoop) - 
> http://www.slideshare.net/jeromatron/cassandrahadoop-integration
> 
> Augi
> 
> P.S.: I'm just user as you, not Cassandra developer.
> 
> 2011/2/14 cbert...@libero.it 
> Hi all,
> I've build a web application using Cassandra.
> Data are stored in order to be quickly read/sorted, to suit my web-app's needs.
> Everything is working quite good.
> Now the big "problem" is that the "other side" of my company needs to create
> reports over these data and the query they need to do would be very "heavy" in
> terms of client-side complexity.
> I'd like to know if you have any tips that may help ... I've read something
> about Kundera and Lucandra but I don't know these could be solutions ...
> 
> Did you already face problems like this? Could you suggest any valid
> product/solution?
> I've heard (team-mates) some tips like "export all your CF into a relational
> model and query it" ... and I behaved like i didn't hear it :)
> 
> TIA for any help
> 
> Best Regards
> 
> Carlo
> 


Benchmarking Cassandra with YCSB

2011-02-15 Thread Markus Klems
Hi there,

we are currently benchmarking a Cassandra 0.6.5 cluster with 3
High-Mem Quadruple Extra Large EC2 nodes
(http://aws.amazon.com/ec2/#instance) using Yahoo's YCSB tool
(replication factor is 3, random partitioner). We assigned 32 GB RAM
to the JVM and left 32 GB RAM for the Ubuntu Linux filesystem buffer.
We also set the user count to a very large number via ulimit -u
99.

Our goal is to achieve max throughput by increasing YCSB's threadcount
parameter (i.e. the number of parallel benchmarking client threads).
However, this only improves Cassandra throughput for low numbers
of threads. If we move to higher threadcounts, throughput does not
increase and even decreases. Do you have any idea why this is
happening and possibly suggestions how to scale throughput to much
higher numbers? Why is throughput hitting a wall, anyways? And where
does the latency/throughput tradeoff come from?

Here is our YCSB configuration:
recordcount=30
operationcount=100
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=true
readproportion=0.5
updateproportion=0.5
scanproportion=0
insertproportion=0
threadcount= 500
target = 1
hosts=EC2-1,EC2-2,EC2-3
requestdistribution=uniform

These are typical results for threadcount=1:
Loading workload...
Starting test.
 0 sec: 0 operations;
 10 sec: 11733 operations; 1168.28 current ops/sec; [UPDATE
AverageLatency(ms)=0.64] [READ AverageLatency(ms)=1.03]
 20 sec: 24246 operations; 1251.68 current ops/sec; [UPDATE
AverageLatency(ms)=0.48] [READ AverageLatency(ms)=1.11]

These are typical results for threadcount=10:
10 sec: 30428 operations; 3029.77 current ops/sec; [UPDATE
AverageLatency(ms)=2.11] [READ AverageLatency(ms)=4.32]
 20 sec: 60838 operations; 3041.91 current ops/sec; [UPDATE
AverageLatency(ms)=2.15] [READ AverageLatency(ms)=4.37]

These are typical results for threadcount=100:
10 sec: 29070 operations; 2895.42 current ops/sec; [UPDATE
AverageLatency(ms)=20.53] [READ AverageLatency(ms)=44.91]
 20 sec: 53621 operations; 2455.84 current ops/sec; [UPDATE
AverageLatency(ms)=23.11] [READ AverageLatency(ms)=55.39]

These are typical results for threadcount=500:
10 sec: 30655 operations; 3053.59 current ops/sec; [UPDATE
AverageLatency(ms)=72.71] [READ AverageLatency(ms)=187.19]
 20 sec: 68846 operations; 3814.14 current ops/sec; [UPDATE
AverageLatency(ms)=65.36] [READ AverageLatency(ms)=191.75]
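
These numbers are consistent with a latency-bound system: by Little's Law, sustained throughput is roughly concurrency divided by average latency, so once extra threads only add queueing latency, throughput flattens. A quick check against the figures above:

```python
# Little's Law sanity check on the reported YCSB numbers:
# throughput ~= threadcount / average latency.
def predicted_ops_per_sec(threads, read_lat_ms, update_lat_ms,
                          read_prop=0.5, update_prop=0.5):
    avg_lat_s = (read_prop * read_lat_ms + update_prop * update_lat_ms) / 1000.0
    return threads / avg_lat_s

# threadcount=10: 4.37 ms reads / 2.15 ms updates -> ~3067 ops/sec,
# close to the measured ~3042 ops/sec, i.e. the cluster is latency-bound;
# at threadcount=500 the much higher latencies predict a similar plateau.
print(round(predicted_ops_per_sec(10, 4.37, 2.15)))  # -> 3067
```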

We never measured more than ~6000 ops/sec. Are there ways to tune
Cassandra that we are not aware of? We made some modifications to the
Cassandra 0.6.5 core for experimental reasons, so it's not easy to
switch to 0.7.x or 0.8.x. However, if this might solve the scaling
issues, we might consider porting our modifications to a newer
Cassandra version...

Thanks,

Markus Klems

Karlsruhe Institute of Technology, Germany


What is the most solid version of Cassandra? No secondary indexes needed.

2011-02-15 Thread buddhasystem

Hello,

we are acquiring new hardware for our cluster and will be installing it
soon. It's likely that I won't need to rely on secondary index
functionality, as data will be write-once read-many and I can get away with
inverse index creation at load time, plus I have some more complex indexing
in mind than what comes packaged (too much to explain here).
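
For write-once read-many data, client-side inverse indexing at load time can be as simple as writing one extra index row per distinct value. A toy sketch over plain dicts (row keys and column names hypothetical; a real loader would write the index rows to a separate column family):

```python
# Build an inverse index client-side at load time: for each data row,
# record its key under the value of the indexed column.
def build_inverse_index(rows, field):
    index = {}
    for row_key, columns in rows.items():
        index.setdefault(columns[field], set()).add(row_key)
    return index

rows = {'u1': {'city': 'NYC'}, 'u2': {'city': 'LA'}, 'u3': {'city': 'NYC'}}
index = build_inverse_index(rows, 'city')
print(index['NYC'] == {'u1', 'u3'})  # -> True
```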

So, if I don't need indexes, what is the most stable, reliable version of
Cassandra that I can put in production? I'm seeing bug reports here and some
sound quite serious, I just want something that works day in, day out.

Thank you,

Maxim

-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/What-is-the-most-solid-version-of-Cassandra-No-secondary-indexes-needed-tp6028966p6028966.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Request For 0.6.12 Release

2011-02-15 Thread Aaron Morton
I worked on that ticket, will try to chase it up.


Aaron


On 15/02/2011, at 2:01 PM, Gregory Szorc  wrote:

> The latest official 0.6.x releases, 0.6.10 and 0.6.11, have a very serious 
> bug/regression when performing some quorum reads (CASSANDRA-2081), which is 
> fixed in the head of the 0.6 branch. If there aren’t any plans to cut 0.6.12 
> any time soon, as an end user, I request that an official and “blessed” 
> release of 0.6.x be made ASAP.
> 
>  
> 
> On a related note, I am frustrated that such a serious issue has lingered in 
> the “latest oldstable release.” I would have liked to see one or more of the 
> following:
> 
>  
> 
> 1)  The issue documented prominently on the apache.org web site and 
> inside the download archive so end users would know they are downloading and 
> running known-broken software
> 
> 2)  The 0.6.10 and 0.6.11 builds pulled after identification of the issue
> 
> 3)  A 0.6.12 release cut immediately (with reasonable time for testing, 
> of course) to address the issue
> 
>  
> 
> I understand that releases may not always be as stable as we all desire. But, 
> I hope that when future bugs affecting the bread and butter properties of a 
> distributed storage engine surface (especially when they are regressions) 
> that the official project response (preferably via mailing list and the web 
> site) is swift and maximizes the potential for data integrity and 
> availability.
> 
>  
> 
> If there is anything I can do to help the process, I’d gladly give some of my 
> time to help the overall community.
> 
>  
> 
> Gregory Szorc
> 
> gregory.sz...@gmail.com
> 
>  


Another EOFException

2011-02-15 Thread B. Todd Burruss
The following exception seems to be about loading saved caches, but I 
don't really care about the cache, so maybe it isn't a big deal.  Anyway, 
this is with patched 0.7.1 
(0001-Fix-bad-signed-conversion-from-byte-to-int.patch)



WARN 11:07:59,800 error reading saved cache 
/data/cassandra-data/saved_caches/UDS4Profile-Profiles_40229-KeyCache

java.io.EOFException
at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2281)
at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2750)
at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:780)
at java.io.ObjectInputStream.<init>(ObjectInputStream.java:280)
at org.apache.cassandra.db.ColumnFamilyStore.readSavedCache(ColumnFamilyStore.java:255)
at org.apache.cassandra.db.ColumnFamilyStore.<init>(ColumnFamilyStore.java:198)
at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:451)
at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:432)
at org.apache.cassandra.db.Table.initCf(Table.java:360)
at org.apache.cassandra.db.Table.<init>(Table.java:290)
at org.apache.cassandra.db.Table.open(Table.java:107)
at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:162)
at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:316)
at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:79)
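
That this is logged at WARN rather than ERROR reflects the design: a truncated or corrupt saved cache is discarded and the node simply starts with a cold cache. A Python analogue of that defensive read, using a made-up length-prefixed format (not Cassandra's actual cache file layout):

```python
import io
import struct

def read_saved_cache(raw):
    """Read length-prefixed keys; stop quietly at a truncated tail."""
    keys, stream = [], io.BytesIO(raw)
    while True:
        header = stream.read(4)
        if len(header) < 4:
            break                 # clean or truncated end of file
        (length,) = struct.unpack('>I', header)
        key = stream.read(length)
        if len(key) < length:
            break                 # truncated entry: discard it, keep the rest
        keys.append(key)
    return keys

good = b''.join(struct.pack('>I', len(k)) + k for k in (b'k1', b'k2'))
# A trailing header claiming 9 bytes with only 7 present simulates truncation.
print(read_saved_cache(good + b'\x00\x00\x00\x09partial'))  # -> [b'k1', b'k2']
```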




RE: Possible EOFException regression in 0.7.1

2011-02-15 Thread Dan Hendry
I have been having plenty of problems (on 0.7.0:
http://www.mail-archive.com/user@cassandra.apache.org/msg09341.html,
http://www.mail-archive.com/user@cassandra.apache.org/msg09230.html,
http://www.mail-archive.com/user@cassandra.apache.org/msg09122.html,
http://www.mail-archive.com/dev@cassandra.apache.org/msg01746.html, and from
others:
http://www.mail-archive.com/user@cassandra.apache.org/msg09838.html) which
are very similar to what was reported and apparently fixed for this case. In
my instance, I have not been able to find a reproducible case, but it's not
all that feasible to log what is going into my nodes. Could this bug have
existed in 0.7.0 in another form, or could this problem occur elsewhere in
the code?

 

Dan

 

From: Sylvain Lebresne [mailto:sylv...@datastax.com] 
Sent: February-15-11 13:15
To: user@cassandra.apache.org
Subject: Re: Possible EOFException regression in 0.7.1

 

On Tue, Feb 15, 2011 at 7:10 PM, ruslan usifov wrote:

It will be great if patch appear very quick

 

patch attached here: https://issues.apache.org/jira/browse/CASSANDRA-2165

 

Hoping this is quick enough.

 

 

2011/2/15 Jonathan Ellis 

 

I can reproduce with your script.  Thanks!

2011/2/15 Jonas Borgström :

> Hi all,
>
> While testing the new 0.7.1 release I got the following exception:
>
> ERROR [ReadStage:11] 2011-02-15 16:39:18,105
> DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
> java.io.IOError: java.io.EOFException
> at org.apache.cassandra.db.columniterator.SSTableNamesIterator.<init>(SSTableNamesIterator.java:75)
> at org.apache.cassandra.db.filter.NamesQueryFilter.getSSTableColumnIterator(NamesQueryFilter.java:59)
> at org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:80)
> at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1274)
> at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1166)
> at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1095)
> at org.apache.cassandra.db.Table.getRow(Table.java:384)
> at org.apache.cassandra.db.SliceByNamesReadCommand.getRow(SliceByNamesReadCommand.java:60)
> at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:473)
> at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> at java.lang.Thread.run(Thread.java:636)
> Caused by: java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:392)
> at org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:48)
> at org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:30)
> at org.apache.cassandra.io.sstable.IndexHelper.defreezeBloomFilter(IndexHelper.java:108)
> at org.apache.cassandra.db.columniterator.SSTableNamesIterator.read(SSTableNamesIterator.java:106)
> at org.apache.cassandra.db.columniterator.SSTableNamesIterator.<init>(SSTableNamesIterator.java:71)
> ... 12 more
>
> I'm able reliably reproduce this using the following one node cluster:
> - apache-cassandra-0.7.1-bin.tar.gz
> - Fedora 14
> - java version "1.6.0_20".
>  OpenJDK 64-Bit Server VM (build 19.0-b09, mixed mode)
> - Default cassandra.yaml
> - cassandra-env.sh: MAX_HEAP_SIZE="1G"; HEAP_NEWSIZE="200M"
>
> cassandra-cli initialization:
> - create keyspace foo;
> - use foo;
> - create column family datasets;
>
> $ python dataset_check.py (attached)
> Inserting row 0 of 10
> Inserting row 1 of 10
> Inserting row 2 of 10
> Inserting row 3 of 10
> Inserting row 4 of 10
> Inserting row 5 of 10
> Inserting row 6 of 10
> Inserting row 7 of 10
> Inserting row 8 of 10
> Inserting row 9 of 10
> Attempting to fetch key 0
> Traceback (most recent call last):
> ...
> pycassa.pool.MaximumRetryException: Retried 6 times
>
> After this I have 6 EOFExceptions in system.log.
> Running "get datasets[0]['name'];" using cassandra-cli also triggers the
> same exception.
> I've not been able to reproduce this with cassandra 0.7.0.
>
> Regards,
> Jonas
>
>
>




--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com

 

 




Re: What if write consistency level cannot be met ?

2011-02-15 Thread Thibaut Britz
Your write will fail. But if the write has reached at least one node,
it will eventually reach all the other nodes as well. So it won't
roll back.
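
The required replica counts behind this behaviour can be sketched as simple arithmetic (ONE needs 1 acknowledgment, QUORUM needs RF/2 + 1, ALL needs RF); a write that cannot meet its level is reported as failed, but is never undone on the replicas it already reached:

```python
# Sketch: which consistency levels a write can satisfy given live replicas.
def write_succeeds(level, replication_factor, live_replicas):
    required = {'ONE': 1,
                'QUORUM': replication_factor // 2 + 1,
                'ALL': replication_factor}[level]
    return live_replicas >= required

# RF=3 with only one node up: ALL and QUORUM fail, ONE succeeds.
# A failed ALL/QUORUM write is NOT rolled back on the node(s) it reached --
# it will eventually propagate, as described above.
print(write_succeeds('ALL', 3, 1), write_succeeds('ONE', 3, 1))  # -> False True
```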


On Tue, Feb 15, 2011 at 7:38 PM, A J  wrote:
> Say I set write consistency level to ALL and all but one node are down. What
> happens to writes ? Does it rollback from the live node before returning
> failure to client ?
> Thanks.


What if write consistency level cannot be met ?

2011-02-15 Thread A J
Say I set write consistency level to ALL and all but one node are down. What
happens to writes ? Does it rollback from the live node before returning
failure to client ?

Thanks.


Re: Possible EOFException regression in 0.7.1

2011-02-15 Thread ruslan usifov
2011/2/15 Sylvain Lebresne 

> On Tue, Feb 15, 2011 at 7:10 PM, ruslan usifov wrote:
>
>> It will be great if patch appear very quick
>>
>
> patch attached here: https://issues.apache.org/jira/browse/CASSANDRA-2165
>
>
>
Will this patch appear in a binary release, or is it better for now to stay on
version 0.7.0?


Re: Possible EOFException regression in 0.7.1

2011-02-15 Thread Sylvain Lebresne
On Tue, Feb 15, 2011 at 7:10 PM, ruslan usifov wrote:

> It will be great if patch appear very quick
>

patch attached here: https://issues.apache.org/jira/browse/CASSANDRA-2165

Hoping this is quick enough.


>
> 2011/2/15 Jonathan Ellis 
>
> I can reproduce with your script.  Thanks!
>>
>> 2011/2/15 Jonas Borgström :
>> > Hi all,
>> >
>> > While testing the new 0.7.1 release I got the following exception:
>> >
>> > ERROR [ReadStage:11] 2011-02-15 16:39:18,105
>> > DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
>> > java.io.IOError: java.io.EOFException
>> >at
>> >
>> org.apache.cassandra.db.columniterator.SSTableNamesIterator.<init>(SSTableNamesIterator.java:75)
>> >at
>> >
>> org.apache.cassandra.db.filter.NamesQueryFilter.getSSTableColumnIterator(NamesQueryFilter.java:59)
>> >at
>> >
>> org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:80)
>> >at
>> >
>> org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1274)
>> >at
>> >
>> org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1166)
>> >at
>> >
>> org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1095)
>> >at org.apache.cassandra.db.Table.getRow(Table.java:384)
>> >at
>> >
>> org.apache.cassandra.db.SliceByNamesReadCommand.getRow(SliceByNamesReadCommand.java:60)
>> >at
>> >
>> org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:473)
>> >at
>> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>> >at
>> >
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>> >at
>> >
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>> >at java.lang.Thread.run(Thread.java:636)
>> > Caused by: java.io.EOFException
>> >at java.io.DataInputStream.readInt(DataInputStream.java:392)
>> >at
>> >
>> org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:48)
>> >at
>> >
>> org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:30)
>> >at
>> >
>> org.apache.cassandra.io.sstable.IndexHelper.defreezeBloomFilter(IndexHelper.java:108)
>> >at
>> >
>> org.apache.cassandra.db.columniterator.SSTableNamesIterator.read(SSTableNamesIterator.java:106)
>> >at
>> >
>> org.apache.cassandra.db.columniterator.SSTableNamesIterator.<init>(SSTableNamesIterator.java:71)
>> >... 12 more
>> >
>> > I'm able reliably reproduce this using the following one node cluster:
>> > - apache-cassandra-0.7.1-bin.tar.gz
>> > - Fedora 14
>> > - java version "1.6.0_20".
>> >  OpenJDK 64-Bit Server VM (build 19.0-b09, mixed mode)
>> > - Default cassandra.yaml
>> > - cassandra-env.sh: MAX_HEAP_SIZE="1G"; HEAP_NEWSIZE="200M"
>> >
>> > cassandra-cli initialization:
>> > - create keyspace foo;
>> > - use foo;
>> > - create column family datasets;
>> >
>> > $ python dataset_check.py (attached)
>> > Inserting row 0 of 10
>> > Inserting row 1 of 10
>> > Inserting row 2 of 10
>> > Inserting row 3 of 10
>> > Inserting row 4 of 10
>> > Inserting row 5 of 10
>> > Inserting row 6 of 10
>> > Inserting row 7 of 10
>> > Inserting row 8 of 10
>> > Inserting row 9 of 10
>> > Attempting to fetch key 0
>> > Traceback (most recent call last):
>> > ...
>> > pycassa.pool.MaximumRetryException: Retried 6 times
>> >
>> > After this I have 6 EOFExceptions in system.log.
>> > Running "get datasets[0]['name'];" using cassandra-cli also triggers the
>> > same exception.
>> > I've not been able to reproduce this with cassandra 0.7.0.
>> >
>> > Regards,
>> > Jonas
>> >
>> >
>> >
>>
>>
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of DataStax, the source for professional Cassandra support
>> http://www.datastax.com
>>
>
>


Re: Possible EOFException regression in 0.7.1

2011-02-15 Thread ruslan usifov
It will be great if patch appear very quick

2011/2/15 Jonathan Ellis 

> I can reproduce with your script.  Thanks!
>
> 2011/2/15 Jonas Borgström :
> > Hi all,
> >
> > While testing the new 0.7.1 release I got the following exception:
> >
> > ERROR [ReadStage:11] 2011-02-15 16:39:18,105
> > DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
> > java.io.IOError: java.io.EOFException
> >at
> >
> org.apache.cassandra.db.columniterator.SSTableNamesIterator.<init>(SSTableNamesIterator.java:75)
> >at
> >
> org.apache.cassandra.db.filter.NamesQueryFilter.getSSTableColumnIterator(NamesQueryFilter.java:59)
> >at
> >
> org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:80)
> >at
> >
> org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1274)
> >at
> >
> org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1166)
> >at
> >
> org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1095)
> >at org.apache.cassandra.db.Table.getRow(Table.java:384)
> >at
> >
> org.apache.cassandra.db.SliceByNamesReadCommand.getRow(SliceByNamesReadCommand.java:60)
> >at
> >
> org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:473)
> >at
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
> >at
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> >at
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> >at java.lang.Thread.run(Thread.java:636)
> > Caused by: java.io.EOFException
> >at java.io.DataInputStream.readInt(DataInputStream.java:392)
> >at
> >
> org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:48)
> >at
> >
> org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:30)
> >at
> >
> org.apache.cassandra.io.sstable.IndexHelper.defreezeBloomFilter(IndexHelper.java:108)
> >at
> >
> org.apache.cassandra.db.columniterator.SSTableNamesIterator.read(SSTableNamesIterator.java:106)
> >at
> >
> org.apache.cassandra.db.columniterator.SSTableNamesIterator.<init>(SSTableNamesIterator.java:71)
> >... 12 more
> >
> > I'm able reliably reproduce this using the following one node cluster:
> > - apache-cassandra-0.7.1-bin.tar.gz
> > - Fedora 14
> > - java version "1.6.0_20".
> >  OpenJDK 64-Bit Server VM (build 19.0-b09, mixed mode)
> > - Default cassandra.yaml
> > - cassandra-env.sh: MAX_HEAP_SIZE="1G"; HEAP_NEWSIZE="200M"
> >
> > cassandra-cli initialization:
> > - create keyspace foo;
> > - use foo;
> > - create column family datasets;
> >
> > $ python dataset_check.py (attached)
> > Inserting row 0 of 10
> > Inserting row 1 of 10
> > Inserting row 2 of 10
> > Inserting row 3 of 10
> > Inserting row 4 of 10
> > Inserting row 5 of 10
> > Inserting row 6 of 10
> > Inserting row 7 of 10
> > Inserting row 8 of 10
> > Inserting row 9 of 10
> > Attempting to fetch key 0
> > Traceback (most recent call last):
> > ...
> > pycassa.pool.MaximumRetryException: Retried 6 times
> >
> > After this I have 6 EOFExceptions in system.log.
> > Running "get datasets[0]['name'];" using cassandra-cli also triggers the
> > same exception.
> > I've not been able to reproduce this with cassandra 0.7.0.
> >
> > Regards,
> > Jonas
> >
> >
> >
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>


Re: consistency question

2011-02-15 Thread Edward Capriolo
On Tue, Feb 15, 2011 at 3:59 AM, Serdar Irmak  wrote:

> Hi,
>
>
>
> In a 3-node setup (nodes A, B, C) with replication factor 3 and a quorum
> read/write scenario:
>
> suppose a new value of data X is written to A and B but not C for some
> reason, then A went down and I started D with the data of C, or with empty
> data, so that either way X is not present on D.
>
> Then when I read at quorum, nodes C and D responded and gave me the old value
> (with read repair in the background). So doesn't it mean there is no
> consistency with quorum, too?
>
>
>
>
>
> My best
>
> Serdar
>
>
>

The consistency rules do NOT apply if you introduce a new node without
properly bootstrapping it. If you have A, B, C and A fails, you should 1) run
'nodetool removetoken A', then 2) start node D with auto_bootstrap=true.

You can start a node empty (with bootstrap=false) using quorum/quorum, but
if you do not 'nodetool repair' it before another node fails you end up with
the situation you described.

Edward
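
The quorum guarantee Edward is defending rests on simple overlap arithmetic: a read sees the latest write whenever R + W > N, which is exactly the premise an un-bootstrapped replacement node violates. As a sketch:

```python
# Why quorum reads see quorum writes: any read set of size R must intersect
# any write set of size W whenever R + W > N (pigeonhole over N replicas).
def quorums_overlap(n, w, r):
    return r + w > n

# RF=3, QUORUM/QUORUM: 2 + 2 > 3, so overlap is guaranteed -- provided all N
# replicas were populated by proper replication/bootstrap. Introducing an
# empty, un-repaired node breaks that premise, producing the stale read above.
print(quorums_overlap(3, 2, 2))  # -> True
```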


Re: Binary object storage in Cassandra

2011-02-15 Thread Tyler Hobbs
http://wiki.apache.org/cassandra/FAQ#large_file_and_blob_storage

Retrieval should be the same as the examples in the pycassa tutorial.

-- 
Tyler Hobbs
Software Engineer, DataStax
Maintainer of the pycassa Cassandra Python client library
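
The FAQ's usual advice for blobs approaching 100MB is to split them into chunks stored as separate columns (or rows) and reassemble on read. The chunking itself is straightforward; the column naming here is hypothetical:

```python
# Split a blob into ~1 MB chunks keyed by ordered column names, so each
# chunk can be inserted as its own column and reassembled on read.
def chunk_blob(data, chunk_size=1024 * 1024):
    return {'chunk_%06d' % i: data[off:off + chunk_size]
            for i, off in enumerate(range(0, len(data), chunk_size))}

blob = b'x' * (3 * 1024 * 1024 + 17)          # 3 MB + 17 bytes
chunks = chunk_blob(blob)
print(len(chunks))                             # -> 4
assert b''.join(chunks[k] for k in sorted(chunks)) == blob
```

Each chunk would then go into the row via something like `cf.insert(key, {name: chunk})` and come back with a column slice, per the pycassa tutorial.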


Binary object storage in Cassandra

2011-02-15 Thread A J
Hello
Is it possible to store binary objects (images, PDFs, videos, etc.) in
Cassandra? The size of each of my images is less than 100MB.

If so, how do I try inserting and retrieving a few files from cassandra ?
Would prefer if someone can give examples using pycassa.

Thanks !
AJ


Re: Possible EOFException regression in 0.7.1

2011-02-15 Thread Jake Luciani
Have you made any changes to the cassandra config?

2011/2/15 Jonas Borgström 

> Hi all,
>
> While testing the new 0.7.1 release I got the following exception:
>
> ERROR [ReadStage:11] 2011-02-15 16:39:18,105
> DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
> java.io.IOError: java.io.EOFException
>at
>
> org.apache.cassandra.db.columniterator.SSTableNamesIterator.(SSTableNamesIterator.java:75)
>at
>
> org.apache.cassandra.db.filter.NamesQueryFilter.getSSTableColumnIterator(NamesQueryFilter.java:59)
>at
>
> org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:80)
>at
>
> org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1274)
>at
>
> org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1166)
>at
>
> org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1095)
>at org.apache.cassandra.db.Table.getRow(Table.java:384)
>at
>
> org.apache.cassandra.db.SliceByNamesReadCommand.getRow(SliceByNamesReadCommand.java:60)
>at
>
> org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:473)
>at
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>at
>
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>at
>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>at java.lang.Thread.run(Thread.java:636)
> Caused by: java.io.EOFException
>at java.io.DataInputStream.readInt(DataInputStream.java:392)
>at
>
> org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:48)
>at
>
> org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:30)
>at
>
> org.apache.cassandra.io.sstable.IndexHelper.defreezeBloomFilter(IndexHelper.java:108)
>at
>
> org.apache.cassandra.db.columniterator.SSTableNamesIterator.read(SSTableNamesIterator.java:106)
>at
>
> org.apache.cassandra.db.columniterator.SSTableNamesIterator.(SSTableNamesIterator.java:71)
>... 12 more
>
> I'm able reliably reproduce this using the following one node cluster:
> - apache-cassandra-0.7.1-bin.tar.gz
> - Fedora 14
> - java version "1.6.0_20".
>  OpenJDK 64-Bit Server VM (build 19.0-b09, mixed mode)
> - Default cassandra.yaml
> - cassandra-env.sh: MAX_HEAP_SIZE="1G"; HEAP_NEWSIZE="200M"
>
> cassandra-cli initialization:
> - create keyspace foo;
> - use foo;
> - create column family datasets;
>
> $ python dataset_check.py (attached)
> Inserting row 0 of 10
> Inserting row 1 of 10
> Inserting row 2 of 10
> Inserting row 3 of 10
> Inserting row 4 of 10
> Inserting row 5 of 10
> Inserting row 6 of 10
> Inserting row 7 of 10
> Inserting row 8 of 10
> Inserting row 9 of 10
> Attempting to fetch key 0
> Traceback (most recent call last):
> ...
> pycassa.pool.MaximumRetryException: Retried 6 times
>
> After this I have 6 EOFExceptions in system.log.
> Running "get datasets[0]['name'];" using cassandra-cli also triggers the
> same exception.
> I've not been able to reproduce this with cassandra 0.7.0.
>
> Regards,
> Jonas
>
>
>


Re: Possible EOFException regression in 0.7.1

2011-02-15 Thread Jonathan Ellis
I can reproduce with your script.  Thanks!

2011/2/15 Jonas Borgström :
> Hi all,
>
> While testing the new 0.7.1 release I got the following exception:
>
> ERROR [ReadStage:11] 2011-02-15 16:39:18,105
> DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
> java.io.IOError: java.io.EOFException
>        at
> org.apache.cassandra.db.columniterator.SSTableNamesIterator.(SSTableNamesIterator.java:75)
>        at
> org.apache.cassandra.db.filter.NamesQueryFilter.getSSTableColumnIterator(NamesQueryFilter.java:59)
>        at
> org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:80)
>        at
> org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1274)
>        at
> org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1166)
>        at
> org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1095)
>        at org.apache.cassandra.db.Table.getRow(Table.java:384)
>        at
> org.apache.cassandra.db.SliceByNamesReadCommand.getRow(SliceByNamesReadCommand.java:60)
>        at
> org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:473)
>        at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>        at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>        at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>        at java.lang.Thread.run(Thread.java:636)
> Caused by: java.io.EOFException
>        at java.io.DataInputStream.readInt(DataInputStream.java:392)
>        at
> org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:48)
>        at
> org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:30)
>        at
> org.apache.cassandra.io.sstable.IndexHelper.defreezeBloomFilter(IndexHelper.java:108)
>        at
> org.apache.cassandra.db.columniterator.SSTableNamesIterator.read(SSTableNamesIterator.java:106)
>        at
> org.apache.cassandra.db.columniterator.SSTableNamesIterator.(SSTableNamesIterator.java:71)
>        ... 12 more
>
> I'm able reliably reproduce this using the following one node cluster:
> - apache-cassandra-0.7.1-bin.tar.gz
> - Fedora 14
> - java version "1.6.0_20".
>  OpenJDK 64-Bit Server VM (build 19.0-b09, mixed mode)
> - Default cassandra.yaml
> - cassandra-env.sh: MAX_HEAP_SIZE="1G"; HEAP_NEWSIZE="200M"
>
> cassandra-cli initialization:
> - create keyspace foo;
> - use foo;
> - create column family datasets;
>
> $ python dataset_check.py (attached)
> Inserting row 0 of 10
> Inserting row 1 of 10
> Inserting row 2 of 10
> Inserting row 3 of 10
> Inserting row 4 of 10
> Inserting row 5 of 10
> Inserting row 6 of 10
> Inserting row 7 of 10
> Inserting row 8 of 10
> Inserting row 9 of 10
> Attempting to fetch key 0
> Traceback (most recent call last):
> ...
> pycassa.pool.MaximumRetryException: Retried 6 times
>
> After this I have 6 EOFExceptions in system.log.
> Running "get datasets[0]['name'];" using cassandra-cli also triggers the
> same exception.
> I've not been able to reproduce this with cassandra 0.7.0.
>
> Regards,
> Jonas
>
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: How to rename column family?

2011-02-15 Thread Jonathan Ellis
Renames are not yet supported (see
https://issues.apache.org/jira/browse/CASSANDRA-1585)
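
Until CASSANDRA-1585 lands, the usual workaround is create-copy-drop: make the new column family, copy every row across, then drop the old one. Modeled on plain dicts for brevity (a real copy would page over get_range and re-insert through the client):

```python
# "Rename" a column family by copying its rows to a new one and dropping
# the old -- here the keyspace is just a dict of {cf_name: {row_key: columns}}.
def rename_cf(keyspace, old, new):
    keyspace[new] = dict(keyspace[old])  # copy all rows
    del keyspace[old]                    # then drop the old CF

ks = {'Users_old': {'row1': {'name': 'a'}}}
rename_cf(ks, 'Users_old', 'Users')
print(sorted(ks))  # -> ['Users']
```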

On Tue, Feb 15, 2011 at 7:45 AM, Michal Augustýn
 wrote:
> Hello,
> I would like to rename some column families but I discovered that the
> system_rename_column_family disappeared in 0.7. How to rename the column
> family now? I tried system_update_column_family method but it doesn't work
> for renaming :(
> Thank you!



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: NFS instead of local storage

2011-02-15 Thread Jonathan Ellis
Some good discussion here:
http://www.mail-archive.com/user@cassandra.apache.org/msg09020.html

On Sun, Feb 13, 2011 at 5:25 PM, mcasandra  wrote:
>
> I just now watched some videos about performance tuning. And it looks like
> most of the bottleneck could be on reads. Also, it looks like it's advisable
> to put commit logs on separate drive.
>
> I was wondering if it makes sense to use NFS (if we can) with netapp array
> which provides it's own read and write caching.
> --
> View this message in context: 
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/NFS-instead-of-local-storage-tp6021959p6021959.html
> Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
> Nabble.com.
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Partitioning

2011-02-15 Thread RW>N

Hi, 
I am new to Cassandra and am evaluating it. 

The following diagram shows my setup: http://bit.ly/gJZlhw
Here each oval represents one data center. I want to keep N=4, i.e. four
copies of every Column Family, with one copy in each data center. In
other words, the COMPLETE database must be contained in each of the data
centers. 

Question: 
1. Is this possible? If so, how do I configure it (partitioner, replicas, etc.)? 

Thanks 

AJ

P.S. Excuse my multiple postings of the same message; I am unable to subscribe
for some reason. 
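
Placing exactly one of the four replicas in each data center is what a datacenter-aware replication strategy does. The selection logic, loosely modeled below (the real strategy walks the token ring per datacenter, so this is only a simplification, with hypothetical node and DC names):

```python
# Sketch of "one replica per data center": walk the ring starting at the
# row's token and take the first node from each DC until its quota is met.
def place_replicas(ring, dc_of, replicas_per_dc=1):
    placed, per_dc = [], {}
    for node in ring:  # ring order starting at the row's token
        dc = dc_of[node]
        if per_dc.get(dc, 0) < replicas_per_dc:
            placed.append(node)
            per_dc[dc] = per_dc.get(dc, 0) + 1
    return placed

dc_of = {'n1': 'DC1', 'n2': 'DC2', 'n3': 'DC1', 'n4': 'DC3', 'n5': 'DC4'}
# n3 is skipped because DC1 already holds its single replica.
print(place_replicas(['n1', 'n2', 'n3', 'n4', 'n5'], dc_of))  # -> ['n1', 'n2', 'n4', 'n5']
```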
-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Partitioning-tp6028132p6028132.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Subscribe

2011-02-15 Thread A J



Possible EOFException regression in 0.7.1

2011-02-15 Thread Jonas Borgström
Hi all,

While testing the new 0.7.1 release I got the following exception:

ERROR [ReadStage:11] 2011-02-15 16:39:18,105
DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
java.io.IOError: java.io.EOFException
at
org.apache.cassandra.db.columniterator.SSTableNamesIterator.<init>(SSTableNamesIterator.java:75)
at
org.apache.cassandra.db.filter.NamesQueryFilter.getSSTableColumnIterator(NamesQueryFilter.java:59)
at
org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:80)
at
org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1274)
at
org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1166)
at
org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1095)
at org.apache.cassandra.db.Table.getRow(Table.java:384)
at
org.apache.cassandra.db.SliceByNamesReadCommand.getRow(SliceByNamesReadCommand.java:60)
at
org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:473)
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:636)
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at
org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:48)
at
org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:30)
at
org.apache.cassandra.io.sstable.IndexHelper.defreezeBloomFilter(IndexHelper.java:108)
at
org.apache.cassandra.db.columniterator.SSTableNamesIterator.read(SSTableNamesIterator.java:106)
at
org.apache.cassandra.db.columniterator.SSTableNamesIterator.<init>(SSTableNamesIterator.java:71)
... 12 more

I'm able to reliably reproduce this using the following one-node cluster:
- apache-cassandra-0.7.1-bin.tar.gz
- Fedora 14
- java version "1.6.0_20".
  OpenJDK 64-Bit Server VM (build 19.0-b09, mixed mode)
- Default cassandra.yaml
- cassandra-env.sh: MAX_HEAP_SIZE="1G"; HEAP_NEWSIZE="200M"

cassandra-cli initialization:
- create keyspace foo;
- use foo;
- create column family datasets;

$ python dataset_check.py (attached)
Inserting row 0 of 10
Inserting row 1 of 10
Inserting row 2 of 10
Inserting row 3 of 10
Inserting row 4 of 10
Inserting row 5 of 10
Inserting row 6 of 10
Inserting row 7 of 10
Inserting row 8 of 10
Inserting row 9 of 10
Attempting to fetch key 0
Traceback (most recent call last):
...
pycassa.pool.MaximumRetryException: Retried 6 times

After this I have 6 EOFExceptions in system.log.
Running "get datasets[0]['name'];" using cassandra-cli also triggers the
same exception.
I've not been able to reproduce this with cassandra 0.7.0.

Regards,
Jonas


import pycassa

pool = pycassa.ConnectionPool('foo', ['localhost:9160'], timeout=10)
cf = pycassa.ColumnFamily(pool, 'datasets')


def insert_dataset(key, num_cols=5):
    columns = {}
    extra_data = 'XXX' * 20
    for i in range(num_cols):
        col = 'r%08d' % i
        columns[col] = '%s:%s:%s' % (key, col, extra_data)
        if len(columns) >= 3000:
            cf.insert(key, columns)
            columns = {}
    if len(columns) >= 3000:
        cf.insert(key, columns)
        columns = {}
    cf.insert(key, {'name': 'key:%s' % key})


def test_insert_and_column_fetch(num=20):
    # Insert @num fairly large rows
    for i in range(num):
        print 'Inserting row %d of %d' % (i, num)
        insert_dataset(str(i))
    # Verify that the "name" column is correctly stored
    for i in range(num):
        print 'Attempting to fetch key %d' % i
        row = cf.get(str(i), columns=['name'])
        assert row['name'] == 'key:%d' % i
    for i, (key, row) in enumerate(cf.get_range(columns=['name'])):
        print '%d: get_range returned: key %s, name: "%s"' % (i, key, row['name'])
        assert row['name'] == 'key:' + key


test_insert_and_column_fetch(10)



Re: cant seem to figure out secondary index definition

2011-02-15 Thread Michal Augustýn
Ah, ok. I checked the source, and the problem is that you wrote
"validation_class" but it should be "validator_class".

Augi

2011/2/15 Roland Gude 

> Yeah i know about that, but the definition i have is for a cluster that is
> started/stopped from a unit test with hector embeddedServerHelper, which
> takes definitions from the yaml.
>
> So i’d still like to define the index in the yaml file (it should very well
> be possible I guess)
>
>
>
>
>
> *Von:* Michal Augustýn [mailto:augustyn.mic...@gmail.com]
> *Gesendet:* Dienstag, 15. Februar 2011 15:53
> *An:* user@cassandra.apache.org
> *Betreff:* Re: cant seem to figure out secondary index definition
>
>
>
> Hi,
>
>
>
> if you download Cassandra and look into "conf/cassandra.yaml" then you can
> see this:
>
>
>
> "this keyspace definition is for demonstration purposes only. Cassandra
> will not load these definitions during startup. See
> http://wiki.apache.org/cassandra/FAQ#no_keyspaces for an explanation."
>
>
>
> So you should make all schema-related operation via Thrift/AVRO API, or you
> can use Cassandra CLI.
>
>
>
> Augi
>
>
>
> 2011/2/15 Roland Gude 
>
> Hi,
>
>
>
> i am a little puzzled on creation of secondary indexes and the docs in that
> area are still very sparse.
>
> What I am trying to do is – in a columnfamily with TimeUUID comparator, I
> want the “special” timeuuid --1000-- to be
> indexed. The value being some UTF8 string on which I want to perform
> equality checks.
>
>
>
> What do I need to put in my cassandra.yaml file?
>
> Something like this?
>
>
>
>   - column_metadata: [{name: --1000--,
> validation_class: UTF8Type, index_name: MyIndex, index_type: KEYS}]
>
>
>
> This gives me that error:
>
>
>
> 15:05:12.492 [pool-1-thread-1] ERROR o.a.c.config.DatabaseDescriptor -
> Fatal error: null; Can't construct a java object for 
> tag:yaml.org,2002:org.apache.cassandra.config.Config;
> exception=Cannot create property=keyspaces for
> JavaBean=org.apache.cassandra.config.Config@7eb6e2; Cannot create
> property=column_families for
> JavaBean=org.apache.cassandra.config.RawKeyspace@987a33; Cannot create
> property=column_metadata for
> JavaBean=org.apache.cassandra.config.RawColumnFamily@716cb7; Cannot create
> property=validation_class for
> JavaBean=org.apache.cassandra.config.RawColumnDefinition@e29820; Unable to
> find property 'validation_class' on class:
> org.apache.cassandra.config.RawColumnDefinition
>
> Bad configuration; unable to start server
>
>
>
>
>
> I am furthermor uncertain if the column name will be correctly used if
> given like this. Should I put the byte representation of the uuid there?
>
>
>
> Greetings,
>
> roland
>


AW: cant seem to figure out secondary index definition

2011-02-15 Thread Roland Gude
Yeah i know about that, but the definition i have is for a cluster that is 
started/stopped from a unit test with hector embeddedServerHelper, which takes 
definitions from the yaml.
So i'd still like to define the index in the yaml file (it should very well be 
possible I guess)


Von: Michal Augustýn [mailto:augustyn.mic...@gmail.com]
Gesendet: Dienstag, 15. Februar 2011 15:53
An: user@cassandra.apache.org
Betreff: Re: cant seem to figure out secondary index definition

Hi,

if you download Cassandra and look into "conf/cassandra.yaml" then you can see 
this:

"this keyspace definition is for demonstration purposes only. Cassandra will 
not load these definitions during startup. See 
http://wiki.apache.org/cassandra/FAQ#no_keyspaces for an explanation."

So you should make all schema-related operation via Thrift/AVRO API, or you can 
use Cassandra CLI.

Augi





Re: cant seem to figure out secondary index definition

2011-02-15 Thread Michal Augustýn
Hi,

if you download Cassandra and look into "conf/cassandra.yaml" then you can
see this:

"this keyspace definition is for demonstration purposes only. Cassandra will
not load these definitions during startup. See
http://wiki.apache.org/cassandra/FAQ#no_keyspaces for an explanation."

So you should make all schema-related operation via Thrift/AVRO API, or you
can use Cassandra CLI.

Augi


2011/2/15 Roland Gude 

> Hi,
>
>
>
> i am a little puzzled on creation of secondary indexes and the docs in that
> area are still very sparse.
>
> What I am trying to do is – in a columnfamily with TimeUUID comparator, I
> want the “special” timeuuid --1000-- to be
> indexed. The value being some UTF8 string on which I want to perform
> equality checks.
>
>
>
> What do I need to put in my cassandra.yaml file?
>
> Something like this?
>
>
>
>   - column_metadata: [{name: --1000--,
> validation_class: UTF8Type, index_name: MyIndex, index_type: KEYS}]
>
>
>
> This gives me that error:
>
>
>
> 15:05:12.492 [pool-1-thread-1] ERROR o.a.c.config.DatabaseDescriptor -
> Fatal error: null; Can't construct a java object for 
> tag:yaml.org,2002:org.apache.cassandra.config.Config;
> exception=Cannot create property=keyspaces for
> JavaBean=org.apache.cassandra.config.Config@7eb6e2; Cannot create
> property=column_families for
> JavaBean=org.apache.cassandra.config.RawKeyspace@987a33; Cannot create
> property=column_metadata for
> JavaBean=org.apache.cassandra.config.RawColumnFamily@716cb7; Cannot create
> property=validation_class for
> JavaBean=org.apache.cassandra.config.RawColumnDefinition@e29820; Unable to
> find property 'validation_class' on class:
> org.apache.cassandra.config.RawColumnDefinition
>
> Bad configuration; unable to start server
>
>
>
>
>
> I am furthermore uncertain whether the column name will be used correctly if
> given like this. Should I put the byte representation of the uuid there?
>
>
>
> Greetings,
>
> roland
>


cant seem to figure out secondary index definition

2011-02-15 Thread Roland Gude
Hi,

i am a little puzzled on creation of secondary indexes and the docs in that 
area are still very sparse.
What I am trying to do is - in a columnfamily with TimeUUID comparator, I want 
the "special" timeuuid --1000-- to be indexed. The 
value being some UTF8 string on which I want to perform equality checks.

What do I need to put in my cassandra.yaml file?
Something like this?

  - column_metadata: [{name: --1000--, 
validation_class: UTF8Type, index_name: MyIndex, index_type: KEYS}]

This gives me that error:

15:05:12.492 [pool-1-thread-1] ERROR o.a.c.config.DatabaseDescriptor - Fatal 
error: null; Can't construct a java object for 
tag:yaml.org,2002:org.apache.cassandra.config.Config; exception=Cannot create 
property=keyspaces for JavaBean=org.apache.cassandra.config.Config@7eb6e2; 
Cannot create property=column_families for 
JavaBean=org.apache.cassandra.config.RawKeyspace@987a33; Cannot create 
property=column_metadata for 
JavaBean=org.apache.cassandra.config.RawColumnFamily@716cb7; Cannot create 
property=validation_class for 
JavaBean=org.apache.cassandra.config.RawColumnDefinition@e29820; Unable to find 
property 'validation_class' on class: 
org.apache.cassandra.config.RawColumnDefinition
Bad configuration; unable to start server


I am furthermore uncertain whether the column name will be used correctly if given 
like this. Should I put the byte representation of the uuid there?

Greetings,
roland
--
YOOCHOOSE GmbH

Roland Gude
Software Engineer

Im Mediapark 8, 50670 Köln

+49 221 4544151 (Tel)
+49 221 4544159 (Fax)
+49 171 7894057 (Mobil)


Email: roland.g...@yoochoose.com
WWW: www.yoochoose.com

YOOCHOOSE GmbH
Geschäftsführer: Dr. Uwe Alkemper, Michael Friedmann
Handelsregister: Amtsgericht Köln HRB 65275
Ust-Ident-Nr: DE 264 773 520
Sitz der Gesellschaft: Köln



Re: online chat scenario

2011-02-15 Thread Victor Kabdebon
Hello Sasha.

In this sort of real-time application, the way you insert (QUORUM, ONE,
etc.) and the way you retrieve are extremely important, because your data
may not have had time to propagate to all your nodes. Be sure to use
adequate policies: insert to a sufficient number of nodes, but don't
sacrifice too much time doing so, or you lose the real-time component.
Here is a presentation of how chat is done at Facebook; it may be useful
to you:

http://www.erlang-factory.com/upload/presentations/31/EugeneLetuchy-ErlangatFacebook.pdf

It's more focused on Erlang, but it might give you ideas on how to deal with
the problem (I am not sure that databases are the best way to deal with it,
but that's just my opinion).

Victor Kabdebon
http://www.voxnucleus.fr





Re: online chat scenario

2011-02-15 Thread Sasha Dolgy
thanks for the response.  thinking about this, this would not allow for the
sorting of messages into a chronological order for end user display.  i had
thought about having each message as its own column against the room or the
user, but i have had some inconsistencies in retrieving the data.  sometimes
i get 3 columns, sometimes i get 50...( i think this is because of the
random partitioner)

i had thought about this structure:

[messages][nickname][message id => message data]
[chatrooms][room_name][message id]

this way i can pull all messages a user ever posted, not specific to a
room.  what i haven't been able to do so far is print the timestamp on the
row or column.  does this have to be explicitly added somewhere or can it be
returned as part of a 'get' request?

-sd




-- 
Sasha Dolgy
sasha.do...@gmail.com
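On the timestamp question above: if each message's column name is a TimeUUID (version-1 UUID), the creation time travels with the column name itself, so no extra timestamp column is needed on read. A standard-library sketch of extracting it (the helper name is invented for illustration; this is plain Python, not a Cassandra or pycassa API):

```python
import uuid
from datetime import datetime, timedelta, timezone

# Version-1 UUIDs embed a timestamp measured in 100-nanosecond
# intervals since the UUID epoch, 1582-10-15.
UUID_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)

def timeuuid_to_datetime(u):
    # u.time is only meaningful for version-1 (time-based) UUIDs.
    return UUID_EPOCH + timedelta(microseconds=u.time // 10)

m1 = uuid.uuid1()  # column name for the first message
m2 = uuid.uuid1()  # column name for a later message

# Sorting by the embedded time field yields chronological order,
# which is what a TimeUUID comparator gives you on a slice.
ordered = sorted([m2, m1], key=lambda u: u.time)
print(timeuuid_to_datetime(ordered[0]))
```

The same idea is why a TimeUUID comparator returns chat columns already sorted chronologically.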


How to rename column family?

2011-02-15 Thread Michal Augustýn
Hello,

I would like to rename some column families, but I discovered that
system_rename_column_family disappeared in 0.7. How do I rename a column
family now? I tried the system_update_column_family method, but it doesn't
work for renaming :(

Thank you!


Re: online chat scenario

2011-02-15 Thread Michal Augustýn
The schema design depends on the number of chatrooms, users, and messages.
For example, you can have one CF where the key is the chatroom, the column
name is the username, and the column value is the message, with the message
time as the column timestamp.
You can add a day timestamp to the chatroom name to avoid large rows.

Augi

2011/2/15 Andrey V. Panov 

> I never did it. But I suppose you can use "chatroom name" as key and store
> messages & nicks as columns in JSON and timestamp as columnName.
>
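The day-bucketed layout described above can be sketched in a few lines; a plain in-memory dict stands in for the column family here, and the key format and helper names are invented for illustration (this is not pycassa code):

```python
from collections import defaultdict
from datetime import datetime, timezone

# row key -> {column name: column value}, mimicking one column family
chat_cf = defaultdict(dict)

def post(room, user, message, when):
    # Appending the day to the chatroom key bounds row size, as Michal
    # suggests: each room gets one row per day.
    row_key = '%s:%s' % (room, when.strftime('%Y%m%d'))
    # Using (timestamp, user) as the column name makes a slice of the
    # row come back in chronological order.
    chat_cf[row_key][(when.isoformat(), user)] = message

def read_day(room, day):
    row_key = '%s:%s' % (room, day)
    return [(u, m) for (_, u), m in sorted(chat_cf[row_key].items())]

t = datetime(2011, 2, 15, 14, 0, tzinfo=timezone.utc)
post('lobby', 'sasha', 'hi everyone', t)
post('lobby', 'augi', 'hello', t.replace(minute=5))
print(read_day('lobby', '20110215'))
# [('sasha', 'hi everyone'), ('augi', 'hello')]
```

Putting the timestamp in the column name (rather than only in the column timestamp) is what gives the chronological ordering Sasha was missing.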


Re: online chat scenario

2011-02-15 Thread Andrey V. Panov
I never did it. But I suppose you can use "chatroom name" as key and store
messages & nicks as columns in JSON and timestamp as columnName.


online chat scenario

2011-02-15 Thread Sasha Dolgy
hi everyone,

is anyone using cassandra as a backend repository for storing and serving
online chat information?  are you able to share your design thoughts?  have
you encountered problems with the data structure you've implemented?   i was
playing with some ideas and each time i come back to super columns, which
i'd like to avoid if possible.

i have read elsewhere that people are suggesting MongoDB or redis ... i'm
curious about Cassandra.

kind regards,
-sd

-- 
Sasha Dolgy
sasha.do...@gmail.com


Re: Cassandra from unprivileged user

2011-02-15 Thread Sasha Dolgy
so long as the ports you want it to bind to are above 1024

http://wiki.apache.org/cassandra/FAQ#ports
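The FAQ point in code form: the stock 0.7 defaults are all above 1024, so on Linux none of them require root to bind (port numbers below are taken from the default cassandra.yaml and cassandra-env.sh for 0.7; treat them as an illustration):

```python
# Default Cassandra 0.7 listen ports and the Unix privileged-port rule:
# only ports <= 1024 require root to bind.
DEFAULT_PORTS = {
    'storage_port (gossip)': 7000,
    'rpc_port (Thrift)': 9160,
    'JMX': 8080,
}
unprivileged_ok = all(p > 1024 for p in DEFAULT_PORTS.values())
print(unprivileged_ok)  # True
```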


On Tue, Feb 15, 2011 at 12:07 PM, Mateusz Korniak <
mateusz-li...@ant.gliwice.pl> wrote:

> On Tuesday 15 of February 2011, ruslan usifov wrote:
> > Is it possible to launch cassandra from unprivileged user?
> On linux - yes.
>


Re: Cassandra from unprivileged user

2011-02-15 Thread Mateusz Korniak
On Tuesday 15 of February 2011, ruslan usifov wrote:
> Is it possible to launch cassandra from unprivileged user?
On linux - yes.


-- 
Mateusz Korniak


Cassandra from unprivileged user

2011-02-15 Thread ruslan usifov
Is it possible to launch Cassandra as an unprivileged user?


consistency question

2011-02-15 Thread Serdar Irmak
Hi,

In a 3-node setup (nodes A, B, C) with replication factor 3 and quorum
read/write:
suppose a new value of data X is written to A and B but, for some reason, not
to C. Then A went down, and I brought up D with C's data (or with empty data),
so either way the new X is not present on D.
When I then read at quorum, nodes C and D responded and gave me the old value
(with read repair running in the background). So doesn't that mean there is no
consistency with quorum either?


My best
Serdar
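The scenario can be sketched in a few lines: with N=3, quorum reads and writes are only consistent while every write quorum overlaps every read quorum, and replacing a dead node with a stale copy without repair breaks that overlap. A toy model (the Replica class and values are invented for illustration; this is not Cassandra code):

```python
class Replica:
    """One replica holding a (value, timestamp) pair."""
    def __init__(self, name, value=None, ts=0):
        self.name, self.value, self.ts = name, value, ts

def write(replicas, value, ts):
    # Apply a write to the replicas that actually received it; a quorum
    # write is considered successful once a majority of them did.
    for r in replicas:
        r.value, r.ts = value, ts

def quorum_read(replicas):
    # A quorum read returns the value with the highest timestamp among
    # the replicas that responded.
    return max(replicas, key=lambda r: r.ts).value

A, B, C = Replica('A'), Replica('B'), Replica('C')
write([A, B, C], 'old', ts=1)   # X='old' reached all three
write([A, B], 'new', ts=2)      # X='new' reached only A and B (quorum)

# A dies; D is bootstrapped from C's stale data, with no repair run:
D = Replica('D', C.value, C.ts)

# The read quorum {C, D} no longer overlaps the write quorum {A, B}:
print(quorum_read([C, D]))  # 'old'
```

This is why a replacement node must be repaired (or properly bootstrapped from the live replicas) before it counts toward quorums; otherwise the R + W > N overlap argument no longer holds.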



