Re: how to retrieve data from supercolumns by phpcassa ?

2010-08-12 Thread Thorvaldsson Justus
I don't use PHP, so I don't know the phpcassa method, but:

http://wiki.apache.org/cassandra/API
"get
ColumnOrSuperColumn get(string keyspace, string key, ColumnPath column_path, 
ConsistencyLevel consistency_level)

Get the Column or SuperColumn at the given column_path. If no value is present, 
NotFoundException is thrown. (This is the only method that can throw an 
exception under non-failure conditions.)
"
So don't use get if you want to specify a super column.

"get_slice
list<ColumnOrSuperColumn> get_slice(string keyspace, string key, ColumnParent 
column_parent, SlicePredicate predicate, ConsistencyLevel consistency_level)
Get the group of columns contained by column_parent (either a ColumnFamily name 
or a ColumnFamily/SuperColumn name pair) specified by the given SlicePredicate 
struct."

That is one way to do what you want. How this works in PHP I don't know, but it 
should be similar.
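
A minimal sketch of that second call in Java against the 0.6 Thrift API (the
keyspace and CF names are made up for illustration; "client" is the row key
from the question below, and phpcassa wraps these same Thrift methods). One
thing worth checking: if the supercolumn comparator is TimeUUIDType, the
supercolumn name on the wire must be the 16 raw bytes of the UUID, not its
36-character string form. Assuming a connected Cassandra.Client named client
and the generated org.apache.cassandra.thrift classes:

SlicePredicate predicate = new SlicePredicate();
// an empty start/finish slice returns all subcolumns, up to the count
predicate.setSlice_range(new SliceRange(new byte[0], new byte[0], false, 100));
// a ColumnFamily/SuperColumn name pair, per the get_slice description above
ColumnParent parent = new ColumnParent("Clients");
parent.setSuper_column(uuidBytes); // hypothetical: the UUID packed as 16 raw bytes
List<ColumnOrSuperColumn> subcolumns =
    client.get_slice("MyKeyspace", "client", parent, predicate, ConsistencyLevel.ONE);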

/Justus


-----Original Message-----
From: lisek [mailto:m.lisow...@powerprice.pl] 
Sent: 12 August 2010 15:49
To: cassandra-u...@incubator.apache.org
Subject: how to retrieve data from supercolumns by phpcassa ?


Hi all,

I've got a Cassandra supercolumn family looking like this:



Now in this column family I've inserted something like this:

["client"] => array(1) {
  ["2a3909c0-a612-11df-b27e-346336336631"]=>
array(3) {
  ["add_date"]=>
  string(10) "1281618279"
  ["lastname"]=>
  string(8) "blablabla"
  ["name"]=>
  string(6) "myname"
  }

}

My question is: how do I get this one "2a3909c0-a612-11df-b27e-346336336631"
column from "client"? I was trying get->('client', '2a3909c0-a612-11df-
b27e-346336336631'), but with no results... Maybe I should convert
"2a3909c0-a612-11df-b27e-346336336631" somehow before I put it into get()?

Or maybe I'm thinking the wrong way...

regards 
-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/how-to-retrieve-data-from-supercolumns-by-phpcassa-tp5416141p5416141.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: 0.7 CLI w/TSocket

2010-08-12 Thread Mark

On 8/12/10 10:20 PM, Mark wrote:

On 8/12/10 9:14 PM, Jonathan Ellis wrote:

Works fine here.

bin/cassandra-cli --host localhost --port 9160
Connected to: "Test Cluster" on localhost/9160
Welcome to cassandra CLI.

On Thu, Aug 12, 2010 at 2:18 PM, Mark  wrote:

On 8/12/10 8:29 AM, Mark wrote:

On 8/11/10 10:11 PM, Jonathan Ellis wrote:

you have to use an up-to-date CLI; the old one used broken options with
its framed mode

On Wed, Aug 11, 2010 at 6:39 PM, Mark wrote:

"org.apache.thrift.protocol.TProtocolException: Missing version in
readMessageBegin, old client?"

Is the CLI not supported when using TSocket? I don't believe this 
was

the
same in 0.6.

Can someone explain the differences between TFramedTransport vs 
TSocket.

I
tried searching but I couldn't find much information on either one.
Thanks




Where can I find an updated CLI? I just downloaded the nightly build
(apache-cassandra-2010-08-12_13-11-16-bin.tar.gz) and I am still seeing the
same thing. Thanks

Same thing with cassandra-0.7.0-beta1





Jon

I am using apache-cassandra-0.7.0-beta1 
(http://people.apache.org/~eevans/apache-cassandra-0.7.0-beta1-bin.tar.gz) 
with pretty much all the defaults besides "thrift_framed_transport_size_in_mb":


cluster_name: MyCluster
auto_bootstrap: true
hinted_handoff_enabled: true
authenticator: org.apache.cassandra.auth.AllowAllAuthenticator
partitioner: org.apache.cassandra.dht.RandomPartitioner
data_file_directories:
- /var/lib/cassandra/data
seeds:
- localhost
disk_access_mode: mmap_index_only
concurrent_reads: 8
concurrent_writes: 32
sliced_buffer_size_in_kb: 64
storage_port: 7000
listen_address: localhost
rpc_address: localhost
rpc_port: 9160
thrift_framed_transport_size_in_mb: 15
thrift_max_message_length_in_mb: 16
snapshot_before_compaction: false
binary_memtable_throughput_in_mb: 256
memtable_flush_after_mins: 60
memtable_throughput_in_mb: 64
memtable_operations_in_millions: 0.3
column_index_size_in_kb: 64
in_memory_compaction_limit_in_mb: 64
commitlog_directory: /var/lib/cassandra/commitlog
commitlog_rotation_threshold_in_mb: 128
commitlog_sync: periodic
commitlog_sync_period_in_ms: 1
rpc_timeout_in_ms: 1
endpoint_snitch: org.apache.cassandra.locator.SimpleSnitch
dynamic_snitch: true
request_scheduler: org.apache.cassandra.scheduler.NoScheduler
request_scheduler_id: keyspace
keyspaces:
- name: MyKeyspace
  replica_placement_strategy: org.apache.cassandra.locator.RackUnawareStrategy

  replication_factor: 1
  column_families:
- name: Foo
  compare_with: BytesType

If I change "thrift_framed_transport_size_in_mb" back to the default 
value of 15 then the CLI will work, otherwise I receiving the 
following error messages:


CLI
$ bin/cassandra-cli --host localhost --port 9160
Exception retrieving information about the cassandra node, check you 
have connected to the thrift port.

Welcome to cassandra CLI.

Cassandra
ERROR 22:12:06,235 Thrift error occurred during processing of message.
org.apache.thrift.protocol.TProtocolException: Missing version in readMessageBegin, old client?
        at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:211)
        at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2487)
        at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:167)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:637)



Just realized what I cut & pasted above was after I switched 
thrift_framed_transport_size_in_mb back to 15. When I set it to 0 I get 
the above errors.
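
On the TFramedTransport vs. TSocket question buried in this thread: TSocket is
the plain blocking socket transport, and TFramedTransport wraps another
transport and length-prefixes each message; client and server must agree on
framing. A minimal sketch of a framed Java Thrift client, assuming the standard
Thrift library and the generated Cassandra classes (host and port as in the
CLI examples above):

TTransport socket = new TSocket("localhost", 9160);
// Framing adds a 4-byte length prefix to every message. If one side frames
// and the other does not, the binary protocol version check fails with the
// "Missing version in readMessageBegin, old client?" error quoted above.
TTransport transport = new TFramedTransport(socket);
Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
transport.open();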


Re: 0.7 CLI w/TSocket

2010-08-12 Thread Mark

On 8/12/10 9:14 PM, Jonathan Ellis wrote:

Works fine here.

bin/cassandra-cli --host localhost --port 9160
Connected to: "Test Cluster" on localhost/9160
Welcome to cassandra CLI.

On Thu, Aug 12, 2010 at 2:18 PM, Mark  wrote:
   

On 8/12/10 8:29 AM, Mark wrote:
 

On 8/11/10 10:11 PM, Jonathan Ellis wrote:
   

you have to use an up-to-date CLI; the old one used broken options with
its framed mode

On Wed, Aug 11, 2010 at 6:39 PM, Mark wrote:
 

"org.apache.thrift.protocol.TProtocolException: Missing version in
readMessageBegin, old client?"

Is the CLI not supported when using TSocket? I don't believe this was
the
same in 0.6.

Can someone explain the differences between TFramedTransport vs TSocket.
I
tried searching but I couldn't find much information on either one.
Thanks

   


 

Where can I find an updated CLI? I just downloaded the nightly build
(apache-cassandra-2010-08-12_13-11-16-bin.tar.gz) and I am still seeing the
same thing. Thanks
   

Same thing with cassandra-0.7.0-beta1

 



   

Jon

I am using apache-cassandra-0.7.0-beta1 
(http://people.apache.org/~eevans/apache-cassandra-0.7.0-beta1-bin.tar.gz) 
with pretty much all the defaults besides "thrift_framed_transport_size_in_mb":


cluster_name: MyCluster
auto_bootstrap: true
hinted_handoff_enabled: true
authenticator: org.apache.cassandra.auth.AllowAllAuthenticator
partitioner: org.apache.cassandra.dht.RandomPartitioner
data_file_directories:
- /var/lib/cassandra/data
seeds:
- localhost
disk_access_mode: mmap_index_only
concurrent_reads: 8
concurrent_writes: 32
sliced_buffer_size_in_kb: 64
storage_port: 7000
listen_address: localhost
rpc_address: localhost
rpc_port: 9160
thrift_framed_transport_size_in_mb: 15
thrift_max_message_length_in_mb: 16
snapshot_before_compaction: false
binary_memtable_throughput_in_mb: 256
memtable_flush_after_mins: 60
memtable_throughput_in_mb: 64
memtable_operations_in_millions: 0.3
column_index_size_in_kb: 64
in_memory_compaction_limit_in_mb: 64
commitlog_directory: /var/lib/cassandra/commitlog
commitlog_rotation_threshold_in_mb: 128
commitlog_sync: periodic
commitlog_sync_period_in_ms: 1
rpc_timeout_in_ms: 1
endpoint_snitch: org.apache.cassandra.locator.SimpleSnitch
dynamic_snitch: true
request_scheduler: org.apache.cassandra.scheduler.NoScheduler
request_scheduler_id: keyspace
keyspaces:
- name: MyKeyspace
  replica_placement_strategy: org.apache.cassandra.locator.RackUnawareStrategy

  replication_factor: 1
  column_families:
- name: Foo
  compare_with: BytesType

If I change "thrift_framed_transport_size_in_mb" back to the default 
value of 15 then the CLI will work, otherwise I receiving the following 
error messages:


CLI
$ bin/cassandra-cli --host localhost --port 9160
Exception retrieving information about the cassandra node, check you 
have connected to the thrift port.

Welcome to cassandra CLI.

Cassandra
ERROR 22:12:06,235 Thrift error occurred during processing of message.
org.apache.thrift.protocol.TProtocolException: Missing version in readMessageBegin, old client?
        at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:211)
        at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2487)
        at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:167)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:637)





Re: 0.7 CLI w/TSocket

2010-08-12 Thread Jonathan Ellis
Works fine here.

bin/cassandra-cli --host localhost --port 9160
Connected to: "Test Cluster" on localhost/9160
Welcome to cassandra CLI.

On Thu, Aug 12, 2010 at 2:18 PM, Mark  wrote:
> On 8/12/10 8:29 AM, Mark wrote:
>>
>> On 8/11/10 10:11 PM, Jonathan Ellis wrote:
>>>
>>> you have to use an up-to-date CLI; the old one used broken options with
>>> its framed mode
>>>
>>> On Wed, Aug 11, 2010 at 6:39 PM, Mark  wrote:

 "org.apache.thrift.protocol.TProtocolException: Missing version in
 readMessageBegin, old client?"

 Is the CLI not supported when using TSocket? I don't believe this was the
 same in 0.6.

 Can someone explain the differences between TFramedTransport and TSocket?
 I tried searching but I couldn't find much information on either one.
 Thanks

>>>
>>>
>> Where can I find an updated cli? I just downloaded the nightly build
>> (apache-cassandra-2010-08-12_13-11-16-bin.tar.gz) and I am still seeing the
>> same thing. Thanks
>
> Same thing with cassandra-0.7.0-beta1
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Re: How does cfstats calculate Row Size?

2010-08-12 Thread Jonathan Ellis
Right, row stats in 0.6 are just "what I've seen during the
compactions that happened to run since this node restarted last."

0.7 has persistent (and more fine-grained) statistics.

On Thu, Aug 12, 2010 at 1:28 PM, Ryan King  wrote:
> On Thu, Aug 12, 2010 at 9:08 AM, Julie  wrote:
>> I am chasing down a row size discrepancy and am confused.
>>
>> I populated a single node Cassandra cluster with 10,000 rows of data, using
>> numeric keys 1-10,000, where each row is a little over 100kB in length and 
>> has
>> a single column in it.
>>
>> When I perform a cfstats on the node immediately after writing the data, it
>> reports that the Compacted row minimum size = Compacted row maximum size 
>> which
>> is a little over 100,000 bytes.  This is what I expect.
>>
>> I then run an application that randomly reads rows and adds a timestamp 
>> column
>> to each row read.  This timestamp column name and column value is just adding
>> a few bytes to the row.
>>
>> But after running my reading app for a few hours, cfstats reports a very odd
>> minimum row size (and compacted mean row size):
>>
>> [r...@ec2-server1 ~]# /mnt/server/apache-cassandra-0.6.2/bin/nodetool -h
>> ec2-server1 -p 8080 cfstats
>> Keyspace: Keyspace1
>>        Read Count: 670434
>>        Read Latency: 36.22349047035205 ms.
>>        Write Count: 1519933
>>        Write Latency: 0.02940705741634664 ms.
>>        Pending Tasks: 0
>>                Column Family: Standard1
>>                SSTable count: 6
>>                Space used (live): 11130225642
>>                Space used (total): 11130225642
>>                Memtable Columns Count: 1435
>>                Memtable Data Size: 40344907
>>                Memtable Switch Count: 1329
>>                Read Count: 670434
>>                Read Latency: 41.768 ms.
>>                Write Count: 1519933
>>                Write Latency: 0.025 ms.
>>                Pending Tasks: 0
>>                Key cache capacity: 20
>>                Key cache size: 20
>>                Key cache hit rate: 0.48049934471509675
>>                Row cache: disabled
>>                Compacted row minimum size: 238
>>                Compacted row maximum size: 100323
>>                Compacted row mean size: 67548
>>
>> I thought I had a bug in my code so I wrote another app to read every row
>> in the database, keys 1-10,000.  I get the size of each row after reading it
>> (by adding up all column names and column values in the row and the size of
>> the key string) and this matches what I expect -- every single key in this
>> table has a size of just over 100,000 bytes.  (I know there are some
>> overhead columns in each row but I assume these will only make the row
>> larger, not smaller.)
>>
>> So I am confused about where cfstats is getting the row sizes it is working
>> with?
>>
>> When I add the timestamp column to each row, I am not deleting the other
>> column (large) in the row but I am not rewriting the large column either.
>
> I'm guessing (haven't read this part of the source) that the min size
> is being generated in minor compaction, which doesn't see the whole
> row.
>
> -ryan
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


RE: error using get_range_slice with random partitioner

2010-08-12 Thread David McIntosh
I'm also seeing an issue with not being able to iterate over all keys in
Cassandra 0.6.4.  In my unit test I create 20 keys (0-19) and iterate with a
batch size of 6.  This is what I get.

 

Cassandra 0.6.4

start key: ""

9, 14, 4, 15, 11, 18

start key: 18

18, 7, 17, 7, 17

start key: 17

17

 

Cassandra 0.6.3

start key: ""

3, 6, 5, 19, 10, 0

start key: 0

0, 8, 2, 16, 13, 1

start key: 1

1, 12, 9, 14, 4, 15

start key: 15

15, 11, 15, 11, 18, 7

start key: 7

7, 17, 7, 17

 

In both versions I get duplicates but in 0.6.4 I don't get the complete set
of keys back.  The complete set is returned in 0.6.3.



Re: 0.7 CLI w/TSocket

2010-08-12 Thread Mark

On 8/12/10 8:29 AM, Mark wrote:

On 8/11/10 10:11 PM, Jonathan Ellis wrote:

you have to use an up-to-date CLI; the old one used broken options with
its framed mode

On Wed, Aug 11, 2010 at 6:39 PM, Mark  wrote:

"org.apache.thrift.protocol.TProtocolException: Missing version in
readMessageBegin, old client?"

Is the CLI not supported when using TSocket? I don't believe this 
was the

same in 0.6.

Can someone explain the differences between TFramedTransport vs 
TSocket. I
tried searching but I couldn't find much information on either one. 
Thanks





Where can I find an updated CLI? I just downloaded the nightly build 
(apache-cassandra-2010-08-12_13-11-16-bin.tar.gz) and I am still 
seeing the same thing. Thanks

Same thing with cassandra-0.7.0-beta1


Re: Can I retrieve specific key range from a table in RandomPartitioner?

2010-08-12 Thread Aaron Morton
Try setting the end key to an empty string, then set the number of rows to
something sane and make multiple calls if needed. Or you may be able to make
your own secondary index in another CF, so you do two reads: one on the
secondary index, then one on the rows you want.

There has been some discussion about range slices with RP recently that may be
helpful, see
http://www.mail-archive.com/user@cassandra.apache.org/msg05017.html

Aaron

On 12 Aug, 2010, at 08:50 PM, ChingShen wrote:

I have a key range between 00 and 001000, and my code is as below:

SlicePredicate predicate = new SlicePredicate();
predicate.setColumn_names(columns);
ColumnParent parent = new ColumnParent(columnFamily);
KeyRange k = new KeyRange(1000);
k.setStart_key(key[0]);
k.setEnd_key(key[1000]);
List<KeySlice> results = client.get_range_slices(keyspace, parent, predicate, k, ConsistencyLevel.ONE);

On Thu, Aug 12, 2010 at 4:44 PM, ChingShen wrote:

Hi all,
   Can I retrieve specific key range from a table in RandomPartitioner?
Because I always got below exception:
Exception in thread "main" InvalidRequestException(why:start key's md5 sorts
after end key's md5.  this is not allowed; you probably should not specify
end key at all, under RandomPartitioner)

Thanks.
Shen
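
A sketch of the paging Aaron describes, reusing keyspace, parent, and
predicate from Shen's snippet (the page size of 100 is arbitrary and
process() is a placeholder): leave the end key empty, start each call at the
last key already seen, and drop that duplicate row:

KeyRange range = new KeyRange(100);  // 100 is the page size, not a key bound
range.setEnd_key("");                // no end key under RandomPartitioner
String start = "";
List<KeySlice> page;
do {
    range.setStart_key(start);
    page = client.get_range_slices(keyspace, parent, predicate, range,
                                   ConsistencyLevel.ONE);
    for (KeySlice ks : page) {
        if (!ks.getKey().equals(start))  // the start key comes back again; skip it
            process(ks);
    }
    if (!page.isEmpty())
        start = page.get(page.size() - 1).getKey();
} while (page.size() == 100);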



Re: How does cfstats calculate Row Size?

2010-08-12 Thread Ryan King
On Thu, Aug 12, 2010 at 9:08 AM, Julie  wrote:
> I am chasing down a row size discrepancy and am confused.
>
> I populated a single node Cassandra cluster with 10,000 rows of data, using
> numeric keys 1-10,000, where each row is a little over 100kB in length and has
> a single column in it.
>
> When I perform a cfstats on the node immediately after writing the data, it
> reports that the Compacted row minimum size = Compacted row maximum size which
> is a little over 100,000 bytes.  This is what I expect.
>
> I then run an application that randomly reads rows and adds a timestamp column
> to each row read.  This timestamp column name and column value is just adding
> a few bytes to the row.
>
> But after running my reading app for a few hours, cfstats reports a very odd
> minimum row size (and compacted mean row size):
>
> [r...@ec2-server1 ~]# /mnt/server/apache-cassandra-0.6.2/bin/nodetool -h
> ec2-server1 -p 8080 cfstats
> Keyspace: Keyspace1
>        Read Count: 670434
>        Read Latency: 36.22349047035205 ms.
>        Write Count: 1519933
>        Write Latency: 0.02940705741634664 ms.
>        Pending Tasks: 0
>                Column Family: Standard1
>                SSTable count: 6
>                Space used (live): 11130225642
>                Space used (total): 11130225642
>                Memtable Columns Count: 1435
>                Memtable Data Size: 40344907
>                Memtable Switch Count: 1329
>                Read Count: 670434
>                Read Latency: 41.768 ms.
>                Write Count: 1519933
>                Write Latency: 0.025 ms.
>                Pending Tasks: 0
>                Key cache capacity: 20
>                Key cache size: 20
>                Key cache hit rate: 0.48049934471509675
>                Row cache: disabled
>                Compacted row minimum size: 238
>                Compacted row maximum size: 100323
>                Compacted row mean size: 67548
>
> I thought I had a bug in my code so I wrote another app to read every row
> in the database, keys 1-10,000.  I get the size of each row after reading it
> (by adding up all column names and column values in the row and the size of
> the key string) and this matches what I expect -- every single key in this
> table has a size of just over 100,000 bytes.  (I know there are some
> overhead columns in each row but I assume these will only make the row
> larger, not smaller.)
>
> So I am confused about where cfstats is getting the row sizes it is working
> with?
>
> When I add the timestamp column to each row, I am not deleting the other
> column (large) in the row but I am not rewriting the large column either.

I'm guessing (haven't read this part of the source) that the min size
is being generated in minor compaction, which doesn't see the whole
row.

-ryan


Re: Data Distribution / Replication

2010-08-12 Thread Benjamin Black
On Thu, Aug 12, 2010 at 8:30 AM, Stefan Kaufmann  wrote:
> Hello again,
>
> In the last few days I started several tests with Cassandra and learned quite
> a few facts.
>
> However, of course, there are still enough things I need to
> understand. One thing is, how the data replication works.
> For my Testing:
> 1. I set the replication factor to 3, started with 1 active node (the
> seed) and I inserted some test keys.

This is not a correct concept of what a seed is.  I suggest you not
use the word 'seed' for it.

> 2. I started 2 more nodes, which joined the cluster.
> 3. I waited for the data to replicate, which didn't happen.

Correct, you need to run nodetool repair because the nodes were not
present when the writes came in.  You can also use a higher
consistency level to force read repair before returning data, which
will incrementally repair things.
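
For example, on each node that was absent when the writes came in (the host
and JMX port here mirror the nodetool cfstats invocations elsewhere in this
digest; adjust to your setup):

bin/nodetool -h localhost -p 8080 repair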

> 4. I inserted more keys, and it looked like they were distributed to
> all three nodes.
>

Correct, they were up at the time and received the write operations directly.

Seems like you might benefit from reading the operations wiki:
http://wiki.apache.org/cassandra/Operations

b


Re: Filesystem for Cassandra

2010-08-12 Thread Peter Schuller
> Actually we use Cassandra on ZFS (OpenSolaris), fine-tuned for our needs; no
> RAID controller used.

An interesting property of ZFS is the use of the ARC for caching.
Contrary to the traditional behavior of buffer caches, the ARC should
theoretically not evict all interesting data as a result of single
sequential reads/writes of large files (without repeated access).

Thus, I would expect that ZFS has the potential to give very even
performance in terms of being less affected by compaction and memtable
flush operations which otherwise (on e.g. traditional linux w/
xfs/jfs/etc) have a significant impact.

That said I'd love to hear more about people's experiences. My only
testing with Cassandra on ZFS has been on FreeBSD and the issue I had
there was the tendency for bulk writes to cause poor latency on other
I/O (when doing lots of writes I regularly have this issue on my
desktop which runs freebsd/zfs) - but I am hoping that is specific to
the FreeBSD port.


-- 
/ Peter Schuller


Re: Post on experiences with Cassandra for Twitter retweet analysis

2010-08-12 Thread Eric Evans
On Thu, 2010-08-12 at 11:29 +0200, Mikio Braun wrote:
> So far, we're very pleased with Cassandra performance, but we've also
> had to overcome some issues on which I report in the blog and which
> are hopefully interesting for other users of Cassandra.
> 
> The blog post can be found here:
> 
> http://blog.mikiobraun.de/2010/08/-cassandra-tips.html

Thanks, this is a nice write up.

I am curious though about the troubles you had using wide rows.  As a
rule, several hundred thousand columns in a row should not be a problem.
In fact, this runs contrary to the advice usually given since this
should be the fastest/most efficient way to retrieve a dataset of that
size.
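
(For readers hitting similar trouble: a wide row is normally read in column
pages with get_slice rather than in one shot. A sketch against the 0.6 Thrift
API, with an illustrative CF name, assuming a connected client and String
keyspace/rowKey variables:)

SliceRange sr = new SliceRange(new byte[0], new byte[0], false, 1000);
SlicePredicate pred = new SlicePredicate();
pred.setSlice_range(sr);
// first page of up to 1000 columns from one wide row
List<ColumnOrSuperColumn> page = client.get_slice(keyspace, rowKey,
        new ColumnParent("Retweets"), pred, ConsistencyLevel.ONE);
// next page: sr.setStart(lastColumnNameSeen) and skip the duplicate first column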

-- 
Eric Evans
eev...@rackspace.com



How does cfstats calculate Row Size?

2010-08-12 Thread Julie
I am chasing down a row size discrepancy and am confused.

I populated a single node Cassandra cluster with 10,000 rows of data, using 
numeric keys 1-10,000, where each row is a little over 100kB in length and has 
a single column in it. 

When I perform a cfstats on the node immediately after writing the data, it 
reports that the Compacted row minimum size = Compacted row maximum size which 
is a little over 100,000 bytes.  This is what I expect.  

I then run an application that randomly reads rows and adds a timestamp column 
to each row read.  This timestamp column name and column value is just adding 
a few bytes to the row.

But after running my reading app for a few hours, cfstats reports a very odd 
minimum row size (and compacted mean row size):

[r...@ec2-server1 ~]# /mnt/server/apache-cassandra-0.6.2/bin/nodetool -h 
ec2-server1 -p 8080 cfstats
Keyspace: Keyspace1
Read Count: 670434
Read Latency: 36.22349047035205 ms.
Write Count: 1519933
Write Latency: 0.02940705741634664 ms.
Pending Tasks: 0
Column Family: Standard1
SSTable count: 6
Space used (live): 11130225642
Space used (total): 11130225642
Memtable Columns Count: 1435
Memtable Data Size: 40344907
Memtable Switch Count: 1329
Read Count: 670434
Read Latency: 41.768 ms.
Write Count: 1519933
Write Latency: 0.025 ms.
Pending Tasks: 0
Key cache capacity: 20
Key cache size: 20
Key cache hit rate: 0.48049934471509675
Row cache: disabled
Compacted row minimum size: 238
Compacted row maximum size: 100323
Compacted row mean size: 67548

I thought I had a bug in my code so I wrote another app to read every row 
in the database, keys 1-10,000.  I get the size of each row after reading it 
(by adding up all column names and column values in the row and the size of 
the key string) and this matches what I expect -- every single key in this 
table has a size of just over 100,000 bytes.  (I know there are some 
overhead columns in each row but I assume these will only make the row 
larger, not smaller.)

So I am confused about where cfstats is getting the row sizes it is working 
with?  

When I add the timestamp column to each row, I am not deleting the other 
column (large) in the row but I am not rewriting the large column either.

Thanks for your help!
Julie




Data Distribution / Replication

2010-08-12 Thread Stefan Kaufmann
Hello again,

In the last few days I started several tests with Cassandra and learned quite a few facts.

However, of course, there are still enough things I need to
understand. One thing is, how the data replication works.
For my Testing:
1. I set the replication factor to 3, started with 1 active node (the
seed) and I inserted some test keys.
2. I started 2 more nodes, which joined the cluster.
3. I waited for the data to replicate, which didn't happen.
4. I inserted more keys, and it looked like they were distributed to
all three nodes.

So here is my question:
How can I ensure that every key exists on at least three nodes? So,
when I start with one node and later join 2 more - the data will be
distributed.
Shouldn't this happen automatically? Am I just not patient enough?
How is this handled in productive environments? For instance, one node
has a hardware failure, so it will be exchanged with a new blank one.
How does that one get its data back?

I searched the mailing list; the only answer I found was to copy the
data manually. Is this true?

I'm currently using Cassandra 0.6.4 in our testing environment. I
chose the RackUnawareStrategy.

Stefan


Re: 0.7 CLI w/TSocket

2010-08-12 Thread Mark

On 8/11/10 10:11 PM, Jonathan Ellis wrote:

you have to use an up-to-date CLI; the old one used broken options with
its framed mode

On Wed, Aug 11, 2010 at 6:39 PM, Mark  wrote:
   

"org.apache.thrift.protocol.TProtocolException: Missing version in
readMessageBegin, old client?"

Is the CLI not supported when using TSocket? I don't believe this was the
same in 0.6.

Can someone explain the differences between TFramedTransport and TSocket? I
tried searching but I couldn't find much information on either one. Thanks

 



   
Where can I find an updated CLI? I just downloaded the nightly build 
(apache-cassandra-2010-08-12_13-11-16-bin.tar.gz) and I am still seeing 
the same thing. Thanks


how to retrieve data from supercolumns by phpcassa ?

2010-08-12 Thread lisek

Hi all,

I've got a Cassandra supercolumn family looking like this:



Now in this column family I've inserted something like this:

["client"] => array(1) {
  ["2a3909c0-a612-11df-b27e-346336336631"]=>
array(3) {
  ["add_date"]=>
  string(10) "1281618279"
  ["lastname"]=>
  string(8) "blablabla"
  ["name"]=>
  string(6) "myname"
  }

}

My question is: how do I get this one "2a3909c0-a612-11df-b27e-346336336631"
column from "client"? I was trying get->('client', '2a3909c0-a612-11df-
b27e-346336336631'), but with no results... Maybe I should convert
"2a3909c0-a612-11df-b27e-346336336631" somehow before I put it into get()?

Or maybe I'm thinking the wrong way...

regards 
-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/how-to-retrieve-data-from-supercolumns-by-phpcassa-tp5416141p5416141.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Reload/Update schema 0.7

2010-08-12 Thread Gary Dusbabek
You should take a look at http://wiki.apache.org/cassandra/LiveSchemaUpdates

loadSchemaFromYAML() is intended to initialize the schema on a seed
node in a new cluster (or one that has been upgraded from 0.6).  It is
an operation that should only be performed one time *per cluster.*

Gary

On Wed, Aug 11, 2010 at 20:56, Mark  wrote:
> How is this accomplished?
>
> I tried using the
> org.apache.cassandra.service.StorageService.loadSchemaFromYAML() method but
> I am receiving the following error.
>
> java.util.concurrent.ExecutionException:
> org.apache.cassandra.config.ConfigurationException: Cannot load from XML on
> top of pre-existing schemas.
>    at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
>    at java.util.concurrent.FutureTask.get(FutureTask.java:83)
>    at
> org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor.afterExecute(DebuggableThreadPoolExecutor.java:87)
>    at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:888)
>    at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>    at java.lang.Thread.run(Thread.java:637)
>
> Thanks again
>
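
For schema changes after that first load, the LiveSchemaUpdates page describes
going through the Thrift API instead of YAML. A rough sketch against the
0.7-beta1 generated client (names are illustrative, java.util.Arrays is
imported, and this API was still settling at the time, so check the wiki page):

CfDef cf = new CfDef();
cf.setKeyspace("MyKeyspace");
cf.setName("Foo");
cf.setComparator_type("BytesType");

KsDef ks = new KsDef();
ks.setName("MyKeyspace");
ks.setStrategy_class("org.apache.cassandra.locator.RackUnawareStrategy");
ks.setReplication_factor(1);
ks.setCf_defs(Arrays.asList(cf));

client.system_add_keyspace(ks);  // defines the keyspace and its column families
// later, to add a column family to an existing keyspace:
// client.system_add_column_family(anotherCfDef);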


Filesystem for Cassandra

2010-08-12 Thread Michael Widmann
Hi out there ...

Without starting an OT thread or an evangelists' war, it would be interesting
to know which filesystems most Cassandra installations use, and which performs
best in which cases.

Actually we use Cassandra on ZFS (OpenSolaris), fine-tuned for our needs; no
RAID controller used.

What are the experience with

XFS
ZFS (on FreeBSD or something)
EXT4 / EXT3
BTRFS (if someone really use it already)
etc.

Would be an interesting fact for production servers.

greetings

Michael


Post on experiences with Cassandra for Twitter retweet analysis

2010-08-12 Thread Mikio Braun

Hello,

I've put up a blog post where I discuss our experiences with using
Cassandra as the main database backend for twimpact. Twimpact is a
research project at TU Berlin which aims at estimating user impact
based on retweet analysis. A live version of the analysis for the
Japanese market can be seen at http://twimpact.jp

So far, we're very pleased with Cassandra performance, but we've also
had to overcome some issues on which I report in the blog and which are
hopefully interesting for other users of Cassandra.

The blog post can be found here:

http://blog.mikiobraun.de/2010/08/-cassandra-tips.html

-M


-- 
Dr. Mikio Braun       email: mi...@cs.tu-berlin.de
TU Berlin             web: ml.cs.tu-berlin.de/~mikio
Franklinstr. 28/29    tel: +49 30 314 78627
10587 Berlin, Germany





Re: Can I retrieve specific key range from a table in RandomPartitioner?

2010-08-12 Thread ChingShen
I have a key range between 00 and 001000, and my code is as below:

SlicePredicate predicate = new SlicePredicate();
predicate.setColumn_names(columns);
ColumnParent parent = new ColumnParent(columnFamily);
KeyRange k = new KeyRange(1000);
k.setStart_key(key[0]);
k.setEnd_key(key[1000]);
List<KeySlice> results = client.get_range_slices(keyspace, parent,
predicate, k, ConsistencyLevel.ONE);

On Thu, Aug 12, 2010 at 4:44 PM, ChingShen  wrote:

> Hi all,
>
>Can I retrieve specific key range from a table in RandomPartitioner?
> Because I always got below exception:
> Exception in thread "main" InvalidRequestException(why:start key's md5
> sorts after end key's md5.  this is not allowed; you probably should not
> specify end key at all, under RandomPartitioner)
>
> Thanks.
>
> Shen
>


Can I retrieve specific key range from a table in RandomPartitioner?

2010-08-12 Thread ChingShen
Hi all,

   Can I retrieve specific key range from a table in RandomPartitioner?
Because I always got below exception:
Exception in thread "main" InvalidRequestException(why:start key's md5 sorts
after end key's md5.  this is not allowed; you probably should not specify
end key at all, under RandomPartitioner)

Thanks.

Shen