Re: How to delete bulk data from cassandra 0.6.3

2011-02-05 Thread Ali Ahsan

Any update on this?

On 02/05/2011 12:53 AM, Ali Ahsan wrote:
So do we need to write a script? Or is it something I can do as a
system admin without involving a developer? If yes, please guide me in
this case.





On 02/04/2011 10:36 PM, Jonathan Ellis wrote:
In that case, you should shut down the server before removing data 
files.


On Fri, Feb 4, 2011 at 9:01 AM, roshandawr...@gmail.com wrote:

I thought truncate() was not available before 0.7 (in 0.6.3), was it?

---
Sent from BlackBerry

-Original Message-
From: Jonathan Ellis <jbel...@gmail.com>
Date: Fri, 4 Feb 2011 08:58:35
To: user <user@cassandra.apache.org>
Reply-To: user@cassandra.apache.org
Subject: Re: How to delete bulk data from cassandra 0.6.3

You should use truncate instead. (Then remove the snapshot truncate 
creates.)


On Fri, Feb 4, 2011 at 2:05 AM, Ali Ahsan <ali.ah...@panasiangroup.com> wrote:

Hi All

Is there any way I can delete column family data (not removing the
column families) from Cassandra without affecting ring integrity? What if
I delete some column family data in Linux with the rm command?






--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com










--
S.Ali Ahsan

Senior System Engineer

e-Business (Pvt) Ltd

49-C Jail Road, Lahore, P.O. Box 676
Lahore 54000, Pakistan

Tel: +92 (0)42 3758 7140 Ext. 128

Mobile: +92 (0)345 831 8769

Fax: +92 (0)42 3758 0027

Email: ali.ah...@panasiangroup.com



www.ebusiness-pg.com

www.panasiangroup.com

Confidentiality: This e-mail and any attachments may be confidential
and/or privileged. If you are not a named recipient, please notify the
sender immediately and do not disclose the contents to another person,
use it for any purpose, or store or copy the information in any medium.
Internet communications cannot be guaranteed to be timely, secure, error
or virus-free. We do not accept liability for any errors or omissions.



Re: Sorting in time order without using TimeUUID type column names

2011-02-05 Thread Bill Speirs
You can specify reverse order through the API when you slice the cols so I
don't think you need to write a comparator.

Bill-

On Feb 4, 2011 9:45 PM, Aditya Narayan ady...@gmail.com wrote:
Thanks Aaron,

Yes I can put the column names without using the userId in the
timeline row, and when I want to retrieve the row corresponding to
that column name, I will attach the userId to get the row key.

Yes, I'll store it as a long & I guess I'll have to write a custom
comparator type (ReversedIntegerType) to sort those longs in
descending order.

Regards
Aditya



On Sat, Feb 5, 2011 at 6:24 AM, aaron morton aa...@thelastpickle.com
wrote:
 IMHO If you know t...
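In pycassa terms, the reversed slice Bill mentions looks roughly like this; the keyspace, column family, and row key below are made up for illustration, and the column family is assumed to use LongType column names:

    import pycassa

    # Hypothetical setup: keyspace 'App', CF 'Timeline' with LongType column
    # names (timestamps) and event payloads as values.
    pool = pycassa.ConnectionPool('App', ['localhost:9160'])
    timeline = pycassa.ColumnFamily(pool, 'Timeline')

    # Ask the server for the slice in reverse order (newest first) instead of
    # writing a custom ReversedIntegerType comparator.
    latest = timeline.get('user:42', column_count=20, column_reversed=True)
    for ts, payload in latest.items():
        print(ts, payload)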


Hinted handoffs - how do they work?

2011-02-05 Thread Paul T
Good morning!
I have been reading through the Cassandra wiki and have some confusion around how 
hinted handoffs work.

Here is my scenario:
Five nodes in the ring (A, B, C, D, E)
Replication factor=3
Assume that the replicas for a given key are A, B, C
Assume CL=ONE 

During a write operation, nodes  B and C are down.
Will hints for B and C be written to just A (the only live replica available) 
or 
will D and E also take the hints and the data?  

If D and E take on the hints+data, will that data be reachable during a 
subsequent read operation? (assuming B and C are still down)

Would appreciate a clarification.
TIA


  

Re: Hinted handoffs - how do they work?

2011-02-05 Thread Jonathan Ellis
On Sat, Feb 5, 2011 at 6:46 AM, Paul T paulmax6...@yahoo.com wrote:
 Good morning!
 I have been reading through the Cassandra wiki and have some confusion around
 how hinted handoffs work.
 Here is my scenario:
 Five nodes in the ring (A, B, C, D, E)
 Replication factor=3
 Assume that the replicas for a given key are A, B, C
 Assume CL=ONE
 During a write operation, nodes  B and C are down.
 Will hints for B and C be written to just A (the only live replica
 available)

Just A.

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: How to delete bulk data from cassandra 0.6.3

2011-02-05 Thread Edward Capriolo
On Sat, Feb 5, 2011 at 4:12 AM, Ali Ahsan ali.ah...@panasiangroup.com wrote:
 Any update on this?

 On 02/05/2011 12:53 AM, Ali Ahsan wrote:

 So do we need to write a script? Or is it something I can do as a system
 admin without involving a developer? If yes, please guide me in this case.




In 0.6.x:
kill <pid of cassandra>
rm -rf /var/lib/cassandra/data/<keyspace>/<CF you want to delete>-*
(start cassandra)


Re: How to delete bulk data from cassandra 0.6.3

2011-02-05 Thread Ali Ahsan
Thanks for replying, Edward Capriolo. Will this affect Cassandra ring
integrity? Another question is: will Cassandra work properly after
this operation? And will it be possible to restore deleted data from
backup?



In 0.6.x:
kill <pid of cassandra>
rm -rf /var/lib/cassandra/data/<keyspace>/<CF you want to delete>-*
(start cassandra)








How bad is the impact of compaction on performance?

2011-02-05 Thread buddhasystem

Just wanted to see if someone with experience in running an actual service
can advise me:

how often do you run nodetool compact on your nodes? Do you stagger it in
time, for each node? How badly is performance affected?

I know this all seems too generic but then again no two clusters are created
equal anyhow. Just wanted to get a feel.

Thanks,
Maxim

-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/How-bad-is-teh-impact-of-compaction-on-performance-tp5995868p5995868.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: How to delete bulk data from cassandra 0.6.3

2011-02-05 Thread Edward Capriolo
On Sat, Feb 5, 2011 at 11:35 AM, Ali Ahsan ali.ah...@panasiangroup.com wrote:
 Thanks for replying, Edward Capriolo. Will this affect Cassandra ring
 integrity? Another question is: will Cassandra work properly after this
 operation? And will it be possible to restore deleted data from backup?

 In 0.6.x:
 kill <pid of cassandra>
 rm -rf /var/lib/cassandra/data/<keyspace>/<CF you want to delete>-*
 (start cassandra)







I am not sure what you mean by data integrity.

In short, when Cassandra starts up it searches its data directories
and loads up the data, index, bloom filter, and saved cache files it
finds.

Unless the files are corrupt it will happily load up what it finds.

Restores are done by the process you described: stop the server, restore
the files, start the server.


Re: How bad is the impact of compaction on performance?

2011-02-05 Thread Edward Capriolo
On Sat, Feb 5, 2011 at 11:59 AM, buddhasystem potek...@bnl.gov wrote:

 Just wanted to see if someone with experience in running an actual service
 can advise me:

 how often do you run nodetool compact on your nodes? Do you stagger it in
 time, for each node? How badly is performance affected?

 I know this all seems too generic but then again no two clusters are created
 equal anyhow. Just wanted to get a feel.

 Thanks,
 Maxim

 --
 View this message in context: 
 http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/How-bad-is-teh-impact-of-compaction-on-performance-tp5995868p5995868.html
 Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
 Nabble.com.


This is an interesting topic. Cassandra can now remove tombstones on
non-major compactions. For some use cases you may not have to trigger
nodetool compact yourself to remove tombstones. Use cases that do not
do many updates or deletes may have the least need to run compaction
yourself.

However, if you have smaller SSTables, or fewer SSTables, your read
operations will be more efficient.

If you have downtime, such as from 1AM-6AM, going through a major
compaction might shrink your dataset significantly and that will make
reads better.

Compaction can be more or less intensive. The largest factor is row
size.  Users with large rows probably see faster compaction while
smaller rows see it take a long time. You can lower the priority of
the compaction thread for experimentation.

As to performance, you want to get your cluster to the state where
it is not compacting often. This may mean you need more nodes to
handle writes.

I graph the compaction information from JMX
http://www.jointhegrid.com/cassandra/cassandra-cacti-m6.jsp
to get a feel for how often a node is compacting on average. Also, I
cross-reference the compaction with read latency and IO graphs I have
to see what impact compaction has on reads.

Forcing a major compaction also lowers the chances a compaction will
happen during the day at peak time. I major compact a few cluster
nodes each night through cron (gc time 3 days). This has been good for
keeping our data on disk as small as possible. Forcing the major
compact at night uses IO, but I find it saves IO over the course of
the day because each read seeks less on disk.
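A rough sketch of the staggered nightly major compaction Edward describes, driven from one script instead of per-node cron entries; the node list, JMX port, and pause are placeholders, and nodetool is assumed to be on the PATH:

    import subprocess
    import time

    # Hypothetical node list; compact one node at a time so the rest of the
    # cluster can absorb the extra IO.
    NODES = ['cass01.example.com', 'cass02.example.com', 'cass03.example.com']
    JMX_PORT = '8080'  # adjust to your JMX port

    def major_compact(host):
        # Equivalent to: nodetool -h <host> -p <port> compact
        subprocess.check_call(['nodetool', '-h', host, '-p', JMX_PORT, 'compact'])

    if __name__ == '__main__':
        for host in NODES:
            major_compact(host)
            time.sleep(300)  # short gap before moving on to the next node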


order of index expressions

2011-02-05 Thread Shaun Cutts
Hello,

I'm wondering if cassandra is sensitive to the order of index expressions in 
(pycassa call) get_indexed_slices?

If I have several column indexes available, will it attempt to optimize the 
order?

Thanks,

-- Shaun










postgis cassandra?

2011-02-05 Thread Sean Ochoa
Can someone tell me how to represent spatial data (coming from postgis) in
Cassandra?

 - Sean


Re: postgis cassandra?

2011-02-05 Thread William R Speirs
I know nothing about postgis and little about spatial data, but if you're simply 
talking about data that relates to some latitude & longitude pair, you could 
have your row key simply be the concatenation of the two: lat:long.


Can you provide more details about the type of data you're looking to store?

Thanks...

Bill-

On 02/05/2011 12:22 PM, Sean Ochoa wrote:

Can someone tell me how to represent spatial data (coming from postgis) in
Cassandra?

  - Sean
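A small pycassa sketch of the lat:long row key Bill suggests; the keyspace, column family, and column names are invented for illustration:

    import pycassa

    pool = pycassa.ConnectionPool('Geo', ['localhost:9160'])
    points = pycassa.ColumnFamily(pool, 'Points')

    lat, lon = 47.6097, -122.3331
    row_key = '%s:%s' % (lat, lon)   # concatenation of the two, as suggested

    points.insert(row_key, {'name': 'Pike Place Market', 'source': 'postgis'})
    print(points.get(row_key))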


Re: How bad is the impact of compaction on performance?

2011-02-05 Thread buddhasystem

Thanks Edward. In our usage scenario, there is never downtime, it's a global
24/7 operation.

What is impacted worse, reads or writes?

How does a node handle compaction when there is a spike of writes coming to
it?




-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/How-bad-is-the-impact-of-compaction-on-performance-tp5995868p5995978.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: How to delete bulk data from cassandra 0.6.3

2011-02-05 Thread Ali Ahsan

Thanks for the detailed reply.


On 02/05/2011 10:01 PM, Edward Capriolo wrote:

On Sat, Feb 5, 2011 at 11:35 AM, Ali Ahsan <ali.ah...@panasiangroup.com> wrote:

Thanks for replying, Edward Capriolo. Will this affect Cassandra ring
 integrity? Another question is: will Cassandra work properly after this
operation? And will it be possible to restore deleted data from backup?


In 0.6.x:
kill <pid of cassandra>
rm -rf /var/lib/cassandra/data/<keyspace>/<CF you want to delete>-*
(start cassandra)







I am not sure what you mean by data integrity.

In short, when Cassandra starts up it searches its data directories
and loads up the data, index, bloom filter, and saved cache files it
finds.

Unless the files are corrupt it will happily load up what it finds.

Restores are done by the process you described: stop the server, restore
the files, start the server.








How to upgrade cassandra from 0.6.3 to 0.7

2011-02-05 Thread Ali Ahsan

Hi All
We are planning to upgrade Cassandra from 0.6.3 to 0.7. Can anyone guide 
me to a web link where I can find the upgrade procedure?




Re: postgis cassandra?

2011-02-05 Thread Sean Ochoa
That's a good question, Bill.

The data that I'm trying to store begins as a simple point.  But, moving
forward, it will become more like complex geometries.  I assume that I can
simply create a JSON-like object and insert it.  Which, for now, works.
I'm just wondering if there's a typical / publicly accepted standard of
storing somewhat complex spatial data in Cassandra.

Additionally, I would like to figure out how one goes about slicing on large
spatial data sets given situations where, for instance, I would like to get
all the points in a column-family where the point is within a shape.  I
guess it boils down to using a spatial comparator of some sort, but I
haven't seen one, yet.

 - Sean

On Sat, Feb 5, 2011 at 9:51 AM, William R Speirs <bill.spe...@gmail.com> wrote:

 I know nothing about postgis and little about spatial data, but if you're
  simply talking about data that relates to some latitude & longitude pair,
 you could have your row key simply be the concatenation of the two:
 lat:long.

 Can you provide more details about the type of data you're looking to
 store?

 Thanks...

 Bill-


 On 02/05/2011 12:22 PM, Sean Ochoa wrote:

 Can someone tell me how to represent spatial data (coming from postgis) in
 Cassandra?

  - Sean




-- 
Sean | M (206) 962-7954 | GV (760) 624-8718
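Storing the "JSON-like object" Sean mentions really is just a string value to Cassandra; a hedged pycassa sketch with invented names (this only stores the geometry, it does not give you point-in-shape queries):

    import json
    import pycassa

    pool = pycassa.ConnectionPool('Geo', ['localhost:9160'])
    geometries = pycassa.ColumnFamily(pool, 'Geometries')

    feature = {
        'type': 'Polygon',
        'coordinates': [[[-122.35, 47.60], [-122.32, 47.60],
                         [-122.32, 47.62], [-122.35, 47.62],
                         [-122.35, 47.60]]],
    }

    # Cassandra just sees bytes, so serialize the geometry to a JSON string.
    geometries.insert('shape:pioneer-square', {'geojson': json.dumps(feature)})

    shape = json.loads(geometries.get('shape:pioneer-square')['geojson'])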


Re: How bad is the impact of compaction on performance?

2011-02-05 Thread Edward Capriolo
On Sat, Feb 5, 2011 at 12:48 PM, buddhasystem potek...@bnl.gov wrote:

 Thanks Edward. In our usage scenario, there is never downtime, it's a global
 24/7 operation.

 What is impacted worse, reads or writes?

 How does a node handle compaction when there is a spike of writes coming to
 it?





It does not have to be downtime. It just has to be a slow time. Use
your traffic graphs to run a major compact at the slowest time so it has
the least impact on performance.

Compaction does not generally affect writes, or bursts of writes,
especially if your writes go to a separate commit log disk.

In the best-case scenario compaction may not affect your performance
at all. An example of this would be if, in your use case, nearly 100% of
reads are serviced by the row cache, so disk is not a factor.

Generally speaking, if you have good fast hard disks, and only a single
node is compacting at a given time, the cluster absorbs this. In 0.7.0
the dynamic snitch should help re-route traffic away from slower nodes for
even less impact. In other words, making compaction non-impacting is
all about capacity.


row keys

2011-02-05 Thread Sean Ochoa
Hey all.

I'm using Pycassa to insert some spatial data into Cassandra.  Here's where
I am on the tutorial:
http://pycassa.github.com/pycassa/tutorial.html#inserting-data  And, I'm not
quite understanding where row-keys come from.  What mind-set should I have
when I generate them for the values that are being inserted?

Oh, and a note about the values that I'm inserting:  I've got an object
identifier, time-stamp, lat, and long.

 - Sean


Re: How to upgrade cassandra from 0.6.3 to 0.7

2011-02-05 Thread Ali Ahsan

Ok let me read it out.

On 02/06/2011 12:20 AM, Tyler Hobbs wrote:


We are planning to upgrade Cassandra from 0.6.3 to 0.7. Can anyone
guide me to a web link where I can find the upgrade procedure?


NEWS.txt in an 0.7.0 package covers all the details of upgrading quite 
well.


--
Tyler Hobbs
Software Engineer, DataStax (http://datastax.com/)
Maintainer of the pycassa (http://github.com/pycassa/pycassa)
Cassandra Python client library







Re: order of index expressions

2011-02-05 Thread Jonathan Ellis
On Sat, Feb 5, 2011 at 8:48 AM, Shaun Cutts sh...@cuttshome.net wrote:
 Hello,
 I'm wondering if cassandra is sensitive to the order of index expressions in
 (pycassa call) get_indexed_slices?

No.

 If I have several column indexes available, will it attempt to optimize the
 order?

Yes.

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
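For reference, a pycassa call with two index expressions looks roughly like this (the column family and values are invented); per Jonathan's answer, swapping the order of the two expressions should return the same rows:

    import pycassa
    from pycassa.index import create_index_expression, create_index_clause

    pool = pycassa.ConnectionPool('App', ['localhost:9160'])
    users = pycassa.ColumnFamily(pool, 'Users')

    # Both expressions use the default equality operator; Cassandra requires
    # at least one equality expression on an indexed column.
    state_expr = create_index_expression('state', 'UT')
    dept_expr = create_index_expression('dept', 'engineering')
    clause = create_index_clause([state_expr, dept_expr], count=100)

    for key, columns in users.get_indexed_slices(clause):
        print(key, columns)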


Re: Merging the rows of two column families(with similar attributes) into one ??

2011-02-05 Thread Tyler Hobbs

 if you have under your control parameters like memtable_throughput &
 memtable_operations, which are set on a per column family basis, then
 you can directly control & adjust by splitting the memory space between
 two CFs in proportion to what you would do in a single CF.
 Hence there should be no extra memory consumption for multiple CFs
 that have been split from a single one??


Yes, I think you have the right idea here.  There *is* a small amount of
overhead for the extra memtable and keeping track of a second set of
indexes, bloom filters, sstables, etc.

Regarding the compactions, I think even if they are more, the size of
 the SSTable files to be compacted is smaller, as the data has been split
 into two.
 Then more compactions, but smaller too!!


Yes.

if some CF is written less often as compared to other CFs, then the
 memtable would consume space in memory until it is flushed; this
 memory space could have been much better used by a CF that's heavily
 written and read. And if you try to make the thresholds for flush
 smaller, then more compactions would be needed.


If you merge the two CFs together, then updates to the 'less frequent' rows
will still consume memory, only it will all be within one memtable.
(Memtables grow in size until they are flushed; they don't reserve some set
amount of memory.)  Furthermore, because your memtables will be filled up by
the 'more frequent' rows, the 'less frequent' rows will get fewer
updates/overwrites in memory, so they will tend to be spread across a
greater number of SSTables.

-- 
Tyler Hobbs
Software Engineer, DataStax (http://datastax.com/)
Maintainer of the pycassa (http://github.com/pycassa/pycassa) Cassandra
Python client library


Re: row keys

2011-02-05 Thread Stephen Connolly
you really need to know how you will be pulling the data back out again. you
could use the object id as the row key, timestamp as the column name and
long/lat as the value... that would allow you to query by object id and get
the time-sorted location trace... but if you have a lot of frequent readings
for each object, that would be a poor model because very large rows can
impact performance... in that case you might use the object id combined with
the timestamp rounded to the nearest hour (say) to keep the row size
lower...

but if you are more interested in tracking multiple objects per time, you
might use the timestamp as row key, object id as column name, etc...

with cassandra you need to know what queries you will want to make and
design for that

- Stephen

---
Sent from my Android phone, so random spelling mistakes, random nonsense
words and other nonsense are a direct result of using swype to type on the
screen
On 5 Feb 2011 18:17, Sean Ochoa sean.m.oc...@gmail.com wrote:
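A rough pycassa sketch of the first layout Stephen describes (object id plus an hour bucket as the row key, timestamp as the column name, lat/long as the value); all names are illustrative, and the column family is assumed to have been created with a LongType comparator so the timestamps sort chronologically:

    import time
    import pycassa

    pool = pycassa.ConnectionPool('Tracking', ['localhost:9160'])
    positions = pycassa.ColumnFamily(pool, 'Positions')

    def record_position(obj_id, ts, lat, lon):
        # Bucket rows by hour so a frequently reporting object never grows
        # one huge row.
        row_key = '%s:%d' % (obj_id, ts // 3600)
        positions.insert(row_key, {ts: '%s,%s' % (lat, lon)})

    now = int(time.time())
    record_position('truck-9', now, 47.6097, -122.3331)

    # Time-sorted location trace for the current hour.
    trace = positions.get('truck-9:%d' % (now // 3600))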


Re: order of index expressions

2011-02-05 Thread buddhasystem

Jonathan,

what's the implementation of that? I.e., is it a product of indexes or nested
loops?

Thanks,

Maxim

-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/order-of-index-expressions-tp5995909p5996488.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


revisioned data

2011-02-05 Thread Raj Bakhru
Hi all -

We're new to Cassandra and have read plenty on the data model, but we wanted
to poll for thoughts on how to best handle this structure.

We have simple objects that have an ID, and we want to maintain a history of
all the revisions.

e.g.
MyObject:
ID (long)
name
other fields
update time (long [date])


Any time the object changes, we'll store down a new version of the object
(same ID, but different update time and other fields).  We need to be able
to query out what the object was as-of any time historically.  We also need
to be able to query out what some or all of the items of this object type
were as-of any time historically.

In SQL, we'd just find the max(id) where update time <= queried_as_of_time

In Cassandra, we were thinking of modeling as follows:

CF:  MyObjectType
Super-Column: ID of object (e.g. 625)
Column:  updatetime  (e.g. 1000245242)
Value: byte[] of serialized object

We were thinking of using the OrderingPartitioner and using range queries
against the data.

Does this make sense?  Are we approaching this in the wrong way?

Thanks a lot


Re: revisioned data

2011-02-05 Thread Victor Kabdebon
Hello Raj,

No, it actually doesn't make sense from the point of view of Cassandra;
the OrderingPartitioner preserves the order of the *keys*. The ordering will
be done according to the *supercolumn name*. In that case you can set the
ordering with compare_super_with (sorry, I don't remember exactly the new
term in Cassandra, but that's the idea). The compare_with will order your
columns inside your supercolumn.

However, and I think that many will agree here, try to avoid SuperColumns.
Rather than using SuperColumns, try to think like this:

CF1 : ObjectStore
Key : ID (long)
Columns : {
    name
    other fields
    update time (long [date])
    ...
}

CF2 : ObjectOrder
Key : myorderedobjects
Columns : {
    { name : identifier that can be sorted,
      value : ObjectID },
    ...
}

Best regards,
Victor Kabdebon,
http://www.voxnucleus.fr
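One way to express Raj's model without a super column family is one row per object ID and one column per revision, with the update time (a long) as the column name and the serialized object as the value; the as-of query then becomes a reversed slice of a single column. A hedged pycassa sketch with invented names, assuming the CF was created with a LongType comparator (pickle is just a stand-in for whatever serialization is used):

    import time
    import pickle
    import pycassa

    pool = pycassa.ConnectionPool('App', ['localhost:9160'])
    revisions = pycassa.ColumnFamily(pool, 'ObjectRevisions')

    def save_revision(object_id, obj):
        revisions.insert(str(object_id), {int(time.time()): pickle.dumps(obj)})

    def get_as_of(object_id, as_of_time):
        # Reversed slice starting at the as-of time: the first column returned
        # is the newest revision at or before that time (the SQL
        # "max(update time) where update time <= t" from the thread).
        result = revisions.get(str(object_id), column_start=as_of_time,
                               column_reversed=True, column_count=1)
        ts, blob = list(result.items())[0]
        return ts, pickle.loads(blob)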

2011/2/5 Raj Bakhru rbak...@gmail.com

 Hi all -

 We're new to Cassandra and have read plenty on the data model, but we
 wanted to poll for thoughts on how to best handle this structure.

 We have simple objects that have and ID and we want to maintain a history
 of all the revisions.

 e.g.
 MyObject:
 ID (long)
 name
 other fields
 update time (long [date])


 Any time the object changes, we'll store down a new version of the object
 (same ID, but different update time and other fields).  We need to be able
 to query out what the object was as-of any time historically.  We also need
 to be able to query out what some or all of the items of this object type
 were as-of any time historically..

 In SQL, we'd just find the max(id) where update time  queried_as_of_time

 In Cassandra, we were thinking of modeling as follows:

 CF:  MyObjectType
 Super-Column: ID of object (e.g. 625)
 Column:  updatetime  (e.g. 1000245242)
 Value: byte[] of serialized object

 We were thinking of using the OrderingPartitioner and using range queries
 against the data.

 Does this make sense?  Are we approaching this in the wrong way?

 Thanks a lot






Re: Merging the rows of two column families(with similar attributes) into one ??

2011-02-05 Thread Ertio Lew
Thanks Tyler!

I think I'll have to very carefully take into consideration all these
factors before deciding upon how to split my data into CFs, as this
cannot have an objective answer. I am expecting at least 8 column
families for my entire application, if I split the data strictly
according to the various features and requirements of the application.

I think there should have been a provision for specifying, on a per-query
basis, which rows should be cached while you're reading them from a
row_cache-enabled CF. Thus you could easily merge similar data for different
features of your application in a single CF. I believe this would
also have led to much more efficient use of the cache space! (if you
were using the same data for different parts of your app which have
different caching needs)

Regards,

Ertio





Re: order of index expressions

2011-02-05 Thread Shaun Cutts
Thanks for the response!

So.. I *may* have a bug to report (at least I can generate radically different 
response times based on expression order with a multiply indexed columnfamily), 
but first I'll have to upgrade to a stable version (currently I have 7.0rc2 
installed).

I was also wondering where the code that does this is... is it in 

java.org.apache.cassandra.db.columniterator.IndexedSliceReader?


Thanks,

-- Shaun

On Feb 5, 2011, at 2:39 PM, Jonathan Ellis wrote:

 On Sat, Feb 5, 2011 at 8:48 AM, Shaun Cutts sh...@cuttshome.net wrote:
 Hello,
 I'm wondering if cassandra is sensitive to the order of index expressions in
 (pycassa call) get_indexed_slices?
 
 No.
 
 If I have several column indexes available, will it attempt to optimize the
 order?
 
 Yes.
 
 -- 
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com



Re: order of index expressions

2011-02-05 Thread Jonathan Ellis
ColumnFamilyStore.scan

On Sat, Feb 5, 2011 at 10:32 PM, Shaun Cutts sh...@cuttshome.net wrote:
 Thanks for the response!

 So.. I *may* have a bug to report (at least I can generate radically 
 different response times based on expression order with a multiply indexed 
 columnfamily), but first I'll have to upgrade to a stable version (currently 
 I have 7.0rc2 installed).

 I was also wondering where the code that does this is... is it in

 java.org.apache.cassandra.db.columniterator.IndexedSliceReader?


 Thanks,

 -- Shaun

 On Feb 5, 2011, at 2:39 PM, Jonathan Ellis wrote:

 On Sat, Feb 5, 2011 at 8:48 AM, Shaun Cutts sh...@cuttshome.net wrote:
 Hello,
 I'm wondering if cassandra is sensitive to the order of index expressions in
 (pycassa call) get_indexed_slices?

 No.

 If I have several column indexes available, will it attempt to optimize the
 order?

 Yes.

 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com





-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Ruby thrift is trying to write Time as string

2011-02-05 Thread Joshua Partogi
Hi,

I don't know whether my assumption is right or not. When I tried to insert a
Time value into a column I am getting this exception:

vendor/ruby/1.8/gems/thrift-0.5.0/lib/thrift/protocol/binary_protocol.rb:106:in
`write_string'
vendor/ruby/1.8/gems/thrift-0.5.0/lib/thrift/client.rb:35:in `write'
vendor/ruby/1.8/gems/thrift-0.5.0/lib/thrift/client.rb:35:in `send_message'
vendor/ruby/1.8/gems/cassandra-0.9.0/lib/./vendor/0.7/gen-rb/cassandra.rb:213:in
`send_batch_mutate'
vendor/ruby/1.8/gems/cassandra-0.9.0/lib/./vendor/0.7/gen-rb/cassandra.rb:208:in
`batch_mutate'
vendor/ruby/1.8/gems/thrift_client-0.6.0/lib/thrift_client/abstract_thrift_client.rb:115:in
`send'
vendor/ruby/1.8/gems/thrift_client-0.6.0/lib/thrift_client/abstract_thrift_client.rb:115:in
`handled_proxy'
vendor/ruby/1.8/gems/thrift_client-0.6.0/lib/thrift_client/abstract_thrift_client.rb:57:in
`batch_mutate'
vendor/ruby/1.8/gems/cassandra-0.9.0/lib/cassandra/0.7/protocol.rb:8:in
`_mutate'
vendor/ruby/1.8/gems/cassandra-0.9.0/lib/cassandra/cassandra.rb:130:in
`insert'

But I am not getting any error if I insert a Time value into a sub-column.

Is this an error, or is it supposed to work that way?

Thanks heaps for the insight.

Kind regards,
Joshua.

-- 
http://twitter.com/jpartogi