Move

2011-02-04 Thread Stu King
I am running a move on one node in a 5 node cluster. There are no writes to
the cluster during the move.

I am seeing an exception on one of the nodes (not the node which I am doing
the move on).

The exception stack is

ERROR [CompactionExecutor:1] 2011-02-04 08:10:46,855 PrecompactedRow.java
(line 82) Skipping row DecoratedKey(656517988577125179070965247963445,
555345524e414d452e6a6f746173696c766573747265) in
/var/lib/cassandra/data/Wenzani/UUID_UUID_SUPER-e-408-Data.db
java.io.EOFException
at java.io.RandomAccessFile.readFully(RandomAccessFile.java:416)
at
org.apache.cassandra.utils.FBUtilities.readByteArray(FBUtilities.java:280)
at
org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:94)
at
org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:364)
at
org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:313)
at
org.apache.cassandra.db.ColumnFamilySerializer.deserializeColumns(ColumnFamilySerializer.java:129)
at
org.apache.cassandra.io.sstable.SSTableIdentityIterator.getColumnFamilyWithColumns(SSTableIdentityIterator.java:137)
at org.apache.cassandra.io.PrecompactedRow.<init>(PrecompactedRow.java:78)
at
org.apache.cassandra.io.CompactionIterator.getCompactedRow(CompactionIterator.java:138)
at
org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:107)
at
org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:42)
at
org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:73)
at
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
at
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
at
org.apache.commons.collections.iterators.FilterIterator.setNextObject(FilterIterator.java:183)
at
org.apache.commons.collections.iterators.FilterIterator.hasNext(FilterIterator.java:94)
at
org.apache.cassandra.db.CompactionManager.doCompaction(CompactionManager.java:323)
at
org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:122)
at
org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:92)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:636)

Output from nodetool ring.

Address  Status State    Load      Owns    Token
                                           105916716988735575505223832861775432335
1.1.1.2  Up     Normal   34.29 GB  45.36%  12956529933298582072612274413196299151
1.1.1.3  Up     Normal   34.46 GB  11.41%  32366675628954067180152712803029297247
1.1.1.4  Up     Normal   48.96 GB  11.40%  51756081624280481651195537730585467204
1.1.1.5  Up     Normal   22 GB     22.78%  90515859237527157456212262236145255573
1.1.1.6  Up     Leaving  13.34 GB  9.05%   105916716988735575505223832861775432335

1.1.1.6 is the node which I executed the move on. It seems to be locked in
the Leaving state. Is this normal until the move completes?

There is almost no activity in the logs and very little cpu usage across the
cluster.

Is this expected for a move?

Cheers

Stu


Re: for counters: does read have to be ALL ?

2011-02-04 Thread Sylvain Lebresne
On Thu, Feb 3, 2011 at 10:39 PM, Yang tedd...@gmail.com wrote:

 the pdf at the design doc

 https://issues.apache.org/jira/secure/attachment/12459754/Partitionedcountersdesigndoc.pdf

 does say so:
 page 2 - strongly consistent read: requires consistency level ALL.
 (QUORUM is insufficient.)
 

 but the wiki  http://wiki.apache.org/cassandra/Counters
 gave a code example:

 rv = client.get_counter('key1', ColumnPath(column_family='Counter1',
 column='c1'), ConsistencyLevel.ONE)


 is one of them wrong?


Three things:
First, the design doc is talking about strongly consistent reads; the wiki
gives a simple example of a read (it's even followed by a warning), so there
is no actual contradiction here.

Second, and more to the point, the design docs are slightly outdated, on
this point at least. There is now support for QUORUM (or ALL) writes (since
https://issues.apache.org/jira/browse/CASSANDRA-1944), so you have the
usual consistency guarantee (i.e., you get strong consistency with a QUORUM
(resp. ONE) read provided you wrote at QUORUM (resp. ALL)).
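
For intuition, the usual rule is that a read and a write overlap on at least
one replica when R + W > N, where R and W are the replica counts implied by
the two consistency levels and N is the replication factor. A tiny
illustrative sketch (not from the thread; RF = 3 assumed):

    # Illustrative only: strong consistency holds when the replica counts
    # implied by the read and write levels sum to more than N.
    N = 3  # assumed replication factor
    levels = {'ONE': 1, 'QUORUM': N // 2 + 1, 'ALL': N}

    def strongly_consistent(read_cl, write_cl):
        return levels[read_cl] + levels[write_cl] > N

    print(strongly_consistent('QUORUM', 'QUORUM'))  # True
    print(strongly_consistent('ONE', 'ALL'))        # True
    print(strongly_consistent('ONE', 'QUORUM'))     # False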

Third, it is good to recall that counters are not considered stable yet
(and that includes the documentation).

--
Sylvain


 Thanks
 Yang



Re: cassandra 0.6.11 binary package problem

2011-02-04 Thread Stephen Connolly
That's because of an issue I found in the Ant scripts while doing the
maven-ant-tasks switch on 0.7.0.

Any jar in build/ will be bundled... (so Ivy goes into the bin dist...
when I did the m-a-t version Eric was wondering why I was including
m-a-t in the bin dist, and I said I was being symmetric with the Ivy
version... he said it was a failed experiment that had been left
in...)

For 0.7.x there should just be the one jar.

For the 0.6.x dists, if you have forgotten to run ant realclean, there
could be earlier versions present.

-Stephen

On 3 February 2011 14:36, Jonathan Ellis jbel...@gmail.com wrote:
 Well, that's odd. :)

 Do any of the other tar.gz balls contain multiple jars?

 On Thu, Feb 3, 2011 at 6:06 AM, Jean-Yves LEBLEU jleb...@gmail.com wrote:
 Hi all,

 Just for info, in apache-cassandra-0.6.11-bin.tar.gz there are both
 apache-cassandra-0.6.10.jar  and apache-cassandra-0.6.11.jar in the
 lib directory.

 This causes trouble for my upgrade scripts, which use this file to get
 the installed version and check whether an upgrade is needed. :(

 Thanks for the good job.
 Jean-Yves




 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com



RE: Using Cassandra to store files

2011-02-04 Thread Brendan Poole

Hi Daniel 
 
When you say "We are doing this", do you mean via NFS or Cassandra?
 
Thanks
 
Brendan
 
 
 



 Brendan Poole
 Systems Developer
 NewLaw Solicitors
 Helmont House 
 Churchill Way
 Cardiff
 brendan.po...@new-law.co.uk
 029 2078 4283
 www.new-law.co.uk





From: Daniel Doubleday [mailto:daniel.double...@gmx.net] 
Sent: 03 February 2011 17:21
To: user@cassandra.apache.org
Subject: Re: Using Cassandra to store files


Hundreds of thousands doesn't sound too bad. Good old NFS would do with
an ok directory structure. 

We are doing this. Our documents are pretty small though (a few kb). We
have around 40M right now with around 300GB total.

Generally the problem is that much data usually means that cassandra
becomes io bound during repairs and compactions even if your hot dataset
would fit in the page cache. There are efforts to overcome this and 0.7
will help with repair problems but for the time being you have to have
quite some headroom in terms of io performance to handle these
situations.  

Here is a related post:

http://comments.gmane.org/gmane.comp.db.cassandra.user/11190
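
For concreteness, here is a minimal sketch of the pattern being discussed
(not Daniel's actual code; pycassa, with the keyspace, CF name and chunk
size all invented for illustration): one row per document, with the content
split across fixed-size chunk columns.

    # Hedged sketch: store a document as one row of chunk columns.
    import pycassa

    CHUNK = 64 * 1024  # assumed chunk size
    pool = pycassa.ConnectionPool('Keyspace1')
    docs = pycassa.ColumnFamily(pool, 'Documents')

    def store(doc_id, data):
        cols = {}
        for i in range(0, len(data), CHUNK):
            # zero-padded names keep the chunks in comparator order
            cols['chunk-%06d' % (i // CHUNK)] = data[i:i + CHUNK]
        docs.insert(doc_id, cols)

    def fetch(doc_id):
        row = docs.get(doc_id, column_count=100000)
        return ''.join(row[name] for name in sorted(row))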


On Feb 3, 2011, at 1:33 PM, Brendan Poole wrote:


Hi
 
Would anyone recommend using Cassandra for storing hundreds of
thousands of documents in Word/PDF format? The manual says it can store
documents under 64MB with no issue, but I was wondering if anyone is using
it for this specific purpose. Would it be efficient/reliable, and is
there anything I need to bear in mind?
 
Thanks in advance
 

RE: Using Cassandra to store files

2011-02-04 Thread Brendan Poole

The first line on the CouchDB website doesn't fill me with confidence...

"The 1.0.0 release has a critical bug which can lead to data loss in the
default configuration"




-Original Message-


From: buddhasystem [mailto:potek...@bnl.gov] 
Sent: 03 February 2011 15:03
To: cassandra-u...@incubator.apache.org
Subject: Re: Using Cassandra to store files


CouchDB

--
View this message in context:
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Using-C
assandra-to-store-files-tp5988698p5989122.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive
at Nabble.com.


Recall: Using Cassandra to store files

2011-02-04 Thread Brendan Poole

Brendan Poole would like to recall the message, Using Cassandra to store 
files.


Re: Do supercolumns have a purpose?

2011-02-04 Thread Sylvain Lebresne
On Fri, Feb 4, 2011 at 12:35 AM, Mike Malone m...@simplegeo.com wrote:

 On Thu, Feb 3, 2011 at 6:44 AM, Sylvain Lebresne sylv...@datastax.comwrote:

 On Thu, Feb 3, 2011 at 3:00 PM, David Boxenhorn da...@lookin2.comwrote:

 The advantage would be to enable secondary indexes on supercolumn
 families.


 Then I suggest opening a ticket for adding secondary indexes to
 supercolumn families and voting on it. This will be 1 or 2 order of
 magnitude less work than getting rid of super column internally, and
 probably a much better solution anyway.


 I realize that this is largely subjective, and on such matters code speaks
 louder than words, but I don't think I agree with you on the issue of which
 alternative is less work, or even which is a better solution.


You are right, I probably put too much emphasis in that sentence. My main
point was to say that I think it is better to create tickets for what you
want, rather than for something else completely different that would, as a
by-product, give you what you want.
Then I suspect that *if* the only goal is to get secondary indexes on super
columns, then there is a good chance this would be less work than getting
rid of super columns. But to be fair, secondary indexes on super columns may
not make too much sense without #598, which itself would require quite some
work, so clearly I spoke a bit quickly.


 If the goal is to have a hierarchical model, limiting the depth to two
 seems arbitrary. Why not go all the way and allow an arbitrarily deep
 hierarchy?

 If a more sophisticated hierarchical model is deemed unnecessary, or
 impractical, allowing a depth of two seems inconsistent and
 unnecessary. It's pretty trivial to overlay a hierarchical model on top of
 the map-of-sorted-maps model that Cassandra implements. Ed Anuff has
 implemented a custom comparator that does the job [1]. Google's Megastore
 has a similar architecture and goes even further [2].

 It seems to me that super columns are a historical artifact from
 Cassandra's early life as Facebook's inbox storage system. They needed
 posting lists of messages, sharded by user. So that's what they built. In my
 dealings with the Cassandra code, super columns end up making a mess all
 over the place when algorithms need to be special cased and branch based on
 the column/supercolumn distinction.

 I won't even mention what it does to the thrift interface.


Actually, I agree with you, more than you know. If I were to start coding
Cassandra now, I wouldn't include super columns (and I would probably not go
for a depth-unlimited hierarchical model either). But it's there, and I'm not
sure getting rid of them fully (meaning, including in thrift) is an option
(it would be a big compatibility breakage). And (even though I certainly
thought about this more than once :)) I'm slightly less enthusiastic about
keeping them in thrift but encoding them as regular column families
internally: it would still be a lot of work, and we would still probably end
up with nasty tricks to stick to the thrift api.

--
Sylvain


 Mike

 [1] http://www.anuff.com/2010/07/secondary-indexes-in-cassandra.html
 [2] http://www.cidrdb.org/cidr2011/Papers/CIDR11_Paper32.pdf



How to delete bulk data from cassandra 0.6.3

2011-02-04 Thread Ali Ahsan

Hi All

Is there any way I can delete column family data (not removing the column
families) from Cassandra without affecting ring integrity? What if I
delete some column family data in Linux with the rm command?


--
S.Ali Ahsan

Senior System Engineer

e-Business (Pvt) Ltd

49-C Jail Road, Lahore, P.O. Box 676
Lahore 54000, Pakistan

Tel: +92 (0)42 3758 7140 Ext. 128

Mobile: +92 (0)345 831 8769

Fax: +92 (0)42 3758 0027

Email: ali.ah...@panasiangroup.com



www.ebusiness-pg.com

www.panasiangroup.com




get_range_slices and tombstones

2011-02-04 Thread Patrik Modesto
Hi!

I'm getting tombstones from get_range_slices(). I know that's normal.
But is there a way to know that a key is a tombstone? I know a tombstone
has no columns, but I can also create a row without any columns, which
would look like a tombstone in get_range_slices().

Regards,
Patrik


RE: CQL

2011-02-04 Thread Vivek Mishra
Thanks Eric.
I am able to get it running now.

-Original Message-
From: Eric Evans [mailto:eev...@rackspace.com]
Sent: Wednesday, February 02, 2011 9:34 PM
To: user@cassandra.apache.org
Subject: Re: CQL

On Wed, 2011-02-02 at 06:57 +, Vivek Mishra wrote:
 I am trying to run CQL from a java client and facing one issue.
 Keyspace is passed as null. When I execute "USE Keyspace1" followed by
 my SELECT query, it is still not working.

Can you provide some minimal sample code that demonstrates the problem you're 
seeing?

--
Eric Evans
eev...@rackspace.com






Re: Using Cassandra to store files

2011-02-04 Thread Daniel Doubleday
We are doing this with cassandra.

But we cache a lot. We get around 20 writes/s and 1k reads/s (~ 100Mbit/s) for 
that particular CF but only 1% of them hit our cassandra cluster (5 nodes, 
rf=3).

/Daniel

On Feb 4, 2011, at 9:37 AM, Brendan Poole wrote:

 Hi Daniel
  
 When you say "We are doing this", do you mean via NFS or Cassandra?
  
 Thanks
  
 Brendan
  
  
  
 
 
 

Column Sorting of integer names

2011-02-04 Thread Aditya Narayan
Is there any way to sort columns named as integers in descending order?


Regards
-Aditya


Moving data

2011-02-04 Thread Morey, Gary
I have several large SQL Server 2005 tables.  I need to load the data in
these tables into Cassandra.  FYI, the Cassandra installation is on a
linux server running CentOS.

 

Can anyone suggest the best way to accomplish this?  I am a newbie to
Cassandra, so any advice would be greatly appreciated.

 

Best,

 

Gary



Re: Column Sorting of integer names

2011-02-04 Thread Jonathan Ellis
create a ReversedIntegerType.
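
For what it's worth, if all you need is to read the columns back
newest-first, rather than change the on-disk order, slice reads can already
be reversed (the Thrift SliceRange has a reversed flag), so a custom
comparator may not be necessary. A hedged pycassa sketch, with the keyspace
and CF names invented:

    import pycassa

    pool = pycassa.ConnectionPool('Keyspace1')
    cf = pycassa.ColumnFamily(pool, 'Numbers')  # comparator: IntegerType

    # Returns up to 100 columns in descending integer-name order.
    cols = cf.get('some_key', column_count=100, column_reversed=True)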

On Fri, Feb 4, 2011 at 5:15 AM, Aditya Narayan ady...@gmail.com wrote:
 Is there any way to sort the columns named as integers in the descending 
 order ?


 Regards
 -Aditya




-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Unavailable Exception

2011-02-04 Thread Oleg Proudnikov
ruslan usifov ruslan.usifov at gmail.com writes:

 
 Hello. Why do I get UnavailableException on a live cluster (all nodes are
 up and have never been shut down)?

 PS: v0.7.0


Can the nodes see each other? Check Cassandra logs for messages regarding other
nodes.

Oleg




Re: Unavailable Exception

2011-02-04 Thread ruslan usifov
2011/2/4 Oleg Proudnikov ol...@cloudorange.com

 ruslan usifov ruslan.usifov at gmail.com writes:

 
  Hello. Why do I get UnavailableException on a live cluster (all nodes
  are up and have never been shut down)? PS: v0.7.0


 Can the nodes see each other? Check Cassandra logs for messages regarding
 other
 nodes.


Yes they can; nodetool ring shows a well-configured ring, and there is
nothing in the logs (no WARN or ERROR).


Re: Unavailable Exception

2011-02-04 Thread Oleg Proudnikov
ruslan usifov ruslan.usifov at gmail.com writes:

 
 
 2011/2/4 Oleg Proudnikov olegp at cloudorange.com
  ruslan usifov ruslan.usifov at gmail.com writes:

   Hello. Why do I get UnavailableException on a live cluster (all nodes
   are up and have never been shut down)? PS: v0.7.0

  Can the nodes see each other? Check Cassandra logs for messages
  regarding other nodes.

 Yes they can; nodetool ring shows a well-configured ring, and there is
 nothing in the logs (no WARN or ERROR).

Try searching for InetAddress in the INFO messages.






Re: Using a synchronized counter that keeps track of no of users on the application using it to allot UserIds/ keys to the new users after sign up

2011-02-04 Thread Ryan King
On Thu, Feb 3, 2011 at 9:12 PM, Aklin_81 asdk...@gmail.com wrote:
 Thanks Matthew & Ryan,

 The main inspiration behind me trying to generate Ids in sequential
 manner is to reduce the size of the userId, since I am using it for
 heavy denormalization. UUIDs are 16 bytes long, but I can also have a
 unique Id in just 4 bytes, and since this is just a one time process
 when the user signs-up, it makes sense to try cutting down the space
 requirements, if it is feasible without any downsides(!?).

 I am also using userIds to attach to Id of the other data of the user
 on my application. If I could reduce the userId size that I can also
 reduce the size of other Ids, I could drastically cut down the space
 requirements.


 [Sorry for this question is not directly related to cassandra but I
 think Cassandra factors here because of its  tuneable consistency]

Don't generate these ids in Cassandra. Use something like snowflake [1],
flickr's ticket servers [2], or zookeeper sequential nodes.

-ryan


1. http://github.com/twitter/snowflake
2. 
http://code.flickr.com/blog/2010/02/08/ticket-servers-distributed-unique-primary-keys-on-the-cheap/
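
For reference, a minimal sketch of the flickr-style ticket server from [2]
(the table name, schema and credentials below are assumptions, not anything
from this thread): a one-row table whose auto-increment counter hands out
compact sequential ids, which then fit in 4 bytes per the size concern above.

    # Assumes: CREATE TABLE tickets32 (id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    #                                  stub CHAR(1) NOT NULL UNIQUE) ENGINE=MyISAM;
    import MySQLdb

    conn = MySQLdb.connect(host='ticketdb', user='app', passwd='...', db='ids')

    def next_user_id():
        cur = conn.cursor()
        # REPLACE INTO deletes and reinserts the single stub row, atomically
        # advancing the auto-increment counter.
        cur.execute("REPLACE INTO tickets32 (stub) VALUES ('a')")
        return cur.lastrowid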


Re: Move

2011-02-04 Thread Jonathan Ellis
Looks like https://issues.apache.org/jira/browse/CASSANDRA-1992, fixed
for 0.7.1.

On Fri, Feb 4, 2011 at 12:18 AM, Stu King s...@stuartrexking.com wrote:
 I am running a move on one node in a 5 node cluster. There are no writes to
 the cluster during the move.
 I am seeing an exception on one of the nodes (not the node which I am doing
 the move on).
 1.1.1.6 is the node which I executed the move on. It seems to be locked in
 the Leaving state. Is this normal until the move completes?
 There is almost no activity in the logs and very little cpu usage across the
 cluster.
 Is this expected for a move?
 Cheers
 Stu



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: How to delete bulk data from cassandra 0.6.3

2011-02-04 Thread Jonathan Ellis
You should use truncate instead. (Then remove the snapshot truncate creates.)
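
On 0.7+, where truncate is exposed over Thrift, that can look like the
following sketch (pycassa; the keyspace and CF names are placeholders). The
snapshot that truncate leaves under the data directory still has to be
removed by hand:

    import pycassa

    pool = pycassa.ConnectionPool('Keyspace1')
    cf = pycassa.ColumnFamily(pool, 'SomeCF')
    cf.truncate()  # drops the CF's data; a snapshot remains on disk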

On Fri, Feb 4, 2011 at 2:05 AM, Ali Ahsan ali.ah...@panasiangroup.com wrote:
 Hi All

 Is there any way I can delete column family data (not removing the column
 families) from Cassandra without affecting ring integrity? What if I delete
 some column family data in Linux with the rm command?






-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: get_range_slices and tombstones

2011-02-04 Thread Jonathan Ellis
You can't create a row with no columns without tombstones being
involved somehow. :)

There's no distinction between a row with no columns because the
individual columns were removed, and a row with no columns because
the row was removed. The latter is just a more efficient expression
of the former.
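
Client-side, that means simply skipping the empty rows when iterating a
range. A sketch with pycassa (keyspace/CF names invented):

    import pycassa

    pool = pycassa.ConnectionPool('Keyspace1')
    cf = pycassa.ColumnFamily(pool, 'Users')

    for key, columns in cf.get_range():
        if not columns:
            continue  # range ghost: removed row, or a genuinely column-less one
        print(key)  # process live rows here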

On Fri, Feb 4, 2011 at 2:26 AM, Patrik Modesto patrik.mode...@gmail.com wrote:
 Hi!

 I'm getting tombstones from get_range_slices(). I know that's normal.
 But is there a way to know that a key is a tombstone? I know a tombstone
 has no columns, but I can also create a row without any columns, which
 would look like a tombstone in get_range_slices().

 Regards,
 Patrik




-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: How to delete bulk data from cassandra 0.6.3

2011-02-04 Thread roshandawrani
I thought truncate() was not available before 0.7 (so not in 0.6.3)... was it?

---
Sent from BlackBerry

-Original Message-
From: Jonathan Ellis jbel...@gmail.com
Date: Fri, 4 Feb 2011 08:58:35 
To: useruser@cassandra.apache.org
Reply-To: user@cassandra.apache.org
Subject: Re: How to delete bulk data from cassandra 0.6.3

You should use truncate instead. (Then remove the snapshot truncate creates.)

On Fri, Feb 4, 2011 at 2:05 AM, Ali Ahsan ali.ah...@panasiangroup.com wrote:
 Hi All

 Is there any way I can delete column family data (not removing the column
 families) from Cassandra without affecting ring integrity? What if I delete
 some column family data in Linux with the rm command?



Re: How to delete bulk data from cassandra 0.6.3

2011-02-04 Thread Jonathan Ellis
In that case, you should shut down the server before removing data files.

On Fri, Feb 4, 2011 at 9:01 AM,  roshandawr...@gmail.com wrote:
 I thought truncate() was not available before 0.7 (so not in 0.6.3)... was it?

 ---
 Sent from BlackBerry

 -Original Message-
 From: Jonathan Ellis jbel...@gmail.com
 Date: Fri, 4 Feb 2011 08:58:35
 To: useruser@cassandra.apache.org
 Reply-To: user@cassandra.apache.org
 Subject: Re: How to delete bulk data from cassandra 0.6.3

 You should use truncate instead. (Then remove the snapshot truncate creates.)

 On Fri, Feb 4, 2011 at 2:05 AM, Ali Ahsan ali.ah...@panasiangroup.com wrote:
 Hi All

 Is there any way I can delete column family data (not removing the column
 families) from Cassandra without affecting ring integrity? What if I delete
 some column family data in Linux with the rm command?





-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


CF Read and Write Latency Histograms

2011-02-04 Thread Oleg Proudnikov
Hi All,

I suspect that the Write and Read Latency column headers need to be swapped.
I am running a bulk load with no reads on this CF, but I see the Read column
with values while the Write column has only zeros. The MBean shows the
values correctly.

Thank you,
Oleg






Re: Using Cassandra to store files

2011-02-04 Thread Aditya Narayan
I am also looking at possible solutions for storing PDF and Word documents.

But why wouldn't you store them in the filesystem instead of a database,
unless your files are very small, in which case a database would be
recommended?

-Aditya
-Aditya


On Fri, Feb 4, 2011 at 5:30 PM, Daniel Doubleday
daniel.double...@gmx.net wrote:
 We are doing this with cassandra.
 But we cache a lot. We get around 20 writes/s and 1k reads/s (~ 100Mbit/s)
 for that particular CF but only 1% of them hit our cassandra cluster (5
 nodes, rf=3).

 /Daniel
 On Feb 4, 2011, at 9:37 AM, Brendan Poole wrote:

 Hi Daniel

 When you say "We are doing this", do you mean via NFS or Cassandra?

 Thanks

 Brendan






Re: CF Read and Write Latency Histograms

2011-02-04 Thread Jonathan Ellis
Can you create a ticket?

On Fri, Feb 4, 2011 at 9:41 AM, Oleg Proudnikov ol...@cloudorange.com wrote:
 Hi All,

 I suspect that the Write and Read Latency column headers need to be swapped.
 I am running a bulk load with no reads on this CF, but I see the Read column
 with values while the Write column has only zeros. The MBean shows the
 values correctly.

 Thank you,
 Oleg








-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Using Cassandra to store files

2011-02-04 Thread buddhasystem

Even when storage is in NFS, Cassandra can still be quite useful as a file
catalog. Your physical storage can change, move, etc. Therefore, it's a good
idea to provide a mapping of logical names to physical store points (of which
there can in fact be many). This is a standard technique used in mass storage.

-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Using-Cassandra-to-store-files-tp5988698p5993357.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Using Cassandra to store files

2011-02-04 Thread Aditya Narayan
Yes, definitely a database for the mapping, of course!

On Fri, Feb 4, 2011 at 11:17 PM, buddhasystem potek...@bnl.gov wrote:

 Even when storage is in NFS, Cassandra can still be quite useful as a file
 catalog. Your physical storage can change, move etc. Therefore, it's a good
 idea to provide mapping of logical names to physical store points (which in
 fact can be many). This is a standard technique used in mass storage.

 --
 View this message in context: 
 http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Using-Cassandra-to-store-files-tp5988698p5993357.html
 Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
 Nabble.com.



Re: Moving data

2011-02-04 Thread Jonathan Ellis
I'm afraid there is no short answer.

The long answer is,

1) Read about Cassandra data modeling at
http://wiki.apache.org/cassandra/ArticlesAndPresentations.  It is not
as simple as one table equals one columnfamily.
2) Write a program to read your data out of SQL Server and write it
into Cassandra, preferably with multiple threads

On Fri, Feb 4, 2011 at 6:00 AM, Morey, Gary gary.mo...@xerox.com wrote:
 I have several large SQL Server 2005 tables.  I need to load the data in
 these tables into Cassandra.  FYI, the Cassandra installation is on a linux
 server running CentOS.



 Can anyone suggest the best way to accomplish this?  I am a newbie to
 Cassandra, so any advice would be greatly appreciated.



 Best,



 Gary



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Using Cassandra to store files

2011-02-04 Thread sridhar basam
For the number of files the OP has, why not just use a traditional filesystem
and Solr to index the PDF data? You get to search inside the files for
relevant information.

 Sri

On Fri, Feb 4, 2011 at 12:47 PM, buddhasystem potek...@bnl.gov wrote:


 Even when storage is in NFS, Cassandra can still be quite useful as a file
 catalog. Your physical storage can change, move, etc. Therefore, it's a good
 idea to provide a mapping of logical names to physical store points (of which
 there can in fact be many). This is a standard technique used in mass storage.

 --
 View this message in context:
 http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Using-Cassandra-to-store-files-tp5988698p5993357.html
 Sent from the cassandra-u...@incubator.apache.org mailing list archive at
 Nabble.com.



Re: Moving data

2011-02-04 Thread buddhasystem

FWIW, I'm working on migrating a large amount of data out of Oracle into my
test cluster. The data has been warehoused as CSV files on Amazon S3. Having
that in place allows me to not put extra load on the production service when
doing many repeated tests. I then parse the data using Python's csv module
and, as Jonathan says, use threads to batch-upload the data into Cassandra.
Notable points: since the data is relatively sparse (i.e. many zeros for
integers and empty strings for strings, etc.), I establish a default-value
dictionary and don't write these to Cassandra at all -- they can be
reconstructed as needed when reading back.

Also, make sure you wrap Cassandra writes etc. in exception handling. When
load is high, you might get timeouts at the TSocket level.
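
A condensed sketch of that pattern (Python 2 era, pycassa; the keyspace, CF,
file name and default-value dictionary are all invented for illustration):

    import csv
    import threading
    import Queue

    import pycassa

    pool = pycassa.ConnectionPool('Keyspace1', pool_size=8)
    cf = pycassa.ColumnFamily(pool, 'Imported')
    DEFAULTS = {'count': '0', 'note': ''}  # sparse values we never write
    rows = Queue.Queue(maxsize=1000)

    def worker():
        while True:
            key, cols = rows.get()
            # drop default values instead of storing them
            cols = dict((k, v) for k, v in cols.items() if DEFAULTS.get(k) != v)
            for attempt in range(3):
                try:
                    cf.insert(key, cols)
                    break
                except Exception:  # e.g. a Thrift/TSocket timeout under load
                    pass           # a real loader would back off and log here
            rows.task_done()

    for _ in range(8):
        t = threading.Thread(target=worker)
        t.daemon = True
        t.start()

    for rec in csv.reader(open('export.csv')):
        rows.put((rec[0], {'name': rec[1], 'count': rec[2]}))
    rows.join()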

-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Moving-data-tp5992669p5993443.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


RE: CF Read and Write Latency Histograms

2011-02-04 Thread David Dabbs
Is this 0.7?

-Original Message-
From: Oleg Proudnikov [mailto:ol...@cloudorange.com] 
Sent: Friday, February 04, 2011 11:42 AM
To: user@cassandra.apache.org
Subject: CF Read and Write Latency Histograms

Hi All,

I suspect that the Write and Read Latency column headers need to be swapped.
I am running a bulk load with no reads on this CF, but I see the Read column
with values while the Write column has only zeros. The MBean shows the
values correctly.

Thank you,
Oleg







Re: CF Read and Write Latency Histograms

2011-02-04 Thread Oleg Proudnikov
David Dabbs dmdabbs at gmail.com writes:

 
 Is this 0.7?
 

Yes





Re: Using a synchronized counter that keeps track of no of users on the application using it to allot UserIds/ keys to the new users after sign up

2011-02-04 Thread Aklin_81
Thanks so much Ryan for the links; I'll definitely take them into
consideration.

Just another thought which came to my mind:-
perhaps it may be beneficial to store(or duplicate) some of the data
like the Login credentials  particularly userId to User's Name
mapping, etc (which is very heavily read), in a fast MyISAM table.
This could solve the problem of keys though auto-generated unique 
sequential primary keys. I could use the same keys for Cassandra rows
for that user. And also since Cassandra reads are relatively slow, it
makes sense to store data like userId to Name mapping in MyISAM as
this data would be required after almost all queries to the database.

Regards
-Asil



On Fri, Feb 4, 2011 at 10:14 PM, Ryan King r...@twitter.com wrote:
 On Thu, Feb 3, 2011 at 9:12 PM, Aklin_81 asdk...@gmail.com wrote:
 Thanks Matthew & Ryan,

 The main inspiration behind me trying to generate Ids in sequential
 manner is to reduce the size of the userId, since I am using it for
 heavy denormalization. UUIDs are 16 bytes long, but I can also have a
 unique Id in just 4 bytes, and since this is just a one time process
 when the user signs-up, it makes sense to try cutting down the space
 requirements, if it is feasible without any downsides(!?).

 I am also using userIds to attach to Id of the other data of the user
 on my application. If I could reduce the userId size that I can also
 reduce the size of other Ids, I could drastically cut down the space
 requirements.


 [Sorry for this question is not directly related to cassandra but I
 think Cassandra factors here because of its  tuneable consistency]

 Don't generate these ids in cassandra. Use something like snowflake,
 flickr's ticket servers [2] or zookeeper sequential nodes.

 -ryan


 1. http://github.com/twitter/snowflake
 2. 
 http://code.flickr.com/blog/2010/02/08/ticket-servers-distributed-unique-primary-keys-on-the-cheap/



read latency in cassandra

2011-02-04 Thread Dan Kuebrich
Hi all,

It often takes more than two seconds to load:

- one row of ~450 events comprising ~600k
- cluster size of 1
- client is pycassa 1.04
- timeout on recv
- cold read (I believe)
- load generally < 0.5 on a 4-core machine, 2 EC2 instance store drives for
cassandra
- cpu wait generally < 1%

Often the following sequence occurs:

1. First attempt times out after 2 sec
2. Second attempt loads fine on immediate retry

So, I assume it's an issue of a cache miss and going to disk.  Is 2 seconds
the normal "I went to disk" latency for cassandra?  What should we look to
tune, if anything? I don't think keeping everything in-memory is an option
for us given dataset size and access pattern (hot set is stuff being
currently written, stuff being accessed is likely to be older).

I didn't notice this problem with cassandra 0.6.8 and pycassa 0.3.

Thanks,
dan
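
One thing worth checking client-side: the 2-second figure may be your own
pool timeout rather than anything server-side, and the pool can be told to
retry cold reads. A hedged sketch (parameter names as in pycassa 1.x; the
keyspace, CF and key are invented):

    import pycassa

    # Raise the client timeout and let the pool retry cold reads.
    pool = pycassa.ConnectionPool('Keyspace1', timeout=5.0, max_retries=3)
    events = pycassa.ColumnFamily(pool, 'Events')

    row = events.get('some_key', column_count=500)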


Re: How to monitor Cassandra's throughput?

2011-02-04 Thread Oleg Proudnikov
The issue has been resolved, the fix is on Hector's GitHub.


Oleg Proudnikov olegp at cloudorange.com writes:

 
 I have posted on Hector ML:
 
 http://thread.gmane.org/gmane.comp.db.hector.user/1690
 
 Oleg
 
 






RE: Tracking down read latency

2011-02-04 Thread David Dabbs

Thank you both for your advice. See my updated iostats below.


From: sridhar.ba...@gmail.com [mailto:sridhar.ba...@gmail.com] On Behalf Of
sridhar basam
Sent: Thursday, February 03, 2011 10:58 AM
To: user@cassandra.apache.org
Subject: Re: Tracking down read latency

The data provided is also an average value since boot time. Run the -x as
suggested below, but run it with an interval of around 5 seconds. You very
well could be having an I/O issue; it is hard to tell from the overall
average value you provided. Collect iostat -x 5 during the times when you
see slow reads and see how busy the disks are.

Sridhar

On Thu, Feb 3, 2011 at 3:21 AM, Peter Schuller wrote:
 $ iostat

As rcoli already mentioned you don't seem to have an I/O problem, but
as a point of general recommendation: When determining whether you are
blocking on disk I/O, pretty much *always* use iostat -x rather than
the much less useful default mode of iostat. The %util and queue
wait/average time columns are massively useful/important; without them
one is much more blind as to whether or not storage devices are
actually saturated.

 Peter Schuller


Our data is on sdb, commit logs on sdc.
So do I read this correctly that we're 'await'ing 6+ milliseconds on average
for data drive (sdb) requests to be serviced?


$ iostat -x 5

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.59    0.00    0.22    0.94    0.00   98.25

Device:  rrqm/s  wrqm/s    r/s   w/s   rsec/s   wsec/s avgrq-sz avgqu-sz  await  svctm  %util
sda        0.00    0.00   0.00  0.00     0.00     0.00     0.00     0.00   0.00   0.00   0.00
sda1       0.00    0.00   0.00  0.00     0.00     0.00     0.00     0.00   0.00   0.00   0.00
sda2       0.00    0.00   0.00  0.00     0.00     0.00     0.00     0.00   0.00   0.00   0.00
sda3       0.00    0.00   0.00  0.00     0.00     0.00     0.00     0.00   0.00   0.00   0.00
sdb       11.20    0.00  42.00  0.00  4993.60     0.00   118.90     0.28   6.77   5.22  21.92
sdb1      11.20    0.00  42.00  0.00  4993.60     0.00   118.90     0.28   6.77   5.22  21.92
sdc        0.00   31.00   0.00  1.40     0.00   259.20   185.14     0.00   0.14   0.14   0.02
sdc1       0.00   31.00   0.00  1.40     0.00   259.20   185.14     0.00   0.14   0.14   0.02

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.56    0.00    0.18    1.08    0.00   98.17

Device:  rrqm/s  wrqm/s    r/s   w/s   rsec/s   wsec/s avgrq-sz avgqu-sz  await  svctm  %util
sda        0.00    0.00   0.00  0.00     0.00     0.00     0.00     0.00   0.00   0.00   0.00
sda1       0.00    0.00   0.00  0.00     0.00     0.00     0.00     0.00   0.00   0.00   0.00
sda2       0.00    0.00   0.00  0.00     0.00     0.00     0.00     0.00   0.00   0.00   0.00
sda3       0.00    0.00   0.00  0.00     0.00     0.00     0.00     0.00   0.00   0.00   0.00
sdb        8.80    0.00  49.40  0.00  5936.00     0.00   120.16     0.33   6.62   5.22  25.78
sdb1       8.80    0.00  49.40  0.00  5936.00     0.00   120.16     0.33   6.62   5.22  25.78
sdc        0.00    0.00   0.00  0.00     0.00     0.00     0.00     0.00   0.00   0.00   0.00
sdc1       0.00    0.00   0.00  0.00     0.00     0.00     0.00     0.00   0.00   0.00   0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.99    0.00    0.22    1.08    0.00   97.71

Device:  rrqm/s  wrqm/s    r/s   w/s   rsec/s   wsec/s avgrq-sz avgqu-sz  await  svctm  %util
sda        0.00    0.00   0.00  0.00     0.00     0.00     0.00     0.00   0.00   0.00   0.00
sda1       0.00    0.00   0.00  0.00     0.00     0.00     0.00     0.00   0.00   0.00   0.00
sda2       0.00    0.00   0.00  0.00     0.00     0.00     0.00     0.00   0.00   0.00   0.00
sda3       0.00    0.00   0.00  0.00     0.00     0.00     0.00     0.00   0.00   0.00   0.00
sdb       11.40    0.00  46.20  0.00  5147.20     0.00   111.41     0.30   6.55   5.58  25.80
sdb1      11.40    0.00  46.20  0.00  5147.20     0.00   111.41     0.30   6.55   5.58  25.80
sdc        0.00    7.40   0.00  0.80     0.00    65.60    82.00     0.00   0.25   0.25   0.02
sdc1       0.00    7.40   0.00  0.80     0.00    65.60    82.00     0.00   0.25   0.25   0.02

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.68    0.00    0.23    0.95    0.00   98.13

Device:  rrqm/s  wrqm/s    r/s   w/s   rsec/s   wsec/s avgrq-sz avgqu-sz  await  svctm  %util
sda        0.00    0.80   0.00  0.80     0.00    12.77    16.00     0.00   0.25   0.25   0.02
sda1       0.00    0.00   0.00  0.00     0.00     0.00     0.00     0.00   0.00   0.00   0.00
sda2       0.00    0.80   0.00  0.80     0.00    12.77    16.00     0.00   0.25   0.25   0.02
sda3       0.00    0.00   0.00  0.00     0.00     0.00     0.00     0.00   0.00   0.00   0.00
sdb        5.19    0.00  38.12  0.00  4356.09     0.00   114.26     0.26   6.70   5.91

Re: How to delete bulk data from cassandra 0.6.3

2011-02-04 Thread Ali Ahsan
So do we need to write a script? Or is it something I can do as a system 
admin without involving a developer? If yes, please guide me in this case.





On 02/04/2011 10:36 PM, Jonathan Ellis wrote:

In that case, you should shut down the server before removing data files.

On Fri, Feb 4, 2011 at 9:01 AM, roshandawr...@gmail.com wrote:

I thought truncate() was not available before 0.7 (in 0.6.3); was it?

---
Sent from BlackBerry

-Original Message-
From: Jonathan Ellis jbel...@gmail.com
Date: Fri, 4 Feb 2011 08:58:35
To: user@cassandra.apache.org
Reply-To: user@cassandra.apache.org
Subject: Re: How to delete bulk data from cassandra 0.6.3

You should use truncate instead. (Then remove the snapshot truncate creates.)

On Fri, Feb 4, 2011 at 2:05 AM, Ali Ahsan ali.ah...@panasiangroup.com  wrote:

Hi All

Is there any way I can delete column family data (not removing the column
families) from Cassandra without affecting ring integrity? What if I delete
some column family data in Linux with the rm command?

--
S.Ali Ahsan

Senior System Engineer

e-Business (Pvt) Ltd

49-C Jail Road, Lahore, P.O. Box 676
Lahore 54000, Pakistan

Tel: +92 (0)42 3758 7140 Ext. 128

Mobile: +92 (0)345 831 8769

Fax: +92 (0)42 3758 0027

Email: ali.ah...@panasiangroup.com



www.ebusiness-pg.com

www.panasiangroup.com

Confidentiality: This e-mail and any attachments may be confidential
and/or privileged. If you are not a named recipient, please notify the
sender immediately and do not disclose the contents to another person,
use it for any purpose, or store or copy the information in any medium.
Internet communications cannot be guaranteed to be timely, secure, error
or virus-free. We do not accept liability for any errors or omissions.





--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com







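To make the advice in this thread concrete, a minimal sketch (keyspace, column
family, and data path names here are hypothetical; adjust to your install):

# 0.7: truncate via the CLI, then clear the snapshot truncate creates
$ bin/cassandra-cli -host localhost
[default@unknown] use MyKeyspace;
[default@MyKeyspace] truncate MyColumnFamily;
$ bin/nodetool --host localhost clearsnapshot

# 0.6.x (no truncate): flush so the commitlog will not replay the data,
# stop the node, remove only that column family's files, then restart
$ bin/nodetool --host localhost flush MyKeyspace
$ rm /var/lib/cassandra/data/MyKeyspace/MyColumnFamily-*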



Re: read latency in cassandra

2011-02-04 Thread aaron morton
What operation are you calling? Are you trying to read the entire row back?

How many SSTables do you have for the CF? Does your data have a lot of 
overwrites? Have you modified the default compaction settings?

Do you have row cache enabled?

How long does the second request take?

Can you use JConsole to check the read latency for the CF?
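
(One way to check this without JConsole, as a minimal sketch assuming the node
is reachable on localhost: nodetool reports per-CF latencies too.)

$ bin/nodetool --host localhost cfstats
# look for the Read Latency line under the relevant keyspace and column family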

Sorry for all the questions; the answer to your initial question is: mmm, that 
does not sound right. It will depend on

Aaron

On 5 Feb 2011, at 08:13, Dan Kuebrich wrote:

 Hi all,
 
 It often takes more than two seconds to load:
 
 - one row of ~450 events comprising ~600k
 - cluster size of 1
 - client is pycassa 1.04
 - timeout on recv
 - cold read (I believe)
 - load generally < 0.5 on a 4-core machine, 2 EC2 instance store drives for 
 cassandra
 - cpu wait generally < 1%
 
 Often the following sequence occurs:
 
 1. First attempt times out after 2 sec
 2. Second attempt loads fine on immediate retry
 
 So, I assume it's an issue about a cache miss and going to disk.  Is 2 seconds 
 the normal "I went to disk" latency for cassandra?  What should we look to 
 tune, if anything? I don't think keeping everything in-memory is an option 
 for us given dataset size and access pattern (hot set is stuff being 
 currently written, stuff being accessed is likely to be older).
 
 I didn't notice this problem with cassandra 0.6.8 and pycassa 0.3.
 
 Thanks,
 dan



Re: Tracking down read latency

2011-02-04 Thread sridhar basam
On Fri, Feb 4, 2011 at 2:44 PM, David Dabbs dmda...@gmail.com wrote:


 Our data is on sdb, commit logs on sdc.
 So do I read this correctly that we're 'await'ing 6+millis on average for
 data drive (sdb)
 requests to be serviced?


That is right. Those numbers look pretty good for rotational media. What
sort of read latencies do you see? Have you also looked into GC?

 Sridhar


New Generation Size guidelines

2011-02-04 Thread Oleg Proudnikov
Hi All,

I have a 3 server cluster with RF=2. My heap is 2G out of a 4G RAM. The servers
have 4 cores. I used default heap settings. The Eden space ended up around 60M
and the Survivor spaces are around 7M. This feels a little bit low for a process
that creates so much short-lived garbage. I just wanted to get your thoughts on
this. Space used in the Old Generation stays in a short range 1.2G-1.6G but when
the activity is low and I force GC it drops to 120M. It feels like there is a
lot of garbage that does not have a chance to get collected. The server is
running a batch load and its CPUs are 10-40% busy. The higher value is at 1.6G.
Yet I am reluctant to push my data load because I do hit OOMs.

The amount of data loaded so far is small - around 100G in total.

Should I increase my New Generation size?

Thank you,
Oleg
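
For reference, a minimal sketch of setting the new generation explicitly (the
values are illustrative, not a recommendation; depending on the version, JVM
flags live in bin/cassandra.in.sh or conf/cassandra-env.sh):

JVM_OPTS="$JVM_OPTS -Xms2G -Xmx2G"
JVM_OPTS="$JVM_OPTS -Xmn512M"             # explicit new generation size
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"  # Eden vs. survivor split in the new gen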




Re: New Generation Size guidelines

2011-02-04 Thread Ryan King
On Fri, Feb 4, 2011 at 1:45 PM, Oleg Proudnikov ol...@cloudorange.com wrote:

 Hi All,

 I have a 3 server cluster with RF=2. My heap is 2G out of a 4G RAM. The 
 servers
 have 4 cores. I used default heap settings. The Eden space ended up around 60M
 and the Survivor spaces are around 7M. This feels a little bit low for a 
 process
 that creates so much short-lived garbage. I just wanted to get your thoughts 
 on
 this. Space used in the Old Generation stays in a short range 1.2G-1.6G but 
 when
 the activity is low and I force GC it drops to 120M. It feels like there is a
 lot of garbage that does not have a chance to get collected. The server is
 running a batch load and its CPUs are 10-40% busy. The higher value is at 
 1.6G.
 Yet I am reluctant to push my data load because I do hit OOMs.

 The amount of data loaded so far is small - around 100G in total.

Almost certainly yes.

-ryan


Re: Problems with Python Stress Test

2011-02-04 Thread Sameer Farooqui
Brandon,

Thanks for the response. I have also noticed that stress.py's progress
interval gets thrown off in low memory situations.

What did you mean by "contrib/stress on 0.7 instead"? I don't see that dir
in the src version of 0.7.

- Sameer


On Thu, Feb 3, 2011 at 5:22 PM, Brandon Williams dri...@gmail.com wrote:

 On Thu, Feb 3, 2011 at 7:02 PM, Sameer Farooqui 
 cassandral...@gmail.com wrote:

 Hi guys,

 I was playing around with the stress.py test this week and noticed a few
 things.

 1) Progress-interval does not always work correctly. I set it to 5 in the
 example below, but am instead getting varying intervals:


 Generally indicates that the client machine is being overloaded in my
 experience.

 2) The key_rate and op_rate don't seem to be calculated correctly. Also,
 what is the difference between the interval_key_rate and the
 interval_op_rate? For example in the example above, the first row shows 6662
 keys inserted in 5 seconds and 6662 / 5 = 1332, which matches the
 interval_op_rate.


 There should be no difference unless you're doing range slices, but IPC
 timing makes them vary somewhat.

 3) If I write x KB to Cassandra with py_stress, the used disk space doesn't
 grow by x after the test. In the example below I tried to write 500,000 keys
 * 32 bytes * 5 columns = 78,125 kilobytes of data to the database. When I
 checked the amount of disk space used after the test it actually grew by
 2,684,920 - 2,515,864 = 169,056 kilobytes. Is this perhaps because the
 commit log got duplicate copies of the same data as the SSTables?


 Commitlogs could be part of it, you're not factoring in the column names,
 and then there's index and bloom filter overhead.

 Use contrib/stress on 0.7 instead.

 -Brandon



Re: Sorting in time order without using TimeUUID type column names

2011-02-04 Thread aaron morton
IMHO if you know the time of the event, store the time as a long rather 
than a UUID. It will make it easier to get back to a 
time and make it easier for you to compare columns. TimeUUIDs have a pseudo-
random part as well as the time part; it could be set to a constant. But why 
bother if you know the absolute time.
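
As a minimal sketch of that approach (hypothetical names, 0.7 CLI syntax):

[default@MyKeyspace] create column family Reminders with comparator = LongType;

Column names are then just the due time in milliseconds; an ascending slice
returns soonest-due first, and a reversed slice (reversed=true in the client's
slice range) returns the opposite order.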

I'm not sure what the ReminderCountOfThisUser is for, and as Sylvain says there 
is no need for the user name if this is in a row just for the user. 

Hope that helps.
Aaron
 
On 4 Feb 2011, at 01:32, Aditya Narayan wrote:

 If I use TimestampOfDueTimeInFuture : UserId : ReminderCountOfThisUser
 as the key pattern for the rows of reminders, then I am storing the key,
 just as it is, as the column name, and thus the column values need not
 contain a link to the row containing the reminder details.
 
 I think UserId would be required along with timestamp in the key
 pattern to provide uniqueness to the key as there may be several
 reminders generated by users on the application, at the same time.
 
 But my question is about whether it is really advisable to even
 generate the keys like this pattern ... instead of going with
 timeuuids?
 Are there any downsides which I am perhaps not aware of?
 
 
 
 On Thu, Feb 3, 2011 at 5:43 PM, Sylvain Lebresne sylv...@datastax.com wrote:
 On Thu, Feb 3, 2011 at 11:27 AM, Aditya Narayan ady...@gmail.com wrote:
 
 Hey all,
 
 I want to store some columns that are reminders to the users on my
 application, in time-sorted order in a row (the timeline row of the user).
 
 Would it be recommended to store these reminder columns in the
 timeline row with column names like: a combination of timestamp (of the time
 when the reminder gets due) + UserId + reminders count of that user;
 Column Name= TimestampOfDueTimeInFuture: UserId :
 ReminderCountOfThisUser
 
 If you have one row per user (which is a good idea), why keep the UserId in
 the column name?
 
 
 Then what comparator could I use to sort them in order of their
 due time? This comparator should be able to sort numbers in descending
 order. (I guess ascii type would do the opposite order.) (Reminders need
 to be sorted in the timeline in the order of their due time.)
 
 *The* solution is to write a custom comparator.
 Have a look at http://www.datastax.com/docs/0.7/data_model/column_families
 and http://www.sodeso.nl/?p=421 for instance.
 
 As a side note, the fact that the comparator sorts in ascending order when
 you need descending order shouldn't be that much of a problem, since you can
 always do slice queries in reversed order. But even then, AsciiType is not a
 very satisfying solution, as you would have to be careful about the padding
 of your timestamps for it to work correctly. So again, a custom comparator is
 the way to go.
 
 Basically I am trying to avoid 16-byte-long TimeUUIDs, first because
 they are too long, and because the above defined key pattern always
 guarantees me a unique key/Id for the reminder row.
 
 
 Thanks
 Aditya Narayan
 
 --
 Sylvain



Re: Unavailable Exception

2011-02-04 Thread aaron morton
Please provide some information: the client you are using, the client-side error 
stack, the command you are running, and the output from nodetool ring.

Aaron
 
On 5 Feb 2011, at 05:10, Oleg Proudnikov wrote:

 ruslan usifov ruslan.usifov at gmail.com writes:
 
 
 
 2011/2/4 Oleg Proudnikov olegp at cloudorange.com
 ruslan usifov ruslan.usifov at gmail.com writes:
 
 Hello. Why do I get an Unavailable Exception on a live cluster (all nodes are
 up and never shut down)? PS: v 0.7.0
 Can the nodes see each other? Check Cassandra logs for messages regarding 
 other
 nodes.
 
 
 Yes they can; nodetool ring shows a well-configured ring, and there is nothing
 in the logs (no WARN or ERROR)
 
 
 
 
 
 Try searching for InetAddress at the INFO level
 
 
 
 



Re: Sorting in time order without using TimeUUID type column names

2011-02-04 Thread Aditya Narayan
Thanks Aaron,

Yes I can put the column names without using the userId in the
timeline row, and when I want to retrieve the row corresponding to
that column name, I will attach the userId to get the row key.

Yes, I'll store it as a long & I guess I'll have to write a custom
comparator type (ReversedIntegerType) to sort those longs in
descending order.

Regards
Aditya


On Sat, Feb 5, 2011 at 6:24 AM, aaron morton aa...@thelastpickle.com wrote:
 IMHO if you know the time of the event, store the time as a long rather 
 than a UUID. It will make it easier to get back to a
 time and make it easier for you to compare columns. TimeUUIDs have a pseudo- 
 random part as well as the time part; it could be set to a constant. But why 
 bother if you know the absolute time.

 I'm not sure what the ReminderCountOfThisUser is for, and as Sylvain says 
 there is no need for the user name if this is in a row just for the user.

 Hope that helps.
 Aaron

 On 4 Feb 2011, at 01:32, Aditya Narayan wrote:

 If I use TimestampOfDueTimeInFuture : UserId : ReminderCountOfThisUser
 as the key pattern for the rows of reminders, then I am storing the key,
 just as it is, as the column name, and thus the column values need not
 contain a link to the row containing the reminder details.

 I think UserId would be required along with timestamp in the key
 pattern to provide uniqueness to the key as there may be several
 reminders generated by users on the application, at the same time.

 But my question is about whether it is really advisable to even
 generate the keys like this pattern ... instead of going with
 timeuuids?
 Are there any downsides which I am perhaps not aware of?



 On Thu, Feb 3, 2011 at 5:43 PM, Sylvain Lebresne sylv...@datastax.com 
 wrote:
 On Thu, Feb 3, 2011 at 11:27 AM, Aditya Narayan ady...@gmail.com wrote:

 Hey all,

 I want to store some columns that are reminders to the users on my
 application, in time-sorted order in a row (the timeline row of the user).

 Would it be recommended to store these reminder columns in the
 timeline row with column names like: a combination of timestamp (of the time
 when the reminder gets due) + UserId + reminders count of that user;
 Column Name= TimestampOfDueTimeInFuture: UserId :
 ReminderCountOfThisUser

 If you have one row per user (which is a good idea), why keep the UserId in
 the column name?


 Then what comparator could I use to sort them in order of their
 due time? This comparator should be able to sort numbers in descending
 order. (I guess ascii type would do the opposite order.) (Reminders need
 to be sorted in the timeline in the order of their due time.)

 *The* solution is to write a custom comparator.
 Have a look at http://www.datastax.com/docs/0.7/data_model/column_families
 and http://www.sodeso.nl/?p=421 for instance.

 As a side note, the fact that the comparator sorts in ascending order when
 you need descending order shouldn't be that much of a problem, since you can
 always do slice queries in reversed order. But even then, AsciiType is not a
 very satisfying solution, as you would have to be careful about the padding
 of your timestamps for it to work correctly. So again, a custom comparator is
 the way to go.

 Basically I am trying to avoid 16-byte-long TimeUUIDs, first because
 they are too long, and because the above defined key pattern always
 guarantees me a unique key/Id for the reminder row.


 Thanks
 Aditya Narayan

 --
 Sylvain




Re: Unavailable Exception

2011-02-04 Thread David King
We're going to need *way* more information than this

On 03 Feb 2011, at 20:03, ruslan usifov wrote:

 Hello
 
 Why do I get an Unavailable Exception on a live cluster (all nodes are up and 
 never shut down)?
 
 PS: v 0.7.0



Merging the rows of two column families(with similar attributes) into one ??

2011-02-04 Thread Ertio Lew
I read somewhere that a large number of column families is not a good idea, as
it consumes more memory and causes more compactions to occur, and thus I am
trying to reduce the number of column families by adding the rows of
other column families (with similar attributes) as separate rows into
one.

I have two kinds of data for two separate features on my application.
If I store them in two different column families then both of them
will have similar attributes, like the same comparator type & sorting
needs. Thus I could also merge both of them into one column family, just
by adding the rows of one to the other (increasing the number of rows).
However, some rows of the 1st kind of data are very frequently used and
rows of the 2nd kind are less frequently used. But I don't think this will
be a problem, as I am not merging two rows into one, but just adding them
as separate rows in the column family.
The 1st kind of data has wider rows, and the 2nd kind has much narrower rows.

But the caching requirements may be different, as they cater to two
different features. (But I think it is even advantageous, since
resources are free to be utilized by whichever data is more frequently
used.)


Is it recommended to merge these two column families into one ?? Thoughts ?

--

Ertio


Re: Unavailable Exception

2011-02-04 Thread Jonathan Ellis
Start with grep -i down system.log on each machine

On Fri, Feb 4, 2011 at 7:37 PM, David King dk...@ketralnis.com wrote:
 We're going to need *way* more information than this

 On 03 Feb 2011, at 20:03, ruslan usifov wrote:

 Hello

 Why do I get an Unavailable Exception on a live cluster (all nodes are up and 
 never shut down)?

 PS: v 0.7.0





-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Pig not reading all cassandra data

2011-02-04 Thread Matt Kennedy
Found the culprit.  There is a new feature in Pig 0.8 that will try to reduce 
the number of splits used to speed up the whole job.  Since the 
ColumnFamilyInputFormat lists the input size as zero, this feature eliminates 
all of the splits except for one.  

The workaround is to disable this feature for jobs that use CassandraStorage by 
setting -Dpig.splitCombination=false in the pig_cassandra script.
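
Illustratively (assuming the wrapper honors Pig's standard PIG_OPTS hook; the
script name is hypothetical):

$ PIG_OPTS="$PIG_OPTS -Dpig.splitCombination=false" \
    contrib/pig/bin/pig_cassandra myscript.pig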

Hope somebody finds this useful, you wouldn't believe how many dead-ends I ran 
down trying to figure this out.

-Matt 
On Feb 2, 2011, at 4:34 PM, Matthew E. Kennedy wrote:

 
 I noticed in the jobtracker log that when the pig job kicks off, I get the 
 following info message:
 
 2011-02-02 09:13:07,269 INFO org.apache.hadoop.mapred.JobInProgress: Input 
 size for job job_201101241634_0193 = 0. Number of splits = 1
 
 So I looked at the job.split file that is created for the Pig job and 
 compared it to the job.split file created for the map-reduce job.  The map 
 reduce file contains an entry for each split, whereas the  job.split file for 
 the Pig job contains just the one split.
 
 I added some code to the ColumnFamilyInputFormat to output what it thinks it 
 sees as it should be creating input splits for the pig jobs, and the call to 
 getSplits() appears to be returning the correct list of splits.  I can't 
 figure out where it goes wrong though when the splits should be written to 
 the job.split file.
 
 Does anybody know the specific class responsible for creating that file in a 
 Pig job, and why it might be affected by using the pig CassandraStorage 
 module?
 
 Is anyone else successfully running Pig jobs against a 0.7 cluster?
 
 Thanks,
 Matt



Re: Pig not reading all cassandra data

2011-02-04 Thread Jonathan Ellis
On Fri, Feb 4, 2011 at 9:47 PM, Matt Kennedy stinkym...@gmail.com wrote:
 Found the culprit.  There is a new feature in Pig 0.8 that will try to
 reduce the number of splits used to speed up the whole job.  Since the
 ColumnFamilyInputFormat lists the input size as zero, this feature
 eliminates all of the splits except for one.

 The workaround is to disable this feature for jobs that use CassandraStorage
 by setting -Dpig.splitCombination=false in the pig_cassandra script.

 Hope somebody finds this useful, you wouldn't believe how many dead-ends I
 ran down trying to figure this out.

Ouch, thanks for tracking that down.

What should CFIF be returning differently?  Do you mean the
InputSplit.getLength?

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Merging the rows of two column families(with similar attributes) into one ??

2011-02-04 Thread Ertio Lew
Thanks, Tyler!

I could not fully understand the reason why more column families
would mean more memory. If you have under your control parameters like
memtable_throughput & memtable_operations, which are set on a per-column-
family basis, then you can directly control & adjust memory use by splitting
the memory space between two CFs in proportion to what you would use in a
single CF.
Hence there should be no extra memory consumption for multiple CFs
that have been split from a single one?
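
(For instance, the per-CF knobs can be set like this in the 0.7 CLI; the CF
name and values are arbitrary:)

[default@MyKeyspace] update column family MyCF with memtable_throughput = 64
    and memtable_operations = 0.3;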

Regarding the compactions, I think even if there are more of them, the size of
the SSTable files to be compacted is smaller, as the data has been split
into two.
So: more compactions, but smaller ones too!


Then, given the same amount of data, how could a greater number of column
families be a bad option (if you split the values of the memory consumption
parameters proportionately)?

--
Regards,
Ertio





On Sat, Feb 5, 2011 at 10:43 AM, Tyler Hobbs ty...@datastax.com wrote:

 I read somewhere that a large number of column families is not a good idea, as
 it consumes more memory and causes more compactions to occur

 This is primarily true, but not in every case.

 But the caching requirements may be different as they cater to two
 different features.

 This is a great reason to *not* merge them.  Besides the key and row caches,
 don't forget about the OS buffer cache.

 Is it recommended to merge these two column families into one ?? Thoughts
 ?

 No, this sounds like an anti-pattern to me.  The overhead from having two
 separate CFs is not that high.

 --
 Tyler Hobbs
 Software Engineer, DataStax
 Maintainer of the pycassa Cassandra Python client library




Re: Merging the rows of two column families(with similar attributes) into one ??

2011-02-04 Thread Ertio Lew
Yes, a disadvantage of more CFs in terms of memory utilization
which I see is:

if some CF is written less often compared to other CFs, then its
memtable will consume space in memory until it is flushed; this
memory space could have been much better used by a CF that's heavily
written and read. And if you try to make the flush thresholds
smaller, then more compactions would be needed.





On Sat, Feb 5, 2011 at 11:58 AM, Ertio Lew ertio...@gmail.com wrote:
 Thanks, Tyler!

 I could not fully understand the reason why more column families
 would mean more memory. If you have under your control parameters like
 memtable_throughput & memtable_operations, which are set on a per-column-
 family basis, then you can directly control & adjust memory use by splitting
 the memory space between two CFs in proportion to what you would use in a
 single CF.
 Hence there should be no extra memory consumption for multiple CFs
 that have been split from a single one?

 Regarding the compactions, I think even if there are more of them, the size of
 the SSTable files to be compacted is smaller, as the data has been split
 into two.
 So: more compactions, but smaller ones too!


 Then, given the same amount of data, how could a greater number of column
 families be a bad option (if you split the values of the memory consumption
 parameters proportionately)?

 --
 Regards,
 Ertio





 On Sat, Feb 5, 2011 at 10:43 AM, Tyler Hobbs ty...@datastax.com wrote:

 I read somewhere that a large number of column families is not a good idea, as
 it consumes more memory and causes more compactions to occur

 This is primarily true, but not in every case.

 But the caching requirements may be different as they cater to two
 different features.

 This is a great reason to *not* merge them.  Besides the key and row caches,
 don't forget about the OS buffer cache.

 Is it recommended to merge these two column families into one ?? Thoughts
 ?

 No, this sounds like an anti-pattern to me.  The overhead from having two
 separate CFs is not that high.

 --
 Tyler Hobbs
 Software Engineer, DataStax
 Maintainer of the pycassa Cassandra Python client library