Re: Cassandra Write Performance, CPU usage
Hi Rishi,

Writes in Cassandra are not written directly to the data files. Each write is appended to the commit log and stored in memory (the memtable), and memtables are flushed to the data disk later on. Maybe that's why you are not getting much out of iostat. I can't say anything about the high CPU usage.
___
Vineet Daniel
___
Let your email find you

On Fri, Jun 11, 2010 at 6:12 AM, Rishi Bhardwaj wrote:
> Hi
>
> I am investigating Cassandra write performance and see very heavy CPU usage
> from Cassandra. I have a single node Cassandra instance running on a dual
> core (2.66 Ghz Intel ) Ubuntu 9.10 server. The writes to Cassandra are being
> generated from the same server using BatchMutate(). The client makes exactly
> one RPC call at a time to Cassandra. Each BatchMutate() RPC contains 2 MB of
> data and once it is acknowledged by Cassandra, the next RPC is done.
> Cassandra has two separate disks, one for commitlog with a sequential b/w of
> 130MBps and the other a solid state disk for data with b/w of 90MBps. Tuning
> various parameters, I observe that I am able to attain a maximum write
> performance of about 45 to 50 MBps from Cassandra. I see that the
> Cassandra java process consistently uses 100% to 150% of CPU resources (as
> shown by top) during the entire write operation. Also, iostat clearly shows
> that the max disk bandwidth is not reached anytime during the write
> operation, every now and then the i/o activity on "commitlog" disk and the
> data disk spike but it is never consistently maintained by cassandra close
> to their peak. I would imagine that the CPU is probably the bottleneck
> here. Does anyone have any idea why Cassandra beats the heck out of the CPU
> here? Any suggestions on how to go about finding the exact bottleneck here?
>
> Some more information about the writes: I have 2 column families, the data
> though is mostly written in one column family with column sizes of around
> 32k and each row having around 256 or 512 columns. I would really appreciate
> any help here.
>
> Thanks,
> Rishi
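As a rough illustration of the write path (commit-log append, in-memory memtable, occasional flush), here is a toy model in Java. It is a sketch under stated assumptions, not Cassandra's code: the class name ToyWritePath, the list standing in for the commit log, and the 64 KB flush threshold are all hypothetical, chosen only to show why the data disk sees bursts rather than steady traffic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Toy model of a log-structured write path (hypothetical, not Cassandra's code):
 * every write is appended to a commit log, then applied to a sorted in-memory
 * table; the in-memory table is written out only when it crosses a threshold.
 */
final class ToyWritePath {
    private final List<String> commitLog = new ArrayList<>();         // stands in for the sequential commit-log append (keys only, for brevity)
    private final TreeMap<String, byte[]> memtable = new TreeMap<>();  // sorted by row key, in memory
    private final List<Map<String, byte[]>> flushed = new ArrayList<>(); // stands in for files on the data disk
    private final long flushThresholdBytes;
    private long memtableBytes = 0;

    ToyWritePath(long flushThresholdBytes) {
        this.flushThresholdBytes = flushThresholdBytes;
    }

    void write(String key, byte[] value) {
        commitLog.add(key);                       // 1. small append on every write
        byte[] old = memtable.put(key, value);    // 2. in-memory update, no data-disk I/O
        memtableBytes += value.length - (old == null ? 0 : old.length);
        if (memtableBytes >= flushThresholdBytes) {
            flush();                              // 3. occasional large write to the data disk
        }
    }

    private void flush() {
        flushed.add(new TreeMap<>(memtable));
        memtable.clear();
        memtableBytes = 0;
    }

    int flushCount() { return flushed.size(); }
    int commitLogEntries() { return commitLog.size(); }
    int memtableSize() { return memtable.size(); }

    public static void main(String[] args) {
        ToyWritePath db = new ToyWritePath(64 * 1024); // hypothetical 64 KB threshold
        for (int i = 0; i < 5; i++) {
            db.write("key" + i, new byte[32 * 1024]);  // 32 KB values, as in the question
        }
        System.out.println(db.flushCount() + " flushes, " + db.memtableSize() + " row(s) in memory");
    }
}
```

With 32 KB values and a 64 KB threshold, five writes produce five commit-log appends but only two flushes. That bursty pattern is consistent with the iostat spikes described in the question, although the model says nothing about where the CPU time goes.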
Re: searching keys of the form substring*
As I told you on the IRC channel, don't go for shortcuts... learn Java first.
___
Vineet Daniel
___
Let your email find you

On Tue, Jun 1, 2010 at 11:47 AM, Sagar Agrawal wrote:
> Thanks Vineet for replying, but I am not able to understand how can we use
> variable substitution in it.
>
> On Mon, May 31, 2010 at 4:42 PM, vd wrote:
>
>> Hi Sagar
>>
>> You can use variable substitution.
>> ___
>> Vineet Daniel
>> ___
>>
>> Let your email find you
>>
>> On Mon, May 31, 2010 at 3:44 PM, Sagar Agrawal wrote:
>>
>>> Hi folks,
>>>
>>> I want to fetch all those records from my column family such that the
>>> key starts with a specified string...
>>>
>>> e.g. Suppose I have a CF keyed on full names(first name + last name) of
>>> persons...
>>> now I want to fetch all those records whose first name is 'John'
>>>
>>> Right now, I am using OPP and KeyRange in the following way:
>>>
>>> KeyRange keyRange = new KeyRange();
>>> keyRange.setStart_key("John");
>>> keyRange.setEnd_key("Joho");
>>>
>>> but this is sort of hard coding can anyone suggest a better way to
>>> achieve this?
>>>
>>> I would be really grateful... thank you.
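One possible reading of the "variable substitution" suggestion is to compute both range keys from the prefix at run time instead of hard-coding "John" and "Joho". The sketch below does that by incrementing the last character of the prefix, which is exactly what "John" to "Joho" does by hand. The class PrefixRange and its methods are hypothetical helpers. The sketch assumes that the OrderPreservingPartitioner orders keys character by character like Java's String.compareTo, and that KeyRange's end key is inclusive, so a row keyed exactly "Joho" would also come back and is best filtered out client-side with key.startsWith(prefix).

```java
/**
 * Hypothetical helper: builds start/end keys for a "key starts with prefix"
 * range scan under an order-preserving partitioner.
 */
final class PrefixRange {
    /** The scan starts at the prefix itself. */
    static String startKey(String prefix) {
        return prefix;
    }

    /** The scan ends at the prefix with its last character incremented ("John" -> "Joho"). */
    static String endKey(String prefix) {
        if (prefix.isEmpty()) {
            throw new IllegalArgumentException("prefix must not be empty");
        }
        int last = prefix.length() - 1;
        char c = prefix.charAt(last);
        if (c == Character.MAX_VALUE) {
            throw new IllegalArgumentException("last character cannot be incremented");
        }
        return prefix.substring(0, last) + (char) (c + 1);
    }

    public static void main(String[] args) {
        String firstName = args.length > 0 ? args[0] : "John"; // the value you substitute at run time
        System.out.println(startKey(firstName) + " .. " + endKey(firstName));
    }
}
```

The resulting strings would then go into the calls already shown in the question, e.g. keyRange.setStart_key(PrefixRange.startKey(firstName)) and keyRange.setEnd_key(PrefixRange.endKey(firstName)). If the keys look like "John Smith", the prefix "John" also matches "Johnny ..."; including the separator in the prefix (for example "John ", assuming a space separates first and last name) avoids that.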
cluster throwing errors when new or existing node joins
Hi,

I have a setup of 4 nodes. Whenever I restart any of the nodes, even after deleting the data directories and the commit log, I get the following errors:

ERROR 18:46:41,296 Fatal exception in thread Thread[COMMIT-LOG-WRITER,5,main]
java.lang.RuntimeException: java.lang.NullPointerException
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
    at java.lang.Thread.run(Thread.java:636)
Caused by: java.lang.NullPointerException
    at org.apache.cassandra.db.Table$TableMetadata.getColumnFamilyId(Table.java:131)
    at org.apache.cassandra.db.Table.getColumnFamilyId(Table.java:364)
    at org.apache.cassandra.db.commitlog.CommitLogSegment.write(CommitLogSegment.java:103)
    at org.apache.cassandra.db.commitlog.CommitLog$LogRecordAdder.run(CommitLog.java:475)
    at org.apache.cassandra.db.commitlog.PeriodicCommitLogExecutorService$1.runMayThrow(PeriodicCommitLogExecutorService.java:52)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
    ... 1 more
ERROR 18:46:41,297 Error in ThreadPoolExecutor
java.lang.NullPointerException
    at org.apache.cassandra.db.Table.apply(Table.java:407)
    at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:68)
    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:40)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:636)
ERROR 18:46:41,299 Fatal exception in thread Thread[ROW-MUTATION-STAGE:5,5,main]
java.lang.NullPointerException
    at org.apache.cassandra.db.Table.apply(Table.java:407)
    at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:68)
    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:40)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:636)
ERROR 18:46:51,309 Error in ThreadPoolExecutor
java.lang.NullPointerException
    at org.apache.cassandra.db.Table.apply(Table.java:407)
    at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:68)
    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:40)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:636)
ERROR 18:46:51,310 Fatal exception in thread Thread[ROW-MUTATION-STAGE:6,5,main]

Can anyone suggest what the reason for this error might be?
___
VD
___
Let your email find you
Re: searching keys of the form substring*
Hi Sagar,

You can use variable substitution.
___
Vineet Daniel
___
Let your email find you

On Mon, May 31, 2010 at 3:44 PM, Sagar Agrawal wrote:
> Hi folks,
>
> I want to fetch all those records from my column family such that the key
> starts with a specified string...
>
> e.g. Suppose I have a CF keyed on full names(first name + last name) of
> persons...
> now I want to fetch all those records whose first name is 'John'
>
> Right now, I am using OPP and KeyRange in the following way:
>
> KeyRange keyRange = new KeyRange();
> keyRange.setStart_key("John");
> keyRange.setEnd_key("Joho");
>
> but this is sort of hard coding can anyone suggest a better way to
> achieve this?
>
> I would be really grateful... thank you.
Re: what is DCQUORUM
Thanks Eben

On Wed, May 12, 2010 at 7:33 PM, Eben Hewitt wrote:
> QUORUM is a high consistency level. It refers to the number of nodes that
> have to acknowledge read or write operations in order to be assured that
> Cassandra is in a consistent state. It uses ReplicationFactor / 2 + 1.
>
> DCQUORUM means "Data Center Quorum", and balances consistency with
> performance. It puts multiple replicas in each Data Center so operations can
> prefer replicas in the same DC for lower latency.
>
> See https://issues.apache.org/jira/browse/CASSANDRA-492 for a little
> discussion.
>
> Also see
> http://svn.apache.org/repos/asf/cassandra/trunk/src/java/org/apache/cassandra/locator/DatacenterShardStrategy.java
> and
>
> http://svn.apache.org/repos/asf/cassandra/trunk/src/java/org/apache/cassandra/service/DatacenterWriteResponseHandler.java
>
> Eben
>
> On Wed, May 12, 2010 at 6:15 AM, vd wrote:
>
>> Hi
>>
>> I have read about QUORUM but lately came across DCQUORUM. What is it and
>> whats the difference between the two ?
>
> --
> "In science there are no 'depths'; there is surface everywhere."
> --Rudolph Carnap
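The QUORUM arithmetic in Eben's reply is easy to check in code. The sketch below (the class name Quorum is hypothetical) computes ReplicationFactor / 2 + 1 with integer division, so RF = 3 needs 2 acknowledgements and RF = 4 needs 3. It also checks the usual overlap argument behind "assured that Cassandra is in a consistent state": if reads wait for R replicas and writes for W replicas out of RF, and R + W > RF, every read set shares at least one replica with every write set. It does not attempt to model DCQUORUM.

```java
/** Hypothetical helper illustrating QUORUM sizing and read/write overlap. */
final class Quorum {
    /** Number of replicas that must acknowledge: ReplicationFactor / 2 + 1 (integer division). */
    static int quorum(int replicationFactor) {
        if (replicationFactor < 1) {
            throw new IllegalArgumentException("replication factor must be >= 1");
        }
        return replicationFactor / 2 + 1;
    }

    /** True when any read set of size readAcks must intersect any write set of size writeAcks. */
    static boolean readsSeeLatestWrite(int readAcks, int writeAcks, int replicationFactor) {
        return readAcks + writeAcks > replicationFactor;
    }

    public static void main(String[] args) {
        for (int rf = 1; rf <= 5; rf++) {
            int q = quorum(rf);
            System.out.println("RF=" + rf + " quorum=" + q
                    + " quorum reads overlap quorum writes: " + readsSeeLatestWrite(q, q, rf));
        }
    }
}
```

Because 2 * (RF / 2 + 1) is always greater than RF, QUORUM reads combined with QUORUM writes always overlap, whereas ONE and ONE (1 + 1) do not once RF is 2 or more.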
what is DCQUORUM
Hi,

I have read about QUORUM but recently came across DCQUORUM. What is it, and what's the difference between the two?
Re: Is SuperColumn necessary?
Hi,

Can we do a range search on keys in ID:ID format, or would the API treat such a key as a single ID rather than splitting it on ':'? If not, how can we avoid using super columns when we need to associate n rows with a single ID, like:

CatID1 -> articleID1
CatID1 -> articleID2
CatID1 -> articleID3
CatID1 -> articleID4

How can we map such scenarios with simple column families?

Rgds.

On Tue, May 11, 2010 at 2:11 PM, Torsten Curdt wrote:
> Exactly.
>
> On Tue, May 11, 2010 at 10:20, David Boxenhorn wrote:
>> Don't think of it as getting rid of supercolum. Think of it as adding
>> superdupercolums, supertriplecolums, etc. Or, in sparse array terminology:
>> array[dim1][dim2][dim3]...[dimN] = value
>>
>> Or, as said above:
>>
>> > Type="UTF8">
>> > Type="UTF8">
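A common way to express the CatID to articleID mapping without super columns is a "wide row" in a standard column family: the row key is the category ID and each article ID is a column name, so one row holds the whole list, sorted by column name. The in-memory sketch below models that; the class WideRowIndex and its methods are hypothetical and do not talk to Cassandra. On the first question: as far as we know, a row key is an opaque string to Cassandra and is never split on ':'. Under an order-preserving partitioner, composite keys such as "CatID1:articleID1" can still be found with a prefix range from "CatID1:" to "CatID1;" (';' is the character right after ':'), which keysWithPrefix mimics with SortedMap.subMap.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical in-memory model of a standard column family used as a wide row. */
final class WideRowIndex {
    // row key (CatID) -> sorted columns (column name = articleID, empty value)
    private final TreeMap<String, TreeMap<String, byte[]>> rows = new TreeMap<>();

    /** One column insert per association: row = catId, column name = articleId. */
    void link(String catId, String articleId) {
        rows.computeIfAbsent(catId, k -> new TreeMap<>()).put(articleId, new byte[0]);
    }

    /** Equivalent of slicing all columns of one row: every article in the category. */
    List<String> articlesFor(String catId) {
        TreeMap<String, byte[]> row = rows.get(catId);
        if (row == null) {
            return new ArrayList<>();
        }
        return new ArrayList<>(row.keySet());
    }

    /** Prefix scan over composite "catId:articleId" keys kept in key order (toKey is exclusive). */
    static List<String> keysWithPrefix(SortedMap<String, ?> sortedKeys, String catId) {
        return new ArrayList<>(sortedKeys.subMap(catId + ":", catId + ";").keySet());
    }

    public static void main(String[] args) {
        WideRowIndex cf = new WideRowIndex();
        for (int i = 1; i <= 4; i++) {
            cf.link("CatID1", "articleID" + i);
        }
        System.out.println(cf.articlesFor("CatID1"));
    }
}
```

The wide-row layout needs only a single-row read per category and works with any partitioner, while the composite-key layout relies on an order-preserving partitioner for the range scan.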
updating column names
Hi,

I have a column named "colom". Can we rename it to "column" at runtime or via the API?
how to count the columns
Hi,

Can we count the total number of columns in a ColumnFamily? If yes, how?
Re: How to write WHERE .. LIKE query ?
Hi Mike,

AFAIK Cassandra queries only on keys and not on column names; please verify.

On Tue, May 11, 2010 at 11:06 AM, Mike Malone wrote:
>
> On Mon, May 10, 2010 at 9:00 PM, Shuge Lee wrote:
>>
>> Hi all:
>> How to write WHERE ... LIKE query ?
>> For examples(described in Python):
>> Schema:
>> # columnfamily name
>> resources = [
>>     # key
>>     'foo': {
>>         # columns and value
>>         'url': 'foo.com',
>>         'pushlier': 'foo',
>>     },
>>     'oof': {
>>         'url': 'oof.com',
>>         'pushlier': 'off',
>>     },
>>     # ... ,
>> }
>> # this is very easy,
>> SELECT * FROM KEY = 'foo'
>> but following are really hard:
>> SELECT * FROM resources WHERE key LIKE 'o%' # get all records which key
>> name contains character 'o'?
>
> get_range_slices(, ColumnParent(column_family),
> SlicePredicate(slice_range=SliceRange('',''), KeyRange('o', 'o~'),
> ConsistencyLevel.ONE);
>
>>
>> SELECT * FROM resources WHERE url == 'oof.com'
>
> This is a projection. Cassandra doesn't support this sort of query out of
> the box. You'll have to structure your data so that data you want to query
> by is in the key or column name. Or you'll have to manually build secondary
> indexes.
>
> Mike
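Mike's last suggestion, a manually built secondary index, can be sketched in memory as two maps: the data column family (row key to url) and a second, hypothetical index column family keyed by the url value whose column names are the matching row keys. Every write updates both, so "WHERE url == 'oof.com'" becomes a single index-row read. The class ManualIndex and its method names are illustrative only. In a real cluster the two writes are separate mutations and are not atomic, so a crash between them can leave the index stale.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical in-memory sketch of a manually maintained secondary index. */
final class ManualIndex {
    private final Map<String, String> resources = new TreeMap<>();           // row key -> url column
    private final Map<String, Set<String>> resourcesByUrl = new TreeMap<>(); // index row: url -> row keys

    /** Writes the data row and keeps the index row in step with it. */
    void put(String key, String url) {
        String oldUrl = resources.put(key, url);
        if (oldUrl != null && !oldUrl.equals(url)) {
            // The indexed value changed: remove the stale index entry (read-before-write).
            Set<String> keys = resourcesByUrl.get(oldUrl);
            keys.remove(key);
            if (keys.isEmpty()) {
                resourcesByUrl.remove(oldUrl);
            }
        }
        resourcesByUrl.computeIfAbsent(url, u -> new TreeSet<>()).add(key);
    }

    /** "SELECT * FROM resources WHERE url == ?" answered from the index row. */
    Set<String> keysWithUrl(String url) {
        return Collections.unmodifiableSet(
                resourcesByUrl.getOrDefault(url, Collections.emptySet()));
    }

    public static void main(String[] args) {
        ManualIndex cf = new ManualIndex();
        cf.put("foo", "foo.com");
        cf.put("oof", "oof.com");
        System.out.println(cf.keysWithUrl("oof.com"));
    }
}
```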
Re: Tuning Cassandra
What is the complete code you are using to connect to Cassandra from Java?

On Mon, May 10, 2010 at 1:49 PM, David Boxenhorn wrote:
> I don't know what "TSocket or the buffered one" means. Maybe I should know?
>
> I'm using Hector. Does that explain anything?
>
> On Mon, May 10, 2010 at 11:15 AM, vd wrote:
>
>>
>> Hi
>>
>> what is it that you are using to connect with cassnadra TSocket or the
>> buffered one ?
>>
>> ___
>>
>> On Mon, May 10, 2010 at 1:29 PM, David Boxenhorn wrote:
>>
>>> I'm running Java on the client, jdbc queries on Oracle, Hector on
>>> Cassandra.
>>>
>>> The Cassandra and Oracle database designs are radically different, as you
>>> might guess.
>>>
>>> I have no doubt that Cassandra can be tuned, in a multiple-server
>>> cluster, to have superior throughput (that's why I'm doing it!). But for
>>> now, it's really frustrating my development effort that Cassandra is so
>>> slow. Can't I get it up to twice as slow as Oracle in my configuration?
>>>
>>> On Mon, May 10, 2010 at 10:47 AM, vd wrote:
>>>
>>>> Hi David
>>>>
>>>> If I may ask...how do you plan to import data from oracle to cassandra ?
>>>> As answer AFAIK cassandra's true ability comes into play when running on
>>>> more than one machine...and please share how you are making comparisons
>>>> like
>>>> on writes or reads from cassandra.
>>>>
>>>> ___
>>>> ___
>>>>
>>>> On Mon, May 10, 2010 at 1:04 PM, David Boxenhorn wrote:
>>>>
>>>>> I'm running Oracle and Cassandra on my machine, trying to import my
>>>>> data to Cassandra from Oracle.
>>>>>
>>>>> In my configuration Oracle is about ten times faster than Cassandra.
>>>>> Cassandra has out-of-the-box tuning.
>>>>>
>>>>> I am new to Cassandra. How do I begin trying to tune it?
>>>>>
>>>>> Thanks.
Re: Tuning Cassandra
Hi,

What are you using to connect to Cassandra, TSocket or the buffered one?
___

On Mon, May 10, 2010 at 1:29 PM, David Boxenhorn wrote:
> I'm running Java on the client, jdbc queries on Oracle, Hector on
> Cassandra.
>
> The Cassandra and Oracle database designs are radically different, as you
> might guess.
>
> I have no doubt that Cassandra can be tuned, in a multiple-server cluster,
> to have superior throughput (that's why I'm doing it!). But for now, it's
> really frustrating my development effort that Cassandra is so slow. Can't I
> get it up to twice as slow as Oracle in my configuration?
>
> On Mon, May 10, 2010 at 10:47 AM, vd wrote:
>
>> Hi David
>>
>> If I may ask...how do you plan to import data from oracle to cassandra ?
>> As answer AFAIK cassandra's true ability comes into play when running on
>> more than one machine...and please share how you are making comparisons like
>> on writes or reads from cassandra.
>>
>> ___
>> ___
>>
>> On Mon, May 10, 2010 at 1:04 PM, David Boxenhorn wrote:
>>
>>> I'm running Oracle and Cassandra on my machine, trying to import my data
>>> to Cassandra from Oracle.
>>>
>>> In my configuration Oracle is about ten times faster than Cassandra.
>>> Cassandra has out-of-the-box tuning.
>>>
>>> I am new to Cassandra. How do I begin trying to tune it?
>>>
>>> Thanks.
Re: Tuning Cassandra
Hi David,

If I may ask, how do you plan to import data from Oracle to Cassandra? As for an answer, AFAIK Cassandra's true ability comes into play when it runs on more than one machine. Also, please share how you are making the comparison: on writes or on reads from Cassandra?
___
___

On Mon, May 10, 2010 at 1:04 PM, David Boxenhorn wrote:
> I'm running Oracle and Cassandra on my machine, trying to import my data to
> Cassandra from Oracle.
>
> In my configuration Oracle is about ten times faster than Cassandra.
> Cassandra has out-of-the-box tuning.
>
> I am new to Cassandra. How do I begin trying to tune it?
>
> Thanks.