hadoop cassandra

2011-03-17 Thread Sagar Kohli
Hi all,

Is there any example of Hadoop and Cassandra integration where the input
comes from HDFS and the output goes to Cassandra?

NOTE: I have gone through the word count example provided with the source code,
but it does not cover this case.


regards
Sagar





Cassandra 0.6.3 ring not balanced in terms of data size

2011-03-17 Thread Ali Ahsan

Hi All

We are running Cassandra 0.6.3. We have two nodes with replication
factor one and ordered partitioning. The problem we are facing is that
all data is being sent to one Cassandra node, which is filling up quite
rapidly, and we are short of disk space. Unfortunately we have a hardware
constraint and cannot add more hard disks. Our current ring situation is
that one node contains 89 GB and the other only 500 MB, and we have not
been able to work out why this is happening. Please suggest how we can
solve this issue as soon as possible.


--
S.Ali Ahsan

Senior System Engineer

e-Business (Pvt) Ltd



Re: Cassandra 0.6.3 ring not balanced in terms of data size

2011-03-17 Thread aaron morton
With the Order Preserving Partitioner you are responsible for balancing the
rows around the cluster; see
http://wiki.apache.org/cassandra/Operations?highlight=%28partitioner%29#Token_selection

Was there a reason for using the ordered partitioner rather than the random
one?

What does the output from nodetool ring look like?

You can change the token for the small node so that it takes ownership of more
of the data; see http://wiki.apache.org/cassandra/Operations#Moving_nodes You
will need to understand the row keys you are using and pick the token
appropriately.

Hope that helps.
Aaron
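
For example (a sketch only - the token below is a placeholder you would
derive from your actual key distribution): with the ordered partitioner a
token is just a key string compared lexically, so you would pick a value
that splits the hot node's key range roughly in half and move the lightly
loaded node to it:

    nodetool -h 192.168.100.4 move <key-string-midpoint>

followed by nodetool -h 192.168.100.3 cleanup on the full node to reclaim
the space it no longer owns.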

On 17 Mar 2011, at 22:50, Ali Ahsan wrote:

> [original message quoted in full; snipped]


Pauses of GC

2011-03-17 Thread ruslan usifov
Hello

Sometimes I get very long GC pauses:


Total time for which application threads were stopped: 0.0303150 seconds
2011-03-17T13:19:56.476+0300: 33295.671: [GC 33295.671: [ParNew: 678855K->20708K(737280K), 0.0271230 secs] 1457643K->806795K(4112384K), 0.0273050 secs] [Times: user=0.33 sys=0.00, real=0.03 secs]
Total time for which application threads were stopped: 0.0291820 seconds
2011-03-17T13:20:32.962+0300: 33332.157: [GC 33332.157: [ParNew: 676068K->23527K(737280K), 0.0302180 secs] 1462155K->817599K(4112384K), 0.0304020 secs] [Times: user=0.31 sys=0.00, real=0.03 secs]
Total time for which application threads were stopped: 0.1270270 seconds
2011-03-17T13:21:11.908+0300: 33371.103: [GC 33371.103: [ParNew: 678887K->21564K(737280K), 0.0268160 secs] 1472959K->823191K(4112384K), 0.0270110 secs] [Times: user=0.28 sys=0.00, real=0.03 secs]
Total time for which application threads were stopped: 0.0293330 seconds
2011-03-17T13:21:50.482+0300: 33409.677: [GC 33409.677: [ParNew: 676924K->21115K(737280K), 0.0281720 secs] 1478551K->829900K(4112384K), 0.0283630 secs] [Times: user=0.27 sys=0.00, real=0.03 secs]
Total time for which application threads were stopped: 0.0339610 seconds
2011-03-17T13:22:32.849+0300: 33452.044: [GC 33452.044: [ParNew: 676475K->25948K(737280K), 0.0317600 secs] 1485260K->842061K(4112384K), 0.0319520 secs] [Times: user=0.22 sys=0.00, real=0.03 secs]
Total time for which application threads were stopped: 0.0344430 seconds
2011-03-17T13:23:14.924+0300: 33494.119: [GC 33494.119: [ParNew: 681308K->25087K(737280K), 0.0282600 secs] 1497421K->848300K(4112384K), 0.0284360 secs] [Times: user=0.32 sys=0.00, real=0.03 secs]
Total time for which application threads were stopped: 0.0309160 seconds
2011-03-17T13:23:57.192+0300: 33536.387: [GC 33536.387: [ParNew: 680447K->24805K(737280K), 0.0299910 secs] 1503660K->855829K(4112384K), 0.0301670 secs] [Times: user=0.29 sys=0.01, real=0.03 secs]
Total time for which application threads were stopped: 0.0324200 seconds
2011-03-17T13:24:01.553+0300: 33540.748: [GC 33540.749: [ParNew: 680165K->31886K(737280K), 0.0495620 secs] 1511189K->936503K(4112384K), 0.0497420 secs] [Times: user=0.57 sys=0.00, real=0.05 secs]
Total time for which application threads were stopped: 0.0507030 seconds
2011-03-17T13:37:56.009+0300: 34375.204: [GC 34375.204: [ParNew: 687246K->28727K(737280K), 0.0244720 secs] 1591863K->942459K(4112384K), 0.0246900 secs] [Times: user=0.18 sys=0.00, real=0.02 secs]
Total time for which application threads were stopped: 806.7442720 seconds
Total time for which application threads were stopped: 0.0006590 seconds
Total time for which application threads were stopped: 0.0004360 seconds
Total time for which application threads were stopped: 0.0004630 seconds
Total time for which application threads were stopped: 0.0008120 seconds
2011-03-17T13:37:59.018+0300: 34378.213: [GC 34378.213: [ParNew: 676678K->21640K(737280K), 0.0137740 secs] 1590410K->949991K(4112384K), 0.0139610 secs] [Times: user=0.13 sys=0.02, real=0.01 secs]
Total time for which application threads were stopped: 0.0145920 seconds
Total time for which application threads were stopped: 0.1036080 seconds
Total time for which application threads were stopped: 0.0585600 seconds
Total time for which application threads were stopped: 0.0600550 seconds
Total time for which application threads were stopped: 0.0008560 seconds
Total time for which application threads were stopped: 0.0006770 seconds
Total time for which application threads were stopped: 0.0005910 seconds
Total time for which application threads were stopped: 0.0351330 seconds
Total time for which application threads were stopped: 0.0329020 seconds
Total time for which application threads were stopped: 0.0728490 seconds
Total time for which application threads were stopped: 0.0480990 seconds
Total time for which application threads were stopped: 0.0804250 seconds
2011-03-17T13:38:04.394+0300: 34383.589: [GC 34383.589: [ParNew: 677000K->8375K(737280K), 0.0218310 secs] 1605351K->944271K(4112384K), 0.0220300 secs]




Here is the nodetool cfstats output for the hung node:

Keyspace: fishdom_tuenti
Read Count: 4970999
Read Latency: 1.0267005945887335 ms.
Write Count: 1441619
Write Latency: 0.013146585887117193 ms.
Pending Tasks: 0
Column Family: decor
SSTable count: 3
Space used (live): 1296203532
Space used (total): 1302520037
Memtable Columns Count: 1066
Memtable Data Size: 121742
Memtable Switch Count: 11
Read Count: 108125
Read Latency: 2.809 ms.
Write Count: 11261
Write Latency: 0.006 ms.
Pending Tasks: 0
Key cache capacity: 30
Key cache size: 46470
Key cache hit rate: 0.40384615384615385
Row cache: disabled
Compacted row minimum size: 36
Compacted row maximum size: 73457
Compacted row mean size: 958

Column Family: adopt
SSTable count: 1

Re: Cassandra 0.6.3 ring not balanced in terms of data size

2011-03-17 Thread Ali Ahsan

Below is the output of nodetool ring:

Address        Status  Load       Range              Ring
                                  TuL8jLqs7uxLipP6
192.168.100.3  Up      89.91 GB   JDtVOU0YVQ6MtBYA   |<--|
192.168.100.4  Up      487.41 MB  TuL8jLqs7uxLipP6   |-->|

The reason for using the ordered partitioner is that we wanted range
slices while keeping the keys in order.




On 03/17/2011 03:40 PM, aaron morton wrote:
[earlier messages quoted in full; snipped]

--
S.Ali Ahsan

Senior System Engineer

e-Business (Pvt) Ltd



Replacing a dead seed

2011-03-17 Thread Jonathan Colby
Hi - 

If a seed crashes (i.e., is suddenly unavailable due to a hardware problem),
what is the best way to replace the seed in the cluster?

I've read that you should not bootstrap a seed.  Therefore I came up with this
procedure, but it seems pretty complicated.  Any better ideas?

1. Update the seed list on all nodes, taking out the dead node, and restart the
nodes in the cluster so the new seed list is picked up.
2. Then bootstrap the new (replacement) node as a normal node (not yet as a
seed).
3. When bootstrapping is done, make the new node a seed.
4. Update the seed list again, adding back the replacement seed (and rolling-
restart the cluster as in step 1).


That seems to me like a whole lot of work.  Surely there is a better way?

Jon

Re: super_column.name?

2011-03-17 Thread Jonathan Ellis
I see super-column-0 in there.  Not sure what the question is.

On Wed, Mar 16, 2011 at 10:20 PM, Michael Fortin  wrote:
> Hi,
>
> I've been working on a Scala-based API for Cassandra.  I've built it directly
> on top of Thrift.  I'm having a problem getting a slice of a SuperColumn.
> When I get a ColumnOrSuperColumn back and call 'cos.super_column.name' and
> deserialize the bytes, I'm not getting the expected output.
>
> Here's what's in Cassandra:
> ---
> RowKey: key
> => (super_column=super-col-0,
>     (column=column, value=76616c756530, timestamp=1300330948240)
>     (column=column1, value=76616c756530, timestamp=1300330948244))
> ….
>
> and this is the deserialized string
>
> ?       get_slice                    super-col-0              column       
> value0
>     .?æ?        column1       value0
>     .?æ?            super-col-1              column       value1
>     .?æ?        column1       value1
>     .?æ?            super-col-2              column       value2
>     .?æ?        column1       value2
>     .?æ?            super-col-3              column       value3
>     .?æ?        column1       value3
>     .?æ?
>
> I would expect
> super-col-0
>
> Any ideas on what I'm doing wrong?
>
> Thanks,
> Mike



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: getting exception when cassandra 0.7.3 is starting

2011-03-17 Thread Jonathan Ellis
Remove the cache file or upgrade to 0.7.4
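
(Concretely, assuming the default 0.7 layout: the saved cache files live
under the saved_caches_directory set in cassandra.yaml, so with the node
stopped, something like

    rm /var/lib/cassandra/saved_caches/*

clears them. The caches are only a warm-up optimization and will be
rebuilt.)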

On Thu, Mar 17, 2011 at 1:15 AM, Anurag Gujral  wrote:
> I am getting an exception when starting Cassandra 0.7.3:
>
> ERROR 01:10:48,321 Exception encountered during startup.
> java.lang.NegativeArraySizeException
>     at org.apache.cassandra.db.ColumnFamilyStore.readSavedCache(ColumnFamilyStore.java:274)
>     at org.apache.cassandra.db.ColumnFamilyStore.<init>(ColumnFamilyStore.java:213)
>     at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:466)
>     at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:447)
>     at org.apache.cassandra.db.Table.initCf(Table.java:317)
>     at org.apache.cassandra.db.Table.<init>(Table.java:254)
>     at org.apache.cassandra.db.Table.open(Table.java:110)
>     at org.apache.cassandra.db.SystemTable.checkHealth(SystemTable.java:207)
>     at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:129)
>     at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:316)
>     at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:79)
> [the same stack trace is printed a second time; snipped]



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: insert during forced compaction

2011-03-17 Thread Jonathan Ellis
We're aware of the potential for races during schema change but it
looks like we missed this one.  Can you create a ticket?

On Wed, Mar 16, 2011 at 11:55 PM, Jeffrey Wang  wrote:
> Hey all,
>
>
>
> I’m running 0.7.0 on a cluster of 5 machines. When I create a new column
> family after I run nodetool compact (but before it finishes), I see the
> error below. Seems like StorageService.getValidColumnFamilies() should make
> a copy of the set of column families in the case where cfNames.length == 0.
> Incidentally, the interaction of schema changes with forcing
> compaction/flush/etc seems like it could be the source of many race
> conditions (e.g. CF is deleted before the compaction gets to it). Any
> thoughts?
>
>
>
> -Jeffrey
>
>
>
> java.util.ConcurrentModificationException
>     at java.util.HashMap$HashIterator.nextEntry(Unknown Source)
>     at java.util.HashMap$ValueIterator.next(Unknown Source)
>     at java.util.Collections$UnmodifiableCollection$1.next(Unknown Source)
>     at org.apache.cassandra.service.StorageService.forceTableCompaction(StorageService.java:1140)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>     at java.lang.reflect.Method.invoke(Unknown Source)
>     at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(Unknown Source)
>     at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(Unknown Source)
>     at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(Unknown Source)
>     at com.sun.jmx.mbeanserver.PerInterface.invoke(Unknown Source)
>     at com.sun.jmx.mbeanserver.MBeanSupport.invoke(Unknown Source)
>     at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(Unknown Source)
>     at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(Unknown Source)
>     at javax.management.remote.rmi.RMIConnectionImpl.doOperation(Unknown Source)
>     at javax.management.remote.rmi.RMIConnectionImpl.access$200(Unknown Source)
>     at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(Unknown Source)
>     at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(Unknown Source)
>     at javax.management.remote.rmi.RMIConnectionImpl.invoke(Unknown Source)
>     at sun.reflect.GeneratedMethodAccessor84.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>     at java.lang.reflect.Method.invoke(Unknown Source)
>     at sun.rmi.server.UnicastServerRef.dispatch(Unknown Source)
>     at sun.rmi.transport.Transport$1.run(Unknown Source)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at sun.rmi.transport.Transport.serviceCall(Unknown Source)
>     at sun.rmi.transport.tcp.TCPTransport.handleMessages(Unknown Source)
>     at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(Unknown Source)
>     at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(Unknown Source)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>     at java.lang.Thread.run(Unknown Source)



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: super_column.name?

2011-03-17 Thread Sylvain Lebresne
Are you sure you don't have a problem with handling ByteBuffers ?
What do you mean by 'deserialized string' ?

--
Sylvain

On Thu, Mar 17, 2011 at 4:20 AM, Michael Fortin  wrote:
> [original message quoted in full; snipped]


Re: Cassandra 0.6.3 ring not balanced in terms of data size

2011-03-17 Thread Ali Ahsan

Dear Aaron,

We are a little confused about the OPP token. How do we calculate an OPP
token? A few of our column families have a UUID as the key and others
have an integer as the key.



On 03/17/2011 04:22 PM, Ali Ahsan wrote:

[earlier messages quoted in full; snipped]



--
S.Ali Ahsan

Senior System Engineer

e-Business (Pvt) Ltd



Re: hadoop cassandra

2011-03-17 Thread Jeremy Hanna
You can start with a word count example that's only for HDFS.  Then you can
replace the reducer in that with the ReducerToCassandra that's in the Cassandra
word_count example.  You need to match up your Mapper's output to the Reducer's
input and set a couple of configuration variables to tell it how to connect to
Cassandra, but that should be it - a working word count example that takes
input from HDFS and outputs to Cassandra.

We kind of figured that plenty of documentation was out there for Hadoop with
HDFS.  The word count example just demonstrates something specific to
Cassandra.  However, Hadoop is so pluggable that as long as the input and output
types line up, you can mix and match most anything with the InputFormat and
OutputFormat (like in word count, you can output to Cassandra or to the local
filesystem - there are two different inner classes).

Does that help?

Jeremy
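
A minimal sketch of that wiring, assuming the 0.7-era Hadoop/ConfigHelper
API; TokenizerMapper and ReducerToCassandra stand in for the mapper from
Hadoop's word count and the reducer from Cassandra's word_count example, so
check the exact class names and signatures against your version:

    import java.nio.ByteBuffer;
    import java.util.List;

    import org.apache.cassandra.hadoop.ColumnFamilyOutputFormat;
    import org.apache.cassandra.hadoop.ConfigHelper;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

    public class HdfsToCassandraWordCount {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = new Job(conf, "wordcount-hdfs-to-cassandra");
            job.setJarByClass(HdfsToCassandraWordCount.class);

            // Input side: plain text files on HDFS.
            job.setInputFormatClass(TextInputFormat.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));

            // The mapper emits (Text word, IntWritable count), which is what
            // ReducerToCassandra from the word_count example expects.
            job.setMapperClass(TokenizerMapper.class);
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(IntWritable.class);

            // Output side: the reducer turns each word into a row key plus a
            // list of Thrift Mutations written straight into Cassandra.
            job.setReducerClass(ReducerToCassandra.class);
            job.setOutputKeyClass(ByteBuffer.class);
            job.setOutputValueClass(List.class);
            job.setOutputFormatClass(ColumnFamilyOutputFormat.class);

            // Tell the output format how to reach the cluster.
            ConfigHelper.setOutputColumnFamily(job.getConfiguration(),
                    "Keyspace1", "WordCount");
            ConfigHelper.setRpcPort(job.getConfiguration(), "9160");
            ConfigHelper.setInitialAddress(job.getConfiguration(), "localhost");
            ConfigHelper.setPartitioner(job.getConfiguration(),
                    "org.apache.cassandra.dht.RandomPartitioner");

            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }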

On Mar 17, 2011, at 3:28 AM, Sagar Kohli wrote:

> hi all,
> 
> is there any example of hadoop and cassandra integration where input is from 
> hdfs and out put to cassandra
> 
> NOTE: i have gone through word count example provided with the source code, 
> but it does not have above case..
> 
> 
> regards
> Sagar



Re: Replacing a dead seed

2011-03-17 Thread Edward Capriolo
On Thu, Mar 17, 2011 at 9:09 AM, Jonathan Colby
 wrote:
> [original message quoted in full; snipped]

It is true that seeds do not auto-bootstrap. But in this case it does
not matter whether the other nodes believe this node is a seed. It only
matters what the joining node is configured to believe.

On the joining node, do not include its hostname/IP in the seed list,
and it should auto-bootstrap normally.
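
For instance, the relevant cassandra.yaml fragment on the replacement node
might look like this in 0.7 (addresses are placeholders):

    auto_bootstrap: true
    seeds:
        - 10.0.0.1    # surviving seeds only -
        - 10.0.0.2    # the joining node must not list itself

Once it has finished bootstrapping, you can add its address to the seed
lists of the other nodes at your leisure.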


RE: hadoop cassandra

2011-03-17 Thread Sagar Kohli
Thanks Jeremy, that's a good pointer to start with.

regards
Sagar

From: Jeremy Hanna [jeremy.hanna1...@gmail.com]
Sent: Thursday, March 17, 2011 7:34 PM
To: user@cassandra.apache.org
Subject: Re: hadoop cassandra

[earlier messages quoted in full; snipped]






moving data from single node cassandra

2011-03-17 Thread Komal Goyal
Hi,

I have a single-node Cassandra setup on a Windows machine.
I very quickly ran out of space on this machine, so I have increased its
hard disk capacity.
Now I want to know how to configure Cassandra to start storing data on these
high-capacity partitions.

Also, how can the data already stored in this single-node Cassandra be moved
from the C drive to the other drives?

Is there any documentation on how to make these configuration changes?

Some supporting links would be very helpful.



Thanks,

Komal Goyal
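
For what it's worth, in 0.7 the storage locations are set in
conf/cassandra.yaml (in 0.6 the equivalent elements live in
storage-conf.xml) - a sketch with hypothetical Windows paths:

    data_file_directories:
        - D:/cassandra/data
    commitlog_directory: D:/cassandra/commitlog
    saved_caches_directory: D:/cassandra/saved_caches

With the node stopped, you can move the contents of the old directories to
the new locations, update the configuration, and restart.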


InvalidRequestException: Mutation must have one ColumnOrSuperColumn or one Deletion

2011-03-17 Thread Anurag Gujral
Hi All,
  I am using the batch_mutate function of Cassandra 0.7 and I am getting
the error "InvalidRequestException: Mutation must have one
ColumnOrSuperColumn or one Deletion". I have my own C++ Cassandra client
using the Thrift 0.0.5 API.

Any suggestions?

Sample Code

map<string, vector<Mutation> > cfmap;
vector<Mutation> mutations;

Column temp_col;
temp_col.name.assign("abcd");
temp_col.value.assign("efgh");
temp_col.timestamp = timestamp;
temp_col.ttl = 0; // TODO: TTL

ColumnOrSuperColumn cosc;
cosc.column = temp_col;
cosc.__isset.column = true; // must set which data type, col or super

Mutation mutation;
mutation.column_or_supercolumn = cosc;
mutations.push_back(mutation);

cfmap.insert(make_pair(colspace, mutations));

map<string, map<string, vector<Mutation> > > mutationMap;
mutationMap.insert(make_pair("firstRow", cfmap));

conn->client->batch_mutate(mutationMap,
    (org::apache::cassandra::ConsistencyLevel::type)cons);


Thanks
Anurag


Re: InvalidRequestException: Mutation must have one ColumnOrSuperColumn or one Deletion

2011-03-17 Thread Tyler Hobbs
You need to set the __isset on the Mutation object as well.

On Thu, Mar 17, 2011 at 10:13 AM, Anurag Gujral wrote:

> [original message quoted in full; snipped]



-- 
Tyler Hobbs
Software Engineer, DataStax
Maintainer of the pycassa Cassandra Python client library


Re: InvalidRequestException: Mutation must have one ColumnOrSuperColumn or one Deletion

2011-03-17 Thread Anurag Gujral
Thanks for the reply. I added mutation.__isset.column_or_supercolumn=true;

Now I am getting TApplicationException: Internal error processing
batch_mutate

Any suggestions?
Thanks
Anurag

On Thu, Mar 17, 2011 at 8:13 AM, Anurag Gujral wrote:

> [original message quoted in full; snipped]


Re: Pauses of GC

2011-03-17 Thread Narendra Sharma
What heap size are you running with, and which version of Cassandra?

Thanks,
Naren

On Thu, Mar 17, 2011 at 3:45 AM, ruslan usifov wrote:

> Hello
>
> Sometimes I get very long GC pauses:
>
> [GC log and nodetool cfstats output quoted in full; snipped]

Re: Pauses of GC

2011-03-17 Thread ruslan usifov
2011/3/17 Narendra Sharma 

> What heap size are you running with, and which version of Cassandra?

4 GB, with Cassandra 0.7.4.


Re: [RELEASE] 0.7.4

2011-03-17 Thread A J
I don't see binary_memtable_throughput_in_mb parameter in
cassandra.yaml anymore.
What is it replaced by?

thanks.

On Tue, Mar 15, 2011 at 11:32 PM, Eric Evans  wrote:
> On Tue, 2011-03-15 at 22:19 -0500, Eric Evans wrote:
>> On Tue, 2011-03-15 at 14:26 -0700, Mark wrote:
>> > Still not seeing 0.7.4 as a download option on the main site?
>>
>> Something about the site's pubsub isn't working; I'll contact INFRA.
>
> https://issues.apache.org/jira/browse/INFRA-3520
>
> --
> Eric Evans
> eev...@rackspace.com
>
>


Re: InvalidRequestException: Mutation must have one ColumnOrSuperColumn or one Deletion

2011-03-17 Thread Jonathan Ellis
Internal error means "there is a stacktrace in the server system.log"
and in this case probably also means "you sent some kind of invalid
request that our validation didn't catch."

On Thu, Mar 17, 2011 at 11:29 AM, Anurag Gujral  wrote:
> [earlier messages quoted in full; snipped]



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: InvalidRequestException: Mutation must have one ColumnOrSuperColumn or one Deletion

2011-03-17 Thread Anurag Gujral
Yes, thanks, I was able to see that.
Now I am getting the following error:

OutboundTcpConnection.java (line 159) attempting to connect to astrix.com

where astrix.com is the machine on which I have installed Cassandra.

Any suggestions?

Thanks
Anurag

On Thu, Mar 17, 2011 at 9:49 AM, Jonathan Ellis  wrote:

> [earlier messages quoted in full; snipped]

Re: Upgrade to a different version?

2011-03-17 Thread Paul Pak
I'm at a crossroads right now.  We built an application around .7 and
the features in .7, so going back to .6 wasn't an option for us.  Now,
we are in the middle of setting up dual MySQL and Cassandra support so
that we can "fall back" to MySQL if Cassandra can't handle the workload
properly.  It's a stupid amount of extra work, but I think it's
unavoidable for us given the state of things with .7.  It also gives us
the benefit of seeing the true benefit of Cassandra over MySQL in our
particular application and making a decision from there.

Paul

On 3/16/2011 9:03 PM, Joshua Partogi wrote:
> So did you downgrade it back to the 0.6.x series?
>



Re: [RELEASE] 0.7.4

2011-03-17 Thread Jonathan Ellis
It is still there, but we took it out of the sample config because
people think it affects normal writes, which it does not.

On Thu, Mar 17, 2011 at 11:48 AM, A J  wrote:
> [earlier messages quoted in full; snipped]



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Upgrade to a different version?

2011-03-17 Thread Thibaut Britz
If it helps you to sleep better,

we use cassandra  (0.7.2 with the flush fix) in production on > 100 servers.

Thibaut

On Thu, Mar 17, 2011 at 5:58 PM, Paul Pak  wrote:

> [original message quoted in full; snipped]


Re: nodetool repair on cluster

2011-03-17 Thread Huy Le
Thanks Jonathan, Aaron, Daniel!  I have a related question.

I would like to take a copy of the data from this 12-server cluster (with
manually assigned, balanced server tokens) and set it up on a new cluster.  I
would like to minimize the number of servers on the new cluster, without
having to build 12 servers on the new cluster, copy snapshots from the old
cluster, and then decommission servers on the new cluster to get it down to
the desired number.  I am OK with missing any data written since the last
repair ran on the old cluster.  Would the following data copy strategy work?

Create a new cluster with 4 servers and manually assign balanced server
tokens to these four servers; copy the most recent snapshot from every 3rd
server of the old cluster (of 12 servers) onto the new cluster, in the same
order as on the old cluster.  Then run repair on every other node, or on all
nodes, of the new cluster.

Thanks!

Huy


On Tue, Mar 15, 2011 at 5:16 PM, aaron morton wrote:

> AFAIK you should run it on every node.
>
> http://wiki.apache.org/cassandra/Operations#Repairing_missing_or_inconsistent_data
>
>
> 
> Aaron
>
> On 16 Mar 2011, at 06:58, Daniel Doubleday wrote:
>
> At least if you are using RackUnawareStrategy
>
> Cheers,
> Daniel
>
> On Mar 15, 2011, at 6:44 PM, Huy Le wrote:
>
> Hi,
>
> We have a cluster with 12 servers and use RF=3.  When running nodetool
> repair, do we have to run it on all nodes on the cluster or can we run on
> every 3rd node?  Thanks!
>
> Huy
>
> --
> Huy Le
> Spring Partners, Inc.
> http://springpadit.com
>
>
>
>


-- 
Huy Le
Spring Partners, Inc.
http://springpadit.com


Re: Upgrade to a different version?

2011-03-17 Thread Paul Pak
On 3/17/2011 1:06 PM, Thibaut Britz wrote:
> If it helps you to sleep better,
>
> we use cassandra  (0.7.2 with the flush fix) in production on > 100
> servers.
>
> Thibaut
>

Thanks Thibaut, believe it or not, it does. :)

Is your use case a typical web app or something like a scientific/data
mining app?  I ask because I'm wondering how you have managed to deal
with the stop-the-world garbage collection issues that seem to hit most
clusters under significant load and cause application timeouts.
Have you found that Cassandra scales in read/write capacity reasonably
well as you add nodes?

Also, you may also want to backport these fixes at a minimum?

 * reduce memory use during streaming of multiple sstables (CASSANDRA-2301)
 * update memtable_throughput to be a long (CASSANDRA-2158)





hadoop streaming input

2011-03-17 Thread Ethan Rowe
Hello.

What's the current thinking on input support for Hadoop streaming?  It seems
like the relevant Jira issue has been quiet for some time:
https://issues.apache.org/jira/browse/CASSANDRA-1497

Thanks.
- Ethan


Cassandra 0.7.* replication question

2011-03-17 Thread Oleg Tsvinev
I wonder what is the right way to configure replication in a Cassandra cluster.

I need to have 3 copies of my data in a cluster consisting of 6 nodes.
3 of these nodes are in one datacenter - let's call it DC1 - and 3 in
another, DC2. There is significant latency between these datacenters,
and initially my application is going to be configured to talk only
to the 3 local Cassandra nodes, without auto-discovery.

It looks like I can use
org.apache.cassandra.locator.NetworkTopologyStrategy for the
placement_strategy in the keyspace definition and provide values for
strategy_options from
$CASSANDRA_HOME/conf/cassandra-topology.properties. This does not seem
to be well documented, and I would like additional examples of how to
configure this option.

However, org.apache.cassandra.locator.NetworkTopologyStrategy is not
flexible enough, because it requires changes to
$CASSANDRA_HOME/conf/cassandra-topology.properties every time the
cluster configuration changes, and that requires changes in the keyspace
definition as well. Thus it does not seem to scale well.

Are there elaborated examples for
org.apache.cassandra.locator.NetworkTopologyStrategy and keyspace
configuration?
Do I have any other options for my configuration?

Thank you,
  Oleg
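
For reference, a sketch of the two pieces, with hypothetical names and
addresses. cassandra-topology.properties (read by PropertyFileSnitch) maps
node IPs to data center and rack:

    192.168.1.101=DC1:RAC1
    192.168.1.102=DC1:RAC1
    192.168.1.103=DC1:RAC1
    10.20.1.101=DC2:RAC1
    10.20.1.102=DC2:RAC1
    10.20.1.103=DC2:RAC1
    default=DC1:RAC1

and the keyspace definition then names per-datacenter replica counts through
strategy_options, e.g. in cassandra-cli (the exact syntax varies across
0.7.x releases):

    create keyspace MyKeyspace
        with placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy'
        and strategy_options = [{DC1:2, DC2:1}];

Note that adding or moving nodes only touches the properties file; the
keyspace definition needs to change only if the per-datacenter replica
counts change.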


Re: Cassandra 0.6.3 ring not balanced in terms of data size

2011-03-17 Thread Ali Ahsan

Can anyone please comment on this?

On 03/17/2011 07:02 PM, Ali Ahsan wrote:

Dear Aaron,

We are a little confused about the OPP token. How do we calculate an OPP
token? A few of our column families have a UUID as the key and others
have an integer as the key.






Re: hadoop streaming input

2011-03-17 Thread Jeremy Hanna
I started it and added the tentative patch at the end of October.  It needs to 
be rebased with the current 0.7-branch and completed - it's mostly there.  I 
just tried to abstract some things in the process.

I have changed jobs since then and I just haven't had time with the things I've 
been doing here.  If you'd like to take a stab at it, you're welcome to rebase 
and get it finished.

On Mar 17, 2011, at 12:57 PM, Ethan Rowe wrote:

> Hello.
> 
> What's the current thinking on input support for Hadoop streaming?  It seems 
> like the relevant Jira issue has been quiet for some time:
> https://issues.apache.org/jira/browse/CASSANDRA-1497
> 
> Thanks.
> - Ethan



Re: hadoop streaming input

2011-03-17 Thread Ethan Rowe
Thanks, Jeremy.  I looked over the work that was done and it seemed like it
was mostly there, though some comments in the ticket indicated possible
problems.

I may well need to take a crack at this sometime in the next few weeks, but
if somebody beats me to it, I certainly won't complain.

On Thu, Mar 17, 2011 at 2:06 PM, Jeremy Hanna wrote:

> [original message quoted in full; snipped]


Re: hadoop streaming input

2011-03-17 Thread Jeremy Hanna
Cool - let me know if you have any questions if you do.  I'm @jeromatron in irc 
and on twitter.

On Mar 17, 2011, at 1:10 PM, Ethan Rowe wrote:

> [earlier messages quoted in full; snipped]
> 



Re: Cassandra 0.6.3 ring not balanced in terms of data size

2011-03-17 Thread Ching-Cheng Chen
From OrderPreservingPartitioner.java:

    public StringToken getToken(ByteBuffer key)
    {
        String skey;
        try
        {
            skey = ByteBufferUtil.string(key, Charsets.UTF_8);
        }
        catch (CharacterCodingException e)
        {
            throw new RuntimeException("The provided key was not UTF8 encoded.", e);
        }
        return new StringToken(skey);
    }

Regards,

Chen

Senior Developer, EvidentSoftware (Leaders in Monitoring of NoSQL & Java)

http://www.evidentsoftware.com



On Thu, Mar 17, 2011 at 2:06 PM, Ali Ahsan wrote:

>  Please, can anyone give their comments on this?
>
> On 03/17/2011 07:02 PM, Ali Ahsan wrote:
>
> Dear Aaron,
>
> We are a little confused about the OPP token. How do we calculate an OPP token? A few of
> our column families have a UUID as the key, and others have an integer as the key.
>
>
>
>


Does "concurrent_reads" relate to number of drives in RAID0?

2011-03-17 Thread buddhasystem
Hello, in the instructions, I need to link "concurrent_reads" to number of
drives. Is this related to number of physical drives that I have in my
RAID0, or something else?



Re: Pauses of GC

2011-03-17 Thread Narendra Sharma
Depending on your memtable thresholds, the heap may be too small for the
deployment. At the same time, I don't see any other log statements around
the long pause that you have shown in the log snippet. It looks a little odd
to me: all the ParNew collections freed almost the same amount of heap and did
not take a lot of time.

Check if it is due to some JVM bug.
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6477891
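
If it is unclear whether these pauses are GC at all, it may help to enable full
GC logging with the standard HotSpot flags (log path illustrative), e.g.
appended to JVM_OPTS in cassandra-env.sh:

    -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
    -XX:+PrintGCApplicationStoppedTime -Xloggc:/var/log/cassandra/gc.log

PrintGCApplicationStoppedTime in particular shows exactly how long the
application threads were stopped.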

-Naren

On Thu, Mar 17, 2011 at 9:47 AM, ruslan usifov wrote:

>
>
> 2011/3/17 Narendra Sharma 
>
>> What heap size are you running with? And which version of Cassandra?
>>
> 4G with cassandra 0.7.4
>


Re: Does "concurrent_reads" relate to number of drives in RAID0?

2011-03-17 Thread Stu Hood
The comment in the example config file next to that setting explains it more
fully, but something like 16 * number of drives is a reasonable setting for
readers. Writers should be a multiple of the number of cores.
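
As a concrete illustration, on a hypothetical box with a 4-spindle RAID0 for
data and 8 cores, that rule of thumb would give something like this in
cassandra.yaml (values illustrative, not a recommendation):

    concurrent_reads: 64      # 16 * 4 data spindles
    concurrent_writes: 64     # a multiple (here 8x) of the 8 cores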

On Thu, Mar 17, 2011 at 1:09 PM, buddhasystem  wrote:

> Hello, in the instructions, I need to link "concurrent_reads" to number of
> drives. Is this related to number of physical drives that I have in my
> RAID0, or something else?
>
>


Re: super_column.name?

2011-03-17 Thread Michael Fortin
Thanks for the response, sorry if my initial question wasn't clear.  

When using thrift, I call
client.get_slice(keyBytes, columnParent, range, level)

I get a list of ColumnOrSuperColumns back.  When I iterate over them and 
call:
byte[] nameBytes = columnOrSuperColumn.getSuper_column().getName()

I seem to be getting a byte array that contains not only the name of the super 
column but all of the child columns as well.  The output starting with '? 
get_slice' in my original message below is that byte array converted to a string.  
Shouldn't nameBytes contain only the name of the super column, in my case 
'super-col-0' in byte form?

This is using cassandra 0.7.0 & 0.7.4.

cheers,
M!ke

On Mar 17, 2011, at 9:43 AM, Sylvain Lebresne wrote:

> Are you sure you don't have a problem with handling ByteBuffers ?
> What do you mean by 'deserialized string' ?
> 
> --
> Sylvain
> 
> On Thu, Mar 17, 2011 at 4:20 AM, Michael Fortin  wrote:
>> Hi,
>> 
>> I've been working on a scala-based api for cassandra.  I've built it 
>> directly on top of thrift.  I'm having a problem getting a slice of a 
>> superColumn.  When I get a columnOrSuperColumn back, call 
>> 'cos.super_column.name', and deserialize the bytes, I'm not getting the 
>> expected output.
>> 
>> Here's what's in cassandra
>> ---
>> RowKey: key
>> => (super_column=super-col-0,
>> (column=column, value=76616c756530, timestamp=1300330948240)
>> (column=column1, value=76616c756530, timestamp=1300330948244))
>> ….
>> 
>> and this is the deserialized string
>> 
>> ?   get_slicesuper-col-0  column   
>> value0
>> .?æ?column1   value0
>> .?æ?super-col-1  column   value1
>> .?æ?column1   value1
>> .?æ?super-col-2  column   value2
>> .?æ?column1   value2
>> .?æ?super-col-3  column   value3
>> .?æ?column1   value3
>> .?æ?
>> 
>> I would expect
>> super-col-0
>> 
>> Any ideas on what I'm doing wrong?
>> 
>> Thanks,
>> Mike
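
If the byte array coming back is actually the whole response buffer, the usual
culprit is reading the Thrift ByteBuffer field through its backing array instead
of honouring its position and limit. A defensive extraction that copies only the
bytes the field covers (a sketch; columnOrSuperColumn is one element of the
get_slice result, and .name is the raw generated ByteBuffer field):

    ByteBuffer nameBuf = columnOrSuperColumn.getSuper_column().name;
    byte[] nameBytes = new byte[nameBuf.remaining()];  // just this field's span
    nameBuf.duplicate().get(nameBytes);                // copy without moving position
    String name = new String(nameBytes, "UTF-8");      // expect "super-col-0"

Cassandra's own ByteBufferUtil.string() helper, quoted in the ring-balance
thread above, does the equivalent.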



Re: AW: problems while TimeUUIDType-index-querying with two expressions

2011-03-17 Thread Aaron Morton
Good work.

Aaron

On 17/03/2011, at 4:37 PM, Jonathan Ellis  wrote:

> Thanks for tracking that down, Roland.  I've created
> https://issues.apache.org/jira/browse/CASSANDRA-2347 to fix this.
> 
> On Wed, Mar 16, 2011 at 10:37 AM, Roland Gude  
> wrote:
>> I have applied the suggested changes in my local source tree and did run all
>> my testcases (the supplied ones as well as those with real data).
>> 
>> They do work now.
>> 
>> 
>> 
>> From: Roland Gude [mailto:roland.g...@yoochoose.com]
>> Sent: Wednesday, March 16, 2011 16:29
>> 
>> To: user@cassandra.apache.org
>> Subject: AW: AW: problems while TimeUUIDType-index-querying with two
>> expressions
>> 
>> 
>> 
>> While debugging into it, I found something that might be the issue (please
>> correct me if I am wrong):
>> 
>> In ColumnFamilyStore.java, lines 1597 to 1613 hold the code that checks whether
>> a column satisfies an index expression.
>> 
>> In line 1608 it compares the column's value with the value
>> given in the expression.
>> 
>> For this comparison it uses the comparator of the column family, while it
>> should use the comparator of the column's validation class.
>> 
>> 
>> 
>> private static boolean satisfies(ColumnFamily data, IndexClause clause, IndexExpression first)
>> {
>>     for (IndexExpression expression : clause.expressions)
>>     {
>>         // (we can skip "first" since we already know it's satisfied)
>>         if (expression == first)
>>             continue;
>> 
>>         // check column data vs expression
>>         IColumn column = data.getColumn(expression.column_name);
>>         if (column == null)
>>             return false;
>> 
>>         int v = data.getComparator().compare(column.value(), expression.value);
>>         if (!satisfies(v, expression.op))
>>             return false;
>>     }
>>     return true;
>> }
>> 
>> 
>> 
>> 
>> 
>> Line 1608 should be changed from:
>> 
>>     int v = data.getComparator().compare(column.value(), expression.value);
>> 
>> to:
>> 
>>     int v = data.metadata().getValueValidator(expression.column_name).compare(column.value(), expression.value);
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> greetings roland
>> 
>> 
>> 
>> 
>> 
>> From: Roland Gude [mailto:roland.g...@yoochoose.com]
>> Sent: Wednesday, March 16, 2011 14:50
>> To: user@cassandra.apache.org
>> Subject: AW: AW: problems while TimeUUIDType-index-querying with two
>> expressions
>> 
>> 
>> 
>> Hi Aaron,
>> 
>> 
>> 
>> now I am completely confused.
>> 
>> The code that did not work for days now – like a miracle – works even
>> against the unpatched Cassandra 0.7.3 but the testcase still does not…
>> 
>> There seems to be some randomness in whether it works or not (which is a bad
>> sign I think)… I will debug a little deeper into this and report anything I
>> find.
>> 
>> 
>> 
>> Greetings,
>> 
>> roland
>> 
>> 
>> 
>> From: aaron morton [mailto:aa...@thelastpickle.com]
>> Sent: Wednesday, March 16, 2011 01:15
>> To: user@cassandra.apache.org
>> Subject: Re: AW: problems while TimeUUIDType-index-querying with two
>> expressions
>> 
>> 
>> 
>> I have attached a patch
>> to https://issues.apache.org/jira/browse/CASSANDRA-2328
>> 
>> Can you give it a try? You should now get an InvalidRequestException when
>> you send an invalid name or value in the query expression.
>> 
>> 
>> 
>> Aaron
>> 
>> 
>> 
>> On 16 Mar 2011, at 10:30, aaron morton wrote:
>> 
>> 
>> 
>> Will have the Jira I created finished soon; it's a legitimate issue: we
>> should be validating the column names and values when a get_indexed_slice()
>> request is sent. The error in your original email shows that.
>> 
>> 
>> 
>> WRT your code example. You are using the TimeUUID Validator for the column
>> name when creating the index expression, but are using a string serialiser
>> for the value...
>> 
>> IndexedSlicesQuery indexQuery = HFactory.createIndexedSlicesQuery(keyspace,
>>         stringSerializer, UUID_SERIALIZER, stringSerializer);
>> indexQuery.addEqualsExpression(MANDATOR_UUID, mandator);
>> 
>> But your schema is saying it is a bytes type...
>> 
>> 
>> 
>> column_metadata=[{column_name: --1000--, validation_class: BytesType, index_name: mandatorIndex, index_type: KEYS},
>>                  {column_name: 0001--1000--, validation_class: BytesType, index_name: useridIndex, index_type: KEYS}];
>> 
>> On 15 Mar 2011, at 22:41,
>> 
>> 
>> 
>> Once I have the patch can you apply it and run your test again ?
>> 
>> 
>> 
>> You may also want to ask on the Hector list if it automagically check you
>> are using the correct types when creating an IndexedSlicesQuery.
>> 
>> 
>> 
>> Aaron
>> 
>> 
>> 
>> Roland Gude wrote:
>> 
>> 
>> 
>> Forgot to attach the source code… 

Re: Upgrade to a different version?

2011-03-17 Thread Thibaut Britz
As for the version,

we will wait a few more days, and if nothing really bad shows up, move to
0.7.4.


On Thu, Mar 17, 2011 at 10:40 PM, Thibaut Britz <
thibaut.br...@trendiction.com> wrote:

> Hi Paul,
>
> It's more of a scientific mining app. We crawl websites and extract
> information from these websites for our clients. For us, it doesn't really
> matter if one cassandra node replies after 1 second or a few ms, as long as
> the throughput over time stays high. And so far, this seems to be the case.
>
> If you are using hector, be sure to use the latest hector version. There
> were a few bugs related to error handling in earlier versions (e.g. also
> threads hanging forever waiting for an answer). I occasionally see timeouts,
> but we then just move to another node and retry.
>
> Thibaut
>
>
>
> On Thu, Mar 17, 2011 at 6:53 PM, Paul Pak  wrote:
>
>> On 3/17/2011 1:06 PM, Thibaut Britz wrote:
>> > If it helps you to sleep better,
>> >
>> > we use cassandra  (0.7.2 with the flush fix) in production on > 100
>> > servers.
>> >
>> > Thibaut
>> >
>>
>> Thanks Thibaut, believe it or not, it does. :)
>>
>> Is your use case a typical web app or something like a scientific/data
>> mining app?  I ask because I'm wondering how you have managed to deal
>> with the stop-the-world garbage collection issues that seem to hit most
>> clusters that have significant load and cause application timeouts.
>> Have you found that cassandra scales in read/write capacity reasonably
>> well as you add nodes?
>>
>> Also, you may want to backport these fixes at a minimum:
>>
>>  * reduce memory use during streaming of multiple sstables
>> (CASSANDRA-2301)
>>  * update memtable_throughput to be a long (CASSANDRA-2158)
>>
>>
>>
>>
>


Re: Upgrade to a different version?

2011-03-17 Thread Thibaut Britz
Hi Paul,

It's more of a scientific mining app. We crawl websites and extract
information from these websites for our clients. For us, it doesn't really
matter if one cassandra node replies after 1 second or a few ms, as long as
the throughput over time stays high. And so far, this seems to be the case.

If you are using hector, be sure to use the latest hector version. There
were a few bugs related to error handling in earlier versions (e.g. also
threads hanging forever waiting for an answer). I occasionally see timeouts,
but we then just move to another node and retry.
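
For what it's worth, the "move to another node and retry" behaviour can be as
simple as a loop over the known hosts. A minimal sketch (Operation is a
placeholder for whatever client call is being made; Hector has its own failover
support built in):

    import java.util.List;

    public class Failover
    {
        /** Placeholder for a single client call against one host. */
        public interface Operation<T>
        {
            T execute(String host) throws Exception;
        }

        /** Try each host in turn until one answers; rethrow the last failure. */
        public static <T> T withFailover(List<String> hosts, Operation<T> op) throws Exception
        {
            Exception last = null;
            for (String host : hosts)
            {
                try
                {
                    return op.execute(host);
                }
                catch (Exception e)  // e.g. a thrift TimedOutException
                {
                    last = e;        // remember it and try the next node
                }
            }
            if (last != null)
                throw last;          // every node failed
            throw new IllegalArgumentException("no hosts given");
        }
    }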

Thibaut


On Thu, Mar 17, 2011 at 6:53 PM, Paul Pak  wrote:

> On 3/17/2011 1:06 PM, Thibaut Britz wrote:
> > If it helps you to sleep better,
> >
> > we use cassandra  (0.7.2 with the flush fix) in production on > 100
> > servers.
> >
> > Thibaut
> >
>
> Thanks Thibaut, believe it or not, it does. :)
>
> Is your use case a typical web app or something like a scientific/data
> mining app?  I ask because I'm wondering how you have managed to deal
> with the stop-the-world garbage collection issues that seem to hit most
> clusters that have significant load and cause application timeouts.
> Have you found that cassandra scales in read/write capacity reasonably
> well as you add nodes?
>
> Also, you may want to backport these fixes at a minimum:
>
>  * reduce memory use during streaming of multiple sstables (CASSANDRA-2301)
>  * update memtable_throughput to be a long (CASSANDRA-2158)
>
>
>
>


Re: Pauses of GC

2011-03-17 Thread ruslan usifov
At these moments java hangs. Only one thread is working, and it runs mostly in the
OS kernel, with the following trace:


  [pid  1953]  0.050157 futex(0x7fbe141ea428, FUTEX_WAKE_PRIVATE,
1) = 0 <0.22>
[pid  1953]  0.59 futex(0x7fbc24023794,
FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1300202329,
797618000}, ) = -1 ETIMEDOUT (Connection timed out) <0.050093>
[pid  1953]  0.050152 futex(0x7fbe141ea428, FUTEX_WAKE_PRIVATE, 1)
= 0 <0.21>
[pid  1953]  0.67 futex(0x7fbc24023794,
FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1300202329,
847838000}, ) = -1 ETIMEDOUT (Connection timed out) <0.050090>
[pid  1953]  0.050150 futex(0x7fbe141ea428, FUTEX_WAKE_PRIVATE, 1)
= 0 <0.22>
[pid  1953]  0.67 futex(0x7fbc24023794,
FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1300202329,
898054000}, ) = -1 ETIMEDOUT (Connection timed out) <0.050086>
[pid  1953]  0.050144 futex(0x7fbe141ea428, FUTEX_WAKE_PRIVATE, 1)
= 0 <0.22>
[pid  1953]  0.60 futex(0x7fbc24023794,
FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1300202329,
948258000}, ) = -1 ETIMEDOUT (Connection timed out) <0.050085>
[pid  1953]  0.050144 futex(0x7fbe141ea428, FUTEX_WAKE_PRIVATE, 1)
= 0 <0.21>
[pid  1953]  0.67 futex(0x7fbc24023794,
FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1300202329,
998469000}, ) = -1 ETIMEDOUT (Connection timed out) <0.050067>
[pid  1953]  0.050127 futex(0x7fbe141ea428, FUTEX_WAKE_PRIVATE, 1)
= 0 <0.21>
[pid  1953]  0.67 futex(0x7fbc24023794,
FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1300202330,
48664000}, ) = -1 ETIMEDOUT (Connection timed out) <0.050102>
[pid  1953]  0.050161 futex(0x7fbe141ea428, FUTEX_WAKE_PRIVATE, 1)
= 0 <0.21>
[pid  1953]  0.59 futex(0x7fbc24023794,
FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1300202330,
98884000}, ) = -1 ETIMEDOUT (Connection timed out) <0.050102>
[pid  1953]  0.050160 futex(0x7fbe141ea428, FUTEX_WAKE_PRIVATE, 1)
= 0 <0.22>
[pid  1953]  0.67 futex(0x7fbc24023794,
FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1300202330,
149111000}, ) = -1 ETIMEDOUT (Connection timed out) <0.050097>
[pid  1953]  0.050157 futex(0x7fbe141ea428, FUTEX_WAKE_PRIVATE, 1)
= 0 <0.22>
[pid  1953]  0.59 futex(0x7fbc24023794,
FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1300202330,
199327000}, ) = -1 ETIMEDOUT (Connection timed out) <0.050093>
[pid  1953]  0.050153 futex(0x7fbe141ea428, FUTEX_WAKE_PRIVATE, 1)
= 0 <0.22>
[pid  1953]  0.67 futex(0x7fbc24023794,
FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1300202330,
249547000}, ) = -1 ETIMEDOUT (Connection timed out) <0.050095>
[pid  1953]  0.050155 futex(0x7fbe141ea428, FUTEX_WAKE_PRIVATE, 1)
= 0 <0.22>
[pid  1953]  0.59 futex(0x7fbc24023794,
FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1300202330,
299761000}, ) = -1 ETIMEDOUT (Connection timed out) <0.050094>
[pid  1953]  0.050154 futex(0x7fbe141ea428, FUTEX_WAKE_PRIVATE, 1)
= 0 <0.21>
[pid  1953]  0.67 futex(0x7fbc24023794,
FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1300202330,
349981000}, ) = -1 ETIMEDOUT (Connection timed out) <0.050092>
[pid  1953]  0.050168 futex(0x7fbe141ea428, FUTEX_WAKE_PRIVATE, 1)
= 0 <0.23>
[pid  1953]  0.66 futex(0x7fbc24023794,
FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1300202330,
400216000}, ) = -1 ETIMEDOUT (Connection timed out) <0.050090>



And this happens when mmap disk access is on, and in my case when the VIRTUAL
space of the java process is greater than 16G. In that case the whole system works
badly and utilities launch very slowly (but without any swap activity); when the
java process is killed, system functionality comes back.  I don't know what this
is; perhaps it is OS-dependent. I use Ubuntu 10.04 (LTS):

Linux slv007 2.6.32-24-generic #43-Ubuntu SMP Thu Sep 16 14:58:24 UTC 2010
x86_64 GNU/Linux
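
If mmap'd I/O is the suspect here, 0.7's cassandra.yaml allows falling back from
the default (auto, which mmaps data and index files on 64-bit JVMs) to plain
buffered I/O:

    disk_access_mode: standard            # buffered I/O only
    # disk_access_mode: mmap_index_only   # middle ground: mmap only the index files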


2011/3/17 Narendra Sharma 

> Depending on your memtable thresholds, the heap may be too small for the
> deployment. At the same time, I don't see any other log statements around
> the long pause that you have shown in the log snippet. It looks a little odd
> to me: all the ParNew collections freed almost the same amount of heap and did
> not take a lot of time.
>
> Check if it is due to some JVM bug.
> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6477891
>
> -Naren
>
>
> On Thu, Mar 17, 2011 at 9:47 AM, ruslan usifov wrote:
>
>>
>>
>> 2011/3/17 Narendra Sharma 
>>
>>> What heap size are you running with? And which version of Cassandra?
>>>
>> 4G with cassandra 0.7.4
>>
>
>


Re: Upgrade to a different version?

2011-03-17 Thread Dan Kuebrich
Do people have success stories with 0.7.4?  It seems like the list only
hears if there's a major problem with a release, which means that if you're
trying to judge the stability of a release you're looking for silence.  But
maybe that means not many people have tried it yet.  Is there a record of
this anywhere?

On Thu, Mar 17, 2011 at 5:41 PM, Thibaut Britz <
thibaut.br...@trendiction.com> wrote:

> As for the version,
>
> we will wait a few more days, and if nothing really bad shows up, move to
> 0.7.4.
>
>
>
> On Thu, Mar 17, 2011 at 10:40 PM, Thibaut Britz <
> thibaut.br...@trendiction.com> wrote:
>
>> Hi Paul,
>>
>> It's more of a scientific mining app. We crawl websites and extract
>> information from these websites for our clients. For us, it doesn't really
>> matter if one cassandra node replies after 1 second or a few ms, as long as
>> the throughput over time stays high. And so far, this seems to be the case.
>>
>> If you are using hector, be sure to use the latest hector version. There
>> were a few bugs related to error handling in earlier versions (e.g. also
>> threads hanging forever waiting for an answer). I occasionally see timeouts,
>> but we then just move to another node and retry.
>>
>> Thibaut
>>
>>
>>
>> On Thu, Mar 17, 2011 at 6:53 PM, Paul Pak  wrote:
>>
>>> On 3/17/2011 1:06 PM, Thibaut Britz wrote:
>>> > If it helps you to sleep better,
>>> >
>>> > we use cassandra  (0.7.2 with the flush fix) in production on > 100
>>> > servers.
>>> >
>>> > Thibaut
>>> >
>>>
>>> Thanks Thibaut, believe it or not, it does. :)
>>>
>>> Is your use case a typical web app or something like a scientific/data
>>> mining app?  I ask because I'm wondering how you have managed to deal
>>> with the stop-the-world garbage collection issues that seem to hit most
>>> clusters that have significant load and cause application timeouts.
>>> Have you found that cassandra scales in read/write capacity reasonably
>>> well as you add nodes?
>>>
>>> Also, you may want to backport these fixes at a minimum:
>>>
>>>  * reduce memory use during streaming of multiple sstables
>>> (CASSANDRA-2301)
>>>  * update memtable_throughput to be a long (CASSANDRA-2158)
>>>
>>>
>>>
>>>
>>
>


Re: Replacing a dead seed

2011-03-17 Thread Jonathan Colby
Of course!  Why didn't I think of that?  Thanks!!
On Mar 17, 2011, at 3:11 PM, Edward Capriolo wrote:

> On Thu, Mar 17, 2011 at 9:09 AM, Jonathan Colby
>  wrote:
>> Hi -
>> 
>> If a seed crashes (i.e., is suddenly unavailable due to a HW problem), what is 
>> the best way to replace the seed in the cluster?
>> 
>> I've read that you should not bootstrap a seed.  Therefore I came up with 
>> this procedure, but it seems pretty complicated.  Any better ideas?
>> 
>> 1. update the seed list on all nodes, taking out the dead node, and restart 
>> the nodes in the cluster so the new seed list is updated
>> 2. then bootstrap the new (replacement) node as a normal node (not yet as 
>> a seed)
>> 3. when bootstrapping is done, make the new node a seed
>> 4. update the seed list again, adding back the replacement seed (and rolling 
>> restart the cluster as in step 1)
>> 
>> That seems to me like a whole lot of work.  Surely there is a better way?
>> 
>> Jon
> 
> It is true that seeds do not auto-bootstrap. But in this case it does
> not matter whether the other nodes believe this node is a seed. It only
> matters what the joining node is configured to believe.
> 
> On the joining node, do not include its own hostname/IP in the seed list,
> and it should auto-bootstrap normally.
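
Concretely, the relevant bits of cassandra.yaml (0.7) on the joining replacement
node would look something like this (addresses illustrative):

    auto_bootstrap: true
    seeds:
        - 10.0.0.1    # existing nodes only;
        - 10.0.0.2    # do not list the replacement node itself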



Re: Does "concurrent_reads" relate to number of drives in RAID0?

2011-03-17 Thread Peter Schuller
> The comment in the example config file next to that setting explains it more
> fully, but something like 16 * number of drives is a reasonable setting for
> readers. Writers should be a multiple of the number of cores.

In addition, if you're running on Linux in a situation where you're
trying to saturate the I/O capacity of an underlying device that is an SSD
or a multi-device RAID, I *strongly* suggest switching Linux to the
deadline or noop scheduler. The CFQ scheduler is very poor
out-of-the-box at saturating your I/O subsystem with random reads
when using SSDs or RAID controllers with multiple disks.
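
On Linux kernels of this vintage the scheduler can be checked and switched per
block device at runtime, no reboot needed (device name illustrative):

    cat /sys/block/sda/queue/scheduler          # prints e.g. "noop deadline [cfq]"
    echo deadline > /sys/block/sda/queue/scheduler

To make it stick across reboots, add elevator=deadline to the kernel boot
parameters.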

-- 
/ Peter Schuller


Re: Does "concurrent_reads" relate to number of drives in RAID0?

2011-03-17 Thread buddhasystem
Thanks to all for replying, but frankly I didn't get the answer I wanted.
Does the "number of disks" apply to number of spindles in RAID0? Or
something else like a separate disk for commitlog and for data?




Re: Does "concurrent_reads" relate to number of drives in RAID0?

2011-03-17 Thread Peter Schuller
> Thanks to all for replying, but frankly I didn't get the answer I wanted.
> Does the "number of disks" apply to number of spindles in RAID0? Or
> something else like a separate disk for commitlog and for data?

The number of actual disks (spindles) in the device your
sstables are on (not the device holding the commit log).

The reason for this is that you want to be able to saturate your
storage subsystem, and that means keeping all spindles working at all
times and efficiently. This is accomplished by ensuring you are able
to sustain a sufficient queue depth (number of outstanding commands)
on each device. This in turn, in the case of a RAID0, means
multiplying the target maximum queue depth by the number of drives.
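
As a worked example: with the commonly suggested 16 outstanding reads per
spindle and a four-drive RAID0, that works out to concurrent_reads = 16 * 4 = 64.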

-- 
/ Peter Schuller


Re: Does "concurrent_reads" relate to number of drives in RAID0?

2011-03-17 Thread Peter Schuller
> The reason for this is that you want to be able to saturate your
> storage subsystem, and that means keeping all spindles working at all
> times and efficiently. This is accomplished by ensuring you are able
> to sustain a sufficient queue depth (number of outstanding commands)
> on each device. This in turn, in the case of a RAID0, means
> multiplying the target maximum queue depth by the number of drives.

(But this is all predicated on the operating system actually letting
the I/O requests pass through to the device, which is why I replied
about choosing the deadline or noop scheduler instead of cfq.)

-- 
/ Peter Schuller


Re: Does "concurrent_reads" relate to number of drives in RAID0?

2011-03-17 Thread buddhasystem
Thanks Peter, I can see it better now.




Re: Does "concurrent_reads" relate to number of drives in RAID0?

2011-03-17 Thread buddhasystem
Where and how do I choose it?




Re: Does "concurrent_reads" relate to number of drives in RAID0?

2011-03-17 Thread mcasandra
Also, when it comes to the RAID controller there are other options, like write
policy, read policy, and cached vs. direct I/O. Is there any preference for which
policies should be chosen?

In our case:

http://support.dell.com/support/edocs/software/svradmin/1.9/en/stormgmt/cntrls.html



Re: moving data from single node cassandra

2011-03-17 Thread Maki Watanabe
Refer to:
http://wiki.apache.org/cassandra/StorageConfiguration

You can specify the data directories with the following parameters in
storage-config.xml (or, under the names below, in cassandra.yaml in 0.7+):

commitlog_directory : where the commitlog will be written
data_file_directories : data file (sstable) locations
saved_caches_directory : saved caches
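
For example, pointing everything at a larger partition in 0.7's cassandra.yaml
(paths illustrative; 0.6's storage-config.xml has equivalent XML elements):

    data_file_directories:
        - /mnt/bigdisk/cassandra/data
    commitlog_directory: /mnt/bigdisk/cassandra/commitlog
    saved_caches_directory: /mnt/bigdisk/cassandra/saved_caches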

maki


2011/3/17 Komal Goyal :
> Hi,
> I am having single node cassandra setup on a windows machine.
> Very soon I have ran out of space on this machine so have increased the
> hardisk capacity of the machine.
> Now I want to know how I configure cassandra to start storing data in these
> high space partitions?
> Also how the existing data store in this single node cassandra can be moved
> from C drive to the other drives?
> Is there any documentation as to how these configurations can be done?
> some supporting links will be very helpful..
>
>
> Thanks,
>
> Komal Goyal
>


Re: moving data from single node cassandra

2011-03-17 Thread John Lewis
data_file_directories makes it seem as though Cassandra can use more than one 
location for sstable storage. Does anyone know how it splits up the data 
between partitions? I am trying to plan for just about every worst-case 
scenario I can right now, and I want to know if I can change the config to open 
up some secondary storage for a compaction if needed.

Lewis

On Mar 17, 2011, at 6:03 PM, Maki Watanabe wrote:

> Refer to:
> http://wiki.apache.org/cassandra/StorageConfiguration
> 
> You can specify the data directories with the following parameters in
> storage-config.xml (or, under the names below, in cassandra.yaml in 0.7+):
> 
> commitlog_directory : where the commitlog will be written
> data_file_directories : data file (sstable) locations
> saved_caches_directory : saved caches
> 
> maki
> 
> 
> 2011/3/17 Komal Goyal :
>> Hi,
>> I have a single-node Cassandra setup on a Windows machine.
>> I recently ran out of space on this machine, so I have increased the
>> hard disk capacity of the machine.
>> Now I want to know how to configure Cassandra to start storing data in these
>> high-space partitions.
>> Also, how can the existing data stored in this single-node Cassandra be moved
>> from the C drive to the other drives?
>> Is there any documentation on how these configurations can be done?
>> Some supporting links would be very helpful.
>> 
>> 
>> Thanks,
>> 
>> Komal Goyal
>> 



Re: moving data from single node cassandra

2011-03-17 Thread Komal Goyal
Thanks Maki :)

I copied the existing var folder to the new hard disk, changed the paths to
the data directories in storage-config.xml, and was then able to connect to
Cassandra and read the data that had been moved to the new location.




On Fri, Mar 18, 2011 at 6:33 AM, Maki Watanabe wrote:

> Refer to:
> http://wiki.apache.org/cassandra/StorageConfiguration
>
> You can specify the data directories with the following parameters in
> storage-config.xml (or, under the names below, in cassandra.yaml in 0.7+):
>
> commitlog_directory : where the commitlog will be written
> data_file_directories : data file (sstable) locations
> saved_caches_directory : saved caches
>
> maki
>
>
> 2011/3/17 Komal Goyal :
> > Hi,
> > I have a single-node Cassandra setup on a Windows machine.
> > I recently ran out of space on this machine, so I have increased the
> > hard disk capacity of the machine.
> > Now I want to know how to configure Cassandra to start storing data in
> these
> > high-space partitions.
> > Also, how can the existing data stored in this single-node Cassandra be
> moved
> > from the C drive to the other drives?
> > Is there any documentation on how these configurations can be done?
> > Some supporting links would be very helpful.
> >
> >
> > Thanks,
> >
> > Komal Goyal
> >
>



-- 

*Komal Goyal*


ensarm Solutions | www.ensarm.com
2nd floor, Liberty 1, North main road, Koregaon park, Pune, India 1
(O) +91 20 40024476

*Your Enterprise, Our Passion*