Secondary indices: Why low cardinality?

2011-03-10 Thread Kevin
There's pretty limited information on Cassandra's built-in secondary index
facility as it is, but trying to find out why the secondary index has to have
low cardinality has been like finding a needle in a haystack... one that is
floating somewhere in the Atlantic.


Can someone explain why low cardinality is advised for the secondary index?
Has this been confirmed by anyone else besides DataStax?




Re: Pig output to Cassandra

2011-03-10 Thread Mark
Sweet! This is exactly what I was looking for and it looks like it was 
just resolved.


Are there any working examples or documentation on this feature?

Thanks

On 3/10/11 8:57 PM, Matt Kennedy wrote:

On its way... https://issues.apache.org/jira/browse/CASSANDRA-1828

On Mar 10, 2011, at 11:17 PM, Mark wrote:


I thought I read somewhere that Pig has an output format that can write to 
Cassandra, but I am unable to find any documentation on this. Is this possible, 
and if so, can someone please point me in the right direction? Thanks


Re: Fatal configuration error, so how to change listen_address:storage_port in cassandra.yaml ?

2011-03-10 Thread Aaron Morton
Something else is using the port, perhaps an existing Cassandra process?

Use "lsof -i | grep 7000" to see what it is.

If you need to change it, you are looking for storage_port in the config.
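A minimal sketch of both steps (assuming the 0.7 defaults; 7000 is the stock storage port):

    # find the process already holding the storage port
    lsof -i :7000

    # if the port really must change, edit conf/cassandra.yaml, which by
    # default contains:
    #   listen_address: localhost
    #   storage_port: 7000      # pick an unused port here instead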

Aaron

On 11/03/2011, at 3:43 PM, Bob Futrelle  wrote:

> Now that I've made the JMX_PORT change cassandra will attempt to run.
> (Dumb me, I didn't need to ask - the answer about changing JMX_PORT was 
> already in the archives.  I'm getting with it now, so I know to look there 
> first.  Just finding my way around cassandra)
> 
> Made the change:
> 
> JMX_PORT="8080"
> to
> JMX_PORT="9980"
> in cassandra-env.sh
> 
> Then ran
> sudo ./bin/cassandra -f -p pidfile
> threw a Fatal configuration error:
> 
> org.apache.cassandra.config.ConfigurationException: localhost/10.0.1.3:7000 
> is in use by another process.  Change listen_address:storage_port in 
> cassandra.yaml to values that do not conflict with other services
> 
> In cassandra.yaml the line causing the problem is
> 
> listen_address: localhost
> 
> Problem is, I have no idea what to change it to ???
> Maybe localhost:8000?
> And in addition, do I then need to make other changes?
> 
>   - Bob Futrelle
> Northeastern University
> CCIS
> 
>


Re: Pig output to Cassandra

2011-03-10 Thread Matt Kennedy
On its way... https://issues.apache.org/jira/browse/CASSANDRA-1828

On Mar 10, 2011, at 11:17 PM, Mark wrote:

> I thought I read somewhere that Pig has an output format that can write to 
> Cassandra, but I am unable to find any documentation on this. Is this possible, 
> and if so, can someone please point me in the right direction? Thanks



Pig output to Cassandra

2011-03-10 Thread Mark
I thought I read somewhere that Pig has an output format that can write 
to Cassandra, but I am unable to find any documentation on this. Is this 
possible, and if so, can someone please point me in the right direction? 
Thanks


How long will all nodes data sync.

2011-03-10 Thread Vincent Lu (ECL)
Hi all,


I have a question about eventual consistency.

If there are 3 nodes, RF=3, and write CL=QUORUM,

how long will it take for all 3 nodes' data to be in sync?

Can any configuration change that?

Thanks in advance.
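For reference, the standard quorum overlap arithmetic (a worked example, assuming the usual definition quorum = RF/2 + 1 with integer division):

    RF = 3
    W  = QUORUM = 3/2 + 1 = 2 replicas acknowledged synchronously
    R  = QUORUM = 2
    W + R = 4 > RF = 3  =>  every quorum read overlaps every quorum write

The third replica receives the write asynchronously - typically within milliseconds while it is up - so QUORUM reads stay consistent in the meantime.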


Vincent



Re: Secondary Index not working?

2011-03-10 Thread Jonathan Ellis
https://issues.apache.org/jira/browse/CASSANDRA-2244

On Thu, Mar 10, 2011 at 9:28 PM, Rommel Garcia  wrote:
> I tried the tutorial on this site
> - http://www.datastax.com/docs/0.7/data_model/secondary_indexes and worked
> on creating an index on a new column. That went well. But when I indexed an
> existing column, my query below returns 0 rows when in fact it should
> return 1.
> Query:
> get users where state = 'GA' and birth_date > 1970;
> Here's my Column Family Metadata:
> Column Families:
>     ColumnFamily: users
>       Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
>       Row cache size / save period: 0.0/0
>       Key cache size / save period: 20.0/14400
>       Memtable thresholds: 0.290624997/62/60
>       GC grace seconds: 864000
>       Compaction min/max thresholds: 4/32
>       Read repair chance: 1.0
>       Built indexes: [users.62697274685f64617465, users.7374617465]
>       Column Metadata:
>         Column Name: full_name (full_name)
>           Validation Class: org.apache.cassandra.db.marshal.UTF8Type
>         Column Name: state (state)
>           Validation Class: org.apache.cassandra.db.marshal.UTF8Type
>           Index Type: KEYS
>         Column Name: birth_date (birth_date)
>           Validation Class: org.apache.cassandra.db.marshal.LongType
>           Index Type: KEYS
> Here's the data set:
> RowKey: birt_date
> => (column=birth_date, value=1973, timestamp=1299200995417000)
> => (column=full_name, value=Patrick Rothfuss, timestamp=1299200746636000)
> => (column=state, value=GA, timestamp=1299200968945000)
> I'm not sure what I'm missing. Appreciate any help!
> Rom
>
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Secondary Index not working?

2011-03-10 Thread Rommel Garcia
I tried the tutorial on this site - 
http://www.datastax.com/docs/0.7/data_model/secondary_indexes and worked on 
creating an index on a new column. That went well. But when I indexed an 
existing column, my query below returns 0 rows when in fact it should return 1.

Query:

get users where state = 'GA' and birth_date > 1970;

Here's my Column Family Metadata:
Column Families:
ColumnFamily: users
  Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
  Row cache size / save period: 0.0/0
  Key cache size / save period: 20.0/14400
  Memtable thresholds: 0.290624997/62/60
  GC grace seconds: 864000
  Compaction min/max thresholds: 4/32
  Read repair chance: 1.0
  Built indexes: [users.62697274685f64617465, users.7374617465]
  Column Metadata:
Column Name: full_name (full_name)
  Validation Class: org.apache.cassandra.db.marshal.UTF8Type
Column Name: state (state)
  Validation Class: org.apache.cassandra.db.marshal.UTF8Type
  Index Type: KEYS
Column Name: birth_date (birth_date)
  Validation Class: org.apache.cassandra.db.marshal.LongType
  Index Type: KEYS

Here's the data set:

RowKey: birt_date
=> (column=birth_date, value=1973, timestamp=1299200995417000)
=> (column=full_name, value=Patrick Rothfuss, timestamp=1299200746636000)
=> (column=state, value=GA, timestamp=1299200968945000)

I'm not sure what I'm missing. Appreciate any help!

Rom





Fatal configuration error, so how to change listen_address:storage_port in cassandra.yaml ?

2011-03-10 Thread Bob Futrelle
Now that I've made the JMX_PORT change cassandra will attempt to run.
(Dumb me, I didn't need to ask - the answer about changing JMX_PORT was
already in the archives.  I'm getting with it now, so I know to look there
first.  Just finding my way around cassandra)

Made the change:

JMX_PORT="8080"
to
JMX_PORT="9980"
in cassandra-env.sh

Then ran
sudo ./bin/cassandra -f -p pidfile
threw a Fatal configuration error:

org.apache.cassandra.config.ConfigurationException:
localhost/10.0.1.3:7000 is in use by another process.  Change
listen_address:storage_port in
cassandra.yaml to values that do not conflict with other services

In cassandra.yaml the line causing the problem is

listen_address: localhost

Problem is, I have no idea what to change it to ???
Maybe localhost:8000?
And in addition, do I then need to make other changes?

  - Bob Futrelle
Northeastern University
CCIS


Re: FW: Very slow batch insert using version 0.7.2

2011-03-10 Thread Erik Forkalsrud


I see the same behavior with smaller batch sizes.  It appears to happen 
when starting Cassandra with the defaults on relatively large systems.  
Attached is a script I created to reproduce the problem. (usage: 
mutate.sh  /path/to/apache-cassandra-0.7.3-bin.tar.gz)   It extracts a 
stock cassandra distribution to a temp dir and starts it up, (single 
node) then creates a keyspace with a column family and does a batch 
mutate to insert 5000 columns.


When I run it on my laptop (Fedora 14, 64-bit, 4 cores, 8GB RAM) it 
flushes one Memtable with 5000 operations.
When I run it on a server (RHEL5, 64-bit, 16 cores, 96GB RAM) it 
flushes 100 Memtables with anywhere between 1 operation and 359 
operations (35 bytes and 12499 bytes).


I'm guessing I can override the JVM memory parameters to avoid the 
frequent flushing on the server, but I haven't experimented with that 
yet.  The only difference in the effective command line between the 
laptop and server is "-Xms3932M -Xmx3932M -Xmn400M" on the laptop and 
"-Xms48334M -Xmx48334M -Xmn1600M" on the server.

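A sketch of that override (assuming the stock 0.7 cassandra-env.sh, which computes these values from system memory when they are left unset; the sizes below are only illustrative):

    # conf/cassandra-env.sh
    MAX_HEAP_SIZE="4G"
    HEAP_NEWSIZE="400M"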



--
Erik Forkalsrud
Commission Junction


On 03/10/2011 09:18 AM, Ryan King wrote:

Why use such a large batch size?

-ryan

On Thu, Mar 10, 2011 at 6:31 AM, Desimpel, Ignace
  wrote:


Hello,

I had a demo application with embedded cassandra version 0.6.x, inserting
about 120 K  row mutations in one call.

In version 0.6.x that usually took about 5 seconds, and I could repeat this
step, adding the same amount of data each time.

Running on a single CPU computer, single hard disk, XP 32 bit OS, 1G memory

I tested this again on CentOS 64 bit OS, 6G memory, different settings of
memtable_throughput_in_mb and memtable_operations_in_millions.

I also tried version 0.7.3, with the same behavior.



Now with version 0.7.2 the call returns with a timeout exception even using
a timeout of 120000 ms (2 minutes). I see the CPU time going to 100%, a lot of
disk writing (gigabytes), and a lot of log messages about compacting,
flushing, commitlog, …



Below you can find some information from nodetool at the start of the batch
mutation and also after 14 minutes. The MutationStage clearly shows how
slowly the system handles the row mutations.



Attached : Cassandra.yaml with at end the description of my database
structure using yaml

Attached : log file with cassandra output.



Any idea what I could be doing wrong?



Regards,



Ignace Desimpel



ignace.desim...@nuance.com



At start of the insert (after inserting 124360 row mutations) I get the
following info from the nodetool :



C:\apache-cassandra-07.2\bin>nodetool --host ads.nuance.com info

Starting NodeTool

34035877798200531112672274220979640561

Gossip active: true

Load : 5.49 MB

Generation No: 1299502115

Uptime (seconds) : 1152

Heap Memory (MB) : 179,84 / 1196,81



C:\apache-cassandra-07.2\bin>nodetool --host ads.nuance.com tpstats

Starting NodeTool

Pool Name                 Active    Pending   Completed
ReadStage                      0          0       40637
RequestResponseStage           0          0          30
MutationStage                 32     121679       72149
GossipStage                    0          0           0
AntiEntropyStage               0          0           0
MigrationStage                 0          0           1
MemtablePostFlusher            0          0           6
StreamStage                    0          0           0
FlushWriter                    0          0           5
MiscStage                      0          0           0
FlushSorter                    0          0           0
InternalResponseStage          0          0           0
HintedHandoff                  0          0           0



After 14 minutes (timeout exception after 2 minutes : see log file) I get :



C:\apache-cassandra-07.2\bin>nodetool --host ads.nuance.com info

Starting NodeTool

34035877798200531112672274220979640561

Gossip active: true

Load : 10.31 MB

Generation No: 1299502115

Uptime (seconds) : 2172

Heap Memory (MB) : 733,82 / 1196,81



C:\apache-cassandra-07.2\bin>nodetool --host ads.nuance.com tpstats

Starting NodeTool

Pool Name                 Active    Pending   Completed
ReadStage                      0          0       40646
RequestResponseStage           0          0          30
MutationStage                 32     103310       90526
GossipStage                    0          0           0
AntiEntropyStage               0          0           0
MigrationStage                 0          0           1
MemtablePostFlusher            0          0          69
StreamStage                    0          0           0
FlushWriter                    0          0          68
FILEUTILS-DELETE-POOL          0          0          42
MiscStage                      0          0           0
FlushSorter

Re: memory utilization

2011-03-10 Thread Jonathan Ellis
http://wiki.apache.org/cassandra/FAQ#mmap
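In short: JConsole reports only the JVM heap, while top's RES also counts the pages of mmap'ed SSTables. A sketch of the knob that controls this (0.7 cassandra.yaml; 'auto' is the default):

    disk_access_mode: auto
    # 'standard' avoids mmap entirely; 'mmap_index_only' maps only index files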

On Thu, Mar 10, 2011 at 8:26 PM, Bill Hastings  wrote:
> Hi All
>
> Memory utilization reported by JCOnsole for Cassandra seems to be much
> lesser than that reported by top ("RES" memory). Can someone explain this?
> Maybe off topic but would appreciate a response.
>
> --
> Cheers
> Bill
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Exception when running a clean up

2011-03-10 Thread Jonathan Ellis
Unrelated to either upgrade or scrub.  That just means you need to
install JNA to get native linking instead of having to fork to run ln.
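Two common fixes, sketched (the jna.jar path is illustrative; the jar comes from the JNA project):

    # put JNA on Cassandra's classpath so snapshots hard-link natively:
    cp jna.jar $CASSANDRA_HOME/lib/

    # and/or raise the open-file limit in the shell that starts Cassandra:
    ulimit -n 32768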

On Thu, Mar 10, 2011 at 5:54 PM, Stu King  wrote:
> I have upgraded from 0.7.0 to 0.7.3. I then ran nodetool scrub on my
> keyspace and now see this exception:
> Exception in thread "main" java.io.IOError: java.io.IOException: Cannot run
> program "ln": java.io.IOException: error=24, Too many open files
> at
> org.apache.cassandra.db.ColumnFamilyStore.snapshotWithoutFlush(ColumnFamilyStore.java:1658)
> at
> org.apache.cassandra.db.ColumnFamilyStore.scrub(ColumnFamilyStore.java:962)
> at
> org.apache.cassandra.service.StorageService.scrub(StorageService.java:1256)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:616)
> at
> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:111)
> at
> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:45)
> at
> com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:226)
> at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
> at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:251)
> at
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:857)
> at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:795)
> at
> javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1450)
> at
> javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:90)
> at
> javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1285)
> at
> javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1383)
> at
> javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:807)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:616)
> at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322)
> at sun.rmi.transport.Transport$1.run(Transport.java:177)
> at java.security.AccessController.doPrivileged(Native Method)
> at sun.rmi.transport.Transport.serviceCall(Transport.java:173)
> at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:553)
> at
> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:808)
> at
> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:667)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> at java.lang.Thread.run(Thread.java:636)
> Caused by: java.io.IOException: Cannot run program "ln":
> java.io.IOException: error=24, Too many open files
> at java.lang.ProcessBuilder.start(ProcessBuilder.java:475)
> at
> org.apache.cassandra.utils.CLibrary.createHardLinkWithExec(CLibrary.java:181)
> at org.apache.cassandra.utils.CLibrary.createHardLink(CLibrary.java:147)
> at
> org.apache.cassandra.io.sstable.SSTableReader.createLinks(SSTableReader.java:617)
> at
> org.apache.cassandra.db.ColumnFamilyStore.snapshotWithoutFlush(ColumnFamilyStore.java:1651)
> ... 32 more
> Caused by: java.io.IOException: java.io.IOException: error=24, Too many open
> files
> at java.lang.UNIXProcess.<init>(UNIXProcess.java:164)
> at java.lang.ProcessImpl.start(ProcessImpl.java:81)
> at java.lang.ProcessBuilder.start(ProcessBuilder.java:468)
> ... 36 more
>
> On Thu, Mar 10, 2011 at 5:49 PM, aaron morton 
> wrote:
>>
>> What version of cassandra are you using and what is the upgrade history
>> for the cluster?
>> Aaron
>>
>> On 10/03/2011, at 8:24 PM, Stu King wrote:
>>
>> > I am seeing this exception when I am trying to run a cleanup. I want to
>> > decommission the node after the cleanup.
>> >
>> > java.util.concurrent.ExecutionException: java.io.IOError:
>> > java.io.EOFException
>> >       at
>> > java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:252)
>> >       at java.util.concurrent.FutureTask.get(FutureTask.java:111)
>> >       at
>> > org.apache.cassandra.db.CompactionManager.performCleanup(CompactionManager.java:180)
>> >       at
>> > org.apache.cassandra.db.ColumnFamilyStore.forceCleanup(ColumnFamilyStore.java:909)
>> >       at
>> > org.apache.cassandra.service.StorageService.forceTableCleanup(StorageService.java:1127)
>> >       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> >       at
>> > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorIm

memory utilization

2011-03-10 Thread Bill Hastings
Hi All

Memory utilization reported by JCOnsole for Cassandra seems to be much
lesser than that reported by top ("RES" memory). Can someone explain this?
Maybe off topic but would appreciate a response.

-- 
Cheers
Bill


Re: Cassandra startup port problem, apache-cassandra-0.7.3 on Snow Leopard.

2011-03-10 Thread Jeremy Hanna
Comments in-line.

On Mar 10, 2011, at 8:10 PM, Bob Futrelle wrote:

> After a reboot, cassandra spits out many lines on startup but then appears to 
> stall. 
> 
> Worse, trying to run cassandra a second time stops immediately because of a 
> port problem:
> 
> apache-cassandra-0.7.3: sudo ./bin/cassandra -f -p pidfile
> Password:
> Error: Exception thrown by the agent : java.rmi.server.ExportException: Port 
> already in use: 8080; nested exception is: 
>   java.net.BindException: Address already in use

Do you have Tomcat or something else installed that is starting on port 8080?  
Cassandra uses 8080 as the default JMX port for monitoring.
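A quick way to check, plus the same cassandra-env.sh change Bob made earlier in this digest (9980 is just an example of a free port):

    # see what is bound to the default JMX port
    lsof -i :8080

    # then in conf/cassandra-env.sh:
    JMX_PORT="9980"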

> 
> I have seen some discussions of the problem but have not found a solution.
> 
> 
> ADDITIONAL QUESTIONS:
> 
> I never explicitly installed Thrift. Should I?

No - you shouldn't need this.

> 
> How do I know that cassandra.in.sh is being used?
> Assuming it is, can I change the port in that file? 
> If so, how?
> 
> Do any of these variables need to be set?
> 
> CASSANDRA_HOME
> CASSANDRA_CONF
> 
> and is this necessary?
> 
> for jar in $CASSANDRA_HOME/lib/*.jar; do
> CLASSPATH=$CLASSPATH:$jar
> done
> 

For normal use you don't need to change any of those settings.

> 
> Java:
> 
> I noticed that  
> $JAVA_HOME = /System/Library/Frameworks/JavaVM.framework/Versions/1.5.0/Home

In my .profile on my mac, I just do

export JAVA_HOME=$(/usr/libexec/java_home)

that sets it to the current JVM, which should be 1.6.0_24.

> 
> but on the command line I get:
> 
> apache-cassandra-0.7.3: java -version
> java version "1.6.0_24"
> Java(TM) SE Runtime Environment (build 1.6.0_24-b07-334-10M3326)
> Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02-334, mixed mode)
> 
> Should probably change $JAVA_HOME to point to 1.6
> 
> I get the digest.  Feel free to cc me at bob.futre...@gmail.com
> 
>  - Bob
>  
>Northeastern U.
>College of Computer and Information Science
> 



Cassandra startup port problem, apache-cassandra-0.7.3 on Snow Leopard.

2011-03-10 Thread Bob Futrelle
After a reboot, cassandra spits out many lines on startup but then appears
to stall.

Worse, trying to run cassandra a second time stops immediately because of a
port problem:

apache-cassandra-0.7.3: sudo ./bin/cassandra -f -p pidfile
Password:
Error: Exception thrown by the agent : java.rmi.server.ExportException: Port
already in use: 8080; nested exception is:
java.net.BindException: Address already in use

I have seen some discussions of the problem but have not found a solution.


ADDITIONAL QUESTIONS:

I never explicitly installed Thrift. Should I?

How do I know that cassandra.in.sh is being used?
Assuming it is, can I change the port in that file?
If so, how?

Do any of these variables need to be set?

CASSANDRA_HOME
CASSANDRA_CONF

and is this necessary?

for jar in $CASSANDRA_HOME/lib/*.jar; do
CLASSPATH=$CLASSPATH:$jar
done


Java:

I noticed that
$JAVA_HOME = /System/Library/Frameworks/JavaVM.framework/Versions/1.5.0/Home

but on the command line I get:

apache-cassandra-0.7.3: java -version
java version "1.6.0_24"
Java(TM) SE Runtime Environment (build 1.6.0_24-b07-334-10M3326)
Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02-334, mixed mode)

Should probably change $JAVA_HOME to point to 1.6

I get the digest.  Feel free to cc me at bob.futre...@gmail.com

 - Bob

   Northeastern U.
   College of Computer and Information Science


Re: mutator.execute() timings - big variance noted - pointers needed on understanding/improving it

2011-03-10 Thread Roshan Dawrani
Hi All,

Thanks for the inputs. I will start investigating this morning with the help
of these.

Regards,
Roshan

On Fri, Mar 11, 2011 at 2:49 AM, aaron morton wrote:

> http://wiki.apache.org/cassandra/FAQ#slows_down_after_lotso_inserts
>
> Aaron
>
> On 11 Mar 2011, at 05:08, sridhar basam wrote:
>
>
> Sounds like GC from your description of fast->slow->fast. Collect GC times
> from both the client and server side and plot against your application
> timing.
>
>  If you uncomment the verbose GC entries in the cassandra-env.sh file you
> should get timing for the server side, pass in the same arguments for your
> client. Align time across the 3 files and plot to see if GC is the cause.
>
>  Sridhar
>
>
>
> On Thu, Mar 10, 2011 at 9:30 AM, Roshan Dawrani 
> wrote:
>
>> Hi,
>>
>> I am in the middle of some load testing on a 1-node Cassandra setup. We
>> are not on very high loads yet. We have recorded the timings taken up by
>> mutator.execute() calls and we see this kind of variation during the test
>> run:
>>
>> So, 25% of the time, execute() calls come back in 25 milliseconds, but
>> the longer calls go up to 4 seconds.
>>
>> Can someone please provide some pointers on what and where to focus on in
>> my Hector / Cassandra setup? We are mostly on the default Cassandra
>> configuration at this time - only change is the max connection pool size
>> (CassandraHostConfigurator.maxActive) is changed to 300 from a default of
>> 50.
>>
>> I would also like to add that the time increase is not linear - it starts
>> fast, goes, slow, very slow, and becomes faster again.
>>
>> 
>>    25%     29
>>    50%    105
>>    66%    185
>>    70%    208
>>    75%    240
>>    80%    297
>>    90%    510
>>    95%    854
>>    98%   1075
>>    99%   1215
>>   100%   4442
>> 
>>
>> --
>> Roshan
>> Blog: http://roshandawrani.wordpress.com/
>> Twitter: @roshandawrani 
>> Skype: roshandawrani
>>
>>
>
>
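The "verbose GC entries" referred to above are the commented-out lines in conf/cassandra-env.sh; a sketch of enabling them (the flag list matches the one Gregory Szorc posts elsewhere in this digest; the log path is illustrative):

    JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCTimeStamps"
    JVM_OPTS="$JVM_OPTS -XX:+PrintTenuringDistribution"
    JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"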


Re: Is secondary index consistent with its base table?

2011-03-10 Thread Alvin UW
Thanks.

Why are secondary indexes recommended only for attributes with low
cardinality, and why are they not very useful for high-cardinality values?

2011/3/7 Jonathan Ellis 

> It does, but this is an implementation detail subject to change (e.g.,
> the bitmap indexes being added do not).
>
> On Mon, Mar 7, 2011 at 10:55 AM, Alvin UW  wrote:
> > Thanks.
> >
> > Does Cassandra store secondary index with an extra CF?
> >
> > 2011/3/7 Jonathan Ellis 
> >>
> >> Yes, this is guaranteed the same way single-row updates are guaranteed
> >> to be atomic (the commitlog).
> >>
> >> On Mon, Mar 7, 2011 at 10:13 AM, Alvin UW  wrote:
> >> > Hello,
> >> >
> >> > I was wondering whether Secondary Index is consistent with its base
> >> > table?
> >> > How did you guarantee the consistency?
> >> >
> >> > Thanks
> >> >
> >> > Alvin
> >> >
> >>
> >>
> >>
> >> --
> >> Jonathan Ellis
> >> Project Chair, Apache Cassandra
> >> co-founder of DataStax, the source for professional Cassandra support
> >> http://www.datastax.com
> >
> >
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>


Re: Cassandra LongType data insertion problem for secondary index usage

2011-03-10 Thread buddhasystem
Tyler, as a collateral issue - I've been wondering for a while what advantage,
if any, it buys me if I declare a value 'long' (which it roughly is) as
opposed to passing around strings. A string is flattened onto a replica of
itself, I assume? No conversion? Maybe it even means better speed.

Thanks,
Maxim



Re: Exception when running a clean up

2011-03-10 Thread Stu King
I have upgraded from 0.7.0 to 0.7.3. I then ran nodetool scrub on my
keyspace and now see this exception:

Exception in thread "main" java.io.IOError: java.io.IOException: Cannot run
program "ln": java.io.IOException: error=24, Too many open files
at
org.apache.cassandra.db.ColumnFamilyStore.snapshotWithoutFlush(ColumnFamilyStore.java:1658)
at
org.apache.cassandra.db.ColumnFamilyStore.scrub(ColumnFamilyStore.java:962)
at
org.apache.cassandra.service.StorageService.scrub(StorageService.java:1256)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:111)
at
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:45)
at
com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:226)
at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:251)
at
com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:857)
at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:795)
at
javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1450)
at
javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:90)
at
javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1285)
at
javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1383)
at
javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:807)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322)
at sun.rmi.transport.Transport$1.run(Transport.java:177)
at java.security.AccessController.doPrivileged(Native Method)
at sun.rmi.transport.Transport.serviceCall(Transport.java:173)
at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:553)
at
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:808)
at
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:667)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:636)
Caused by: java.io.IOException: Cannot run program "ln":
java.io.IOException: error=24, Too many open files
at java.lang.ProcessBuilder.start(ProcessBuilder.java:475)
at
org.apache.cassandra.utils.CLibrary.createHardLinkWithExec(CLibrary.java:181)
at org.apache.cassandra.utils.CLibrary.createHardLink(CLibrary.java:147)
at
org.apache.cassandra.io.sstable.SSTableReader.createLinks(SSTableReader.java:617)
at
org.apache.cassandra.db.ColumnFamilyStore.snapshotWithoutFlush(ColumnFamilyStore.java:1651)
... 32 more
Caused by: java.io.IOException: java.io.IOException: error=24, Too many open
files
at java.lang.UNIXProcess.<init>(UNIXProcess.java:164)
at java.lang.ProcessImpl.start(ProcessImpl.java:81)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:468)
... 36 more


On Thu, Mar 10, 2011 at 5:49 PM, aaron morton wrote:

> What version of cassandra are you using and what is the upgrade history for
> the cluster?
> Aaron
>
> On 10/03/2011, at 8:24 PM, Stu King wrote:
>
> > I am seeing this exception when I am trying to run a cleanup. I want to
> decommission the node after the cleanup.
> >
> > java.util.concurrent.ExecutionException: java.io.IOError:
> java.io.EOFException
> >   at
> java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:252)
> >   at java.util.concurrent.FutureTask.get(FutureTask.java:111)
> >   at
> org.apache.cassandra.db.CompactionManager.performCleanup(CompactionManager.java:180)
> >   at
> org.apache.cassandra.db.ColumnFamilyStore.forceCleanup(ColumnFamilyStore.java:909)
> >   at
> org.apache.cassandra.service.StorageService.forceTableCleanup(StorageService.java:1127)
> >   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >   at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> >   at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >   at java.lang.reflect.Method.invoke(Method.java:616)
> >   at
> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:111)
> >   at
> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospe

Re: Understanding index builds (updated: crashed cluster)

2011-03-10 Thread Matt Kennedy
Great, that worked, thanks for your time.

On Thu, Mar 10, 2011 at 4:57 PM, Jonathan Ellis  wrote:

> Drop the index, then restart once more.  It shouldn't try to rebuild
> the index after that.
>
> On Thu, Mar 10, 2011 at 3:36 PM, Matt Kennedy 
> wrote:
> > Sorry, I wasn't clear on the timeline of events.  I started the index
> build
> > and then posted this message to the list. Once I read the links you
> posted,
> > I did expect the cluster to crash, but I let it run until it blew up
> anyway,
> > since I didn't really know how to stop the index build.
> >
> > Which is sort of where I'm still stuck, I don't want to corrupt that
> column
> > family by issuing an "update column family" that has a smaller set of
> > indexes while the index build is going on without some encouragement from
> > the list that doing that won't wreck the column family. Is there a safe
> way
> > to tell an index build to stop after the cluster starts up from a crash
> due
> > to the index build?
> >
> > Thanks,
> > Matt
> >
> > On Thu, Mar 10, 2011 at 1:40 PM, Jonathan Ellis 
> wrote:
> >>
> >> If you read the bugs I linked, you would see that this is expected
> >> behavior with 0.7.3 once you get more data than you can index
> >> in-memory.
> >>
> >> You should wait for the next Hudson build (which will include 2295)
> >> and use that.  Or, create your indexes before adding the data.
> >>
> >> On Thu, Mar 10, 2011 at 12:26 PM, Matt Kennedy 
> >> wrote:
> >> > Well it looks like the index creation job crashed the cluster.  All of
> >> > the
> >> > nodes were down having dumped out .hprof files.  I brought the cluster
> >> > back
> >> > up and when I do "describe keyspace ks" it looks like the index build
> >> > process has started over again.  Is it safe to attempt to stop that by
> >> > running an "update column family" command with fewer indexes defined?
> >> > Or is
> >> > there a better way to safely terminate this index creation process
> that
> >> > I
> >> > assume will crash the cluster again eventually?
> >> >
> >> > Would creating the indexes one at a time help? Or will the same
> problem
> >> > occur once I get to a certain number of indexes on the column family?
> >> >
> >> > Thanks,
> >> > Matt
> >> >
> >> > On Wed, Mar 9, 2011 at 8:40 PM, Jonathan Ellis 
> >> > wrote:
> >> >>
> >> >> https://issues.apache.org/jira/browse/CASSANDRA-2294
> >> >> https://issues.apache.org/jira/browse/CASSANDRA-2295
> >> >>
> >> >> On Wed, Mar 9, 2011 at 5:47 PM, Matt Kennedy 
> >> >> wrote:
> >> >> > I'm trying to gain some insight into what happens with a cluster
> when
> >> >> > indexes are being built, or when CFs with indexed columns are being
> >> >> > written
> >> >> > to.
> >> >> >
> >> >> > Over the past couple of days we've been doing some loads into a CF
> >> >> > with
> >> >> > 29
> >> >> > indexed columns.  Eventually, the nodes just got overwhelmed and
> the
> >> >> > client
> >> > (Hector) started getting timeouts.  We were using a MapReduce
> >> >> > job
> >> >> > to
> >> >> > load an HDFS file into Cassandra, though we had limited the load
> job
> >> >> > to
> >> >> > one
> >> >> > task per node.  My confusion comes from how difficult it was to
> know
> >> >> > that
> >> >> > the nodes were becoming overwhelmed.  The ring consistently
> reported
> >> >> > that
> >> >> > all nodes were up and it did not appear that there were pending
> >> >> > operations
> >> >> > under tpstats.  I also monitor this cluster with Ganglia, and at no
> >> >> > point
> >> >> > did any of the machine loads appear very high at all, yet our job
> >> >> > kept
> >> >> > failing with Hector reporting timeouts.
> >> >> >
> >> >> > Today we decided to leave index creation until the end, and just
> load
> >> >> > the
> >> >> > data using the same Hector code.  We bumped up the hadoop
> concurrency
> >> >> > to
> >> >> > two
> >> >> > concurrent tasks per node, and everything went fine, as expected,
> >> >> > we've
> >> >> > done
> >> >> > much larger loads than this using Hadoop and as long as you don't
> >> >> > shoot
> >> >> > for
> >> >> > too much concurrency, Cassandra can deal with it.  So now we have
> the
> >> >> > data
> >> >> > in the column family and I updated the column family metadata in
> the
> >> >> > CLI
> >> >> > to
> >> >> > enable the 29 indexes.  As soon as I do that, the ring starts
> >> >> > reporting
> >> >> > that
> >> >> > nodes are down intermittently, and HintedHandoffs are starting to
> >> >> > accumulate
> >> >> > under tpstats. Ganglia is reporting very low overall load, so I'm
> >> >> > wondering
> >> >> > why it's taking so long for cli and nodetool commands to return.
> >> >> >
> >> >> > I'm just trying to get a better handle on what kind of actions have
> a
> >> >> > serious impact on cluster availability and to know the right places
> >> >> > to
> >> >> > look
> >> >> > to try to get ahead of those conditions.
> >> >> >
> >> >> > Thanks for any insight you can provide,
> >> >> > Matt
> >> >> >
> >> >>
> >

Re: problem with bootstrap

2011-03-10 Thread Peter Schuller
> Bootstrapping uses the same mechanisms as a repair to stream data from other 
> nodes. This can be a heavyweight process and you may want to control when it 
> starts.
>
> Joining the ring just tells the other nodes you exists and this is your token.

And in general, except when initially setting up a cluster, you would
not normally want to have a node join the ring without auto_bootstrap
enabled, since it will serve requests yet be devoid of any data.

-- 
/ Peter Schuller


Re: problem with bootstrap

2011-03-10 Thread Peter Schuller
> Could it be because once auto_bootstrap is off it's off forever?

I am not entirely sure if this answers your question (I revisited the
thread history but I'm a bit confused myself):
If by that you mean that given a node which was started with
auto_bootstrap=false, and it successfully joined the ring etc, that
changing autobootstrap to true in the configuration and restarting has
no effect - then yes. The auto_bootstrap and initial_token settings
only make sense on a node which has not yet been made part of the
ring.

The only time at which it might bootstrap (word used in the sense of
"populate with data by grabbing data from other nodes") is on initial
start-up. As soon as start-up completes, it is written to the system
tables that an initial start-up has been completed.

(See StorageService.joinTokenRing(); the
SystemTable.setBootstrapped(true) call at the end is what causes a
future start-up to never trigger bootstrapping.)
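For reference, the two settings involved (as they appear in the stock 0.7 cassandra.yaml; both are consulted only on a node's very first start):

    auto_bootstrap: false
    initial_token: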

-- 
/ Peter Schuller


Re: problem with bootstrap

2011-03-10 Thread mcasandra
I am completely confused. I repeated the same test after turning
auto_bootstrap on, and it worked this time. I did it exactly the same way,
killing 2 nodes, and this time it started with no issues.

Could it be because once auto_bootstrap is off it's off forever?

I am using hector and upgraded hector this morning.



Re: Understanding index builds (updated: crashed cluster)

2011-03-10 Thread Jonathan Ellis
Drop the index, then restart once more.  It shouldn't try to rebuild
the index after that.

On Thu, Mar 10, 2011 at 3:36 PM, Matt Kennedy  wrote:
> Sorry, I wasn't clear on the timeline of events.  I started the index build
> and then posted this message to the list. Once I read the links you posted,
> I did expect the cluster to crash, but I let it run until it blew up anyway,
> since I didn't really know how to stop the index build.
>
> Which is sort of where I'm still stuck, I don't want to corrupt that column
> family by issuing an "update column family" that has a smaller set of
> indexes while the index build is going on without some encouragement from
> the list that doing that won't wreck the column family. Is there a safe way
> to tell an index build to stop after the cluster starts up from a crash due
> to the index build?
>
> Thanks,
> Matt
>
> On Thu, Mar 10, 2011 at 1:40 PM, Jonathan Ellis  wrote:
>>
>> If you read the bugs I linked, you would see that this is expected
>> behavior with 0.7.3 once you get more data than you can index
>> in-memory.
>>
>> You should wait for the next Hudson build (which will include 2295)
>> and use that.  Or, create your indexes before adding the data.
>>
>> On Thu, Mar 10, 2011 at 12:26 PM, Matt Kennedy 
>> wrote:
>> > Well it looks like the index creation job crashed the cluster.  All of
>> > the
>> > nodes were down having dumped out .hprof files.  I brought the cluster
>> > back
>> > up and when I do "describe keyspace ks" it looks like the index build
>> > process has started over again.  Is it safe to attempt to stop that by
>> > running an "update column family" command with fewer indexes defined?
>> > Or is
>> > there a better way to safely terminate this index creation process that
>> > I
>> > assume will crash the cluster again eventually?
>> >
>> > Would creating the indexes one at a time help? Or will the same problem
>> > occur once I get to a certain number of indexes on the column family?
>> >
>> > Thanks,
>> > Matt
>> >
>> > On Wed, Mar 9, 2011 at 8:40 PM, Jonathan Ellis 
>> > wrote:
>> >>
>> >> https://issues.apache.org/jira/browse/CASSANDRA-2294
>> >> https://issues.apache.org/jira/browse/CASSANDRA-2295
>> >>
>> >> On Wed, Mar 9, 2011 at 5:47 PM, Matt Kennedy 
>> >> wrote:
>> >> > I'm trying to gain some insight into what happens with a cluster when
>> >> > indexes are being built, or when CFs with indexed columns are being
>> >> > written
>> >> > to.
>> >> >
>> >> > Over the past couple of days we've been doing some loads into a CF
>> >> > with
>> >> > 29
>> >> > indexed columns.  Eventually, the nodes just got overwhelmed and the
>> >> > client
>> >> > (Hector) started getting timeouts.  We were using a MapReduce
>> >> > job
>> >> > to
>> >> > load an HDFS file into Cassandra, though we had limited the load job
>> >> > to
>> >> > one
>> >> > task per node.  My confusion comes from how difficult it was to know
>> >> > that
>> >> > the nodes were becoming overwhelmed.  The ring consistently reported
>> >> > that
>> >> > all nodes were up and it did not appear that there were pending
>> >> > operations
>> >> > under tpstats.  I also monitor this cluster with Ganglia, and at no
>> >> > point
>> >> > did any of the machine loads appear very high at all, yet our job
>> >> > kept
>> >> > failing with Hector reporting timeouts.
>> >> >
>> >> > Today we decided to leave index creation until the end, and just load
>> >> > the
>> >> > data using the same Hector code.  We bumped up the hadoop concurrency
>> >> > to
>> >> > two
>> >> > concurrent tasks per node, and everything went fine, as expected,
>> >> > we've
>> >> > done
>> >> > much larger loads than this using Hadoop and as long as you don't
>> >> > shoot
>> >> > for
>> >> > too much concurrency, Cassandra can deal with it.  So now we have the
>> >> > data
>> >> > in the column family and I updated the column family metadata in the
>> >> > CLI
>> >> > to
>> >> > enable the 29 indexes.  As soon as I do that, the ring starts
>> >> > reporting
>> >> > that
>> >> > nodes are down intermittently, and HintedHandoffs are starting to
>> >> > accumulate
>> >> > under tpstats. Ganglia is reporting very low overall load, so I'm
>> >> > wondering
>> >> > why it's taking so long for cli and nodetool commands to return.
>> >> >
>> >> > I'm just trying to get a better handle on what kind of actions have a
>> >> > serious impact on cluster availability and to know the right places
>> >> > to
>> >> > look
>> >> > to try to get ahead of those conditions.
>> >> >
>> >> > Thanks for any insight you can provide,
>> >> > Matt
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> Jonathan Ellis
>> >> Project Chair, Apache Cassandra
>> >> co-founder of DataStax, the source for professional Cassandra support
>> >> http://www.datastax.com
>> >
>> >
>>
>>
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of DataStax, the source for professional Cassandra support
>> http://www.datastax.com
>
>



-- 
Jonat

RE: Nodes frozen in GC

2011-03-10 Thread Gregory Szorc
I do believe there is a fundamental issue with compactions allocating too much 
memory and incurring too many garbage collections (at least with 0.6.12).

On nearly every Cassandra node I operate, garbage collections simply get out of 
control during compactions of any reasonably sized CF (>1GB). I can reproduce 
it on CF's with many wider rows (1000's of columns) consisting of smaller 
columns (10's-100's of bytes) and CF's with thinner rows (<20 columns) with 
larger columns (10's MBs) and everything in between.

From the GC logs, I can infer that Cassandra is allocating upwards of 4GB/s. I 
once gave the JVM 30GB of heap and saw it run through the entire heap in a few 
seconds while doing a compaction! It would continuously blow through the heap, 
incur a stop-the-world collection, and repeat. Meanwhile, the listed compacted 
bytes from the JMX interface was never increasing and the tmp sstable wasn't 
growing in size.

My current/relevant JVM args are as follows (running on Sun 1.6.0.24 w/ JNA 
3.2.7):

-Xms9G -Xmx9G -Xmn256M -XX:+PrintGCDetails -XX:+PrintGCTimeStamps 
-XX:+PrintClassHistogram -XX:+PrintTenuringDistribution 
-Xloggc:/var/log/cassandra/gc.log -XX:+UseParNewGC -XX:+UseConcMarkSweepGC 
-XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=3 
-XX:CMSInitiatingOccupancyFraction=40 -XX:+HeapDumpOnOutOfMemoryError 
-XX:+UseCMSInitiatingOccupancyOnly -XX:CMSFullGCsBeforeCompaction=1 
-XX:ParallelGCThreads=6

I've tweaked nearly every setting imaginable 
(http://www.md.pp.ru/~eu/jdk6options.html is a great resource, BTW) and can't 
control the problem. No matter what I do, nothing can solve the problem of 
Cassandra allocating objects faster than the GC can clean them. And, when we're 
talking about >1GB/s of allocations, I don't think you can blame the GC for not 
keeping up.
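One lightweight way to watch this allocation/collection churn from outside (standard JDK tool; <pid> is a placeholder for the Cassandra process id):

    # heap-generation occupancy plus GC counts/times, sampled every second
    jstat -gcutil <pid> 1000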

Since there is no way to prevent these frequent stop-the-world collections, we 
get frequent client timeouts and an occasional unavailable response if we're 
unfortunate enough to have a couple of nodes compacting large CFs at the same 
time (which happens more than I'd like).

For the past two weeks, we had N adjacent nodes in our 
cluster that failed to perform their daily major compaction on a particular 
column family. All N would spew GCInspector logs, and the GC logs revealed a 
heavy memory allocation rate. The only resolution was to restart Cassandra to 
abort the compaction. I isolated one node from network connectivity and 
restarted it in a cluster of 1 with no caching, memtables, or any operations. 
Under these ideal compacting conditions, I still ran into issues. I 
experimented with extremely large young generations (up to 10GB), a very low 
CMSInitiatingOccupancyFraction, etc., but Cassandra would always allocate 
faster than the JVM could collect, eventually leading to stop-the-world pauses.

Recently, we rolled out a change to the application accessing the cluster which 
effectively resaved every column in every row. When this was mostly done, our 
daily major compaction for the trouble CF that had refused to compact for two 
weeks suddenly completed! Most interesting. (Although it still went through 
memory to no end.)

One of my observations is that memory allocations during compaction seem to be 
mostly short-lived objects. The young generation is almost never promoting 
objects to the tenured generation (we changed our MaxTenuringThreshold to 3 
from Cassandra's default of 1 to discourage early promotion; a default of 1 
seems rather silly to me). However, when the young generation is being 
collected (which happens VERY often during compactions b/c the allocation rate 
is so high), objects are allocated directly into the tenured generation. Even 
with relatively short ParNew collections (often <0.05s, almost always <0.1s 
wall time), these tenured allocations quickly accumulate, initiating CMS and 
eventually stop-the-world collections.

Anyway, not sure how much additional writing is going to help resolve this 
issue. I have gobs of GC logs and supplementary metrics data to back up my 
claims if those will help. But, I have a feeling that if you just create a CF 
of a few GB and incur a compaction with the JVM under a profiler, it will be 
pretty easy to identify the culprit. I've started down this path and will let 
you know if I find anything. But, I'm no Java expert and am quite busy with 
other tasks, so don't expect anything useful from me anytime soon.

I hope this information helps. If you need anything else, just ask, and I'll 
see what I can do.

Gregory Szorc
gregory.sz...@xobni.com

> -Original Message-
> From: sc...@scode.org [mailto:sc...@scode.org] On Behalf Of Peter
> Schuller
> Sent: Thursday, March 10, 2011 10:36 AM
> To: ruslan usifov
> Cc: user@cassandra.apache.org
> Subject: Re: Nodes frozen in GC
> 
> I think it would be very useful to get to the bottom of this but without
> further details (like the asked-for GC logs) I'm not sure what to do/suggest.
> 
> It's clear that a single CF wi

Re: Understanding index builds (updated: crashed cluster)

2011-03-10 Thread Matt Kennedy
Sorry, I wasn't clear on the timeline of events.  I started the index build
and then posted this message to the list. Once I read the links you posted,
I did expect the cluster to crash, but I let it run until it blew up anyway,
since I didn't really know how to stop the index build.

Which is sort of where I'm still stuck, I don't want to corrupt that column
family by issuing an "update column family" that has a smaller set of
indexes while the index build is going on without some encouragement from
the list that doing that won't wreck the column family. Is there a safe way
to tell an index build to stop after the cluster starts up from a crash due
to the index build?

Thanks,
Matt

On Thu, Mar 10, 2011 at 1:40 PM, Jonathan Ellis  wrote:

> If you read the bugs I linked, you would see that this is expected
> behavior with 0.7.3 once you get more data than you can index
> in-memory.
>
> You should wait for the next Hudson build (which will include 2295)
> and use that.  Or, create your indexes before adding the data.
>
> On Thu, Mar 10, 2011 at 12:26 PM, Matt Kennedy 
> wrote:
> > Well it looks like the index creation job crashed the cluster.  All of
> the
> > nodes were down having dumped out .hprof files.  I brought the cluster
> back
> > up and when I do "describe keyspace ks" it looks like the index build
> > process has started over again.  Is it safe to attempt to stop that by
> > running an "update column family" command with fewer indexes defined?  Or
> is
> > there a better way to safely terminate this index creation process that I
> > assume will crash the cluster again eventually?
> >
> > Would creating the indexes one at a time help? Or will the same problem
> > occur once I get to a certain number of indexes on the column family?
> >
> > Thanks,
> > Matt
> >
> > On Wed, Mar 9, 2011 at 8:40 PM, Jonathan Ellis 
> wrote:
> >>
> >> https://issues.apache.org/jira/browse/CASSANDRA-2294
> >> https://issues.apache.org/jira/browse/CASSANDRA-2295
> >>
> >> On Wed, Mar 9, 2011 at 5:47 PM, Matt Kennedy 
> wrote:
> >> > I'm trying to gain some insight into what happens with a cluster when
> >> > indexes are being built, or when CFs with indexed columns are being
> >> > written
> >> > to.
> >> >
> >> > Over the past couple of days we've been doing some loads into a CF
> with
> >> > 29
> >> > indexed columns.  Eventually, the nodes just got overwhelmed and the
> >> > client
> >> > (Hector) started getting timeouts.  We were using a MapReduce
> job
> >> > to
> >> > load an HDFS file into Cassandra, though we had limited the load job
> to
> >> > one
> >> > task per node.  My confusion comes from how difficult it was to know
> >> > that
> >> > the nodes were becoming overwhelmed.  The ring consistently reported
> >> > that
> >> > all nodes were up and it did not appear that there were pending
> >> > operations
> >> > under tpstats.  I also monitor this cluster with Ganglia, and at no
> >> > point
> >> > did any of the machine loads appear very high at all, yet our job kept
> >> > failing with Hector reporting timeouts.
> >> >
> >> > Today we decided to leave index creation until the end, and just load
> >> > the
> >> > data using the same Hector code.  We bumped up the hadoop concurrency
> to
> >> > two
> >> > concurrent tasks per node, and everything went fine, as expected,
> we've
> >> > done
> >> > much larger loads than this using Hadoop and as long as you don't
> shoot
> >> > for
> >> > too much concurrency, Cassandra can deal with it.  So now we have the
> >> > data
> >> > in the column family and I updated the column family metadata in the
> CLI
> >> > to
> >> > enable the 29 indexes.  As soon as I do that, the ring starts
> reporting
> >> > that
> >> > nodes are down intermittently, and HintedHandoffs are starting to
> >> > accumulate
> >> > under tpstats. Ganglia is reporting very low overall load, so I'm
> >> > wondering
> >> > why it's taking so long for cli and nodetool commands to return.
> >> >
> >> > I'm just trying to get a better handle on what kind of actions have a
> >> > serious impact on cluster availability and to know the right places to
> >> > look
> >> > to try to get ahead of those conditions.
> >> >
> >> > Thanks for any insight you can provide,
> >> > Matt
> >> >
> >>
> >>
> >>
> >> --
> >> Jonathan Ellis
> >> Project Chair, Apache Cassandra
> >> co-founder of DataStax, the source for professional Cassandra support
> >> http://www.datastax.com
> >
> >
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>


Re: problem with bootstrap

2011-03-10 Thread aaron morton
Can you include this info...

- output from nodetool ring for all nodes so we can see whats in the ring
- what you've run on the node you are trying to bring in
- the nodetool command you are trying to run
- error logs 

In general, asking the cluster to replicate data more times than the number of 
nodes it has is a bad thing. How did you create the keyspace?
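A sketch of how to check and correct this (0.7-era commands; the keyspace name is illustrative):

    nodetool --host <node> ring     # count the endpoints actually in the ring

    # then, in cassandra-cli, keep RF <= that count, e.g.:
    update keyspace MyKeyspace with replication_factor = 2;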

Aaron

On 11 Mar 2011, at 03:45, Patrik Modesto wrote:

> Hi,
> 
> I'm still fighting the
> Exception in thread "main" java.lang.IllegalStateException:
> replication factor (3) exceeds number of endpoints (2).
> 
> When I have a 2-server cluster and create a keyspace with RF 3, I'm able to
> add (without auto_bootstrap) another node, but cluster nodetool
> commands don't work and fail with the exception above. The new node
> serves data, but I can't do loadbalance, decommission or move on the
> cluster.
> 
> Patrik



Re: Modeling Multi-Valued Fields

2011-03-10 Thread Sasha Dolgy
hm.  i use this approach and have secondary indexes configured on the
columns if i need to do a specific search for an address.

alternately, in the user cf, if you wanted to be very uncool, but
optimized for always retrieving the user email addresses, you could
have the uuid for the user record and another row for uuid:email and
each column is an email ... this way you can allow them to add as many
as they want with key being the "type" as in the original email and
the value being the email ...


On Thu, Mar 10, 2011 at 10:15 PM, aaron morton  wrote:


> Slight variation is to use a standard CF and pack the column names, e.g.
> "email.home"  or "email.work" as column names. (Mentioned for completeness,
> not the best approach)


Re: mutator.execute() timings - big variance noted - pointers needed on understanding/improving it

2011-03-10 Thread aaron morton
http://wiki.apache.org/cassandra/FAQ#slows_down_after_lotso_inserts

Aaron

On 11 Mar 2011, at 05:08, sridhar basam wrote:

> 
> Sounds like GC from your description of fast->slow->fast. Collect GC times 
> from both the client and server side and plot against your application timing.
> 
>  If you uncomment the verbose GC entries in the cassandra-env.sh file you 
> should get timing for the server side, pass in the same arguments for your 
> client. Align time across the 3 files and plot to see if GC is the cause.
> 
>  Sridhar
> 
> 
> 
> On Thu, Mar 10, 2011 at 9:30 AM, Roshan Dawrani  
> wrote:
> Hi,
> 
> I am in the middle of some load testing on a 1-node Cassandra setup. We are 
> not on very high loads yet. We have recorded the timings taken up by 
> mutator.execute() calls and we see this kind of variation during the test run:
> 
> So, 25% of the time, execute() calls come back in 25 milliseconds, but the 
> longer calls go up to 4 seconds.
> 
> Can someone please provide some pointers on what and where to focus on in my 
> Hector / Cassandra setup? We are mostly on the default Cassandra 
> configuration at this time - only change is the max connection pool size 
> (CassandraHostConfigurator.maxActive) is changed to 300 from a default of 50.
> 
> I would also like to add that the time increase is not linear - it starts 
> fast, goes, slow, very slow, and becomes faster again.
> 
> 
>    25%     29
>    50%    105
>    66%    185
>    70%    208
>    75%    240
>    80%    297
>    90%    510
>    95%    854
>    98%   1075
>    99%   1215
>   100%   4442
> 
> 
> -- 
> Roshan
> Blog: http://roshandawrani.wordpress.com/
> Twitter: @roshandawrani
> Skype: roshandawrani
> 
> 



Re: Modeling Multi-Valued Fields

2011-03-10 Thread aaron morton
Two approaches here.

First the "many columns" approach. Have a super column called Email; for each 
email address, store the type as the column name and the email address as the 
column value. In Cassandra you can store information in the column names as well 
as the column values, and you do not need to know the column names to read them 
back; see get_slice() in the API: http://wiki.apache.org/cassandra/API 

Slight variation is to use a standard CF and pack the column names, e.g. 
"email.home"  or "email.work" as column names. (Mentioned for completeness, not 
the best approach)

Second the "few columns" approach. Pack all the email addresses for the 
customer into something like a JSON document and store that in one field, using 
a standard CF for the user. 

Slight variation is to pack almost everything for the User into a JSON doc and 
store that. 

If you are always pulling back all the data for the user, and you will always 
want to update all the data at once then consider trying the second approach. 
Otherwise try the first. 

IMHO it's better to pull back a bit more data than is needed to the client 
(e.g. all their data or all their email addresses) than it is to optimize to 
read just one particular field. The overall goal here is to optimize your 
storage model to support read requests, even if it means duplication and 
de-normalisation. 
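As a sketch, the packed document in the second approach could be a single
column value like the following (field names are hypothetical):

{
  "name": "Kim",
  "emails": [
    {"type": "home", "address": "kim@home.example"},
    {"type": "work", "address": "kim@work.example"}
  ]
}

The trade-off is that updating one address means reading, rewriting and
re-storing the whole document.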

Hope that helps. 
Aaron

 
On 10 Mar 2011, at 14:43, Cameron Leach wrote:

> Is there a best-practice for modeling multi-valued fields (fields that are 
> repeated or collections of fields)? Our current data model allows for a User 
> to store multiple email addresses:
> 
> User {
>   Integer id; //row key  
>   List<Email> emails;
> 
>   Email {
> String type; //home, work, gmail, hotmail, etc...
> String address;
>   }
> }
> 
> So if I set up a 'User' column family with an 'Email' super column, how would 
> one support multiple email addresses, storing values for the 'type' and 
> 'address' column names? I've seen it suggested to have dynamic column names, 
> but this doesn't seem practical, unless someone can make it more clear how 
> that strategy would work.
> 
> Thanks!
> 
> 



how to force a GC in cronjob to free up disk space?

2011-03-10 Thread Karl Hiramoto
Reading the FAQ  http://wiki.apache.org/cassandra/FAQ

"SSTables that are obsoleted by a compaction are deleted asynchronously
when the JVM performs a GC. You can force a GC from jconsole if necessary"

How can I force the GC from a simple Java command line? Is this a bad idea?

Since I have lots of inserts and deletes, I could save 2-3X of my disk
requirements, about 40GB to 80GB, which is useful on hosting where disk is
expensive.

Thanks,

Karl
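One way to do this without jconsole is a tiny JMX client that invokes the
same gc() operation jconsole's "Perform GC" button calls. A minimal sketch,
assuming the 0.7-era default JMX port 8080 (adjust host/port to whatever
JMX_PORT your cassandra-env.sh sets); note that forcing full collections
from cron is a blunt instrument and can cause latency spikes while the GC
runs:

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class ForceGC {
    public static void main(String[] args) throws Exception {
        // Cassandra's JMX endpoint; the port comes from JMX_PORT in cassandra-env.sh
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:8080/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            // Invoke the no-arg gc() operation on the JVM's Memory MBean
            mbs.invoke(new ObjectName("java.lang:type=Memory"), "gc", null, null);
        } finally {
            connector.close();
        }
    }
}

Compile that and run it from cron against each node that needs the sweep.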


Re: problem with bootstrap

2011-03-10 Thread mcasandra

mcasandra wrote:
> 
> 
> aaron morton wrote:
>> 
>> 
>> The issue I think you and Patrik are seeing occurs when you *remove*
>> nodes from the ring. The ring does not know if they are up or down. E.g.
>> you have a ring of 3 nodes, and add a keyspace with RF 3. Then for
>> whatever reason 2 nodes are removed from the ring. When bootstrapping a
>> node into this ring it will fail because it detects the cluster does not
>> have enough *endpoints* (different to up nodes) to support the keyspace. 
>> 
>> 
> Thanks for the extra info. However, I am still not understanding why I am
> running into this situation, since this node was once up like the other
> node. In your previous post you mentioned that the node got removed. I am
> trying to understand what that really means and what causes a node to be
> removed. All I did was kill -9 and then sudo cassandra to start the node.
> 
I am still trying to see how to find the root cause of this behaviour. I
wonder, if this were to happen in production, how we would debug it and what
we would do :(



Re: Cassandra LongType data insertion problem for secondary index usage

2011-03-10 Thread Adi
That was it. Thanks thobbs :-) The queries work as expected now.


-Adi

On Thu, Mar 10, 2011 at 1:01 PM, Tyler Hobbs  wrote:

> I looked again at the original email and noticed that besides the bit-shift
> issue that gets corrected in the next email in the thread, there is another
> problem.  The long is being created in little-endian order instead of
> big-endian.
>
> Here's the fully correct way to pack a long:
>
> int64_t my_long = 12345678;
> char chars[8];
> for(int i = 7; i >= 0; i--) {
>     chars[i] = my_long & 0xff;
>     my_long = my_long >> 8;
> }
>
> std::string str_long(chars, 8);
>
> Column c1;
> c1.name = str_long;
> // etc ...
>
>
> On Thu, Mar 10, 2011 at 11:05 AM, Adi  wrote:
>
>> Environment: Cassandra 0.7.0 , C++ Thrift client on windows
>>
>> I have a column family with a secondary index
>>  ColumnFamily: Page
>>   Columns sorted by: org.apache.cassandra.db.marshal.BytesType
>>   Built indexes: [Page.index_domain, Page.index_content_size]
>>   Column Metadata:
>> Column Name: domain (646f6d61696e)
>>   Validation Class: org.apache.cassandra.db.marshal.UTF8Type
>>   Index Name: index_domain
>>   Index Type: KEYS
>> Column Name: original_content_size
>> (6f726967696e616c5f636f6e74656e745f73697a65)
>>   Validation Class: org.apache.cassandra.db.marshal.LongType
>>   Index Name: index_content_size
>>   Index Type: KEYS
>>
>> As suggested by thobbs in an earlier posting I am sending the
>> original_content_size as binary strings. I am able to write and read from
>> the c++ client correctly.
>> But on the cassandra-cli I am not able to see the values of
>> original_content_size as longs. The following are the results seen for a
>> value of 5 that was sent.
>>
>> get Page['test1234'][original_content_size];
>> => (column=6f726967696e616c5f636f6e74656e745f73697a65,
>> value=360287970189639680, timestamp=1299773217120)
>>
>> get Page['test1234'][original_content_size] as bytes;
>> => (column=6f726967696e616c5f636f6e74656e745f73697a65,
>> value=0500000000000000, timestamp=1299773217120)
>>
>> Similarly the queries do not work as expected. Example get Page where
>> domain = 'testabc.com' and original_content_size = 5; does not return the
>> row that was inserted.
>>
>> Any suggestions on what I might be doing incorrectly either in schema
>> definition or the way I am sending the values are welcome.
>>
>> -Adi
>>
>>
>
> --
> Tyler Hobbs
> Software Engineer, DataStax 
> Maintainer of the pycassa  Cassandra
> Python client library
>
>


Re: Understanding index builds (updated: crashed cluster)

2011-03-10 Thread Jonathan Ellis
If you read the bugs I linked, you would see that this is expected
behavior with 0.7.3 once you get more data than you can index
in-memory.

You should wait for the next Hudson build (which will include 2295)
and use that.  Or, create your indexes before adding the data.

On Thu, Mar 10, 2011 at 12:26 PM, Matt Kennedy  wrote:
> Well it looks like the index creation job crashed the cluster.  All of the
> nodes were down having dumped out .hprof files.  I brought the cluster back
> up and when I do "describe keyspace ks" it looks like the index build
> process has started over again.  Is it safe to attempt to stop that by
> running an "update column family" command with fewer indexes defined?  Or is
> there a better way to safely terminate this index creation process that I
> assume will crash the cluster again eventually?
>
> Would creating the indexes one at a time help? Or will the same problem
> occur once I get to a certain number of indexes on the column family?
>
> Thanks,
> Matt
>
> On Wed, Mar 9, 2011 at 8:40 PM, Jonathan Ellis  wrote:
>>
>> https://issues.apache.org/jira/browse/CASSANDRA-2294
>> https://issues.apache.org/jira/browse/CASSANDRA-2295
>>
>> On Wed, Mar 9, 2011 at 5:47 PM, Matt Kennedy  wrote:
>> > I'm trying to gain some insight into what happens with a cluster when
>> > indexes are being built, or when CFs with indexed columns are being
>> > written
>> > to.
>> >
>> > Over the past couple of days we've been doing some loads into a CF with
>> > 29
>> > indexed columns.  Eventually, the nodes just got overwhelmed and the
>> > client
>> > (Hector) started getting timeouts.  We were using using a MapReduce job
>> > to
>> > load an HDFS file into Cassandra, though we had limited the load job to
>> > one
>> > task per node.  My confusion comes from how difficult it was to know
>> > that
>> > the nodes were becoming overwhelmed.  The ring consistently reported
>> > that
>> > all nodes were up and it did not appear that there were pending
>> > operations
>> > under tpstats.  I also monitor this cluster with Ganglia, and at no
>> > point
>> > did any of the machine loads appear very high at all, yet our job kept
>> > failing with Hector reporting timeouts.
>> >
>> > Today we decided to leave index creation until the end, and just load
>> > the
>> > data using the same Hector code.  We bumped up the hadoop concurrency to
>> > two
>> > concurrent tasks per node, and everything went fine, as expected, we've
>> > done
>> > much larger loads than this using Hadoop and as long as you don't shoot
>> > for
>> > too much concurrency, Cassandra can deal with it.  So now we have the
>> > data
>> > in the column family and I updated the column family metadata in the CLI
>> > to
>> > enable the 29 indexes.  As soon as I do that, the ring starts reporting
>> > that
>> > nodes are down intermittently, and HintedHandoffs are starting to
>> > accumulate
>> > under tpstats. Ganglia is reporting very low overall load, so I'm
>> > wondering
>> > why it's taking so long for cli and nodetool commands to return.
>> >
>> > I'm just trying to get a better handle on what kind of actions have a
>> > serious impact on cluster availability and to know the right places to
>> > look
>> > to try to get ahead of those conditions.
>> >
>> > Thanks for any insight you can provide,
>> > Matt
>> >
>>
>>
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of DataStax, the source for professional Cassandra support
>> http://www.datastax.com
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Nodes frozen in GC

2011-03-10 Thread Peter Schuller
I think it would be very useful to get to the bottom of this but
without further details (like the asked for GC logs) I'm not sure what
to do/suggest.

It's clear that a single CF with a 64 MB memtable flush threshold, no key
cache or row cache, and some bulk insertion should not, in general, be
causing the problems you are seeing. Especially not with a >5 GB heap size.
I think it is highly likely that there is some little detail/mistake going
on here rather than a fundamental issue. But regardless, it would be nice to
discover what.

-- 
/ Peter Schuller


Re: Understanding index builds (updated: crashed cluster)

2011-03-10 Thread Matt Kennedy
Well it looks like the index creation job crashed the cluster.  All of the
nodes were down having dumped out .hprof files.  I brought the cluster back
up and when I do "describe keyspace ks" it looks like the index build
process has started over again.  Is it safe to attempt to stop that by
running an "update column family" command with fewer indexes defined?  Or is
there a better way to safely terminate this index creation process that I
assume will crash the cluster again eventually?

Would creating the indexes one at a time help? Or will the same problem
occur once I get to a certain number of indexes on the column family?

Thanks,
Matt

On Wed, Mar 9, 2011 at 8:40 PM, Jonathan Ellis  wrote:

> https://issues.apache.org/jira/browse/CASSANDRA-2294
> https://issues.apache.org/jira/browse/CASSANDRA-2295
>
> On Wed, Mar 9, 2011 at 5:47 PM, Matt Kennedy  wrote:
> > I'm trying to gain some insight into what happens with a cluster when
> > indexes are being built, or when CFs with indexed columns are being
> written
> > to.
> >
> > Over the past couple of days we've been doing some loads into a CF with
> 29
> > indexed columns.  Eventually, the nodes just got overwhelmed and the
> client
> > (Hector) started getting timeouts.  We were using a MapReduce job
> to
> > load an HDFS file into Cassandra, though we had limited the load job to
> one
> > task per node.  My confusion comes from how difficult it was to know that
> > the nodes were becoming overwhelmed.  The ring consistently reported that
> > all nodes were up and it did not appear that there were pending
> operations
> > under tpstats.  I also monitor this cluster with Ganglia, and at no point
> > did any of the machine loads appear very high at all, yet our job kept
> > failing with Hector reporting timeouts.
> >
> > Today we decided to leave index creation until the end, and just load the
> > data using the same Hector code.  We bumped up the hadoop concurrency to
> two
> > concurrent tasks per node, and everything went fine, as expected, we've
> done
> > much larger loads than this using Hadoop and as long as you don't shoot
> for
> > too much concurrency, Cassandra can deal with it.  So now we have the
> data
> > in the column family and I updated the column family metadata in the CLI
> to
> > enable the 29 indexes.  As soon as I do that, the ring starts reporting
> that
> > nodes are down intermittently, and HintedHandoffs are starting to
> accumulate
> > under tpstats. Ganglia is reporting very low overall load, so I'm
> wondering
> > why it's taking so long for cli and nodetool commands to return.
> >
> > I'm just trying to get a better handle on what kind of actions have a
> > serious impact on cluster availability and to know the right places to
> look
> > to try to get ahead of those conditions.
> >
> > Thanks for any insight you can provide,
> > Matt
> >
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>


Re: Cassandra LongType data insertion problem for secondary index usage

2011-03-10 Thread Tyler Hobbs
I looked again at the original email and noticed that besides the bit-shift
issue that gets corrected in the next email in the thread, there is another
problem.  The long is being created in little-endian order instead of
big-endian.

Here's the fully correct way to pack a long:

int64_t my_long = 12345678;
char chars[8];
// Fill the buffer from the last byte backwards so the most significant
// byte lands first: big-endian, which is what LongType expects.
for(int i = 7; i >= 0; i--) {
    chars[i] = my_long & 0xff;  // take the low byte of what's left
    my_long = my_long >> 8;     // shift the next byte down
}

std::string str_long(chars, 8); // exactly 8 bytes, not NUL-terminated

Column c1;
c1.name = str_long;
// etc ...
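(Once the value is packed big-endian, the CLI's typed get should be able to
display it as a long too; if your 0.7 build supports it, something like
"get Page['test1234'][original_content_size] as long;" should print 5. The
"as <type>" form here is assumed from the "as bytes" query quoted below.)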


On Thu, Mar 10, 2011 at 11:05 AM, Adi  wrote:

> Environment: Cassandra 0.7.0 , C++ Thrift client on windows
>
> I have a column family with a secondary index
>  ColumnFamily: Page
>   Columns sorted by: org.apache.cassandra.db.marshal.BytesType
>   Built indexes: [Page.index_domain, Page.index_content_size]
>   Column Metadata:
> Column Name: domain (646f6d61696e)
>   Validation Class: org.apache.cassandra.db.marshal.UTF8Type
>   Index Name: index_domain
>   Index Type: KEYS
> Column Name: original_content_size
> (6f726967696e616c5f636f6e74656e745f73697a65)
>   Validation Class: org.apache.cassandra.db.marshal.LongType
>   Index Name: index_content_size
>   Index Type: KEYS
>
> As suggested by thobbs in an earlier posting I am sending the
> original_content_size as binary strings. I am able to write and read from
> the c++ client correctly.
> But on the cassandra-cli I am not able to see the values of
> original_content_size as longs. The following are the results seen for a
> value of 5 that was sent.
>
> get Page['test1234'][original_content_size];
> => (column=6f726967696e616c5f636f6e74656e745f73697a65,
> value=360287970189639680, timestamp=1299773217120)
>
> get Page['test1234'][original_content_size] as bytes;
> => (column=6f726967696e616c5f636f6e74656e745f73697a65,
> value=0500000000000000, timestamp=1299773217120)
>
> Similarly the queries do not work as expected. Example get Page where
> domain = 'testabc.com' and original_content_size = 5; does not return the
> row that was inserted.
>
> Any suggestions on what I might be doing incorrectly either in schema
> definition or the way I am sending the values are welcome.
>
> -Adi
>
>

-- 
Tyler Hobbs
Software Engineer, DataStax 
Maintainer of the pycassa  Cassandra
Python client library


Re: FW: Very slow batch insert using version 0.7.2

2011-03-10 Thread Ryan King
Why use such a large batch size?

-ryan
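One way to act on that is to cap the batch size on the client and issue
several smaller calls. A minimal sketch (the Consumer callback is a
placeholder for whatever actually sends a batch, e.g. Thrift's batch_mutate;
nothing here is Cassandra-specific):

import java.util.List;
import java.util.function.Consumer;

public class ChunkedWriter {
    // Send mutations in fixed-size slices instead of one ~120K-row batch,
    // so each request stays well within the server's RPC timeout.
    static <T> void sendInChunks(List<T> mutations, int chunkSize,
                                 Consumer<List<T>> sendBatch) {
        for (int i = 0; i < mutations.size(); i += chunkSize) {
            int end = Math.min(i + chunkSize, mutations.size());
            sendBatch.accept(mutations.subList(i, end)); // one request per slice
        }
    }
}

A chunk size in the hundreds or low thousands of rows is a common starting
point; tune it against your timeout and heap.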

On Thu, Mar 10, 2011 at 6:31 AM, Desimpel, Ignace
 wrote:
>
>
> Hello,
>
> I had a demo application with embedded Cassandra version 0.6.x, inserting
> about 120K row mutations in one call.
>
> In version 0.6.x that usually took about 5 seconds, and I could repeat this
> step, adding the same amount of data each time.
>
> Running on a single-CPU computer, single hard disk, XP 32-bit OS, 1G memory.
>
> I tested this again on a CentOS 64-bit OS with 6G memory and different
> settings of memtable_throughput_in_mb and memtable_operations_in_millions.
>
> Also tried version 0.7.3, with the same behavior.
>
>
>
> Now with version 0.7.2 the call returns with a timeout exception even using
> a timeout of two minutes. I see the CPU going to 100%, a lot of disk
> writing (gigabytes), and a lot of log messages about compacting, flushing,
> commitlog, ...
>
>
>
> Below you can find some information from nodetool at the start of the batch
> mutation and also after 14 minutes. The MutationStage clearly shows how
> slowly the system handles the row mutations.
>
>
>
> Attached: cassandra.yaml, with the description of my database structure in
> YAML at the end.
>
> Attached: log file with Cassandra output.
>
>
>
> Any idea what I could be doing wrong?
>
>
>
> Regards,
>
>
>
> Ignace Desimpel
>
>
>
> ignace.desim...@nuance.com
>
>
>
> At start of the insert (after inserting 124360 row mutations) I get the
> following info from nodetool:
>
> C:\apache-cassandra-07.2\bin>nodetool --host ads.nuance.com info
> Starting NodeTool
> 34035877798200531112672274220979640561
> Gossip active    : true
> Load             : 5.49 MB
> Generation No    : 1299502115
> Uptime (seconds) : 1152
> Heap Memory (MB) : 179,84 / 1196,81
>
> C:\apache-cassandra-07.2\bin>nodetool --host ads.nuance.com tpstats
> Starting NodeTool
> Pool Name              Active   Pending  Completed
> ReadStage                   0         0      40637
> RequestResponseStage        0         0         30
> MutationStage              32    121679      72149
> GossipStage                 0         0          0
> AntiEntropyStage            0         0          0
> MigrationStage              0         0          1
> MemtablePostFlusher         0         0          6
> StreamStage                 0         0          0
> FlushWriter                 0         0          5
> MiscStage                   0         0          0
> FlushSorter                 0         0          0
> InternalResponseStage       0         0          0
> HintedHandoff               0         0          0
>
>
>
> After 14 minutes (timeout exception after 2 minutes; see log file) I get:
>
> C:\apache-cassandra-07.2\bin>nodetool --host ads.nuance.com info
> Starting NodeTool
> 34035877798200531112672274220979640561
> Gossip active    : true
> Load             : 10.31 MB
> Generation No    : 1299502115
> Uptime (seconds) : 2172
> Heap Memory (MB) : 733,82 / 1196,81
>
> C:\apache-cassandra-07.2\bin>nodetool --host ads.nuance.com tpstats
> Starting NodeTool
> Pool Name              Active   Pending  Completed
> ReadStage                   0         0      40646
> RequestResponseStage        0         0         30
> MutationStage              32    103310      90526
> GossipStage                 0         0          0
> AntiEntropyStage            0         0          0
> MigrationStage              0         0          1
> MemtablePostFlusher         0         0         69
> StreamStage                 0         0          0
> FlushWriter                 0         0         68
> FILEUTILS-DELETE-POOL       0         0         42
> MiscStage                   0         0          0
> FlushSorter                 0         0          0
> InternalResponseStage       0         0          0
> HintedHandoff               0         0          0
>
>
>
>


Cassandra LongType data insertion problem for secondary index usage

2011-03-10 Thread Adi
Environment: Cassandra 0.7.0, C++ Thrift client on Windows

I have a column family with a secondary index
 ColumnFamily: Page
  Columns sorted by: org.apache.cassandra.db.marshal.BytesType
  Built indexes: [Page.index_domain, Page.index_content_size]
  Column Metadata:
Column Name: domain (646f6d61696e)
  Validation Class: org.apache.cassandra.db.marshal.UTF8Type
  Index Name: index_domain
  Index Type: KEYS
Column Name: original_content_size
(6f726967696e616c5f636f6e74656e745f73697a65)
  Validation Class: org.apache.cassandra.db.marshal.LongType
  Index Name: index_content_size
  Index Type: KEYS

As suggested by thobbs in an earlier posting I am sending the
original_content_size as binary strings. I am able to write and read from
the c++ client correctly.
But on the cassandra-cli I am not able to see the values of
original_content_size as longs. The following are the results seen for a
value of 5 that was sent.

get Page['test1234'][original_content_size];
=> (column=6f726967696e616c5f636f6e74656e745f73697a65,
value=360287970189639680, timestamp=1299773217120)

get Page['test1234'][original_content_size] as bytes;
=> (column=6f726967696e616c5f636f6e74656e745f73697a65,
value=0500000000000000, timestamp=1299773217120)

Similarly, the queries do not work as expected. For example, get Page where
domain = 'testabc.com' and original_content_size = 5; does not return the
row that was inserted.

Any suggestions on what I might be doing incorrectly either in schema
definition or the way I am sending the values are welcome.

-Adi


Re: mutator.execute() timings - big variance noted - pointers needed on understanding/improving it

2011-03-10 Thread sridhar basam
Sounds like GC from your description of fast->slow->fast. Collect GC times
from both the client and server side and plot against your application
timing.

 If you uncomment the verbose GC entries in the cassandra-env.sh file you
should get timing for the server side, pass in the same arguments for your
client. Align time across the 3 files and plot to see if GC is the cause.
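For the server side, the GC logging block in the 0.7-era cassandra-env.sh
looks roughly like this (commented out by default; these are standard HotSpot
flags, though the exact lines in your copy may differ):

# GC logging options -- uncomment to enable
# JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
# JVM_OPTS="$JVM_OPTS -XX:+PrintGCTimeStamps"
# JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"
# JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"

Uncomment them and restart; pass the same -XX flags (with a different
-Xloggc path) on the client JVM's command line.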

 Sridhar



On Thu, Mar 10, 2011 at 9:30 AM, Roshan Dawrani wrote:

> Hi,
>
> I am in the middle of some load testing on a 1-node Cassandra setup. We are
> not on very high loads yet. We have recorded the timings taken up by
> mutator.execute() calls and we see this kind of variation during the test
> run:
>
> So, 25% of the time, execute() calls come back within about 29 milliseconds,
> but the longest calls go up to ~4.4 seconds.
>
> Can someone please provide some pointers on what and where to focus on in
> my Hector / Cassandra setup? We are mostly on the default Cassandra
> configuration at this time - only change is the max connection pool size
> (CassandraHostConfigurator.maxActive) is changed to 300 from a default of
> 50.
>
> I would also like to add that the time increase is not linear - it starts
> fast, goes slow, then very slow, and becomes faster again.
>
> 
>   25% 29
>   50%105
>   66%185
>   70%208
>   75%240
>   80%297
>   90%510
>   95%854
>   98%   1075
>   99%   1215
>  100%   4442
> 
>
> --
> Roshan
> Blog: http://roshandawrani.wordpress.com/
> Twitter: @roshandawrani 
> Skype: roshandawrani
>
>


Re: problem with bootstrap

2011-03-10 Thread Patrik Modesto
Hi,

I'm still fighting the
Exception in thread "main" java.lang.IllegalStateException:
replication factor (3) exceeds number of endpoints (2).

When I have a 2-server cluster, create Keyspace with RF 3, I'm able to
add (without auto_bootstrap) another node but cluster nodetool
commands don't work and fail with the exception above. The new node
serve data but I can't do loadbalance, decommission or move on the
cluster.

Patrik


mutator.execute() timings - big variance noted - pointers needed on understanding/improving it

2011-03-10 Thread Roshan Dawrani
Hi,

I am in the middle of some load testing on a 1-node Cassandra setup. We are
not on very high loads yet. We have recorded the timings taken up by
mutator.execute() calls and we see this kind of variation during the test
run:

So, 25% of the time, execute() calls come back within about 29 milliseconds,
but the longest calls go up to ~4.4 seconds.

Can someone please provide some pointers on what and where to focus on in my
Hector / Cassandra setup? We are mostly on the default Cassandra
configuration at this time - the only change is that the max connection pool
size (CassandraHostConfigurator.maxActive) has been raised to 300 from the
default of 50.

I would also like to add that the time increase is not linear - it starts
fast, goes slow, then very slow, and becomes faster again.


  25% 29
  50%105
  66%185
  70%208
  75%240
  80%297
  90%510
  95%854
  98%   1075
  99%   1215
 100%   4442


-- 
Roshan
Blog: http://roshandawrani.wordpress.com/
Twitter: @roshandawrani 
Skype: roshandawrani
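For reference, the pool-size change described above is typically made through
Hector's CassandraHostConfigurator. A sketch against the Hector 0.7-era API
(package names and the exact setter may vary between Hector releases; host,
cluster and keyspace names are placeholders):

import me.prettyprint.cassandra.service.CassandraHostConfigurator;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;

public class PoolSetup {
    public static Keyspace connect() {
        CassandraHostConfigurator hostConfig =
                new CassandraHostConfigurator("localhost:9160");
        hostConfig.setMaxActive(300); // max pooled connections per host (default 50)
        Cluster cluster = HFactory.getOrCreateCluster("TestCluster", hostConfig);
        return HFactory.createKeyspace("MyKeyspace", cluster);
    }
}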


On 0.6.6 to 0.7.3 migration, DC-aware traffic and minimising data transfer

2011-03-10 Thread Jedd Rashbrooke
 Howdi,

 Assortment of questions relating to an upgrade combined with a
 possible migration between Data Centers (or perhaps a multi-DC
 redesign).  Apologies if some of these have been asked before - I
 have kept half an eye on the list in recent times but haven't seen
 anything covering these particular aspects.


 Upgrade path:
 We're running a 16 node cluster on Amazon EC2, in a single DC
 (US) using 0.6.6.  We didn't do the 0.6.x upgrades mostly because
 things have 'just worked' (and it took a while to get to that stage).
 My question is whether it's considered safer to upgrade via 0.6.12
 to 0.7, or if a direct 0.6.6 -> 0.7 upgrade is safe enough?


 Copying a cluster between AWS DC's:
 We have ~ 150-250GB per node, with a Replication Factor of 4.
 I ack that 0.6 -> 0.7 is necessarily STW, so in an attempt to
 minimise that outage period I was wondering if it's possible to
 drain & stop the cluster, then copy over only the 1st, 5th, 9th,
 and 13th nodes' worth of data (which should be a full copy of
 all our actual data - we are nicely partitioned, despite the
 disparity in GB per node) and have Cassandra re-populate the
 new destination 16 nodes from those four data sets.  If this is
 feasible, is it likely to be more expensive (in terms of time the
 new cluster is unresponsive as it rebuilds) than just copying
 across all 16 sets of data - about 2.7TB.


 Chattiness / gossip traffic requirements on DC-aware:
 I haven't pondered deeply on a 7 design yet, so this question is
 even more nebulous.  We're seeing growth (raw) of about 100GB
 per month on our 16 node RF4 cluster - say about 25GB of 'actual'
 data growth.  We don't delete (much) data.  Amazon's calculator
 suggests even 100GB in/out of a data center is modestly priced,
 but I'm cautious in case the replication traffic is particularly chatty
 or excessive.  And how expensive (in terms of traffic) a compaction
 or repair would be across data centers.  Has anyone had any
 experience with an EC2 cluster running 0.7 and traversing the
 pond?  Either in terms of traffic to cluster size, or $-cost to cluster
 size ratios would be fantastic.

 taa,
 Jedd.


Re: understanding tombstones

2011-03-10 Thread Sylvain Lebresne
2011/3/10 Wangpei (Peter) 

> My question:
> What would the client get when the following happens (RF=3, N=3):
> 1. write with timestamp T, which succeeds on all nodes.
> 2. delete with timestamp T+1 at CL=Q, which succeeds on node1 and node2 but
> fails on node3.
> 3. force flush + compaction.
> 4. read at CL=Q.
>
> Will the client get the row, and will read repair "fix" the data?
> If not, how does Cassandra prevent this?
>

No, the client won't get the row, unless you have set an unreasonable value
for gc_grace_seconds (like the 0 in the example in the first mail of this
thread, which is *not* a value you should use). Cassandra prevents this with
the gc_grace_seconds option; see
http://wiki.apache.org/cassandra/DistributedDeletes for more details.
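To spell out the quorum arithmetic in the example above (a sketch, not the
output of any tool):

T    : write X            -> node1, node2, node3 all hold X@T
T+1  : delete X at CL=Q   -> tombstone@T+1 on node1 and node2; node3 missed it
read at CL=Q              -> must reach 2 of 3 replicas, so at least one of
                             node1/node2 answers; tombstone@T+1 beats X@T, the
                             row reads as deleted, and read repair pushes the
                             tombstone to node3
gc_grace_seconds          -> how long tombstones are kept; node3 must be
                             repaired within that window, or the delete can be
                             forgotten and X resurrected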

--
Sylvain



>
> -----Original Message-----
> From: Jonathan Ellis [mailto:jbel...@gmail.com]
> Sent: March 10, 2011 10:19
> To: user@cassandra.apache.org
> Subject: Re: understanding tombstones
>
> On Wed, Mar 9, 2011 at 4:54 PM, Jeffrey Wang  wrote:
> > insert row X with timestamp T
> > delete row X with timestamp T+1
> > force flush + compaction
> > insert row X with timestamp T
> >
> > My understanding is that the tombstone created by the delete (and row X)
> > will disappear with the flush + compaction which means the last insertion
> > should show up.
>
> Right.
>
> > I believe I have traced this to the fact that the markedForDeleteAt field
> on
> > the ColumnFamily does not get reset after a compaction (after
> > gc_grace_seconds has passed); is this desirable? I think it introduces an
> > inconsistency in how tombstoned columns work versus tombstoned CFs.
> Thanks.
>
> That does sound like a bug.  Can you create a ticket?
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>


Re: understanding tombstones

2011-03-10 Thread Wangpei (Peter)
My question:
What would the client get when the following happens (RF=3, N=3):
1. write with timestamp T, which succeeds on all nodes.
2. delete with timestamp T+1 at CL=Q, which succeeds on node1 and node2 but
fails on node3.
3. force flush + compaction.
4. read at CL=Q.

Will the client get the row, and will read repair "fix" the data?
If not, how does Cassandra prevent this?

-----Original Message-----
From: Jonathan Ellis [mailto:jbel...@gmail.com]
Sent: March 10, 2011 10:19
To: user@cassandra.apache.org
Subject: Re: understanding tombstones

On Wed, Mar 9, 2011 at 4:54 PM, Jeffrey Wang  wrote:
> insert row X with timestamp T
> delete row X with timestamp T+1
> force flush + compaction
> insert row X with timestamp T
>
> My understanding is that the tombstone created by the delete (and row X)
> will disappear with the flush + compaction which means the last insertion
> should show up.

Right.

> I believe I have traced this to the fact that the markedForDeleteAt field on
> the ColumnFamily does not get reset after a compaction (after
> gc_grace_seconds has passed); is this desirable? I think it introduces an
> inconsistency in how tombstoned columns work versus tombstoned CFs. Thanks.

That does sound like a bug.  Can you create a ticket?

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com