Re: Upgrading Cassandra

2014-04-10 Thread Robert Coli
On Thu, Apr 10, 2014 at 3:52 PM, Tyler Hobbs  wrote:

>
> Given the complexity of it and the multiple places this could give you
> trouble if you're not careful, I wouldn't suggest it as a general best
> practice.
>

+1. Any plan that contains an extended period of split major version
operation is Not Supported and I therefore do not recommend it.

=Rob


Re: Multiget performance

2014-04-10 Thread Tyler Hobbs
On Thu, Apr 10, 2014 at 6:26 PM, Allan C  wrote:

>
> Looks like the amount of data returned has a big effect. When I only
> return one column, python reports only 20ms compared to 150ms when
> returning the whole row. Rows are each less than 1k in size, but there must
> be client overhead.
>

That's a surprising amount of overhead in pycassa.  What's your schema like
for this CF?


-- 
Tyler Hobbs
DataStax 


Re: Upgrading Cassandra

2014-04-10 Thread Tyler Hobbs
On Thu, Apr 10, 2014 at 4:03 PM, Alain RODRIGUEZ  wrote:

> Would you guys consider this way of upgrading a "best practice" for
> achieving a safe major release upgrade in the cloud (where you can easily add
> clusters and remove old ones)?


Given the complexity of it and the multiple places this could give you
trouble if you're not careful, I wouldn't suggest it as a general best
practice.  My "best practice" suggestion would be to have a pre-production
environment where you can test the upgrade, and then do a normal rolling
upgrade in production.  (I'm sure ops guys have a bag of different
preferred techniques.)


-- 
Tyler Hobbs
DataStax 


Re: How to replace cluster name without any impact?

2014-04-10 Thread Robert Coli
On Wed, Apr 9, 2014 at 10:50 PM, Mark Reddy  wrote:
>
> Please be aware that you will have two partial clusters until you complete
> your rolling restart. Also considering that the cluster name is only a
> cosmetic value my opinion would be to leave it, as the risk far outweighs
> the benefits of changing it.
>

+1, though you should of course avoid having other clusters (e.g. a dev
version) with the same name, which is the case the cluster name exists to
protect you from.

@OP: did you get the bogus default cluster name by installing a debian
package which auto-started? If so, your input as an operator who experienced
a negative consequence from this behavior is welcome at:

https://issues.apache.org/jira/browse/CASSANDRA-2356

=Rob


Re: Point in Time Recovery

2014-04-10 Thread Robert Coli
On Thu, Apr 10, 2014 at 1:19 AM, Dennis Schwan wrote:

> do you know any description how to perform a point-in-time recovery
> using the archived commitlogs?
> We have already tried several things but it just did not work.
>

Are you restoring the entire *cluster* to a point in time, or a given node?
And why?

The only people who are likely to have any experience/expertise with that
archived commitlog stuff are the people from Netflix who contributed it.

=Rob


Re: Upgrading Cassandra

2014-04-10 Thread Alain RODRIGUEZ
Thanks for this confirmation Tyler.

Would you guys consider this way of upgrading a "best practice" for
achieving a safe major release upgrade in the cloud (where you can easily add
clusters and remove old ones)?

I am seriously thinking about giving it a try for our upcoming 1.2 to 2.0
migration and see how things behave.

See you around,

Alain


2014-04-10 0:54 GMT+02:00 Tyler Hobbs :

>
> On Tue, Apr 8, 2014 at 4:39 AM, Alain RODRIGUEZ wrote:
>
>>
>> Yet, can't we rebuild a new DC with the current C* version, upgrade it to
>> the new major once it is fully part of the C* cluster, and then switch all
>> the clients to the new DC once we are sure everything is ok and shut down
>> the old one?
>>
>
> Yes
>
>
>>
>> I mean, on a multi-DC setup, while upgrading, there must be a moment when
>> the 2 DCs don't have the same major version, so this is probably supported.
>>
>
> It is supported; you just don't want to add/remove nodes, run repairs,
> etc. with a mixed-version cluster.
>
>
> --
> Tyler Hobbs
> DataStax 
>


Re: Commitlog questions

2014-04-10 Thread Russell Hatch
>
>  If the commitlog is in periodic mode and the fsync happens every 10
> seconds, Cassandra is storing the stuff that needs to be sync'd somewhere
> for a period of 10 seconds.  I'm talking about before it even hits any
> disk.  This has to be in memory, correct?


The information you are referring to is stored in the OS page cache[1] so
it's not part of Cassandra's memory, though I imagine Cassandra will keep a
small handle of some kind on the mutation for making the system fsync[2]
call when appropriate.

[1] http://en.wikipedia.org/wiki/Page_cache
[2] http://linux.die.net/man/2/fsync

Thanks,

Russ
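The write-then-fsync pattern described above can be sketched in a few lines of Python (a toy temp file stands in for a commitlog segment; this is an illustration of the page-cache behavior, not Cassandra's actual code):

```python
import os
import tempfile

# Sketch of the behavior described above: write() puts bytes in the OS
# page cache immediately; they only become durable once fsync() flushes
# them to the device. Cassandra's periodic commitlog mode does the same,
# fsyncing its segment every commitlog_sync_period_in_ms.
fd, path = tempfile.mkstemp()
os.write(fd, b"mutation bytes")  # in the page cache, not yet durable
os.fsync(fd)                     # now durable (barring device write caches)
os.close(fd)

with open(path, "rb") as f:      # read back to show the bytes landed
    data = f.read()
os.remove(path)
print(data)
```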


On Thu, Apr 10, 2014 at 1:11 PM, Parag Patel wrote:

> Oleg,
>
> Thanks for the response.  If the commitlog is in periodic mode and the
> fsync happens every 10 seconds, Cassandra is storing the stuff that needs
> to be sync'd somewhere for a period of 10 seconds.  I'm talking about
> before it even hits any disk.  This has to be in memory, correct?
>
> Parag
>
> -Original Message-
> From: Oleg Dulin [mailto:oleg.du...@gmail.com]
> Sent: Wednesday, April 09, 2014 10:42 AM
> To: user@cassandra.apache.org
> Subject: Re: Commitlog questions
>
> Parag:
>
> To answer your questions:
>
> 1) Default is just that, a default. I wouldn't advise raising it though.
> The bigger it is the longer it takes to restart the node.
> 2) I think they just use fsync. There is no queue. All files in cassandra
> use java.nio buffers, but they need to be fsynced periodically. Look at
> commitlog_sync parameters in cassandra.yaml file, the comments there
> explain how it works. I believe the difference between periodic and batch
> is just that -- if it is periodic, it will fsync every 10 seconds, if it is
> batch it will fsync if there were any changes within a time window.
>
> On 2014-04-09 10:06:52 +, Parag Patel said:
>
> >
> > 1)  Why is the default 4GB?  Has anyone changed this? What are
> > some aspects to consider when determining the commitlog size?
> > 2)  If the commitlog is in periodic mode, there is a property
> > to set a time interval to flush the incoming mutations to disk.
> > This implies that there is a queue inside Cassandra to hold this
> > data in memory until it is flushed.
> > a.   Is there a name for this queue?
> > b.  Is there a limit for this queue?
> > c.   Are there any tuning parameters for this queue?
> >
> > Thanks,
> > Parag
>
>
> --
> Regards,
> Oleg Dulin
> http://www.olegdulin.com
>
>
>

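For reference, the periodic/batch distinction Oleg describes is controlled by these cassandra.yaml settings (values below are the 1.2-era defaults; double-check your version's yaml, and note that in batch mode writes are not acknowledged until the fsync completes):

```yaml
# Periodic mode (default): acknowledge writes immediately, fsync the
# commitlog segment in the background every sync period.
commitlog_sync: periodic
commitlog_sync_period_in_ms: 10000   # the 10-second window discussed above

# Batch mode (alternative): group incoming mutations for up to the batch
# window, fsync, and only then acknowledge the writes.
# commitlog_sync: batch
# commitlog_sync_batch_window_in_ms: 50
```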

Re: Cassandra memory consumption

2014-04-10 Thread DuyHai Doan
"what portion of the above is in the memtable ?"  --> partition key +
clustering key + stored data + memtable data structure size (actually it is
a ConcurrentSkipListMap so I guess there is some overhead with the data
structure)

If the data has been "flushed" to disk (data directory) the memtable is
again empty...

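A toy back-of-the-envelope helper for Parag's numbers below (the 64-byte overhead constant is a guess standing in for the skip-list/cell bookkeeping, not a measured figure):

```python
# Rough estimate of a single insert's memtable footprint, per the
# breakdown above: partition key + clustering key + stored data +
# data-structure overhead. OVERHEAD_BYTES is an assumed constant for the
# ConcurrentSkipListMap entry and cell bookkeeping, not a measured value.
OVERHEAD_BYTES = 64

def memtable_footprint(partition_key_len, clustering_key_len, data_len,
                       overhead=OVERHEAD_BYTES):
    """Bytes one insert roughly occupies while it sits in the memtable."""
    return partition_key_len + clustering_key_len + data_len + overhead

# 8-byte partition key, 20-byte clustering key, 150-byte blob
print(memtable_footprint(8, 20, 150))  # -> 242 with the assumed overhead
```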

On Thu, Apr 10, 2014 at 10:10 PM, Parag Patel wrote:

>  If I'm inserting the following :
>
>
>
> Partition key = 8 byte String
>
> Clustering key = 20 byte String
>
> Stored Data = 150 byte byte[]
>
>
>
> If the insert is still in the memtable, what portion of the above is in
> the memtable?  All of it, or just the keys?  If just the keys, where does
> the stored data live?  (keep in mind in this scenario the data hasn't been
> purged to the data directory.  It's only been added to the commit log).
>
>
>
> Parag
>
>
>
> *From:* DuyHai Doan [mailto:doanduy...@gmail.com]
> *Sent:* Thursday, April 10, 2014 3:35 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Cassandra memory consumption
>
>
>
> Data structures that are stored off heaps:
>
> 1) Row cache (if JNA enabled, otherwise on heap)
>
> 2) Bloom filter
>
> 3) Compression offset
>
> 4) Key Index sample
>
> On heap:
>
>  1) Memtables
>
>  2) Partition Key cache
>
> Hope that I did not forget anything
>
>  Regards
>
>  Duy Hai DOAN
>
>
>
> On Thu, Apr 10, 2014 at 9:13 PM, Parag Patel 
> wrote:
>
> We're using Cassandra 1.2.12.  What aspects of the data is stored in off
> heap memory vs heap memory?
>
>
>


RE: Cassandra memory consumption

2014-04-10 Thread Parag Patel
If I'm inserting the following :

Partition key = 8 byte String
Clustering key = 20 byte String
Stored Data = 150 byte byte[]

If the insert is still in the memtable, what portion of the above is in the 
memtable?  All of it, or just the keys?  If just the keys, where does the 
stored data live?  (keep in mind in this scenario the data hasn't been purged 
to the data directory.  It's only been added to the commit log).

Parag

From: DuyHai Doan [mailto:doanduy...@gmail.com]
Sent: Thursday, April 10, 2014 3:35 PM
To: user@cassandra.apache.org
Subject: Re: Cassandra memory consumption

Data structures that are stored off heaps:
1) Row cache (if JNA enabled, otherwise on heap)
2) Bloom filter
3) Compression offset
4) Key Index sample
On heap:
 1) Memtables
 2) Partition Key cache
Hope that I did not forget anything
 Regards

 Duy Hai DOAN

On Thu, Apr 10, 2014 at 9:13 PM, Parag Patel 
mailto:ppa...@clearpoolgroup.com>> wrote:
We're using Cassandra 1.2.12.  What aspects of the data is stored in off heap 
memory vs heap memory?



Re: Cassandra memory consumption

2014-04-10 Thread DuyHai Doan
Data structures that are stored off heaps:

1) Row cache (if JNA enabled, otherwise on heap)
2) Bloom filter
3) Compression offset
4) Key Index sample

On heap:

 1) Memtables
 2) Partition Key cache

Hope that I did not forget anything

 Regards

 Duy Hai DOAN


On Thu, Apr 10, 2014 at 9:13 PM, Parag Patel wrote:

>  We're using Cassandra 1.2.12.  What aspects of the data is stored in off
> heap memory vs heap memory?
>


Re: binary protocol server side sockets

2014-04-10 Thread Eric Plowe
I am having the exact same issue. I see the connections pile up and pile
up, but they never seem to come down. Any insight into this would be
amazing.


Eric Plowe


On Wed, Apr 9, 2014 at 4:17 PM, graham sanderson  wrote:

> Thanks Michael,
>
> Yup keepalive is not the default. It is possible they are going away after
> nf_conntrack_tcp_timeout_established; will have to do more digging (it is
> hard to tell how old a connection is - there are no visible timers (thru
> netstat) on an ESTABLISHED connection)...
>
> This is actually low on my priority list, I was just spending a bit of
> time trying to track down the source of
>
> ERROR [Native-Transport-Requests:3833603] 2014-04-09 17:46:48,833
> ErrorMessage.java (line 222) Unexpected exception during request
> java.io.IOException: Connection reset by peer
> at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
> at sun.nio.ch.IOUtil.read(IOUtil.java:192)
> at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
> at
> org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:64)
> at
> org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109)
> at
> org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
> at
> org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
> at
> org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
>
> errors, which are spamming our server logs quite a lot (I originally
> thought this might be caused by KEEPALIVE, which is when I realized that
> the connections weren't in keep alive and were building up) - it would be
> nice if netty told us a little about the Socket channel in the
> error message (maybe there is a way to do this by changing log levels, but
> as I say I haven't had time to go digging there)
>
> I will probably file a JIRA issue to add the setting (since I can't see
> any particular harm to setting keepalive)
>
> On Apr 9, 2014, at 1:34 PM, Michael Shuler  wrote:
>
> > On 04/09/2014 12:41 PM, graham sanderson wrote:
> >> Michael, it is not that the connections are being dropped, it is that
> >> the connections are not being dropped.
> >
> > Thanks for the clarification.
> >
> >> These server side sockets are ESTABLISHED, even though the client
> >> connection on the other side of the network device is long gone. This
> >> may well be an issue with the network device (it is valiantly trying
> >> to keep the connection alive it seems).
> >
> > Have you tested if they *ever* time out on their own, or do they just
> keep sticking around forever? (maybe 432000 sec (120 hours), which is the
> default for nf_conntrack_tcp_timeout_established?) Trying out all the usage
> scenarios is really the way to track it down - directly on switch,
> behind/in front of firewall, on/off the VPN.
> >
> >> That said KEEPALIVE on the server side would not be a bad idea. At
> >> least then the OS on the server would eventually (probably after 2
> >> hours of inactivity) attempt to ping the client. At that point
> >> hopefully something interesting would happen perhaps causing an error
> >> and destroying the server side socket (note KEEPALIVE is also good
> >> for preventing idle connections from being dropped by other network
> >> devices along the way)
> >
> > Tuning net.ipv4.tcp_keepalive_* could be helpful, if you know they
> timeout after 2 hours, which is the default.
> >
> >> rpc_keepalive on the server sets keep alive on the server side
> >> sockets for thrift, and is true by default
> >>
> >> There doesn't seem to be a setting for the native protocol
> >>
> >> Note this isn't a huge issue for us, they can be cleaned up by a
> >> rolling restart, and this particular case is not production, but
> >> related to development/testing against alpha by people working
> >> remotely over VPN - and it may well be the VPNs fault in this case...
> >> that said and maybe this is a dev list question, it seems like the
> >> option to set keepalive should exist.
> >
> > Yeah, but I agree you shouldn't have to restart to clean up connections
> - that's why I think it is lower in the network stack, and that a bit of
> troubleshooting and tuning might be helpful. That setting sounds like a
> good Jira request - keepalive may be the default, I'm not sure. :)
> >
> > --
> > Michael
> >
> >> On Apr 9, 2014, at 12:25 PM, Michael Shuler 
> >> wrote:
> >>
> >>> On 04/09/2014 11:39 AM, graham sanderson wrote:
>  Thanks, but I would think that just sets keep alive from the
>  
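For what it's worth, enabling keepalive on a socket is a one-liner at the OS level; a Python sketch of what such a server-side setting would amount to (TCP_KEEPIDLE is Linux-specific, and the 300-second idle value is arbitrary):

```python
import socket

# SO_KEEPALIVE makes the OS probe an idle peer (after ~2 hours by default,
# per net.ipv4.tcp_keepalive_time); TCP_KEEPIDLE overrides that per socket
# on Linux. A dead peer then gets torn down instead of sitting ESTABLISHED.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
if hasattr(socket, "TCP_KEEPIDLE"):  # Linux-only constant
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 300)

enabled = s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)
print(enabled)  # nonzero once keepalive is on
s.close()
```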

Cassandra memory consumption

2014-04-10 Thread Parag Patel
We're using Cassandra 1.2.12.  What aspects of the data is stored in off heap 
memory vs heap memory?


RE: Commitlog questions

2014-04-10 Thread Parag Patel
Oleg,

Thanks for the response.  If the commitlog is in periodic mode and the fsync 
happens every 10 seconds, Cassandra is storing the stuff that needs to be 
sync'd somewhere for a period of 10 seconds.  I'm talking about before it even 
hits any disk.  This has to be in memory, correct?

Parag

-Original Message-
From: Oleg Dulin [mailto:oleg.du...@gmail.com] 
Sent: Wednesday, April 09, 2014 10:42 AM
To: user@cassandra.apache.org
Subject: Re: Commitlog questions

Parag:

To answer your questions:

1) Default is just that, a default. I wouldn't advise raising it though. The 
bigger it is the longer it takes to restart the node.
2) I think they just use fsync. There is no queue. All files in cassandra use 
java.nio buffers, but they need to be fsynced periodically. Look at 
commitlog_sync parameters in cassandra.yaml file, the comments there explain 
how it works. I believe the difference between periodic and batch is just that 
-- if it is periodic, it will fsync every 10 seconds, if it is batch it will 
fsync if there were any changes within a time window.

On 2014-04-09 10:06:52 +, Parag Patel said:

>  
> 1)  Why is the default 4GB?  Has anyone changed this? What are 
> some aspects to consider when determining the commitlog size?
> 2)  If the commitlog is in periodic mode, there is a property 
> to set a time interval to flush the incoming mutations to disk.  
> This implies that there is a queue inside Cassandra to hold this 
> data in memory until it is flushed.
> a.   Is there a name for this queue?
> b.  Is there a limit for this queue?
> c.   Are there any tuning parameters for this queue?
>  
> Thanks,
> Parag


--
Regards,
Oleg Dulin
http://www.olegdulin.com




Re: Minimum database size and ops/second to start considering Cassandra

2014-04-10 Thread Tim Wintle
On Thu, 2014-04-10 at 11:17 -0700, motta.lrd wrote:
> What is the minimum database size and number of Operations/Second (reads and
> write) for which I should seriously consider this database? 

Significant number of writes / second -> possibly a good use case for
cassandra.


Database size is a difficult one: in theory, if the data fits in memory on
one machine and you can tolerate downtime, you're better off with a single
machine; but then you lose the redundancy/replication across multiple
nodes/racks/datacentres that Cassandra gives you.

Tim



Minimum database size and ops/second to start considering Cassandra

2014-04-10 Thread motta.lrd
Hello everyone,

What is the minimum database size and number of Operations/Second (reads and
write) for which I should seriously consider this database? 
I have recently studied the theoretical aspects of Cassandra, and my
remaining doubts are about what makes a good fit (in terms of database size
and workload) for adopting it in production.

Thank you




--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Minimum-database-size-and-ops-second-to-start-considering-Cassandra-tp7593918.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


C* 1.2.15 Decommission issues

2014-04-10 Thread Russell Bradberry
We have about a 30 node cluster running the latest C* 1.2 series DSE.  One 
datacenter uses VNodes and the other datacenter has VNodes Disabled (because it 
is running DSE-Search)

We have been replacing nodes in the VNode datacenter with faster ones and we 
have yet to have a successful decommission.  Every time we attempt to 
decommission a node we get an “Operation Timed Out” error and the decommission 
fails.  We keep retrying it and sometimes it will work and other times we will 
just give up and force the node removal.  It seems though, that all the data 
has streamed out of the node before the decommission fails.

What exactly does it need to read before leaving that would cause this?  We 
also have noticed that in several nodes after the removal that there are ghost 
entries for the removed node in the system.peers table and this doesn’t get 
removed until we restart Cassandra on that node.

Also, we have noticed that running repairs with VNodes is considerably slower. 
Is this a misconfiguration? Or is it expected that VNodes repairs will be slow?


Here is the stack trace from the decommission failure:

Exception in thread "main" java.lang.RuntimeException: 
org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - 
received only 0 responses.
        at 
org.apache.cassandra.db.HintedHandOffManager.getHintsSlice(HintedHandOffManager.java:578)
        at 
org.apache.cassandra.db.HintedHandOffManager.listEndpointsPendingHints(HintedHandOffManager.java:528)
        at 
org.apache.cassandra.service.StorageService.streamHints(StorageService.java:2925)
        at 
org.apache.cassandra.service.StorageService.unbootstrap(StorageService.java:2905)
        at 
org.apache.cassandra.service.StorageService.decommission(StorageService.java:2866)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:75)
        at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:279)
        at 
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
        at 
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
        at 
com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
        at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
        at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
        at 
com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
        at 
com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
        at 
javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1487)
        at 
javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:97)
        at 
javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1328)
        at 
javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1420)
        at 
javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:848)
        at sun.reflect.GeneratedMethodAccessor26.invoke(Unknown Source)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:322)
        at sun.rmi.transport.Transport$1.run(Transport.java:177)
        at sun.rmi.transport.Transport$1.run(Transport.java:174)
        at java.security.AccessController.doPrivileged(Native Method)
        at sun.rmi.transport.Transport.serviceCall(Transport.java:173)
        at 
sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:556)
        at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:811)
        at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:670)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
Caused by: org.apache.cassandra.exceptions.ReadTimeoutException: Operation 
timed out - received only 0 responses.
        at org.apache.cassandra.service.ReadCallback.get(ReadCallback.java:105)
        at 
org.apache.cassandra.service.StorageProxy.getRangeSlice(StorageProxy.java:

More node imbalance questions

2014-04-10 Thread Oleg Dulin

At a different customer, I have this situation:

10.194.2.5    RAC1    Up    Normal    192.2 GB     50.00%    0
10.194.2.4    RAC1    Up    Normal    348.07 GB    50.00%    127605887595351923798765477786913079295
10.194.2.7    RAC1    Up    Normal    387.31 GB    50.00%    85070591730234615865843651857942052864
10.194.2.6    RAC1    Up    Normal    454.97 GB    50.00%    42535295865117307932921825928971026432


Is my understanding correct that I should just move tokens around by 
proportional amounts to bring the disk utilization in line?
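For reference, with RandomPartitioner the evenly spaced tokens for N nodes are i * 2**127 / N; a quick sketch below. Note the ring above is already within 1 of these values, so the disk imbalance presumably comes from data skew (or compaction/snapshot state) rather than token placement:

```python
RING_SIZE = 2 ** 127  # RandomPartitioner token space

def balanced_tokens(node_count):
    """Evenly spaced initial_token values for a RandomPartitioner ring."""
    return [i * RING_SIZE // node_count for i in range(node_count)]

for token in balanced_tokens(4):
    print(token)
```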


--
Regards,
Oleg Dulin
http://www.olegdulin.com




Re: Point in Time Recovery

2014-04-10 Thread Jonathan Lacefield
Hello,

  Have you tried the procedure documented here:
http://www.datastax.com/documentation/cassandra/1.2/cassandra/configuration/configLogArchive_t.html
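The page above drives commitlog_archiving.properties; a minimal sketch of a point-in-time restore setup (the /backup paths are placeholders, not recommendations):

```properties
# commitlog_archiving.properties (sketch; paths are placeholders)
# %path = full path of the segment, %name = segment file name
archive_command=/bin/cp %path /backup/commitlog/%name
restore_command=/bin/cp -f %from %to
restore_directories=/backup/commitlog
# Replay archived mutations up to (and including) this timestamp:
restore_point_in_time=2014:04:10 12:00:00
```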

Thanks,

Jonathan

Jonathan Lacefield
Solutions Architect, DataStax
(404) 822 3487






On Thu, Apr 10, 2014 at 1:19 AM, Dennis Schwan wrote:

> Hey there,
>
> do you know any description how to perform a point-in-time recovery
> using the archived commitlogs?
> We have already tried several things but it just did not work.
> We have a 20 Node Cluster (10 in each DC).
>
> Thanks in Advance,
> Dennis
>
> --
> Dennis Schwan
>
> Oracle DBA
> Mail Core
>
> 1&1 Internet AG | Brauerstraße 48 | 76135 Karlsruhe | Germany
> Phone: +49 721 91374-8738
> E-Mail: dennis.sch...@1und1.de | Web: www.1und1.de
>
> Hauptsitz Montabaur, Amtsgericht Montabaur, HRB 6484
>
> Vorstand: Ralph Dommermuth, Frank Einhellinger, Robert Hoffmann, Andreas
> Hofmann, Markus Huhn, Hans-Henning Kettler, Uwe Lamnek, Jan Oetjen,
> Christian Würst
> Aufsichtsratsvorsitzender: Michael Scheeren
>
> Member of United Internet
>
> Diese E-Mail kann vertrauliche und/oder gesetzlich geschützte
> Informationen enthalten. Wenn Sie nicht der bestimmungsgemäße Adressat sind
> oder diese E-Mail irrtümlich erhalten haben, unterrichten Sie bitte den
> Absender und vernichten Sie diese Email. Anderen als dem bestimmungsgemäßen
> Adressaten ist untersagt, diese E-Mail zu speichern, weiterzuleiten oder
> ihren Inhalt auf welche Weise auch immer zu verwenden.
>
> This E-Mail may contain confidential and/or privileged information. If you
> are not the intended recipient of this E-Mail, you are hereby notified that
> saving, distribution or use of the content of this E-Mail in any way is
> prohibited. If you have received this E-Mail in error, please notify the
> sender and delete the E-Mail.
>


Re: Commitlog questions

2014-04-10 Thread Panagiotis Garefalakis
The incoming mutations are written per column into a Memtable (an in-memory
cache). The default size for this table is 64MB, if I recall correctly.
For more information take a look here:
https://wiki.apache.org/cassandra/MemtableSSTable
http://wiki.apache.org/cassandra/MemtableThresholds

Regards,
Panagiotis


On Wed, Apr 9, 2014 at 8:44 PM, Robert Coli  wrote:

> On Wed, Apr 9, 2014 at 3:06 AM, Parag Patel wrote:
>
>>   
>>
>
> https://issues.apache.org/jira/browse/CASSANDRA-6764
>
> You might wish to get in contact with the reporter here, who has similar
> questions!
>
> =Rob
>
>


AssertionError as a result of a timeout

2014-04-10 Thread Ben Hood
Hi all,

This is just a follow up to say that this issue is being tracked here:

https://issues.apache.org/jira/browse/CASSANDRA-6796

I managed to work around this issue for my workload by increasing the
write timeout threshold in the server, but YMMV.

Sorry that the original list thread had an empty subject :-(

Cheers,

Ben
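For reference, the knob being raised here lives in cassandra.yaml (2000 ms is the 2.0 default; the raised value below is only illustrative):

```yaml
# cassandra.yaml: raise the coordinator's write timeout so large batches
# don't time out and trip the hint-submission assertion.
# 2000 is the 2.0 default; 10000 here is just an example value.
write_request_timeout_in_ms: 10000
```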

On Wed, Apr 9, 2014 at 11:34 AM, Ben Hood <0x6e6...@gmail.com> wrote:
> Hi all,
>
> I'm getting the following error in a 2.0.6 instance:
>
> ERROR [Native-Transport-Requests:16633] 2014-04-09 10:11:45,811
> ErrorMessage.java (line 222) Unexpected exception during request
> java.lang.AssertionError: localhost/127.0.0.1
> at org.apache.cassandra.service.StorageProxy.submitHint(StorageProxy.java:860)
> at org.apache.cassandra.service.StorageProxy.mutate(StorageProxy.java:480)
> at 
> org.apache.cassandra.service.StorageProxy.mutateWithTriggers(StorageProxy.java:524)
> at 
> org.apache.cassandra.cql3.statements.BatchStatement.executeWithoutConditions(BatchStatement.java:210)
> at 
> org.apache.cassandra.cql3.statements.BatchStatement.execute(BatchStatement.java:203)
> at 
> org.apache.cassandra.cql3.statements.BatchStatement.executeWithPerStatementVariables(BatchStatement.java:192)
> at 
> org.apache.cassandra.cql3.QueryProcessor.processBatch(QueryProcessor.java:373)
> at 
> org.apache.cassandra.transport.messages.BatchMessage.execute(BatchMessage.java:206)
> at 
> org.apache.cassandra.transport.Message$Dispatcher.messageReceived(Message.java:304)
> at 
> org.jboss.netty.handler.execution.ChannelUpstreamEventRunnable.doRun(ChannelUpstreamEventRunnable.java:43)
> at 
> org.jboss.netty.handler.execution.ChannelEventRunnable.run(ChannelEventRunnable.java:67)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
>
> Looking at the source for this, it appears to be related to a timeout:
>
> // local write that time out should be handled by LocalMutationRunnable
> assert !target.equals(FBUtilities.getBroadcastAddress()) : target;
>
> Cursory testing indicates that this occurs during larger batch ingests.
>
> But the error does not appear to be propagated properly back to the
> client and it seems like this could be due to some misconfiguration.
>
> Has anybody seen something like this before?
>
> Cheers,
>
> Ben


Re: Multiget performance

2014-04-10 Thread DuyHai Doan
As far as I understand it, multiget performance is bound by the slowest
node responding to the coordinator.

If you are fetching 100 partitions spread over *n* nodes, the coordinator
will issue requests to those nodes and wait until all the responses come
back before returning the results to the client.

Consequently, if one node among *n* is under heavy load and takes longer to
respond, it will greatly impact the response time of your multiget.

Now, with the introduction of the recent rapid read protection, this
behavior might be mitigated.

 Regards

 Duy Hai DOAN
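The fan-out behavior described above can be simulated in a few lines of Python (the latencies are invented for illustration; the point is that the client sees the max, not the mean):

```python
import concurrent.futures
import time

# Simulate a coordinator fanning a multiget out to several replicas in
# parallel: the client-visible latency tracks the slowest replica.
def query_node(latency_s):
    time.sleep(latency_s)  # stand-in for a replica's read latency
    return latency_s

node_latencies = [0.01, 0.01, 0.05]  # one node under heavy load

start = time.monotonic()
with concurrent.futures.ThreadPoolExecutor() as pool:
    results = list(pool.map(query_node, node_latencies))
elapsed = time.monotonic() - start

# elapsed is ~max(node_latencies), not the average
print(round(elapsed, 2))
```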


On Thu, Apr 10, 2014 at 12:52 AM, Tyler Hobbs  wrote:

> Can you trace the query and paste the results?
>
>
> On Wed, Apr 9, 2014 at 11:17 AM, Allan C  wrote:
>
>> As one CQL statement:
>>
>>  SELECT * from Event WHERE key IN ([100 keys]);
>>
>> -Allan
>>
>> On April 9, 2014 at 12:52:13 AM, Daniel Chia (danc...@coursera.org)
>> wrote:
>>
>> Are you making the 100 calls in serial, or in parallel?
>>
>> Thanks,
>> Daniel
>>
>>
>> On Tue, Apr 8, 2014 at 11:22 PM, Allan C  wrote:
>>
>>>  Hi all,
>>>
>>>  I've always been told that multigets are a Cassandra anti-pattern for
>>> performance reasons. I ran a quick test tonight to prove it to myself, and,
>>> sure enough, slowness ensued. It takes about 150ms to get 100 keys for my
>>> use case. Not terrible, but at least an order of magnitude from what I need
>>> it to be.
>>>
>>>  So far, I've been able to denormalize and not have any problems. Today,
>>> I ran into a use case where denormalization introduces a huge amount of
>>> complexity to the code.
>>>
>>>  It's very tempting to cache a subset in Redis and call it a day --
>>> probably will. But, that's not a very satisfying answer. It's only about
>>> 5GB of data and it feels like I should be able to tune a Cassandra CF to be
>>> within 2x.
>>>
>>>  The workload is around 70% reads. Most of the writes are updates to
>>> existing data. Currently, it's in an LCS CF with ~30M rows. The cluster is
>>> 300GB total with 3-way replication, running across 12 fairly large boxes
>>> with 16G RAM. All on SSDs. Striped across 3 AZs in AWS (hi1.4xlarges, fwiw).
>>>
>>>
>>> Has anyone had success getting good results for this kind of workload?
>>> Or, is Cassandra just not suited for it at all and I should just use an
>>> in-memory store?
>>>
>>>  -Allan
>>>
>>
>>
>
>
> --
> Tyler Hobbs
> DataStax 
>


Point in Time Recovery

2014-04-10 Thread Dennis Schwan
Hey there,

do you know any description how to perform a point-in-time recovery 
using the archived commitlogs?
We have already tried several things but it just did not work.
We have a 20 Node Cluster (10 in each DC).

Thanks in Advance,
Dennis

-- 
Dennis Schwan

Oracle DBA
Mail Core

1&1 Internet AG | Brauerstraße 48 | 76135 Karlsruhe | Germany
Phone: +49 721 91374-8738
E-Mail: dennis.sch...@1und1.de | Web: www.1und1.de

Hauptsitz Montabaur, Amtsgericht Montabaur, HRB 6484

Vorstand: Ralph Dommermuth, Frank Einhellinger, Robert Hoffmann, Andreas 
Hofmann, Markus Huhn, Hans-Henning Kettler, Uwe Lamnek, Jan Oetjen, Christian 
Würst
Aufsichtsratsvorsitzender: Michael Scheeren

Member of United Internet

Diese E-Mail kann vertrauliche und/oder gesetzlich geschützte Informationen 
enthalten. Wenn Sie nicht der bestimmungsgemäße Adressat sind oder diese E-Mail 
irrtümlich erhalten haben, unterrichten Sie bitte den Absender und vernichten 
Sie diese Email. Anderen als dem bestimmungsgemäßen Adressaten ist untersagt, 
diese E-Mail zu speichern, weiterzuleiten oder ihren Inhalt auf welche Weise 
auch immer zu verwenden.

This E-Mail may contain confidential and/or privileged information. If you are 
not the intended recipient of this E-Mail, you are hereby notified that saving, 
distribution or use of the content of this E-Mail in any way is prohibited. If 
you have received this E-Mail in error, please notify the sender and delete the 
E-Mail.