Pagination and timeouts

2017-03-27 Thread Tom van den Berge
I have a table with some 1M rows, and I would like to get the partition key
of each row. Using the java driver (2.1.9), I'm executing the query

select distinct key from table;

The result set is paginated automatically. My C* cluster has two
datacenters, and when I run this query using consistency level LOCAL_ONE,
it starts returning results (page by page) as expected. But after some
time, it will give a ReadTimeoutException. This happens anywhere between 30
seconds and a few minutes.
The java driver's read timeout is set to 50 seconds, and the cluster's
read_request_timeout_in_ms is set to 30 seconds.
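
For reference, this is roughly how the query and the timeouts are set up on the
client side (a minimal sketch against the 2.1 Java driver API; the contact point,
keyspace and table names are placeholders):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.SocketOptions;

public class DistinctKeysExample {
    public static void main(String[] args) {
        // Driver-side read timeout, applied per request (i.e. per page fetch).
        SocketOptions socketOptions = new SocketOptions().setReadTimeoutMillis(50000);

        Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1")             // placeholder contact point
                .withSocketOptions(socketOptions)
                .build();
        Session session = cluster.connect("my_keyspace"); // placeholder keyspace

        // The driver pages the result set automatically; fetchSize controls the page size.
        SimpleStatement stmt = new SimpleStatement("SELECT DISTINCT key FROM my_table");
        stmt.setFetchSize(5000);

        ResultSet rs = session.execute(stmt);
        long count = 0;
        for (Row row : rs) {
            // Each iteration transparently fetches the next page once the current one is drained.
            count++;
        }
        System.out.println("Partition keys: " + count);

        cluster.close();
    }
}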

I'm wondering what is causing this timeout?

What is also not clear to me is whether the driver and server timeouts apply
to a single page, or to the entire query?

Thanks,
Tom


Re: Modeling Audit Trail on Cassandra

2016-03-19 Thread Tom van den Berge
>
> Is text the most appropriate data type to store JSON that contains a couple
> of dozen lines?
>

It sure is the simplest way to store JSON.

> The query requirement is "where executedby = ?".

Since executedby is a timeuuid, I guess you don't want to query a single
record, since that would require you to know the exact timeuuid. Do you
mean that you would like to query all changes in a certain time frame, e.g.
today? In that case, you would have to group your rows in time buckets,
e.g. PRIMARY KEY ((period), auditid). Period can be a day, month, or any
other period that suits your situation. Retrieving all changes in a
specific time frame is done by retrieving all relevant periods.
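
A day-bucketed variant could look roughly like this (a sketch of the bucketing
idea only; keyspace, table and column names are assumptions, and the JSON payload
is simply stored as text, as discussed above):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.utils.UUIDs;

public class AuditTrailBuckets {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build(); // placeholder
        Session session = cluster.connect("my_keyspace");                         // placeholder keyspace

        // One partition per day; the timeuuid clustering column orders changes within the bucket.
        session.execute("CREATE TABLE IF NOT EXISTS audit_trail ("
                + " period text,"      // e.g. '2016-03-19' for a day bucket
                + " auditid timeuuid,"
                + " payload text,"     // the JSON document, stored as text
                + " PRIMARY KEY ((period), auditid))");

        // Write a change into today's bucket.
        session.execute("INSERT INTO audit_trail (period, auditid, payload) VALUES (?, ?, ?)",
                "2016-03-19", UUIDs.timeBased(), "{\"action\":\"update\"}");

        // Retrieving all changes in a time frame means querying each relevant period bucket.
        ResultSet rs = session.execute(
                "SELECT auditid, payload FROM audit_trail WHERE period = ?", "2016-03-19");
        for (Row row : rs) {
            System.out.println(row.getUUID("auditid") + " " + row.getString("payload"));
        }

        cluster.close();
    }
}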

Tom


Re: Unexplainably large reported partition sizes

2016-03-10 Thread Tom van den Berge
Thanks guys. I've upgraded to 2.2.5, and the problem is gone.


Tom

On Wed, Mar 9, 2016 at 10:47 PM, Robert Coli <rc...@eventbrite.com> wrote:

> On Mon, Mar 7, 2016 at 1:25 PM, Nate McCall <n...@thelastpickle.com>
> wrote:
>
>>
>>> Rob, can you remember which bug/jira this was? I have not been able to
>>> find it.
>>> I'm using 2.1.9.
>>>
>>
>> https://issues.apache.org/jira/browse/CASSANDRA-7953
>>
>> Rob may have a different one, but I've seen something similar from this issue.
>> Fixed in 2.1.12.
>>
>
> Nate is correct, I was referring to CASSANDRA-7953... :)
>
> =Rob
>
>



-- 
Tom van den Berge
Lead Software Engineer

Middenburcht 136
3452 MT Vleuten
Netherlands +31 30 755 53 30
www.drillster.com



Re: Unexplainably large reported partition sizes

2016-03-07 Thread Tom van den Berge
Hi Bryan,


> Do you use any collections on this column family? We've had issues in the
> past with unexpectedly large partitions reported on data models with
> collections, which can also generate tons of tombstones on UPDATE (
> https://issues.apache.org/jira/browse/CASSANDRA-10547)
>

 I've been bitten by this one some time ago, too. I stopped using
collections because of this. The table in question doesn't use them either.

Thanks for the suggestion anyway!
Tom


Re: Unexplainably large reported partition sizes

2016-03-07 Thread Tom van den Berge
Hi Rob,

The reason I didn't dump the table with sstable2json is that I didn't think
of it ;) I just used it, and it looks very much like the "avalanche of
tombstones" bug you are describing!

I took one of the three sstables containing the key, and it resulted in a
4.75 million-line json file, of which 4.73 million lines contain a
tombstone ("t") !
The timestamps of the tombstones I've checked were all many months old, so
obviously compaction failed to clean them up. I can also see many, many
identical tombstoned rows.

Rob, can you remember which bug/jira this was? I have not been able to find
it.
I'm using 2.1.9.

Thanks a lot for pointing me in this direction!
Tom


Re: Unexplainably large reported partition sizes

2016-03-06 Thread Tom van den Berge
No, data is hardly ever deleted from this table. The cfstats confirm this.
The data is also not reinserted.
On Mar 5, 2016 6:20 PM, "DuyHai Doan" <doanduy...@gmail.com> wrote:

> Maybe tombstones ? Do you issue a lot of DELETE statements ? Or do you
> re-insert in the same partition with different TTL values ?
>
> On Sat, Mar 5, 2016 at 7:16 PM, Tom van den Berge <t...@drillster.com>
> wrote:
>
>> I don't think compression can be the cause of the difference, because of
>> two reasons:
>>
>> 1) The partition size I calculated myself (3 MB) is the uncompressed
>> size, and so is the reported size (2.3 GB)
>>
>> 2) The difference is simply way too big to be explained by compression,
>> even if the calculated size would have been the compressed size. The
>> compression would be 0.125% of the original, which is not realistic. In the
>> logs, I can see that the typical compression that is achieved for this
>> table is around 80% of the original.
>>
>> Tom
>>
>> On Fri, Mar 4, 2016 at 9:48 PM, Robert Coli <rc...@eventbrite.com> wrote:
>>
>>> On Fri, Mar 4, 2016 at 5:56 AM, Tom van den Berge <t...@drillster.com>
>>> wrote:
>>>
>>>>  Compacting large partition
>>>> drillster/subscriberstats:rqtPewK-1chi0JSO595u-Q (1,470,058,292 bytes)
>>>>
>>>> This means that this single partition is about 1.4GB large. This is
>>>> much larger than it can possibly be, because of two reasons:
>>>>   1) the partition has appr. 50K rows, each roughly 62 bytes = ~3 MB
>>>>   2) the entire table consumes appr. 500MB of disk space on the node
>>>> containing the partition (including snapshots)
>>>>
>>>> Furthermore, nodetool cfstats tells me this:
>>>> Space used (live): 253,928,111
>>>> Space used (total): 253,928,111
>>>> Compacted partition maximum bytes: 2,395,318,855
>>>> The space used seems to match the actual size (excl. snapshots), but the
>>>> Compacted partition maximum bytes (2.3 GB) seems to be far higher than
>>>> possible. Does anyone know how it is possible that Cassandra reports such
>>>> unlikely sizes?
>>>>
>>>
>>> Compression is enabled by default, and compaction reports the
>>> uncompressed size.
>>>
>>> =Rob
>>>
>>>
>>
>>
>>
>> --
>> Tom van den Berge
>> Lead Software Engineer
>>
>>
>> Middenburcht 136
>> 3452 MT Vleuten
>> Netherlands +31 30 755 53 30
>> www.drillster.com
>>
>>
>
>


Re: Unexplainably large reported partition sizes

2016-03-05 Thread Tom van den Berge
I don't think compression can be the cause of the difference, because of
two reasons:

1) The partition size I calculated myself (3 MB) is the uncompressed size,
and so is the reported size (2.3 GB)

2) The difference is simply way too big to be explained by compression,
even if the calculated size would have been the compressed size. The
compression would be 0.125% of the original, which is not realistic. In the
logs, I can see that the typical compression that is achieved for this
table is around 80% of the original.

Tom

On Fri, Mar 4, 2016 at 9:48 PM, Robert Coli <rc...@eventbrite.com> wrote:

> On Fri, Mar 4, 2016 at 5:56 AM, Tom van den Berge <t...@drillster.com>
> wrote:
>
>>  Compacting large partition
>> drillster/subscriberstats:rqtPewK-1chi0JSO595u-Q (1,470,058,292 bytes)
>>
>> This means that this single partition is about 1.4GB large. This is much
>> larger than it can possibly be, because of two reasons:
>>   1) the partition has appr. 50K rows, each roughly 62 bytes = ~3 MB
>>   2) the entire table consumes appr. 500MB of disk space on the node
>> containing the partition (including snapshots)
>>
>> Furthermore, nodetool cfstats tells me this:
>> Space used (live): 253,928,111
>> Space used (total): 253,928,111
>> Compacted partition maximum bytes: 2,395,318,855
>> The space used seems to match the actual size (excl. snapshots), but the
>> Compacted partition maximum bytes (2.3 GB) seems to be far higher than
>> possible. Does anyone know how it is possible that Cassandra reports such
>> unlikely sizes?
>>
>
> Compression is enabled by default, and compaction reports the uncompressed
> size.
>
> =Rob
>
>



-- 
Tom van den Berge
Lead Software Engineer


Middenburcht 136
3452 MT Vleuten
Netherlands +31 30 755 53 30
www.drillster.com



Unexplainably large reported partition sizes

2016-03-04 Thread Tom van den Berge
Hi,

I'm seeing warnings in my logs about compacting large partitions, e.g.:

 Compacting large partition
drillster/subscriberstats:rqtPewK-1chi0JSO595u-Q (1,470,058,292 bytes)

This means that this single partition is about 1.4GB large. This is much
larger than it can possibly be, because of two reasons:
  1) the partition has appr. 50K rows, each roughly 62 bytes = ~3 MB
  2) the entire table consumes appr. 500MB of disk space on the node
containing the partition (including snapshots)

Furthermore, nodetool cfstats tells me this:
Space used (live): 253,928,111
Space used (total): 253,928,111
Compacted partition maximum bytes: 2,395,318,855
The space used seems to match the actual size (excl. snapshots), but the
Compacted partition maximum bytes (2.3 GB) seems to be far higher than
possible. Does anyone know how it is possible that Cassandra reports such
unlikely sizes?

From time to time, I'm noticing relatively bad latencies when such
partitions are (fully) queried. So I'm not fully convinced that the actual
partition size is not in the order of 1 or 2 GB. Does anyone have an
explanation for these discrepancies?

Thanks,
Tom


Re: Removed node is not completely removed

2015-10-15 Thread Tom van den Berge
Thanks Sebastian, a restart solved the problem!


On Wed, Oct 14, 2015 at 3:46 PM, Sebastian Estevez <
sebastian.este...@datastax.com> wrote:

> We still keep endpoints in memory. Not sure how you got to this state but
> try a rolling restart.
> On Oct 14, 2015 9:43 AM, "Tom van den Berge" <tom.vandenbe...@gmail.com>
> wrote:
>
>> Thanks for that Michael, I did not know that. However, the node is not
>> listed in the system.peers table on any node, so it seems that the problem
>> is not in this table.
>>
>>
>>
>> On Wed, Oct 14, 2015 at 3:30 PM, Laing, Michael <
>> michael.la...@nytimes.com> wrote:
>>
>>> Remember that the system keyspace uses LocalStrategy: each node has its
>>> own set of system tables. -ml
>>>
>>> On Wed, Oct 14, 2015 at 9:17 AM, Tom van den Berge <
>>> tom.vandenbe...@gmail.com> wrote:
>>>
>>>> Hi Carlos,
>>>>
>>>> I'm using 2.1.6. The mysterious node is not in the peers table. Any
>>>> other ideas?
>>>> One of my existing nodes is not present in the system.peers table,
>>>> though. Should I be worried?
>>>>
>>>> Regards,
>>>> Tom
>>>>
>>>> On Wed, Oct 14, 2015 at 2:27 PM, Carlos Rolo <r...@pythian.com> wrote:
>>>>
>>>>> Check system.peers table to see if the IP is still there. If so edit
>>>>> the table and remove the offending IP.
>>>>>
>>>>> You are probably running into this:
>>>>> https://issues.apache.org/jira/browse/CASSANDRA-6053
>>>>>
>>>>> Regards,
>>>>>
>>>>> Carlos Juzarte Rolo
>>>>> Cassandra Consultant
>>>>>
>>>>> Pythian - Love your data
>>>>>
>>>>> rolo@pythian | Twitter: cjrolo | Linkedin: 
>>>>> *linkedin.com/in/carlosjuzarterolo
>>>>> <http://linkedin.com/in/carlosjuzarterolo>*
>>>>> Mobile: +31 6 159 61 814 | Tel: +1 613 565 8696 x1649
>>>>> www.pythian.com
>>>>>
>>>>> On Wed, Oct 14, 2015 at 12:26 PM, Tom van den Berge <
>>>>> tom.vandenbe...@gmail.com> wrote:
>>>>>
>>>>>> I have removed a node with nodetool removenode, which completed ok.
>>>>>> Nodetool status does not list the node anymore.
>>>>>>
>>>>>> But since then, I'm seeing messages in my other nodes' log files
>>>>>> referring to the removed node:
>>>>>>
>>>>>>  INFO [GossipStage:38] 2015-10-14 11:18:26,322 Gossiper.java (line
>>>>>> 968) InetAddress /10.68.56.200 is now DOWN
>>>>>>  INFO [GossipStage:38] 2015-10-14 11:18:26,324 StorageService.java
>>>>>> (line 1891) Removing tokens [85070591730234615865843651857942052863] for 
>>>>>> /
>>>>>> 10.68.56.200
>>>>>>
>>>>>>
>>>>>> These two messages appear every minute.
>>>>>> I've tried nodetool removenode again (Host ID not found) and
>>>>>> removenode force (no token removals in process).
>>>>>> The jmx unsafeAssassinateEndpoint gives a NullPointerException.
>>>>>>
>>>>>> What can I do to remove the node entirely?
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>


Re: Removed node is not completely removed

2015-10-14 Thread Tom van den Berge
Thanks for that Michael, I did not know that. However, the node is not
listed in the system.peers table on any node, so it seems that the problem
is not in this table.



On Wed, Oct 14, 2015 at 3:30 PM, Laing, Michael <michael.la...@nytimes.com>
wrote:

> Remember that the system keyspace uses LocalStrategy: each node has its
> own set of system tables. -ml
>
> On Wed, Oct 14, 2015 at 9:17 AM, Tom van den Berge <
> tom.vandenbe...@gmail.com> wrote:
>
>> Hi Carlos,
>>
>> I'm using 2.1.6. The mysterious node is not in the peers table. Any other
>> ideas?
>> One of my existing nodes is not present in the system.peers table,
>> though. Should I be worried?
>>
>> Regards,
>> Tom
>>
>> On Wed, Oct 14, 2015 at 2:27 PM, Carlos Rolo <r...@pythian.com> wrote:
>>
>>> Check system.peers table to see if the IP is still there. If so edit the
>>> table and remove the offending IP.
>>>
>>> You are probably running into this:
>>> https://issues.apache.org/jira/browse/CASSANDRA-6053
>>>
>>> Regards,
>>>
>>> Carlos Juzarte Rolo
>>> Cassandra Consultant
>>>
>>> Pythian - Love your data
>>>
>>> rolo@pythian | Twitter: cjrolo | Linkedin: 
>>> *linkedin.com/in/carlosjuzarterolo
>>> <http://linkedin.com/in/carlosjuzarterolo>*
>>> Mobile: +31 6 159 61 814 | Tel: +1 613 565 8696 x1649
>>> www.pythian.com
>>>
>>> On Wed, Oct 14, 2015 at 12:26 PM, Tom van den Berge <
>>> tom.vandenbe...@gmail.com> wrote:
>>>
>>>> I have removed a node with nodetool removenode, which completed ok.
>>>> Nodetool status does not list the node anymore.
>>>>
>>>> But since then, I'm seeing messages in my other nodes' log files
>>>> referring to the removed node:
>>>>
>>>>  INFO [GossipStage:38] 2015-10-14 11:18:26,322 Gossiper.java (line 968)
>>>> InetAddress /10.68.56.200 is now DOWN
>>>>  INFO [GossipStage:38] 2015-10-14 11:18:26,324 StorageService.java
>>>> (line 1891) Removing tokens [85070591730234615865843651857942052863] for /
>>>> 10.68.56.200
>>>>
>>>>
>>>> These two messages appear every minute.
>>>> I've tried nodetool removenode again (Host ID not found) and removenode
>>>> force (no token removals in process).
>>>> The jmx unsafeAssassinateEndpoint gives a NullPointerException.
>>>>
>>>> What can I do to remove the node entirely?
>>>>
>>>>
>>>>
>>>
>>> --
>>>
>>>
>>>
>>>
>>
>


Re: Removed node is not completely removed

2015-10-14 Thread Tom van den Berge
Hi Carlos,

I'm using 2.1.6. The mysterious node is not in the peers table. Any other
ideas?
One of my existing nodes is not present in the system.peers table, though.
Should I be worried?

Regards,
Tom

On Wed, Oct 14, 2015 at 2:27 PM, Carlos Rolo <r...@pythian.com> wrote:

> Check system.peers table to see if the IP is still there. If so edit the
> table and remove the offending IP.
>
> You are probably running into this:
> https://issues.apache.org/jira/browse/CASSANDRA-6053
>
> Regards,
>
> Carlos Juzarte Rolo
> Cassandra Consultant
>
> Pythian - Love your data
>
> rolo@pythian | Twitter: cjrolo | Linkedin: *linkedin.com/in/carlosjuzarterolo
> <http://linkedin.com/in/carlosjuzarterolo>*
> Mobile: +31 6 159 61 814 | Tel: +1 613 565 8696 x1649
> www.pythian.com
>
> On Wed, Oct 14, 2015 at 12:26 PM, Tom van den Berge <
> tom.vandenbe...@gmail.com> wrote:
>
>> I have removed a node with nodetool removenode, which completed ok.
>> Nodetool status does not list the node anymore.
>>
>> But since then, I'm seeing messages in my other nodes' log files referring
>> to the removed node:
>>
>>  INFO [GossipStage:38] 2015-10-14 11:18:26,322 Gossiper.java (line 968)
>> InetAddress /10.68.56.200 is now DOWN
>>  INFO [GossipStage:38] 2015-10-14 11:18:26,324 StorageService.java (line
>> 1891) Removing tokens [85070591730234615865843651857942052863] for /
>> 10.68.56.200
>>
>>
>> These two messages appear every minute.
>> I've tried nodetool removenode again (Host ID not found) and removenode
>> force (no token removals in process).
>> The jmx unsafeAssassinateEndpoint gives a NullPointerException.
>>
>> What can I do to remove the node entirely?
>>
>>
>>
>
> --
>
>
>
>


Re: Do vnodes need more memory?

2015-09-24 Thread Tom van den Berge
On Thu, Sep 24, 2015 at 12:45 AM, Robert Coli <rc...@eventbrite.com> wrote:

> On Wed, Sep 23, 2015 at 7:09 AM, Tom van den Berge <
> tom.vandenbe...@gmail.com> wrote:
>
>> So it seems that Cassandra simply doesn't have enough memory. I'm trying
>> to understand if this can be caused by the use of vnodes? Is there a
>> sensible reason why vnodes would consume more memory than regular nodes? Or
>> does any of you have the same experience? If not, I might be barking up the
>> wrong tree here, and I would love to know it before upgrading my servers
>> with more memory.
>>
>
> Yes, range ownership has a RAM/heap cost per-range-owned. This cost is
> paid during many, but not all, operations. Owning 256 ranges > Owning 1
> range.
>
> I have not had the same experience but am not at all surprised to hear
> that vnodes increase heap consumption for otherwise identical
> configurations. I am surprised to hear that it makes a significant
> difference in GC time, but you might have been close enough to heap
> saturation that vnodes tip you over.
>

That's apparently exactly what's going on. We've just increased the memory
from 8 to 16 GB, and all is fine now. This seems to confirm that using
vnodes indeed increases heap consumption significantly. I think it would be
great if this could be advertised in the documentation, as a warning. From
the current documentation, it seems that vnodes don't come at any cost.

What's also interesting is this: Before increasing the memory we have been
changing our code not to use our secondary indexes anymore. We still had a
number of those, and we were suspecting them to be the cause of the
increased heap consumption. It did not eliminate the problem, but it
definitely helped to bring the GC times down dramatically. I already knew that
secondary indexes are best not to use, but it seems that using them in
combination with vnodes makes it far worse.


> As an aside, one is likely to win very little net win from vnodes if one's
> cluster is not now and will never be more than approximately 15 nodes.
>

That's a very interesting observation. Especially since vnodes have been
enabled by default for some time now, and apparently they have a (heap) price. And my
guess is that a significant percentage of all clusters will never exceed 15
nodes.

Thx,
Tom


Re: Do vnodes need more memory?

2015-09-23 Thread Tom van den Berge
nodetool gcstats tells me this (the Total GC Elapsed is half or more of the
Interval).

We had to take the production load off the new vnode DC, since it was
messing things up badly. It means I'm not able to run any tools against it
at the moment.
The env.sh is default, and the servers have 8G ram.

It would be great if you could respond to my initial question though.
Thanks,
Tom

On Wed, Sep 23, 2015 at 4:14 PM, Sebastian Estevez <
sebastian.este...@datastax.com> wrote:

> This is interesting, where are you seeing that you're collecting 50% of
> the time? Is your env.sh the default? How much ram?
>
> Also, can you run this tool and send a minute worth of thread info:
>
> wget
> https://bintray.com/artifact/download/aragozin/generic/sjk-plus-0.3.6.jar
> java -jar sjk-plus-0.3.6.jar ttop -s localhost:7199 -n 30 -o CPU
> On Sep 23, 2015 7:09 AM, "Tom van den Berge" <tom.vandenbe...@gmail.com>
> wrote:
>
>> I have two data centers, each with the same number of nodes, same
>> hardware (CPUs, memory), Cassandra version (2.1.6), replication factor,
>> etc. The only difference is that one data center uses vnodes, and the other
>> doesn't.
>>
>> The non-vnode DC works fine (and has been for a long time) under
>> production load: I'm seeing normal CPU and IO load and garbage collection
>> figures. But the vnode DC is struggling very hard under the same load. It
>> has been set up recently. The CPU load is very high, due to excessive
>> garbage collection (>50% of the time is spent collecting).
>>
>> So it seems that Cassandra simply doesn't have enough memory. I'm trying
>> to understand if this can be caused by the use of vnodes? Is there a
>> sensible reason why vnodes would consume more memory than regular nodes? Or
>> does any of you have the same experience? If not, I might be barking up the
>> wrong tree here, and I would love to know it before upgrading my servers
>> with more memory.
>>
>> Thanks,
>> Tom
>>
>


Do vnodes need more memory?

2015-09-23 Thread Tom van den Berge
I have two data centers, each with the same number of nodes, same hardware
(CPUs, memory), Cassandra version (2.1.6), replication factor, etc. The
only difference is that one data center uses vnodes, and the other doesn't.

The non-vnode DC works fine (and has been for a long time) under production
load: I'm seeing normal CPU and IO load and garbage collection figures. But
the vnode DC is struggling very hard under the same load. It has been set
up recently. The CPU load is very high, due to excessive garbage collection
(>50% of the time is spent collecting).

So it seems that Cassandra simply doesn't have enough memory. I'm trying to
understand if this can be caused by the use of vnodes? Is there a sensible
reason why vnodes would consume more memory than regular nodes? Or does any
of you have the same experience? If not, I might be barking up the wrong
tree here, and I would love to know it before upgrading my servers with
more memory.

Thanks,
Tom


Secondary index is causing high CPU load

2015-09-15 Thread Tom van den Berge
Read queries on a secondary index are somehow causing an excessively high
CPU load on all nodes in my DC.

The table has some 60K records, and the cardinality of the index is very
low (~10 distinct values). The returned result set typically contains
10-30K records.
The same queries on nodes in another DC are working fine. The nodes with
the high CPU are in a newly set up DC (see my previous message below). The
hardware in both DCs is the same, as well as the C* version (2.1.6). The
only difference in the C* setup is that the new DC is using vnodes (256),
while the old DC is not. Both DCs have 4 nodes, and RF=2.

I've rebuilt the index, but that didn't help.

It looks a bit like CASSANDRA-8530
<https://issues.apache.org/jira/browse/CASSANDRA-8530> (unresolved).

What really surprised me is that executing a single query on this secondary
index makes the "Local read count" in the cfstats for the index go up with
almost 20! When doing the same query on one of my "good" nodes, it only
increases with a small number, as I would expect.

Could it be that the use of vnodes is causing these problems?

Regards,
Tom



On Mon, Sep 14, 2015 at 8:09 PM, Tom van den Berge <
tom.vandenbe...@gmail.com> wrote:

> I have a DC of 4 nodes that must be expanded to accommodate an expected
> growth in data. Since the DC is not using vnodes, we have decided to set up
> a new DC with vnodes enabled, start using the new DC, and decommission the
> old DC.
>
> Both DCs have 4 nodes. The idea is to add additional nodes to the new DC
> later on.
> The servers in both DCs are very similar: quad-core machines with 8GB.
>
> We have bootstrapped/rebuilt the nodes in the new DC. When that finished,
> the nodes in the new DC were showing little CPU activity, as you would
> expect, because they are receiving writes from the other DC. So far, so
> good.
>
> Then we switched the clients from the old DC to the new DC. The CPU load
> on all nodes in the new DC immediately rose to excessively high levels (15
> - 25), which made the servers effectively unavailable. The load did not
> drop structurally within 20 minutes, so we had to switch the clients back
> to the old DC. Then the load dropped again.
>
> What can be the reason for the high CPU loads on the new nodes?
>
> Performance tests show that the servers in the new DC perform slightly
> better (both IO and CPU) than the servers in the old DC.
> I did not see anything abnormal in the Cassandra logs, like garbage
> collection warnings. I also did not see any strange things in the tpstats.
> The only difference I'm aware of between the old and new DC is the use of
> vnodes.
>
> Any help is appreciated!
> Thanks,
> Tom
>


Extremely high CPU load in new data center

2015-09-14 Thread Tom van den Berge
I have a DC of 4 nodes that must be expanded to accommodate an expected
growth in data. Since the DC is not using vnodes, we have decided to set up
a new DC with vnodes enabled, start using the new DC, and decommission the
old DC.

Both DCs have 4 nodes. The idea is to add additional nodes to the new DC
later on.
The servers in both DCs are very similar: quad-core machines with 8GB.

We have bootstrapped/rebuilt the nodes in the new DC. When that finished,
the nodes in the new DC were showing little CPU activity, as you would
expect, because they are receiving writes from the other DC. So far, so
good.

Then we switched the clients from the old DC to the new DC. The CPU load on
all nodes in the new DC immediately rose to excessively high levels (15 -
25), which made the servers effectively unavailable. The load did not drop
structurally within 20 minutes, so we had to switch the clients back to the
old DC. Then the load dropped again.

What can be the reason for the high CPU loads on the new nodes?

Performance tests show that the servers in the new DC perform slightly
better (both IO and CPU) than the servers in the old DC.
I did not see anything abnormal in the Cassandra logs, like garbage
collection warnings. I also did not see any strange things in the tpstats.
The only difference I'm aware of between the old and new DC is the use of
vnodes.

Any help is appreciated!
Thanks,
Tom


Re: Is it possible to bootstrap the 1st node of a new DC?

2015-09-09 Thread Tom van den Berge
>
>
>> I've learned from experience that the node immediately joins the cluster,
>> and starts accepting reads (from other DCs) for the range it owns.
>
>
> This seems to be the incorrect assumption at the heart of the confusion.
> You "should" be able to prevent this behavior entirely via correct use of
> ConsistencyLevel and client configuration.
>

That is correct, but I just learned that CASSANDRA-9753 is (in my situation)
causing problems by incorrectly sending reads to the new DC. A workaround
for this bug is to set speculative_retry to 'NONE' for all involved tables.
This seems to solve the issue for me.
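
For the record, applying the workaround is a one-liner per table (a sketch;
keyspace and table names are placeholders):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class DisableSpeculativeRetry {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build(); // placeholder
        Session session = cluster.connect();

        // Stops the coordinator from speculatively sending the read to extra
        // (possibly remote) replicas; repeat for every table involved.
        session.execute("ALTER TABLE my_keyspace.my_table WITH speculative_retry = 'NONE'");

        cluster.close();
    }
}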


Re: How to prevent queries being routed to new DC?

2015-09-08 Thread Tom van den Berge
Hi Anuj,

That could indeed explain reads on my new DC. However, what I'm seeing in
my client application is that every now and then, a read query does not
produce any result, while I'm sure that it should. If I understand the read
repair process correctly, it will never cause a read query to fail to find a
replica, right?



On Tue, Sep 8, 2015 at 4:40 AM, Anuj Wadehra <anujw_2...@yahoo.co.in> wrote:

> Hi Tom,
>
> While reading data (even at CL LOCAL_QUORUM), if data in different nodes
> required to meet CL in your local cluster doesn't match, data will be read
> from remote dc for read repair if read_repair_chance is not 0.
>
> Imp points:
> 1.If you are reading and writing at local_quorum you can set
> read_repair_chance to 0 to prevent cross dc read repair.
> 2. For enabling dc local read repairs you can use
> dclocal_read_repair_chance and set read_repair_chance to 0.
> 3. If you are experiencing frequent requests being routed due to digest
> mismatch you may need to investigate mutation drops in your cluster using
> tpstats.
>
> Refer to similar issue raised by us :
> https://issues.apache.org/jira/browse/CASSANDRA-8479
>
> Thanks
> Anuj
>
> Sent from Yahoo Mail on Android
> <https://overview.mail.yahoo.com/mobile/?.src=Android>
> --
> *From*:"Tom van den Berge" <t...@drillster.com>
> *Date*:Tue, 8 Sep, 2015 at 1:31 am
> *Subject*:Re: How to prevent queries being routed to new DC?
>
> NetworkTopologyStrategy
>
> On Mon, Sep 7, 2015 at 4:39 PM, Ryan Svihla <r...@foundev.pro> wrote:
>
>> What's your keyspace replication strategy?
>>
>> On Thu, Sep 3, 2015 at 3:16 PM Tom van den Berge <
>> tom.vandenbe...@gmail.com> wrote:
>>
>>> Thanks for your help so far!
>>>
>>> I have some problems trying to understand the jira mentioned by Rob :(
>>>
>>> I'm currently trying to set up the first node in the new DC with
>>> auto_bootstrap = true. The node then becomes visible with status "joining",
>>> which (hopefully) prevents other DCs from sending queries to it.
>>>
>>> Do you think this will work?
>>>
>>>
>>>
>>> On Thu, Sep 3, 2015 at 9:46 PM, Robert Coli <rc...@eventbrite.com>
>>> wrote:
>>>
>>>> On Thu, Sep 3, 2015 at 12:25 PM, Bryan Cheng <br...@blockcypher.com>
>>>> wrote:
>>>>
>>>>> I'd recommend you enable tracing and do a few queries in a controlled
>>>>> environment to verify that queries are being routed to your new nodes.
>>>>> Provided you have followed the procedure outlined above (specifically, 
>>>>> have
>>>>> set auto_bootstrap to false in your new cluster), rebuild has not been 
>>>>> run,
>>>>> the application is not connecting to the new cluster, and all your queries
>>>>> are run at LOCAL_* quorum levels, I do not believe those queries should be
>>>>> routed to the new dc.
>>>>>
>>>>
>>>> Other than CASSANDRA-9753, this is true.
>>>>
>>>> https://issues.apache.org/jira/browse/CASSANDRA-9753 (Unresolved; ):
>>>> "LOCAL_QUORUM reads can block cross-DC if there is a digest mismatch"
>>>>
>>>> =Rob
>>>>
>>>>
>>> --
>> Regards,
>>
>> Ryan Svihla
>
>
>
>
> --
> Tom van den Berge
> Lead Software Engineer
> Middenburcht 136
> 3452 MT Vleuten
> Netherlands +31 30 755 53 30
> www.drillster.com
>


Re: Is it possible to bootstrap the 1st node of a new DC?

2015-09-08 Thread Tom van den Berge
>
> "one drawback: the node joins the cluster as soon as the bootstrapping
> begins."
> I am not sure I understand this correctly. It will get tokens, but not
> load data if you combine it with autobootstrap=false.
>
Joining the cluster means that all other nodes become aware of the new
node, and therefore it might receive reads. And yes, it will not have any
data, because auto_bootstrap=false.



> How I see it: You should be able to start all the new nodes in the new DC
> with autobootstrap=false and survey-mode=true. Then you should have a new
> DC with nodes that have tokens but no data. Then you can start rebuild on
> all new nodes. During this process, the new nodes should get writes, but
> not serve reads.
>
Maybe you're right.


>
> "It turns out that join_ring=false in this scenario does not solve this
> problem"
> I also don't see how joing_ring would help here. (Actually I have no clue
> where you would ever need that option)
>
The idea of join_ring=false is that other nodes are not aware of the new
node, and therefore never send requests to it. The new node can then be
repaired (see https://issues.apache.org/jira/browse/CASSANDRA-6961). To set
up a new DC, I was hoping that you could also rebuild (instead of a repair)
a new node while join_ring=false, but that seems not to work.

>
>
> "Currently I'm trying to auto_bootstrap my new DC. The good thing is that
> it doesn't accept reads from other DCs."
> The joining-state actually works perfectly. The joining state is a state
> where nodes take writes, but do not serve reads. It would be really cool if you
> could boot a node into the joining state. Actually, write_survey should
> basically be the same.
>
It would be great if you could select the DC from where it's bootstrapped,
similar to nodetool rebuild. I'm currently bootstrapping a node in
San-Jose. It decides to stream all data from another DC in Amsterdam, while
we also have another DC in San-Jose, right next to it. Streaming data
across the Atlantic takes a lot more time :(



>
> kind regards,
> Christian
>
> PS: I would love to see the results, if you perform any tests on the
> write-survey. Please share it here on the mailing list :-)
>
>
>
> On Mon, Sep 7, 2015 at 11:10 PM, Tom van den Berge <
> tom.vandenbe...@gmail.com> wrote:
>
>> Hi Christian,
>>
>> No, I never tried survey mode. I didn't know it until now, but from the
>> info I was able to find it looks like it is meant for a different purpose.
>> Maybe it can be used to bootstrap a new DC, though.
>>
>> On the other hand, the auto_bootstrap=false + rebuild scenario seems to
>> be designed to do exactly what I need, except that it has one drawback: the
>> node joins the cluster as soon as the bootstrapping begins.
>>
>> It turns out that join_ring=false in this scenario does not solve this
>> problem, since nodetool rebuild does not do anything if C* is started with
>> this option.
>>
>> A workaround could be to ensure that only LOCAL_* CL is used by all
>> clients, but even then I'm seeing failed queries, because they're
>> mysteriously routed to the new DC every now and then.
>>
>> Currently I'm trying to auto_bootstrap my new DC. The good thing is that
>> it doesn't accept reads from other DCs. The bad thing is that a) I can't
>> choose where it streams its data from, and b) the two nodes I've been
>> trying to bootstrap crashed when they were almost finished...
>>
>>
>>
>> On Mon, Sep 7, 2015 at 10:22 PM, horschi <hors...@gmail.com> wrote:
>>
>>> Hi Tom,
>>>
>>> this sounds very much like my thread: "auto_bootstrap=false broken?"
>>>
>>> Did you try booting the new node with survey-mode? I wanted to try this,
>>> but I am waiting for 2.0.17 to come out (survey mode is broken in earlier
>>> versions). Imho survey mode is what you (and me too) want: start a node,
>>> accepting writes, but not serving reads. I have not tested it yet, but I
>>> think it should work.
>>>
>>> Also the manual join mentioned in CASSANDRA-9667 sounds very interesting.
>>>
>>> kind regards,
>>> Christian
>>>
>>> On Mon, Sep 7, 2015 at 10:11 PM, Tom van den Berge <t...@drillster.com>
>>> wrote:
>>>
>>>> Running nodetool rebuild on a node that was started with
>>>> join_ring=false does not work, unfortunately. The nodetool command returns
>>>> immediately, after a message appears in the log that the streaming of data
>>>> has started. After that, nothing happens.
>>>>
>>>> Tom
>>>>

Re: Trace evidence for LOCAL_QUORUM ending up in remote DC

2015-09-08 Thread Tom van den Berge
Just to be sure: can this bug result in a 0-row result while it should be > 0?
On Sep 8, 2015 6:29 PM, "Tyler Hobbs" <ty...@datastax.com> wrote:

> See https://issues.apache.org/jira/browse/CASSANDRA-9753
>
> On Tue, Sep 8, 2015 at 10:22 AM, Tom van den Berge <
> tom.vandenbe...@gmail.com> wrote:
>
>> I've been bugging you a few times, but now I've got trace data for a
>> query with LOCAL_QUORUM that is being sent to a remove data center.
>>
>> The setup is as follows:
>> NetworkTopologyStrategy: {"DC1":"1","DC2":"2"}
>> Both DC1 and DC2 have 2 nodes.
>> In DC2, one node is currently being rebuilt, and therefore does not
>> contain all data (yet).
>>
>> The client app connects to a node in DC1, and sends a SELECT query with
>> CL LOCAL_QUORUM, which in this case means floor(1/2) + 1 = 1.
>> If all is ok, the query always produces a result, because the requested
>> rows are guaranteed to be available in DC1.
>>
>> However, the query sometimes produces no result. I've been able to record
>> the traces of these queries, and it turns out that the coordinator node in
>> DC1 sometimes sends the query to DC2, to the node that is being rebuilt,
>> and does not have the requested rows. I've included an example trace below.
>>
>> The coordinator node is 10.55.156.67, which is in DC1. The 10.88.4.194 node
>> is in DC2.
>> I've verified that the  CL=LOCAL_QUORUM by printing it when the query is
>> sent (I'm using the datastax java driver).
>>
>>  activity                                                                   | source       | source_elapsed | thread
>> ----------------------------------------------------------------------------+--------------+----------------+------------------------------------------
>>  Message received from /10.55.156.67                                        | 10.88.4.194  |             48 | MessagingService-Incoming-/10.55.156.67
>>  Executing single-partition query on aggregate                              | 10.88.4.194  |            286 | SharedPool-Worker-2
>>  Acquiring sstable references                                               | 10.88.4.194  |            306 | SharedPool-Worker-2
>>  Merging memtable tombstones                                                | 10.88.4.194  |            321 | SharedPool-Worker-2
>>  Partition index lookup allows skipping sstable 107                         | 10.88.4.194  |            458 | SharedPool-Worker-2
>>  Bloom filter allows skipping sstable 1                                     | 10.88.4.194  |            489 | SharedPool-Worker-2
>>  Skipped 0/2 non-slice-intersecting sstables, included 0 due to tombstones  | 10.88.4.194  |            496 | SharedPool-Worker-2
>>  Merging data from memtables and 0 sstables                                 | 10.88.4.194  |            500 | SharedPool-Worker-2
>>  Read 0 live and 0 tombstone cells                                          | 10.88.4.194  |            513 | SharedPool-Worker-2
>>  Enqueuing response to /10.55.156.67                                        | 10.88.4.194  |            613 | SharedPool-Worker-2
>>  Sending message to /10.55.156.67                                           | 10.88.4.194  |            672 | MessagingService-Outgoing-/10.55.156.67
>>  Parsing SELECT * FROM Aggregate WHERE type=? AND typeId=?;                 | 10.55.156.67 |             10 | SharedPool-Worker-4
>>  Sending message to /10.88.4.194                                            | 10.55.156.67 |           4335 | MessagingService-Outgoing-/10.88.4.194
>>  Message received from /10.88.4.194                                         | 10.55.156.67 |           6328 | MessagingService-Incoming-/10.88.4.194
>>  Seeking to partition beginning in data file                                | 10.55.156.67 |          10417 | SharedPool-Worker-3
>>  Key cache hit for sstable 389                                              | 10.55.156.67 |          10586 | SharedPool-Worker-3
>>
>> My question is: how is it possible that the query is sent to a node in
>> DC2?
>> Since DC1 has 2 nodes and RF 1, the query should always be sent to the
>> other node in DC1 if the coordinator does not have a replica, right?
>>
>> Thanks,
>> Tom
>>
>>
>>
>>
>>
>
>
> --
> Tyler Hobbs
> DataStax <http://datastax.com/>
>


Re: Is it possible to bootstrap the 1st node of a new DC?

2015-09-08 Thread Tom van den Berge
> Running nodetool rebuild on a node that was started with join_ring=false
>> does not work, unfortunately. The nodetool command returns immediately,
>> after a message appears in the log that the streaming of data has started.
>> After that, nothing happens.
>
>
> Per driftx, the author of CASSANDRA-6961, this sounds like a bug. If you
> can repro, please file a JIRA and let the list know the URL.
>

I just filed https://issues.apache.org/jira/browse/CASSANDRA-10287.

(I wasn't convinced that join_ring is supposed to work in conjunction with
nodetool rebuild, since CASSANDRA-6961 only speaks of repair.)


Re: Trace evidence for LOCAL_QUORUM ending up in remote DC

2015-09-08 Thread Tom van den Berge
Nate,

I've disabled it, and it's been running for about an hour now without
problems, while before, the problem occurred roughly every few minutes. I
guess it's safe to say that this proves that CASSANDRA-9753 is the cause of
the problem.

I'm very happy to finally know the cause of this problem! Thanks for
pointing me in the right direction.
Tom

On Tue, Sep 8, 2015 at 9:13 PM, Nate McCall  wrote:

>> Just to be sure: can this bug result in a 0-row result while it should be > 0?
>
> Per Tyler's reference to CASSANDRA-9753, you would see
> this if the read was routed by speculative retry to the nodes that were not
> yet finished being built.
>
> Does this work as anticipated when you set speculative_retry to NONE?
>
>
>
>
> --
> -
> Nate McCall
> Austin, TX
> @zznate
>
> Co-Founder & Sr. Technical Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>


Trace evidence for LOCAL_QUORUM ending up in remote DC

2015-09-08 Thread Tom van den Berge
I've been bugging you a few times, but now I've got trace data for a query
with LOCAL_QUORUM that is being sent to a remove data center.

The setup is as follows:
NetworkTopologyStrategy: {"DC1":"1","DC2":"2"}
Both DC1 and DC2 have 2 nodes.
In DC2, one node is currently being rebuilt, and therefore does not contain
all data (yet).

The client app connects to a node in DC1, and sends a SELECT query with CL
LOCAL_QUORUM, which in this case means floor(1/2) + 1 = 1.
If all is ok, the query always produces a result, because the requested
rows are guaranteed to be available in DC1.

However, the query sometimes produces no result. I've been able to record
the traces of these queries, and it turns out that the coordinator node in
DC1 sometimes sends the query to DC2, to the node that is being rebuilt,
and does not have the requested rows. I've included an example trace below.

The coordinator node is 10.55.156.67, which is in DC1. The 10.88.4.194 node
is in DC2.
I've verified that the  CL=LOCAL_QUORUM by printing it when the query is
sent (I'm using the datastax java driver).
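
The traces below were captured per query, roughly like this from the client side
(a sketch against the 2.1 Java driver; keyspace and the literal bound values are
placeholders):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.QueryTrace;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;

public class TraceLocalQuorumRead {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("10.55.156.67").build(); // coordinator in DC1
        Session session = cluster.connect("my_keyspace");                            // placeholder keyspace

        SimpleStatement stmt = new SimpleStatement(
                "SELECT * FROM Aggregate WHERE type = 'someType' AND typeId = 'someId'"); // placeholder values
        stmt.setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM);
        stmt.enableTracing();

        ResultSet rs = session.execute(stmt);
        System.out.println("Rows: " + rs.all().size() + ", CL: " + stmt.getConsistencyLevel());

        // The trace events show which replicas the coordinator actually contacted.
        QueryTrace trace = rs.getExecutionInfo().getQueryTrace();
        for (QueryTrace.Event event : trace.getEvents()) {
            System.out.println(event.getSourceElapsedMicros() + " us  "
                    + event.getSource() + "  " + event.getDescription());
        }

        cluster.close();
    }
}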

 activity                                                                   | source       | source_elapsed | thread
----------------------------------------------------------------------------+--------------+----------------+------------------------------------------
 Message received from /10.55.156.67                                        | 10.88.4.194  |             48 | MessagingService-Incoming-/10.55.156.67
 Executing single-partition query on aggregate                              | 10.88.4.194  |            286 | SharedPool-Worker-2
 Acquiring sstable references                                               | 10.88.4.194  |            306 | SharedPool-Worker-2
 Merging memtable tombstones                                                | 10.88.4.194  |            321 | SharedPool-Worker-2
 Partition index lookup allows skipping sstable 107                         | 10.88.4.194  |            458 | SharedPool-Worker-2
 Bloom filter allows skipping sstable 1                                     | 10.88.4.194  |            489 | SharedPool-Worker-2
 Skipped 0/2 non-slice-intersecting sstables, included 0 due to tombstones  | 10.88.4.194  |            496 | SharedPool-Worker-2
 Merging data from memtables and 0 sstables                                 | 10.88.4.194  |            500 | SharedPool-Worker-2
 Read 0 live and 0 tombstone cells                                          | 10.88.4.194  |            513 | SharedPool-Worker-2
 Enqueuing response to /10.55.156.67                                        | 10.88.4.194  |            613 | SharedPool-Worker-2
 Sending message to /10.55.156.67                                           | 10.88.4.194  |            672 | MessagingService-Outgoing-/10.55.156.67
 Parsing SELECT * FROM Aggregate WHERE type=? AND typeId=?;                 | 10.55.156.67 |             10 | SharedPool-Worker-4
 Sending message to /10.88.4.194                                            | 10.55.156.67 |           4335 | MessagingService-Outgoing-/10.88.4.194
 Message received from /10.88.4.194                                         | 10.55.156.67 |           6328 | MessagingService-Incoming-/10.88.4.194
 Seeking to partition beginning in data file                                | 10.55.156.67 |          10417 | SharedPool-Worker-3
 Key cache hit for sstable 389                                              | 10.55.156.67 |          10586 | SharedPool-Worker-3

My question is: how is it possible that the query is sent to a node in DC2?
Since DC1 has 2 nodes and RF 1, the query should always be sent to the
other node in DC1 if the coordinator does not have a replica, right?

Thanks,
Tom


Re: Is it possible to bootstrap the 1st node of a new DC?

2015-09-07 Thread Tom van den Berge
Hi Christian,

No, I never tried survey mode. I didn't know it until now, but from the
info I was able to find it looks like it is meant for a different purpose.
Maybe it can be used to bootstrap a new DC, though.

On the other hand, the auto_bootstrap=false + rebuild scenario seems to be
designed to do exactly what I need, except that it has one drawback: the
node joins the cluster as soon as the bootstrapping begins.

It turns out that join_ring=false in this scenario does not solve this
problem, since nodetool rebuild does not do anything if C* is started with
this option.

A workaround could be to ensure that only LOCAL_* CL is used by all
clients, but even then I'm seeing failed queries, because they're
mysteriously routed to the new DC every now and then.

Currently I'm trying to auto_bootstrap my new DC. The good thing is that it
doesn't accept reads from other DCs. The bad thing is that a) I can't
choose where it streams its data from, and b) the two nodes I've been
trying to bootstrap crashed when they were almost finished...



On Mon, Sep 7, 2015 at 10:22 PM, horschi <hors...@gmail.com> wrote:

> Hi Tom,
>
> this sounds very much like my thread: "auto_bootstrap=false broken?"
>
> Did you try booting the new node with survey-mode? I wanted to try this,
> but I am waiting for 2.0.17 to come out (survey mode is broken in earlier
> versions). Imho survey mode is what you (and me too) want: start a node,
> accepting writes, but not serving reads. I have not tested it yet, but I
> think it should work.
>
> Also the manual join mentioned in CASSANDRA-9667 sounds very interesting.
>
> kind regards,
> Christian
>
> On Mon, Sep 7, 2015 at 10:11 PM, Tom van den Berge <t...@drillster.com>
> wrote:
>
>> Running nodetool rebuild on a node that was started with join_ring=false
>> does not work, unfortunately. The nodetool command returns immediately,
>> after a message appears in the log that the streaming of data has started.
>> After that, nothing happens.
>>
>> Tom
>>
>>
>> On Fri, Sep 12, 2014 at 5:47 PM, Robert Coli <rc...@eventbrite.com>
>> wrote:
>>
>>> On Fri, Sep 12, 2014 at 6:57 AM, Tom van den Berge <t...@drillster.com>
>>> wrote:
>>>
>>>> Wouldn't it be far more efficient if a node that is rebuilding itself
>>>> is responsible for not accepting reads until the rebuild is complete? E.g.
>>>> by marking it as "Joining", similar to a node that is being bootstrapped?
>>>>
>>>
>>> Yes, and Cassandra 2.0.7 and above contain this long desired
>>> functionality.
>>>
>>> https://issues.apache.org/jira/browse/CASSANDRA-6961
>>>
>>> I presume that one can also run a rebuild in this state, though I
>>> haven't tried. Driftx gives it an 80% chance... try it and see and let us
>>> know? :D
>>>
>>> =Rob
>>>
>>>
>>
>>
>>
>


Re: How to prevent queries being routed to new DC?

2015-09-07 Thread Tom van den Berge
NetworkTopologyStrategy

On Mon, Sep 7, 2015 at 4:39 PM, Ryan Svihla <r...@foundev.pro> wrote:

> What's your keyspace replication strategy?
>
> On Thu, Sep 3, 2015 at 3:16 PM Tom van den Berge <
> tom.vandenbe...@gmail.com> wrote:
>
>> Thanks for your help so far!
>>
>> I have some problems trying to understand the jira mentioned by Rob :(
>>
>> I'm currently trying to set up the first node in the new DC with
>> auto_bootstrap = true. The node then becomes visible with status "joining",
>> which (hopefully) prevents other DCs from sending queries to it.
>>
>> Do you think this will work?
>>
>>
>>
>> On Thu, Sep 3, 2015 at 9:46 PM, Robert Coli <rc...@eventbrite.com> wrote:
>>
>>> On Thu, Sep 3, 2015 at 12:25 PM, Bryan Cheng <br...@blockcypher.com>
>>> wrote:
>>>
>>>> I'd recommend you enable tracing and do a few queries in a controlled
>>>> environment to verify that queries are being routed to your new nodes.
>>>> Provided you have followed the procedure outlined above (specifically, have
>>>> set auto_bootstrap to false in your new cluster), rebuild has not been run,
>>>> the application is not connecting to the new cluster, and all your queries
>>>> are run at LOCAL_* quorum levels, I do not believe those queries should be
>>>> routed to the new dc.
>>>>
>>>
>>> Other than CASSANDRA-9753, this is true.
>>>
>>> https://issues.apache.org/jira/browse/CASSANDRA-9753 (Unresolved; ):
>>> "LOCAL_QUORUM reads can block cross-DC if there is a digest mismatch"
>>>
>>> =Rob
>>>
>>>
>> --
> Regards,
>
> Ryan Svihla




-- 
Tom van den Berge
Lead Software Engineer
Middenburcht 136
3452 MT Vleuten
Netherlands +31 30 755 53 30
www.drillster.com


Re: Is it possible to bootstrap the 1st node of a new DC?

2015-09-07 Thread Tom van den Berge
Running nodetool rebuild on a node that was started with join_ring=false
does not work, unfortunately. The nodetool command returns immediately,
after a message appears in the log that the streaming of data has started.
After that, nothing happens.

Tom

On Fri, Sep 12, 2014 at 5:47 PM, Robert Coli <rc...@eventbrite.com> wrote:

> On Fri, Sep 12, 2014 at 6:57 AM, Tom van den Berge <t...@drillster.com>
> wrote:
>
>> Wouldn't it be far more efficient if a node that is rebuilding itself is
>> responsible for not accepting reads until the rebuild is complete? E.g. by
>> marking it as "Joining", similar to a node that is being bootstrapped?
>>
>
> Yes, and Cassandra 2.0.7 and above contain this long desired functionality.
>
> https://issues.apache.org/jira/browse/CASSANDRA-6961
>
> I presume that one can also run a rebuild in this state, though I haven't
> tried. Driftx gives it an 80% chance... try it and see and let us know? :D
>
> =Rob
>
>


Re: How to prevent queries being routed to new DC?

2015-09-03 Thread Tom van den Berge
Hi Bryan,

It does not generate any errors. A query for a specific row simply does not
return the row if it is sent to a node in the new DC. This makes sense,
because the node is still empty.

On Thu, Sep 3, 2015 at 9:03 PM, Bryan Cheng <br...@blockcypher.com> wrote:

> This all seems fine so far. Are you able to see what errors are being
> returned?
>
> We had a similar issue where one of our secondary, less used keyspaces was
> on a replication strategy that was not DC-aware, which was causing errors
> about being unable to satisfy LOCAL_ONE and LOCAL_QUORUM quorum levels.
>
>
> On Thu, Sep 3, 2015 at 11:53 AM, Tom van den Berge <
> tom.vandenbe...@gmail.com> wrote:
>
>> Hi Bryan,
>>
>> I'm using the PropertyFileSnitch, and it contains entries for all nodes
>> in the old DC, and all nodes in the new DC. The replication factor for both
>> DCs is 1.
>>
>> With the first approach I described, the new nodes join the cluster, and
>> show up correctly under the new DC, so all seems to be fine.
>> With the second approach (join_ring=false), they don't show up at all,
>> which is also what I expected.
>>
>>
>> On Thu, Sep 3, 2015 at 8:44 PM, Bryan Cheng <br...@blockcypher.com>
>> wrote:
>>
>>> Hey Tom,
>>>
>>> What's your replication strategy look like? When your new nodes join the
>>> ring, can you verify that they show up under a new DC and not as part of
>>> the old?
>>>
>>> --Bryan
>>>
>>> On Thu, Sep 3, 2015 at 11:27 AM, Tom van den Berge <
>>> tom.vandenbe...@gmail.com> wrote:
>>>
>>>> I want to start using vnodes in my cluster. To do so, I've set up a new
>>>> data center with the same number of nodes as the existing one, as described
>>>> in
>>>> http://docs.datastax.com/en/cassandra/2.0/cassandra/configuration/configVnodesProduction_t.html.
>>>> The new DC is in the same physical location as the old one.
>>>>
>>>> The problem I'm running into is that as soon as the nodes in the new
>>>> data center are started, the application that is using the nodes in the old
>>>> data center is frequently getting error messages because queries don't
>>>> return the expected data. I'm pretty sure this is because somehow these
>>>> queries are routed to the new, empty data center. The application is not
>>>> connecting to the nodes in the new DC.
>>>>
>>>> I've tried two different things to prevent this:
>>>>
>>>> 1) Ensure that all queries use either LOCAL_ONE or LOCAL_QUORUM
>>>> consistency. Nevertheless, I'm still seeing failed queries.
>>>> 2) Start the new nodes with -Dcassandra.join_ring=false, to prevent
>>>> them from participating in the cluster. Although they don't show up in
>>>> nodetool ring, I'm still seeing failed queries.
>>>>
>>>> If I understand it correctly, both measures should prevent queries from
>>>> ending up in the new DC, but somehow they don't in my situation.
>>>>
>>>> How is it possible that queries are routed to the new, empty data
>>>> center? And more importantly, how can I prevent it?
>>>>
>>>> Thanks,
>>>> Tom
>>>>
>>>
>>>
>>
>


How to prevent queries being routed to new DC?

2015-09-03 Thread Tom van den Berge
I want to start using vnodes in my cluster. To do so, I've set up a new
data center with the same number of nodes as the existing one, as described
in
http://docs.datastax.com/en/cassandra/2.0/cassandra/configuration/configVnodesProduction_t.html.
The new DC is in the same physical location as the old one.

The problem I'm running into is that as soon as the nodes in the new data
center are started, the application that is using the nodes in the old data
center is frequently getting error messages because queries don't return
the expected data. I'm pretty sure this is because somehow these queries
are routed to the new, empty data center. The application is not connecting
to the nodes in the new DC.

I've tried two different things to prevent this:

1) Ensure that all queries use either LOCAL_ONE or LOCAL_QUORUM
consistency. Nevertheless, I'm still seeing failed queries.
2) Start the new nodes with -Dcassandra.join_ring=false, to prevent them
from participating in the cluster. Although they don't show up in nodetool
ring, I'm still seeing failed queries.

If I understand it correctly, both measures should prevent queries from
ending up in the new DC, but somehow they don't in my situation.
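
For reference, measure 1) looks roughly like this on the client side (a sketch
against the 2.1 Java driver; the DC name, contact point, keyspace and query are
placeholders). The DC-aware policy keeps the driver on the old DC's nodes, and
LOCAL_QUORUM becomes the default consistency level for every statement:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.QueryOptions;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;

public class LocalDcOnlyClient {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder()
                .addContactPoint("10.0.0.1")                                     // node in the old DC (placeholder)
                .withLoadBalancingPolicy(new DCAwareRoundRobinPolicy("OLD_DC"))  // placeholder DC name; no remote hosts used by default
                .withQueryOptions(new QueryOptions()
                        .setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM))     // default CL for all statements
                .build();

        Session session = cluster.connect("my_keyspace");                        // placeholder keyspace
        session.execute("SELECT * FROM my_table WHERE id = 'some-id'");          // placeholder query
        cluster.close();
    }
}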

How is it possible that queries are routed to the new, empty data center?
And more importantly, how can I prevent it?

Thanks,
Tom


Re: How to prevent queries being routed to new DC?

2015-09-03 Thread Tom van den Berge
Hi Bryan,

I'm using the PropertyFileSnitch, and it contains entries for all nodes in
the old DC, and all nodes in the new DC. The replication factor for both
DCs is 1.

With the first approach I described, the new nodes join the cluster, and
show up correctly under the new DC, so all seems to be fine.
With the second approach (join_ring=false), they don't show up at all,
which is also what I expected.


On Thu, Sep 3, 2015 at 8:44 PM, Bryan Cheng <br...@blockcypher.com> wrote:

> Hey Tom,
>
> What's your replication strategy look like? When your new nodes join the
> ring, can you verify that they show up under a new DC and not as part of
> the old?
>
> --Bryan
>
> On Thu, Sep 3, 2015 at 11:27 AM, Tom van den Berge <
> tom.vandenbe...@gmail.com> wrote:
>
>> I want to start using vnodes in my cluster. To do so, I've set up a new
>> data center with the same number of nodes as the existing one, as described
>> in
>> http://docs.datastax.com/en/cassandra/2.0/cassandra/configuration/configVnodesProduction_t.html.
>> The new DC is in the same physical location as the old one.
>>
>> The problem I'm running into is that as soon as the nodes in the new data
>> center are started, the application that is using the nodes in the old data
>> center is frequently getting error messages because queries don't return
>> the expected data. I'm pretty sure this is because somehow these queries
>> are routed to the new, empty data center. The application is not connecting
>> to the nodes in the new DC.
>>
>> I've tried two different things to prevent this:
>>
>> 1) Ensure that all queries use either LOCAL_ONE or LOCAL_QUORUM
>> consistency. Nevertheless, I'm still seeing failed queries.
>> 2) Start the new nodes with -Dcassandra.join_ring=false, to prevent them
>> from participating in the cluster. Although they don't show up in nodetool
>> ring, I'm still seeing failed queries.
>>
>> If I understand it correctly, both measures should prevent queries from
>> ending up in the new DC, but somehow they don't in my situation.
>>
>> How is it possible that queries are routed to the new, empty data center?
>> And more importantly, how can I prevent it?
>>
>> Thanks,
>> Tom
>>
>
>


Re: How to prevent queries being routed to new DC?

2015-09-03 Thread Tom van den Berge
Thanks for your help so far!

I have some problems trying to understand the jira mentioned by Rob :(

I'm currently trying to set up the first node in the new DC with
auto_bootstrap = true. The node then becomes visible with status "joining",
which (hopefully) prevents other DCs from sending queries to it.

Do you think this will work?



On Thu, Sep 3, 2015 at 9:46 PM, Robert Coli  wrote:

> On Thu, Sep 3, 2015 at 12:25 PM, Bryan Cheng 
> wrote:
>
>> I'd recommend you enable tracing and do a few queries in a controlled
>> environment to verify that queries are being routed to your new nodes.
>> Provided you have followed the procedure outlined above (specifically, have
>> set auto_bootstrap to false in your new cluster), rebuild has not been run,
>> the application is not connecting to the new cluster, and all your queries
>> are run at LOCAL_* quorum levels, I do not believe those queries should be
>> routed to the new dc.
>>
>
> Other than CASSANDRA-9753, this is true.
>
> https://issues.apache.org/jira/browse/CASSANDRA-9753 (Unresolved; ):
> "LOCAL_QUORUM reads can block cross-DC if there is a digest mismatch"
>
> =Rob
>
>


Fwd: MarshalException after upgrading to 2.1.6

2015-06-11 Thread Tom van den Berge
I've upgraded a node from 2.0.10 to 2.1.6. Before taking down the node,
I've run nodetool upgradesstables and nodetool scrub.

When starting up the node with 2.1.6, I'm getting a MarshalException
(stacktrace included below). For some reason, it seems that C* is trying to
convert a text value from the column 'currencyCode' to a UUID, even though it
isn't one.
I've had similar errors for two other columns as well, which I could work
around by dropping the table, since it wasn't used anymore.

The only thing I could do was restoring a snapshot and starting up the old
2.0.10 again. Does anyone have an idea how this can be fixed?

Thanks,
Tom

ERROR 13:51:57 Exception encountered during startup
org.apache.cassandra.serializers.MarshalException: unable to make version 1
UUID from 'currencyCode'
at org.apache.cassandra.db.marshal.UUIDType.fromString(UUIDType.java:188)
~[apache-cassandra-2.1.6.jar:2.1.6]
at
org.apache.cassandra.db.marshal.AbstractCompositeType.fromString(AbstractCompositeType.java:242)
~[apache-cassandra-2.1.6.jar:2.1.6]
at
org.apache.cassandra.config.ColumnDefinition.fromSchema(ColumnDefinition.java:397)
~[apache-cassandra-2.1.6.jar:2.1.6]
at
org.apache.cassandra.config.CFMetaData.fromSchemaNoTriggers(CFMetaData.java:1750)
~[apache-cassandra-2.1.6.jar:2.1.6]
at org.apache.cassandra.config.CFMetaData.fromSchema(CFMetaData.java:1860)
~[apache-cassandra-2.1.6.jar:2.1.6]
at
org.apache.cassandra.config.KSMetaData.deserializeColumnFamilies(KSMetaData.java:321)
~[apache-cassandra-2.1.6.jar:2.1.6]
at org.apache.cassandra.config.KSMetaData.fromSchema(KSMetaData.java:302)
~[apache-cassandra-2.1.6.jar:2.1.6]
at org.apache.cassandra.db.DefsTables.loadFromKeyspace(DefsTables.java:133)
~[apache-cassandra-2.1.6.jar:2.1.6]
at
org.apache.cassandra.config.DatabaseDescriptor.loadSchemas(DatabaseDescriptor.java:696)
~[apache-cassandra-2.1.6.jar:2.1.6]
at
org.apache.cassandra.config.DatabaseDescriptor.loadSchemas(DatabaseDescriptor.java:672)
~[apache-cassandra-2.1.6.jar:2.1.6]
at
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:293)
[apache-cassandra-2.1.6.jar:2.1.6]
at
org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:536)
[apache-cassandra-2.1.6.jar:2.1.6]
at
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:625)
[apache-cassandra-2.1.6.jar:2.1.6]
Caused by: org.apache.cassandra.serializers.MarshalException: unable to
coerce 'currencyCode' to a  formatted date (long)
at
org.apache.cassandra.serializers.TimestampSerializer.dateStringToTimestamp(TimestampSerializer.java:111)
~[apache-cassandra-2.1.6.jar:2.1.6]
at org.apache.cassandra.db.marshal.UUIDType.fromString(UUIDType.java:184)
~[apache-cassandra-2.1.6.jar:2.1.6]
... 12 common frames omitted
Caused by: java.text.ParseException: Unable to parse the date: currencyCode
at
org.apache.commons.lang3.time.DateUtils.parseDateWithLeniency(DateUtils.java:336)
~[commons-lang3-3.1.jar:3.1]
at
org.apache.commons.lang3.time.DateUtils.parseDateStrictly(DateUtils.java:286)
~[commons-lang3-3.1.jar:3.1]
at
org.apache.cassandra.serializers.TimestampSerializer.dateStringToTimestamp(TimestampSerializer.java:107)
~[apache-cassandra-2.1.6.jar:2.1.6]
... 13 common frames omitted
org.apache.cassandra.serializers.MarshalException: unable to make version 1
UUID from 'currencyCode'
at org.apache.cassandra.db.marshal.UUIDType.fromString(UUIDType.java:188)
at
org.apache.cassandra.db.marshal.AbstractCompositeType.fromString(AbstractCompositeType.java:242)
at
org.apache.cassandra.config.ColumnDefinition.fromSchema(ColumnDefinition.java:397)
at
org.apache.cassandra.config.CFMetaData.fromSchemaNoTriggers(CFMetaData.java:1750)
at org.apache.cassandra.config.CFMetaData.fromSchema(CFMetaData.java:1860)
at
org.apache.cassandra.config.KSMetaData.deserializeColumnFamilies(KSMetaData.java:321)
at org.apache.cassandra.config.KSMetaData.fromSchema(KSMetaData.java:302)
at org.apache.cassandra.db.DefsTables.loadFromKeyspace(DefsTables.java:133)
at
org.apache.cassandra.config.DatabaseDescriptor.loadSchemas(DatabaseDescriptor.java:696)
at
org.apache.cassandra.config.DatabaseDescriptor.loadSchemas(DatabaseDescriptor.java:672)
at
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:293)
at
org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:536)
at
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:625)
Caused by: org.apache.cassandra.serializers.MarshalException: unable to
coerce 'currencyCode' to a  formatted date (long)
at
org.apache.cassandra.serializers.TimestampSerializer.dateStringToTimestamp(TimestampSerializer.java:111)
at org.apache.cassandra.db.marshal.UUIDType.fromString(UUIDType.java:184)
... 12 more
Caused by: java.text.ParseException: Unable to parse the date: currencyCode
at
org.apache.commons.lang3.time.DateUtils.parseDateWithLeniency(DateUtils.java:336)
at
org.apache.commons.lang3.time.DateUtils.parseDateStrictly(DateUtils.java:286)
at

Re: MarshalException after upgrading to 2.1.6

2015-06-11 Thread Tom van den Berge
Sure!

I just opened https://issues.apache.org/jira/browse/CASSANDRA-9582



On Thu, Jun 11, 2015 at 5:27 PM, Tyler Hobbs ty...@datastax.com wrote:

 Can you open a JIRA ticket with details and the schema for that table
 here? https://issues.apache.org/jira/browse/CASSANDRA

 On Thu, Jun 11, 2015 at 9:23 AM, Tom van den Berge t...@drillster.com
 wrote:

 I've upgraded a node from 2.0.10 to 2.1.6. Before taking down the node,
 I've run nodetool upgradesstables and nodetool scrub.

 When starting up the node with 2.1.6, I'm getting a MarshalException
 (stacktrace included below). For some reason, it seems that C* is trying to
 convert a text value from the column 'currencyCode' to a UUID, which it
 isn't.
 I've had similar errors for two other columns as well, which I could work
 around by dropping the table, since it wasn't used anymore.

 The only thing I could do was restoring a snapshot and starting up the
 old 2.0.10 again. Does anyone have an idea how this can be fixed?

 Thanks,
 Tom

 ERROR 13:51:57 Exception encountered during startup
 org.apache.cassandra.serializers.MarshalException: unable to make version
 1 UUID from 'currencyCode'
 at org.apache.cassandra.db.marshal.UUIDType.fromString(UUIDType.java:188)
 ~[apache-cassandra-2.1.6.jar:2.1.6]
 at
 org.apache.cassandra.db.marshal.AbstractCompositeType.fromString(AbstractCompositeType.java:242)
 ~[apache-cassandra-2.1.6.jar:2.1.6]
 at
 org.apache.cassandra.config.ColumnDefinition.fromSchema(ColumnDefinition.java:397)
 ~[apache-cassandra-2.1.6.jar:2.1.6]
 at
 org.apache.cassandra.config.CFMetaData.fromSchemaNoTriggers(CFMetaData.java:1750)
 ~[apache-cassandra-2.1.6.jar:2.1.6]
 at
 org.apache.cassandra.config.CFMetaData.fromSchema(CFMetaData.java:1860)
 ~[apache-cassandra-2.1.6.jar:2.1.6]
 at
 org.apache.cassandra.config.KSMetaData.deserializeColumnFamilies(KSMetaData.java:321)
 ~[apache-cassandra-2.1.6.jar:2.1.6]
 at org.apache.cassandra.config.KSMetaData.fromSchema(KSMetaData.java:302)
 ~[apache-cassandra-2.1.6.jar:2.1.6]
 at
 org.apache.cassandra.db.DefsTables.loadFromKeyspace(DefsTables.java:133)
 ~[apache-cassandra-2.1.6.jar:2.1.6]
 at
 org.apache.cassandra.config.DatabaseDescriptor.loadSchemas(DatabaseDescriptor.java:696)
 ~[apache-cassandra-2.1.6.jar:2.1.6]
 at
 org.apache.cassandra.config.DatabaseDescriptor.loadSchemas(DatabaseDescriptor.java:672)
 ~[apache-cassandra-2.1.6.jar:2.1.6]
 at
 org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:293)
 [apache-cassandra-2.1.6.jar:2.1.6]
 at
 org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:536)
 [apache-cassandra-2.1.6.jar:2.1.6]
 at
 org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:625)
 [apache-cassandra-2.1.6.jar:2.1.6]
 Caused by: org.apache.cassandra.serializers.MarshalException: unable to
 coerce 'currencyCode' to a  formatted date (long)
 at
 org.apache.cassandra.serializers.TimestampSerializer.dateStringToTimestamp(TimestampSerializer.java:111)
 ~[apache-cassandra-2.1.6.jar:2.1.6]
 at org.apache.cassandra.db.marshal.UUIDType.fromString(UUIDType.java:184)
 ~[apache-cassandra-2.1.6.jar:2.1.6]
 ... 12 common frames omitted
 Caused by: java.text.ParseException: Unable to parse the date:
 currencyCode
 at
 org.apache.commons.lang3.time.DateUtils.parseDateWithLeniency(DateUtils.java:336)
 ~[commons-lang3-3.1.jar:3.1]
 at
 org.apache.commons.lang3.time.DateUtils.parseDateStrictly(DateUtils.java:286)
 ~[commons-lang3-3.1.jar:3.1]
 at
 org.apache.cassandra.serializers.TimestampSerializer.dateStringToTimestamp(TimestampSerializer.java:107)
 ~[apache-cassandra-2.1.6.jar:2.1.6]
 ... 13 common frames omitted
 org.apache.cassandra.serializers.MarshalException: unable to make version
 1 UUID from 'currencyCode'
 at org.apache.cassandra.db.marshal.UUIDType.fromString(UUIDType.java:188)
 at
 org.apache.cassandra.db.marshal.AbstractCompositeType.fromString(AbstractCompositeType.java:242)
 at
 org.apache.cassandra.config.ColumnDefinition.fromSchema(ColumnDefinition.java:397)
 at
 org.apache.cassandra.config.CFMetaData.fromSchemaNoTriggers(CFMetaData.java:1750)
 at org.apache.cassandra.config.CFMetaData.fromSchema(CFMetaData.java:1860)
 at
 org.apache.cassandra.config.KSMetaData.deserializeColumnFamilies(KSMetaData.java:321)
 at org.apache.cassandra.config.KSMetaData.fromSchema(KSMetaData.java:302)
 at
 org.apache.cassandra.db.DefsTables.loadFromKeyspace(DefsTables.java:133)
 at
 org.apache.cassandra.config.DatabaseDescriptor.loadSchemas(DatabaseDescriptor.java:696)
 at
 org.apache.cassandra.config.DatabaseDescriptor.loadSchemas(DatabaseDescriptor.java:672)
 at
 org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:293)
 at
 org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:536)
 at
 org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:625)
 Caused by: org.apache.cassandra.serializers.MarshalException: unable to
 coerce 'currencyCode' to a  formatted date (long

Re: Is it possible to bootstrap the 1st node of a new DC?

2014-09-12 Thread Tom van den Berge
Giving this some more thought, I think it's fair to say that using
LOCAL_ONE and LOCAL_QUORUM instead of ONE and QUORUM in this situation is
actually a workaround rather than a solution for this problem.

LOCAL_ONE and LOCAL_QUORUM were introduced to ensure that only the local DC
is used, which can be very useful. But not everybody needs this
restriction. So if you don't need it, and you're normally using ONE, you're
in trouble when you set up a new DC and use rebuild to fill the node(s). To
avoid this, you could (temporarily) change the CL to LOCAL_ONE. But changing
the CL of all queries of all clients can potentially be very costly,
depending on your code.
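
(For what it's worth, with the DataStax Java driver the switch can at least be
made in one place, by setting the default consistency level when the Cluster
is built. A minimal sketch, assuming driver 2.x; the contact point and the
query are just placeholders:)

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.QueryOptions;
import com.datastax.driver.core.Session;

public class LocalOneByDefault {
    public static void main(String[] args) {
        // Every statement that does not set its own consistency level will
        // now run at LOCAL_ONE.
        Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1")
                .withQueryOptions(new QueryOptions()
                        .setConsistencyLevel(ConsistencyLevel.LOCAL_ONE))
                .build();
        Session session = cluster.connect();
        session.execute("SELECT release_version FROM system.local");
        cluster.close();
    }
}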

Wouldn't it be far more efficient if a node that is rebuilding itself is
responsible for not accepting reads until the rebuild is complete? E.g. by
marking it as Joining, similar to a node that is being bootstrapped?


Tom

On Thu, Sep 11, 2014 at 11:10 PM, Tom van den Berge t...@drillster.com
wrote:

 Thanks, Rob.
 I actually tried using LOCAL_ONE instead of ONE, but I still saw this
 problem. Maybe I missed some queries when updating to LOCAL_ONE. Anyway,
 it's good to know that this is supposed to work.

 Tom

 On Thu, Sep 11, 2014 at 10:28 PM, Robert Coli rc...@eventbrite.com
 wrote:

 On Thu, Sep 11, 2014 at 1:18 PM, Tom van den Berge t...@drillster.com
 wrote:

 When setting up a new (additional) data center, the documentation tells
 us to use nodetool rebuild -- old dc to fill up the node(s) in the new
 dc, and to disable auto_bootstrap.

 I'm wondering if it is possible to fill the node with
 auto_bootstrap=true instead of a nodetool rebuild command. If so, how
 will Cassandra decide from where to stream the data?


 Yes, if that node can hold 100% of the replicas for the new DC.

 Cassandra will decide from where to stream the data in the same way it
 normally does, by picking one replica per range and streaming from it.

 But you probably don't generally want to do this, rebuild exists for this
 use case.

 The reason I'm asking is that when using rebuild, I've learned from
 experience that the node immediately joins the cluster, and starts
 accepting reads (from other DCs) for the range it owns. But since the data
 is not complete yet, it can't return anything. This seems to be a dangerous
 side effect of this procedure, and therefore can't be used.


 Yes, that's why LOCAL_ONE ConsistencyLevel was created. Use it, and
 LOCAL_QUORUM, instead of ONE and QUORUM.

 =Rob





 --

 Drillster BV
 Middenburcht 136
 3452MT Vleuten
 Netherlands

 +31 30 755 5330

 Open your free account at www.drillster.com




-- 

Drillster BV
Middenburcht 136
3452MT Vleuten
Netherlands

+31 30 755 5330

Open your free account at www.drillster.com


Is it possible to bootstrap the 1st node of a new DC?

2014-09-11 Thread Tom van den Berge
When setting up a new (additional) data center, the documentation tells us
to use nodetool rebuild -- old dc to fill up the node(s) in the new dc,
and to disable auto_bootstrap.

I'm wondering if it is possible to fill the node with auto_bootstrap=true
instead of a nodetool rebuild command. If so, how will Cassandra decide
from where to stream the data?

The reason I'm asking is that when using rebuild, I've learned from
experience that the node immediately joins the cluster, and starts
accepting reads (from other DCs) for the range it owns. But since the data
is not complete yet, it can't return anything. This seems to be a dangerous
side effect of this procedure, and therefore can't be used.

Thanks
Tom


Re: Is it possible to bootstrap the 1st node of a new DC?

2014-09-11 Thread Tom van den Berge
Thanks, Rob.
I actually tried using LOCAL_ONE instead of ONE, but I still saw this
problem. Maybe I missed some queries when updating to LOCAL_ONE. Anyway,
it's good to know that this is supposed to work.

Tom

On Thu, Sep 11, 2014 at 10:28 PM, Robert Coli rc...@eventbrite.com wrote:

 On Thu, Sep 11, 2014 at 1:18 PM, Tom van den Berge t...@drillster.com
 wrote:

 When setting up a new (additional) data center, the documentation tells
 us to use nodetool rebuild -- old dc to fill up the node(s) in the new
 dc, and to disable auto_bootstrap.

 I'm wondering if it is possible to fill the node with
 auto_bootstrap=true instead of a nodetool rebuild command. If so, how
 will Cassandra decide from where to stream the data?


 Yes, if that node can hold 100% of the replicas for the new DC.

 Cassandra will decide from where to stream the data in the same way it
 normally does, by picking one replica per range and streaming from it.

 But you probably don't generally want to do this, rebuild exists for this
 use case.

 The reason I'm asking is that when using rebuild, I've learned from
 experience that the node immediately joins the cluster, and starts
 accepting reads (from other DCs) for the range it owns. But since the data
 is not complete yet, it can't return anything. This seems to be a dangerous
 side effect of this procedure, and therefore can't be used.


 Yes, that's why LOCAL_ONE ConsistencyLevel was created. Use it, and
 LOCAL_QUORUM, instead of ONE and QUORUM.

 =Rob





-- 

Drillster BV
Middenburcht 136
3452MT Vleuten
Netherlands

+31 30 755 5330

Open your free account at www.drillster.com


Node being rebuilt receives read requests

2014-09-10 Thread Tom van den Berge
I have a datacenter with a single node, and I want to start using vnodes. I
have followed the instructions (
http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_add_dc_to_cluster_t.html),
and set up a new node in a new datacenter (auto_bootstrap=false, seed=node
in old dc, num_tokens=256 and initial_token is not set, topology file
updated).

After starting the new node, I used nodetool rebuild -- old-dc to start
filling up the new node.

The idea was to switch the client app to the new node when completed, and
to decommission the old (non-vnodes) node.

While the new node was being filled, my client application (still connected
to the old node, and no auto-discovery of nodes enabled) started showing
errors about rows that could not be found. The only reason I can think of
is that for some reason, the old node reroutes some queries to the new
(incomplete) node. Why would the old node send requests to the new node?
The old node contains 100% of all data, since it is a single-node
datacenter with replication factor 1, so I would say there is no need to
forward the request to another node. And, even more important, the new node
is in the middle of a 'rebuild' process, and therefore does not have all
data.

I noticed that after starting the new node, and before issuing the
'nodetool rebuild' command, 'nodetool info' shows that the new (empty) node
has status Normal. I expected that the status would be 'Joining', since
it's not ready yet.

To me, it seems that the cluster does not know the difference between a
node that's being rebuilt, and a node that is ready, and therefore nodes
that are being rebuilt also receive requests from other nodes. If this is
correct, how should one set up a new datacenter, without affecting the
clients that are connected to the old one?

I learned that some time ago, consistency level LOCAL_ONE was introduced to
prevent cross-datacenter requests. I changed my client to use this
(instead of ONE), but it did not make a difference; I still saw many failed
queries in my client. I can't understand why.

Any help is greatly appreciated.

Thanks,
Tom


Are writes to indexes performed asynchronously?

2014-06-19 Thread Tom van den Berge
Hi,

I have a column family with a secondary index on one of its columns. I
noticed that when I write a row to the column family, and immediately query
that row through the secondary index, every now and then it won't give any
results.

Could it be that Cassandra performs the write to the internal index column
family asynchronously? That might explain this behaviour.

In other words, when writing to an indexed column family, is there, or can
there be, any guarantee that the write to the index is completed when the
write to the original column family is completed?

I'm using a single-node cluster, with consistency level ONE.

Thanks,
Tom


Re: Migration 1.2.14 to 2.0.8 causes Tried to create duplicate hard link at startup

2014-06-19 Thread Tom van den Berge
It turns out this is caused by an earlier, failed attempt to upgrade.
Removing all pre-sstablemetamigration snapshot directories solved the issue.

Credits to Markus Eriksson.


On Wed, Jun 11, 2014 at 9:42 AM, Tom van den Berge t...@drillster.com
wrote:

 No, unfortunately I haven't.




 On Tue, Jun 10, 2014 at 5:35 PM, Chris Burroughs 
 chris.burrou...@gmail.com wrote:

 Were you able to solve or work around this problem?


 On 06/05/2014 11:47 AM, Tom van den Berge wrote:

 Hi,

 I'm trying to migrate a development cluster from 1.2.14 to 2.0.8. When
 starting up 2.0.8, I'm seeing the following error in the logs:


   INFO 17:40:25,405 Snapshotting drillster, Account to
 pre-sstablemetamigration
 ERROR 17:40:25,407 Exception encountered during startup
 java.lang.RuntimeException: Tried to create duplicate hard link to
 /Users/tom/cassandra-data/data/drillster/Account/snapshots/pre-
 sstablemetamigration/drillster-Account-ic-65-Filter.db
  at
 org.apache.cassandra.io.util.FileUtils.createHardLink(FileUtils.java:75)
  at
 org.apache.cassandra.db.compaction.LegacyLeveledManifest.
 snapshotWithoutCFS(LegacyLeveledManifest.java:129)
  at
 org.apache.cassandra.db.compaction.LegacyLeveledManifest.
 migrateManifests(LegacyLeveledManifest.java:91)
  at
 org.apache.cassandra.db.compaction.LeveledManifest.
 maybeMigrateManifests(LeveledManifest.java:617)
  at
 org.apache.cassandra.service.CassandraDaemon.setup(
 CassandraDaemon.java:274)
  at
 org.apache.cassandra.service.CassandraDaemon.activate(
 CassandraDaemon.java:496)
  at
 org.apache.cassandra.service.CassandraDaemon.main(
 CassandraDaemon.java:585)


 Does anyone have an idea how to solve this?


 Thanks,
 Tom





 --

 Drillster BV
 Middenburcht 136
 3452MT Vleuten
 Netherlands

 +31 30 755 5330

 Open your free account at www.drillster.com




-- 

Drillster BV
Middenburcht 136
3452MT Vleuten
Netherlands

+31 30 755 5330

Open your free account at www.drillster.com


Re: Migration 1.2.14 to 2.0.8 causes Tried to create duplicate hard link at startup

2014-06-11 Thread Tom van den Berge
No, unfortunately I haven't.




On Tue, Jun 10, 2014 at 5:35 PM, Chris Burroughs chris.burrou...@gmail.com
wrote:

 Were you able to solve or work around this problem?


 On 06/05/2014 11:47 AM, Tom van den Berge wrote:

 Hi,

 I'm trying to migrate a development cluster from 1.2.14 to 2.0.8. When
 starting up 2.0.8, I'm seeing the following error in the logs:


   INFO 17:40:25,405 Snapshotting drillster, Account to
 pre-sstablemetamigration
 ERROR 17:40:25,407 Exception encountered during startup
 java.lang.RuntimeException: Tried to create duplicate hard link to
 /Users/tom/cassandra-data/data/drillster/Account/snapshots/pre-
 sstablemetamigration/drillster-Account-ic-65-Filter.db
  at
 org.apache.cassandra.io.util.FileUtils.createHardLink(FileUtils.java:75)
  at
 org.apache.cassandra.db.compaction.LegacyLeveledManifest.
 snapshotWithoutCFS(LegacyLeveledManifest.java:129)
  at
 org.apache.cassandra.db.compaction.LegacyLeveledManifest.
 migrateManifests(LegacyLeveledManifest.java:91)
  at
 org.apache.cassandra.db.compaction.LeveledManifest.maybeMigrateManifests(
 LeveledManifest.java:617)
  at
 org.apache.cassandra.service.CassandraDaemon.setup(
 CassandraDaemon.java:274)
  at
 org.apache.cassandra.service.CassandraDaemon.activate(
 CassandraDaemon.java:496)
  at
 org.apache.cassandra.service.CassandraDaemon.main(
 CassandraDaemon.java:585)


 Does anyone have an idea how to solve this?


 Thanks,
 Tom





-- 

Drillster BV
Middenburcht 136
3452MT Vleuten
Netherlands

+31 30 755 5330

Open your free account at www.drillster.com


Migration 1.2.14 to 2.0.8 causes Tried to create duplicate hard link at startup

2014-06-05 Thread Tom van den Berge
Hi,

I'm trying to migrate a development cluster from 1.2.14 to 2.0.8. When
starting up 2.0.8, I'm seeing the following error in the logs:


 INFO 17:40:25,405 Snapshotting drillster, Account to
pre-sstablemetamigration
ERROR 17:40:25,407 Exception encountered during startup
java.lang.RuntimeException: Tried to create duplicate hard link to
/Users/tom/cassandra-data/data/drillster/Account/snapshots/pre-sstablemetamigration/drillster-Account-ic-65-Filter.db
at
org.apache.cassandra.io.util.FileUtils.createHardLink(FileUtils.java:75)
at
org.apache.cassandra.db.compaction.LegacyLeveledManifest.snapshotWithoutCFS(LegacyLeveledManifest.java:129)
at
org.apache.cassandra.db.compaction.LegacyLeveledManifest.migrateManifests(LegacyLeveledManifest.java:91)
at
org.apache.cassandra.db.compaction.LeveledManifest.maybeMigrateManifests(LeveledManifest.java:617)
at
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:274)
at
org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:496)
at
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:585)


Does anyone have an idea how to solve this?


Thanks,
Tom


StatusLogger output help

2014-03-28 Thread Tom van den Berge
Hi,

In my cassandra logs, I see a lot of StatusLogger output lines. I'm
trying to understand why this is logged, and how to interpret the output.
Maybe someone can point me to some documentation on this particular logging
aspect?

I would like to know what is triggering the StatusLogger.java to start
logging? Sometimes it logs every few seconds, and sometimes it won't log
for hours.

Also, about the lines that log Memtable ops, data per ColumnFamily, what
do these figures mean? Are they the number of operations and the data size
(bytes, MB, ...)? Are the ops counters reset every time they are logged, or
e.g. every x minutes?


Any help is greatly appreciated!
Thanks,
Tom


Help on StatusLogger output?

2014-03-20 Thread Tom van den Berge
Hi,

In my cassandra logs, I see a lot of StatusLogger output lines. I'm
trying to understand why this is logged, and how to interpret the output.
Maybe someone can point me to some documentation on this particular logging
aspect?

I would like to know what is triggering the StatusLogger.java to start
logging? Sometimes it logs every few seconds, and sometimes it won't log
for hours.

Also, about the lines that log Memtable ops, data per ColumnFamily, what
do these figures mean? Are they the number of operations and the data size
(bytes, MB, ...)? Are the ops counters reset every time they are logged, or
e.g. every x minutes?


Any help is greatly appreciated!
Thanks,
Tom


Re: OutOfMemory Java Heap Space error on startup...

2013-12-04 Thread Tom van den Berge
To start up your node again, you could delete the stored key caches (
/var/lib/cassandra/saved_caches/*).

Regards,
Tom


On Wed, Dec 4, 2013 at 7:46 PM, Krishna Chaitanya bnsk1990r...@gmail.com wrote:

 Hey Nate,
  Thanks for the reply. The link was really good...!!! Looking
 forward to making the necessary changes and trying this approach.

 Thanks.

 Regards,
 BNSK.


 On Wed, Dec 4, 2013 at 9:00 AM, Nate McCall n...@thelastpickle.com wrote:

 For a limited memory environment, take a look at the following:

 http://www.opensourceconnections.com/2013/08/31/building-the-perfect-cassandra-test-environment/




 On Wed, Dec 4, 2013 at 11:05 AM, Krishna Chaitanya 
 bnsk1990r...@gmail.com wrote:

 Hello,
  I am currently using Cassandra-2.0.0 on OpenSuse for storing
 netflow packets that are seen on my ethernet interface. I deliberately
 tried to test Cassandra with heavy data and it ran fine for about 30 mins
 after which it crashed  with OutOfMemory error. I set up a two-node cluster
 to which this data is getting stored with replication_factor 1. Now,
 Cassandra is not even starting up. The log is given below for your
 reference.
 Can I solve
 this problem by tweaking JVM OPTS? If yes, which all and how? How can I be
 sure that it is not someother issue like corrupted commit log headers, etc.
 so as to prevent these errors in the future? I am on a 32-bit OpenSuse i5
 machine with 4G RAM.

 Here is the output when I try to start Cassandra:-

 linux-0cpn:~/bnsk/
 experimentation/apache-cassandra-2.0.0/bin # ./cassandra -f 
 [1] 984
 linux-0cpn:~/bnsk/experimentation/apache-cassandra-2.0.0/bin #
 ./../conf/cassandra-env.sh: line 137: elseError:: command not found
 xss =  -ea -javaagent:./../lib/jamm-0.2.5.jar -XX:+UseThreadPriorities
 -XX:ThreadPriorityPolicy=42 -Xms1024M -Xmx1024M -Xmn256M
 -XX:+HeapDumpOnOutOfMemoryError -Xss256k
  INFO 03:46:43,178 Logging initialized
  INFO 03:46:43,188 32bit JVM detected.  It is recommended to run
 Cassandra on a 64bit JVM for better performance.
  INFO 03:46:43,189 JVM vendor/version: Java HotSpot(TM) Server
 VM/1.7.0_45
  INFO 03:46:43,189 Heap size: 1046937600/1046937600
  INFO 03:46:43,189 Classpath:
 ./../conf:./../build/classes/main:./../build/classes/thrift:./../lib/antlr-3.2.jar:./../lib/apache-cassandra-2.0.0.jar:./../lib/apache-cassandra-clientutil-2.0.0.jar:./../lib/apache-cassandra-thrift-2.0.0.jar:./../lib/commons-cli-1.1.jar:./../lib/commons-codec-1.2.jar:./../lib/commons-lang-2.6.jar:./../lib/compress-lzf-0.8.4.jar:./../lib/concurrentlinkedhashmap-lru-1.3.jar:./../lib/disruptor-3.0.1.jar:./../lib/guava-13.0.1.jar:./../lib/high-scale-lib-1.1.2.jar:./../lib/jackson-core-asl-1.9.2.jar:./../lib/jackson-mapper-asl-1.9.2.jar:./../lib/jamm-0.2.5.jar:./../lib/jbcrypt-0.3m.jar:./../lib/jline-1.0.jar:./../lib/json-simple-1.1.jar:./../lib/libthrift-0.9.0.jar:./../lib/log4j-1.2.16.jar:./../lib/lz4-1.1.0.jar:./../lib/metrics-core-2.0.3.jar:./../lib/netty-3.5.9.Final.jar:./../lib/servlet-api-2.5-20081211.jar:./../lib/slf4j-api-1.7.2.jar:./../lib/slf4j-log4j12-1.7.2.jar:./../lib/snakeyaml-1.11.jar:./../lib/snappy-java-1.0.5.jar:./../lib/snaptree-0.1.jar:./../lib/thrift-server-0.3.0.jar:./../lib/jamm-0.2.5.jar
  INFO 03:46:43,191 JNA not found. Native methods will be disabled.
  INFO 03:46:43,199 Loading settings from
 file:/root/bnsk/experimentation/apache-cassandra-2.0.0/conf/cassandra.yaml
  INFO 03:46:43,418 Data files directories: [/var/lib/cassandra/data]
  INFO 03:46:43,418 Commit log directory: /var/lib/cassandra/commitlog
  INFO 03:46:43,418 DiskAccessMode 'auto' determined to be standard,
 indexAccessMode is standard
  INFO 03:46:43,418 disk_failure_policy is stop
  INFO 03:46:43,422 Global memtable threshold is enabled at 332MB
  INFO 03:46:43,510 Not using multi-threaded compaction
  INFO 03:46:43,660 Initializing key cache with capacity of 49 MBs.
  INFO 03:46:43,667 Scheduling key cache save to each 14400 seconds
 (going to save all keys).
  INFO 03:46:43,668 Initializing row cache with capacity of 0 MBs
  INFO 03:46:43,674 Scheduling row cache save to each 0 seconds (going to
 save all keys).
  INFO 03:46:43,755 Initializing system.schema_triggers
  INFO 03:46:43,768 Opening
 /var/lib/cassandra/data/system/schema_triggers/system-schema_triggers-ja-104
 (57 bytes)
  INFO 03:46:43,768 Opening
 /var/lib/cassandra/data/system/schema_triggers/system-schema_triggers-ja-105
 (57 bytes)
  INFO 03:46:43,769 Opening
 /var/lib/cassandra/data/system/schema_triggers/system-schema_triggers-ja-103
 (57 bytes)
  INFO 03:46:43,790 reading saved cache
 /var/lib/cassandra/saved_caches/system-schema_triggers-KeyCache-b.db
  INFO 03:46:43,798 Initializing system.batchlog
  INFO 03:46:43,800 Initializing system.peer_events
  INFO 03:46:43,804 Initializing system.compactions_in_progress
  INFO 03:46:43,805 Opening
 /var/lib/cassandra/data/system/compactions_in_progress/system-compactions_in_progress-ja-22
 

Re: How to measure data transfer between data centers?

2013-12-04 Thread Tom van den Berge
Hi Chris,

I think streaming is used for repair tasks, bulk loading and that kind of
thing, but not for regular replication traffic.

I think you're right that I should look into network tools. I don't think
cassandra can supply this information.

Thanks,
Tom


On Wed, Dec 4, 2013 at 6:08 PM, Chris Burroughs
chris.burrou...@gmail.com wrote:

 https://wiki.apache.org/cassandra/Metrics has per node Streaming metrics
 that include total bytes/in out.  That is only a small bit of what you want
 though.

 For total DC bandwidth it might be more straightforward to measure this at
 the router/switch/fancy-network-gear level.


 On 12/03/2013 06:25 AM, Tom van den Berge wrote:

 Is there a way to know how much data is transferred between two nodes, or
 more specifically, between two data centers?

 I'm especially interested in how much data is being replicated from one
 data center to another, to know how much of the available bandwidth is
 used.


 Thanks,
 Tom





-- 

Drillster BV
Middenburcht 136
3452MT Vleuten
Netherlands

+31 30 755 5330

Open your free account at www.drillster.com


How to monitor the progress of a HintedHandoff task?

2013-12-03 Thread Tom van den Berge
Hi,

Is there a way to monitor the progress of a hinted handoff task?

I found the following two mbeans providing some info:

org.apache.cassandra.internal:type=HintedHandoff, which tells me that there
is 1 active task, and
org.apache.cassandra.db:type=HintedHandoffManager#countPendingHints(),
which quite often gives a timeout when executed.

Ideally, I would like to see how many hints have been sent (e.g. over the
last minute or so), and how many hints are still to be sent (although I
assume that's what countPendingHints normally does?)

I'm experiencing hinted handoff tasks that are started, but never finish,
so I would like to know what the task is doing.

My log shows this:

INFO [HintedHandoff:1] 2013-12-02 13:49:05,325 HintedHandOffManager.java
(line 297) Started hinted handoff for host:
6f80b942-5b6d-4233-9827-3727591abf55 with IP: /10.55.156.66
(nothing more for [HintedHandoff:1])

The node is up and running, the network connection is ok, no gossip
messages appear in the logs.

Any idea is welcome.
(Cassandra 1.2.3)




-- 

Drillster BV
Middenburcht 136
3452MT Vleuten
Netherlands

+31 30 755 5330

Open your free account at www.drillster.com


How to measure data transfer between data centers?

2013-12-03 Thread Tom van den Berge
Is there a way to know how much data is transferred between two nodes, or
more specifically, between two data centers?

I'm especially interested in how much data is being replicated from one
data center to another, to know how much of the available bandwidth is used.


Thanks,
Tom


Re: How to monitor the progress of a HintedHandoff task?

2013-12-03 Thread Tom van den Berge
Hi Rahul,

Thanks for your reply.

I have never seen a message like Timed out replaying hints to..., which is
a good thing then, I suppose ;)

Normally, I do see the Finished hinted handoff... log message. However,
every now and then this message is not logged, not even after several
hours. This is the problem I'm trying to solve.

The log messages you describe are quite coarse-grained; they only tell you
that a task has started or finished, but not how this task is progressing.
And that's exactly what I would like to know if I see that a task has
started, but has not finished after a reasonable amount of time.

So I guess the only way to learn the progress is to look inside the
'hints' column family then. I'll give that a try.


Thanks,
Tom


On Tue, Dec 3, 2013 at 1:43 PM, Rahul Menon ra...@apigee.com wrote:

 Tom,

 You should check the size of the hints column family to determine how much
 are present. The hints are a super column family and its keys are
 destination tokens. You could look at it if you would like.

 Hints send and timedouts are logged, you should be seeing something like

 Timed out replaying hints to {}; aborting ({} delivered



 OR

 Finished hinted handoff of {} rows to endpoint {}



 Thanks
 Rahul


 On Tue, Dec 3, 2013 at 2:36 PM, Tom van den Berge t...@drillster.com wrote:

 Hi,

 Is there a way to monitor the progress of a hinted handoff task?

 I found the following two mbeans providing some info:

 org.apache.cassandra.internal:type=HintedHandoff, which tells me that
 there is 1 active task, and
 org.apache.cassandra.db:type=HintedHandoffManager#countPendingHints(),
 which quite often gives a timeout when executed.

 Ideally, I would like to see how many hints have been sent (e.g. over the
 last minute or so), and how many hints are still to be sent (although I
 assume that's what countPendingHints normally does?)

 I'm experiencing hinted handoff tasks that are started, but never finish,
 so I would like to know what the task is doing.

 My log shows this:

 INFO [HintedHandoff:1] 2013-12-02 13:49:05,325 HintedHandOffManager.java
 (line 297) Started hinted handoff for host:
 6f80b942-5b6d-4233-9827-3727591abf55 with IP: /10.55.156.66
 (nothing more for [HintedHandoff:1])

 The node is up and running, the network connection is ok, no gossip
 messages appear in the logs.

 Any idea is welcome.
 (Cassandra 1.2.3)




 --

 Drillster BV
 Middenburcht 136
 3452MT Vleuten
 Netherlands

 +31 30 755 5330

 Open your free account at www.drillster.com





-- 

Drillster BV
Middenburcht 136
3452MT Vleuten
Netherlands

+31 30 755 5330

Open your free account at www.drillster.com


Re: How to monitor the progress of a HintedHandoff task?

2013-12-03 Thread Tom van den Berge
Rahul,

This problem occurs every now and then, and currently everything is ok, so
there are no hints. But whenever it happens, the hints are quickly piling
up. This results in heap problems on the node (Heap is 0.813462 full...
appears many times). This in turn results in the flushing of the 'hints'
column family, to relieve memory pressure. According to the log message,
the size varies between 50 and 60 MB. But since the HintedHandoffManager is
reading from the hints CF, it will probably pull it back into a memtable
again -- that's at least my understanding of how it works.

So I guess that flushing the hints CF while the HintedHandoffManager is
working on it only makes things worse, and it could be the reason that the
process never ends.

What I typically see when this happens is that the hints keep piling up,
and eventually the node comes to a grinding halt (OOM). Then I have to
rebuild the node entirely (only removing the hints doesn't work).

The reason for hints to start accumulating in the first place might be a
spike in CF writes that must be replicated to a node in another data
center. The available bandwidth to that data center might not be able to
handle the data quickly enough, resulting in stored hints. The
HintedHandoff task that is started is targeting that remote node.


Thanks,
Tom


On Tue, Dec 3, 2013 at 2:22 PM, Rahul Menon ra...@apigee.com wrote:

 Tom,

 Do you know why these hints are piling up? What is the size of the hints
 cf?

 Thanks
 Rahul


 On Tue, Dec 3, 2013 at 6:41 PM, Tom van den Berge t...@drillster.com wrote:

 Hi Rahul,

 Thanks for your reply.

 I have never seen a message like Timed out replaying hints to..., which
 is a good thing then, I suppose ;)

 Normally, I do see the Finished hinted handoff... log message. However,
 every now and then this message is not logged, not even after several
 hours. This is the problem I'm trying to solve.

 The log messages you describe are quite coarse-grained; they only tell
 you that a task has started or finished, but not how this task is
 progressing. And that's exactly what I would like to know if I see that a
 task has started, but has not finished after a reasonable amount of time.

 So I guess the only way to learn the progress is to look inside the
 'hints' column family then. I'll give that a try.


 Thanks,
 Tom


 On Tue, Dec 3, 2013 at 1:43 PM, Rahul Menon ra...@apigee.com wrote:

 Tom,

 You should check the size of the hints column family to determine how
 much are present. The hints are a super column family and its keys are
 destination tokens. You could look at it if you would like.

 Hints send and timedouts are logged, you should be seeing something like

 Timed out replaying hints to {}; aborting ({} delivered






 OR

 Finished hinted handoff of {} rows to endpoint {}



 Thanks
 Rahul


 On Tue, Dec 3, 2013 at 2:36 PM, Tom van den Berge t...@drillster.com wrote:

 Hi,

 Is there a way to monitor the progress of a hinted handoff task?

 I found the following two mbeans providing some info:

 org.apache.cassandra.internal:type=HintedHandoff, which tells me that
 there is 1 active task, and
 org.apache.cassandra.db:type=HintedHandoffManager#countPendingHints(),
 which quite often gives a timeout when executed.

 Ideally, I would like to see how many hints have been sent (e.g. over
 the last minute or so), and how many hints are still to be sent (although I
 assume that's what countPendingHints normally does?)

 I'm experiencing hinted handoff tasks that are started, but never
 finish, so I would like to know what the task is doing.

 My log shows this:

 INFO [HintedHandoff:1] 2013-12-02
 13:49:05,325 HintedHandOffManager.java (line 297) Started hinted handoff
 for host: 6f80b942-5b6d-4233-9827-3727591abf55 with IP: /10.55.156.66
 (nothing more for [HintedHandoff:1])

 The node is up and running, the network connection is ok, no gossip
 messages appear in the logs.

 Any idea is welcome.
 (Cassandra 1.2.3)




 --

 Drillster BV
 Middenburcht 136
 3452MT Vleuten
 Netherlands

 +31 30 755 5330

 Open your free account at www.drillster.com





 --

 Drillster BV
 Middenburcht 136
 3452MT Vleuten
 Netherlands

 +31 30 755 5330

 Open your free account at www.drillster.com





-- 

Drillster BV
Middenburcht 136
3452MT Vleuten
Netherlands

+31 30 755 5330

Open your free account at www.drillster.com


What is listEndpointsPendingHints?

2013-11-26 Thread Tom van den Berge
When I run the operation listEndpointsPendingHints on the
mbean org.apache.cassandra.db:type=HintedHandoffManager, I'm getting

( 126879603237190600081737151857243914981 )

It suggests that there are pending hints, but the
org.apache.cassandra.internal:type=HintedHandoff mbean provides these
figures:

TotalBlockedTasks = 0;
CurrentlyBlockedTasks = 0;
CoreThreads = 2;
MaximumThreads = 2;
ActiveCount = 0;
PendingTasks = 0;
CompletedTasks = 0;

I'm wondering what it means that it returns a value, and what this value
is? It looks like a token, but it's not one of the tokens of my nodes.

The reason I'm looking into this is that my cluster is suffering every now
and then from never-ending (dead) hinted handoff tasks, resulting in a flood
of hints on the node.

Thanks,
Tom


Re: Managing index tables

2013-11-05 Thread Tom van den Berge
Hi Thomas,

I understand your concerns about ensuring the integrity of your data when
having to maintain the indexes yourself.

In some situations, using Cassandra's built-in secondary indexes is more
efficient -- when many rows contain the indexed value. Maybe your
permissions fall into this category? Obviously, the advantage is that
Cassandra will do the maintenance on the index for you.

For situations where secondary indexes are not recommended, you make your
life a lot easier if all modifications of the indexed entity (like your
user) are executed by one single piece of code, which is then also
responsible for maintaining all associated indexes. And write tests to
ensure that it works in all possible ways.

I understood that Cassandra 2.0 supports transactions. I haven't looked at
it yet, but this could also help maintain your data integrity, when a
failed update of one of your indexes results in a rollback of the entire
transaction.
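
(For what it's worth, a logged batch -- BEGIN BATCH ... APPLY BATCH, available
since 1.2 -- already guarantees that if any statement in the batch is applied,
all of them eventually will be, which helps keep a hand-maintained index table
from silently missing entries; it does not give isolation or rollback, though.
A minimal sketch, assuming the DataStax Java driver 2.x; the keyspace, table
and column names are hypothetical:)

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class UpdateUserAndIndexTable {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("my_keyspace");
        // Update the user row and the hand-maintained index table together.
        // Logged batch: either none of these mutations is applied, or all of
        // them (eventually) are.
        session.execute(
            "BEGIN BATCH "
            + "UPDATE users SET email = 'new@example.com' WHERE userid = 42; "
            + "DELETE FROM users_by_email WHERE email = 'old@example.com'; "
            + "INSERT INTO users_by_email (email, userid) VALUES ('new@example.com', 42); "
            + "APPLY BATCH");
        cluster.close();
    }
}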

I hope this is helpful to you.
Tom


On Mon, Nov 4, 2013 at 12:20 PM, Thomas Stets thomas.st...@gmail.com wrote:

 What is the best way to manage index tables on update/deletion of the
 indexed data?

 I have a table containing all kinds of data for a user, i.e. name, address,
 contact data, company data etc. Key to this table is the user ID.

 I also maintain about a dozen index tables matching my queries, like name,
 email address, company D.U.N.S number, permissions the user has, etc. These
 index tables contain the user IDs matching the search key as column names,
 with the column values left empty.

 Whenever a user is deleted or updated I have to make sure to update the
 index tables, i.e. if the permissions of a user changes I have to remove
 the user ID from the rows matching the permission he no longer has.

 My problem is to find all matching entries, especially for data I no
 longer have.

 My solution so far is to keep a separate table to keep track of all index
 tables and keys the user can be found in. In the case mentioned I look up
 the keys for the permissions table, remove the user ID from there, then
 remove the entry in the keys table.

 This works so far (in production for more than a year and a half), and it
 also allows me to clean up after something has gone wrong.

 But still, all this additional level of meta information adds a lot of
 complexity. I was wondering whether there is some kind of pattern that
 addresses my problem. I found lots of information saying that creating the
 index tables is the way to go, but nobody ever mentions maintaining the
 index tables.

 tia, Thomas




-- 

Drillster BV
Middenburcht 136
3452MT Vleuten
Netherlands

+31 30 755 5330

Open your free account at www.drillster.com


Re: filter using timeuuid column type

2013-11-05 Thread Tom van den Berge
This is because time2 is not part of the primary key. Only the primary key
column(s) can be queried with < and >. Secondary indexes (like your
timeuuid_test2_idx) can only be queried with the = operator.

Maybe you can make time2 also part of your primary key?
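
(A minimal sketch of that idea, assuming the DataStax Java driver 2.x -- note
that this defines a different table with different uniqueness and ordering,
since the primary key changes; the keyspace and key values are placeholders:)

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class TimeuuidRangeQuery {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("test_ks");
        // time2 as the first clustering column, so it can be restricted
        // with < and > once the partition key is fixed with =.
        session.execute("CREATE TABLE timeuuid_test3 ("
                + " row_key text,"
                + " time2 timeuuid,"
                + " time timeuuid,"
                + " message text,"
                + " PRIMARY KEY (row_key, time2))");
        // No ALLOW FILTERING needed: the range is on a clustering column.
        session.execute("SELECT * FROM timeuuid_test3"
                + " WHERE row_key = 'some-key' AND time2 < now()");
        cluster.close();
    }
}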


Good luck,
Tom


On Mon, Nov 4, 2013 at 11:29 AM, Turi, Ferenc (GE Power & Water, Non-GE)
ferenc.t...@ge.com wrote:

  Hi,



 Is it possible to filter records by using timeuuid column types in case
 the column is not part of the primary key?



 I tried the followings:



 [cqlsh 3.1.2 | Cassandra 1.2.10.1 | CQL spec 3.0.0 | Thrift protocol
 19.36.0]



 CREATE TABLE timeuuid_test2(

 row_key text,

 time timeuuid,

 time2 timeuuid,

 message text,

 PRIMARY KEY (row_key, time)

 )



 Cqlsh: *select * from timeuuid_test2 where time2 > now();*



 Bad Request: No indexed columns present in by-columns clause with Equal
 operator

 I tried to create the required index:



 *create index timeuuid_test2_idx on timeuuid_test2 (time2);*



 Bad Request: No indexed columns present in by-columns clause with Equal
 operator

 The result is the same…



 If the used column is time then everything is OK.



 *select * from timeuuid_test2 where time > now() ALLOW FILTERING;*



 The question here: why can’t I use the ‘time2’ column when filtering,
 despite the column being indexed?



 Thanks,



 Ferenc




-- 

Drillster BV
Middenburcht 136
3452MT Vleuten
Netherlands

+31 30 755 5330

Open your free account at www.drillster.com


Re: Check out if Cassandra ready

2013-11-01 Thread Tom van den Berge
I recommend using CassandraUnit (https://github.com/jsevellec/cassandra-unit).
It makes using Cassandra in unit tests quite easy.

It allows you to start an embedded Cassandra synchronously with a single
simple method call, optionally load your schema and initial data, and
you're ready to start testing.

I'm using it in many unit tests (although formally it's not a unit test
anymore when relying on a cassandra node). The fantastic performance of
Cassandra even allows me to clear all column families and insert the test
fixture rows for each individual test case.
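
(A minimal sketch of that setup, assuming cassandra-unit's
EmbeddedCassandraServerHelper and JUnit 4 -- the helper method names are from
the cassandra-unit API as I remember it, so double-check them against the
version you use:)

import org.cassandraunit.utils.EmbeddedCassandraServerHelper;
import org.junit.Before;
import org.junit.BeforeClass;
import org.junit.Test;

public class EmbeddedCassandraExampleTest {

    @BeforeClass
    public static void startEmbeddedCassandra() throws Exception {
        // Starts an in-process Cassandra node; the call only returns once
        // the node is ready to accept connections.
        EmbeddedCassandraServerHelper.startEmbeddedCassandra();
    }

    @Before
    public void wipeData() {
        // Clear the data between test cases so every test starts from a
        // known, empty state.
        EmbeddedCassandraServerHelper.cleanEmbeddedCassandra();
    }

    @Test
    public void createSchemaAndInsertFixture() {
        // Recreate the schema and insert the test fixture here, using your
        // usual client connected to the embedded node on localhost.
    }
}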

Good luck,
Tom



On Fri, Nov 1, 2013 at 10:00 AM, Salih Kardan karda...@gmail.com wrote:

 Hi all,

 I am a newbie to Cassandra and I am trying to write test cases for Cassandra
 with JUnit.
 I use the CassandraDaemon class to start Cassandra in IntelliJ IDEA. I want to
 wait until Cassandra is up and running before running test methods. How can I
 wait until Cassandra starts, or is there any way to check if Cassandra is
 running (with Java)?

 Thanks.
 Salih Kardan




-- 

Drillster BV
Middenburcht 136
3452MT Vleuten
Netherlands

+31 30 755 5330

Open your free account at www.drillster.com


Re: Disappearing index data.

2013-10-09 Thread Tom van den Berge
The suggested fix to run a major compaction on the index column family
unfortunately didn't help. Though, rebuilding the index (nodetool
rebuild_index) fixed it.

This bug appears to be almost the same as
https://issues.apache.org/jira/browse/CASSANDRA-5732, (and some of the
related bugs mentioned there), but there's one difference: these bug
reports all mention the use of caching ALL as the cause of the problems.
However, the column families I'm having trouble with have caching
KEYS_ONLY.





On Mon, Oct 7, 2013 at 6:37 PM, Janne Jalkanen janne.jalka...@ecyrd.com wrote:


 https://issues.apache.org/jira/browse/CASSANDRA-5732

 There is now a reproducible test case.

 /Janne

 On Oct 7, 2013, at 16:29 , Michał Michalski mich...@opera.com wrote:

 I had similar issue (reported many times here, there's also a JIRA issue,
 but people reporting this problem were unable to reproduce it).

 What I can say is that for me the solution was to run major compaction on
 the index CF via JMX. To be clear - we're not talking about compacting the
 CF that IS indexed (your CF), but the internal Cassandra's one, which is
 responsible for storing index data.

 MBean you should look for looks like this:


 org.apache.cassandra.db:type=IndexColumnFamilies,keyspace=KS,columnfamily=CF.IDX

 M.

 W dniu 07.10.2013 15:22, Tom van den Berge pisze:

 On a 2-node cluster with replication factor 2, I have a column family with
 an index on one of the columns.

 Every now and then, I notice that a lookup of the record through the index
 on node 1 produces the record, but the same lookup on node 2 does not! If I
 do a lookup by row key, the record is found, and the indexed value is
 there.


 So as far as I can tell, the index on one of the nodes loses values, and
 is no longer in sync with the other node, even though the replication
 factor requires it. I typically repair these issues by storing the indexed
 column value again.

 The indexed data is static data; it doesn't change.

 I'm running cassandra 1.2.3. I'm running a nodetool repair on each node
 every day (although this does not fix this problem).

 This problem worries me a lot. I don't have a clue about the cause of it.
 Any help would be greatly appreciated.



 Tom





-- 

Drillster BV
Middenburcht 136
3452MT Vleuten
Netherlands

+31 30 755 5330

Open your free account at www.drillster.com


Disappearing index data.

2013-10-07 Thread Tom van den Berge
On a 2-node cluster with replication factor 2, I have a column family with
an index on one of the columns.

Every now and then, I notice that a lookup of the record through the index
on node 1 produces the record, but the same lookup on node 2 does not! If I
do a lookup by row key, the record is found, and the indexed value is there.


So as far as I can tell, the index on one of the nodes loses values, and
is no longer in sync with the other node, even though the replication
factor requires it. I typically repair these issues by storing the indexed
column value again.

The indexed data is static data; it doesn't change.

I'm running cassandra 1.2.3. I'm running a nodetool repair on each node
every day (although this does not fix this problem).

This problem worries me a lot. I don't have a clue about the cause of it.
Any help would be greatly appreciated.



Tom


Re: Disappearing index data.

2013-10-07 Thread Tom van den Berge
Thanks, I'll give that a try.

Is there a way to do this without JMX? I wouldn't know how to run a JMX
console on my production servers without a graphical interface.
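
(One option might be to trigger the compaction programmatically over JMX,
without any graphical console -- a minimal sketch using the standard
javax.management remote API. The MBean name is the one Michał describes below;
the keyspace, column family and index names are placeholders, and the
forceMajorCompaction operation name is an assumption, so list the bean's
operations first if in doubt:)

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class CompactIndexColumnFamily {
    public static void main(String[] args) throws Exception {
        // Cassandra's JMX port is 7199 by default.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://127.0.0.1:7199/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            ObjectName indexCf = new ObjectName(
                    "org.apache.cassandra.db:type=IndexColumnFamilies,"
                    + "keyspace=MyKeyspace,columnfamily=MyCf.my_index");
            // Print the available operations to verify the name before invoking.
            System.out.println(mbs.getMBeanInfo(indexCf));
            mbs.invoke(indexCf, "forceMajorCompaction",
                    new Object[0], new String[0]);
        } finally {
            connector.close();
        }
    }
}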


On Mon, Oct 7, 2013 at 3:29 PM, Michał Michalski mich...@opera.com wrote:

 I had similar issue (reported many times here, there's also a JIRA issue,
 but people reporting this problem were unable to reproduce it).

 What I can say is that for me the solution was to run major compaction on
 the index CF via JMX. To be clear - we're not talking about compacting the
 CF that IS indexed (your CF), but the internal Cassandra's one, which is
 responsible for storing index data.

 MBean you should look for looks like this:

 org.apache.cassandra.db:type=**IndexColumnFamilies,keyspace=**
 KS,columnfamily=CF.IDX

 M.

 W dniu 07.10.2013 15:22, Tom van den Berge pisze:

  On a 2-node cluster with replication factor 2, I have a column family with
 an index on one of the columns.

 Every now and then, I notice that a lookup of the record through the index
 on node 1 produces the record, but the same lookup on node 2 does not! If
 I
 do a lookup by row key, the record is found, and the indexed value is
 there.


 So as far as I can tell, the index on one of the nodes loses values, and
 is no longer in sync with the other node, even though the replication
 factor requires it. I typically repair these issues by storing the indexed
 column value again.

 The indexed data is static data; it doesn't change.

 I'm running cassandra 1.2.3. I'm running a nodetool repair on each node
 every day (although this does not fix this problem).

 This problem worries me a lot. I don't have a clue about the cause of it.
 Any help would be greatly appreciated.



 Tom





-- 

Drillster BV
Middenburcht 136
3452MT Vleuten
Netherlands

+31 30 755 5330

Open your free account at www.drillster.com


HintedHandoff process does not finish

2013-09-27 Thread Tom van den Berge
Hi,

One one of my nodes, the (storage) load increased dramatically (doubled),
within one or two hours. The hints column family was causing the growth. I
noticed one HintedHandoff process that was started some two hours ago, but
hadn't finished. Normally, these processes take only a few seconds, 15
seconds max, in my cluster.

The not-finishing process was handing the hints over to a host in another
data center. There were no warning or error messages in the logs, other
than the repeated flushing high-traffic column family hints.
I'm using Cassandra 1.2.3.

   - What can be the reason for the handoff process not to finish?
   - What would be the best way to recover from this situation?
   - What can be done to prevent this from happening again?


Thanks in advance,
Tom


Re: is there a no disk storage mode ?

2011-12-01 Thread Tom van den Berge

Hi Dominique,

I don't think there is a way to run cassandra without disk storage. But 
running it embedded can be very useful for unit testing. I'm using 
cassandra-unit (https://github.com/jsevellec/cassandra-unit) to 
integrate it in my tests. You don't need to configure any file paths; it 
works fine out of the box.


I've set it up to drop and recreate my keyspace before each test case, 
and even then it performs quite well.


Good luck,
Tom


On 12/1/11 5:36 PM, DE VITO Dominique wrote:


Hi,

I want to use Cassandra for (fast) unit testing with a small amount of
data.


So, I imagined the embedded Cassandra server I plan to use would start
faster and would be more portable (because there would be no file path
depending on the OS), without a disk storage mode (so, diskless if you want).


Is there some kind of no-disk storage mode for Cassandra?

Thanks.

Regards,

Dominique