Re: User click count

2014-12-29 Thread Ajay
Thanks Janne, Alain and Eric.

Now say I go with counters (hourly, daily, monthly) and also store UUIDs as
below:

user id : yyyy/mm/dd as the row key, with dynamic columns for each click
(column key = click timestamp, value = empty). Periodically count the
columns and rows and correct the counters. In this case there will be one
row per day, but as many columns as there are user clicks.

The other way is to store one row per hour:
user id : yyyy/mm/dd/hh as the row key, again with dynamic columns for each
click (column key = click timestamp, value = empty).

Is there any difference (in performance, or any known issues) between more
rows vs. more columns, given that Cassandra deletes the data through
tombstones (say, expiring it after 20 days by default)?
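To make the two layouts concrete, here is a small sketch (plain Python; the
key format and names are illustrative, not the actual schema) of how the two
row keys would be derived from a click timestamp:

```python
from datetime import datetime, timezone

def day_partition(user_id: str, ts: datetime) -> str:
    # One row per user per day: the row grows by one column per click.
    return f"{user_id}:{ts:%Y/%m/%d}"

def hour_partition(user_id: str, ts: datetime) -> str:
    # One row per user per hour: 24x more rows, each much narrower, so
    # expiring a bucket produces tombstones over far fewer cells per row.
    return f"{user_id}:{ts:%Y/%m/%d/%H}"

click = datetime(2014, 12, 29, 15, 30, tzinfo=timezone.utc)
print(day_partition("ajay", click))   # ajay:2014/12/29
print(hour_partition("ajay", click))  # ajay:2014/12/29/15
```

The trade-off is wide rows (many columns, heavier per-row tombstone scans)
versus many narrow rows (more partitions, cheaper per-row deletes).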

Thanks
Ajay

On Mon, Dec 29, 2014 at 7:47 PM, Eric Stevens  wrote:

> > If the counters become incorrect, they can't be corrected
>
> You'd have to store something that allowed you to correct it.  For
> example, the TimeUUID approach to keep true counts, which are slow to read
> but accurate, and a background process that trues up your counter columns
> periodically.
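A minimal sketch of that true-up idea, using in-memory dictionaries as
stand-ins for the counter table and the TimeUUID click rows (all names here
are hypothetical):

```python
import uuid

# Stand-ins for the two tables: a fast-but-drifty counter column family,
# and per-click TimeUUID rows that are slow to read but exact.
counters = {"link1": 0}
clicks = {"link1": []}

def record_click(link: str) -> None:
    clicks[link].append(uuid.uuid1())   # exact record of the click
    counters[link] += 1                 # fast counter; may drift under retries

def true_up(link: str) -> None:
    # Background job: recount the authoritative rows, overwrite the counter.
    counters[link] = len(clicks[link])

record_click("link1")
record_click("link1")
counters["link1"] += 3      # simulate drift (e.g. a replayed increment)
true_up("link1")
print(counters["link1"])    # 2
```

Reads stay O(1) against the counter; the O(n) recount runs off the hot path.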
>
> On Mon, Dec 29, 2014 at 7:05 AM, Ajay  wrote:
>
>> Thanks for the clarification.
>>
>> In my case, Cassandra is the only storage. If the counters become
>> incorrect, they can't be corrected; and if we store raw data for that, we
>> might as well go with that approach alone. But the granularity has to be
>> at the seconds level, since more than one user can click the same link. So
>> the data will be huge, with more writes and more rows to count on reads,
>> right?
>>
>> Thanks
>> Ajay
>>
>>
>> On Mon, Dec 29, 2014 at 7:10 PM, Alain RODRIGUEZ 
>> wrote:
>>
>>> Hi Ajay,
>>>
>>> Here is a good explanation you might want to read.
>>>
>>>
>>> http://www.datastax.com/dev/blog/whats-new-in-cassandra-2-1-a-better-implementation-of-counters
>>>
>>> We have used counters for 3 years now, since C* 0.8, and we are happy
>>> with them. The limits I see on both sides are:
>>>
>>> Counters:
>>>
>>> - accuracy: drift tends to be small in our use case (< 5%, where the
>>> business allows 10%), so fair enough for us. We also recount through a
>>> batch-processing tool (Spark / Hadoop, a kind of lambda architecture), so
>>> our real-time stats are inaccurate, but after a few minutes or hours we
>>> have the real value.
>>> - the read-before-write model, which is an anti-pattern. It makes you use
>>> more machines due to the pressure involved; affordable for us too.
>>>
>>> Raw data (counted):
>>>
>>> - space used (it can become quite impressive very fast, depending on your
>>> business)!
>>> - time to answer a request (we expose the data to customers; they don't
>>> want to wait 10 seconds for Cassandra to read 1,000,000+ columns).
>>> - performance in O(n) (linear) instead of O(1) (constant). Customers
>>> won't always understand why it is harder for you to read 1,000,000 than
>>> to read 1, since from their point of view it is one number in both cases,
>>> and your interface will have very unstable read times.
>>>
>>> Pick the best solution (or combination) for your use case. These lists
>>> of disadvantages are not exhaustive, just what came to my mind right now.
>>>
>>> C*heers
>>>
>>> Alain
>>>
>>> 2014-12-29 13:33 GMT+01:00 Ajay :
>>>
 Hi,

 So you mean to say counters are not accurate? (It is highly likely that
 multiple parallel threads will try to increment the counter as users click
 the links.)

 Thanks
 Ajay


 On Mon, Dec 29, 2014 at 4:49 PM, Janne Jalkanen <
 janne.jalka...@ecyrd.com> wrote:

>
> Hi!
>
> It’s really a tradeoff between accurate and fast and your read access
> patterns; if you need it to be fairly fast, use counters by all means, but
> accept the fact that they will (especially in older versions of cassandra
> or adverse network conditions) drift off from the true click count.  If 
> you
> need accurate, use a timeuuid and count the rows (this is fairly safe for
> replays too).  However, if using timeuuids your storage will need lots of
> space; and your reads will be slow if the click counts are huge (because
> Cassandra will need to read every item).  Using counters makes it easy to
> just grab a slice of the time series data and shove it to a client for
> visualization.
>
> You could of course do a hybrid system; use timeuuids and then
> periodically count and add the result to a regular column, and then remove
> the columns.  Note that you might want to optimize this so that you don’t
> end up with a lot of tombstones, e.g. by bucketing the writes so that you
> can delete everything with just a single partition delete.
>
> At Thinglink some of the more important counters that we use are
> backed up by the actual data. So for speed purposes we always use counters
> for reads, but there’s a repair process that fixes the counter value if we
> suspect it starts drifting too far off the real data.  (You might be able
> to tell that we’ve 
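The bucketed hybrid Janne describes can be sketched like this (plain Python
with dictionaries standing in for partitions; names are illustrative):

```python
# Raw clicks land in time buckets; a periodic job folds a closed bucket into
# a plain total column, then drops the whole bucket at once (in Cassandra:
# one partition-level delete, one tombstone, instead of one per click column).
buckets = {}          # (user, bucket key) -> list of click ids
totals = {"ajay": 0}  # the "regular column" holding the rolled-up count

def record(user: str, hour_bucket: str, click_id: int) -> None:
    buckets.setdefault((user, hour_bucket), []).append(click_id)

def roll_up(user: str, hour_bucket: str) -> None:
    key = (user, hour_bucket)
    totals[user] += len(buckets.get(key, []))
    buckets.pop(key, None)  # single delete of the entire bucket

record("ajay", "2014/12/29/15", 1)
record("ajay", "2014/12/29/15", 2)
roll_up("ajay", "2014/12/29/15")
print(totals["ajay"], buckets)  # 2 {}
```

Bucketing by the partition key is what keeps the tombstone count bounded:
the delete covers the whole bucket in one stroke.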

Re: CQL3 vs Thrift

2014-12-29 Thread Peter Lin
The kind of query language I'm thinking of is closer to Datalog, which is
what Datomic uses. It's a personal bias, but I find it easier and cleaner
to express joins, subqueries and correlated subqueries in a
LISP-like/Datalog-like syntax than in SQL.

Since CQL is modeled on and inspired by SQL, it inherits the same
limitations in syntax and expressiveness. For simple single-table queries
SQL is easier, and most people are familiar with it. But for expressing
multi-dimensional queries on data that uses a star/snowflake schema, SQL
isn't a good fit. There's plenty of literature on this topic, so I'm not
alone in this.

When we get into things like temporal queries and temporal query languages,
some major changes to CQL would be needed. My guess is that the changes
would be significant enough that sticking with CQL syntax starts to be
counterproductive. Anyone who has used or built bi-temporal databases knows
this first hand. For more than two decades people have put up with using SQL
for temporal queries, but it sucks. There's no nice way to put it.

A popular use case for Cassandra is time-series data. Basically people are
using it as a simple temporal database of sorts. Once you take a step back
and look at the existing research on temporal and active databases, it
becomes quite clear that all of us have been forced to use a language that
isn't well suited for temporal databases. Just look at all the different
ways people use Cassandra to store time-series data.

peter

On Mon, Dec 29, 2014 at 6:45 PM, Eric Stevens  wrote:

> So while not exactly the same, this seems like a good analogy for
> suggesting a third interface to fix problems with existing interfaces:
> http://xkcd.com/927/
>
> Even if the CQL parsing code in Cassandra is subpar (I haven't studied
> it), that's not an especially compelling case to suggest replacing the
> query language itself.  It seems like the sort of thing that could be fixed
> in a perfectly compatible and transparent way at the point when it starts
> to introduce problems.  In fact, a major upside to CQL is that it puts less
> of the work on the client, making it easier to address problems, introduce
> new features, and deprecate old ones over a fixed interface like Thrift
> which requires all client libs to keep up to date for the adoption of any
> new features.
>
> The limitations of CQL come more from the underlying storage engine
> limitations, and the query interface won't change those.  I was resistant
> to CQL at first as well.  Having used it for a while, I'm honestly glad to
> put Thrift behind me (6 months ago I probably wouldn't have had the same
> opinion).  The more I use it, the more I have come to like it.
>
> I started as a skeptic, and became a convert.
>
> On Mon, Dec 29, 2014 at 12:04 PM, Peter Lin  wrote:
>
>>
>> In my biased opinion something else should replace CQL, and it needs a
>> proper rewrite on the server side.
>>
>> I've studied the code and having written query parsers and planners, what
>> is there today isn't going to work long term.
>>
>> Whatever replaces both Thrift and CQL needs to provide 100% of the
>> features that exist today.
>>
>> Sent from my iPhone
>>
>> On Dec 29, 2014, at 1:34 PM, Robert Coli  wrote:
>>
>> On Tue, Dec 23, 2014 at 10:26 AM, Peter Lin  wrote:
>>
>>>
>>> I'm biased in favor of using both Thrift and CQL3, though many people on
>>> the list probably think I'm crazy.
>>>
>>
>> I don't think you're "crazy" but I do think you will ultimately face the
>> deprecation of thrift.
>>
>> Briefly, I disbelieve the idea that Cassandra can or would or should keep
>> two incompatible, non-pluggable APIs. I therefore assert that the Apache
>> Cassandra team will not likely do this unreasonable thing, and thrift will
>> eventually be removed.
>>
>> I strongly anti-recommend new uses of thrift/"legacy tables" for this
>> reason.
>>
>> =Rob
>>
>>
>


Re: Internal pagination in secondary index queries

2014-12-29 Thread Jonathan Haddad
Secondary indexes are there for convenience, not performance.  If you're
looking for something performant, you'll need to maintain your own indexes.


On Mon Dec 29 2014 at 3:22:58 PM Sam Klock  wrote:

> Hi folks,
>
> Perhaps this is a question better addressed to the Cassandra developers
> directly, but I thought I'd ask it here first.  We've recently been
> benchmarking certain uses of secondary indexes in Cassandra 2.1.x, and
> we've noticed that when the number of items in an index reaches beyond
> some threshold (perhaps several tens of thousands depending on the
> cardinality) performance begins to degrade substantially.  This is
> particularly the case when the client does things it probably shouldn't
> do (like manually paginate results), but we suspect there's at least
> one issue in Cassandra having an impact here that we'd like to
> understand better.
>
> Our investigation led us to logic in Cassandra used to paginate scans
> of rows in indexes on composites.  The issue seems to be the short
> algorithm Cassandra uses to select the size of the pages for the scan,
> partially given on the following two lines (from
> o.a.c.db.index.composites.CompositesSearcher):
>
> private int meanColumns = Math.max(index.getIndexCfs().getMeanColumns(), 1);
> private int rowsPerQuery = Math.max(Math.min(filter.maxRows(), filter.maxColumns() / meanColumns), 2);
>
> The value computed for rowsPerQuery appears to be the page size.
>
> Based on our reading of the code, unless the value obtained for
> meanColumns is very small, a large query-level page size is used, or
> the DISTINCT keyword is used, the value for (filter.maxColumns() /
> meanColumns) always ends up being small enough that the page size is
> 2.  This seems to be the case both for very low-cardinality indexes
> (two different indexed values) and for indexes with higher
> cardinalities as long as the number of entries per index row is more
> than a few thousand.
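The collapse to a page size of 2 is easy to reproduce from those two lines;
here is the same arithmetic transcribed into Python, with illustrative
numbers:

```python
def index_page_size(mean_columns: int, max_rows: int, max_columns: int) -> int:
    # Direct transcription of the two lines quoted from CompositesSearcher.
    mean = max(mean_columns, 1)
    return max(min(max_rows, max_columns // mean), 2)

# An index row averaging 10,000 entries, queried with a 5,000-column page:
# max_columns // mean is 0, so the clamp to a minimum of 2 wins.
print(index_page_size(10_000, 5_000, 5_000))   # 2 -> thousands of tiny scans
# Only when mean columns per index row are small does the page open up:
print(index_page_size(10, 5_000, 5_000))       # 500
```

So any index whose rows average more columns than the query-level page size
degenerates to two-row internal pages, which matches the behavior described.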
>
> The fact that we consistently get such a small page size appears to
> have a substantial impact on performance.  The overhead is simply
> devastating, especially since it looks like the pages are likely to
> overlap with each other (the last element of one page is the first
> element of the next).  To wit: if we fix the index page size in code to
> a very large number, index queries in our environment that previously
> required over two minutes to complete can finish in under ten seconds.
>
> Some (but probably not this much) overhead might be acceptable if the
> algorithm is intended to achieve other worthy goals (safety?).  But
> what's puzzling to us is that we can't figure out what it's intended to
> do.  We suspect the algorithm is simply buggy, but we'd like insight
> from knowledgeable parties before we draw that conclusion and try to
> find a different solution.
>
> Does anyone here have relevant experience with secondary indexes that
> might shed light on the design choice here?  In particular, can anyone
> (perhaps the developers?) explain what this algorithm is intended to do
> and what we might do to safely get around this limitation?
>
> Also (to the developers watching this list): is this the sort of
> question we should be addressing to the dev list directly?
>
> Thanks,
> SK
>


Re: CQL3 vs Thrift

2014-12-29 Thread Eric Stevens
So while not exactly the same, this seems like a good analogy for
suggesting a third interface to fix problems with existing interfaces:
http://xkcd.com/927/

Even if the CQL parsing code in Cassandra is subpar (I haven't studied it),
that's not an especially compelling case to suggest replacing the query
language itself.  It seems like the sort of thing that could be fixed in a
perfectly compatible and transparent way at the point when it starts to
introduce problems.  In fact, a major upside to CQL is that it puts less of
the work on the client, making it easier to address problems, introduce new
features, and deprecate old ones over a fixed interface like Thrift which
requires all client libs to keep up to date for the adoption of any new
features.

The limitations of CQL come more from the underlying storage engine
limitations, and the query interface won't change those.  I was resistant
to CQL at first as well.  Having used it for a while, I'm honestly glad to
put Thrift behind me (6 months ago I probably wouldn't have had the same
opinion).  The more I use it, the more I have come to like it.

I started as a skeptic, and became a convert.

On Mon, Dec 29, 2014 at 12:04 PM, Peter Lin  wrote:

>
> In my biased opinion something else should replace CQL, and it needs a
> proper rewrite on the server side.
>
> I've studied the code and having written query parsers and planners, what
> is there today isn't going to work long term.
>
> Whatever replaces both Thrift and CQL needs to provide 100% of the
> features that exist today.
>
> Sent from my iPhone
>
> On Dec 29, 2014, at 1:34 PM, Robert Coli  wrote:
>
> On Tue, Dec 23, 2014 at 10:26 AM, Peter Lin  wrote:
>
>>
>> I'm biased in favor of using both Thrift and CQL3, though many people on
>> the list probably think I'm crazy.
>>
>
> I don't think you're "crazy" but I do think you will ultimately face the
> deprecation of thrift.
>
> Briefly, I disbelieve the idea that Cassandra can or would or should keep
> two incompatible, non-pluggable APIs. I therefore assert that the Apache
> Cassandra team will not likely do this unreasonable thing, and thrift will
> eventually be removed.
>
> I strongly anti-recommend new uses of thrift/"legacy tables" for this
> reason.
>
> =Rob
>
>


Re: 2.0.10 to 2.0.11 upgrade and immediate ParNew and CMS GC storm

2014-12-29 Thread mck

> Perf is better, correctness seems less so. I value the latter more than
> the former.


Yeah no doubt.
Especially in CASSANDRA-6285 i see some scary stuff went down.

But there are no outstanding bugs that we know of, are there?
(CASSANDRA-6815 remains just a wrap-up of how options are to be presented
in cassandra.yaml?)

~mck


Internal pagination in secondary index queries

2014-12-29 Thread Sam Klock
Hi folks,

Perhaps this is a question better addressed to the Cassandra developers 
directly, but I thought I'd ask it here first.  We've recently been 
benchmarking certain uses of secondary indexes in Cassandra 2.1.x, and 
we've noticed that when the number of items in an index reaches beyond 
some threshold (perhaps several tens of thousands depending on the  
cardinality) performance begins to degrade substantially.  This is  
particularly the case when the client does things it probably shouldn't 
do (like manually paginate results), but we suspect there's at least  
one issue in Cassandra having an impact here that we'd like to  
understand better.

Our investigation led us to logic in Cassandra used to paginate scans 
of rows in indexes on composites.  The issue seems to be the short 
algorithm Cassandra uses to select the size of the pages for the scan, 
partially given on the following two lines (from 
o.a.c.db.index.composites.CompositesSearcher):

private int meanColumns = Math.max(index.getIndexCfs().getMeanColumns(), 1);
private int rowsPerQuery = Math.max(Math.min(filter.maxRows(), filter.maxColumns() / meanColumns), 2);

The value computed for rowsPerQuery appears to be the page size.

Based on our reading of the code, unless the value obtained for 
meanColumns is very small, a large query-level page size is used, or 
the DISTINCT keyword is used, the value for (filter.maxColumns() / 
meanColumns) always ends up being small enough that the page size is 
2.  This seems to be the case both for very low-cardinality indexes 
(two different indexed values) and for indexes with higher 
cardinalities as long as the number of entries per index row is more 
than a few thousand.

The fact that we consistently get such a small page size appears to 
have a substantial impact on performance.  The overhead is simply 
devastating, especially since it looks like the pages are likely to 
overlap with each other (the last element of one page is the first 
element of the next).  To wit: if we fix the index page size in code to 
a very large number, index queries in our environment that previously
required over two minutes to complete can finish in under ten seconds.

Some (but probably not this much) overhead might be acceptable if the 
algorithm is intended to achieve other worthy goals (safety?).  But 
what's puzzling to us is that we can't figure out what it's intended to 
do.  We suspect the algorithm is simply buggy, but we'd like insight 
from knowledgeable parties before we draw that conclusion and try to 
find a different solution.

Does anyone here have relevant experience with secondary indexes that 
might shed light on the design choice here?  In particular, can anyone 
(perhaps the developers?) explain what this algorithm is intended to do 
and what we might do to safely get around this limitation?

Also (to the developers watching this list): is this the sort of 
question we should be addressing to the dev list directly?

Thanks,
SK


Re: 2.0.10 to 2.0.11 upgrade and immediate ParNew and CMS GC storm

2014-12-29 Thread Robert Coli
On Mon, Dec 29, 2014 at 2:03 PM, mck  wrote:

> We saw an improvement when we switched to HSHA, particularly for our
> offline (Hadoop/Spark) nodes.
> Sorry, I don't have the data anymore to support that statement, although I
> can say that the improvement paled in comparison to cross_node_timeout,
> which we enabled shortly afterwards.
>

Perf is better, correctness seems less so. I value the latter more than the former.

=Rob


Re: Nodes Dying in 2.1.2

2014-12-29 Thread Robert Coli
Might be https://issues.apache.org/jira/browse/CASSANDRA-8061 or one of the
linked/duplicate tickets.

=Rob

On Mon, Dec 29, 2014 at 1:40 PM, Robert Coli  wrote:

> On Wed, Dec 24, 2014 at 9:41 AM, Phil Burress 
> wrote:
>
>> Just upgraded our cluster from 2.1.1 to 2.1.2 and our nodes keep dying.
>> The kernel is killing the process due to out of memory:
>>
>
> https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/
>
>
>> Appears to only occur during compactions. We've tried playing with the
>> heap settings but nothing has worked thus far. We did not have this issue
>> until we upgraded. Anyone else run into this or have suggestions?
>>
>
> I would :
>
> 1) see if downgrade is possible (while unsupported, it probably is
> possible) and downgrade if so
> 2) search JIRA 2.1 era for related issues
> 3) examine changes from 2.1.1 to 2.1.2 which relate to compaction
> 4) file a JIRA describing yr experience if no prior one exists
>
> =Rob
>


Re: 2.0.10 to 2.0.11 upgrade and immediate ParNew and CMS GC storm

2014-12-29 Thread mck
> > Should I stick to 2048 or try
> > with something closer to 128 or even something else ?


2048 worked fine for us.


> > About HSHA,
> 
> I anti-recommend hsha, serious apparently unresolved problems exist with
> it.


We saw an improvement when we switched to HSHA, particularly for our
offline (Hadoop/Spark) nodes.
Sorry, I don't have the data anymore to support that statement, although I
can say that the improvement paled in comparison to cross_node_timeout,
which we enabled shortly afterwards.

~mck


Re: Changing replication factor of Cassandra cluster

2014-12-29 Thread Robert Coli
On Mon, Dec 29, 2014 at 1:40 PM, Pranay Agarwal 
wrote:

> I want to understand: what is the best way to increase/change the
> replication factor of the Cassandra cluster? My priority is consistency,
> and I can probably tolerate some downtime of the cluster. Is it totally
> weird to change the replication factor later, or are there people doing it
> for production environments?
>

The way you are doing it is fine, but risks false-negative reads.

Basically, if you ask the wrong node "does this key exist" before it is
repaired, you will get the answer "no" when in fact it does exist under the
RF=1 paradigm. Unfortunately the only way to avoid this case is to do all
reads with ConsistencyLevel.ALL until the whole cluster is repaired.
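A toy model of that false negative (three in-memory dictionaries standing in
for replicas right after an RF=1 to RF=3 change, before repair; everything
here is illustrative):

```python
# Three replicas after raising RF from 1 to 3; only the original owner
# holds the key until repair streams it to the two new replicas.
replicas = [{"k": "v"}, {}, {}]

def read(key, consistency_all: bool):
    if consistency_all:
        # CL.ALL: every replica answers; the merged result contains the value
        # as long as at least one replica has it.
        hits = [r[key] for r in replicas if key in r]
        return hits[0] if hits else None
    # CL.ONE: a single, possibly un-repaired, replica answers.
    return replicas[1].get(key)   # ask a "wrong" (new, empty) node

print(read("k", consistency_all=False))  # None -> false negative
print(read("k", consistency_all=True))   # v
```

Once `nodetool repair` has populated every replica, the lower consistency
level becomes safe again.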

=Rob


Re: Node down during move

2014-12-29 Thread Robert Coli
On Tue, Dec 23, 2014 at 12:29 AM, Jiri Horky  wrote:

> just a follow up. We've seen this behavior multiple times now. It seems
> that the receiving node loses connectivity to the cluster and thus
> thinks that it is the sole online node, whereas the rest of the cluster
> thinks that it is the only offline node, really just after the streaming
> is over. I am not sure what causes that, but it is reproducible. Restart
> of the affected node helps.
>

Streaming is pretty broken throughout 1.x. Unfortunately no one is likely
to fix whatever is wrong in your old version.

You could try tuning the phi detector, IIRC by increasing the number.

=Rob


Re: 2.0.10 to 2.0.11 upgrade and immediate ParNew and CMS GC storm

2014-12-29 Thread Robert Coli
On Mon, Dec 29, 2014 at 2:29 AM, Alain RODRIGUEZ  wrote:

> Sorry about the gravedigging, but what would be a good start value to tune
> "rpc_max_threads" ?
>

Depends on whether you prefer that clients get a slow thread or none.


> I mean, default is unlimited, the value commented is 2048. Native protocol
> seems to only allow 128 simultaneous threads. Should I stick to 2048 or try
> with something closer to 128 or even something else ?
>

Probably closer to 2048 than unlimited.


> About HSHA: I have tried this mode from time to time since C* 0.8 and
> always faced the "ERROR 12:02:18,971 Read an invalid frame size of 0. Are
> you using TFramedTransport on the client side?" error. I haven't tried it
> for a while (1 year maybe); has this been fixed, or is this due to my
> configuration somehow?
>

I anti-recommend hsha, serious apparently unresolved problems exist with
it. I understand this is FUD, but fool me once shame on you/fool me twice
shame on me.

=Rob


Re: Nodes Dying in 2.1.2

2014-12-29 Thread Robert Coli
On Wed, Dec 24, 2014 at 9:41 AM, Phil Burress 
wrote:

> Just upgraded our cluster from 2.1.1 to 2.1.2 and our nodes keep dying.
> The kernel is killing the process due to out of memory:
>

https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/

> Appears to only occur during compactions. We've tried playing with the heap
> settings but nothing has worked thus far. We did not have this issue until
> we upgraded. Anyone else run into this or have suggestions?
>

I would :

1) see if downgrade is possible (while unsupported, it probably is
possible) and downgrade if so
2) search JIRA 2.1 era for related issues
3) examine changes from 2.1.1 to 2.1.2 which relate to compaction
4) file a JIRA describing yr experience if no prior one exists

=Rob


Re: Changing replication factor of Cassandra cluster

2014-12-29 Thread Pranay Agarwal
Thanks Ryan.

I want to understand: what is the best way to increase/change the
replication factor of the Cassandra cluster? My priority is consistency, and
I can probably tolerate some downtime of the cluster. Is it totally weird to
change the replication factor later, or are there people doing it for
production environments?

On Tue, Dec 16, 2014 at 9:47 AM, Ryan Svihla  wrote:

> Repair performance is going to vary heavily with a large number of
> factors; hours for one node to finish is within the range of what I see in
> the wild. Again, there are so many factors that it's impossible to
> speculate on whether that is good or bad for your cluster. Factors that
> matter include:
>
>1. speed of disk io
>2. amount of ram and cpu on each node
>3. network interface speed
>4. is this multidc or not
>5. are vnodes enabled or not
>6. what are the jvm tunings
>7. compaction settings
>8. current load on the cluster
>9. streaming settings
>
> Suffice it to say, improving repair performance is a full-on tuning
> exercise. Note that your current operation is going to be worse than a
> traditional repair, as you're streaming copies of data around and not just
> doing normal Merkle-tree work.
>
> Restoring from backup to a new cluster (including how to handle token
> ranges) is discussed in detail here
> http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_snapshot_restore_new_cluster.html
>
>
> On Mon, Dec 15, 2014 at 4:14 PM, Pranay Agarwal 
> wrote:
>>
>> Hi All,
>>
>>
>> I have 20 nodes cassandra cluster with 500gb of data and replication
>> factor of 1. I increased the replication factor to 3 and ran nodetool
>> repair on each node one by one as the docs says. But it takes hours for 1
>> node to finish repair. Is that normal or am I doing something wrong?
>>
>> Also, I took backup of cassandra data on each node. How do I restore the
>> graph in a new cluster of nodes using the backup? Do I have to have the
>> tokens range backed up as well?
>>
>> -Pranay
>>
>
>
> --
>
> Ryan Svihla
>
> Solution Architect, DataStax
>
>


Re: CQL3 vs Thrift

2014-12-29 Thread Peter Lin

In my biased opinion something else should replace CQL, and it needs a
proper rewrite on the server side.

I've studied the code and having written query parsers and planners, what is 
there today isn't going to work long term.

Whatever replaces both Thrift and CQL needs to provide 100% of the features
that exist today.

Sent from my iPhone

> On Dec 29, 2014, at 1:34 PM, Robert Coli  wrote:
> 
>> On Tue, Dec 23, 2014 at 10:26 AM, Peter Lin  wrote:
>> 
>> I'm biased in favor of using both Thrift and CQL3, though many people on
>> the list probably think I'm crazy.
> 
> I don't think you're "crazy" but I do think you will ultimately face the 
> deprecation of thrift.
> 
> Briefly, I disbelieve the idea that Cassandra can or would or should keep two 
> incompatible, non-pluggable APIs. I therefore assert that the Apache 
> Cassandra team will not likely do this unreasonable thing, and thrift will 
> eventually be removed.
> 
> I strongly anti-recommend new uses of thrift/"legacy tables" for this reason.
> 
> =Rob


Re: CQL3 vs Thrift

2014-12-29 Thread Robert Coli
On Tue, Dec 23, 2014 at 10:26 AM, Peter Lin  wrote:

>
> I'm biased in favor of using both Thrift and CQL3, though many people on
> the list probably think I'm crazy.
>

I don't think you're "crazy" but I do think you will ultimately face the
deprecation of thrift.

Briefly, I disbelieve the idea that Cassandra can or would or should keep
two incompatible, non-pluggable APIs. I therefore assert that the Apache
Cassandra team will not likely do this unreasonable thing, and thrift will
eventually be removed.

I strongly anti-recommend new uses of thrift/"legacy tables" for this
reason.

=Rob


Re: diff cassandra.yaml 1.2 --> 2.1

2014-12-29 Thread Alain RODRIGUEZ
I made an error in the topic title.

We are indeed going to go to 2.1 (that's why I made the mistake), but I am
speaking of 1.2 --> 2.0 here; we will start with that before going to 2.1,
since we want to do it as a rolling upgrade.

Thanks for your enlightening pointer about this vanished "pressure valve".

C*heers

2014-12-29 17:03 GMT+01:00 Jason Wee :

> What you are asking is maybe answered at the code level and is pretty deep
> stuff, at least from a user's (like me) point of view. But to quote
> Jonathan in CASSANDRA-3534: "Then you will be able to say 'use X amount of
> memory for memtables, Y amount for the cache (and monitor Z amount for the
> bloom filters)'", which makes the old "pressure valve" code obsolete. That
> explains why this was removed.
>
> There is also another issue under discussion which you might find worth
> reading: https://issues.apache.org/jira/browse/CASSANDRA-3143
>
> If I may ask, are you doing cassandra upgrade from 1.2 to 2.1?
>
> Jason
>
> On Mon, Dec 29, 2014 at 10:54 PM, Alain RODRIGUEZ 
> wrote:
>
>> Thanks for the pointer Jason,
>>
>> Yet, I thought that cache and memtables went off-heap only in version 2.1
>> and not in 2.0 ("As of Cassandra 2.0, there are two major pieces of the
>> storage engine that still depend on the JVM heap: memtables and the key
>> cache." -->
>> http://www.datastax.com/dev/blog/off-heap-memtables-in-cassandra-2-1).
>> So this "clean up" makes sense to me, but for the new 2.1 version of
>> Cassandra. I also read on the same blog that we might have the choice of
>> in/off-heap for memtables (or, more precisely, of getting just the
>> memtable buffers off-heap). If this is true, flush_largest_memtables_at
>> still makes sense. And about the cache: isn't the key cache still in the
>> heap, even in 2.1?
>>
>> The removal of these options looks a bit radical and premature to me. I
>> guess I am missing something in my reasoning but can't figure out what
>> exactly.
>>
>> C*heers,
>>
>> Alain
>>
>> 2014-12-29 14:52 GMT+01:00 Jason Wee :
>>
>>> https://issues.apache.org/jira/browse/CASSANDRA-3534
>>>
>>> On Mon, Dec 29, 2014 at 6:58 PM, Alain RODRIGUEZ 
>>> wrote:
>>>
 Hi guys,

 I am looking at the options added and dropped in Cassandra between 1.2.18
 and 2.0.11, and this makes me wonder:

 Why has the index_interval option been removed from cassandra.yaml? I
 know we can also define it on a per-table basis, yet this global option
 was quite useful for tuning memory usage. I also know that this index is
 now kept off-heap, but I can not see when and why this option was removed;
 any pointer? Also, it seems this option is still usable even if not
 present by default in cassandra.yaml, but it is marked as deprecated (
 https://github.com/apache/cassandra/blob/cassandra-2.0.11/src/java/org/apache/cassandra/config/Config.java#L165).
 Is this option deprecated in the table schema definition too?

 Same kind of question around the heap "emergency pressure valve" options
 ("flush_largest_memtables_at", "reduce_cache_sizes_at" and
 "reduce_cache_capacity_to"), except that those params seem to have been
 dropped directly. Why? Is there no more need for them, or has some other
 mechanism replaced them, improving things?

 Hope this wasn't already discussed; I was unable to find information
 about it anyway.

 C*heers !

>>>
>>>
>>
>


Re: diff cassandra.yaml 1.2 --> 2.1

2014-12-29 Thread Jason Wee
What you are asking is maybe answered at the code level and is pretty deep
stuff, at least from a user's (like me) point of view. But to quote Jonathan
in CASSANDRA-3534: "Then you will be able to say 'use X amount of memory for
memtables, Y amount for the cache (and monitor Z amount for the bloom
filters)'", which makes the old "pressure valve" code obsolete. That
explains why this was removed.

There is also another issue under discussion which you might find worth
reading: https://issues.apache.org/jira/browse/CASSANDRA-3143

If I may ask, are you doing cassandra upgrade from 1.2 to 2.1?

Jason

On Mon, Dec 29, 2014 at 10:54 PM, Alain RODRIGUEZ 
wrote:

> Thanks for the pointer Jason,
>
> Yet, I thought that cache and memtables went off-heap only in version 2.1
> and not in 2.0 ("As of Cassandra 2.0, there are two major pieces of the
> storage engine that still depend on the JVM heap: memtables and the key
> cache." -->
> http://www.datastax.com/dev/blog/off-heap-memtables-in-cassandra-2-1). So
> this "clean up" makes sense to me, but for the new 2.1 version of
> Cassandra. I also read on the same blog that we might have the choice of
> in/off-heap for memtables (or, more precisely, of getting just the
> memtable buffers off-heap). If this is true, flush_largest_memtables_at
> still makes sense. And about the cache: isn't the key cache still in the
> heap, even in 2.1?
>
> The removal of these options looks a bit radical and premature to me. I
> guess I am missing something in my reasoning but can't figure out what
> exactly.
>
> C*heers,
>
> Alain
>
> 2014-12-29 14:52 GMT+01:00 Jason Wee :
>
>> https://issues.apache.org/jira/browse/CASSANDRA-3534
>>
>> On Mon, Dec 29, 2014 at 6:58 PM, Alain RODRIGUEZ 
>> wrote:
>>
>>> Hi guys,
>>>
>>> I am looking at added and dropped option in Cassandra between 1.2.18 and
>>> 2.0.11 and this makes me wonder:
>>>
>>> Why has the index_interval option been removed from cassandra.yaml ? I
>>> know we can also define it on a per table basis, yet, this global option
>>> was quite useful to tune memory usage. I also know that this index is now
>>> kept off-heap, but I can not see when and why this option has been removed,
>>> any pointer ? Also it seems this option still usable even if not present by
>>> default on cassandra.yaml, but it is marked as deprecated (
>>> https://github.com/apache/cassandra/blob/cassandra-2.0.11/src/java/org/apache/cassandra/config/Config.java#L165).
>>> Is this option deprecated on the table schema definition too ?
>>>
>>> Same kind of questions around the heap "emergency pressure valve" -->
>>> "flush_largest_memtables_at", "reduce_cache_sizes_at" and
>>> "reduce_cache_capacity_to", except that those params seems to have been
>>> dropped directly. Why, is there no more need of it, has some other
>>> mechanism replaced it, improving things ?
>>>
>>> Hope this wasn't already discussed,I was unable to find information
>>> about it anyway.
>>>
>>> C*heers !
>>>
>>
>>
>


Re: Repair/Compaction Completion Confirmation

2014-12-29 Thread Alain RODRIGUEZ
I noticed (and reported) a bug that made me drop this tool -->
https://github.com/BrianGallew/cassandra_range_repair/issues/16

Might this be related somehow?

C*heers

Alain

2014-11-21 13:30 GMT+01:00 Paulo Ricardo Motta Gomes <
paulo.mo...@chaordicsystems.com>:

> Hey guys,
>
> Just reviving this thread. In case anyone is using the
> cassandra_range_repair tool (
> https://github.com/BrianGallew/cassandra_range_repair), please sync your
> repositories because the tool was not working before due to a critical bug
> on the token range definition method. For more information on the bug
> please check here:
> https://github.com/BrianGallew/cassandra_range_repair/pull/18
>
> Cheers,
>
> On Tue, Oct 28, 2014 at 7:53 AM, Colin  wrote:
>
>> When I use virtual nodes, I typically use a much smaller number - usually
>> in the range of 10.  This gives me the ability to add nodes easier without
>> the performance hit.
>>
>>
>>
>> --
>> *Colin Clark*
>> +1-320-221-9531
>>
>>
>> On Oct 28, 2014, at 10:46 AM, Alain RODRIGUEZ  wrote:
>>
>> I have been trying this yesterday too.
>>
>> https://github.com/BrianGallew/cassandra_range_repair
>>
>> "Not 100% bullet proof" --> Indeed I found that operations are done
>> multiple times, so it is not very optimised. Though it is open sourced
>> so I guess you can improve things as much as you want and contribute. Here
>> is the issue I raised yesterday
>> https://github.com/BrianGallew/cassandra_range_repair/issues/14.
>>
>> I am also trying to improve our repair automation since we now have
>> multiple DC and up to 800 GB per node. Repairs are quite heavy right now.
>>
>> Good luck,
>>
>> Alain
>>
>> 2014-10-28 4:59 GMT+01:00 Ben Bromhead :
>>
>>> https://github.com/BrianGallew/cassandra_range_repair
>>>
>>> This breaks down the repair operation into very small portions of the
>>> ring as a way to try and work around the current fragile nature of repair.
>>>
>>> Leveraging range repair should go some way towards automating repair
>>> (this is how the automatic repair service in DataStax opscenter works, this
>>> is how we perform repairs).
>>>
>>> We have had a lot of success running repairs in a similar manner against
>>> vnode enabled clusters. Not 100% bullet proof, but way better than nodetool
>>> repair
>>>
>>>
>>>
>>> On 28 October 2014 08:32, Tim Heckman  wrote:
>>>
 On Mon, Oct 27, 2014 at 1:44 PM, Robert Coli 
 wrote:

> On Mon, Oct 27, 2014 at 1:33 PM, Tim Heckman 
> wrote:
>
>> I know that when issuing some operations via nodetool, the command
>> blocks until the operation is finished. However, is there a way to 
>> reliably
>> determine whether or not the operation has finished without monitoring 
>> that
>> invocation of nodetool?
>>
>> In other words, when I run 'nodetool repair' what is the best way to
>> reliably determine that the repair is finished without running something
>> equivalent to a 'pgrep' against the command I invoked? I am curious about
>> trying to do the same for major compactions too.
>>
>
> This is beyond a FAQ at this point, unfortunately; non-incremental
> repair is awkward to deal with and probably impossible to automate.
>
> In The Future [1] the correct solution will be to use incremental
> repair, which mitigates but does not solve this challenge entirely.
>
> As brief meta commentary, it would have been nice if the project had
> spent more time optimizing the operability of the critically important
> thing you must do once a week [2].
>
> https://issues.apache.org/jira/browse/CASSANDRA-5483
>
> =Rob
> [1] http://www.datastax.com/dev/blog/anticompaction-in-cassandra-2-1
> [2] Or, more sensibly, once a month with gc_grace_seconds set to 34
> days.
>

 Thank you for getting back to me so quickly. Not the answer that I was
 secretly hoping for, but it is nice to have confirmation. :)

 Cheers!
 -Tim

>>>
>>>
>>>
>>> --
>>>
>>> Ben Bromhead
>>>
>>> Instaclustr | www.instaclustr.com | @instaclustr
>>>  | +61 415 936 359
>>>
>>
>>
>
>
> --
> *Paulo Motta*
>
> Chaordic | *Platform*
> *www.chaordic.com.br *
> +55 48 3232.3200
>


Re: Why a cluster don't start after cassandra.yaml range_timeout parameter change ?

2014-12-29 Thread Alain RODRIGUEZ
Did you solve this issue?

I guess nobody answered you because this is very weird. I also guess you've
made some mistake in the configuration.

Anyway, let me know if you managed to get out of the mess somehow or if you
still need help.

C*heers

2014-12-03 15:57 GMT+01:00 Castelain, Alain :

>
>
> Hi,
>
> I had a three-node cluster in Cassandra version 1.2.16 running well.
> So I changed the range_request_timeout_in_ms from 1 to 2 on
> two nodes and those nodes restarted well.
> On the last node I received these messages in the output.log file :
>
>  INFO 19:27:18,691 Cassandra shutting down...
>  INFO 19:27:18,706 Stop listening to thrift clients
>  INFO 19:27:18,905 Stop listening for CQL clients
>  INFO 19:27:25,153 Announcing shutdown
>  INFO 19:27:25,244 GC for ConcurrentMarkSweep: 6227 ms for 3 collections,
> 1200395040 used; max is 21045379072
>  INFO 19:27:25,244 Pool NameActive   Pending
>  Completed   Blocked  All Time Blocked
>  INFO 19:27:25,245 ReadStage 0 0
>  7960169 0 0
>  INFO 19:27:25,245 RequestResponseStage  0 0
> 29973176 0 0
>  INFO 19:27:25,245 ReadRepairStage   0 0
>  4741703 0 0
>  INFO 19:27:25,245 MutationStage 0 0
> 26456216 0 0
>  INFO 19:27:25,245 ReplicateOnWriteStage 0 0
>   13 0 0
>  INFO 19:27:25,245 GossipStage   0 0
>  1656431 0 0
>  INFO 19:27:25,245 AntiEntropyStage  0 0
>  84461 0 0
>  INFO 19:27:25,246 MigrationStage0 0
>  2534350 0 0
>  INFO 19:27:25,246 MemtablePostFlusher   0 0
>  40644 0 0
>  INFO 19:27:25,246 MemoryMeter   0 0
>  145 0 0
>  INFO 19:27:25,246 FlushWriter   0 0
> 2869 020
>  INFO 19:27:25,246 MiscStage 0 0
> 1651 0 0
>  INFO 19:27:25,246 PendingRangeCalculator0 0
>6 0 0
>  INFO 19:27:25,246 commitlog_archiver0 0
>0 0 0
>  INFO 19:27:25,246 InternalResponseStage 0 0
>  4633968 0 0
>  INFO 19:27:25,247 AntiEntropySessions   0 0
>  15360 0 0
>  INFO 19:27:25,247 HintedHandoff 0 0
>  200 0 0
>  INFO 19:27:25,247 CompactionManager 0 0
>  INFO 19:27:25,247 Commitlog   n/a 0
>  INFO 19:27:25,247 MessagingServicen/a   0/0
>  INFO 19:27:25,247 Cache Type Size
> Capacity   KeysToSave
>   Provider
>  INFO 19:27:25,247 KeyCache  104857412
>  104857600  all
>  INFO 19:27:25,247 RowCache  0
>0  all
>  org.apache.cassandra.cache.SerializingCacheProvider
>  INFO 19:27:25,247 ColumnFamilyMemtable ops,data
>  INFO 19:27:25,248 Test_Sib.Files0,0
>  INFO 19:27:25,248 idfx.report_file_counters 0,0
>  INFO 19:27:25,248 idfx.report_file  0,0
>  INFO 19:27:25,248 SibBackup2.Files  0,0
>  INFO 19:27:25,248 SibTss2.Files 0,0
>  INFO 19:27:25,248 SibLeroyMerlin.Files6316,87364119
>  INFO 19:27:25,248 SibCasto.Files0,0
>  INFO 19:27:25,248 idfxr01.system_properties 0,0
>  INFO 19:27:25,248 idfxr01.edgestore 0,0
>  INFO 19:27:25,248 idfxr01.edgeindex 0,0
>  INFO 19:27:25,248 idfxr01.titan_ids 0,0
>  INFO 19:27:25,248 idfxr01.edgestore_lock_   0,0
>  INFO 19:27:25,248 idfxr01.vertexindex   0,0
>  INFO 19:27:25,248 idfxr01.vertexindex_lock_ 0,0
>  INFO 19:27:25,248 SibBackupTest.Files   0,0
>  INFO 19:27:25,248 SibNorauto.SibCassandraColumnFamilyName
>  1444,753849990
>  INFO 19:27:25,248 SibNorauto.Files  0,0
>  INFO 19:27:25,248 SibTss.Files  0,0
>  INFO 19:27:25,248 SibTss1.Files 0,0
>  INFO 19:27:25,249 OpsCenter.bestpractice_results 0,0
>  INFO 19:27:25,249 OpsCenter.events  12,3520
>  INFO 19:27:25,249 OpsCenter.rollups60   406517,77862510
>  INFO 19:27:25,249 OpsCenter.settings  216,38195
>  INFO 19:27:25,249 OpsCenter.pdps168056,12582

Re: diff cassandra.yaml 1.2 --> 2.1

2014-12-29 Thread Alain RODRIGUEZ
Thanks for the pointer Jason,

Yet, I thought that the cache and memtables went off-heap only in version 2.1
and not 2.0 ("As of Cassandra 2.0, there are two major pieces of the
storage engine that still depend on the JVM heap: memtables and the key
cache." -->
http://www.datastax.com/dev/blog/off-heap-memtables-in-cassandra-2-1). So
this "clean up" makes sense to me, but only in the new 2.1 version of
Cassandra. I also read on the same blog that we might have the choice of
in/off heap for memtables (or more precisely, just getting memtable buffers
off-heap). If this is true, flush_largest_memtables_at still makes sense.
About the cache, isn't the key cache still in the heap, even in 2.1?

The removal of these options looks a bit radical and premature to me. I guess
I am missing something in my reasoning but can't figure out what exactly.

C*heers,

Alain

2014-12-29 14:52 GMT+01:00 Jason Wee :

> https://issues.apache.org/jira/browse/CASSANDRA-3534
>
> On Mon, Dec 29, 2014 at 6:58 PM, Alain RODRIGUEZ 
> wrote:
>
>> Hi guys,
>>
>> I am looking at added and dropped option in Cassandra between 1.2.18 and
>> 2.0.11 and this makes me wonder:
>>
>> Why has the index_interval option been removed from cassandra.yaml ? I
>> know we can also define it on a per table basis, yet, this global option
>> was quite useful to tune memory usage. I also know that this index is now
>> kept off-heap, but I can not see when and why this option has been removed,
>> any pointer ? Also it seems this option still usable even if not present by
>> default on cassandra.yaml, but it is marked as deprecated (
>> https://github.com/apache/cassandra/blob/cassandra-2.0.11/src/java/org/apache/cassandra/config/Config.java#L165).
>> Is this option deprecated on the table schema definition too ?
>>
>> Same kind of questions around the heap "emergency pressure valve" -->
>> "flush_largest_memtables_at", "reduce_cache_sizes_at" and
>> "reduce_cache_capacity_to", except that those params seems to have been
>> dropped directly. Why, is there no more need of it, has some other
>> mechanism replaced it, improving things ?
>>
>> Hope this wasn't already discussed,I was unable to find information about
>> it anyway.
>>
>> C*heers !
>>
>
>


Re: User click count

2014-12-29 Thread Eric Stevens
> If the counters get incorrect, it could't be corrected

You'd have to store something that allowed you to correct it.  For example,
the TimeUUID approach to keep true counts, which are slow to read but
accurate, and a background process that trues up your counter columns
periodically.
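
A minimal sketch of that truing-up idea, assuming a hypothetical layout where
the raw click events (the slow-but-accurate TimeUUID rows) and the fast
counter values are both keyed by (user, day); only the correction logic is
modeled here, with plain dicts standing in for the two tables:

```python
from collections import defaultdict

def true_up(raw_clicks, counters):
    """Recount raw click events and correct drifted counter values.

    raw_clicks: dict mapping a key (e.g. (user_id, day)) to the list of
    click event ids -- the slow-but-accurate source of truth.
    counters: dict mapping the same keys to possibly-drifted counts.
    Returns the per-key corrections that were applied.
    """
    corrections = {}
    for key, events in raw_clicks.items():
        true_count = len(events)
        drift = true_count - counters.get(key, 0)
        if drift != 0:
            # Cassandra counter columns only support increments and
            # decrements, so a real job would apply a delta rather
            # than overwrite the value; we mimic that here.
            counters[key] = counters.get(key, 0) + drift
            corrections[key] = drift
    return corrections

counters = defaultdict(int, {("ajay", "2014-12-29"): 17})  # drifted value
raw = {("ajay", "2014-12-29"): ["uuid-%d" % i for i in range(20)]}
print(true_up(raw, counters))            # {('ajay', '2014-12-29'): 3}
print(counters[("ajay", "2014-12-29")])  # 20
```

The key detail is the delta write: because counters can only be incremented,
the background job adds the difference instead of setting the true value.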

On Mon, Dec 29, 2014 at 7:05 AM, Ajay  wrote:

> Thanks for the clarification.
>
> In my case, Cassandra is the only storage. If the counters get incorrect,
> it could't be corrected. For that if we store raw data, we can as well go
> that approach. But the granularity has to be as seconds level as more than
> one user can click the same link. So the data will be huge with more writes
> and more rows to count for reads right?
>
> Thanks
> Ajay
>
>
> On Mon, Dec 29, 2014 at 7:10 PM, Alain RODRIGUEZ 
> wrote:
>
>> Hi Ajay,
>>
>> Here is a good explanation you might want to read.
>>
>>
>> http://www.datastax.com/dev/blog/whats-new-in-cassandra-2-1-a-better-implementation-of-counters
>>
>> Though we use counters for 3 years now, we used them from start C* 0.8
>> and we are happy with them. Limits I can see in both ways are:
>>
>> Counters:
>>
>> - accuracy indeed (Tend to be small in our use case < 5% - when the
>> business allow 10%, so fair enough for us) + we recount them through a
>> batch processing tool (spark / hadoop - Kind of lambda architecture). So
>> our real-time stats are inaccurate and after a few minutes or hours we have
>> the real value.
>> - Read-Before-Write model, which is an anti-pattern. Makes you use more
>> machine due to the pressure involved, affordable for us too.
>>
>> Raw data (counted)
>>
>> - Space used (can become quite impressive very fast, depending on your
>> business) !
>> - Time to answer a request (we expose the data to customer, they don't
>> want to wait 10 sec for Cassandra to read 1 000 000 + columns)
>> - Performances in o(n) (linear) instead of o(1) (constant). Customer
>> won't always understand that for you it is harder to read 1 than 1 000 000,
>> since it should be reading 1 number in both case, and your interface will
>> have very unstable read time.
>>
>> Pick the best solution (or combination) for your use case. Those
>> disadvantages lists are not exhaustive, just things that came to my mind
>> right now.
>>
>> C*heers
>>
>> Alain
>>
>> 2014-12-29 13:33 GMT+01:00 Ajay :
>>
>>> Hi,
>>>
>>> So you mean to say counters are not accurate? (It is highly likely that
>>> multiple parallel threads trying to increment the counter as users click
>>> the links).
>>>
>>> Thanks
>>> Ajay
>>>
>>>
>>> On Mon, Dec 29, 2014 at 4:49 PM, Janne Jalkanen <
>>> janne.jalka...@ecyrd.com> wrote:
>>>

 Hi!

 It’s really a tradeoff between accurate and fast and your read access
 patterns; if you need it to be fairly fast, use counters by all means, but
 accept the fact that they will (especially in older versions of cassandra
 or adverse network conditions) drift off from the true click count.  If you
 need accurate, use a timeuuid and count the rows (this is fairly safe for
 replays too).  However, if using timeuuids your storage will need lots of
 space; and your reads will be slow if the click counts are huge (because
 Cassandra will need to read every item).  Using counters makes it easy to
 just grab a slice of the time series data and shove it to a client for
 visualization.

 You could of course do a hybrid system; use timeuuids and then
 periodically count and add the result to a regular column, and then remove
 the columns.  Note that you might want to optimize this so that you don’t
 end up with a lot of tombstones, e.g. by bucketing the writes so that you
 can delete everything with just a single partition delete.

 At Thinglink some of the more important counters that we use are backed
 up by the actual data. So for speed purposes we use always counters for
 reads, but there’s a repair process that fixes the counter value if we
 suspect it starts drifting off the real data too much.  (You might be able
 to tell that we’ve been using counters for quite some time :-P)

 /Janne

 On 29 Dec 2014, at 13:00, Ajay  wrote:

 > Hi,
 >
 > Is it better to use Counter to User click count than maintaining
 creating new row as user id : timestamp and count it.
 >
 > Basically we want to track the user clicks and use the same for
 hourly/daily/monthly report.
 >
 > Thanks
 > Ajay


>>>
>>
>


Re: User click count

2014-12-29 Thread Ajay
Thanks for the clarification.

In my case, Cassandra is the only storage. If the counters become incorrect,
they can't be corrected. If we store raw data to correct them, we might as
well go with that approach entirely. But the granularity has to be at the
seconds level, as more than one user can click the same link. So the data will
be huge, with more writes and more rows to count on reads, right?

Thanks
Ajay


On Mon, Dec 29, 2014 at 7:10 PM, Alain RODRIGUEZ  wrote:

> Hi Ajay,
>
> Here is a good explanation you might want to read.
>
>
> http://www.datastax.com/dev/blog/whats-new-in-cassandra-2-1-a-better-implementation-of-counters
>
> Though we use counters for 3 years now, we used them from start C* 0.8 and
> we are happy with them. Limits I can see in both ways are:
>
> Counters:
>
> - accuracy indeed (Tend to be small in our use case < 5% - when the
> business allow 10%, so fair enough for us) + we recount them through a
> batch processing tool (spark / hadoop - Kind of lambda architecture). So
> our real-time stats are inaccurate and after a few minutes or hours we have
> the real value.
> - Read-Before-Write model, which is an anti-pattern. Makes you use more
> machine due to the pressure involved, affordable for us too.
>
> Raw data (counted)
>
> - Space used (can become quite impressive very fast, depending on your
> business) !
> - Time to answer a request (we expose the data to customer, they don't
> want to wait 10 sec for Cassandra to read 1 000 000 + columns)
> - Performances in o(n) (linear) instead of o(1) (constant). Customer won't
> always understand that for you it is harder to read 1 than 1 000 000, since
> it should be reading 1 number in both case, and your interface will have
> very unstable read time.
>
> Pick the best solution (or combination) for your use case. Those
> disadvantages lists are not exhaustive, just things that came to my mind
> right now.
>
> C*heers
>
> Alain
>
> 2014-12-29 13:33 GMT+01:00 Ajay :
>
>> Hi,
>>
>> So you mean to say counters are not accurate? (It is highly likely that
>> multiple parallel threads trying to increment the counter as users click
>> the links).
>>
>> Thanks
>> Ajay
>>
>>
>> On Mon, Dec 29, 2014 at 4:49 PM, Janne Jalkanen > > wrote:
>>
>>>
>>> Hi!
>>>
>>> It’s really a tradeoff between accurate and fast and your read access
>>> patterns; if you need it to be fairly fast, use counters by all means, but
>>> accept the fact that they will (especially in older versions of cassandra
>>> or adverse network conditions) drift off from the true click count.  If you
>>> need accurate, use a timeuuid and count the rows (this is fairly safe for
>>> replays too).  However, if using timeuuids your storage will need lots of
>>> space; and your reads will be slow if the click counts are huge (because
>>> Cassandra will need to read every item).  Using counters makes it easy to
>>> just grab a slice of the time series data and shove it to a client for
>>> visualization.
>>>
>>> You could of course do a hybrid system; use timeuuids and then
>>> periodically count and add the result to a regular column, and then remove
>>> the columns.  Note that you might want to optimize this so that you don’t
>>> end up with a lot of tombstones, e.g. by bucketing the writes so that you
>>> can delete everything with just a single partition delete.
>>>
>>> At Thinglink some of the more important counters that we use are backed
>>> up by the actual data. So for speed purposes we use always counters for
>>> reads, but there’s a repair process that fixes the counter value if we
>>> suspect it starts drifting off the real data too much.  (You might be able
>>> to tell that we’ve been using counters for quite some time :-P)
>>>
>>> /Janne
>>>
>>> On 29 Dec 2014, at 13:00, Ajay  wrote:
>>>
>>> > Hi,
>>> >
>>> > Is it better to use Counter to User click count than maintaining
>>> creating new row as user id : timestamp and count it.
>>> >
>>> > Basically we want to track the user clicks and use the same for
>>> hourly/daily/monthly report.
>>> >
>>> > Thanks
>>> > Ajay
>>>
>>>
>>
>


Re: diff cassandra.yaml 1.2 --> 2.1

2014-12-29 Thread Jason Wee
https://issues.apache.org/jira/browse/CASSANDRA-3534

On Mon, Dec 29, 2014 at 6:58 PM, Alain RODRIGUEZ  wrote:

> Hi guys,
>
> I am looking at added and dropped option in Cassandra between 1.2.18 and
> 2.0.11 and this makes me wonder:
>
> Why has the index_interval option been removed from cassandra.yaml ? I
> know we can also define it on a per table basis, yet, this global option
> was quite useful to tune memory usage. I also know that this index is now
> kept off-heap, but I can not see when and why this option has been removed,
> any pointer ? Also it seems this option still usable even if not present by
> default on cassandra.yaml, but it is marked as deprecated (
> https://github.com/apache/cassandra/blob/cassandra-2.0.11/src/java/org/apache/cassandra/config/Config.java#L165).
> Is this option deprecated on the table schema definition too ?
>
> Same kind of questions around the heap "emergency pressure valve" -->
> "flush_largest_memtables_at", "reduce_cache_sizes_at" and
> "reduce_cache_capacity_to", except that those params seems to have been
> dropped directly. Why, is there no more need of it, has some other
> mechanism replaced it, improving things ?
>
> Hope this wasn't already discussed,I was unable to find information about
> it anyway.
>
> C*heers !
>


Re: Best practice for sorting on frequent updated column?

2014-12-29 Thread Eric Stevens
This is a bit difficult.  Depending on your access patterns and data
volume, I'd be inclined to keep a separate table with a (count,
foreign_key) clustering key.  Then do a client-side join to read the data
back in the order you're looking for.  That will at least make the heavily
updated table have a much smaller cost to update, but at the cost of
impacting read time.  At least related values that haven't changed don't
need to be deleted and inserted again each time this one value changes.
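
A toy model of that separate ordering table (all names invented): the
ordering partition clusters on (count, item_id), so a slice comes back
already sorted, and the client joins the item details back in. Python
structures stand in for the two tables:

```python
# Rows of a hypothetical ordering table clustered on (count, item_id).
# Cassandra would return a partition's rows already sorted; here we
# sort explicitly to mimic that behavior.
ordering_rows = [(42, "item-c"), (7, "item-a"), (19, "item-b")]

# Main table holding the item details, keyed by item_id.
items = {
    "item-a": {"name": "Alpha"},
    "item-b": {"name": "Beta"},
    "item-c": {"name": "Gamma"},
}

def top_items(ordering_rows, items, limit=2):
    """Read a slice of the ordering partition, highest count first,
    then do the client-side join back to the main table."""
    ranked = sorted(ordering_rows, reverse=True)[:limit]
    return [(count, items[item_id]["name"]) for count, item_id in ranked]

print(top_items(ordering_rows, items))  # [(42, 'Gamma'), (19, 'Beta')]
```

The trade-off Eric describes shows up clearly: updating a count touches only
the small ordering table, but every read now costs an extra lookup per row.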

But like you said, this is a read-then-write operation; over time you'll
accumulate a lot of tombstones, and your data's accuracy may suffer.  I would
also recommend rotating your partition keys and have a background process
that trues up your object-by-count table into a new partition key on some
schedule you determine.  Live updates write to partition keys *n* and *n*+1,
and your truing-up process trues up *n*+1 before your read process changes
to reading from *n*+1.  When all readers are done with *n*, you can
delete the whole row, and because nobody is reading from that row any
longer, it doesn't matter how many tombstones it accumulated.  I suggest
using a timestamp for the partition key so it's easy to reason about, and
you can rotate it on a schedule that makes sense for you.

If there's heavy write contention, your data will end up being always off
by a little bit (due to race conditions between the truing up process and
the live process), but will correct itself over time.
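
Here is one way that rotation scheme could be sketched, as a toy in-memory
model rather than real Cassandra partitions; the integer partition keys stand
in for the timestamps suggested above:

```python
class RotatingRanking:
    """Toy model of the rotation scheme: live writes go to partitions
    n and n+1; a truing-up job rebuilds n+1 from source data; readers
    then switch to n+1, after which the whole n partition can be
    dropped regardless of how many tombstones it accumulated."""

    def __init__(self):
        self.partitions = {0: {}, 1: {}}
        self.read_key = 0  # the partition readers currently use (n)

    def live_update(self, item, count):
        # Dual-write to the current read partition and the next one.
        for key in (self.read_key, self.read_key + 1):
            self.partitions.setdefault(key, {})[item] = count

    def true_up(self, source_of_truth):
        # Rebuild the next partition from the accurate source data.
        self.partitions[self.read_key + 1] = dict(source_of_truth)

    def rotate(self):
        # Readers move to n+1; n is no longer read, so deleting the
        # whole row is harmless.
        old = self.read_key
        self.read_key += 1
        self.partitions.pop(old, None)
        self.partitions.setdefault(self.read_key + 1, {})

r = RotatingRanking()
r.live_update("item-a", 5)
r.true_up({"item-a": 6})         # a live write raced; source says 6
r.rotate()
print(r.partitions[r.read_key])  # {'item-a': 6}
```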

On Sat, Dec 27, 2014 at 10:15 AM, ziju feng  wrote:

> I need to sort data on a frequent updated column, such as like count of an
> item. The common way of getting data sorted in Cassandra is to have the
> column to be sorted on as clustering key. However, whenever such column is
> updated, we need to delete the row of old value and insert the new one,
> which not only can generate a lot of tombstones, but also require a
> read-before-write if we don't know the original value (such as using
> counter table to maintain the count and propagate it to the table that
> needs to sort on the count).
>
> I was wondering what is best practice for such use case? I'm currently
> using DSE search to handle it but I would like to see a Cassandra only
> solution.
>
> Thanks.
>


Re: User click count

2014-12-29 Thread Alain RODRIGUEZ
Hi Ajay,

Here is a good explanation you might want to read.

http://www.datastax.com/dev/blog/whats-new-in-cassandra-2-1-a-better-implementation-of-counters

We have been using counters for 3 years now (since C* 0.8) and we are happy
with them. The limits I can see in both approaches are:

Counters:

- accuracy indeed (the drift tends to be small in our use case, < 5%, while
the business allows 10%, so fair enough for us); plus we recount them through
a batch processing tool (Spark / Hadoop, a kind of lambda architecture), so
our real-time stats are inaccurate and after a few minutes or hours we have
the real value.
- Read-Before-Write model, which is an anti-pattern. Makes you use more
machines due to the pressure involved; affordable for us too.

Raw data (counted):

- Space used (can become quite impressive very fast, depending on your
business)!
- Time to answer a request (we expose the data to customers; they don't want
to wait 10 sec for Cassandra to read 1,000,000+ columns).
- Performance in O(n) (linear) instead of O(1) (constant). Customers won't
always understand that it is harder for you to read 1,000,000 values than 1,
since it looks like reading a single number in both cases, and your interface
will have very unstable read times.
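
To make that O(1)-versus-O(n) point concrete, here is a toy comparison (not
the actual Cassandra read path): a counter read touches one value no matter
how many clicks happened, while a raw-data read has to walk every column:

```python
def read_counter(counters, key):
    """Counter-style read: one stored value, O(1) in click volume.
    Returns (value, cells touched)."""
    return counters[key], 1

def read_raw(raw, key):
    """Raw-data read: every column must be walked to count, O(n).
    Returns (value, cells touched)."""
    cols = raw[key]
    return len(cols), len(cols)

counters = {"link-1": 1_000_000}
raw = {"link-1": list(range(1_000_000))}
print(read_counter(counters, "link-1"))  # (1000000, 1)
print(read_raw(raw, "link-1"))           # (1000000, 1000000)
```

Both reads return the same number, but the work behind them differs by a
factor of the click volume, which is where the unstable read times come from.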

Pick the best solution (or combination) for your use case. Those
disadvantages lists are not exhaustive, just things that came to my mind
right now.

C*heers

Alain

2014-12-29 13:33 GMT+01:00 Ajay :

> Hi,
>
> So you mean to say counters are not accurate? (It is highly likely that
> multiple parallel threads trying to increment the counter as users click
> the links).
>
> Thanks
> Ajay
>
>
> On Mon, Dec 29, 2014 at 4:49 PM, Janne Jalkanen 
> wrote:
>
>>
>> Hi!
>>
>> It's really a tradeoff between accurate and fast and your read access
>> patterns; if you need it to be fairly fast, use counters by all means, but
>> accept the fact that they will (especially in older versions of cassandra
>> or adverse network conditions) drift off from the true click count.  If you
>> need accurate, use a timeuuid and count the rows (this is fairly safe for
>> replays too).  However, if using timeuuids your storage will need lots of
>> space; and your reads will be slow if the click counts are huge (because
>> Cassandra will need to read every item).  Using counters makes it easy to
>> just grab a slice of the time series data and shove it to a client for
>> visualization.
>>
>> You could of course do a hybrid system; use timeuuids and then
>> periodically count and add the result to a regular column, and then remove
>> the columns.  Note that you might want to optimize this so that you don't
>> end up with a lot of tombstones, e.g. by bucketing the writes so that you
>> can delete everything with just a single partition delete.
>>
>> At Thinglink some of the more important counters that we use are backed
>> up by the actual data. So for speed purposes we use always counters for
>> reads, but there's a repair process that fixes the counter value if we
>> suspect it starts drifting off the real data too much.  (You might be able
>> to tell that we've been using counters for quite some time :-P)
>>
>> /Janne
>>
>> On 29 Dec 2014, at 13:00, Ajay  wrote:
>>
>> > Hi,
>> >
>> > Is it better to use Counter to User click count than maintaining
>> creating new row as user id : timestamp and count it.
>> >
>> > Basically we want to track the user clicks and use the same for
>> hourly/daily/monthly report.
>> >
>> > Thanks
>> > Ajay
>>
>>
>


Re: User click count

2014-12-29 Thread Ajay
Hi,

So you mean to say counters are not accurate? (It is highly likely that
multiple parallel threads will be trying to increment the counter as users
click the links.)

Thanks
Ajay


On Mon, Dec 29, 2014 at 4:49 PM, Janne Jalkanen 
wrote:

>
> Hi!
>
> It’s really a tradeoff between accurate and fast and your read access
> patterns; if you need it to be fairly fast, use counters by all means, but
> accept the fact that they will (especially in older versions of cassandra
> or adverse network conditions) drift off from the true click count.  If you
> need accurate, use a timeuuid and count the rows (this is fairly safe for
> replays too).  However, if using timeuuids your storage will need lots of
> space; and your reads will be slow if the click counts are huge (because
> Cassandra will need to read every item).  Using counters makes it easy to
> just grab a slice of the time series data and shove it to a client for
> visualization.
>
> You could of course do a hybrid system; use timeuuids and then
> periodically count and add the result to a regular column, and then remove
> the columns.  Note that you might want to optimize this so that you don’t
> end up with a lot of tombstones, e.g. by bucketing the writes so that you
> can delete everything with just a single partition delete.
>
> At Thinglink some of the more important counters that we use are backed up
> by the actual data. So for speed purposes we use always counters for reads,
> but there’s a repair process that fixes the counter value if we suspect it
> starts drifting off the real data too much.  (You might be able to tell
> that we’ve been using counters for quite some time :-P)
>
> /Janne
>
> On 29 Dec 2014, at 13:00, Ajay  wrote:
>
> > Hi,
> >
> > Is it better to use Counter to User click count than maintaining
> creating new row as user id : timestamp and count it.
> >
> > Basically we want to track the user clicks and use the same for
> hourly/daily/monthly report.
> >
> > Thanks
> > Ajay
>
>


Re: User click count

2014-12-29 Thread Janne Jalkanen

Hi!

It’s really a tradeoff between accurate and fast and your read access patterns; 
if you need it to be fairly fast, use counters by all means, but accept the 
fact that they will (especially in older versions of cassandra or adverse 
network conditions) drift off from the true click count.  If you need accurate, 
use a timeuuid and count the rows (this is fairly safe for replays too).  
However, if using timeuuids your storage will need lots of space; and your 
reads will be slow if the click counts are huge (because Cassandra will need to 
read every item).  Using counters makes it easy to just grab a slice of the 
time series data and shove it to a client for visualization.

You could of course do a hybrid system; use timeuuids and then periodically 
count and add the result to a regular column, and then remove the columns.  
Note that you might want to optimize this so that you don’t end up with a lot 
of tombstones, e.g. by bucketing the writes so that you can delete everything 
with just a single partition delete.
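
A small sketch of that bucketing idea, with invented names: clicks are
written under an hourly bucket key, so once a bucket is counted into a
regular column the whole bucket can go with a single partition delete instead
of leaving one tombstone per column:

```python
from collections import defaultdict
from datetime import datetime, timezone

buckets = defaultdict(list)  # (user, hour bucket) -> click timestamps

def record_click(user, ts):
    """Bucket writes by hour so an entire bucket can later be dropped
    with one partition-level delete."""
    bucket = (user, ts.strftime("%Y-%m-%d-%H"))
    buckets[bucket].append(ts)

def roll_up(bucket, totals):
    """Count a closed bucket into a regular column, then delete the
    whole bucket (one partition delete, not per-column tombstones)."""
    totals[bucket] = totals.get(bucket, 0) + len(buckets[bucket])
    del buckets[bucket]

t = datetime(2014, 12, 29, 13, 0, tzinfo=timezone.utc)
for _ in range(3):
    record_click("ajay", t)
totals = {}
roll_up(("ajay", "2014-12-29-13"), totals)
print(totals)        # {('ajay', '2014-12-29-13'): 3}
print(len(buckets))  # 0
```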

At Thinglink some of the more important counters that we use are backed up by 
the actual data. So for speed purposes we use always counters for reads, but 
there’s a repair process that fixes the counter value if we suspect it starts 
drifting off the real data too much.  (You might be able to tell that we’ve 
been using counters for quite some time :-P)

/Janne

On 29 Dec 2014, at 13:00, Ajay  wrote:

> Hi,
> 
> Is it better to use Counter to User click count than maintaining creating new 
> row as user id : timestamp and count it.
> 
> Basically we want to track the user clicks and use the same for 
> hourly/daily/monthly report.
> 
> Thanks
> Ajay



User click count

2014-12-29 Thread Ajay
Hi,

Is it better to use a Counter for user click counts than to create a new row
per click (user id : timestamp) and count the rows?

Basically we want to track the user clicks and use the same for
hourly/daily/monthly report.

Thanks
Ajay


diff cassandra.yaml 1.2 --> 2.1

2014-12-29 Thread Alain RODRIGUEZ
Hi guys,

I am looking at the options added and dropped in Cassandra between 1.2.18 and
2.0.11, and this makes me wonder:

Why has the index_interval option been removed from cassandra.yaml? I know we
can also define it on a per-table basis; still, this global option was quite
useful for tuning memory usage. I also know that this index is now kept
off-heap, but I cannot see when or why the option was removed -- any pointer?
It also seems the option is still usable even though it is no longer present
in cassandra.yaml by default, but it is marked as deprecated (
https://github.com/apache/cassandra/blob/cassandra-2.0.11/src/java/org/apache/cassandra/config/Config.java#L165).
Is this option deprecated in the table schema definition too?
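For what it's worth, the per-table property survives but changes shape across these versions; something like the following should illustrate it (keyspace and table names are made up, and the 2.1 behavior described in the comment is my understanding, not from this thread):

```sql
-- 2.0.x: a single per-table index sampling interval
ALTER TABLE ks.events WITH index_interval = 256;

-- 2.1: replaced by a min/max pair; the node resizes each sstable's
-- index summary between the two bounds based on read activity,
-- which is what lets the old single global knob go away.
ALTER TABLE ks.events
  WITH min_index_interval = 128 AND max_index_interval = 2048;
```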

Same kind of question about the heap "emergency pressure valve" options
("flush_largest_memtables_at", "reduce_cache_sizes_at" and
"reduce_cache_capacity_to"), except that those params seem to have been
dropped outright. Why? Is there no more need for them, or has some other
mechanism replaced them and improved things?

Hope this wasn't already discussed; I was unable to find information about it
anyway.

C*heers !


Re: 2.0.10 to 2.0.11 upgrade and immediate ParNew and CMS GC storm

2014-12-29 Thread Alain RODRIGUEZ
Hi,

Sorry about the gravedigging, but what would be a good starting value for
tuning "rpc_max_threads"?

I mean, the default is unlimited and the commented value is 2048. The native
protocol seems to allow only 128 simultaneous threads. Should I stick to 2048,
try something closer to 128, or something else entirely?

About HSHA: I have tried this mode from time to time since C* 0.8 and always
hit the "ERROR 12:02:18,971 Read an invalid frame size of 0. Are you using
TFramedTransport on the client side?" error. I haven't tried for a while (a
year maybe); has this been fixed, or is it due to my configuration somehow?
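For reference, the cassandra.yaml fragment in question, with the commented value Alain mentions (values shown are the stock 2.0-era ones, not a recommendation):

```yaml
rpc_server_type: hsha
rpc_min_threads: 16
# Leaving rpc_max_threads commented out means "unlimited"; under hsha
# in 2.0.11 the whole pool is allocated up front at startup, so an
# unbounded or very large value can exhaust the heap immediately.
rpc_max_threads: 2048
```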

C*heers

Alain

2014-10-29 16:07 GMT+01:00 Peter Haggerty :

> That definitely appears to be the issue. Thanks for pointing that out!
>
> https://issues.apache.org/jira/browse/CASSANDRA-8116
> It looks like 2.0.12 will check for the default and throw an exception
> (thanks Mike Adamson), and it also includes a bit more text in the config
> file. I'm thinking 2.0.12 should be pushed out sooner rather than later,
> as anyone using hsha with the default settings will simply have their
> cluster stop working a few minutes after the upgrade, without any
> indication of the actual problem.
>
>
> Peter
>
>
> On Wed, Oct 29, 2014 at 5:23 AM, Duncan Sands 
> wrote:
> > Hi Peter, are you using the hsha RPC server type on this node?  If you
> are,
> > then it looks like rpc_max_threads threads will be allocated on startup
> in
> > 2.0.11 while this wasn't the case before.  This can exhaust your heap if
> the
> > value of rpc_max_threads is too large (eg if you use the default).
> >
> > Ciao, Duncan.
> >
> >
> > On 29/10/14 01:08, Peter Haggerty wrote:
> >>
> >> On a 3 node test cluster we recently upgraded one node from 2.0.10 to
> >> 2.0.11. This is a cluster that had been happily running 2.0.10 for
> >> weeks and that has very little load and very capable hardware. The
> >> upgrade was just your typical package upgrade:
> >>
> >> $ dpkg -s cassandra | egrep '^Ver|^Main'
> >> Maintainer: Eric Evans 
> >> Version: 2.0.11
> >>
> >> Immediately after it started, it ran a couple of ParNews and then began
> >> executing CMS runs. Within 10 minutes the node had become unreachable and
> >> was marked as down by the two other nodes in the ring, which are still
> >> on 2.0.10.
> >>
> >> We have jstack output and the server logs but nothing seems to be
> >> jumping out. Has anyone else run into this? What should we be looking
> >> for?
> >>
> >>
> >> Peter
> >>
> >
>