Re: CQL3 Frame Length

2013-01-19 Thread Theo Hultberg
Hi,

Another reason for keeping the frame length in the header is that newer
versions can add fields to frames without older clients breaking. For
example, a minor release can add some more content to an existing frame,
and a client that knows the full frame length can simply skip the bytes it
doesn't understand. If clients didn't know the full frame length (and were
required by the specification to consume all the bytes) there would be
trailing garbage which would most likely crash the client.
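
As a rough illustration, here is a minimal sketch of the framing this
enables, assuming the v1 native protocol header layout (version, flags,
stream id and opcode as single bytes, then a four-byte big-endian body
length); it is not cql-rb's actual code:

  # read exactly one frame; the length field lets the reader consume the
  # whole body, including any trailing fields it doesn't understand
  def read_frame(io)
    version, flags, stream_id, opcode, length = io.read(8).unpack('C4N')
    body = io.read(length)
    [version, flags, stream_id, opcode, body]
  end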

T#

> Hey Sylvain,
>
> Thanks for explaining the rationale. When you look at it from the
> perspective of the use cases you mention, it makes sense to be able to
> supply the reader with the frame size up front.
>
> I've opted to go for serializing the frame into a buffer. Although this
> could materialize an arbitrarily large amount of memory, ultimately the
> driving application has control of the degree to which this can occur, so
> in the grander scheme of things, you can still maintain streaming
> semantics.
>
> Thanks for the heads up.
>
> Cheers,
>
> Ben
>

> On Tue, Jan 8, 2013 at 4:08 PM, Sylvain Lebresne wrote:>

>> Mostly this is because having the frame length is convenient to have in
>> practice.
>>
>> Without pretending that there is only one way to write a server, it is
>> common to separate the phase "read a frame from the network" from the
>> phase "decode the frame", which is often simpler if you can read the
>> frame upfront. Also, if you don't have the frame size, it means you need
>> to decode the whole frame before being able to decode the next one, and
>> so you can't parallelize the decoding.
>>
>> It is true however that it means for the write side that you need to
>> either be able to pre-compute the frame body size or to serialize it in
>> memory first. That's a trade-off for making it easier on the read side.
>> But if you want my opinion, on the write side too it's probably worth
>> parallelizing the message encoding (which requires you to encode it in
>> memory first) since it's an asynchronous protocol and so there will
>> likely be multiple writers simultaneously.
>>
>> --
>> Sylvain
>>
>>
>>
>> On Tue, Jan 8, 2013 at 12:48 PM, Ben Hood <0x6e6...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I've read the CQL wire specification and naively, I can't see how the
>>> frame length header is used.
>>>
>>> To me, it looks like on the read side, you know which type of structures
>>> to expect based on the opcode and each structure is TLV encoded.
>>>
>>> On the write side, you need to encode TLV structures as well, but you
>>> don't know the overall frame length until you've encoded it. So it would
>>> seem that you either need to pre-calculate the cumulative TLV size
>>> before you serialize the frame body, or you serialize the frame body to
>>> a buffer which you can then get the size of and then write to the
>>> socket, after having first written the count out.
>>>
>>> Is there potentially an implicit assumption that the reader will want to
>>> pre-buffer the entire frame before decoding it?
>>>
>>> Cheers,
>>>
>>> Ben
>>>
>>
>>


Re: cql: show tables in a keyspace

2013-01-28 Thread Theo Hultberg
the DESCRIBE family of commands in cqlsh are wrappers around queries to the
system keyspace, so if you want to inspect what keyspaces and tables exist
from your application you can do something like:

SELECT columnfamily_name, comment
FROM system.schema_columnfamilies
WHERE keyspace_name = 'test';

or

SELECT * FROM system.schema_keyspaces;

T#


On Mon, Jan 28, 2013 at 8:35 PM, Brian O'Neill wrote:

>
> cqlsh> use keyspace;
> cqlsh:cirrus> describe tables;
>
> For more info:
> cqlsh> help describe
>
> -brian
>
>
> ---
> Brian O'Neill
> Lead Architect, Software Development
> Health Market Science
> The Science of Better Results
> 2700 Horizon Drive • King of Prussia, PA • 19406
> M: 215.588.6024 • @boneill42 •
> healthmarketscience.com
>
>
> On 1/28/13 2:27 PM, "Paul van Hoven" 
> wrote:
>
> >Is there some way in cql to get a list of all tables or column
> >families that belong to a keyspace like "show tables" in sql?
>
>
>


New CQL3 driver for Ruby

2013-02-24 Thread Theo Hultberg
Hi,

For the last few weeks I've been working on a CQL3 driver for Ruby. If
you're using Ruby and Cassandra I would very much like your help getting it
production ready.

You can find the code and documentation here:

https://github.com/iconara/cql-rb

The driver supports the full CQL3 protocol except for authentication. It's
implemented purely in Ruby and has no dependencies.

If you try it out and find a bug (which I'm sure you will), please email me
directly (t...@iconara.net) or open an issue in the GitHub project.

yours,
Theo


Re: Limit on the size of a list

2013-05-12 Thread Theo Hultberg
In the CQL3 protocol the sizes of collections are unsigned shorts, so the
maximum number of elements in a LIST<...> is 65,535. There's no check,
afaik, that stops you from creating lists that are bigger than that, but
the protocol doesn't handle returning them (you get the first N mod 65,536
items).
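
The wraparound is just unsigned 16-bit truncation of the size prefix; a
quick hypothetical sketch in Ruby:

  # packing the element count as an unsigned short silently wraps it
  [400_000].pack('n').unpack1('n') # => 6784, i.e. 400_000 % 65_536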

On the other hand the JDBC driver doesn't talk over the binary protocol but
over Thrift, doesn't it? In that case there may be other limits.

T#


On Mon, May 13, 2013 at 3:26 AM, Edward Capriolo wrote:

> 2 billion is the maximum theoretical limit of columns under a row. It is
> NOT the maximum limit of a CQL collection. The design of CQL collections
> currently require retrieving the entire collection on read.
>
>
> On Sun, May 12, 2013 at 11:13 AM, Robert Wille wrote:
>
>> I designed a data model for my data that uses a list of UUID's in a
>> column. When I designed my data model, my expectation was that most of the
>> lists would have fewer than a hundred elements, with a few having several
>> thousand. I discovered in my data a list that has nearly 400,000 items in
>> it. When I try to retrieve it, I get the following exception:
>>
>> java.lang.IllegalArgumentException: Illegal Capacity: -14594
>>     at java.util.ArrayList.<init>(ArrayList.java:110)
>>     at org.apache.cassandra.cql.jdbc.ListMaker.compose(ListMaker.java:54)
>>     at org.apache.cassandra.cql.jdbc.TypedColumn.<init>(TypedColumn.java:68)
>>     at org.apache.cassandra.cql.jdbc.CassandraResultSet.createColumn(CassandraResultSet.java:1086)
>>     at org.apache.cassandra.cql.jdbc.CassandraResultSet.populateColumns(CassandraResultSet.java:161)
>>     at org.apache.cassandra.cql.jdbc.CassandraResultSet.<init>(CassandraResultSet.java:134)
>>     at org.apache.cassandra.cql.jdbc.CassandraStatement.doExecute(CassandraStatement.java:166)
>>     at org.apache.cassandra.cql.jdbc.CassandraStatement.executeQuery(CassandraStatement.java:226)
>>
>>
>> I get this with Cassandra 1.2.4 and the latest snapshot of the JDBC
>> driver. Admittedly, several hundred thousand is quite a lot of items, but
>> odd that I'm getting some kind of wraparound, since 400,000 is a long ways
>> from 2 billion.
>>
>> What are the physical and practical limits on the size of a list? Is it
>> possible to retrieve a range of items from a list?
>>
>> Thanks in advance
>>
>> Robert
>>
>>
>>
>


Getting error "Too many in flight hints"

2013-05-30 Thread Theo Hultberg
Hi,

I'm using Cassandra 1.2.4 on EC2 (3 x m1.large, this is a test cluster),
and my application is talking to it over the binary protocol (I'm using
JRuby and the cql-rb driver). I get this error quite frequently: "Too many
in flight hints: 2411" (the exact number varies)

Has anyone any idea of what's causing it? I'm pushing the cluster quite
hard with writes (but no reads at all).

T#


Re: Getting error "Too many in flight hints"

2013-05-30 Thread Theo Hultberg
thanks a lot for the explanation. if I understand it correctly it's
basically back pressure from C*: it's telling me that it's overloaded and
that I need to back off.

I better start a few more nodes, I guess.
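
for what it's worth, this is roughly what backing off could look like on
the client side (a hypothetical sketch; the exception class the driver
raises for OverloadedException is an assumption here):

  def execute_with_backoff(client, cql, max_attempts = 5)
    attempts = 0
    begin
      client.execute(cql)
    rescue Cql::QueryError # assumed error class, check your driver
      attempts += 1
      raise if attempts >= max_attempts
      sleep(0.1 * 2 ** attempts) # exponential backoff before retrying
      retry
    end
  end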

T#


On Thu, May 30, 2013 at 10:47 PM, Robert Coli  wrote:

> On Thu, May 30, 2013 at 8:24 AM, Theo Hultberg  wrote:
> > I'm using Cassandra 1.2.4 on EC2 (3 x m1.large, this is a test cluster),
> and
> > my application is talking to it over the binary protocol (I'm using JRuby
> > and the cql-rb driver). I get this error quite frequently: "Too many in
> > flight hints: 2411" (the exact number varies)
> >
> > Has anyone any idea of what's causing it? I'm pushing the cluster quite
> hard
> > with writes (but no reads at all).
>
> The code that produces this message (below) sets the bound based on
> the number of available processors. It is a bound on the number of in
> progress hints. An in progress hint (for some reason redundantly
> referred to as "in flight") is a hint which has been submitted to the
> executor which will ultimately write it to local disk. If you get
> OverloadedException, this means that you were trying to write hints to
> this executor so fast that you risked OOM, so Cassandra refused to
> submit your hint to the hint executor and therefore (partially) failed
> your write.
>
> "
> private static volatile int maxHintsInProgress = 1024 *
> FBUtilities.getAvailableProcessors();
> [... snip ...]
> for (InetAddress destination : targets)
> {
> // avoid OOMing due to excess hints.  we need to do this
> check even for "live" nodes, since we can
> // still generate hints for those if it's overloaded or
> simply dead but not yet known-to-be-dead.
> // The idea is that if we have over maxHintsInProgress
> hints in flight, this is probably due to
> // a small number of nodes causing problems, so we should
> avoid shutting down writes completely to
> // healthy nodes.  Any node with no hintsInProgress is
> considered healthy.
> if (totalHintsInProgress.get() > maxHintsInProgress
> && (hintsInProgress.get(destination).get() > 0 &&
> shouldHint(destination)))
> {
> throw new OverloadedException("Too many in flight
> hints: " + totalHintsInProgress.get());
> }
> "
>
> If Cassandra didn't return this exception, it might OOM while
> enqueueing your hints to be stored. Giving up on trying to enqueue a
> hint for the failed write is chosen instead. The solution is to reduce
> your write rate, ideally by enough that you don't even queue hints in
> the first place.
>
> =Rob
>


Re: [Cassandra] Conflict resolution in Cassandra

2013-06-07 Thread Theo Hultberg
Like Edward says, Cassandra's conflict resolution strategy is LWW (last
write wins). This may seem simplistic, but Cassandra's Bigtable-esque data
model makes it less of an issue than in a pure key/value store like Riak,
for example. When all you have is an opaque value for a key you want to be
able to do things like keeping conflicting writes so that you can resolve
them later. Since Cassandra's rows aren't opaque, but more like a sorted
map, LWW is almost always enough. With Cassandra you can add new
columns/cells to a row from multiple clients without having to worry about
conflicts. It's only when multiple clients write to the same column/cell
that there is an issue, but in that case you usually can (and you probably
should) model your way around that.
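
As a minimal sketch, the reconciliation rule Edward describes below could
be expressed like this (the tie break on values is what makes the outcome
deterministic on every replica):

  Cell = Struct.new(:value, :timestamp)

  def reconcile(a, b)
    if a.timestamp != b.timestamp
      a.timestamp > b.timestamp ? a : b # highest timestamp wins
    else
      a.value >= b.value ? a : b # tie break deterministically on the value
    end
  end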

T#


On Fri, Jun 7, 2013 at 4:51 PM, Edward Capriolo wrote:

> Conflicts are managed at the column level.
> 1) If two columns have the same name the column with the highest timestamp
> wins.
> 2) If two columns have the same column name and the same timestamp the
> value of the column is compared and the highest* wins.
>
> Someone correct me if I am wrong about the *. I know the algorithm is
> deterministic, I do not remember if it is highest or lowest.
>
>
> On Thu, Jun 6, 2013 at 6:25 PM, Emalayan Vairavanathan <
> svemala...@yahoo.com> wrote:
>
>> I tried google and found conflicting answers. Thats why wanted to double
>> check with user forum.
>>
>> Thanks
>>
>>   --
>>  *From:* Bryan Talbot 
>> *To:* user@cassandra.apache.org; Emalayan Vairavanathan <
>> svemala...@yahoo.com>
>> *Sent:* Thursday, 6 June 2013 3:19 PM
>> *Subject:* Re: [Cassandra] Conflict resolution in Cassandra
>>
>> For generic questions like this, google is your friend:
>> http://lmgtfy.com/?q=cassandra+conflict+resolution
>>
>> -Bryan
>>
>>
>> On Thu, Jun 6, 2013 at 11:23 AM, Emalayan Vairavanathan <
>> svemala...@yahoo.com> wrote:
>>
>> Hi All,
>>
>> Can someone tell me about the conflict resolution mechanisms provided by
>> Cassandra?
>>
>> More specifically does Cassandra provides a way to define application
>> specific conflict resolution mechanisms (per row basis  / column basis)?
>>or
>> Does it automatically manage the conflicts based on some synchronization
>> algorithms ?
>>
>>
>> Thank you
>> Emalayan
>>
>>
>>
>>
>>
>>
>


Why so many vnodes?

2013-06-10 Thread Theo Hultberg
Hi,

The default number of vnodes is 256, is there any significance in this
number? Since Cassandra's vnodes don't work like for example Riak's, where
there is a fixed number of vnodes distributed evenly over the nodes, why so
many? Even with a moderately sized cluster you get thousands of slices.
Does this matter? If your cluster grows to over thirty machines and you
start looking at ten thousand slices, would that be a problem? I guess that
traversing a list of a thousand or ten thousand slices to find where a
token lives isn't a huge problem, but are there any other up- or downsides
to having a small or large number of vnodes per node?
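
To make the lookup cost concrete, here is a hypothetical sketch: with
every vnode's token in a single sorted list, finding the owner of a token
is a binary search over N * T entries, so even ten thousand slices cost
only around 14 comparisons:

  Vnode = Struct.new(:token, :node)

  # ring is an Array of Vnodes sorted by token; the owner is the first
  # vnode at or after the token, wrapping around past the end
  def owner(ring, token)
    index = (0...ring.size).bsearch { |i| ring[i].token >= token }
    ring[index || 0].node
  end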

I understand the benefits for splitting up the ring into pieces, for
example to be able to stream data from more nodes when bootstrapping a new
one, but that works even if each node only has say 32 vnodes (unless your
cluster is truly huge).

yours,
Theo


Re: Why so many vnodes?

2013-06-10 Thread Theo Hultberg
I'm not sure I follow what you mean, or if I've misunderstood what
Cassandra is telling me. Each node has 256 vnodes (or tokens, as the
preferred name seems to be). When I run `nodetool status` each node is
reported as having 256 vnodes, regardless of how many nodes are in the
cluster. A single node cluster has 256 vnodes on the single node, a six
node cluster has 256 vnodes on each machine, making 1,536 vnodes in total.
When I run `SELECT tokens FROM system.peers` or `nodetool ring` each node
lists 256 tokens.

This is different from how it works in Riak and Voldemort, if I'm not
mistaken, and that is the source of my confusion.

T#


On Mon, Jun 10, 2013 at 4:54 PM, Milind Parikh wrote:

> There are n vnodes regardless of the size of the physical cluster.
> Regards
> Milind
> On Jun 10, 2013 7:48 AM, "Theo Hultberg"  wrote:
>
>> Hi,
>>
>> The default number of vnodes is 256, is there any significance in this
>> number? Since Cassandra's vnodes don't work like for example Riak's, where
>> there is a fixed number of vnodes distributed evenly over the nodes, why so
>> many? Even with a moderately sized cluster you get thousands of slices.
>> Does this matter? If your cluster grows to over thirty machines and you
>> start looking at ten thousand slices, would that be a problem? I guess that
>> traversing a list of a thousand or ten thousand slices to find where a
>> token lives isn't a huge problem, but are there any other up or downsides
>> to having a small or large number of vnodes per node?
>>
>> I understand the benefits for splitting up the ring into pieces, for
>> example to be able to stream data from more nodes when bootstrapping a new
>> one, but that works even if each node only has say 32 vnodes (unless your
>> cluster is truly huge).
>>
>> yours,
>> Theo
>>
>


Re: Why so many vnodes?

2013-06-10 Thread Theo Hultberg
thanks, that makes sense, but I assume in your last sentence you mean
decrease it for large clusters, not increase it?

T#


On Mon, Jun 10, 2013 at 11:02 PM, Richard Low  wrote:

> Hi Theo,
>
> The number (let's call it T and the number of nodes N) 256 was chosen to
> give good load balancing for random token assignments for most cluster
> sizes.  For small T, a random choice of initial tokens will in most cases
> give a poor distribution of data.  The larger T is, the closer to uniform
> the distribution will be, with increasing probability.
>
> Also, for small T, when a new node is added, it won't have many ranges to
> split so won't be able to take an even slice of the data.
>
> For this reason T should be large.  But if it is too large, there are too
> many slices to keep track of as you say.  The function to find which keys
> live where becomes more expensive and operations that deal with individual
> vnodes e.g. repair become slow.  (An extreme example is SELECT * LIMIT 1,
> which when there is no data has to scan each vnode in turn in search of a
> single row.  This is O(NT) and for even quite small T takes seconds to
> complete.)
>
> So 256 was chosen to be a reasonable balance.  I don't think most users
> will find it too slow; users with extremely large clusters may need to
> increase it.
>
> Richard.
>
>
> On 10 June 2013 18:55, Theo Hultberg  wrote:
>
>> I'm not sure I follow what you mean, or if I've misunderstood what
>> Cassandra is telling me. Each node has 256 vnodes (or tokens, as the
>> preferred name seems to be). When I run `nodetool status` each node is
>> reported as having 256 vnodes, regardless of how many nodes are in the
>> cluster. A single node cluster has 256 vnodes on the single node, a six
>> node cluster has 256 vnodes on each machine, making 1,536 vnodes in total.
>> When I run `SELECT tokens FROM system.peers` or `nodetool ring` each node
>> lists 256 tokens.
>>
>> This is different from how it works in Riak and Voldemort, if I'm not
>> mistaken, and that is the source of my confusion.
>>
>> T#
>>
>>
>> On Mon, Jun 10, 2013 at 4:54 PM, Milind Parikh wrote:
>>
>>> There are n vnodes regardless of the size of the physical cluster.
>>> Regards
>>> Milind
>>> On Jun 10, 2013 7:48 AM, "Theo Hultberg"  wrote:
>>>
>>>> Hi,
>>>>
>>>> The default number of vnodes is 256, is there any significance in this
>>>> number? Since Cassandra's vnodes don't work like for example Riak's, where
>>>> there is a fixed number of vnodes distributed evenly over the nodes, why so
>>>> many? Even with a moderately sized cluster you get thousands of slices.
>>>> Does this matter? If your cluster grows to over thirty machines and you
>>>> start looking at ten thousand slices, would that be a problem? I guess that
>>>> traversing a list of a thousand or ten thousand slices to find where a
>>>> token lives isn't a huge problem, but are there any other up or downsides
>>>> to having a small or large number of vnodes per node?
>>>>
>>>> I understand the benefits for splitting up the ring into pieces, for
>>>> example to be able to stream data from more nodes when bootstrapping a new
>>>> one, but that works even if each node only has say 32 vnodes (unless your
>>>> cluster is truly huge).
>>>>
>>>> yours,
>>>> Theo
>>>>
>>>
>>
>


Re: Why so many vnodes?

2013-06-11 Thread Theo Hultberg
But in the paragraph just before Richard said that finding the node that
owns a token becomes slower on large clusters with lots of token ranges, so
increasing it further seems contradictory.

Is this a correct interpretation: finding the node that owns a particular
token becomes slower as the number of nodes (and therefore total token
ranges) increases, but for large clusters you also need to take the time
for bootstraps into account, which will become slower if each node has
fewer token ranges. The speeds referred to in the two cases belong to
different operations, so there is a trade-off, and 256 initial
tokens is a trade-off that works for most cases.

T#


On Tue, Jun 11, 2013 at 8:37 AM, Alain RODRIGUEZ  wrote:

> I think he actually meant *increase*, for this reason "For small T, a
> random choice of initial tokens will in most cases give a poor distribution
> of data.  The larger T is, the closer to uniform the distribution will be,
> with increasing probability."
>
> Alain
>
>
> 2013/6/11 Theo Hultberg 
>
>> thanks, that makes sense, but I assume in your last sentence you mean
>> decrease it for large clusters, not increase it?
>>
>> T#
>>
>>
>> On Mon, Jun 10, 2013 at 11:02 PM, Richard Low wrote:
>>
>>> Hi Theo,
>>>
>>> The number (let's call it T and the number of nodes N) 256 was chosen to
>>> give good load balancing for random token assignments for most cluster
>>> sizes.  For small T, a random choice of initial tokens will in most cases
>>> give a poor distribution of data.  The larger T is, the closer to uniform
>>> the distribution will be, with increasing probability.
>>>
>>> Also, for small T, when a new node is added, it won't have many ranges
>>> to split so won't be able to take an even slice of the data.
>>>
>>> For this reason T should be large.  But if it is too large, there are
>>> too many slices to keep track of as you say.  The function to find which
>>> keys live where becomes more expensive and operations that deal with
>>> individual vnodes e.g. repair become slow.  (An extreme example is SELECT *
>>> LIMIT 1, which when there is no data has to scan each vnode in turn in
>>> search of a single row.  This is O(NT) and for even quite small T takes
>>> seconds to complete.)
>>>
>>> So 256 was chosen to be a reasonable balance.  I don't think most users
>>> will find it too slow; users with extremely large clusters may need to
>>> increase it.
>>>
>>> Richard.
>>>
>>>
>>> On 10 June 2013 18:55, Theo Hultberg  wrote:
>>>
>>>> I'm not sure I follow what you mean, or if I've misunderstood what
>>>> Cassandra is telling me. Each node has 256 vnodes (or tokens, as the
>>>> preferred name seems to be). When I run `nodetool status` each node is
>>>> reported as having 256 vnodes, regardless of how many nodes are in the
>>>> cluster. A single node cluster has 256 vnodes on the single node, a six
>>>> node cluster has 256 vnodes on each machine, making 1,536 vnodes in total.
>>>> When I run `SELECT tokens FROM system.peers` or `nodetool ring` each node
>>>> lists 256 tokens.
>>>>
>>>> This is different from how it works in Riak and Voldemort, if I'm not
>>>> mistaken, and that is the source of my confusion.
>>>>
>>>> T#
>>>>
>>>>
>>>> On Mon, Jun 10, 2013 at 4:54 PM, Milind Parikh 
>>>> wrote:
>>>>
>>>>> There are n vnodes regardless of the size of the physical cluster.
>>>>> Regards
>>>>> Milind
>>>>> On Jun 10, 2013 7:48 AM, "Theo Hultberg"  wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> The default number of vnodes is 256, is there any significance in
>>>>>> this number? Since Cassandra's vnodes don't work like for example Riak's,
>>>>>> where there is a fixed number of vnodes distributed evenly over the 
>>>>>> nodes,
>>>>>> why so many? Even with a moderately sized cluster you get thousands of
>>>>>> slices. Does this matter? If your cluster grows to over thirty machines 
>>>>>> and
>>>>>> you start looking at ten thousand slices, would that be a problem? I 
>>>>>> guess
>>>>>> that traversing a list of a thousand or ten thousand slices to find 
>>>>>> where a
>>>>>> token lives isn't a huge problem, but are there any other up or downsides
>>>>>> to having a small or large number of vnodes per node?
>>>>>>
>>>>>> I understand the benefits for splitting up the ring into pieces, for
>>>>>> example to be able to stream data from more nodes when bootstrapping a 
>>>>>> new
>>>>>> one, but that works even if each node only has say 32 vnodes (unless your
>>>>>> cluster is truly huge).
>>>>>>
>>>>>> yours,
>>>>>> Theo
>>>>>>
>>>>>
>>>>
>>>
>>
>


cql-rb, the CQL3 driver for Ruby has reached v1.0

2013-06-13 Thread Theo Hultberg
After a few months of development and many preview releases, cql-rb, the
pure Ruby CQL3 driver, has finally reached v1.0.

You can find the code and examples on GitHub:
https://github.com/iconara/cql-rb
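
A minimal example of connecting and running a query (a hypothetical
sketch; see the README above for the authoritative API and connection
options):

  require 'cql'

  # the :host option and row iteration here are assumptions based on the
  # project README -- check it for the exact interface
  client = Cql::Client.connect(host: 'localhost')
  client.use('system')
  rows = client.execute('SELECT keyspace_name FROM schema_keyspaces')
  rows.each { |row| puts row['keyspace_name'] }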

T#


Performance issues with CQL3 collections?

2013-06-26 Thread Theo Hultberg
Hi,

I've seen a couple of people on Stack Overflow having problems with
performance when they have maps that they continuously update, and in
hindsight I think I might have run into the same problem myself (but I
didn't suspect it as the reason and designed differently and by accident
didn't use maps anymore).

Is there any reason that maps (or lists or sets) in particular would become
a performance issue when they're heavily modified? As I've understood them
they're not special, and shouldn't be any different performance wise than
overwriting regular columns. Is there something different going on that I'm
missing?

Here are the Stack Overflow questions:

http://stackoverflow.com/questions/17282837/cassandra-insert-perfomance-issue-into-a-table-with-a-map-type/17290981

http://stackoverflow.com/questions/17082963/bad-performance-when-writing-log-data-to-cassandra-with-timeuuid-as-a-column-nam/17123236

yours,
Theo


Re: Performance issues with CQL3 collections?

2013-06-26 Thread Theo Hultberg
do I understand it correctly if I think that collection modifications are
done by reading the collection, writing a range tombstone that would cover
the collection and then re-writing the whole collection again? or is it
just the modified parts of the collection that are covered by the range
tombstones, but you still get massive amounts of them and it's just their
number that is the problem?

would this explain the slowdown of writes too? I guess it would if
cassandra needed to read the collection before it wrote the new values,
otherwise I don't understand how this affects writes, but that only says
how much I know about how this works.

T#


On Wed, Jun 26, 2013 at 10:48 AM, Fabien Rousseau  wrote:

> Hi,
>
> I'm pretty sure that it's related to this ticket :
> https://issues.apache.org/jira/browse/CASSANDRA-5677
>
> I'd be happy if someone tests this patch.
> It should apply easily on 1.2.5 & 1.2.6
>
> After applying the patch, by default, the current implementation is still
> used, but modify your cassandra.yaml to add the following one :
> interval_tree_provider: IntervalTreeAvlProvider
>
> (Note that implementations should be interchangeable, because they share
> the same serializers and deserializers)
>
> Also, please note that this patch has not been reviewed nor intensively
> tested... So, it may not be "production ready"
>
> Fabien
>
>
>
>
>
>
>
> 2013/6/26 Theo Hultberg 
>
>> Hi,
>>
>> I've seen a couple of people on Stack Overflow having problems with
>> performance when they have maps that they continuously update, and in
>> hindsight I think I might have run into the same problem myself (but I
>> didn't suspect it as the reason and designed differently and by accident
>> didn't use maps anymore).
>>
>> Is there any reason that maps (or lists or sets) in particular would
>> become a performance issue when they're heavily modified? As I've
>> understood them they're not special, and shouldn't be any different
>> performance wise than overwriting regular columns. Is there something
>> different going on that I'm missing?
>>
>> Here are the Stack Overflow questions:
>>
>>
>> http://stackoverflow.com/questions/17282837/cassandra-insert-perfomance-issue-into-a-table-with-a-map-type/17290981
>>
>>
>> http://stackoverflow.com/questions/17082963/bad-performance-when-writing-log-data-to-cassandra-with-timeuuid-as-a-column-nam/17123236
>>
>> yours,
>> Theo
>>
>
>
>
> --
> Fabien Rousseau
> *
> *
>  www.yakaz.com
>


Re: Performance issues with CQL3 collections?

2013-06-27 Thread Theo Hultberg
the thing I was doing was definitely triggering the range tombstone issue,
this is what I was doing:

UPDATE clocks SET clock = ? WHERE shard = ?

in this table:

CREATE TABLE clocks (shard INT PRIMARY KEY, clock MAP)

however, from the stack overflow posts it sounds like they aren't
necessarily overwriting their collections. I've tried to replicate their
problem with these two statements

INSERT INTO clocks (shard, clock) VALUES (?, ?)
UPDATE clocks SET clock = clock + ? WHERE shard = ?

the first one should create range tombstones because it overwrites the
map on every insert, and the second should not because it adds to the map.
neither of those seems to have any performance issues, at least not on
inserts.

and it's the slowdown on inserts that confuses me, both the stack overflow
questioners say that they saw a drop in insert performance. I never saw
that in my application, I just got slow reads (and Fabien's explanation
makes complete sense for that). I don't understand how insert performance
could be affected at all, and I know that for non-counter columns cassandra
doesn't read before it writes, but is it the same for collections too? they
are a bit special, but how special are they?

T#


On Fri, Jun 28, 2013 at 7:04 AM, aaron morton wrote:

> Can you provide details of the mutation statements you are running? The
> Stack Overflow posts don't seem to include them.
>
> Cheers
>
> -
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 27/06/2013, at 5:58 AM, Theo Hultberg  wrote:
>
> do I understand it correctly if I think that collection modifications are
> done by reading the collection, writing a range tombstone that would cover
> the collection and then re-writing the whole collection again? or is it
> just the modified parts of the collection that are covered by the range
> tombstones, but you still get massive amounts of them and it's just their
> number that is the problem?
>
> would this explain the slowdown of writes too? I guess it would if
> cassandra needed to read the collection before it wrote the new values,
> otherwise I don't understand how this affects writes, but that only says
> how much I know about how this works.
>
> T#
>
>
> On Wed, Jun 26, 2013 at 10:48 AM, Fabien Rousseau wrote:
>
>> Hi,
>>
>> I'm pretty sure that it's related to this ticket :
>> https://issues.apache.org/jira/browse/CASSANDRA-5677
>>
>> I'd be happy if someone tests this patch.
>> It should apply easily on 1.2.5 & 1.2.6
>>
>> After applying the patch, by default, the current implementation is still
>> used, but modify your cassandra.yaml to add the following one :
>> interval_tree_provider: IntervalTreeAvlProvider
>>
>> (Note that implementations should be interchangeable, because they share
>> the same serializers and deserializers)
>>
>> Also, please note that this patch has not been reviewed nor intensively
>> tested... So, it may not be "production ready"
>>
>> Fabien
>>
>>
>>
>>
>>
>>
>>
>> 2013/6/26 Theo Hultberg 
>>
>>> Hi,
>>>
>>> I've seen a couple of people on Stack Overflow having problems with
>>> performance when they have maps that they continuously update, and in
>>> hindsight I think I might have run into the same problem myself (but I
>>> didn't suspect it as the reason and designed differently and by accident
>>> didn't use maps anymore).
>>>
>>> Is there any reason that maps (or lists or sets) in particular would
>>> become a performance issue when they're heavily modified? As I've
>>> understood them they're not special, and shouldn't be any different
>>> performance wise than overwriting regular columns. Is there something
>>> different going on that I'm missing?
>>>
>>> Here are the Stack Overflow questions:
>>>
>>>
>>> http://stackoverflow.com/questions/17282837/cassandra-insert-perfomance-issue-into-a-table-with-a-map-type/17290981
>>>
>>>
>>> http://stackoverflow.com/questions/17082963/bad-performance-when-writing-log-data-to-cassandra-with-timeuuid-as-a-column-nam/17123236
>>>
>>> yours,
>>> Theo
>>>
>>
>>
>>
>> --
>> Fabien Rousseau
>> *
>> *
>>  www.yakaz.com
>>
>
>
>


Re: What is best Cassandra client?

2013-07-04 Thread Theo Hultberg
Datastax Java driver: https://github.com/datastax/java-driver

T#


On Thu, Jul 4, 2013 at 10:25 AM, Tony Anecito  wrote:

> Hi All,
> What is the best client to use? I want to use CQL 3.0.3 and have support
> for prepared statements. I tried JDBC and the thrift client so far.
>
> Thanks!
>


Re: does anyone store large values in cassandra e.g. 100kb?

2013-07-09 Thread Theo Hultberg
We store objects that are a couple of tens of K, sometimes 100K, and we
store quite a few of these per row, sometimes hundreds of thousands.

One problem we encountered early was that these rows would become so big
that C* couldn't compact the rows in-memory and had to revert to slow
two-pass compactions where it spills partially compacted rows to disk. We
solved that in two ways, first by increasing
in_memory_compaction_limit_in_mb from 64 to 128, and although it helped a
little bit we quickly realized it didn't have much effect because most of
the time was taken up by really huge rows many times larger than that.

We ended up implementing a simple sharding scheme where each row is
actually 36 rows that each contain 1/36 of the range (we take the first
letter in the column key and stick that on the row key on writes, and on
reads we read all 36 rows -- 36 because there are 36 letters and numbers in
the ascii alphabet and our column keys happen to distribute over that quite
nicely).
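
In code, the scheme is roughly this (a simplified sketch of what we do;
the names are made up):

  SHARDS = ('0'..'9').to_a + ('a'..'z').to_a # 36 sub-rows per logical row

  # writes go to the sub-row picked by the column key's first character
  def write_row_key(row_key, column_key)
    "#{row_key}:#{column_key[0].downcase}"
  end

  # reads fan out over all 36 sub-rows and merge the results
  def read_row_keys(row_key)
    SHARDS.map { |shard| "#{row_key}:#{shard}" }
  end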

Cassandra works well with semi-large objects, and it works well with wide
rows, but you have to be careful about the combination where rows get
larger than 64 MB.

T#


On Mon, Jul 8, 2013 at 8:13 PM, S Ahmed  wrote:

> Hi Peter,
>
> Can you describe your environment, # of documents and what kind of usage
> pattern you have?
>
>
>
>
> On Mon, Jul 8, 2013 at 2:06 PM, Peter Lin  wrote:
>
>> I regularly store word and pdf docs in cassandra without any issues.
>>
>>
>>
>>
>> On Mon, Jul 8, 2013 at 1:46 PM, S Ahmed  wrote:
>>
>>> I'm guessing that most people use cassandra to store relatively smaller
>>> payloads like 1-5kb in size.
>>>
>>> Is there anyone using it to store say 100kb (1/10 of a megabyte) and if
>>> so, was there any tweaking or gotchas that you ran into?
>>>
>>
>>
>


Re: does anyone store large values in cassandra e.g. 100kb?

2013-07-09 Thread Theo Hultberg
yes, by splitting the rows into 36 parts it's very rare that any part gets
big enough to impact the cluster's performance. there are still rows that
are bigger than the in-memory compaction limit, but when it's only some it
doesn't matter as much.

T#


On Tue, Jul 9, 2013 at 5:43 PM, S Ahmed  wrote:

> So was the point of breaking into 36 parts to bring each row to the 64 or
> 128mb threshold?
>
>
> On Tue, Jul 9, 2013 at 3:18 AM, Theo Hultberg  wrote:
>
>> We store objects that are a couple of tens of K, sometimes 100K, and we
>> store quite a few of these per row, sometimes hundreds of thousands.
>>
>> One problem we encountered early was that these rows would become so big
>> that C* couldn't compact the rows in-memory and had to revert to slow
>> two-pass compactions where it spills partially compacted rows to disk. we
>> solved that in two ways, first by
>> increasing in_memory_compaction_limit_in_mb from 64 to 128, and although it
>> helped a little bit we quickly realized it didn't have much effect because
>> most of the time was taken up by really huge rows many times larger than
>> that.
>>
>> We ended up implementing a simple sharding scheme where each row is
>> actually 36 rows that each contain 1/36 of the range (we take the first
>> letter in the column key and stick that on the row key on writes, and on
>> reads we read all 36 rows -- 36 because there are 36 letters and numbers in
>> the ascii alphabet and our column keys happen to distribute over that quite
>> nicely).
>>
>> Cassandra works well with semi-large objects, and it works well with wide
>> rows, but you have to be careful about the combination where rows get
>> larger than 64 MB.
>>
>> T#
>>
>>
>> On Mon, Jul 8, 2013 at 8:13 PM, S Ahmed  wrote:
>>
>>> Hi Peter,
>>>
>>> Can you describe your environment, # of documents and what kind of usage
>>> pattern you have?
>>>
>>>
>>>
>>>
>>> On Mon, Jul 8, 2013 at 2:06 PM, Peter Lin  wrote:
>>>
>>>> I regularly store word and pdf docs in cassandra without any issues.
>>>>
>>>>
>>>>
>>>>
>>>> On Mon, Jul 8, 2013 at 1:46 PM, S Ahmed  wrote:
>>>>
>>>>> I'm guessing that most people use cassandra to store relatively
>>>>> smaller payloads like 1-5kb in size.
>>>>>
>>>>> Is there anyone using it to store say 100kb (1/10 of a megabyte) and
>>>>> if so, was there any tweaking or gotchas that you ran into?
>>>>>
>>>>
>>>>
>>>
>>
>


manually removing sstable

2013-07-10 Thread Theo Hultberg
Hi,

I think I remember reading that if you have sstables that you know contain
only data whose ttl has expired, it's safe to remove them manually by
stopping c*, removing the *-Data.db files and then starting up c* again. is
this correct?

we have a cluster where everything is written with a ttl, and sometimes c*
needs to compact over 100 GB of sstables where we know everything has
expired, and we'd rather just manually get rid of those.

T#


Re: manually removing sstable

2013-07-10 Thread Theo Hultberg
thanks a lot. I can confirm that it solved our problem too.

looks like the C* 2.0 feature is perfect for us.

T#


On Wed, Jul 10, 2013 at 7:28 PM, Marcus Eriksson  wrote:

> yep that works, you need to remove all components of the sstable though,
> not just -Data.db
>
> and, in 2.0 there is this:
> https://issues.apache.org/jira/browse/CASSANDRA-5228
>
> /Marcus
>
>
> On Wed, Jul 10, 2013 at 2:09 PM, Theo Hultberg  wrote:
>
>> Hi,
>>
>> I think I remember reading that if you have sstables that you know
>> contain only data whose ttl has expired, it's safe to remove them
>> manually by stopping c*, removing the *-Data.db files and then starting up
>> c* again. is this correct?
>>
>> we have a cluster where everything is written with a ttl, and sometimes
>> c* needs to compact over 100 GB of sstables where we know everything has
>> expired, and we'd rather just manually get rid of those.
>>
>> T#
>>
>
>


Re: manually removing sstable

2013-07-11 Thread Theo Hultberg
a colleague of mine came up with an alternative solution that also seems to
work, and I'd just like your opinion on if it's sound.

we run find to list all old sstables, and then use cmdline-jmxclient to run
the forceUserDefinedCompaction function on each of them, this is roughly
what we do (but with find and xargs to orchestrate it)

  java -jar cmdline-jmxclient-0.10.3.jar - localhost:7199
org.apache.cassandra.db:type=CompactionManager
forceUserDefinedCompaction=the_keyspace,db_file_name

the downside is that c* needs to read the file and do disk io, but the
upside is that it doesn't require a restart. c* does a little more work,
but we can schedule that during off-peak hours. another upside is that it
feels like we're pretty safe from screwups, we won't accidentally remove an
sstable with live data, the worst case is that we ask c* to compact an
sstable with live data and end up with an identical sstable.

if anyone else wants to do the same thing, this is the full cron command:

0 4 * * * find /path/to/cassandra/data/the_keyspace_name -maxdepth 1 -type
f -name '*-Data.db' -mtime +8 -printf
"forceUserDefinedCompaction=the_keyspace_name,\%P\n" | xargs -t
--no-run-if-empty java -jar
/usr/local/share/java/cmdline-jmxclient-0.10.3.jar - localhost:7199
org.apache.cassandra.db:type=CompactionManager

just change the keyspace name and the path to the data directory.

T#


On Thu, Jul 11, 2013 at 7:09 AM, Theo Hultberg  wrote:

> thanks a lot. I can confirm that it solved our problem too.
>
> looks like the C* 2.0 feature is perfect for us.
>
> T#
>
>
> On Wed, Jul 10, 2013 at 7:28 PM, Marcus Eriksson wrote:
>
>> yep that works, you need to remove all components of the sstable though,
>> not just -Data.db
>>
>> and, in 2.0 there is this:
>> https://issues.apache.org/jira/browse/CASSANDRA-5228
>>
>> /Marcus
>>
>>
>> On Wed, Jul 10, 2013 at 2:09 PM, Theo Hultberg  wrote:
>>
>>> Hi,
>>>
>>> I think I remember reading that if you have sstables that you know
>>> contain only data whose ttl has expired, it's safe to remove them
>>> manually by stopping c*, removing the *-Data.db files and then starting up
>>> c* again. is this correct?
>>>
>>> we have a cluster where everything is written with a ttl, and sometimes
>>> c* needs to compact over 100 GB of sstables where we know everything has
>>> expired, and we'd rather just manually get rid of those.
>>>
>>> T#
>>>
>>
>>
>


Re: manually removing sstable

2013-07-12 Thread Theo Hultberg
thanks aaron, the second point I had not considered, and it could explain
why the sstables don't always disappear completely, sometimes a small file
(but megabytes instead of gigabytes) is left behind.

T#


On Fri, Jul 12, 2013 at 10:25 AM, aaron morton wrote:

> That sounds sane to me. Couple of caveats:
>
> * Remember that Expiring Columns turn into Tombstones and can only be
> purged after TTL and gc_grace.
> * Tombstones will only be purged if all fragments of a row are in the
> SStable(s) being compacted.
>
> Cheers
>
> -
> Aaron Morton
> Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 11/07/2013, at 10:17 PM, Theo Hultberg  wrote:
>
> a colleague of mine came up with an alternative solution that also seems
> to work, and I'd just like your opinion on if it's sound.
>
> we run find to list all old sstables, and then use cmdline-jmxclient to
> run the forceUserDefinedCompaction function on each of them, this is
> roughly what we do (but with find and xargs to orchestrate it)
>
>   java -jar cmdline-jmxclient-0.10.3.jar - localhost:7199
> org.apache.cassandra.db:type=CompactionManager 
> forceUserDefinedCompaction=the_keyspace,db_file_name
>
> the downside is that c* needs to read the file and do disk io, but the
> upside is that it doesn't require a restart. c* does a little more work,
> but we can schedule that during off-peak hours. another upside is that it
> feels like we're pretty safe from screwups, we won't accidentally remove an
> sstable with live data, the worst case is that we ask c* to compact an
> sstable with live data and end up with an identical sstable.
>
> if anyone else wants to do the same thing, this is the full cron command:
>
> 0 4 * * * find /path/to/cassandra/data/the_keyspace_name -maxdepth 1 -type
> f -name '*-Data.db' -mtime +8 -printf
> "forceUserDefinedCompaction=the_keyspace_name,\%P\n" | xargs -t
> --no-run-if-empty java -jar
> /usr/local/share/java/cmdline-jmxclient-0.10.3.jar - localhost:7199
> org.apache.cassandra.db:type=CompactionManager
>
> just change the keyspace name and the path to the data directory.
>
> T#
>
>
> On Thu, Jul 11, 2013 at 7:09 AM, Theo Hultberg  wrote:
>
>> thanks a lot. I can confirm that it solved our problem too.
>>
>> looks like the C* 2.0 feature is perfect for us.
>>
>> T#
>>
>>
>> On Wed, Jul 10, 2013 at 7:28 PM, Marcus Eriksson wrote:
>>
>>> yep that works, you need to remove all components of the sstable though,
>>> not just -Data.db
>>>
>>> and, in 2.0 there is this:
>>> https://issues.apache.org/jira/browse/CASSANDRA-5228
>>>
>>> /Marcus
>>>
>>>
>>> On Wed, Jul 10, 2013 at 2:09 PM, Theo Hultberg  wrote:
>>>
>>>> Hi,
>>>>
>>>> I think I remember reading that if you have sstables that you know
>>>> contain only data whose ttl has expired, it's safe to remove them
>>>> manually by stopping c*, removing the *-Data.db files and then starting up
>>>> c* again. is this correct?
>>>>
>>>> we have a cluster where everything is written with a ttl, and sometimes
>>>> c* needs to compact over 100 GB of sstables where we know everything has
>>>> expired, and we'd rather just manually get rid of those.
>>>>
>>>> T#
>>>>
>>>
>>>
>>
>
>


Re: Extract meta-data using cql 3

2013-07-12 Thread Theo Hultberg
there's a keyspace called system which has a few tables that contain the
metadata. for example schema_keyspaces that contain keyspace metadata, and
schema_columnfamilies that contain table metadata. there are more, just
fire up cqlsh and do a describe keyspace in the system keyspace to find
them.

T#


On Fri, Jul 12, 2013 at 10:52 AM, Murali  wrote:

> Hi experts,
>
> How to extract meta-data of a table or a keyspace using CQL 3.0?
>
> --
> Thanks,
> Murali
>
>


Re: CQL decimal encoding

2014-02-24 Thread Theo Hultberg
I don't know if it's by design or if it's by oversight that the data types
aren't part of the binary protocol specification. I had to reverse engineer
how to encode and decode all of them for the Ruby driver. There were
definitely a few bugs in the first few versions that could have been
avoided if there was a specification available.
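
For reference, a simplified sketch of decoding a DECIMAL following the
layout Paul describes below (a four-byte big-endian scale followed by a
two's-complement big-endian varint); this is not the driver's exact code:

  require 'bigdecimal'

  def decode_decimal(bytes)
    scale = bytes[0, 4].unpack1('l>') # signed 32-bit big-endian shift
    varint = bytes.byteslice(4, bytes.bytesize - 4)
    unscaled = varint.each_byte.reduce(0) { |n, b| (n << 8) | b }
    # restore the sign if the varint's high bit is set
    unscaled -= 1 << (varint.bytesize * 8) if varint.getbyte(0) >= 0x80
    BigDecimal(unscaled) / 10 ** scale
  end

  decode_decimal("\x00\x00\x00\x00\x00".b) # => 0.0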

T#


On Mon, Feb 24, 2014 at 8:43 PM, Paul "LeoNerd" Evans <
leon...@leonerd.org.uk> wrote:

> On Mon, 24 Feb 2014 19:14:48 +
> Ben Hood <0x6e6...@gmail.com> wrote:
>
> > So I have a question about the encoding of 0: \x00\x00\x00\x00\x00.
>
> The first four octets are the decimal shift (0), and the remaining ones
> (one in this case) encode a varint - 0 in this case. So it's
>
>   0 * 10**0
>
> literally zero.
>
> Technically the decimal shift matters not for zero - any four bytes
> could be given as the shift, ending in \x00, but 0 is the simplest.
>
> --
> Paul "LeoNerd" Evans
>
> leon...@leonerd.org.uk
> ICQ# 4135350   |  Registered Linux# 179460
> http://www.leonerd.org.uk/
>


How should clients handle the user defined types in 2.1?

2014-02-24 Thread Theo Hultberg
(I posted this on the client-dev list the other day, but that list seems
dead so I'm cross posting, sorry if it's the wrong thing to do)

Hi,

Is there any documentation on how CQL clients should handle the new user
defined types coming in 2.1? There's nothing in the protocol specification
on how to handle custom types as far as I can see.

For example, I tried creating the "address" type from the description of
CASSANDRA-5590, and this is how its metadata looks (the metadata for a
query contains a column with a custom type and this is the description of
it):

org.apache.cassandra.db.marshal.UserType(user_defined_types,61646472657373,737472656574:org.apache.cassandra.db.marshal.UTF8Type,63697479:org.apache.cassandra.db.marshal.UTF8Type,7a69705f636f6465:org.apache.cassandra.db.marshal.Int32Type,70686f6e6573:org.apache.cassandra.db.marshal.SetType(org.apache.cassandra.db.marshal.UTF8Type))

Is the client supposed to parse that description, and in that case how? I
could probably figure it out but it would be great if someone could point
me to the right docs.

yours,
Theo (author of cql-rb, the Ruby driver)


Re: How should clients handle the user defined types in 2.1?

2014-02-24 Thread Theo Hultberg
There hasn't been any activity (apart from my question) since December, and
only sporadic activity before that, so I think it's essentially dead.
http://www.mail-archive.com/client-dev@cassandra.apache.org/

T#


On Mon, Feb 24, 2014 at 10:34 PM, Ben Hood <0x6e6...@gmail.com> wrote:

> On Mon, Feb 24, 2014 at 7:52 PM, Theo Hultberg  wrote:
> > (I posted this on the client-dev list the other day, but that list seems
> > dead so I'm cross posting, sorry if it's the wrong thing to do)
>
> I didn't even realize there was a list for driver implementors - is
> this used at all? Is it worth being on this list?
>


Re: How should clients handle the user defined types in 2.1?

2014-02-25 Thread Theo Hultberg
thanks for the high level description of the format, I'll see if I can take
a stab at implementing support for custom types now.

and maybe I should take all of the reverse engineering I've done of the
type encoding and decoding and send a pull request for the protocol spec,
or write an appendix.
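
as a starting point, here's a rough, untested sketch of a parser for the
format Sylvain describes below:

  # split on commas that aren't nested inside parentheses, so composite
  # field types like SetType(...) stay intact
  def split_args(str)
    parts, depth, current = [], 0, ''
    str.each_char do |c|
      depth += 1 if c == '('
      depth -= 1 if c == ')'
      if c == ',' && depth.zero?
        parts << current
        current = ''
      else
        current << c
      end
    end
    parts << current
  end

  def parse_user_type(type_string)
    args = split_args(type_string[/UserType\((.*)\)\z/m, 1])
    keyspace, name_hex = args.shift(2)
    fields = args.map do |field|
      field_name_hex, field_type = field.split(':', 2)
      [[field_name_hex].pack('H*'), field_type] # hex-decode the field name
    end
    [keyspace, [name_hex].pack('H*'), fields]
  end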

T#


On Tue, Feb 25, 2014 at 12:10 PM, Sylvain Lebresne wrote:

>
>> Is there any documentation on how CQL clients should handle the new user
>> defined types coming in 2.1? There's nothing in the protocol specification
>> on how to handle custom types as far as I can see.
>>
>
> Can't say there is much documentation so far for that. As for the spec, it
> was written in a time where user defined types didn't exist and so as far
> as the protocol is concerned so far, user defined types are handled by the
> protocol as a "custom type", i.e. the full internal class is returned. And
> so ...
>
>
>>
>> For example, I tried creating the "address" type from the description of
>> CASSANDRA-5590, and this is how its metadata looks (the metadata for a
>> query contains a column with a custom type and this is the description of
>> it):
>>
>>
>> org.apache.cassandra.db.marshal.UserType(user_defined_types,61646472657373,737472656574:org.apache.cassandra.db.marshal.UTF8Type,63697479:org.apache.cassandra.db.marshal.UTF8Type,7a69705f636f6465:org.apache.cassandra.db.marshal.Int32Type,70686f6e6573:org.apache.cassandra.db.marshal.SetType(org.apache.cassandra.db.marshal.UTF8Type))
>>
>> Is the client supposed to parse that description, and in that case how?
>>
>
> ... yes, for now you're supposed to parse that description. Which is not
> really much documented outside of looking up the Cassandra code, but I can
> tell you that the first parameter of the UserType is the keyspace name the
> type has been defined in, the second is the type name hex encoded, and the
> rest is a list of fields and their types. Each field name is hex encoded
> and separated from its type by ':'. And that's about it.
>
> We will introduce much shorter definitions in the next iteration of the
> native protocol, but it's yet unclear when that will happen.
>
> --
> Sylvain
>
>
>


Re: How to paginate through all columns in a row?

2014-02-27 Thread Theo Hultberg
You can page yourself using the withColumnRange method (see the slice query
example on the page you linked to). What you do is that you save the last
column you got from the previous query, and you set that as the start of
the range you pass to withColumnRange. You don't need to set an end of a
range, but you want to set a max size.

This code is just a quick rewrite from the page you linked to and I haven't
checked that it works, but it should give you an idea of where to start:

ColumnList<String> result;
int pageSize = 100;
String offset = Character.toString('\0');
do {
  result = keyspace.prepareQuery(CF_STANDARD1)
   .getKey(rowKey)
   .withColumnRange(new RangeBuilder().setStart(offset).setMaxSize(pageSize).build())
   .execute().getResult();
  for (Column<String> col : result) {
    // do something with your column here, then save the last column's
    // name to use as the offset when loading the next page
    offset = col.getName();
  }
} while (result.size() == pageSize);

I'm using a string with a null byte as the first offset because that should
sort before all strings, but there might be a better way of doing it. If you
have non-string columns or composite columns the exact way to do this is a
bit different, but I hope this shows you the general idea.

T#



On Thu, Feb 27, 2014 at 11:36 AM, Lu, Boying  wrote:

> Hi, All,
>
>
>
> I'm using Netflix/Astyanax as a java cassandra client to access Cassandra
> DB.
>
>
>
> I need to paginate through all columns in a row and I found the document
> at https://github.com/Netflix/astyanax/wiki/Reading-Data
>
> about how to do that.
>
>
>
> But my requirement is a little different.  I don't want to paginate in
> 'one querying session',
>
> i.e. I don't want to hold the returned 'RowQuery' object to get next page.
>
>
>
> Is there any way that I can keep a 'marker' for next page, so by using the
> marker,
>
> I can tell the Cassandra DB that where to start query.
>
> e.g.  the query result has three 'pages',
>
> Can I build the query by giving a marker pointed to the 'page 2' and
> Cassandra will return the second page of the query?
>
>
>
> Thanks a lot.
>
>
>
> Boying
>
>
>


Re: Proposal: freeze Thrift starting with 2.1.0

2014-03-12 Thread Theo Hultberg
Speaking as a CQL driver maintainer (Ruby) I'm +1 for end-of-lining Thrift.

I agree with Edward that it's unfortunate that there are no official
drivers being maintained by the Cassandra maintainers -- even though the
current state with the Datastax drivers is in practice very close (it is
not the same thing though).

However, I don't agree that not having drivers in the same repo/project is
a problem. Whether or not there's a Java driver in the Cassandra source
doesn't matter at all to us non-Java developers, and I don't see any
difference between the situation where there's no driver in the source and
one where there's just a Java driver. I might have misunderstood Edward's
point about this, though.

The CQL protocol is the key, as others have mentioned. As long as that is
maintained and respected I think it's absolutely fine not having any
drivers shipped as part of Cassandra. However, I feel as if this has not
been the case lately. I'm thinking particularly about the UDT feature of
2.1, which is not a part of the CQL spec. There is no documentation on how
drivers should handle them and what a user should be able to expect from a
driver; they're completely implemented as custom types.

I hope this will be fixed before 2.1 is released (and there's been good
discussions on the mailing lists about how a driver should handle UDTs),
but it shows a problem with the the-spec-is-the-truth argument. I think
we'll be fine as long as the spec is the truth, but that requires the spec
to be the truth and new features to not be bolted on outside of the spec.
T#


On Wed, Mar 12, 2014 at 3:23 PM, Peter Lin  wrote:

> I'm enjoying the discussion also.
>
> @Brian
> I've been looking at spark/shark along with other recent developments the
> last few years. Berkeley has been doing some interesting stuff. One reason
> I like Thrift is for type safety and the benefits for query validation and
> query optimization. One could do similar things with CQL, but it's just
> more work, especially with dynamic columns. I know others are mixing static
> with dynamic columns, so I'm not alone. I have no clue how long it will
> take to get there, but having tools like query explanation is a big time
> saver. Writing business reports is hard enough, so every bit of help the
> tool can provide makes it less painful.
>
>
> On Wed, Mar 12, 2014 at 10:12 AM, Brian O'Neill wrote:
>
>>
>> just when you thought the thread died...
>>
>>
>> First, let me say we are *WAY* off topic.  But that is a good thing.
>> I love this community because there are a ton of passionate, smart
>> people. (often with differing perspectives ;)
>>
>> RE: Reporting against C* (@Peter Lin)
>> We've had the same experience.  Pig + Hadoop is painful.  We are
>> experimenting with Spark/Shark, operating directly against the data.
>> http://brianoneill.blogspot.com/2014/03/spark-on-cassandra-w-calliope.html
>>
>> The Shark layer gives you SQL and caching capabilities that make it easy
>> to use and fast (for smaller data sets).  In front of this, we are going to
>> add dimensional aggregations so we can operate at larger scales.  (then the
>> Hive reports will run against the aggregations)
>>
>> RE: REST Server (@Russel Bradbury)
>> We had moderate success with Virgil, a REST server built directly on top
>> of Thrift, so one day it could be easily embedded in the C* server
>> itself.   It could be deployed
>> separately, or run an embedded C*.  More often than not, we ended up
>> running it separately to separate the layers.  (just like Titan and
>> Rexster)  I've started on a rewrite of Virgil called Memnon that rides on
>> top of CQL. (I'd love some help)
>> https://github.com/boneill42/memnon
>>
>> RE: CQL vs. Thrift
>> We've hitched our wagons to CQL.  CQL != Relational.
>> We've had success translating our "native" schemas into CQL, including
>> all the NoSQL goodness of wide-rows, etc.  You just need a good
>> understanding of how things translate into storage and underlying CFs.  If
>> anything, I think we could add some DESCRIBE information, which would help
>> users with this, along the lines of:
>> (https://issues.apache.org/jira/browse/CASSANDRA-6676)
>>
>> CQL does open up the *opportunity* for users to articulate more complex
>> queries using more familiar syntax.  (including future things such as
>> joins, grouping, etc.)   To me, that is exciting, and again -- one of the
>> reasons we are leaning on it.
>>
>> my two cents,
>> brian
>>
>> ---
>>
>> Brian O'Neill
>>
>> Chief Technology Officer
>>
>>
>> *Health Market Science*
>>
>> *The Science of Better Results*
>>
>> 2700 Horizon Drive * King of Prussia, PA * 19406
>>
>> M: 215.588.6024 * @boneill42   *
>>
>> healthmarketscience.com
>>
>>
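To make Brian's wide-row point concrete, here is a minimal sketch of the
kind of translation he describes, written against the DataStax Java
driver. The sensor_readings table, its columns, and the sensorId variable
are hypothetical, invented for illustration, and the execute(query,
values) shorthand assumes a 2.0-series driver talking to Cassandra 2.0.

// Minimal sketch of a "wide row" modelled in CQL: one partition per
// sensor, one CQL row per reading. The clustering column (reading_time)
// becomes part of the cell name in the single underlying storage row.
session.execute(
    "CREATE TABLE sensor_readings (" +
    "  sensor_id uuid," +
    "  reading_time timeuuid," +
    "  value double," +
    "  PRIMARY KEY (sensor_id, reading_time)" +
    ")");

// All readings for one sensor live in the same storage row, so this is
// a contiguous slice read, much like a Thrift get_slice over that row.
session.execute(
    "SELECT reading_time, value FROM sensor_readings WHERE sensor_id = ?",
    sensorId);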

Re: Production Quality Ruby Driver?

2014-03-19 Thread Theo Hultberg
I'm the author of cql-rb, the first one on your list. It runs in production
in systems doing tens of thousands of operations per second. cequel is an
ORM and its latest version runs on top of cql-rb.

If you decide on using cql-rb I'm happy to help you out with any problems
you might have, just open an issue on the GitHub project page.

yours
Theo


On Mon, Mar 17, 2014 at 6:55 PM, NORD SC  wrote:

> Hi,
>
> I am looking for a Ruby driver that is production ready and truly supports
> CQL 3. Can anyone strongly recommend one in particular?
>
>
> I found
>
> - https://github.com/iconara/cql-rb
> - https://github.com/kreynolds/cassandra-cql
> - https://github.com/cequel/cequel
>
>
> Jan
>
>


Re: Production Quality Ruby Driver?

2014-03-19 Thread Theo Hultberg
And cql-rb is full-featured when it comes to CQL 3: it supports all
features of Cassandra 1.2. Some of the Cassandra 2.0 features will have to
wait for a final 2.0 release of the driver, but the current prerelease is
stable and well tested.

yours
Theo


On Wed, Mar 19, 2014 at 5:21 PM, Theo Hultberg  wrote:

> I'm the author of cql-rb, the first one on your list. It runs in
> production in systems doing tens of thousands of operations per second.
> cequel is an ORM and its latest version runs on top of cql-rb.
>
> If you decide on using cql-rb I'm happy to help you out with any problems
> you might have, just open an issue on the GitHub project page.
>
> yours
> Theo
>
>
> On Mon, Mar 17, 2014 at 6:55 PM, NORD SC wrote:
>
>> Hi,
>>
>> I am looking for a Ruby driver that is production ready and truly
>> supports CQL 3. Can anyone strongly recommend one in particular?
>>
>>
>> I found
>>
>> - https://github.com/iconara/cql-rb
>> - https://github.com/kreynolds/cassandra-cql
>> - https://github.com/cequel/cequel
>>
>>
>> Jan
>>
>>
>


Re: Meaning of "token" column in system.peers and system.local

2014-03-31 Thread Theo Hultberg
your assumption about 256 tokens per node is correct.

as for your second question, it seems to me like most of your assumptions
are correct, but I'm not sure I understand them correctly. hopefully
someone else can answer this better. tokens are a property of the cluster,
not the keyspace. the first replica of any token will be the same for all
keyspaces, but with different replication factors the other replicas will
differ.

when you query the system.local and system.peers tables you must make sure
that you don't connect to other nodes. I think the apparent inconsistency
you found is because the first and second queries went to different nodes:
the java driver connects to all nodes and load balances requests by
default.

T#
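
As a footnote to the advice above about not connecting to other nodes:
here is a minimal sketch of one way to pin the queries with the Java
driver. It assumes a driver version that ships WhiteListPolicy (2.0.2 or
later) and uses the Vagrant node address from this thread.

import java.net.InetSocketAddress;
import java.util.Collections;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.policies.RoundRobinPolicy;
import com.datastax.driver.core.policies.WhiteListPolicy;

public class PinnedMetadataQueries {
    public static void main(String[] args) {
        // Whitelist a single node so the driver cannot load balance the
        // system-table queries to a different coordinator.
        Cluster cluster = Cluster.builder()
            .addContactPoint("192.168.200.11")
            .withLoadBalancingPolicy(new WhiteListPolicy(
                new RoundRobinPolicy(),
                Collections.singletonList(
                    new InetSocketAddress("192.168.200.11", 9042))))
            .build();
        Session session = cluster.connect();

        // Both result sets now describe the ring as seen from one node.
        System.out.println(
            session.execute("SELECT tokens FROM system.local").one());
        for (Row row : session.execute("SELECT peer, tokens FROM system.peers")) {
            System.out.println(row);
        }

        cluster.close();
    }
}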


On Mon, Mar 31, 2014 at 4:06 AM, Clint Kelly  wrote:

> BTW one other thing that I have not been able to debug today that maybe
> someone can help me with:
>
> I am using a three-node Cassandra cluster with Vagrant.  The nodes in my
> cluster are 192.168.200.11, 192.168.200.12, and 192.168.200.13.
>
> If I use cqlsh to connect to 192.168.200.11, I see unique sets of tokens
> when I run the following three commands:
>
> select tokens from system.local
> select tokens from system.peers where peer=192.168.200.12
> select tokens from system.peers where peer=192.168.200.13
>
> This is what I expect.  However, when I tried making an application with
> the Java driver that does the following:
>
>
>- Create a Session by connecting to 192.168.200.11
>- From that session, "select tokens from system.local"
>- From that session, "select tokens, peer from system.peers"
>
> Now I get the exact same set of tokens from system.local and from the row
> in system.peers in which peer=192.168.200.13.
>
> Anyone have any idea why this would happen?  I'm not sure how to debug
> this.  I see the following log from the Java driver:
>
> 14/03/30 19:05:24 DEBUG com.datastax.driver.core.Cluster: Starting new
> cluster with contact points [/192.168.200.11]
> 14/03/30 19:05:24 INFO com.datastax.driver.core.Cluster: New Cassandra
> host /192.168.200.13 added
> 14/03/30 19:05:24 INFO com.datastax.driver.core.Cluster: New Cassandra
> host /192.168.200.12 added
>
> I'm running Cassandra 2.0.6 in the virtual machine and I built my
> application with version 2.0.1 of the driver.
>
> Best regards,
> Clint
>
>
>
>
>
>
>
> On Sun, Mar 30, 2014 at 4:51 PM, Clint Kelly wrote:
>
>> Hi all,
>>
>>
>> I am working on a Hadoop InputFormat implementation that uses only the
>> native protocol Java driver and not the Thrift API.  I am currently trying
>> to replicate some of the behavior of
>> *Cassandra.client.describe_ring(myKeyspace)* from the Thrift API.  I
>> would like to do the following:
>>
>>- Get a list of all of the token ranges for a cluster
>>- For every token range, determine the replica nodes on which the
>>data in the token range resides
>>- Estimate the number of rows for every range of tokens
>>- Group ranges of tokens on common replica nodes such that we can
>>create a set of input splits for Hadoop with total estimated row counts
>>that are reasonably close to the requested split size
>>
>> Last week I received some much-appreciated help on this list that pointed
>> me to using the system.peers table to get the list of token ranges for the
>> cluster and the corresponding hosts.  Today I created a three-node C*
>> cluster in Vagrant (https://github.com/dholbrook/vagrant-cassandra) and
>> tried inspecting some of the system tables.  I have a couple of questions
>> now:
>>
>> 1. *How many total unique tokens should I expect to see in my cluster?*
>> If I have three nodes, and each node has a cassandra.yaml with num_tokens =
>> 256, then should I expect a total of 256*3 = 768 distinct vnodes?
>>
>> 2. *How does the creation of vnodes and their assignment to nodes relate
>> to the replication factor for a given keyspace?*  I never thought about
>> this until today, and I tried to reread the documentation on virtual nodes,
>> replication in Cassandra, etc., and now I am sadly still confused.  Here is
>> what I think I understand.  :)
>>
>>- Given a row with a partition key, any client request for an
>>operation on that row will go to a coordinator node in the cluster.
>>- The coordinator node will compute the token value for the row and
>>from that determine a set of replica nodes for that token.
>>   - One of the replica nodes I assume is the node that "owns" the
>>   vnode with the token range that encompasses the token
>>   - The identity of the "owner" of this virtual node is a
>>   cross-keyspace property
>>   - And the other replicas were originally chosen based on the
>>   replica-placement strategy
>>   - And therefore the other replicas will be different for each
>>   keyspace (because replication factors and replica-placement strategy 
>> are
>>   properties of a keyspace)
>>
>> 3. What do the values in the "
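
The first two bullets in Clint's list boil down to building a sorted token
ring from the system tables. Here is a minimal sketch under the same
assumptions as above (queries pinned to one node, Murmur3Partitioner whose
tokens are signed 64-bit longs stored as text); the localAddress parameter
stands in for however you resolve the address of the node you are
connected to.

import java.net.InetAddress;
import java.util.TreeMap;

import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

class TokenRing {
    // Map each token to the node that owns the vnode ending at that token.
    static TreeMap<Long, InetAddress> buildRing(Session session,
                                                InetAddress localAddress) {
        TreeMap<Long, InetAddress> ring = new TreeMap<Long, InetAddress>();

        Row local = session.execute("SELECT tokens FROM system.local").one();
        for (String token : local.getSet("tokens", String.class)) {
            ring.put(Long.parseLong(token), localAddress);
        }
        for (Row peer : session.execute("SELECT peer, tokens FROM system.peers")) {
            for (String token : peer.getSet("tokens", String.class)) {
                ring.put(Long.parseLong(token), peer.getInet("peer"));
            }
        }
        // Each vnode's range is (previous token, token], and the smallest
        // token wraps around to the largest, closing the ring. With
        // num_tokens = 256 on three nodes this yields 256 * 3 = 768 entries.
        return ring;
    }
}

Grouping consecutive ranges that share a first replica then gives natural
candidates for Hadoop input splits.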

Re: Timeuuid inserted with now(), how to get the value back in Java client?

2014-04-01 Thread Theo Hultberg
no, there's no way. you should generate the TIMEUUID on the client side so
that you have it.

T#
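
A minimal sketch of the client-side generation Theo suggests, using the
UUIDs utility that ships with the DataStax Java driver; the events table
and the open session variable are hypothetical, and the execute(query,
values) shorthand assumes a 2.0-series driver.

import java.util.UUID;

import com.datastax.driver.core.utils.UUIDs;

// Generate the TIMEUUID on the client instead of calling now() in the
// INSERT: the key is then known before the write, so no read-back is
// needed afterwards.
UUID id = UUIDs.timeBased();
session.execute("INSERT INTO events (id, body) VALUES (?, ?)", id, "hello");
// id is the primary key that was just written; keep it for later lookups.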


On Sat, Mar 29, 2014 at 1:01 AM, Andy Atj2  wrote:

> I'm writing a Java client to a Cassandra db.
>
> One of the main primary keys is a timeuuid.
>
> I plan to do INSERTs using now() and have Cassandra generate the value of
> the timeuuid.
>
> After the INSERT, I need the Cassandra-generated timeuuid value. Is there
> an easy way to get it, without having to re-query for the record I just
> inserted, hoping to get only one record back? Remember, I don't have the PK.
>
> E.g., in every other db there's a way to get the generated PK back. In SQL
> it's @@identity, in Oracle it's... etc. etc.
>
> I know Cassandra is not an RDBMS. All I want is the value Cassandra just
> generated.
>
> Thanks,
> Andy
>
>