Re: Sorting & pagination in apache cassandra 2.1

2016-01-12 Thread Jonathan Haddad
The clustering keys determine the sorting of rows within a partition.  The
partitions within an SSTable file are sorted by their token (usually computed
by applying the Murmur3 hash to the partition key).

If you are using a version of Cassandra < 3.0, you'll need to maintain your
own materialized view tables.
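For example, a minimal sketch of the manual approach (hypothetical schema;
the point is that the application writes to both tables, ideally in a logged
batch so they stay in sync):

create table users_by_id (
    user_id int primary key,
    email text,
    name text
);

-- hand-maintained "view" of the same data, keyed for a different query
create table users_by_email (
    email text primary key,
    user_id int,
    name text
);

begin batch
    insert into users_by_id (user_id, email, name) values (1, 'a@b.c', 'Ann');
    insert into users_by_email (email, user_id, name) values ('a@b.c', 1, 'Ann');
apply batch;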

On Tue, Jan 12, 2016 at 10:07 PM anuja jain  wrote:

> I understand the meaning of SSTable, but what's the reason behind sorting
> the table on the basis of int columns first?
> Is there any data type preference in Cassandra?
> Also, what is the alternative to creating materialised views if my
> Cassandra version is prior to 3.0 (specifically 2.1) and is already
> in production?
>
>
> On Wed, Jan 13, 2016 at 12:17 AM, Robert Coli 
> wrote:
>
>> On Mon, Jan 11, 2016 at 11:30 PM, anuja jain 
>> wrote:
>>
>>> 1 more question, what does it mean by "cassandra inherently sorts data"?
>>>
>>
>> SSTable = Sorted Strings Table.
>>
>> It doesn't contain "Strings" anymore, really, but that's a hint.. :)
>>
>> =Rob
>>
>
>


Re: Sorting & pagination in apache cassandra 2.1

2016-01-12 Thread anuja jain
I understand the meaning of SSTable, but what's the reason behind sorting the
table on the basis of int columns first?
Is there any data type preference in Cassandra?
Also, what is the alternative to creating materialised views if my Cassandra
version is prior to 3.0 (specifically 2.1) and is already in
production?


On Wed, Jan 13, 2016 at 12:17 AM, Robert Coli  wrote:

> On Mon, Jan 11, 2016 at 11:30 PM, anuja jain  wrote:
>
>> 1 more question, what does it mean by "cassandra inherently sorts data"?
>>
>
> SSTable = Sorted Strings Table.
>
> It doesn't contain "Strings" anymore, really, but that's a hint.. :)
>
> =Rob
>


Re: Cassandra is consuming a lot of disk space

2016-01-12 Thread Kevin O'Connor
Have you tried restarting? It's possible there are open file handles to
sstables that have been compacted away. You can verify by running lsof and
grepping for DEL or deleted.

If it's not that, you can run nodetool cleanup on each node to scan all of
the sstables on disk and remove anything the node is not responsible for.
Generally this only helps if you added nodes recently.
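For example (a sketch - the pgrep pattern is an assumption, adjust it to
however your install runs the daemon):

lsof -p $(pgrep -f CassandraDaemon) | grep -i deleted   # held-open sstables
nodetool cleanup   # run on each node, one node at a time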

On Tuesday, January 12, 2016, Rahul Ramesh  wrote:

> We have a 2 node Cassandra cluster with a replication factor of 2.
>
> The load factor on the nodes is around 350 GB.
>
> Datacenter: Cassandra
> =====================
> Address      Rack   Status  State   Load      Owns     Token
>                                                        -5072018636360415943
> 172.31.7.91  rack1  Up      Normal  328.5 GB  100.00%  -7068746880841807701
> 172.31.7.92  rack1  Up      Normal  351.7 GB  100.00%  -5072018636360415943
>
> However, if I use df -h:
>
> /dev/xvdf   252G  223G   17G  94% /HDD1
> /dev/xvdg   493G  456G   12G  98% /HDD2
> /dev/xvdh   197G  167G   21G  90% /HDD3
>
>
> HDD1, 2 and 3 contain only Cassandra data. It amounts to close to 1 TB on
> one of the machines, and on the other machine it is close to 650 GB.
>
> I started repair 2 days ago; after running repair, the amount of disk
> space consumed has actually increased.
> I also checked whether this is because of snapshots. nodetool listsnapshots
> intermittently lists a snapshot, but it goes away after some time.
>
> Can somebody please help me understand:
> 1. Why is so much disk space consumed?
> 2. Why did it increase after repair?
> 3. Is there any way to recover from this state?
>
>
> Thanks,
> Rahul
>
>


Cassandra is consuming a lot of disk space

2016-01-12 Thread Rahul Ramesh
We have a 2 node Cassandra cluster with a replication factor of 2.

The load factor on the nodes is around 350 GB.

Datacenter: Cassandra
=====================
Address      Rack   Status  State   Load      Owns     Token
                                                       -5072018636360415943
172.31.7.91  rack1  Up      Normal  328.5 GB  100.00%  -7068746880841807701
172.31.7.92  rack1  Up      Normal  351.7 GB  100.00%  -5072018636360415943

However, if I use df -h:

/dev/xvdf   252G  223G   17G  94% /HDD1
/dev/xvdg   493G  456G   12G  98% /HDD2
/dev/xvdh   197G  167G   21G  90% /HDD3


HDD1, 2 and 3 contain only Cassandra data. It amounts to close to 1 TB on one
of the machines, and on the other machine it is close to 650 GB.

I started repair 2 days ago; after running repair, the amount of disk space
consumed has actually increased.
I also checked whether this is because of snapshots. nodetool listsnapshots
intermittently lists a snapshot, but it goes away after some time.

Can somebody please help me understand:
1. Why is so much disk space consumed?
2. Why did it increase after repair?
3. Is there any way to recover from this state?


Thanks,
Rahul


Re: Repair with "-pr" and vnodes

2016-01-12 Thread Robert Coli
On Tue, Jan 12, 2016 at 3:46 PM, Roman Tkachenko 
wrote:

> The documentation for the "-pr" repair option says it repairs only the
> first range returned by the partitioner. However, with vnodes a node owns a
> lot of small ranges.
>
> Does that mean that if I run rolling "nodetool repair -pr" on the cluster,
> a whole bunch of ranges remain un-repaired? Am I missing/misunderstanding
> something?
>

It almost certainly means the first range returned by the partitioner per
node (or per vnode).

tl;dr - "I'm repairing all my nodes" is the case that "repair -pr" is for.
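In other words, a rolling repair like this (sketch, hypothetical hostnames)
covers the full token space, because every range is some node's primary
range:

for host in cass1 cass2 cass3; do
    ssh "$host" nodetool repair -pr
done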

=Rob


Repair with "-pr" and vnodes

2016-01-12 Thread Roman Tkachenko
Hey guys,

The documentation for the "-pr" repair option says it repairs only the
first range returned by the partitioner. However, with vnodes a node owns a
lot of small ranges.

Does that mean that if I run rolling "nodetool repair -pr" on the cluster,
a whole bunch of ranges remain un-repaired? Am I missing/misunderstanding
something?

Thanks!

Roman


Re: Cassandra 1.2.19 and Java 8

2016-01-12 Thread Michael Shuler
On 01/12/2016 04:41 PM, Robert Coli wrote:
> On Tue, Jan 12, 2016 at 2:31 PM, Tim Heckman wrote:
> 
> We still have an installation of Cassandra on the 1.2.19 release,
> running on Java 7. We do plan on upgrading to a newer version, but in
> the mean time there has been some questions internally about running
> 1.2 on Java 8 until the upgrade can be fully completed.

cassandra-1.2 fails to *build* properly for me on java8 in
gen-cli-grammar with an antlr NPE, so we build on java6 or 7.

> I upgraded to 1.8 to avoid potential leap second problems, so in June of 2015.
> 
> java version "1.8.0_60"
> Java(TM) SE Runtime Environment (build 1.8.0_60-b27)
> Java HotSpot(TM) 64-Bit Server VM (build 25.60-b23, mixed mode) 
> ii  cassandra   1.2.16  
>distributed storage system for structured data
> 
> Works fine since, no discernible difference.

I did a quick build of cassandra-1.2 HEAD (1.2.19 tag) on java7, started it
up with a java8 runtime, and the basics all seem to work just fine for me.

-- 
Kind regards,
Michael


Re: Cassandra 1.2.19 and Java 8

2016-01-12 Thread Robert Coli
On Tue, Jan 12, 2016 at 2:31 PM, Tim Heckman  wrote:

> We still have an installation of Cassandra on the 1.2.19 release,
> running on Java 7. We do plan on upgrading to a newer version, but in
> the mean time there has been some questions internally about running
> 1.2 on Java 8 until the upgrade can be fully completed.
>

I upgraded to 1.8 to avoid potential leap second problems, so in June of 2015.

java version "1.8.0_60"
Java(TM) SE Runtime Environment (build 1.8.0_60-b27)
Java HotSpot(TM) 64-Bit Server VM (build 25.60-b23, mixed mode)
ii  cassandra   1.2.16
 distributed storage system for structured data

Works fine since, no discernible difference.

=Rob


Cassandra 1.2.19 and Java 8

2016-01-12 Thread Tim Heckman
Hello,

We still have an installation of Cassandra on the 1.2.19 release,
running on Java 7. We do plan on upgrading to a newer version, but in
the mean time there has been some questions internally about running
1.2 on Java 8 until the upgrade can be fully completed.

I seem to remember speaking to someone a while back who advised
against running the 1.2 + Java 8 combination. Unfortunately, I can't
remember the exact reasoning behind the recommendation. It
could have just been that no one was really doing it, and therefore it
wasn't fully tested.

Does anyone here have experience with Cassandra 1.2 and Java 8 in
production? Any known issues or gotchas?

Cheers!
-Tim

--
Tim Heckman
Operations Engineer
PagerDuty, Inc.


Re: [Typo correction] Is it good for performance to put rows that are of different types but are always queried together in the same table partition?

2016-01-12 Thread Carlos Alonso
Why can't you have something like this?

CREATE TABLE t (
  p INT,
  q1 INT,
  q2 UUID,
  c1 INT,
  c2 TEXT,
  PRIMARY KEY (p, q1, q2)
)

Sounds like the simplest solution.

Carlos Alonso | Software Engineer | @calonso 

On 12 January 2016 at 18:27, Bamoqi  wrote:

> I over-simplified the original example. In the real model I cannot just
> merge the row types. Suppose
> create table t1(
> p int,
> q1 int,
> c1 int,
> primary key( p, q1 )
> )
> create table t2(
> p int,
> q2 uuid,
> c2 text,
> primary key( p, q2 )
> )
>
> Merging the tables will be slightly ugly and waste some storage in the
> clustering columns:
> create table t(
> p int,
> rowtype tinyint, // t1 or t2
> q1 int, q2 uuid, // depending on rowtype, either q1 or q2 is unused
> c1 int, c2 text, // depending on rowtype, either c1 or c2 is null
> primary key( p, rowtype, q1, q2)
> )
>
> Nevertheless, putting them into one table seems faster as we only need one
> query to get both types, and have better cache locality. Am I correct?
>
>
> On Saturday, January 09, 2016 06:47 AM, Jack Krupansky wrote:
>
> A simple denormalization is probably all that is called for - just merge
> the two tables into one (their union.) No need for this row type.
>
>
> -- Jack Krupansky
>
> On Fri, Jan 8, 2016 at 9:30 AM, Jeff Jirsa 
> wrote:
>
>> You’ll see better performance using a slice (which is effectively what
>> will happen if you put them into the same table and use query-1table-b), as
>> each node will only need to merge cells/results once. It may not be twice
>> as fast, but it’ll be fast enough to make it worthwhile.
>>
>>
>>
>> On 1/8/16, 12:13 AM, "Bamoqi" <bam...@gmail.com> wrote:
>>
>> >[Correction of the original message which contains typos in code.]
>> >
>> >Is it good for performance to put rows that are of different types but
>> >are always queried together in the same table partition?
>> >
>> >My consideration is that whether doing so will result in better
>> >memory/disk cache locality.
>> >
>> >Suppose I need to query for 2 different types of rows for a frequent
>> >user request, I can use 2 tables or 1 table:
>> >
>> >2 tables:
>> >
>> >   create table t1(
>> > partitionkey int primary key,
>> > col1 int, col2 int, ...
>> >   )
>> >   create table t2(
>> > partitionkey int primary key,
>> > col3 int, col4 int, ...
>> >   )
>> >
>> >query-2table:
>> >   select col1,col2 from t1 where partitionkey = ?
>> >   select col3,col4 from t2 where partitionkey = ?
>> >
>> >1 table:
>> >
>> >   create table t(
>> > partitionkey int,
>> > rowtype tinyint,
>> > col1 int, col2 int, ...
>> > col3 int, col4 int, ...
>> > primary key( partitionkey, rowtype )
>> >   )
>> >
>> >query-1table-a:
>> >   select col1,col2 from t where partitionkey = ? and rowtype = 1
>> >   select col3,col4 from t where partitionkey = ? and rowtype = 2
>> >
>> >or alternatively, query-1table-b:
>> >   select rowtype,col1,col2,col3,col4 from t where partitionkey = ?
>> >   // Unused columns are `null`. Switch on `rowtype` in the app code
>> >
>> >Is there significant performance difference in query-2table,
>> >query-1table-a, query-1table-b?
>> >Is the cassandra client/coordinator smart enough to direct subsequent
>> >queries of the same (table, partitionkey) to the same node so they can
>> >reuse a cached page?
>> >
>> >Regards & Thanks
>>
>
>
>


Re: Sorting & pagination in apache cassandra 2.1

2016-01-12 Thread Robert Coli
On Mon, Jan 11, 2016 at 11:30 PM, anuja jain  wrote:

> 1 more question, what does it mean by "cassandra inherently sorts data"?
>

SSTable = Sorted Strings Table.

It doesn't contain "Strings" anymore, really, but that's a hint.. :)

=Rob


Re: Too many compactions, maybe keyspace system?

2016-01-12 Thread Robert Coli
On Mon, Jan 11, 2016 at 9:12 PM, Shuo Chen  wrote:

> I have an assumption that lots of pending compaction tasks jam the memory
> and trigger full GC. The full GC chokes the process and slows down compaction,
> and this causes more pending compaction tasks and more pressure on memory.
>

The question is why there are so many pending compactions, because your log
doesn't show that much compaction is happening. What keyspaces /
columnfamilies do you expect to be compacting, and how many SSTables do
they contain?
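(nodetool compactionstats will at least show the pending-task count and any
compactions currently running, and nodetool cfstats shows per-table SSTable
counts - a sketch, where "mykeyspace" is a placeholder:

nodetool compactionstats
nodetool cfstats mykeyspace

Neither enumerates the queued tasks themselves, though.)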


> Is there a method to list the concrete details of pending compaction tasks?
>

Nope.

For the record, this type of extended operational debugging is often best
carried out interactively on #cassandra on freenode IRC.. :)

=Rob


Re: [Typo correction] Is it good for performance to put rows that are of different types but are always queried together in the same table partition?

2016-01-12 Thread Bamoqi
I over-simplified the original example. In the real model I cannot just 
merge the row types. Suppose

create table t1(
p int,
q1 int,
c1 int,
primary key( p, q1 )
)
create table t2(
p int,
q2 uuid,
c2 text,
primary key( p, q2 )
)

Merging the tables will be slightly ugly and waste some storage in the 
clustering columns:

create table t(
p int,
rowtype tinyint, // t1 or t2
q1 int, q2 uuid, // depending on rowtype, either q1 or q2 is unused
c1 int, c2 text, // depending on rowtype, either c1 or c2 is null
primary key( p, rowtype, q1, q2)
)

Nevertheless, putting them into one table seems faster as we only need 
one query to get both types, and have better cache locality. Am I correct?



On Saturday, January 09, 2016 06:47 AM, Jack Krupansky wrote:
A simple denormalization is probably all that is called for - just 
merge the two tables into one (their union.) No need for this row type.



-- Jack Krupansky

On Fri, Jan 8, 2016 at 9:30 AM, Jeff Jirsa wrote:


You’ll see better performance using a slice (which is effectively
what will happen if you put them into the same table and use
query-1table-b), as each node will only need to merge
cells/results once. It may not be twice as fast, but it’ll be fast
enough to make it worthwhile.



On 1/8/16, 12:13 AM, "Bamoqi" <bam...@gmail.com> wrote:

>[Correction of the original message which contains typos in code.]
>
>Is it good for performance to put rows that are of different types but
>are always queried together in the same table partition?
>
>My consideration is that whether doing so will result in better
>memory/disk cache locality.
>
>Suppose I need to query for 2 different types of rows for a frequent
>user request, I can use 2 tables or 1 table:
>
>2 tables:
>
>   create table t1(
> partitionkey int primary key,
> col1 int, col2 int, ...
>   )
>   create table t2(
> partitionkey int primary key,
> col3 int, col4 int, ...
>   )
>
>query-2table:
>   select col1,col2 from t1 where partitionkey = ?
>   select col3,col4 from t2 where partitionkey = ?
>
>1 table:
>
>   create table t(
> partitionkey int,
> rowtype tinyint,
> col1 int, col2 int, ...
> col3 int, col4 int, ...
> primary key( partitionkey, rowtype )
>   )
>
>query-1table-a:
>   select col1,col2 from t where partitionkey = ? and rowtype = 1
>   select col3,col4 from t where partitionkey = ? and rowtype = 2
>
>or alternatively, query-1table-b:
>   select rowtype,col1,col2,col3,col4 from t where partitionkey = ?
>   // Unused columns are `null`. Switch on `rowtype` in the app code
>
>Is there significant performance difference in query-2table,
>query-1table-a, query-1table-b?
>Is the cassandra client/coordinator smart enough to direct subsequent
>queries of the same (table, partitionkey) to the same node so they can
>reuse a cached page?
>
>Regards & Thanks






Re: Slow performance after upgrading from 2.0.9 to 2.1.11

2016-01-12 Thread Peddi, Praveen
Thanks Jeff for your reply. Sorry for the delayed response. We were running
some more tests and wanted to wait for the results.

So basically we saw that CPU with 2.1.11 was higher compared to 2.0.9 (see
below) for the exact same load test. Memory spikes were also more aggressive
on 2.1.11.

We wanted to rule out any of our custom settings, so we ended up doing some
testing with the Cassandra stress test on default Cassandra installations.
Here are the results we saw between 2.0.9 and 2.1.11. Both are default
installations and both use the Cassandra stress test with the same params,
so this is the closest apples-to-apples comparison we can get. As you can
see, both read and write latencies are 30 to 50% worse in 2.1.11 than in
2.0.9.

Highlights of the test:
Load: 2x reads and 1x writes
CPU: 2.0.9 goes up to 25%, compared to 2.1.11 which goes up to 60%
Local read latency: 0.039 ms for 2.0.9 and 0.066 ms for 2.1.11
Local write latency: 0.033 ms for 2.0.9 vs 0.030 ms for 2.1.11
One observation: as the number of threads is increased, 2.1.11 read
latencies get worse compared to 2.0.9 (see the table below for 24 threads
vs 54 threads).
Not sure if anyone has done this kind of comparison before and what their
thoughts are. I am thinking for this same reason

                 type   total ops  op/s   pk/s   row/s  mean  med  0.95  0.99  0.999  max    time
2.0.9 Plain
 16 threadCount  READ   66854      7205   7205   7205   1.6   1.3  2.8   3.5   9.6    85.3   9.3
 16 threadCount  WRITE  33146      3572   3572   3572   1.3   1    2.6   3.3   7      206.5  9.3
 16 threadCount  total  100000     10777  10777  10777  1.5   1.3  2.7   3.4   7.9    206.5  9.3
2.1.11 Plain
 16 threadCount  READ   67096      6818   6818   6818   1.6   1.5  2.6   3.5   7.9    61.7   9.8
 16 threadCount  WRITE  32904      3344   3344   3344   1.4   1.3  2.3   3     6.5    56.7   9.8
 16 threadCount  total  100000     10162  10162  10162  1.6   1.4  2.5   3.2   6      61.7   9.8
2.0.9 Plain
 24 threadCount  READ   66414      8167   8167   8167   2     1.6  3.7   7.5   16.7   208    8.1
 24 threadCount  WRITE  33586      4130   4130   4130   1.7   1.3  3.4   5.4   25.6   45.4   8.1
 24 threadCount  total  100000     12297  12297  12297  1.9   1.5  3.5   6.2   15.2   208    8.1
2.1.11 Plain
 24 threadCount  READ   66628      7433   7433   7433   2.2   2.1  3.4   4.3   8.4    38.3   9
 24 threadCount  WRITE  33372      3723   3723   3723   2     1.9  3.1   3.8   21.9   37.2   9
 24 threadCount  total  100000     11155  11155  11155  2.1   2    3.3   4.1   8.8    38.3   9
2.0.9 Plain
 54 threadCount  READ   67115      13419  13419  13419  2.8   2.6  4.2   6.4   36.9   82.4   5
 54 threadCount  WRITE  32885      6575   6575   6575   2.5   2.3  3.9   5.6   15.9   81.5   5
 54 threadCount  total  100000     19993  19993  19993  2.7   2.5  4.1   5.7   13.9   82.4   5
2.1.11 Plain
 54 threadCount  READ   66780      8951   8951   8951   4.3   3.9  6.8   9.7   49.4   69.9   7.5
 54 threadCount  WRITE  33220      4453   4453   4453   3.5   3.2  5.7   8.2   36.8   68     7.5
 54 threadCount  total  100000     13404  13404  13404  4     3.7  6.6   9.2   48     69.9   7.5


From: Jeff Jirsa <jeff.ji...@crowdstrike.com>
Date: Thursday, January 7, 2016 at 1:01 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>, Peddi Praveen
<pe...@amazon.com>
Subject: Re: Slow performance after upgrading from 2.0.9 to 2.1.11

Anecdotal evidence typically agrees that 2.1 is faster than 2.0 (our experience 
was anywhere from 20-60%, depending on workload).

However, it’s not necessarily true that everything behaves exactly the same – 
in particular, memtables are different, commitlog segment handling is 
different, and GC params may need to be tuned differently for 2.1 than 2.0.

When the system is busy, what’s it actually DOING? Cassandra exposes a TON of 
metrics – have you plugged any into a reporting system to see what’s going on? 
Is your latency due to pegged cpu, iowait/disk queues or gc pauses?

My colleagues spent a lot of time validating different AWS EBS configs (video 
from reinvent at https://www.youtube.com/watch?v=1R-mgOcOSd4), 2.1 was faster 
in almost every case, but you’re using an instance size I don’t believe we 
tried (too little RAM to be viable in production).  c3.2xl only gives you 15G 
of ram – most “performance” based systems want 2-4x that (people running G1 
heaps usually start at 16G heaps and leave another 16-30G for page cache), 
you’re running fairly small hardware – it’s possible that 2.1 isn’t “as good” 
on smaller hardware.

(I do see your domain, presumably you know all of th

Re: electricity outage problem

2016-01-12 Thread Jack Krupansky
Sometimes you may have to clear out the saved Gossip state:
https://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_gossip_purge.html

Note the instruction about bringing up the seed nodes first. Normally seed
nodes are only relevant when a node initially joins a cluster (after which
the Gossip state will be persisted locally), but if you clear the
persisted Gossip state, the seed nodes will again be needed to find the rest
of the cluster.
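A sketch of that procedure (paths and service commands assume a package
install; adapt to your layout):

sudo service cassandra stop
# add to cassandra-env.sh so the node ignores its locally saved ring state:
JVM_OPTS="$JVM_OPTS -Dcassandra.load_ring_state=false"
sudo service cassandra start    # seed nodes first, then the rest
# once the node is up and gossip has settled, remove that line again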

I'm not sure whether a power outage is the same as stopping and restarting
an instance (AWS) in terms of whether the restarted instance retains its
current public IP address.



-- Jack Krupansky

On Tue, Jan 12, 2016 at 10:02 AM, daemeon reiydelle 
wrote:

> This happens when there is insufficient time for nodes coming up to join a
> network. It takes a few seconds for a node to come up, e.g. your seed node.
> If you tell a node to join a cluster you can get this scenario because of
> high network utilization as well. I wait 90 seconds after the first (i.e.
> my first seed) node comes up to start the next one. Any nodes that are
> seeds need some 60 seconds, so the additional 30 seconds is a buffer.
> Additional nodes each wait 60 seconds before joining (although this is a
> parallel tree for large clusters).
>
>
>
>
>
> *“Life should not be a journey to the grave with the intention of arriving
> safely in a pretty and well preserved body, but rather to skid in broadside
> in a cloud of smoke, thoroughly used up, totally worn out, and loudly
> proclaiming “Wow! What a Ride!”* - Hunter Thompson
>
> Daemeon C.M. Reiydelle
> USA (+1) 415.501.0198
> London (+44) (0) 20 8144 9872
>
> On Tue, Jan 12, 2016 at 6:56 AM, Adil  wrote:
>
>> Hi,
>>
>> we have two DCs with 5 nodes in each cluster. Yesterday there was an
>> electricity outage that brought all nodes down. We restarted the clusters,
>> but when we run nodetool status on DC1 it reports that some nodes are DN.
>> The strange thing is that running the command from different nodes in DC1
>> doesn't flag the same nodes as down. We have noticed this message in the
>> log, "received an invalid gossip generation for peer". Does anyone know how
>> to resolve this problem? Should we purge the gossip state?
>>
>> thanks
>>
>> Adil
>>
>
>


Seed Private / Public Broadcast IP

2016-01-12 Thread Asher Newcomer
Hi all,

I am currently running a multi-region setup in AWS. I have a single cluster
across two datacenters in different regions.

In order to communicate cross-region in AWS, I have my broadcast_address
set to public IPs and my listen_address set to the instance's private IP. I
believe that this is the recommended setup and everything works great.

Now I want to expand my cluster to include my company's office as a third
datacenter. I have VPN tunnels established to both AWS datacenters, and I
need to use private IP addresses exclusively to communicate from our office
to AWS. If I connect via an AWS instance's public IP, my traffic gets
NATed through my office firewall - which then cannot connect - and I cannot
give the local instances public IPs.

On my new nodes, I've tried setting the seeds entry in cassandra.yaml to
the private IPs of the seeds in AWS. Cassandra can initially connect to the
seed nodes via the private IP, but then the seeds provide my local instance
with their broadcast_address - the public IP - and this causes problems.

Is there any way to change that behavior, such that my new, local nodes
ignore the broadcast_address provided to them?

How else might I accomplish the above?

Outside of configuring the two AWS regions to connect via private IP, which
is no small task, I don't see any workaround. Any help is most appreciated.

Thanks,

Asher


Re: Upgrade from 2.0.x to 2.2.x documentation missing

2016-01-12 Thread Michael Shuler
On 01/12/2016 01:07 AM, Amit Singh F wrote:
> We are currently at *Cassandra 2.0.14* in production, and since it is going
> to be EOL soon we are planning to upgrade to *Cassandra 2.2.4*
> (http://cassandra.apache.org/download/), which is the current
> production-ready version. While doing some analysis we found that there
> is no entry for the 2.2 branch in the DataStax documentation
> (http://docs.datastax.com/en/upgrade/doc/upgrade/cassandra/upgradeC_c.html)
> which guides on how to reach 2.2.x from 2.0.x.

(cc'ing docs@)

> Can somebody guide us on the Upgrade path which needs to be followed
> while upgrading from 2.0.x to 2.2.x  .

The canonical source for Apache Cassandra upgrade documentation is
NEWS.txt. Here's the cassandra-2.2 branch NEWS.txt file - read all
entries since your current release:

https://github.com/apache/cassandra/blob/cassandra-2.2/NEWS.txt

The typical Cassandra upgrade path is through the latest of each major
release, so from 2.0.latest -> 2.1.latest -> 2.2.latest. However, you
may be able to go from your 2.0.14 to 2.1.latest - read NEWS.txt to see
if that is the case, and test your upgrades in your staging env!
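Per node, the rolling upgrade looks roughly like this (a sketch; the package
command is an assumption, use whatever mechanism you installed with):

nodetool drain                         # flush memtables, stop accepting writes
sudo service cassandra stop
sudo apt-get install cassandra=2.1.x   # or yum, or a tarball swap
sudo service cassandra start
nodetool upgradesstables               # rewrite sstables in the new format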

-- 
Kind regards,
Michael


Re: electricity outage problem

2016-01-12 Thread daemeon reiydelle
This happens when there is insufficient time for nodes coming up to join a
network. It takes a few seconds for a node to come up, e.g. your seed node.
If you tell a node to join a cluster you can get this scenario because of
high network utilization as well. I wait 90 seconds after the first (i.e.
my first seed) node comes up to start the next one. Any nodes that are
seeds need some 60 seconds, so the additional 30 seconds is a buffer.
Additional nodes each wait 60 seconds before joining (although this is a
parallel tree for large clusters).
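As a sketch (hostnames and service commands are assumptions):

ssh seed1 'sudo service cassandra start'
sleep 90    # first seed: ~60s to come up, plus a 30s buffer
for host in seed2 node3 node4 node5; do
    ssh "$host" 'sudo service cassandra start'
    sleep 60    # let each node finish joining gossip before the next
done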





*“Life should not be a journey to the grave with the intention of arriving
safely in a pretty and well preserved body, but rather to skid in broadside
in a cloud of smoke, thoroughly used up, totally worn out, and loudly
proclaiming “Wow! What a Ride!”* - Hunter Thompson

Daemeon C.M. Reiydelle
USA (+1) 415.501.0198
London (+44) (0) 20 8144 9872

On Tue, Jan 12, 2016 at 6:56 AM, Adil  wrote:

> Hi,
>
> we have two DCs with 5 nodes in each cluster. Yesterday there was an
> electricity outage that brought all nodes down. We restarted the clusters,
> but when we run nodetool status on DC1 it reports that some nodes are DN.
> The strange thing is that running the command from different nodes in DC1
> doesn't flag the same nodes as down. We have noticed this message in the
> log, "received an invalid gossip generation for peer". Does anyone know how
> to resolve this problem? Should we purge the gossip state?
>
> thanks
>
> Adil
>


electricity outage problem

2016-01-12 Thread Adil
Hi,

we have two DCs with 5 nodes in each cluster. Yesterday there was an
electricity outage that brought all nodes down. We restarted the clusters,
but when we run nodetool status on DC1 it reports that some nodes are DN.
The strange thing is that running the command from different nodes in DC1
doesn't flag the same nodes as down. We have noticed this message in the
log, "received an invalid gossip generation for peer". Does anyone know how
to resolve this problem? Should we purge the gossip state?

thanks

Adil


Re: [RELEASE] Apache Cassandra 3.2 released

2016-01-12 Thread Jake Luciani
Note: I made a mistake saying this is a bug fix release, it's a feature
release that includes bugfixes.

On Tue, Jan 12, 2016 at 8:46 AM, Jake Luciani  wrote:

>
> The Cassandra team is pleased to announce the release of Apache Cassandra
> version 3.2.
>
> Apache Cassandra is a fully distributed database. It is the right choice
> when you need scalability and high availability without compromising
> performance.
>
>  http://cassandra.apache.org/
>
> Downloads of source and binary distributions are listed in our download
> section:
>
>  http://cassandra.apache.org/download/
>
> This version is a bug fix release[1] on the 3.2 series. As always, please
> pay attention to the release notes[2] and let us know[3] if you
> encounter any problem.
>
> Enjoy!
>
> [1]: http://goo.gl/vBb0Ad (CHANGES.txt)
> [2]: http://goo.gl/JjUIGF (NEWS.txt)
> [3]: https://issues.apache.org/jira/browse/CASSANDRA
>
>


[RELEASE] Apache Cassandra 3.2 released

2016-01-12 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 3.2.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 3.2 series. As always, please
pay attention to the release notes[2] and let us know[3] if you
encounter any problem.

Enjoy!

[1]: http://goo.gl/vBb0Ad (CHANGES.txt)
[2]: http://goo.gl/JjUIGF (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


Re: ClosedChannelExcption while nodetool repair

2016-01-12 Thread Paulo Motta
You may be running into
https://issues.apache.org/jira/browse/CASSANDRA-10961, which will be fixed
in 2.2.5. In the meantime, you may replace your cassandra jar with a
snapshot version available in that issue.

2016-01-12 10:38 GMT-03:00 Jan Kesten :

> Hi,
>
> I have had some problems on my Cassandra cluster recently. I am running 12
> nodes with 2.2.4, and while repairing with a plain "nodetool repair" I can
> find this in system.log:
>
> ERROR [STREAM-IN-/172.17.2.233] 2016-01-08 08:32:38,327
> StreamSession.java:524 - [Stream #5f96e8b0-b5e2-11e5-b4da-4321ac9959ef]
> Streaming error occurred
> java.nio.channels.ClosedChannelException: null
>
> on one node, and at the same time in the node mentioned in the log:
>
> INFO  [STREAM-IN-/172.17.2.223] 2016-01-08 08:32:38,073
> StreamResultFuture.java:168 - [Stream
> #5f96e8b0-b5e2-11e5-b4da-4321ac9959ef ID#0] Prepare completed. Receiving
> 2 files(46708049 bytes), sending 2 files(1856721742 bytes)
> ERROR [STREAM-OUT-/172.17.2.223] 2016-01-08 08:32:38,325
> StreamSession.java:524 - [Stream #5f96e8b0-b5e2-11e5-b4da-4321ac9959ef]
> Streaming error occurred
> org.apache.cassandra.io.FSReadError: java.io.IOException: Datenübergabe
> unterbrochen (broken pipe)
> at
> org.apache.cassandra.io.util.ChannelProxy.transferTo(ChannelProxy.java:144)
> ~[apache-cassandra-2.2.4.jar:2.2.4]
>
>
> Full relevant INFO  [STREAM-IN-/172.17.2.223] 2016-01-08 08:32:38,073
> StreamResultFuture.java:168 - [Stream
> #5f96e8b0-b5e2-11e5-b4da-4321ac9959ef ID#0] Prepare completed. Receiving
> 2 files(46708049 bytes), sending 2 files(1856721742 bytes)
> ERROR [STREAM-OUT-/172.17.2.223] 2016-01-08 08:32:38,325
> StreamSession.java:524 - [Stream #5f96e8b0-b5e2-11e5-b4da-4321ac9959ef]
> Streaming error occurred
> org.apache.cassandra.io.FSReadError: java.io.IOException: Datenübergabe
> unterbrochen (broken pipe)
> at
> org.apache.cassandra.io.util.ChannelProxy.transferTo(ChannelProxy.java:144)
> ~[apache-cassandra-2.2.4.jar:2.2.4]
>
> More complete log can be found here:
>
> http://pastebin.com/n6DjCCed
> http://pastebin.com/6rD5XNwU
>
> I already did a nodetool scrub.
>
> Any suggestions what is causing this?
>
> Thanks in advance,
> Jan
>


ClosedChannelExcption while nodetool repair

2016-01-12 Thread Jan Kesten
Hi,

I have had some problems on my Cassandra cluster recently. I am running 12
nodes with 2.2.4, and while repairing with a plain "nodetool repair" I can
find this in system.log:

ERROR [STREAM-IN-/172.17.2.233] 2016-01-08 08:32:38,327
StreamSession.java:524 - [Stream #5f96e8b0-b5e2-11e5-b4da-4321ac9959ef]
Streaming error occurred
java.nio.channels.ClosedChannelException: null

on one node, and at the same time in the node mentioned in the log:

INFO  [STREAM-IN-/172.17.2.223] 2016-01-08 08:32:38,073
StreamResultFuture.java:168 - [Stream
#5f96e8b0-b5e2-11e5-b4da-4321ac9959ef ID#0] Prepare completed. Receiving
2 files(46708049 bytes), sending 2 files(1856721742 bytes)
ERROR [STREAM-OUT-/172.17.2.223] 2016-01-08 08:32:38,325
StreamSession.java:524 - [Stream #5f96e8b0-b5e2-11e5-b4da-4321ac9959ef]
Streaming error occurred
org.apache.cassandra.io.FSReadError: java.io.IOException: Datenübergabe
unterbrochen (broken pipe)
at
org.apache.cassandra.io.util.ChannelProxy.transferTo(ChannelProxy.java:144)
~[apache-cassandra-2.2.4.jar:2.2.4]


Full relevant INFO  [STREAM-IN-/172.17.2.223] 2016-01-08 08:32:38,073
StreamResultFuture.java:168 - [Stream
#5f96e8b0-b5e2-11e5-b4da-4321ac9959ef ID#0] Prepare completed. Receiving
2 files(46708049 bytes), sending 2 files(1856721742 bytes)
ERROR [STREAM-OUT-/172.17.2.223] 2016-01-08 08:32:38,325
StreamSession.java:524 - [Stream #5f96e8b0-b5e2-11e5-b4da-4321ac9959ef]
Streaming error occurred
org.apache.cassandra.io.FSReadError: java.io.IOException: Datenübergabe
unterbrochen (broken pipe)
at
org.apache.cassandra.io.util.ChannelProxy.transferTo(ChannelProxy.java:144)
~[apache-cassandra-2.2.4.jar:2.2.4]

More complete log can be found here:

http://pastebin.com/n6DjCCed
http://pastebin.com/6rD5XNwU

I already did a nodetool scrub.

Any suggestions what is causing this?

Thanks in advance,
Jan


Re: Modeling contact list, plain table or List

2016-01-12 Thread DuyHai Doan
--> Why don't you do: DELETE FROM user_contact WHERE userid=xxx AND
contactname=yyy ?
Answer: Because a contact name can be duplicated. Or should I force
unique contact names?

In this case, add contactid as an extra clustering column to guarantee
uniqueness for your contacts. The delete query becomes:

DELETE FROM user_contact WHERE userid=xxx AND contactname=yyy AND
contactid=zzz

Normally, from the front-end (web app or smartphone client), if you have
the contactname, you SURELY also have the contactid information.
Consequently, you can issue the above DELETE statement without having to
read-before-delete, am I wrong?
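Concretely, something like this (a sketch reusing your columns; objectid
dropped for brevity):

CREATE TABLE communication.user_contact (
    userid int,
    contactname text,
    contactid int,
    createdat timeuuid,
    favoriteat timestamp,
    isfavorite boolean,
    PRIMARY KEY (userid, contactname, contactid)
);

DELETE FROM user_contact WHERE userid = ? AND contactname = ? AND contactid = ?;

You keep the ordering by contactname and can still delete one precise row.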



On Tue, Jan 12, 2016 at 2:02 PM, I PVP  wrote:

> --> Why don't you do: DELETE FROM user_contact WHERE userid=xxx AND
> contactname=yyy ?
> Answer: Because a contact name can be duplicated. Or should I force
> unique contact names?
>
> Overall, the challenge seems to be addressed, with some trade-off on the
> "ordering by contact name".
>
> If, at the base table, the clustering column is the objectid (timeuuid)
> instead of the contactname, the DELETE will be based on userid = ? and
> objectid = ?.
> This works fine. Generic SELECTs will also work fine on the base table.
>
> The MV will serve SELECTs  targeting/filtering the favorite contacts.
>
> Like this:
>
> CREATE TABLE communication.user_contact (
> userid int,
> objectid timeuuid,
> contactid int,
> contactname text,
> createdat timeuuid,
> favoriteat timestamp,
> isfavorite boolean,
> PRIMARY KEY (userid, objectid)
> );
>
>
> CREATE MATERIALIZED VIEW communication.user_contact_by_favorite AS
> SELECT userid, isfavorite, objectid, contactid, contactname, createdat,
> favoriteat
> FROM user_contact
> WHERE userid IS NOT NULL AND isfavorite IS NOT NULL AND objectid IS NOT
> NULL
> PRIMARY KEY ( ( userid, isfavorite ), objectid )
> WITH CLUSTERING ORDER BY ( objectid DESC ) ;
>
>
> Unfortunately this approach forces the model to cluster by
> objectid (timeuuid) just to satisfy the need to DELETE a specific contact
> row, and by doing that it wastes an opportunity on the MV, because all the
> PKs from the base table need to be in the MV and it is not possible to set
> the MV with more than 1 non-PK column from the base table as the MV
> PK. But it is still working fine.
>
>
> That is my first Cassandra use case, and the guidance provided by you guys
> is pretty important.
>
> Thanks very much for the answers, questions and suggestions.
>
>
> --
> IPVP
>
>
> From: DuyHai Doan
> Reply: user@cassandra.apache.org
> Date: January 12, 2016 at 10:27:45 AM
> To: user@cassandra.apache.org
> Cc: Jack Krupansky
>
> Subject:  Re: Modeling contact list, plain table or List
>
> 1)SELECT all rows from user_contact excluding the one  that the user wants
> to get rid of.
> 2) DELETE all the user_contact rows  for that particular user .
> 3) INSERT  the result of 1).
>
> --> Why don't you do: DELETE FROM user_contact WHERE userid=xxx AND
> contactname=yyy ?
>
> The Materialized View will be automagically updated by Cassandra with a
> query similar to DELETE FROM user_contact_by_favorite WHERE userid=xxx AND
> is_favorite=zzz AND contactname=yyy
>
> On Mon, Jan 11, 2016 at 10:40 PM, Jonathan Haddad 
> wrote:
>
>> In general I advise people avoid lists and use Maps or Sets instead.
>>
>> Using this data model, for instance, it's easy to remove a specific
>> Address from a user:
>>
>> CREATE TYPE address (
>>   street text,
>>   city text,
>>   zip_code int,
>> );
>>
>> CREATE TABLE user (
>> user_id int primary key,
>> addresses map<text, frozen<address>>
>> );
>>
>> When I want to remove one of the addresses from a user, I can do this:
>>
>> cqlsh:test> delete addresses['home'] from user where user_id =  1;
>>
>>
>> Hope that helps,
>> Jon
>>
>>
>> On Mon, Jan 11, 2016 at 1:20 PM I PVP  wrote:
>>
>>> Well…the way it is now it is not possible to delete a specific contact
>>> row from the base table at all, because a DELETE statement only works with
>>> the PK in the WHERE clause. Non-PK columns cannot be in the DELETE WHERE
>>> clause.
>>> https://docs.datastax.com/en/cql/3.3/cql/cql_reference/delete_r.html
>>>
>>> The way it is now  It is only possible to delete the entire contact list
>>>  for that specific user.
>>> Looks like will need to:
>>> 1)SELECT all rows from user_contact excluding the one  that the user
>>> wants to get rid of.
>>> 2) DELETE all the user_contact rows  for that particular user .
>>> 3) INSERT  the result of 1).
>>>
>>> Is that the proper way to achieve it, or am I missing some point in the
>>> modeling that would allow deleting a specific contact row while still being
>>> able to comply with the select requirements?
>>>
>>> Thanks
>>> --
>>> IPVP
>>>
>>>
>>> From: Jack Krupansky
>>> Reply: user@cassandra.apache.org
>>> Date: January 11, 2016 at 7:00:04 PM
>>> To: user@cassandra.apache.org
>>> Subject:  Re: Modeling contact list, plain table or List
>>>
>>> That's the beauty of MV - Cassandra automatica

Re: Modeling contact list, plain table or List

2016-01-12 Thread I PVP
--> Why don't you do: DELETE FROM user_contact WHERE userid=xxx AND
contactname=yyy ?
Answer: Because a contact name can be duplicated. Or should I force unique
contact names?

Overall, the challenge seems to be addressed, with some trade-off on the
"ordering by contact name".

If, at the base table, the clustering column is the objectid (timeuuid) instead
of the contactname, the DELETE will be based on userid = ? and objectid = ?.
This works fine. Generic SELECTs will also work fine on the base table.

The MV will serve SELECTs  targeting/filtering the favorite contacts.

Like this:

CREATE TABLE communication.user_contact (
userid int,
objectid timeuuid,
contactid int,
contactname text,
createdat timeuuid,
favoriteat timestamp,
isfavorite boolean,
PRIMARY KEY (userid, objectid)
);


CREATE MATERIALIZED VIEW communication.user_contact_by_favorite AS
SELECT userid, isfavorite, objectid, contactid, contactname, createdat, 
favoriteat
FROM user_contact
WHERE userid IS NOT NULL AND isfavorite IS NOT NULL AND objectid IS NOT NULL
PRIMARY KEY ( ( userid, isfavorite ), objectid )
WITH CLUSTERING ORDER BY ( objectid DESC ) ;


Unfortunately this approach forces the model to cluster by objectid (timeuuid)
just to satisfy the need to DELETE a specific contact row, and by doing that
it wastes an opportunity on the MV, because all the PKs from the base table need
to be in the MV and it is not possible to set the MV with more than 1
non-PK column from the base table as the MV PK. But it is still working fine.


That is my first Cassandra use case, and the guidance provided by you guys
is pretty important.

Thanks very much for the answers, questions and suggestions.


--
IPVP


From: DuyHai Doan
Reply: user@cassandra.apache.org
Date: January 12, 2016 at 10:27:45 AM
To: user@cassandra.apache.org
Cc: Jack Krupansky
Subject:  Re: Modeling contact list, plain table or List

1)SELECT all rows from user_contact excluding the one  that the user wants to 
get rid of.
2) DELETE all the user_contact rows  for that particular user .
3) INSERT  the result of 1).

--> Why don't you do: DELETE FROM user_contact WHERE userid=xxx AND
contactname=yyy ?

The Materialized View will be automagically updated by Cassandra with a query 
similar to DELETE FROM user_contact_by_favorite WHERE userid=xxx AND 
is_favorite=zzz AND contactname=yyy

On Mon, Jan 11, 2016 at 10:40 PM, Jonathan Haddad <j...@jonhaddad.com> wrote:
In general I advise people avoid lists and use Maps or Sets instead.

Using this data model, for instance, it's easy to remove a specific Address 
from a user:

CREATE TYPE address (
  street text,
  city text,
  zip_code int,
);

CREATE TABLE user (
user_id int primary key,
addresses map<text, frozen<address>>
);

When I want to remove one of the addresses from a user, I can do this:

cqlsh:test> delete addresses['home'] from user where user_id =  1;


Hope that helps,
Jon


On Mon, Jan 11, 2016 at 1:20 PM I PVP <i...@hotmail.com> wrote:
Well…the way it is now it is not possible to delete a specific contact row
from the base table at all, because a DELETE statement only works with the PK in
the WHERE clause. Non-PK columns cannot be in the DELETE WHERE clause.
https://docs.datastax.com/en/cql/3.3/cql/cql_reference/delete_r.html

The way it is now  It is only possible to delete the entire contact list  for 
that specific user.
Looks like will need to:
1)SELECT all rows from user_contact excluding the one  that the user wants to 
get rid of.
2) DELETE all the user_contact rows  for that particular user .
3) INSERT  the result of 1).

Is that the proper way to achieve it, or am I missing some point in the modeling
that would allow deleting a specific contact row while still being able to comply
with the select requirements?

Thanks
--
IPVP


From: Jack Krupansky
Reply: user@cassandra.apache.org
Date: January 11, 2016 at 7:00:04 PM
To: user@cassandra.apache.org
Subject:  Re: Modeling contact list, plain table or List

That's the beauty of MV - Cassandra automatically updates the MVs when the base 
table changes, including deletions, which is why all of the PK columns from the 
base table needed to be in the MV PK.

-- Jack Krupansky

On Mon, Jan 11, 2016 at 3:41 PM, I PVP 
mailto:i...@hotmail.com>> wrote:
The below table and materialized view will solve the SELECT requirements of my 
current application .
The challenge now is when the user decides to DELETE one specific contact from 
his contact list. I could add the objectid to a composite partition key 
together with the userid. But that would make the SELECT inviable.

 Any ideas/suggestions?


CREATE TABLE communication.user_contact (
userid int,
con

Re: what consistency level should I set when using IF NOT EXIST or UPDATE IF statements ?

2016-01-12 Thread DuyHai Doan
There are 2 consistency levels you can define on your query when
using Lightweight Transactions:

- one for the Paxos round: SERIAL or LOCAL_SERIAL (which indeed corresponds
to QUORUM/LOCAL_QUORUM but named differently so people do not get confused)

- one for the consistency of the mutation itself. In this case you can use
any CL except SERIAL/LOCAL_SERIAL

Setting the consistency level for Paxos is useful in the context of multiple
data centers only. SERIAL => requires a majority wrt RF across all DCs.
LOCAL_SERIAL => requires a majority wrt RF in the local DC only.
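In cqlsh, for example, the two levels are set independently (a sketch with a
hypothetical table; SERIAL CONSISTENCY requires a reasonably recent cqlsh):

CONSISTENCY QUORUM;               -- commit CL for the mutation itself
SERIAL CONSISTENCY LOCAL_SERIAL;  -- CL for the Paxos round
INSERT INTO users (login) VALUES ('jdoe') IF NOT EXISTS;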

Hope that helps



On Thu, Jan 7, 2016 at 10:44 AM, Hiroyuki Yamada  wrote:

> Hi,
>
> I've been doing some POCs of lightweight transactions, and
> I have come up with some questions, so please let me ask them here.
>
> So the question is:
> what consistency level should I set when using IF NOT EXIST or UPDATE IF
> statements ?
>
> I used the statements with ONE and QUORUM first, then it seems fine.
> But, when I set SERIAL, it gave me the following error.
>
> === error message ===
> Caused by: com.datastax.driver.core.exceptions.InvalidQueryException:
> SERIAL is not supported as conditional update commit consistency. Use ANY
> if you mean "make sure it is accepted but I don't care how many replicas
> commit it for non-SERIAL reads"
> === error message ===
>
>
> So, I'm wondering what's SERIAL for when writing (and reading) and
> what the differences are in setting ONE, QUORUM and ANY when using IF NOT
> EXIST or UPDATE IF statements.
>
> Could you give me some advice?
>
> Thanks,
> Hiro
>
>
>
>
>


Re: Recommendations for an embedded Cassandra and Unit Tests

2016-01-12 Thread DuyHai Doan
"What I'm noticing with these projects is that they don't handle CQL files
properly"

--> your concern is very legit. But handling CQL files properly is very
complex, let me explain the reasons.

A naive solution if you want to handle CQL syntax is to re-use the ANTLR
grammar file here:
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/Cql.g

 I've gone down this path in the past and it's nearly impossible, simply
because the Cql.g grammar file is using a lot of "internal" Cassandra
classes. Just look at the import block at the beginning of the file.

At a higher level, we should clearly define the "scope" of a CQL script
executor. Is it responsible for 1) parsing CQL statements or 2) validating
CQL statements ?

As far as I'm concerned, point 2) should be done by Cassandra. If we limit
the scope of a script executor to point 1) it's sufficient.

Indeed the remaining challenge is: how to split a block of input text that
contains multiple CQL statements into a list of CQL statements that can be
executed sequentially (or in parallel) by the Java driver?

The Zeppelin Cassandra interpreter uses a Scala combinator parser to
define a minimal grammar to split the different CQL statements apart:
https://github.com/doanduyhai/incubator-zeppelin/blob/CassandraInterpreter-V2/cassandra/src/main/scala/org/apache/zeppelin/cassandra/ParagraphParser.scala#L179-L198

Up to Cassandra 2.1 it's pretty easy: the semi-colon (;) can be used as the
statement separator. Since Cassandra 2.2 and the introduction of UDFs, it's
much more complex: a semi-colon can appear in the Java source code block of a
function definition, so using it as a separator no longer works.

A complex regular expression like this:
https://github.com/doanduyhai/incubator-zeppelin/blob/CassandraInterpreter-V2/cassandra/src/main/scala/org/apache/zeppelin/cassandra/ParagraphParser.scala#L55-L69
is necessary to parse UDF creation statements correctly.

In a nutshell, parsing (let alone validating) CQL is harder than most
people think.



On Mon, Jan 11, 2016 at 10:52 PM, Richard L. Burton III 
wrote:

> What I'm noticing with these projects is that they don't handle CQL files
> properly. E.g., cassandra-unit dies when you have a string that contains ;
> inside of it. The parsing logic they use is very primitive in the sense that
> they simply look for ; to denote the end of a statement.
>
> Is there any class in Cassandra I could use that, given a *.cql file, will
> return a list of the statements inside it?
>
> Looking at CQLParser, it's only good for parsing a single statement vs. a
> file that contains multiple statements.
>
>
> On Mon, Jan 11, 2016 at 3:06 PM, DuyHai Doan  wrote:
>
>> Achilles 4.x does offer embedded Cassandra server support with some
>> utility classes like ScriptExecutor. It supports C* 2.2 currently:
>>
>> https://github.com/doanduyhai/Achilles/wiki/CQL-embedded-cassandra-server
>> On 11 Jan 2016 at 20:47, "Richard L. Burton III" wrote:
>>
>>> I'm looking to see what's recommended for an embedded version of
>>> Cassandra, just for unit testing.
>>>
>>> I'm looking at https://github.com/jsevellec/cassandra-unit/wiki but I
>>> wanted to see if there was a better recommendation?
>>>
>>> --
>>> -Richard L. Burton III
>>> @rburton
>>>
>>
>
>
> --
> -Richard L. Burton III
> @rburton
>


Re: Modeling contact list, plain table or List

2016-01-12 Thread DuyHai Doan
1)SELECT all rows from user_contact excluding the one  that the user wants
to get rid of.
2) DELETE all the user_contact rows  for that particular user .
3) INSERT  the result of 1).

--> Why don't you do: DELETE FROM user_contact WHERE userid=xxx AND
contactname=yyy ?

The Materialized View will be automagically updated by Cassandra with a
query similar to DELETE FROM user_contact_by_favorite WHERE userid=xxx AND
is_favorite=zzz AND contactname=yyy

On Mon, Jan 11, 2016 at 10:40 PM, Jonathan Haddad  wrote:

> In general I advise people avoid lists and use Maps or Sets instead.
>
> Using this data model, for instance, it's easy to remove a specific
> Address from a user:
>
> CREATE TYPE address (
>   street text,
>   city text,
>   zip_code int,
> );
>
> CREATE TABLE user (
> user_id int primary key,
> addresses map<text, frozen<address>>
> );
>
> When I want to remove one of the addresses from a user, I can do this:
>
> cqlsh:test> delete addresses['home'] from user where user_id =  1;
>
>
> Hope that helps,
> Jon
>
>
> On Mon, Jan 11, 2016 at 1:20 PM I PVP  wrote:
>
>> Well…the way it is now it is not possible to delete a specific contact
>> row from the base table at all, because a DELETE statement only works with
>> the PK in the WHERE clause. Non-PK columns cannot be in the DELETE WHERE
>> clause.
>> https://docs.datastax.com/en/cql/3.3/cql/cql_reference/delete_r.html
>>
>> The way it is now  It is only possible to delete the entire contact list
>>  for that specific user.
>> Looks like will need to:
>> 1)SELECT all rows from user_contact excluding the one  that the user
>> wants to get rid of.
>> 2) DELETE all the user_contact rows  for that particular user .
>> 3) INSERT  the result of 1).
>>
>> Is that the proper way to achieve it, or am I missing some point in the
>> modeling that would allow deleting a specific contact row while still being
>> able to comply with the select requirements?
>>
>> Thanks
>> --
>> IPVP
>>
>>
>> From: Jack Krupansky
>> Reply: user@cassandra.apache.org
>> Date: January 11, 2016 at 7:00:04 PM
>> To: user@cassandra.apache.org
>> Subject:  Re: Modeling contact list, plain table or List
>>
>> That's the beauty of MV - Cassandra automatically updates the MVs when
>> the base table changes, including deletions, which is why all of the PK
>> columns from the base table needed to be in the MV PK.
>>
>> -- Jack Krupansky
>>
>> On Mon, Jan 11, 2016 at 3:41 PM, I PVP  wrote:
>>
>>> The below table and materialized view will solve the SELECT requirements
>>> of my current application .
>>> The challenge now is when the user decides to DELETE one specific
>>> contact from his contact list. I could add the objectid to a composite
>>> partition key together with the userid. But that would make the SELECT
>>> inviable.
>>>
>>>  Any ideas/suggestions?
>>>
>>>
>>> CREATE TABLE communication.user_contact (
>>> userid int,
>>> contactname text,
>>> contactid int,
>>> createdat timeuuid,
>>> favoriteat timestamp,
>>> isfavorite boolean,
>>> objectid timeuuid,
>>> PRIMARY KEY (userid, contactname)
>>> ) WITH CLUSTERING ORDER BY ( contactname DESC )
>>>
>>>
>>> CREATE MATERIALIZED VIEW communication.user_contact_by_favorite AS
>>> SELECT userid, isfavorite, contactname, contactid, createdat,
>>> favoriteat, objectid
>>> FROM user_contact
>>> WHERE userid IS NOT NULL AND isfavorite IS NOT NULL AND contactname IS
>>> NOT NULL
>>> PRIMARY KEY ( ( userid, isfavorite ), contactname )
>>> WITH CLUSTERING ORDER BY ( contactname DESC )
>>>
>>> Thanks
>>>
>>> --
>>> IPVP
>>>
>>>
>>> From: DuyHai Doan
>>> Reply: user@cassandra.apache.org
>>> Date: January 11, 2016 at 11:14:10 AM
>>> To: user@cassandra.apache.org
>>> Subject:  Re: Modeling contact list, plain table or List
>>>
>>> In the current iteration of materialized views, it is still not possible
>>> to have a WHERE clause other than IS NOT NULL, so is_favourite IS TRUE
>>> won't work.
>>>
>>> Still there is a JIRA created to support this feature :
>>> https://issues.apache.org/jira/browse/CASSANDRA-10368
>>>
>>> About cardinality of favorite vs non-favorites, it doesn't matter in this
>>> case because the OP said "Less then one hundred contacts by user is the
>>> normal."
>>>
>>> So even if all contacts are stuck in one unique favorite state, the
>>> materialized view partition for one user is at most 100. Even for extreme
>>> edge case with users having 10 000 contacts, it's still a manageable
>>> partition size for C*.
>>>
>>> But I agree it is important to know before-hand the
>>> favorite/non-favorite update frequency since it will impact the write
>>> throughput on the MV.
>>>
>>> For more details on materialized view impl and performance:
>>> http://www.doanduyhai.com/blog/?p=1930
>>>
>>> On Mon, Jan 11, 2016 at 1:36 PM, Jack Krupansky <
>>> jack.krupan...@gmail.com> wrote:
>>>
 The new Materialized View feature is just an automated way of creating
 and maintaining what people used to call a "query table", which i

Re: In UJ status for over a week trying to rejoin cluster in Cassandra 3.0.1

2016-01-12 Thread DuyHai Doan
Oh, sorry, did not notice the version in the title. Did you check the
system.log to verify if there isn't any Exception related to data streaming
? What is the output of "nodetool tpstats" ?

On Tue, Jan 12, 2016 at 1:00 PM, DuyHai Doan  wrote:

> What is your Cassandra version? In earlier versions there were some issues
> with streaming that could make the joining process get stuck.
>
> On Mon, Jan 11, 2016 at 6:57 AM, Carlos A  wrote:
>
>> Hello all,
>>
>> I have a small dev environment with 4 machines. I removed one of them
>> (.33) from the cluster because I wanted to upgrade its HD to an SSD.
>> I then reinstalled it and tried to rejoin it. It has been in UJ status for
>> a week now with no changes.
>>
>> I have tried nodetool repair etc. but nothing.
>>
>> nodetool status output
>>
>> Datacenter: DC1
>> ===
>> Status=Up/Down
>> |/ State=Normal/Leaving/Joining/Moving
>> --  Address   Load   Tokens   OwnsHost ID
>>   Rack
>> UN  192.168.1.30  16.13 MB   256  ?
>> 0e524b1c-b254-45d0-98ee-63b8f34a8531  RAC1
>> UN  192.168.1.31  20.12 MB   256  ?
>> 1f8000f5-026c-42c7-8189-cf19fbede566  RAC1
>> UN  192.168.1.32  17.73 MB   256  ?
>> 7b06f9e9-7c41-4364-ab18-f6976fd359e4  RAC1
>> UJ  192.168.1.33  877.6 KB   256  ?
>> 7a1507b5-198e-4a3a-a9fd-7af9e588fde2  RAC1
>>
>> Note: Non-system keyspaces don't have the same replication settings,
>> effective ownership information is meaningless
>>
>> Any tips on fixing this?
>>
>> Thanks,
>>
>> C.
>>
>
>


Re: In UJ status for over a week trying to rejoin cluster in Cassandra 3.0.1

2016-01-12 Thread DuyHai Doan
What is your Cassandra version? In earlier versions there were some issues
with streaming that could make the joining process get stuck.

On Mon, Jan 11, 2016 at 6:57 AM, Carlos A  wrote:

> Hello all,
>
> I have a small dev environment with 4 machines. I removed one of them
> (.33) from the cluster because I wanted to upgrade its HD to an SSD.
> I then reinstalled it and tried to rejoin it. It has been in UJ status for
> a week now with no changes.
>
> I have tried nodetool repair etc. but nothing.
>
> nodetool status output
>
> Datacenter: DC1
> ===
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address   Load   Tokens   OwnsHost ID
>   Rack
> UN  192.168.1.30  16.13 MB   256  ?
> 0e524b1c-b254-45d0-98ee-63b8f34a8531  RAC1
> UN  192.168.1.31  20.12 MB   256  ?
> 1f8000f5-026c-42c7-8189-cf19fbede566  RAC1
> UN  192.168.1.32  17.73 MB   256  ?
> 7b06f9e9-7c41-4364-ab18-f6976fd359e4  RAC1
> UJ  192.168.1.33  877.6 KB   256  ?
> 7a1507b5-198e-4a3a-a9fd-7af9e588fde2  RAC1
>
> Note: Non-system keyspaces don't have the same replication settings,
> effective ownership information is meaningless
>
> Any tips on fixing this?
>
> Thanks,
>
> C.
>


Re: Sorting & pagination in apache cassandra 2.1

2016-01-12 Thread Carlos Alonso
Hi Anuja.

Cassandra saves records on disk sorted by the clustering column. In this
case you haven't selected any, but it looks like it is picking birth_year as
the clustering column. I don't know what the clustering column selection
algorithm is, though (maybe alphabetically by name?).
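If you want the order to be explicit rather than accidental, declare a
clustering column yourself - a sketch (the table name and partitioning by
state are assumptions):

CREATE TABLE users_by_state (
    state varchar,
    birth_year bigint,
    user_name varchar,
    PRIMARY KEY (state, birth_year, user_name)
) WITH CLUSTERING ORDER BY (birth_year ASC, user_name ASC);

Within each state partition, rows are then stored and returned in birth_year
order.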

Regards

Carlos Alonso | Software Engineer | @calonso 

On 12 January 2016 at 07:30, anuja jain  wrote:

> 1 more question, what does it mean by "cassandra inherently sorts data"?
> For eg:
> I have a table with schema
>
> CREATE TABLE users (
>
> ...   user_name varchar PRIMARY KEY,
>
> ...   password varchar,
>
> ...   gender varchar,
>
> ...   session_token varchar,
>
> ...   state varchar,
>
> ...   birth_year bigint
>
> ... );
>
> I inserted data in random order, but on firing a select statement I get
> data sorted by birth_year. Why does this happen?
>
>  cqlsh:learning> select * from users;
>
>
>
> user_name | birth_year | gender | password | session_token | state
> ----------+------------+--------+----------+---------------+---------
>      John |       1979 |      M |     qwer |           abc |      JK
>   Dharini |       1980 |      F |      Xyz |           abc | Gujarat
>     Keval |       1990 |      M |      DDD |           abc |      WB
>
> On Tue, Jan 12, 2016 at 12:52 PM, anuja jain  wrote:
>
>> What is the alternative if my Cassandra version is prior to 3.0
>> (specifically 2.1) and is already in production?
>>
>> Also, as per the docs given at
>>
>> https://docs.datastax.com/en/datastax_enterprise/4.6/datastax_enterprise/srch/srchCapazty.html
>> what does it mean that we need to do capacity planning if we need to
>> search using Solr? What is the alternative when we do not know the size
>> of the data?
>>
>>  Thanks,
>>
>> Anuja
>>
>>
>>
>> On Fri, Jan 8, 2016 at 12:15 AM, Tyler Hobbs  wrote:
>>
>>>
>>> On Thu, Jan 7, 2016 at 6:45 AM, anuja jain  wrote:
>>>
>>>> My question is, what is the alternative if we need to order by col3 or
>>>> col4 in my above example without including col2 in the order by clause.

>>>
>>> The server-side alternative is to create a second table (or a
>>> materialized view, if you're using 3.0+) that uses a different clustering
>>> order.  Cassandra purposefully only supports simple and efficient queries
>>> that can be handled quickly (with a few exceptions), and arbitrary ordering
>>> is not part of that, especially if you consider complications like paging.
>>>
>>>
>>> --
>>> Tyler Hobbs
>>> DataStax 
>>>
>>
>>
>