Re: Multinode Cassandra and sstableloader

2015-04-01 Thread Alain RODRIGUEZ
From Michael Laing - posted on the wrong thread :

We use Alain's solution as well to make major operational revisions.

We have a red team and a blue team in each AWS region, so we just add
and drop datacenters to get where we want to be.

Pretty simple.

2015-03-31 15:50 GMT+02:00 Alain RODRIGUEZ arodr...@gmail.com:

 IMHO, the most straightforward solution is to add cluster2 as a new DC
 for mykeyspace and then drop the old DC.

 That's how we migrated to VPC (AWS), and we love this approach since you
 don't have to mess with your existing cluster; the sync happens
 automatically, and you can then drop your old DC safely, once you are sure.

 I posted the steps on this ML a long time ago:
 https://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/201406.mbox/%3cca+vsrlopop7th8nx20aoz3as75g2jrjm3ryx119deklynhq...@mail.gmail.com%3E
 Also Datastax docs:
 https://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_add_dc_to_cluster_t.html

 get data from cluster1,
 put it to cluster2
 wipe cluster1

 I would definitely use this method to do this (I actually did already,
 multiple times).

 Up to you; I once heard there are almost as many ways of doing operations
 on Cassandra as there are operators :). You should go with the method you
 are confident with. I can assure you the one I propose is quite safe.

 C*heers,

 Alain
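
 For later readers, a rough sketch of the add-then-drop-DC flow Alain
 describes; this is an outline, not a substitute for the linked posts, and
 the keyspace, DC names and replication factors are placeholders:

     # 1. In cqlsh: add the new DC to the replication map
     ALTER KEYSPACE mykeyspace WITH replication =
       {'class': 'NetworkTopologyStrategy', 'DC_old': 3, 'DC_new': 3};

     # 2. In a shell on each node of the new DC: stream the existing data
     nodetool rebuild DC_old

     # 3. In cqlsh, once clients only talk to the new DC: drop the old one
     ALTER KEYSPACE mykeyspace WITH replication =
       {'class': 'NetworkTopologyStrategy', 'DC_new': 3};

     # 4. In a shell on each old-DC node:
     nodetool decommission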

 2015-03-31 15:32 GMT+02:00 Serega Sheypak serega.shey...@gmail.com:

 I have to ask you if you considered doing an Alter keyspace, change RF
 The idea is dead simple:
 get data from cluster1,
 put it to cluster2,
 wipe cluster1

 I understand the drawbacks of the streaming/sstableloader approach; I need
 something easy right now. Later we'll consider switching to Priam, since it
 does backup/restore the right way.

 2015-03-31 14:45 GMT+02:00 Alain RODRIGUEZ arodr...@gmail.com:

 Hi,

 Although I understand that (as you said) it's not the best solution and you
 need it for testing purposes, I have to ask if you considered doing an
 ALTER KEYSPACE (change RF > 1 for mykeyspace on cluster2) and nodetool
 rebuild to add a new DC (your cluster2)?

 In case you go your way (sstableloader), I also advise you to take a
 snapshot (instead of just flushing) to avoid failures due to compactions on
 your active cluster1.
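
 A minimal sketch of that snapshot-then-load idea (the tag, host and paths
 are illustrative; note that sstableloader infers the keyspace and table
 from the last two directory names, so you may need to copy the snapshot
 into a mykeyspace/source_table/ directory first):

     # freeze a consistent set of sstables:
     nodetool snapshot -t migration mykeyspace

     # stream them to the new cluster:
     sstableloader -d cluster2.nodeXXX.com /path/to/copied/mykeyspace/source_table/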

 To answer your question, sstableloader is supposed to distribute data
 correctly on the new cluster, depending on your RF and topology. Basically,
 if you run sstableloader on just the sstables of c1.node1, my guess is that
 all the data present on c1.node1 will be stored on the new c2 (each piece
 of data going to its corresponding node). So if you have RF=3 on c1, you
 should have all the data on c2 just by running sstableloader from c1.node1;
 if you are using RF=1 on c1, then you need to load data from each node of
 c1. I suppose that cluster2.nodeXXX doesn't matter and just acts as a
 coordinator.

 I never used the tool, but that's what would be logical, IMHO. Wait for a
 confirmation, as I wouldn't want to lead you to a failure of any kind.
 Also, I don't know if data is replicated directly by sstableloader or if
 you need to repair c2 after loading the data.

 C*heers,

 Alain

 2015-03-31 13:21 GMT+02:00 Serega Sheypak serega.shey...@gmail.com:

 Hi, I have a simple question and can't find the related info in the docs.

 I have cluster1 with 3 nodes and cluster2 with 5 nodes. I want to transfer
 the data of the whole keyspace named 'mykeyspace' from cluster1 to cluster2
 using sstableloader. I understand that it's not the best solution; I need
 it for testing purposes.

 What I'm going to do:

   1. Recreate the keyspace schema on cluster2 using the schema from cluster1.
   2. nodetool flush for mykeyspace.source_table being exported from cluster1
      to cluster2.
   3. Run sstableloader for each table on cluster1.node01:

      sstableloader -d cluster2.nodeXXX.com /var/lib/cassandra/data/mykeyspace/source_table-83f369e0d6e511e4b3a6010e8d2b68af/

 What should I get as a result on cluster2?

 *ALL* data from source_table?

 or

 Just data stored in *partition of source_table*

 I'm confused. The docs say I just run this command to export a table from
 cluster1 to cluster2, but I am specifying the path to only one part of
 source_table's data, since the other parts of the table live on other nodes.
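
 (A note on that last point for later readers: if cluster1 uses RF=1, each
 node holds a disjoint slice of the table, so the loader has to be run once
 per source node, along these lines, with the table directory id shortened
 to a placeholder:

     # repeat on c1.node01, c1.node02, c1.node03:
     sstableloader -d cluster2.nodeXXX.com /var/lib/cassandra/data/mykeyspace/source_table-<table-id>/

 With RF=3 on a 3-node cluster1, a single node's data directory already
 contains every row, so one run can be enough.)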







Testing sstableloader between Cassandra 2.1 DSE and community edition 2.1

2015-04-01 Thread Serega Sheypak
Hi, I have 2 Cassandra clusters.
cluster1 is DataStax Community 2.1
cluster2 is DataStax Enterprise (DSE)

I can run sstableloader from cluster1 (Community) and stream data to
cluster2 (DSE), but I get an exception while streaming from cluster2 (DSE)
to cluster1 (Community).


The exception is:

Could not retrieve endpoint ranges:
java.lang.NullPointerException
java.lang.RuntimeException: Could not retrieve endpoint ranges:
        at org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:282)
        at org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:149)
        at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:94)
Caused by: java.lang.NullPointerException
        at org.apache.cassandra.serializers.BooleanSerializer.deserialize(BooleanSerializer.java:33)
        at org.apache.cassandra.serializers.BooleanSerializer.deserialize(BooleanSerializer.java:24)
        at org.apache.cassandra.db.marshal.AbstractType.compose(AbstractType.java:142)
        at org.apache.cassandra.cql3.UntypedResultSet$Row.getBoolean(UntypedResultSet.java:102)
        at org.apache.cassandra.config.CFMetaData.fromSchemaNoColumnsNoTriggers(CFMetaData.java:1701)
        at org.apache.cassandra.config.CFMetaData.fromThriftCqlRow(CFMetaData.java:1059)
        at org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:274)


Re: SSTable structure

2015-04-01 Thread Serega Sheypak
Hi bharat,
you are talking about Cassandra 1.2.5. Does it still apply to Cassandra 2.1?
Were there any significant changes to the SSTable format and layout?
Thank you, the article is interesting.

Hi jacob jacob.rho...@me.com,
HBase does this, for example: http://hbase.apache.org/book.html#_hfile_format_2
It would be great to at least give the general ideas. It could help people
understand schema design problems: you start to understand better how
Cassandra scans data and how you can utilize its power.

2015-04-01 5:39 GMT+02:00 Bharatendra Boddu bharatend...@gmail.com:

 Some time back I created a blog article about the SSTable storage format
 with some code references.

 Cassandra: SSTable Storage Format
 http://distributeddatastore.blogspot.com/2013/08/cassandra-sstable-storage-format.html

 - bharat

 On Mon, Mar 30, 2015 at 5:24 PM, Jacob Rhoden jacob.rho...@me.com wrote:

 Yes, updating code and documentation can sometimes be annoying; you would
 only ever maintain both if it were important. It comes down to: is having
 the format of the data files documented, for everyone to understand, an
 important thing?

 __
 Sent from iPhone

 On 31 Mar 2015, at 11:07 am, daemeon reiydelle daeme...@gmail.com
 wrote:

 Why? Then there are two places to maintain, or to get JIRA'ed for a discrepancy.
 On Mar 30, 2015 4:46 PM, Robert Coli rc...@eventbrite.com wrote:

 On Mon, Mar 30, 2015 at 1:38 AM, Pierre pierredev...@gmail.com wrote:

 Does anyone know if there is more complete and up-to-date documentation
 about the sstable file structure (data, index, stats, etc.) than this one:
 http://wiki.apache.org/cassandra/ArchitectureSSTable


 No, there isn't. Unfortunately you will have to read the source.


 I'm looking for a full specification, with schema of the structure if
 possible.


 It would be nice if such fundamental things were documented, wouldn't it?

 =Rob






Re: Testing sstableloader between Cassandra 2.1 DSE and community edition 2.1

2015-04-01 Thread Serega Sheypak
Sorry
cluster1 community version is: ii  cassandra  2.1.3  distributed storage system for structured data
cluster2 DSE version is: ii  dse-libcassandra  4.6.2-1  The DataStax Enterprise package includes a production-certifie

2015-04-01 14:53 GMT+02:00 Serega Sheypak serega.shey...@gmail.com:

 [quoted original message snipped; see the previous message in this thread]




[SECURITY ANNOUNCEMENT] CVE-2015-0225

2015-04-01 Thread Jake Luciani
CVE-2015-0225: Apache Cassandra remote execution of arbitrary code

Severity: Important

Vendor:
The Apache Software Foundation

Versions Affected:
Cassandra 1.2.0 to 1.2.19
Cassandra 2.0.0 to 2.0.13
Cassandra 2.1.0 to 2.1.3

Description:
Under its default configuration, Cassandra binds an unauthenticated
JMX/RMI interface to all network interfaces.  As RMI is an API for the
transport and remote execution of serialized Java, anyone with access
to this interface can execute arbitrary code as the running user.

Mitigation:
1.2.x has reached EOL, so users of = 1.2.x are recommended to upgrade
to a supported version of Cassandra, or manually configure encryption
and authentication of JMX,
(seehttps://wiki.apache.org/cassandra/JmxSecurity).
2.0.x users should upgrade to 2.0.14
2.1.x users should upgrade to 2.1.4
Alternately, users of any version not wishing to upgrade can
reconfigure JMX/RMI to enable encryption and authentication according
to https://wiki.apache.org/cassandra/JmxSecurity or
http://docs.oracle.com/javase/7/docs/technotes/guides/management/agent.html
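
As a rough illustration of that mitigation (verify the exact settings
against the pages above; the password-file path is an assumption), the
relevant JVM options are typically set in cassandra-env.sh:

    # require a username/password instead of an open, unauthenticated port:
    JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.authenticate=true"
    JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.password.file=/etc/cassandra/jmxremote.password"
    # and/or encrypt the JMX transport:
    JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.ssl=true"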

Credit:
This issue was discovered by Georgi Geshev of MWR InfoSecurity


Frequent timeout issues

2015-04-01 Thread Amlan Roy
Hi,

I am new to Cassandra. I have set up a cluster with Cassandra 2.0.13. I am
writing the same data to both HBase and Cassandra and find that the writes
are extremely slow in Cassandra; I frequently see the exception "Cassandra
timeout during write query at consistency ONE". The cluster size for both
HBase and Cassandra is the same.

Looks like something is wrong with my cluster setup. What could the possible
issue be? Data and commit logs are written to two separate disks.

Regards,
Amlan

Re: Frequent timeout issues

2015-04-01 Thread Eric R Medley
Amlan,

Can you provide information on how much data is being written? Are any of the 
columns really large? Are any writes succeeding or are all timing out?

Regards,

Eric R Medley

 On Apr 1, 2015, at 9:03 AM, Amlan Roy amlan@cleartrip.com wrote:
 
 [quoted message snipped]



Datastax driver object mapper and union field

2015-04-01 Thread Craig Ching
Hi!

We need to implement a union field in our cassandra data model and we're
using the datastax Mapper.  Anyone have any recommendations for doing
this?  I'm thinking something like:

public class Value {
  int dataType;
  String valueAsString;
  double valueAsDouble;
}

If the Value is a String, do we need to store a double as well (and vice
versa)?  Or should we convert the double to a java.lang.Double and null
it?  If we did the latter, do we have to worry about tombstones?

Thanks and appreciate any advice!

Cheers,
Craig
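
(Not an authoritative answer, but a hedged sketch of how this might look
with the driver 2.1 object mapper; the annotations are from
com.datastax.driver.mapping.annotations, the keyspace/table/column names are
assumptions, and note that, depending on driver version, saving an entity
with null fields writes nulls, i.e. tombstones, for those cells:

    import com.datastax.driver.mapping.annotations.Column;
    import com.datastax.driver.mapping.annotations.PartitionKey;
    import com.datastax.driver.mapping.annotations.Table;

    @Table(keyspace = "myks", name = "values")
    public class Value {
        @PartitionKey
        @Column(name = "id")
        private String id;                 // hypothetical key column

        @Column(name = "data_type")
        private int dataType;              // discriminator: which member is populated

        @Column(name = "value_as_string")
        private String valueAsString;      // null when the union holds a double

        @Column(name = "value_as_double")
        private Double valueAsDouble;      // boxed so it can be null

        // getters/setters required by the mapper omitted for brevity
    }

Using boxed types and leaving the unused member null keeps the model simple;
whether the resulting tombstones matter depends on how often the union flips
type.)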


Re: Frequent timeout issues

2015-04-01 Thread Eric R Medley
Also, can you provide the table details and the consistency level you are using?

Regards,

Eric R Medley

 On Apr 1, 2015, at 9:13 AM, Eric R Medley emed...@xylocore.com wrote:
 
 [quoted messages snipped]
 



Re: Frequent timeout issues

2015-04-01 Thread Amlan Roy
Hi Eric,

Thanks for the reply. Some columns are big, but I see the issue even when I
stop storing the big columns. Some of the writes are timing out, not all.
Where can I find the number of writes to Cassandra?

Regards,
Amlan
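
For the archive: per-table write counts are exposed by nodetool. A minimal
sketch (the counters are cumulative since the node started):

    # 'Write Count' and 'Write Latency' are reported per keyspace and table:
    nodetool cfstats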

On 01-Apr-2015, at 7:43 pm, Eric R Medley emed...@xylocore.com wrote:

 [quoted message snipped]
 



Table design for historical data

2015-04-01 Thread Firdousi Farozan
Hi,

My requirement is to design a table for historical state information (not
exactly time-series). For example: I have devices connecting to and
disconnecting from the management platform. I want to know details such as
name, mac, os, image, etc. for all devices connected to the management
platform in a given interval (start and end time).

Any help on table design for this use-case?

Regards,
Firdousi


Re: Frequent timeout issues

2015-04-01 Thread Amlan Roy
I did not see any exceptions in cassandra.log or system.log. I monitored
using JConsole and did not see anything wrong. Do I need to look at any
specific info? I am doing almost 1000 writes/sec.

HBase and Cassandra are running on different clusters. For Cassandra I have
6 nodes with 64GB RAM (heap at the default setting) and 32 cores.

On 01-Apr-2015, at 8:43 pm, Eric R Medley emed...@xylocore.com wrote:

 [quoted messages and table schema snipped]
 



Re: Frequent timeout issues

2015-04-01 Thread Amlan Roy
Using the datastax driver without batch.
http://www.datastax.com/documentation/developer/java-driver/2.1/java-driver/whatsNew2.html


On 01-Apr-2015, at 9:15 pm, Brian O'Neill b...@alumni.brown.edu wrote:

 
 [quoted message snipped]



Re: Frequent timeout issues

2015-04-01 Thread Eric R Medley
Are HBase and Cassandra running on the same servers? Are the writes to each of 
these databases happening at the same time?

Regards,

Eric R Medley

 On Apr 1, 2015, at 10:12 AM, Brice Dutheil brice.duth...@gmail.com wrote:
 
 [quoted messages snipped]



Re: Frequent timeout issues

2015-04-01 Thread Amlan Roy
Replication factor is 2.
CREATE KEYSPACE ct_keyspace WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'DC1': '2'
};

Inserts are happening from Storm using the Java driver, with prepared
statements and no batching.
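
As a reference point for the archive, a minimal sketch of that write path
with the 2.1 Java driver (the contact point and bound values are
hypothetical, and only a few of the table's columns are shown):

    import com.datastax.driver.core.*;

    public class EventWriter {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("10.0.0.1").build();
            Session session = cluster.connect("ct_keyspace");
            // prepare once, bind per insert; no BatchStatement involved
            PreparedStatement ps = session.prepare(
                "INSERT INTO event_data (event, week, bucket, date, unique) " +
                "VALUES (?, ?, ?, ?, ?)");
            session.execute(ps.bind("search", "2015-W14", 3, new java.util.Date(), "evt-123"));
            cluster.close();
        }
    }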


On 01-Apr-2015, at 8:42 pm, Brice Dutheil brice.duth...@gmail.com wrote:

 [quoted message snipped]



Re: Frequent timeout issues

2015-04-01 Thread Brice Dutheil
And the keyspace? What is the replication factor?

Also, how are the inserts done?

On Wednesday, April 1, 2015, Amlan Roy amlan@cleartrip.com wrote:

 [quoted message snipped]






-- 
Brice


Re: Frequent timeout issues

2015-04-01 Thread Amlan Roy
Write consistency level is ONE.

This is the describe output for one of the tables.

CREATE TABLE event_data (
  event text,
  week text,
  bucket int,
  date timestamp,
  unique text,
  adt int,
  age list<int>,
  arrival list<timestamp>,
  bank text,
  bf double,
  cabin text,
  card text,
  carrier list<text>,
  cb double,
  channel text,
  chd int,
  company text,
  cookie text,
  coupon list<text>,
  depart list<timestamp>,
  dest list<text>,
  device text,
  dis double,
  domain text,
  duration bigint,
  emi int,
  expressway boolean,
  flight list<text>,
  freq_flyer list<text>,
  host text,
  host_ip text,
  inf int,
  instance text,
  insurance text,
  intl boolean,
  itinerary text,
  journey text,
  meal_pref list<text>,
  mkp double,
  name list<text>,
  origin list<text>,
  pax_type list<text>,
  payment text,
  pref_carrier list<text>,
  referrer text,
  result_cnt int,
  search text,
  src text,
  src_ip text,
  stops int,
  supplier list<text>,
  tags list<text>,
  total double,
  trip text,
  user text,
  user_agent text,
  PRIMARY KEY ((event, week, bucket), date, unique)
) WITH CLUSTERING ORDER BY (date DESC, unique ASC) AND
  bloom_filter_fp_chance=0.01 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.10 AND
  gc_grace_seconds=864000 AND
  index_interval=128 AND
  read_repair_chance=0.00 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  default_time_to_live=0 AND
  speculative_retry='99.0PERCENTILE' AND
  memtable_flush_period_in_ms=0 AND
  compaction={'class': 'SizeTieredCompactionStrategy'} AND
  compression={'sstable_compression': 'LZ4Compressor'};


On 01-Apr-2015, at 8:00 pm, Eric R Medley emed...@xylocore.com wrote:

 [quoted messages snipped]
 
 



Re: Frequent timeout issues

2015-04-01 Thread Brian O'Neill

Are you using the storm-cassandra-cql driver?
(https://github.com/hmsonline/storm-cassandra-cql)

If so, what version?
Batching or no batching?

-brian

---
Brian O'Neill 
Chief Technology Officer
Health Market Science, a LexisNexis Company
215.588.6024 Mobile • @boneill42 http://www.twitter.com/boneill42


This information transmitted in this email message is for the intended
recipient only and may contain confidential and/or privileged material. If
you received this email in error and are not the intended recipient, or the
person responsible to deliver it to the intended recipient, please contact
the sender at the email above and delete this email and any attachments and
destroy any copies thereof. Any review, retransmission, dissemination,
copying or other use of, or taking any action in reliance upon, this
information by persons or entities other than the intended recipient is
strictly prohibited.
 


From:  Amlan Roy amlan@cleartrip.com
Reply-To:  user@cassandra.apache.org
Date:  Wednesday, April 1, 2015 at 11:37 AM
To:  user@cassandra.apache.org
Subject:  Re: Frequent timeout issues

[quoted message snipped]





replace_address vs add+removenode

2015-04-01 Thread Ulrich Geilmann
Hi.

The documentation suggests using the replace_address startup
parameter to replace a dead node. However, it doesn't explain why
this is superior to adding a new node and retiring the dead one
with nodetool removenode.
I assume it can be more efficient, since the new node can take over the
exact tokens of the dead node. Are there any other differences?
Can it be reasonable not to use replace_address, in the interest of
more uniform operations?

br, Ulrich


Re: Frequent timeout issues

2015-04-01 Thread Eric R Medley
Are you seeing any exceptions in the cassandra logs? What are the loads on your 
servers? Have you monitored the performance of those servers? How many writes 
are you performing at a time? How many writes per seconds?

Regards,

Eric R Medley

 On Apr 1, 2015, at 9:40 AM, Amlan Roy amlan@cleartrip.com wrote:
 
 [quoted messages and table schema snipped]
 
 



Re: Table design for historical data

2015-04-01 Thread Firdousi Farozan
I will be writing an event when a device connects. A device may never
disconnect up to the current time, and I still want to return that device
for that time range. A device-disconnect event is used to mark the end time;
any query beyond that time should not return that device.

Queries can have ad-hoc start and end times, but fixed-size buckets can be
an option if they give a better design, even with a little client-side
overhead.

This is not a regular time-series workload, in the sense that there won't be
too many device connects/disconnects; we cannot say for sure there will be x
events per second. A device disconnects only if it is down and an
administrator removes it or replaces it with a different device.

That is the reason I am calling it historical state information: I want to
know the details of the devices connected to the platform for any given
interval. (A sketch of one possible table shape follows below.)

Regards,
Firdousi
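
For later readers, a minimal sketch of one possible shape, assuming
connect/disconnect events and an open interval represented by an unset end
time (the table and column names are illustrative):

    CREATE TABLE device_sessions (
        device_id       text,
        connected_at    timestamp,
        disconnected_at timestamp,   -- left unset while still connected
        name            text,
        mac             text,
        os              text,
        image           text,
        PRIMARY KEY (device_id, connected_at)
    ) WITH CLUSTERING ORDER BY (connected_at DESC);

The query "devices connected during [start, end]" needs connected_at <= end
and (disconnected_at unset OR disconnected_at >= start), which a single CQL
SELECT cannot express, so the usual trade-off is a second, time-bucketed
table or client-side filtering, along the lines the reply in this thread
suggests.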

On Wed, Apr 1, 2015 at 11:42 PM, Eric R Medley emed...@xylocore.com wrote:

 [quoted message snipped]
 




Re: Why select returns tombstoned results?

2015-04-01 Thread Benyi Wang
Unfortunately I'm using 2.1.2. Is it possible to downgrade to 2.0.13
without wiping out the data? I'm worried there may be a bug in 2.1.2.
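
A note on the consistency arithmetic in this thread: reads and writes are
only guaranteed to overlap when read CL + write CL > RF. With RF=3 in
'service' plus RF=2 in 'analytics' (5 replicas total), a LOCAL_ONE write and
a QUORUM read (3 of 5) need not intersect. A sketch of a consistent
delete-then-read in cqlsh, under that assumption:

    CONSISTENCY QUORUM;
    DELETE FROM tomb_test WHERE guid='guid-1' AND content='content-1' AND range='week';
    SELECT * FROM tomb_test WHERE guid='guid-1' AND content='content-1';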


On Tue, Mar 31, 2015 at 4:37 AM, Paulo Ricardo Motta Gomes 
paulo.mo...@chaordicsystems.com wrote:

  What version of Cassandra are you running? Are you by any chance running
 repairs on your data?

 On Mon, Mar 30, 2015 at 5:39 PM, Benyi Wang bewang.t...@gmail.com wrote:

 Thanks for replying.

 In cqlsh, if I change to QUORUM (CONSISTENCY QUORUM), sometimes the select
 returns the deleted row and sometimes not.

 I have two virtual data centers: service (3 nodes) and analytics (4 nodes
 collocated with Hadoop data nodes). The table has 3 replicas in service and
 2 in analytics. When I wrote, I wrote into analytics using LOCAL_ONE, so I
 guess the data may not have replicated to all nodes yet.

 I will try to use strong consistency for write.



 On Mon, Mar 30, 2015 at 11:59 AM, Prem Yadav ipremya...@gmail.com
 wrote:

 Increase the read CL to quorum and you should get correct results.
 How many nodes do you have in the cluster and what is the replication
 factor for the keyspace?

 On Mon, Mar 30, 2015 at 7:41 PM, Benyi Wang bewang.t...@gmail.com
 wrote:

 Create table tomb_test (
    guid text,
    content text,
    range text,
    rank int,
    id text,
    cnt int,
    primary key (guid, content, range, rank)
 )

 Sometimes I delete rows using the Cassandra Java driver with this query:

 DELETE FROM tomb_test WHERE guid=? and content=? and range=?

 in a Batch statement with UNLOGGED. The consistency level is LOCAL_ONE.

 But if I run

 SELECT * FROM tomb_test WHERE guid='guid-1' and content='content-1' and
 range='week'
 or
 SELECT * FROM tomb_test WHERE guid='guid-1' and content='content-1' and
 range='week' and rank = 1

 The result shows the deleted rows.

 If I run this select, the deleted rows are not shown

 SELECT * FROM tomb_test WHERE guid='guid-1' and content='content-1'

 If I run the delete statement in cqlsh, the deleted rows won't show up.

 How can I fix this?






 --
 Paulo Motta

 Chaordic | Platform
 www.chaordic.com.br
 +55 48 3232.3200



Re: replace_address vs add+removenode

2015-04-01 Thread Anuj Wadehra
In both cases the node needs to bootstrap and get data from other nodes.
removenode has an additional cost: it leads to an extra redistribution of
tokens so that all data resides on the remaining nodes as per the
replication strategy. On removenode, the remaining nodes stream data amongst
themselves so that the ranges the dead node was responsible for are taken
care of.


Anuj Wadehra


From: Ulrich Geilmann ulrich.geilm...@freiheit.com
Date: Wed, 1 Apr, 2015 at 9:58 pm
Subject: replace_address vs add+removenode

[quoted message snipped]



Re: replace_address vs add+removenode

2015-04-01 Thread Robert Coli
On Wed, Apr 1, 2015 at 9:26 AM, Ulrich Geilmann 
ulrich.geilm...@freiheit.com wrote:

 I assume it can be more efficient since the new node can take over the
 exact tokens of the dead node. Are there any other differences?


That's the reason. You get one streaming operation (bootstrap a new node
with the same tokens as the old one) instead of two (first, remove the dead
node *and give its data to other nodes*; then, bootstrap a new node with
new random tokens).
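
For reference, a sketch of how the replace flow is invoked; the flag is
passed at startup (for package installs, typically via cassandra-env.sh) and
the IP below is a placeholder for the dead node's address:

    # on the fresh replacement node, before its first start:
    JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=10.0.0.12"

The new node then bootstraps with the dead node's exact tokens, which is the
single streaming operation described above.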

=Rob


Re: Frequent timeout issues

2015-04-01 Thread Robert Coli
On Wed, Apr 1, 2015 at 8:37 AM, Amlan Roy amlan@cleartrip.com wrote:

 Replication factor is 2.


It is relatively unusual for people to use a replication factor of 2, for
what it's worth: with RF=2, a QUORUM read or write needs both replicas, so
it tolerates no node failures.

=Rob


Re: Table design for historical data

2015-04-01 Thread Eric R Medley
Firdousi,

What kind of events would be stored in the table? Will you be writing an event 
when a device connects and another when it disconnects or will you write a 
single event after the device finally disconnects? Also, for your queries, do 
you want ad-hoc start and end times or do you have a known fixed sized interval 
that could be used to put these events into buckets? How many events would you 
expect during a fixed interval or how many events per second? Do you know all 
of the ways that you want to query this data? You may want to write the events 
out to separate tables in order to satisfy those queries.

Regards,

Eric R Medley

 On Apr 1, 2015, at 10:54 AM, Firdousi Farozan ffaro...@gmail.com wrote:
 
 [quoted message snipped]
 
 



Re: Frequent timeout issues

2015-04-01 Thread Anuj Wadehra
Are you writing to multiple CFs at the same time?

Please run nodetool tpstats to make sure that FlushWriter etc. don't show
high "All time blocked" counts. A blocked memtable FlushWriter may block or
drop writes; if that's the case you may need to increase
memtable_flush_writers. If you have many secondary indexes on a CF, make
sure memtable_flush_queue_size is set to at least the number of indexes.

Monitoring iostat and the GC logs may help, too.
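
A quick sketch of what to look for (column names as in 2.0.x tpstats; the
numbers here are made up):

    $ nodetool tpstats
    Pool Name     Active  Pending  Completed  Blocked  All time blocked
    FlushWriter   1       5        12345      1        42

A non-zero, growing "All time blocked" for FlushWriter is the signal that
memtable_flush_writers / memtable_flush_queue_size in cassandra.yaml may
need raising.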


Thanks

Anuj Wadehra 

From: Amlan Roy amlan@cleartrip.com
Date: Wed, 1 Apr, 2015 at 9:27 pm
Subject: Re: Frequent timeout issues

[quoted messages snipped]




Re: Testing sstableloader between Cassandra 2.1 DSE and community edition 2.1

2015-04-01 Thread Michael Shuler

On 04/01/2015 08:10 AM, Serega Sheypak wrote:

Sorry
cluster1 community version is: ii  cassandra  2.1.3  distributed storage system for structured data
cluster2 DSE version is: ii  dse-libcassandra  4.6.2-1  The DataStax Enterprise package includes a production-certifie


Mixing versions of Cassandra may not work out well and isn't
recommended. DSE 4.6 is Cassandra 2.0.x.


--
Michael


Re: Testing sstableloader between Cassandra 2.1 DSE and community edition 2.1

2015-04-01 Thread Serega Sheypak
Got it.

2015-04-01 20:39 GMT+02:00 Michael Shuler mich...@pbandjelly.org:

 [quoted message snipped]



Re: Cross-datacenter requests taking a very long time.

2015-04-01 Thread Bharatendra Boddu
What type of snitch are you using for cassandra.yaml: endpoint_snitch ?
PropertyFileSnitch can improve performance.
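
For context, PropertyFileSnitch reads conf/cassandra-topology.properties on
every node; a minimal sketch, with illustrative addresses and DC/rack names:

    # ip_address=datacenter:rack
    192.168.1.10=DC1:RAC1
    192.168.2.10=DC2:RAC1
    # nodes not listed fall back to:
    default=DC1:RAC1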

- bharat

On Tue, Mar 31, 2015 at 1:59 PM, daemeon reiydelle daeme...@gmail.com
wrote:

 What is your replication factor?

 Any idea how much data has to be processed by the query?

 With that few nodes (3) in each DC, even with replication=1, you are
 probably not getting much inter-node data transfer in a local quorum;
 until, of course, you cross data centers and at least one full copy of the
 data has to come across the wire.

 While running the query against both DCs, you can take a look at netstats
 to get a really quick-and-dirty idea of network traffic.



 “Life should not be a journey to the grave with the intention of arriving
 safely in a pretty and well preserved body, but rather to skid in broadside
 in a cloud of smoke, thoroughly used up, totally worn out, and loudly
 proclaiming ‘Wow! What a Ride!’” - Hunter Thompson

 Daemeon C.M. Reiydelle
 USA (+1) 415.501.0198
 London (+44) (0) 20 8144 9872

 On Tue, Mar 31, 2015 at 1:54 PM, Andrew Vant andrew.v...@rackspace.com
 wrote:

 I have a Cassandra 2.0.13 cluster with three datacenters, three nodes per
 datacenter. If I open cqlsh and do a select with any consistency level that
 crosses datacenters (e.g. QUORUM or ALL), it works, but takes 2+ minutes to
 return. The same statement with consistency ONE or LOCAL_QUORUM is as fast
 as it should be. It does not appear to be latency between centers; I can
 point cqlsh at a server in a different DC and it's not noticeably slow.

 I tried turning tracing on to get a better idea of what was happening,
 but it complains `Session <long hex string> wasn't found`.

 I'm not entirely sure what direction to look in to find the problem.

 --

 Andrew





Re: SSTable structure

2015-04-01 Thread Bharatendra Boddu
Hi Serega,

Most of the content in the blog article is still relevant. After 1.2.5
(ic), there are only three new versions (ja, jb, ka) of the SSTable format.
Following are the changes in these versions:

// ja (2.0.0): super columns are serialized as composites
//             (note that there is no real format change, this is mostly a
//             marker to know if we should expect super columns or not. We
//             do need a major version bump however, because we should not
//             allow streaming of super columns into this new format)
//             tracks max local deletiontime in sstable metadata
//             records bloom_filter_fp_chance in metadata component
//             remove data size and column count from data file (CASSANDRA-4180)
//             tracks max/min column values (according to comparator)
// jb (2.0.1): switch from crc32 to adler32 for compression checksums
//             checksum the compressed data
// ka (2.1.0): new Statistics.db file format
//             index summaries can be downsampled and the sampling level is
//             persisted
//             switch uncompressed checksums to adler32
//             tracks presence of legacy (local and remote) counter shards

- bharat

On Wed, Apr 1, 2015 at 12:02 AM, Serega Sheypak serega.shey...@gmail.com
wrote:

 Hi bharat,
 you are talking about Cassandra 1.2.5. Does it still apply to Cassandra
 2.1? Were there any significant changes to the SSTable format and layout?
 Thank you, the article is interesting.

 Hi jacob jacob.rho...@me.com,
 HBase does it, for example:
 http://hbase.apache.org/book.html#_hfile_format_2
 It would be great to give the general ideas. It could help with
 understanding schema design problems: you start to understand better how
 Cassandra scans data and how you can utilize its power.

 2015-04-01 5:39 GMT+02:00 Bharatendra Boddu bharatend...@gmail.com:

 Some time back I created a blog article about the SSTable storage format
 with some code references.

 Cassandra: SSTable Storage Format
 http://distributeddatastore.blogspot.com/2013/08/cassandra-sstable-storage-format.html

 - bharat

 On Mon, Mar 30, 2015 at 5:24 PM, Jacob Rhoden jacob.rho...@me.com
 wrote:

 Yes, updating code and documentation together can sometimes be annoying;
 you would only ever maintain both if it were important. It comes down to:
 is having the format of the data files documented for everyone to
 understand an important thing?

 __
 Sent from iPhone

 On 31 Mar 2015, at 11:07 am, daemeon reiydelle daeme...@gmail.com
 wrote:

 Why? Then there are two places to maintain, or to get JIRA'ed for a
 discrepancy.
 On Mar 30, 2015 4:46 PM, Robert Coli rc...@eventbrite.com wrote:

 On Mon, Mar 30, 2015 at 1:38 AM, Pierre pierredev...@gmail.com wrote:

 Does anyone know if there is more complete and up-to-date documentation
 about the sstable file structure (data, index, stats, etc.) than this one:
 http://wiki.apache.org/cassandra/ArchitectureSSTable


 No, there isn't. Unfortunately you will have to read the source.


 I'm looking for a full specification, with schema of the structure if
 possible.


 It would be nice if such fundamental things were documented, wouldn't
 it?

 =Rob







Re: Why select returns tombstoned results?

2015-04-01 Thread Benyi Wang
All servers are running ntpd, so I expect the time to be synced across all
servers.

My dataset is too large for sstable2json; it would take a long time.

I will try to repair to see if the issue is gone.
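
For reference, a typical invocation (keyspace name hypothetical; run it on
every node, with -pr so each node repairs only its primary ranges):

    nodetool repair -pr mykeyspace tomb_test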

On Tue, Mar 31, 2015 at 7:49 AM, Ken Hancock ken.hanc...@schange.com
wrote:

 Have you checked time sync across all servers?  The fact that you've
 changed consistency levels and are getting different results may indicate
 something inherently wrong with the cluster, such as writes being dropped
 or time differences between the nodes.

 A brute-force approach to better understand what's going on (especially if
 you have an example of the wrong data being returned) is to run
 sstable2json on all your tables and simply grep for an example key.
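
 A minimal sketch of that approach (paths and key are hypothetical, and each
 table typically has several -Data.db files to check):

     sstable2json /var/lib/cassandra/data/ks/tomb_test/ks-tomb_test-ka-1-Data.db \
         > tomb_test-1.json
     grep 'guid-1' tomb_test-1.json   # deleted rows carry deletion markers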

 On Mon, Mar 30, 2015 at 4:39 PM, Benyi Wang bewang.t...@gmail.com wrote:

 Thanks for replying.

 In cqlsh, if I change to QUORUM (CONSISTENCY QUORUM), sometimes the select
 returns the deleted row, sometimes not.

 I have two virtual data centers: service (3 nodes) and analytics (4 nodes
 co-located with Hadoop data nodes). The table has 3 replicas in service and
 2 in analytics. When I wrote, I wrote into analytics using LOCAL_ONE. So I
 guess the data may not have replicated to all nodes yet.

 I will try to use strong consistency for writes.
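
 A minimal sketch of what that looks like with the DataStax Java driver 2.x
 (contact point, keyspace, and the choice of consistency level are
 assumptions to adapt to your topology):

     import com.datastax.driver.core.*;

     public class StrongDelete {
         public static void main(String[] args) {
             Cluster cluster = Cluster.builder()
                     .addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("mykeyspace");
             PreparedStatement ps = session.prepare(
                 "DELETE FROM tomb_test WHERE guid=? AND content=? AND range=?");
             BatchStatement batch =
                 new BatchStatement(BatchStatement.Type.UNLOGGED);
             batch.add(ps.bind("guid-1", "content-1", "week"));
             // LOCAL_QUORUM covers the local DC; use EACH_QUORUM if readers
             // in both DCs must see the delete immediately.
             batch.setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM);
             session.execute(batch);
             cluster.close();
         }
     }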



 On Mon, Mar 30, 2015 at 11:59 AM, Prem Yadav ipremya...@gmail.com
 wrote:

 Increase the read CL to quorum and you should get correct results.
 How many nodes do you have in the cluster and what is the replication
 factor for the keyspace?
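
 In cqlsh that is simply:

     CONSISTENCY QUORUM;

 run before the SELECT.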

 On Mon, Mar 30, 2015 at 7:41 PM, Benyi Wang bewang.t...@gmail.com
 wrote:

 CREATE TABLE tomb_test (
    guid text,
    content text,
    range text,
    rank int,
    id text,
    cnt int,
    PRIMARY KEY (guid, content, range, rank)
 )

 Sometimes I delete rows with the Cassandra Java driver using this query

 DELETE FROM tomb_test WHERE guid=? and content=? and range=?

 in an UNLOGGED batch statement, at consistency level LOCAL_ONE.

 But if I run

 SELECT * FROM tomb_test WHERE guid='guid-1' and content='content-1' and
 range='week'
 or
 SELECT * FROM tomb_test WHERE guid='guid-1' and content='content-1' and
 range='week' and rank = 1

 The result shows the deleted rows.

 If I run this select, the deleted rows are not shown

 SELECT * FROM tomb_test WHERE guid='guid-1' and content='content-1'

 If I run the delete statement in cqlsh, the deleted rows won't show up.

 How can I fix this?






 --
 Ken Hancock | System Architect, Advanced Advertising
 SeaChange International
 50 Nagog Park
 Acton, Massachusetts 01720
 ken.hanc...@schange.com | www.schange.com | NASDAQ:SEAC
 http://www.schange.com/en-US/Company/InvestorRelations.aspx
 Office: +1 (978) 889-3329 | Google Talk: ken.hanc...@schange.com
 Skype: hancockks | Yahoo IM: hancockks
 LinkedIn: http://www.linkedin.com/in/kenhancock




Re: How to store unique visitors in cassandra

2015-04-01 Thread Jim Ancona
Very interesting. I had saved your email from three years ago in hopes of
an elegant answer. Thanks for sharing!

Jim

On Tue, Mar 31, 2015 at 8:16 AM, Alain RODRIGUEZ arodr...@gmail.com wrote:

 People keep asking me if we finally found a solution (even though this
 thread is 3+ years old), so I will just update it with our findings.

 We finally achieved this thanks to our big data and reporting stacks, by
 storing blobs corresponding to HLL (HyperLogLog) structures. HLL is an
 algorithm used by Google, Twitter and many others to solve count-distinct
 problems. Structures built with this algorithm can be summed and give a
 good approximation of the UV (unique visitor) count.

 The precision you reach depends on the size of the structure you choose
 (the precision is predictable). You can get a fairly acceptable
 approximation with small data structures.

 So we basically store one HLL per hour and just sum the HLLs for all the
 hours between the two ends of the requested range (you can do it at day
 level or any other granularity depending on your needs).
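
 For the curious, a minimal sketch of the idea using one widely used
 open-source HLL implementation, stream-lib (com.clearspring.analytics);
 the library choice, table layout and all names here are illustrative
 assumptions, not necessarily what we use in production:

     import com.clearspring.analytics.stream.cardinality.HyperLogLog;
     import com.clearspring.analytics.stream.cardinality.ICardinality;

     public class HourlyUV {
         public static void main(String[] args) throws Exception {
             // One HLL per hour; 0.01 = ~1% relative standard deviation.
             HyperLogLog hour1 = new HyperLogLog(0.01);
             HyperLogLog hour2 = new HyperLogLog(0.01);
             hour1.offer("visitor-1");
             hour1.offer("visitor-2");
             hour2.offer("visitor-2"); // repeat visitor, counted once overall

             // Serialize and store as a Cassandra blob, e.g. in a table like
             // CREATE TABLE uv (metric text, hour timestamp, hll blob,
             //                  PRIMARY KEY (metric, hour));
             byte[] blob = hour1.getBytes();

             // At query time: read the blobs for the requested hours,
             // rebuild and merge them, then ask for the cardinality.
             HyperLogLog restored = HyperLogLog.Builder.build(blob);
             ICardinality total = restored.merge(hour2);
             System.out.println("approx UV: " + total.cardinality()); // ~2
         }
     }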

 Hope this helps some of you; we finally had this (good) idea after more
 than 3 years. We had actually been using HLL for a long time, but storing
 the HLL structures instead of the counts lets us query custom ranges (at
 the price of more intelligence in the reporting stack, which must read and
 smartly sum the HLLs stored as blobs). We have been happy with it since.

 C*heers,

 Alain

 2012-01-19 22:21 GMT+01:00 Milind Parikh milindpar...@gmail.com:

 You might want to look at the code in countandra.org, regardless of
 whether you use it. It uses a model of dynamic composite keys (although
 static composite keys would have worked as well). For the actual query,
 only one row is hit. This of course only works because the data model is
 attuned to the query.

 Regards
 Milind

 /***
 sent from my android...please pardon occasional typos as I respond @ the
 speed of thought
 /

 On Jan 19, 2012 1:31 AM, Alain RODRIGUEZ arodr...@gmail.com wrote:

 Hi, thanks for your answer, but I don't want to add another layer on top
 of Cassandra. I have also built all of my application without Countandra
 and I would like to continue this way.

 Furthermore there is a Cassandra modeling problem that I would like to
 solve, and not just hide.

 Alain



 2012/1/18 Lucas de Souza Santos lucas...@gmail.com
 
  Why not http://www.countandra.org/
 
 
  ...