Re: Empty snapshot created

2016-06-09 Thread Mradul Maheshwari
Hi Ben,
Many thanks.
I had tried this command earlier too, but I guess my syntax was wrong. This
time I just issued
nodetool rebuild dc1
and took the snapshot after this. I could see the *.db files created.

Earlier I was trying
nodetool rebuild -- dc1
which I guess was wrong.
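
For the record, here is the sequence that worked for me, run on the dc2 node
(the data directory path and the snapshot tag below are placeholders, not
copied from my setup):

nodetool rebuild dc1
nodetool snapshot other_map
ls <data_dir>/other_map/country-*/snapshots/<snapshot_tag>/
# the snapshot directory now contains the *.db files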

Thanks for the lightning response.

Regards,
Mradul


On Fri, Jun 10, 2016 at 9:52 AM, Ben Slater 
wrote:

> After adding a DC you need to run nodetool rebuild. See the procedure
> here:
> https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_add_dc_to_cluster_t.html
>
> Cheers
> Ben
>
> On Fri, 10 Jun 2016 at 14:17 Mradul Maheshwari 
> wrote:
>
>> Hi,
>> I am facing an issue when taking snapshots.
>>
>> The details of the setup are as follows
>>
>>1. Cassandra Version 3.5
>>2. I have a keyspace named *other_map* with '*NetworkTopologyStrategy*'
>>and replication factor 1 for 'dc1'
>>3. Added another datacenter 'dc2' in the existing cluster
>>4. Modified the other_map keyspace using the *ALTER* command.
>>5. After this, logged on to a node in the dc2 datacenter and issued the
>>*nodetool snapshot* command for the other_map keyspace.
>>6. As a result, a directory is created under
>><data_dir>/other_map/<table>/snapshots/<snapshot name>/. It contains only a
>>manifest.json file which lists no files.
>>
>> 
>> *cat
>> data/data/other_map/country-f34a28d02b1511e689afc7a4a4b2ee40/snapshots/1465457086678/manifest.json*
>>
>> *{"files":[]}*
>>
>> 
>>
>> Am I missing anything here? Are the above-mentioned steps complete?
>>
>> After altering the keyspace I also tried a *nodetool repair* command,
>> which did not change anything either.
>>
>> Regards,
>> Mradul
>>
>>
>> Information about schema follows
>>
>> CREATE KEYSPACE other_map WITH replication = {'class':
>> 'NetworkTopologyStrategy', 'dc': '1', 'dc2': '2'}  AND durable_writes =
>> true;
>>
>> CREATE TABLE other_map.country (
>> id int PRIMARY KEY,
>> name text,
>> states int
>> ) WITH bloom_filter_fp_chance = 0.01
>> AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
>> AND comment = ''
>> AND compaction = {'class':
>> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
>> 'max_threshold': '32', 'min_threshold': '4'}
>> AND compression = {'chunk_length_in_kb': '64', 'class':
>> 'org.apache.cassandra.io.compress.LZ4Compressor'}
>> AND crc_check_chance = 1.0
>> AND dclocal_read_repair_chance = 0.1
>> AND default_time_to_live = 0
>> AND gc_grace_seconds = 864000
>> AND max_index_interval = 2048
>> AND memtable_flush_period_in_ms = 0
>> AND min_index_interval = 128
>> AND read_repair_chance = 0.0
>> AND speculative_retry = '99PERCENTILE';
>>
>> --
> 
> Ben Slater
> Chief Product Officer, Instaclustr
> +61 437 929 798
>


Re: Empty snapshot created

2016-06-09 Thread Ben Slater
After adding a DC you need to run nodetool rebuild. See the procedure here:
https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_add_dc_to_cluster_t.html
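
Roughly, the linked procedure boils down to the following (a condensed
sketch; substitute your own DC and keyspace names):

1. Configure each new dc2 node before starting it: same cluster_name, seeds
   pointing at the existing cluster, auto_bootstrap: false, and dc=dc2 in
   cassandra-rackdc.properties (GossipingPropertyFileSnitch); then start it.
2. Add the new DC to the keyspace's replication settings, e.g.:
   ALTER KEYSPACE other_map WITH replication =
     {'class': 'NetworkTopologyStrategy', 'dc1': 1, 'dc2': 1};
3. On each new dc2 node, stream the existing data across:
   nodetool rebuild dc1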

Cheers
Ben

On Fri, 10 Jun 2016 at 14:17 Mradul Maheshwari  wrote:

> Hi,
> I am facing an issue when taking snapshots.
>
> The details of the setup are as follows
>
>1. Cassandra Version 3.5
>2. I have a keyspace named *other_map* with '*NetworkTopologyStrategy*'
>and replication factor 1 for 'dc1'
>3. Added another datacenter 'dc2' in the existing cluster
>4. Modified the other_map keyspace using the *ALTER* command.
>5. After this, logged on to a node in the dc2 datacenter and issued the
>*nodetool snapshot* command for the other_map keyspace.
>6. As a result, a directory is created under
><data_dir>/other_map/<table>/snapshots/<snapshot name>/. It contains only a
>manifest.json file which lists no files.
>
> 
> *cat
> data/data/other_map/country-f34a28d02b1511e689afc7a4a4b2ee40/snapshots/1465457086678/manifest.json*
>
> *{"files":[]}*
>
> 
>
> Am I missing anything here? Are the above-mentioned steps complete?
>
> After altering the keyspace I also tried a *nodetool repair* command,
> which did not change anything either.
>
> Regards,
> Mradul
>
>
> Information about schema follows
>
> CREATE KEYSPACE other_map WITH replication = {'class':
> 'NetworkTopologyStrategy', 'dc': '1', 'dc2': '2'}  AND durable_writes =
> true;
>
> CREATE TABLE other_map.country (
> id int PRIMARY KEY,
> name text,
> states int
> ) WITH bloom_filter_fp_chance = 0.01
> AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
> AND comment = ''
> AND compaction = {'class':
> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
> 'max_threshold': '32', 'min_threshold': '4'}
> AND compression = {'chunk_length_in_kb': '64', 'class':
> 'org.apache.cassandra.io.compress.LZ4Compressor'}
> AND crc_check_chance = 1.0
> AND dclocal_read_repair_chance = 0.1
> AND default_time_to_live = 0
> AND gc_grace_seconds = 864000
> AND max_index_interval = 2048
> AND memtable_flush_period_in_ms = 0
> AND min_index_interval = 128
> AND read_repair_chance = 0.0
> AND speculative_retry = '99PERCENTILE';
>
> --

Ben Slater
Chief Product Officer, Instaclustr
+61 437 929 798


Empty snapshot created

2016-06-09 Thread Mradul Maheshwari
Hi,
I am facing an issue when taking snapshots.

The details of the setup are as follows

   1. Cassandra Version 3.5
   2. I have a keyspace named *other_map* with '*NetworkTopologyStrategy*'
   and replication factor 1 for 'dc1'
   3. Added another datacenter 'dc2' in the existing cluster
   4. Modified the other_map keyspace using the *ALTER* command.
   5. After this, logged on to a node in the dc2 datacenter and issued the
   *nodetool snapshot* command for the other_map keyspace.
   6. As a result, a directory is created under
   <data_dir>/other_map/<table>/snapshots/<snapshot name>/. It contains only
   a manifest.json file which lists no files.


*cat
data/data/other_map/country-f34a28d02b1511e689afc7a4a4b2ee40/snapshots/1465457086678/manifest.json*

*{"files":[]}*



Am I missing anything here? Are the above-mentioned steps complete?

After altering the keyspace I also tried a *nodetool repair* command, which
did not change anything either.

Regards,
Mradul


Information about schema follows

CREATE KEYSPACE other_map WITH replication = {'class':
'NetworkTopologyStrategy', 'dc': '1', 'dc2': '2'}  AND durable_writes =
true;

CREATE TABLE other_map.country (
id int PRIMARY KEY,
name text,
states int
) WITH bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class':
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class':
'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
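
For completeness, the ALTER in step 4 was along these lines (reconstructed
from the replication map shown above, not the exact statement I ran):

ALTER KEYSPACE other_map WITH replication =
  {'class': 'NetworkTopologyStrategy', 'dc': '1', 'dc2': '2'};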


Re: Interesting use case

2016-06-09 Thread John Thomas
The example I gave was for N=1; if we need to save more values I planned to
just add more columns.

On Thu, Jun 9, 2016 at 12:51 AM, kurt Greaves  wrote:

> I would say it's probably due to a significantly larger number of
> partitions when using the overwrite method - but really you should be
> seeing similar performance unless one of the schemas ends up generating a
> lot more disk IO.
> If you're planning to read the last N values for an event at the same time
> the widerow schema would be better, otherwise reading N events using the
> overwrite schema will result in you hitting N partitions. You really need
> to take into account how you're going to read the data when you design a
> schema, not only how many writes you can push through.
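>
> For the widerow schema that last-N read is a single-partition slice, along
> the lines of the following (a sketch using the table from your mail below;
> the literal values are just examples):
>
> SELECT event_time, event_value
> FROM eventvalue_widerow
> WHERE system_name = 'sys1' AND event_name = 'evt1'
> LIMIT 10;   -- newest first, thanks to CLUSTERING ORDER BY (event_time DESC)
>
> Reading N different events with the overwrite schema, by contrast, means
> hitting N separate partitions.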
>
> On 8 June 2016 at 19:02, John Thomas  wrote:
>
>> We have a use case where we are storing event data for a given system and
>> only want to retain the last N values.  Storing extra values for some time,
>> as long as it isn’t too long, is fine but never less than N.  We can't use
>> TTLs to delete the data because we can't be sure how frequently events will
>> arrive and could end up losing everything.  Is there any built in mechanism
>> to accomplish this or a known pattern that we can follow?  The events will
>> be read and written at a pretty high frequency so the solution would have
>> to be performant and not fragile under stress.
>>
>>
>>
>> We’ve played with a schema that just has N distinct columns with one
>> value in each but have found overwrites seem to perform much poorer than
>> wide rows.  The use case we tested only required we store the most recent
>> value:
>>
>>
>>
>> CREATE TABLE eventyvalue_overwrite(
>> system_name text,
>> event_name text,
>> event_time timestamp,
>> event_value blob,
>> PRIMARY KEY (system_name,event_name))
>>
>> CREATE TABLE eventvalue_widerow (
>> system_name text,
>> event_name text,
>> event_time timestamp,
>> event_value blob,
>> PRIMARY KEY ((system_name, event_name), event_time))
>> WITH CLUSTERING ORDER BY (event_time DESC)
>>
>>
>>
>> We tested it against the DataStax AMI on EC2 with 6 nodes, replication 3,
>> write consistency 2, and default settings, with a write-only workload, and
>> got 190K/s for the wide-row schema and 150K/s for overwrite. Thinking
>> through the write path, the performance seems like it should be pretty
>> similar, with probably smaller sstables for the overwrite schema; can
>> anyone explain the big difference?
>>
>>
>>
>> The wide-row solution is more complex in that it requires a separate
>> cleanup thread to handle deleting the extra values. If that's the path we
>> have to follow, we're thinking we'd add a bucket of some sort so that we
>> can delete an entire partition at a time after copying some values forward,
>> on the assumption that deleting the whole partition is much better than
>> deleting some slice of it. Is that true? Also, is there any difference
>> between setting a really short TTL and doing a delete?
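>>
>> Concretely, something like this is what we had in mind (the bucket column
>> and the literal values are just illustrative):
>>
>> CREATE TABLE eventvalue_widerow_bucketed (
>> system_name text,
>> event_name text,
>> bucket int,   -- e.g. a coarse time window or a rolling counter
>> event_time timestamp,
>> event_value blob,
>> PRIMARY KEY ((system_name, event_name, bucket), event_time))
>> WITH CLUSTERING ORDER BY (event_time DESC);
>>
>> -- cleanup: copy the newest values into the current bucket, then drop the
>> -- whole old bucket with a single partition-level delete
>> DELETE FROM eventvalue_widerow_bucketed
>> WHERE system_name = 'sys1' AND event_name = 'evt1' AND bucket = 41;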
>>
>>
>>
>> I know there are a lot of questions in there but we’ve been going back
>> and forth on this for a while and I’d really appreciate any help you could
>> give.
>>
>>
>>
>> Thanks,
>>
>> John
>>
>
>
>
> --
> Kurt Greaves
> k...@instaclustr.com
> www.instaclustr.com
>


Re: Consistency level ONE and using withLocalDC

2016-06-09 Thread George Sigletos
Hi Alain,

Thank you for your answer.

I recently queried my cluster multiple times with consistency ONE and
withLocalDC set to "myLocalDC" (withUsedHostsPerRemoteDc=1).

However, sometimes (not always) I got the response from a node in the remote
DC, even though all my nodes in "myLocalDC" were up and running.

I was facing a data inconsistency issue: when connecting to the remote node
I got an empty result, while when connecting to "myLocalDC" I got the
expected result back.

I was expecting that, since all nodes in "myLocalDC" were up and running, no
request would have been sent to the remote node.

I worked around the problem by setting consistency "LOCAL_ONE" until I can
repair the remote node. Alternatively, I could have set
withUsedHostsPerRemoteDc=0.
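
In cqlsh terms, the interim workaround plus the eventual fix look roughly
like this (the keyspace and table names are just placeholders):

CONSISTENCY LOCAL_ONE;
SELECT * FROM my_keyspace.my_table WHERE id = 123;

nodetool repair my_keyspace   # run on the out-of-sync remote node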

Kind regards,
George

On Wed, Jun 8, 2016 at 7:10 PM, Alain RODRIGUEZ  wrote:

> Hi George,
>
> Would that be correct?
>
>
> I think it is actually quite the opposite :-).
>
> It is very well explained here:
> https://docs.datastax.com/en/drivers/java/2.0/com/datastax/driver/core/policies/DCAwareRoundRobinPolicy.Builder.html#withUsedHostsPerRemoteDc-int-
>
> Connections are opened to the X nodes in the remote DC, but they will only
> be used as a fallback, and only if the operation is not using a LOCAL_*
> consistency level.
>
> Sorry for taking so long to answer you.
>
> ---
> Alain Rodriguez - al...@thelastpickle.com
> France
>
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> 2016-05-20 17:54 GMT+02:00 George Sigletos :
>
>> Hello,
>>
>> Using withLocalDC="myLocalDC" and withUsedHostsPerRemoteDc>0 will
>> guarantee that you will connect to one of the nodes in "myLocalDC",
>>
>> but DOES NOT guarantee that your read/write request will be acknowledged
>> by a "myLocalDC" node. It may well be acknowledged by a remote DC node as
>> well, even if "myLocalDC" is up and running.
>>
>> Would that be correct? Thank you
>>
>> Kind regards,
>> George
>>
>
>