Rows with same key

2016-02-11 Thread Yulian Oifa
Hello to all
I have multiple rows with the same id in one of the CFs; one row is completely
empty, the other one has values.
Values are written into the new row, however they are retrieved from the old
row...
I guess one row was created due to removed values and got stuck somehow.
I have tried to remove it with no luck (compact, flush, repair, etc.).
I have set gc grace on this CF, however I believe the old row still holds the
old value.
How can I get rid of this row?
Best regards
Yulian Oifa
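
A possible cleanup route, sketched in CQL for whoever finds this thread later (the keyspace and table names are illustrative, and the Cassandra version in use here is unknown): temporarily lower gc_grace_seconds so the tombstones covering the stale row can actually be purged, force a major compaction, then restore the original value. Note that lowering gc_grace_seconds is only safe if every replica has already seen the deletes, otherwise deleted data can reappear.

ALTER TABLE my_keyspace.my_cf WITH gc_grace_seconds = 3600;
-- run "nodetool compact my_keyspace my_cf" on each node while gc_grace is low,
-- then put the original value back:
ALTER TABLE my_keyspace.my_cf WITH gc_grace_seconds = 864000;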


Re: Replication Factor Change

2015-11-05 Thread Yulian Oifa
Hello
OK, I got it, so I should set the CL to ALL for reads; otherwise data may be
retrieved from nodes that do not yet have the current record.
Thanks for the help.
Yulian Oifa

On Thu, Nov 5, 2015 at 5:33 PM, Eric Stevens  wrote:

> If you switch reads to CL=LOCAL_ALL, you should be able to increase RF,
> then run repair, and after repair is complete, go back to your old
> consistency level.  However, while you're operating at ALL consistency, you
> have no tolerance for a node failure (but at RF=1 you already have no
> tolerance for a node failure, so that doesn't really change your
> availability model).
>
> On Thu, Nov 5, 2015 at 8:01 AM Yulian Oifa  wrote:
>
>> Hello to all.
>> I am planning to change replication factor from 1 to 3.
>> Will it cause data read errors during node repair?
>>
>> Best regards
>> Yulian Oifa
>>
>
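
The mechanics Eric describes would look roughly like this in CQL (the keyspace name and replication strategy are illustrative; adjust to the real topology):

ALTER KEYSPACE my_keyspace
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
-- while reads run at CL=ALL, run "nodetool repair" on every node,
-- then drop the read consistency level back once repair has finished.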


Replication Factor Change

2015-11-05 Thread Yulian Oifa
Hello to all.
I am planning to change replication factor from 1 to 3.
Will it cause data read errors during node repair?

Best regards
Yulian Oifa


Re: Composite Keys in cassandra 1.2

2015-03-03 Thread Yulian Oifa
Hello
Initially the problem is that the customer wants to have the option of ANY
query, which does not fit well with NoSQL. However, the size of the data is
too big for a relational DB.
There are no typical queries on the data; there are 10 fields, based on which
(any mix of them as well) queries should be made.
Until now I allowed only single-field queries (in specific cases with 2
fields), so I had index CFs for each field and that solved the problem. Since
I now need compounds, sometimes of 5-6 fields, I either need to iterate over
an index CF based on some column (or maybe read several indexes and find the
common ids), or create some index that will allow me to read data based on any
part of the key. Creating an index for each group of fields is of course not
an option, since the number of indexes would be huge and the disk usage would
be too big.

Best regards
Yulian Oifa

On Mon, Mar 2, 2015 at 5:33 PM, Kai Wang  wrote:

> AFAIK it's not possible. The fact you need to query the data by partial row
> key indicates your data model isn't proper. What are your typical queries
> on the data?
>
> On Sun, Mar 1, 2015 at 7:24 AM, Yulian Oifa  wrote:
>
>> Hello to all.
>> Let's assume a scenario where the key is a compound type with 3 types in it
>> (Long, UTF8, UTF8).
>> Each row stores timeuuids as column names and empty values.
>> Is it possible to retrieve data by a single key part (for example by the
>> Long only) using Java Thrift?
>>
>> Best regards
>> Yulian Oifa
>>
>>
>>
>
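
For reference, one way to make the Long component queryable on its own is to model it as the partition key and the two UTF8 parts as clustering columns in a CQL 3 table (a sketch only; the names are illustrative, and it puts everything sharing the same Long value into a single partition, so it only helps if those partitions stay reasonably small):

CREATE TABLE field_index (
    id     bigint,
    part1  text,
    part2  text,
    event  timeuuid,
    PRIMARY KEY (id, part1, part2, event)
);

-- querying by the leading key component alone is then allowed:
SELECT * FROM field_index WHERE id = 12345;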


Re: best practices for time-series data with massive amounts of records

2015-03-03 Thread Yulian Oifa
Hello
You can use a timeuuid as the row key and create a separate CF to be used for
indexing.
The indexing CF may either use user_id as the key, or - a better approach -
partition the row by timestamp.
In case of partitioning you can create a compound key in which you store the
user_id and a timestamp base. For example, if you keep only the leading 5 of
the 13 digits of the millisecond timestamp, a new row is created every 10^5
seconds - approximately each day, a bit more - and the maximum number of rows
per user would be 100K. Of course you can play with the number of rows / the
time span of each row depending on the number of records you are receiving; I
am creating a new row every 11 days, so it is about 35 rows per year, per user.
In each column you can store a timeuuid as the name and an empty value.

This way you keep your data ordered by time. The only disadvantage of this
approach is that you have to "glue" your data together when you finish reading
one index row and start on the next one (both asc and desc).

When reading data you should first get a slice from the index depending on
your needs, and then a multi_range read from the original CF based on the
slice received.
Hope it helps
Best regards
Yulian Oifa
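
A rough CQL equivalent of the bucketed index described above (the original uses Thrift column families; the table name, types and bucket width here are illustrative):

CREATE TABLE user_event_index (
    user_id   bigint,
    bucket    bigint,   -- truncated timestamp, e.g. event_time_ms / 86400000 for roughly daily rows
    event_id  timeuuid,
    PRIMARY KEY ((user_id, bucket), event_id)
) WITH CLUSTERING ORDER BY (event_id DESC);

-- read one bucket at a time, newest first, and "glue" buckets together in the client:
SELECT event_id FROM user_event_index WHERE user_id = 42 AND bucket = 16512 LIMIT 100;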



On Mon, Mar 2, 2015 at 9:47 PM, Clint Kelly  wrote:

> Hi all,
>
> I am designing an application that will capture time series data where we
> expect the number of records per user to potentially be extremely high.  I
> am not sure if we will eclipse the max row size of 2B elements, but I
> assume that we would not want our application to approach that size anyway.
>
> If we wanted to put all of the interactions in a single row, then I would
> make a data model that looks like:
>
> CREATE TABLE events (
>   id text,
>   event_time timestamp,
>   event blob,
>   PRIMARY KEY (id, event_time))
> WITH CLUSTERING ORDER BY (event_time DESC);
>
> The best practice for breaking up large rows of time series data is, as I
> understand it, to put part of the time into the partitioning key (
> http://planetcassandra.org/getting-started-with-time-series-data-modeling/
> ):
>
> CREATE TABLE events (
>   id text,
>   date text, // Could also use year+month here or year+week or something
> else
>   event_time timestamp,
>   event blob,
>   PRIMARY KEY ((id, date), event_time))
> WITH CLUSTERING ORDER BY (event_time DESC);
>
> The downside of this approach is that we can no longer do a simple
> continuous scan to get all of the events for a given user.  Some users may
> log lots and lots of interactions every day, while others may interact with
> our application infrequently, so I'd like a quick way to get the most
> recent interaction for a given user.
>
> Has anyone used different approaches for this problem?
>
> The only thing I can think of is to use the second table schema described
> above, but switch to an order-preserving hashing function, and then
> manually hash the "id" field.  This is essentially what we would do in
> HBase.
>
> Curious if anyone else has any thoughts.
>
> Best regards,
> Clint
>
>
>
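
One common answer to the "most recent interaction" part of the question is a small denormalized table that is overwritten on every write, so the newest event is always a single-partition read no matter how the history table is bucketed (a sketch only; table and column names are illustrative):

CREATE TABLE latest_event (
    id          text PRIMARY KEY,
    event_time  timestamp,
    event       blob
);

-- written alongside every insert into the bucketed history table:
INSERT INTO latest_event (id, event_time, event)
VALUES ('user-123', '2015-03-02 12:00:00+0000', 0x00);
SELECT event_time, event FROM latest_event WHERE id = 'user-123';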


Composite Keys in cassandra 1.2

2015-03-01 Thread Yulian Oifa
Hello to all.
Let's assume a scenario where the key is a compound type with 3 types in it
(Long, UTF8, UTF8).
Each row stores timeuuids as column names and empty values.
Is it possible to retrieve data by a single key part (for example by the Long
only) using Java Thrift?

Best regards
Yulian Oifa


Re: Cassandra Read Timeout

2015-02-24 Thread Yulian Oifa
Hello
I am running 1.2.19
Best regards
Yulian Oifa

On Tue, Feb 24, 2015 at 6:57 PM, Marcelo Valle (BLOOMBERG/ LONDON) <
mvallemil...@bloomberg.net> wrote:

> Super column? Out of curiosity, which Cassandra version are you running?
>
>
> From: user@cassandra.apache.org
> Subject: Re: Cassandra Read Timeout
>
> Hello
> The structure is the same; the CFs are super column CFs, where the key is a
> long (a timestamp used to partition the index, so a new row is created every
> 11 days), the super column is an int32 and the columns / values are
> timeuuids. I am running the same queries, getting a reversed slice by row key
> and super column.
> The number of reads is relatively high on the second CF since I have been
> testing it for several hours already; most of the time there are no read
> requests on either of them, only writes. There is at most 1 read request
> every 20-30 seconds, so it should not create any load. There are also no
> reads (0 before and 1 after) pending in tpstats.
> Please also note that queries on a different row with the same super column,
> and on the same row with a different super column, are working, and if I am
> not mistaken Cassandra loads the complete row, including all super columns,
> into memory (so either every request to this row should fail if this were a
> memory problem, or none...).
>
> Best regards
> Yulian Oifa
>
>
>


Re: Cassandra Read Timeout

2015-02-24 Thread Yulian Oifa
Hello
The structure is the same; the CFs are super column CFs, where the key is a
long (a timestamp used to partition the index, so a new row is created every
11 days), the super column is an int32 and the columns / values are timeuuids.
I am running the same queries, getting a reversed slice by row key and super
column.
The number of reads is relatively high on the second CF since I have been
testing it for several hours already; most of the time there are no read
requests on either of them, only writes. There is at most 1 read request every
20-30 seconds, so it should not create any load. There are also no reads
(0 before and 1 after) pending in tpstats.
Please also note that queries on a different row with the same super column,
and on the same row with a different super column, are working, and if I am
not mistaken Cassandra loads the complete row, including all super columns,
into memory (so either every request to this row should fail if this were a
memory problem, or none...).

Best regards
Yulian Oifa


Re: Cassandra Read Timeout

2015-02-24 Thread Yulian Oifa
Hello
TP STATS Before Request:

Pool Name                    Active   Pending      Completed   Blocked  All time blocked
ReadStage                         0         0        7592835         0                 0
RequestResponseStage              0         0              0         0                 0
MutationStage                     0         0      215980736         0                 0
ReadRepairStage                   0         0              0         0                 0
ReplicateOnWriteStage             0         0              0         0                 0
GossipStage                       0         0              0         0                 0
AntiEntropyStage                  0         0              0         0                 0
MigrationStage                    0         0             28         0                 0
MemoryMeter                       0         0            474         0                 0
MemtablePostFlusher               0         0          32845         0                 0
FlushWriter                       0         0           4013         0              2239
MiscStage                         0         0              0         0                 0
PendingRangeCalculator            0         0              1         0                 0
commitlog_archiver                0         0              0         0                 0
InternalResponseStage             0         0              0         0                 0
HintedHandoff                     0         0              0         0                 0

Message type   Dropped
RANGE_SLICE  0
READ_REPAIR  0
BINARY   0
READ 0
MUTATION 0
_TRACE   0
REQUEST_RESPONSE 0
COUNTER_MUTATION 0

TP STATS After Request:

Pool Name                    Active   Pending      Completed   Blocked  All time blocked
ReadStage                         1         1        7592942         0                 0
RequestResponseStage              0         0              0         0                 0
MutationStage                     0         0      215983339         0                 0
ReadRepairStage                   0         0              0         0                 0
ReplicateOnWriteStage             0         0              0         0                 0
GossipStage                       0         0              0         0                 0
AntiEntropyStage                  0         0              0         0                 0
MigrationStage                    0         0             28         0                 0
MemoryMeter                       0         0            474         0                 0
MemtablePostFlusher               0         0          32845         0                 0
FlushWriter                       0         0           4013         0              2239
MiscStage                         0         0              0         0                 0
PendingRangeCalculator            0         0              1         0                 0
commitlog_archiver                0         0              0         0                 0
InternalResponseStage             0         0              0         0                 0
HintedHandoff                     0         0              0         0                 0

Message type   Dropped
RANGE_SLICE  0
READ_REPAIR  0
BINARY   0
READ 0
MUTATION 0
_TRACE   0
REQUEST_RESPONSE 0
COUNTER_MUTATION 0

The only items that changed are ReadStage (Completed increased by 107, plus 1
Active / 1 Pending) and MutationStage (Completed increased by 2603).
Please note that the system is writing all the time in batches (each second, 2
servers write one batch each), so I don't see anything special in these
numbers.

Best regards
Yulian Oifa


Cassandra Read Timeout

2015-02-24 Thread Yulian Oifa
Hello to all
I have a single-node Cassandra on Amazon EC2.
Currently I am having a read timeout problem on a single CF, a single row.

The row size is around 190MB. There are bigger rows with a similar structure
(these are index rows, which actually store keys) and everything works fine on
them; everything also works fine on this CF for other rows.

Table data from cfstats (the first table has bigger rows but works fine,
whereas the second has the timeout):
Column Family: pendindexes
SSTable count: 5
Space used (live): 462298352
Space used (total): 462306752
SSTable Compression Ratio: 0.3511107495795905
Number of Keys (estimate): 640
Memtable Columns Count: 63339
Memtable Data Size: 12328802
Memtable Switch Count: 78
Read Count: 10
Read Latency: NaN ms.
Write Count: 1530113
Write Latency: 0.022 ms.
Pending Tasks: 0
Bloom Filter False Positives: 0
Bloom Filter False Ratio: 0.0
Bloom Filter Space Used: 3920
Compacted row minimum size: 73
Compacted row maximum size: 223875792
Compacted row mean size: 42694982
Average live cells per slice (last five minutes): 21.0
Average tombstones per slice (last five minutes): 0.0

Column Family: statuspindexes
SSTable count: 1
Space used (live): 99602136
Space used (total): 99609360
SSTable Compression Ratio: 0.34278775390997873
Number of Keys (estimate): 128
Memtable Columns Count: 6250
Memtable Data Size: 6061097
Memtable Switch Count: 65
Read Count: 1000
Read Latency: NaN ms.
Write Count: 1193142
Write Latency: 3.616 ms.
Pending Tasks: 0
Bloom Filter False Positives: 0
Bloom Filter False Ratio: 0.0
Bloom Filter Space Used: 656
Compacted row minimum size: 180
Compacted row maximum size: 186563160
Compacted row mean size: 63225562
Average live cells per slice (last five minutes): 0.0
Average tombstones per slice (last five minutes): 0.0

I have tried to debug it with CQL; this is what I get:

 activity                                                                                        | timestamp    | source       | source_elapsed
-------------------------------------------------------------------------------------------------+--------------+--------------+----------------
 execute_cql3_query                                                                              | 15:39:53,120 | 172.31.6.173 |              0
 Parsing Select * from statuspindexes LIMIT 1;                                                   | 15:39:53,120 | 172.31.6.173 |            875
 Preparing statement                                                                             | 15:39:53,121 | 172.31.6.173 |           1643
 Determining replicas to query                                                                   | 15:39:53,121 | 172.31.6.173 |           1740
 Executing seq scan across 1 sstables for [min(-9223372036854775808), min(-9223372036854775808)] | 15:39:53,122 | 172.31.6.173 |           2581
 Seeking to partition beginning in data file                                                     | 15:39:53,123 | 172.31.6.173 |           3118
 Timed out; received 0 of 1 responses for range 2 of 2                                           | 15:40:03,121 | 172.31.6.173 |       10001370
 Request complete                                                                                | 15:40:03,121 | 172.31.6.173 |       10001513

I have executed a compaction on that CF.
What could lead to this problem?
Best regards
Yulian Oifa


Cassandra Strange behaviour

2014-04-14 Thread Yulian Oifa
Hello to all
I have a Cassandra cluster with 3 nodes and RF=3, writing with QUORUM.
The application wrote several million records today to a specific CF.
After that, one of the servers went wild; it is eating up the disk.
As I see from the logs, hinted handoff from the 2 other servers is occurring
to this server.
On this server I see that data is flushed to disk every few seconds:
 WARN [CompactionExecutor:249] 2014-04-14 19:17:38,633
CompactionManager.java (line 509) insufficient space to compact all
requested files
SSTableReader(path='/opt/cassandra/data/USER_DATA/freeNumbers-g-548-Data.db'),
SSTableReader(path='/opt/cassandra/data/USER_DATA/freeNumbers-g-1060-Data.db'),
SSTableReader(path='/opt/cassandra/data/USER_DATA/freeNumbers-g-890-Data.db'),
SSTableReader(path='/opt/cassandra/data/USER_DATA/freeNumbers-g-978-Data.db')
 WARN [CompactionExecutor:249] 2014-04-14 19:17:58,647
CompactionManager.java (line 509) insufficient space to compact all
requested files
SSTableReader(path='/opt/cassandra/data/USER_DATA/freeNumbers-g-1060-Data.db'),
SSTableReader(path='/opt/cassandra/data/USER_DATA/freeNumbers-g-890-Data.db'),
SSTableReader(path='/opt/cassandra/data/USER_DATA/freeNumbers-g-978-Data.db')
 INFO [COMMIT-LOG-WRITER] 2014-04-14 19:18:06,232 CommitLogSegment.java
(line 59) Creating new commitlog segment
/opt/cassandra/commitlog/CommitLog-1397492286232.log
 WARN [CompactionExecutor:249] 2014-04-14 19:18:18,648
CompactionManager.java (line 509) insufficient space to compact all
requested files
SSTableReader(path='/opt/cassandra/data/USER_DATA/freeNumbers-g-1060-Data.db'),
SSTableReader(path='/opt/cassandra/data/USER_DATA/freeNumbers-g-890-Data.db')
ERROR [CompactionExecutor:249] 2014-04-14 19:18:18,649
CompactionManager.java (line 513) insufficient space to compact even the
two smallest files, aborting
 INFO [NonPeriodicTasks:1] 2014-04-14 19:18:25,228 MeteredFlusher.java
(line 62) flushing high-traffic column family CFS(Keyspace='USER_DATA',
ColumnFamily='freeNumbers')
 INFO [NonPeriodicTasks:1] 2014-04-14 19:18:25,228 ColumnFamilyStore.java
(line 1128) Enqueuing flush of
Memtable-freeNumbers@1950635535(37693440/309874109
serialized/live bytes, 837632 ops)
 INFO [FlushWriter:22] 2014-04-14 19:18:25,229 Memtable.java (line 237)
Writing Memtable-freeNumbers@1950635535(37693440/309874109 serialized/live
bytes, 837632 ops)
 INFO [FlushWriter:22] 2014-04-14 19:18:26,871 Memtable.java (line 254)
Completed flushing /opt/cassandra/data/USER_DATA/freeNumbers-g-1066-Data.db
(38103944 bytes)
 INFO [CompactionExecutor:251] 2014-04-14 19:18:26,872
CompactionManager.java (line 542) Compacting Minor:
[SSTableReader(path='/opt/cassandra/data/USER_DATA/freeNumbers-g-1065-Data.db'),
SSTableReader(path='/opt/cassandra/data/USER_DATA/freeNumbers-g-1063-Data.db'),
SSTableReader(path='/opt/cassandra/data/USER_DATA/freeNumbers-g-1064-Data.db'),
SSTableReader(path='/opt/cassandra/data/USER_DATA/freeNumbers-g-1066-Data.db')]
 INFO [CompactionExecutor:251] 2014-04-14 19:18:26,878
CompactionController.java (line 146) Compacting large row
USER_DATA/freeNumbers:8bdf9678-6d70-11e3-85ab-80e385abf85d (151810145
bytes) incrementally


However, the total data of this CF is around 4.5 GB, while the disk usage for
this CF on this server exceeds 20GB.
I have tried restarting this server and a rolling restart of all servers with
no luck; it continues to write data. I also cannot run compact.
How can I stop that?
Best regards
Yulian Oifa


Re: Cassandra Strange behaviour

2014-04-14 Thread Yulian Oifa
Adding some more log output:
INFO [FlushWriter:22] 2014-04-14 19:23:13,443 Memtable.java (line 254)
Completed flushing /opt/cassandra/data/USER_DATA/freeNumbers-g-1074-Data.db
(37824462 bytes)
 WARN [CompactionExecutor:258] 2014-04-14 19:23:31,915
CompactionManager.java (line 509) insufficient space to compact all
requested files
SSTableReader(path='/opt/cassandra/data/USER_DATA/freeNumbers-g-1060-Data.db'),
SSTableReader(path='/opt/cassandra/data/USER_DATA/freeNumbers-g-890-Data.db'),
SSTableReader(path='/opt/cassandra/data/USER_DATA/freeNumbers-g-978-Data.db')
 INFO [COMMIT-LOG-WRITER] 2014-04-14 19:23:45,037 CommitLogSegment.java
(line 59) Creating new commitlog segment
/opt/cassandra/commitlog/CommitLog-1397492625037.log
 WARN [CompactionExecutor:258] 2014-04-14 19:23:51,916
CompactionManager.java (line 509) insufficient space to compact all
requested files
SSTableReader(path='/opt/cassandra/data/USER_DATA/freeNumbers-g-1060-Data.db'),
SSTableReader(path='/opt/cassandra/data/USER_DATA/freeNumbers-g-890-Data.db')
ERROR [CompactionExecutor:258] 2014-04-14 19:23:51,916
CompactionManager.java (line 513) insufficient space to compact even the
two smallest files, aborting
 INFO [NonPeriodicTasks:1] 2014-04-14 19:24:01,073 MeteredFlusher.java
(line 62) flushing high-traffic column family CFS(Keyspace='USER_DATA',
ColumnFamily='freeNumbers')
 INFO [NonPeriodicTasks:1] 2014-04-14 19:24:01,074 ColumnFamilyStore.java
(line 1128) Enqueuing flush of
Memtable-freeNumbers@1751772888(37509120/308358832
serialized/live bytes, 833536 ops)
 INFO [FlushWriter:22] 2014-04-14 19:24:01,074 Memtable.java (line 237)
Writing Memtable-freeNumbers@1751772888(37509120/308358832 serialized/live
bytes, 833536 ops)
 INFO [FlushWriter:22] 2014-04-14 19:24:02,575 Memtable.java (line 254)
Completed flushing /opt/cassandra/data/USER_DATA/freeNumbers-g-1075-Data.db
(37917606 bytes)

Best regards
Yulian Oifa


Re: Cassandra disk usage

2014-04-14 Thread Yulian Oifa
Hello
The data load on the 3 nodes is:

Address         DC  Rack  Status  State   Load      Owns    Token
                                                            113427455640312821154458202477256070485
172.19.10.1     19  10    Up      Normal  22.16 GB  33.33%  0
172.19.10.2     19  10    Up      Normal  19.89 GB  33.33%  56713727820156410577229101238628035242
172.19.10.3     19  10    Up      Normal  30.74 GB  33.33%  113427455640312821154458202477256070485

Best regards
Yulian Oifa



On Sun, Apr 13, 2014 at 9:17 PM, Mark Reddy  wrote:

>> I will change the data I am storing to decrease the usage; for the value I
>> will find some small value to store. Previously I used the same value since
>> this table is only an index for search purposes and does not really have a value.
>
>
> If you don't need a value, you don't have to store anything. You can store
> the column name and leave the value empty, this is a common practice.
>
>> 1) What are the recommended read and write consistency levels and replication
>> factor for 3 nodes, with the option of increasing the number of servers in the future?
>
>
> Both consistency level and replication factor are tunable depending on
> your application constraints. I'd say a CL of QUORUM and an RF of 3 is the
> general practice.
>
>> It still has 1.5X the overall data; how can this be resolved and what is
>> the reason for that?
>
>
> As Michał pointed out there is a 15 byte column overhead to consider
> here, where:
>
> total_column_size = column_name_size + column_value_size + 15
>
>
> This link might shed some light on this:
> http://www.datastax.com/documentation/cassandra/1.2/cassandra/architecture/architecturePlanningUserData_t.html
>
>> Also I see that the data is a different size on each node; does that mean
>> the servers are out of sync
>
>
> How much is it out by? Data size may differ due to deletes, as you
> mentioned you do deletes. What is the output of 'nodetool ring'?
>
>
> On Sun, Apr 13, 2014 at 6:42 PM, Michal Michalski <
> michal.michal...@boxever.com> wrote:
>
>> > Each column has a name of 15 chars (digits) and the same 15 chars in the
>> > value (also digits).
>> > Each column should take 30 bytes.
>>
>> Remember the standard Cassandra column overhead, which is, as far
>> as I remember, 15 bytes, so it's 45 bytes in total - 50% more than you
>> estimated, which kind of matches your 3 GB vs 4.5 GB case.
>>
>> There's also a per-row overhead, but I'm not sure about its size in
>> current C* versions - I remember it was about 25 bytes or so some time ago,
>> but it's not important in your case.
>>
>> Kind regards,
>> Michał Michalski,
>> michal.michal...@boxever.com
>>
>>
>> On 13 April 2014 17:48, Yulian Oifa  wrote:
>>
>>> Hello Mark and thanks for your reply.
>>> 1) I store it as a UTF8 string. All digits are from 0x30 to 0x39 and should
>>> take 1 byte each. Since all characters are digits it should be 15 bytes.
>>> 2) I will change the data I am storing to decrease the usage; for the value
>>> I will find some small value to store. Previously I used the same value since
>>> this table is only an index for search purposes and does not really have a value.
>>> 3) You are right, I read and write at QUORUM and it was my mistake (I thought
>>> that if I write at QUORUM then data would be written to 2 nodes only).
>>> If I check the keyspace
>>> create keyspace USER_DATA
>>>   with placement_strategy = 'NetworkTopologyStrategy'
>>>   and strategy_options = [{19 : 3}]
>>>   and durable_writes = true;
>>>
>>> it has a replication factor of 3.
>>> Therefore I have several questions:
>>> 1) What are the recommended read and write consistency levels and replication
>>> factor for 3 nodes, with the option of increasing the number of servers in
>>> the future?
>>> 2) It still has 1.5X the overall data; how can this be resolved and what is
>>> the reason for that?
>>> 3) Also I see that the data is a different size on each node; does that
>>> mean the servers are out of sync?
>>>
>>> Thanks and best regards
>>> Yulian Oifa
>>>
>>>
>>> On Sun, Apr 13, 2014 at 7:03 PM, Mark Reddy wrote:
>>>
>>>> What are you storing these 15 chars as; string, int, double, etc.? 15
>>>> chars does not translate to 15 bytes.
>>>>
>>>> You may be mixing up replication factor and quorum when you say "Cassandra
>>>> cluster has 3 servers, and data is stored in quorum ( 2 servers )."
>>>> You read and write at quorum 
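
Putting numbers on the per-column overhead Michal and Mark describe above, for the 2-row / 100-million-column CF in question (a back-of-the-envelope estimate only; it ignores per-row overhead, indexes and compression):

    per column:  15 (name) + 15 (value) + 15 (overhead)  =  45 bytes
    100,000,000 columns * 45 bytes                       ~=  4.5 GB per replica
    with RF=3 on 3 nodes, each node holds a full replica ~=  4.5 GB per node

which lines up with the 4.5 GB "Compacting large row" figure reported for this CF, rather than the original 2 GB per node estimate.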

Re: Cassandra disk usage

2014-04-13 Thread Yulian Oifa
Hello Mark and thanks for your reply.
1) I store it as a UTF8 string. All digits are from 0x30 to 0x39 and should
take 1 byte each. Since all characters are digits it should be 15 bytes.
2) I will change the data I am storing to decrease the usage; for the value I
will find some small value to store. Previously I used the same value since
this table is only an index for search purposes and does not really have a value.
3) You are right, I read and write at QUORUM and it was my mistake (I thought
that if I write at QUORUM then data would be written to 2 nodes only).
If I check the keyspace
create keyspace USER_DATA
  with placement_strategy = 'NetworkTopologyStrategy'
  and strategy_options = [{19 : 3}]
  and durable_writes = true;

it has a replication factor of 3.
Therefore I have several questions:
1) What are the recommended read and write consistency levels and replication
factor for 3 nodes, with the option of increasing the number of servers in the
future?
2) It still has 1.5X the overall data; how can this be resolved and what is
the reason for that?
3) Also I see that the data is a different size on each node; does that mean
the servers are out of sync?

Thanks and best regards
Yulian Oifa


On Sun, Apr 13, 2014 at 7:03 PM, Mark Reddy  wrote:

> What are you storing these 15 chars as; string, int, double, etc.? 15
> chars does not translate to 15 bytes.
>
> You may be mixing up replication factor and quorum when you say "Cassandra
> cluster has 3 servers, and data is stored in quorum ( 2 servers )." You
> read and write at quorum (N/2)+1 where N=total_number_of_nodes and your
> data is replicated to the number of nodes you specify in your replication
> factor. Could you clarify?
>
> Also if you are concerned about disk usage, why are you storing the same
> 15 char value in both the column name and value? You could just store it as
> the name and halve your data usage :)
>
>
>
>
> On Sun, Apr 13, 2014 at 4:26 PM, Yulian Oifa wrote:
>
>> I have a column family with 2 rows.
>> The 2 rows have 100 million columns overall.
>> Each column has a name of 15 chars (digits) and the same 15 chars in the
>> value (also digits).
>> Each column should take 30 bytes.
>> Therefore all the data should be approximately 3GB.
>> The Cassandra cluster has 3 servers, and data is stored at quorum (2
>> servers).
>> Therefore each server should have 3GB*2/3 = 2GB of data for this column
>> family.
>> The table is almost never changed; data is only removed from this table,
>> which possibly created tombstones, but that should not increase the usage.
>> However when I check the data I see that each server has more than 4GB of
>> data (more than twice what it should be).
>>
>> server 1:
>> -rw-r--r-- 1 root root 3506446057 Dec 26 12:02 freeNumbers-g-264-Data.db
>> -rw-r--r-- 1 root root  814699666 Dec 26 12:24 freeNumbers-g-281-Data.db
>> -rw-r--r-- 1 root root  198432466 Dec 26 12:27 freeNumbers-g-284-Data.db
>> -rw-r--r-- 1 root root   35883918 Apr 12 20:07 freeNumbers-g-336-Data.db
>>
>> server 2:
>> -rw-r--r-- 1 root root 3448432307 Dec 26 11:57 freeNumbers-g-285-Data.db
>> -rw-r--r-- 1 root root  762399716 Dec 26 12:22 freeNumbers-g-301-Data.db
>> -rw-r--r-- 1 root root  220887062 Dec 26 12:23 freeNumbers-g-304-Data.db
>> -rw-r--r-- 1 root root   54914466 Dec 26 12:26 freeNumbers-g-306-Data.db
>> -rw-r--r-- 1 root root   53639516 Dec 26 12:26 freeNumbers-g-305-Data.db
>> -rw-r--r-- 1 root root   53007967 Jan  8 15:45 freeNumbers-g-314-Data.db
>> -rw-r--r-- 1 root root 413717 Apr 12 18:33 freeNumbers-g-359-Data.db
>>
>>
>> server 3:
>> -rw-r--r-- 1 root root 4490657264 Apr 11 18:20 freeNumbers-g-358-Data.db
>> -rw-r--r-- 1 root root 389171 Apr 12 20:58 freeNumbers-g-360-Data.db
>> -rw-r--r-- 1 root root   4276 Apr 11 18:20
>> freeNumbers-g-358-Statistics.db
>> -rw-r--r-- 1 root root   4276 Apr 11 18:24
>> freeNumbers-g-359-Statistics.db
>> -rw-r--r-- 1 root root   4276 Apr 12 20:58
>> freeNumbers-g-360-Statistics.db
>> -rw-r--r-- 1 root root976 Apr 11 18:20 freeNumbers-g-358-Filter.db
>> -rw-r--r-- 1 root root208 Apr 11 18:24 freeNumbers-g-359-Data.db
>> -rw-r--r-- 1 root root 78 Apr 11 18:20 freeNumbers-g-358-Index.db
>> -rw-r--r-- 1 root root 52 Apr 11 18:24 freeNumbers-g-359-Index.db
>> -rw-r--r-- 1 root root 52 Apr 12 20:58 freeNumbers-g-360-Index.db
>> -rw-r--r-- 1 root root 16 Apr 11 18:24 freeNumbers-g-359-Filter.db
>> -rw-r--r-- 1 root root 16 Apr 12 20:58 freeNumbers-g-360-Filter.db
>>
>> When I try to compact I get the following notification from the first server:
>> INFO [CompactionExecutor:1604] 2014-04-13 18:23:07,260
>> CompactionController.java (line 146) Compacting large row
>> USER_DATA/freeNumbers:8bdf9678-6d70-11e3-85ab-80e385abf85d (4555076689
>> bytes) incrementally
>>
>> Which confirms that there is around 4.5GB of data on that server alone.
>> Why does Cassandra take up so much space?
>>
>> Best regards
>> Yulian Oifa
>>
>>
>


Cassandra disk usage

2014-04-13 Thread Yulian Oifa
I have a column family with 2 rows.
The 2 rows have 100 million columns overall.
Each column has a name of 15 chars (digits) and the same 15 chars in the value
(also digits).
Each column should take 30 bytes.
Therefore all the data should be approximately 3GB.
The Cassandra cluster has 3 servers, and data is stored at quorum (2 servers).
Therefore each server should have 3GB*2/3 = 2GB of data for this column
family.
The table is almost never changed; data is only removed from this table, which
possibly created tombstones, but that should not increase the usage.
However when I check the data I see that each server has more than 4GB of
data (more than twice what it should be).

server 1:
-rw-r--r-- 1 root root 3506446057 Dec 26 12:02 freeNumbers-g-264-Data.db
-rw-r--r-- 1 root root  814699666 Dec 26 12:24 freeNumbers-g-281-Data.db
-rw-r--r-- 1 root root  198432466 Dec 26 12:27 freeNumbers-g-284-Data.db
-rw-r--r-- 1 root root   35883918 Apr 12 20:07 freeNumbers-g-336-Data.db

server 2:
-rw-r--r-- 1 root root 3448432307 Dec 26 11:57 freeNumbers-g-285-Data.db
-rw-r--r-- 1 root root  762399716 Dec 26 12:22 freeNumbers-g-301-Data.db
-rw-r--r-- 1 root root  220887062 Dec 26 12:23 freeNumbers-g-304-Data.db
-rw-r--r-- 1 root root   54914466 Dec 26 12:26 freeNumbers-g-306-Data.db
-rw-r--r-- 1 root root   53639516 Dec 26 12:26 freeNumbers-g-305-Data.db
-rw-r--r-- 1 root root   53007967 Jan  8 15:45 freeNumbers-g-314-Data.db
-rw-r--r-- 1 root root 413717 Apr 12 18:33 freeNumbers-g-359-Data.db


server 3:
-rw-r--r-- 1 root root 4490657264 Apr 11 18:20 freeNumbers-g-358-Data.db
-rw-r--r-- 1 root root 389171 Apr 12 20:58 freeNumbers-g-360-Data.db
-rw-r--r-- 1 root root   4276 Apr 11 18:20
freeNumbers-g-358-Statistics.db
-rw-r--r-- 1 root root   4276 Apr 11 18:24
freeNumbers-g-359-Statistics.db
-rw-r--r-- 1 root root   4276 Apr 12 20:58
freeNumbers-g-360-Statistics.db
-rw-r--r-- 1 root root976 Apr 11 18:20 freeNumbers-g-358-Filter.db
-rw-r--r-- 1 root root208 Apr 11 18:24 freeNumbers-g-359-Data.db
-rw-r--r-- 1 root root 78 Apr 11 18:20 freeNumbers-g-358-Index.db
-rw-r--r-- 1 root root 52 Apr 11 18:24 freeNumbers-g-359-Index.db
-rw-r--r-- 1 root root 52 Apr 12 20:58 freeNumbers-g-360-Index.db
-rw-r--r-- 1 root root 16 Apr 11 18:24 freeNumbers-g-359-Filter.db
-rw-r--r-- 1 root root 16 Apr 12 20:58 freeNumbers-g-360-Filter.db

When I try to compact I get the following notification from the first server:
INFO [CompactionExecutor:1604] 2014-04-13 18:23:07,260
CompactionController.java (line 146) Compacting large row
USER_DATA/freeNumbers:8bdf9678-6d70-11e3-85ab-80e385abf85d (4555076689
bytes) incrementally

Which confirms that there is around 4.5GB of data on that server alone.
Why does Cassandra take up so much space?

Best regards
Yulian Oifa


SSTable files question

2014-04-11 Thread Yulian Oifa
Hello to all
I ran nodetool compact today on a specific node.
It created a single file (g-1155) at 18:08.
Currently all clients are down, therefore no new data is being written.
However, while running compact on the other nodes I found that new SSTables
appeared on this node:

-rw-r--r-- 1 root root 4590359957 Apr 11 18:08 globalIndexes-g-1155-Data.db
-rw-r--r-- 1 root root 2416533871 Apr 11 19:57 globalIndexes-g-1241-Data.db
-rw-r--r-- 1 root root  812435119 Apr 11 20:13 globalIndexes-g-1282-Data.db
-rw-r--r-- 1 root root  809054655 Apr 11 20:27 globalIndexes-g-1303-Data.db
-rw-r--r-- 1 root root  767685693 Apr 11 20:00 globalIndexes-g-1261-Data.db
-rw-r--r-- 1 root root  203513615 Apr 11 20:32 globalIndexes-g-1313-Data.db
-rw-r--r-- 1 root root  202942155 Apr 11 20:35 globalIndexes-g-1318-Data.db
-rw-r--r-- 1 root root  202656433 Apr 11 20:29 globalIndexes-g-1308-Data.db
-rw-r--r-- 1 root root   51223791 Apr 11 20:39 globalIndexes-g-1323-Data.db
-rw-r--r-- 1 root root   50890483 Apr 11 20:37 globalIndexes-g-1320-Data.db
-rw-r--r-- 1 root root   50366855 Apr 11 20:36 globalIndexes-g-1319-Data.db
-rw-r--r-- 1 root root   50366685 Apr 11 20:38 globalIndexes-g-1322-Data.db
-rw-r--r-- 1 root root   50271529 Apr 11 20:39 globalIndexes-g-1324-Data.db

Where do those files come from?
They also take up plenty of space for some reason.

Thanks and best regards
Yulian Oifa


Re: Transaction Timeout on get_count

2014-04-07 Thread Yulian Oifa
Thanks for your replies.
1) I cannot create a new row every X time, since that would not allow me to
get a complete list of the currently active records (this is the only reason I
keep this row in the first place).
2) As for compaction, I thought that only row ids are cached and not the
columns themselves. I have completed the compaction and it indeed cleared most
of the data (75% of the complete table, and there are some other rows).
It looks like it did not clear all the data from disk (compaction should leave
the table in a single file, while other files are still left), but the log
showed that the data was cleared; also, after compaction this row became
responsive again.

Most of our data is written once and kept as history, however we also have
counter columns which are not working well (we had troubles with that) and
several places where we use a create / delete approach. Now I understand why
our data grows so much: it just does not clear old data at all, but marks it
as tombstones...
What NoSQL database would you recommend for such usage (write-once read-many,
mixed with counter columns, mixed with frequently read/written data)?

Thanks and best regards
Yulian Oifa




On Mon, Apr 7, 2014 at 5:57 PM, Lukas Steiblys  wrote:

>   Deleting a column simply produces a tombstone for that column, as far
> as I know. It's probably going through all the columns with tombstones and
> timing out. Compacting more often should help, but maybe Cassandra isn't
> the best choice overall for what you're trying to do.
>
> Lukas
>
>  *From:* Yulian Oifa 
> *Sent:* Sunday, April 6, 2014 11:54 AM
> *To:* user@cassandra.apache.org
> *Subject:* Transaction Timeout on get_count
>
> Hello
> I have a row to which approximately 100 values are written per minute.
> Those columns are then deleted (it contains an active records list).
> When I try to execute get_count on that row I get a transaction timeout,
> even when the row is empty.
> I don't see anything in the Cassandra log on either node, and pending tasks
> are zero.
> What could be the reason for that, and how can it be resolved?
> Best regards
> Yulian Oifa
>


Transaction Timeout on get_count

2014-04-06 Thread Yulian Oifa
Hello
I have a row to which approximately 100 values are written per minute.
Those columns are then deleted (it contains an active records list).
When I try to execute get_count on that row I get a transaction timeout,
even when the row is empty.
I don't see anything in the Cassandra log on either node, and pending tasks
are zero.
What could be the reason for that, and how can it be resolved?
Best regards
Yulian Oifa


Re: Problem with counter columns

2013-09-22 Thread Yulian Oifa
Hello
I have resolved the issue by recreating the tables and recalculating them (I
use counter columns for summaries, so the live data allows me to recalculate
them).

As for upgrading, I walked over the issues list and found that the issues were
either resolved in 0.8.3 / 0.8.4 or are still open (my version is 0.8.10).

Especially the issues:
https://issues.apache.org/jira/browse/CASSANDRA-4775

And the previous one that was moved into 4775:
https://issues.apache.org/jira/browse/CASSANDRA-2495

So my goal is to identify the problem first, prior to upgrading and opening a
duplicate case, etc.
I am almost sure that what caused the error was out-of-memory problems caused
by big super column tables; after that I had to restart the servers one after
another.
The counters got out of sync, and it looks like when an update went to the
correct server everything worked, otherwise the update was discarded (the
servers were not able to resync on counter columns).
It also looks like nodetool repair does nothing with counter columns, only
regular columns.

Unfortunately I did not find any hint about what was wrong in the counter
columns and did not see anything in the log files, which leaves me only hoping
that next time the issue will not be reproduced.
For now I will have to stay with the current version until I see that the
described issues are resolved in Cassandra.

Best regards
Yulian Oifa



On Thu, Sep 19, 2013 at 7:06 PM, Robert Coli  wrote:

> On Wed, Sep 18, 2013 at 11:07 AM, Yulian Oifa wrote:
>
>> I am using counter columns in a Cassandra cluster with 3 nodes.
>>
>
>>
> The current Cassandra version is 0.8.10.
>>
>> How can I debug and find the problem?
>>
>
> The problem is using Counters in Cassandra 0.8.
>
> But seriously, I don't know whether the particular issue you describe is
> fixed upstream. But if it isn't, no one will fix it in 0.8, so you should
> probably...
>
> 1) upgrade to Cassandra 1.2.9 (note that you likely need to pass through
> 1.0/1.1)
> 2) attempt to reproduce
> 3) if you can, file a JIRA and update this thread with a link to it
>
> =Rob
>
>
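
For reference, this is roughly what the same counter model looks like as a CQL 3 table on the newer releases Rob suggests upgrading to (the table name and the id literal are illustrative):

CREATE TABLE subscriptions_counters (
    id     timeuuid PRIMARY KEY,
    total  counter
);

UPDATE subscriptions_counters SET total = total + 2
WHERE id = 50554d6e-29bb-11e5-b345-feff819cdc9f;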


Problem with counter columns

2013-09-18 Thread Yulian Oifa
Hello to all
I am using counter columns in a Cassandra cluster with 3 nodes.
All 3 nodes are up and synchronized with an NTP time server, and so is the
client.
I am using the libthrift Java client.

The current problem I am having is that part of the writes to the counter
columns simply disappear (most of the time different values are added, so I am
able to follow what was missed and what was not). Approximately 40-50% of the
add requests disappear.

So for example if the initial value is 1, and I add 1, 2 and 3, I would expect
a final result of 7, but it returns the value 5 since add(2) is lost (no
exception is thrown).

I did not disable replicate on write (it is set to true):

ColumnFamily: subscriptionsCounters
  Key Validation Class: org.apache.cassandra.db.marshal.TimeUUIDType
  Default column value validator:
org.apache.cassandra.db.marshal.CounterColumnType
  Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
  Row cache size / save period in seconds: 0.0/0
  Row Cache Provider:
org.apache.cassandra.cache.ConcurrentLinkedHashCacheProvider
  Key cache size / save period in seconds: 20.0/14400
  Memtable thresholds: 1.1484375/1440/245 (millions of ops/minutes/MB)
  GC grace seconds: 864000
  Compaction min/max thresholds: 4/32
  Read repair chance: 1.0
  Replicate on write: true
  Built indexes: []


Also I don't see anything in the log on any node, neither warnings nor errors.
I am writing data with LOCAL_QUORUM consistency level since it stopped working
with ONE; the same goes for read operations.


The current Cassandra version is 0.8.10.

How can I debug and find the problem?

Thanks and best regards
Yulian Oifa