Re: why does it still have to search the SSTables when getting data from the memtable in the read flow?

2017-03-27 Thread 赵豫峰
@Bhuvan Rawal @Jasonstack Thanks a lot, it's very helpful!






--


赵豫峰



环信即时通讯云/研发




 
 
 
-- Original --
From:  "Bhuvan Rawal";
Date:  Mon, Mar 27, 2017 08:42 PM
To:  "user"; 

Subject:  Re: why does it still have to search the SSTables when getting data from the 
memtable in the read flow?

 
Also, Cassandra's working unit is the cell, so within a partition some cells of a 
row may be present in the memtable while others are located in SSTables, hence the 
need to reconcile the partition data.

@Jason's point is valid too - user-defined timestamps may put SSTable cells 
ahead of memtable ones.


Thanks,
Bhuvan


On Mon, Mar 27, 2017 at 5:29 PM, jason zhao yang  
wrote:
Hi,

Cassandra uses a last-write-wins strategy based on write time.

Data being in memory doesn't mean it is the latest data, because of custom write times; 
if the data is also in an SSTable, Cassandra has to read it and reconcile.

Jasonstack
On Mon, 27 Mar 2017 at 7:53 PM, 赵豫峰  wrote:

Hello, I read the statement that "If the memtable has the desired partition data, 
then the data is read and then merged with the data from the SSTables. The 
SSTable data is accessed as shown in the following steps." in the "How is data 
read?" chapter at 
http://docs.datastax.com/en/archived/cassandra/2.2/cassandra/dml/dmlAboutReads.html.


I do not understand why Cassandra still has to read the SSTables when it has 
already found the target data in the memtable. If the data is in the memtable, 
doesn't that mean it is the latest version? Is there any other reason it still 
has to search the SSTables?


Thanks!


--


赵豫峰



环信即时通讯云/研发

Re: Effective partition key for time series data, which allows range queries?

2017-03-27 Thread Noorul Islam Kamal Malmiyoda
Have you looked at KairosDB schema ?

https://kairosdb.github.io/

Regards,
Noorul

On Tue, Mar 28, 2017 at 6:17 AM, Ali Akhtar  wrote:
> I have a use case where the data for individual users is being tracked, and
> every 15 minutes or so, the data for the past 15 minutes is inserted into
> the table.
>
> The table schema looks like:
> user id, timestamp, foo, bar, etc.
>
> Where foo, bar, etc are the items being tracked, and their values over the
> past 15 minutes.
>
> I initially planned to use the user id as the primary key of the table. But,
> I realized that this may cause really wide rows (tracking for 24 hours
> means 96 records inserted (1 for each 15-min window); over 1 year this means
> 36k records per user, over 2 years, 72k, etc).
>
> I know the limit of wide rows is billions of records, but I've heard that
> the practical limit is much lower.
>
> So I considered using a composite primary key: (user, timestamp)
>
> If I'm correct, the above should create a new row for each user & timestamp
> logged.
>
> However, will I still be able to do range queries on the timestamp, to e.g.
> return the data for the last week?
>
> E.g. select * from data where user_id = 'foo' and timestamp >= '<1 month
> ago>' and timestamp <= '' ?
>


Effective partition key for time series data, which allows range queries?

2017-03-27 Thread Ali Akhtar
I have a use case where the data for individual users is being tracked, and
every 15 minutes or so, the data for the past 15 minutes is inserted into
the table.

The table schema looks like:
user id, timestamp, foo, bar, etc.

Where foo, bar, etc are the items being tracked, and their values over the
past 15 minutes.

I initially planned to use the user id as the primary key of the table.
But, I realized that this may cause really wide rows (tracking for 24
hours means 96 records inserted (1 for each 15-min window); over 1 year
this means 36k records per user, over 2 years, 72k, etc).

I know the limit of wide rows is billions of records, but I've heard that
the practical limit is much lower.

So I considered using a composite primary key: (user, timestamp)

If I'm correct, the above should create a new row for each user & timestamp
logged.

However, will I still be able to do range queries on the timestamp, to e.g.
return the data for the last week?

E.g. select * from data where user_id = 'foo' and timestamp >= '<1 month
ago>' and timestamp <= '' ?
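
[Editor's note: a minimal CQL sketch of the composite key being asked about; table and
column names are assumptions based on the description above. With user_id as the partition
key and the timestamp as a clustering column, range queries on the timestamp work within a
single user's partition:

-- Sketch only: names and types are assumptions, not the poster's actual schema.
CREATE TABLE user_tracking (
    user_id text,
    ts      timestamp,
    foo     int,
    bar     int,
    PRIMARY KEY ((user_id), ts)
) WITH CLUSTERING ORDER BY (ts DESC);

-- Range query on the clustering column, restricted to one partition:
SELECT * FROM user_tracking
WHERE user_id = 'foo'
  AND ts >= '2017-02-27' AND ts <= '2017-03-27';

Note that all of a user's rows still share one partition with this key; the usual way to
bound partition growth is to add a time bucket to the partition key, e.g. a month column
and PRIMARY KEY ((user_id, month), ts).]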


Issues while using TWCS compaction and Bulkloader

2017-03-27 Thread eugene miretsky
Hi,

We have a Cassandra 3.0.8 cluster, and we use the Bulkloader
to upload time series data nightly. The data has a 3-day TTL, and the
compaction window unit is 1 hour.

Generally the data fits into memory, all reads are served from OS page
cache, and the cluster works fine. However, we had a few unexplained
incidents:

   1. High page fault ratio: This happened once, lasted for 3-4 days, and was
   resolved after we restarted the cluster. We have not been able to reproduce it
   since.
   2. High number of bloom filter false positives: Same as above.

Several questions:

   1. What could have caused the page faults and/or bloom filter false
   positives?
   2. What's the right strategy for running repairs?
  1. Are repairs even required? We don't generate any tombstones.
  2. The following article suggests that incremental repairs should not
  be used with Date Tiered compactions, does it also apply to TWCS?
  
https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsRepairNodesManualRepair.html

Cheers,
Eugene
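
[Editor's note: for reference, a hedged sketch of a table definition matching the setup
described above (3-day TTL, 1-hour compaction windows); the table and column names are
made up:

-- Sketch only: table/column names are hypothetical; the options mirror the setup above.
CREATE TABLE metrics (
    series_id text,
    ts        timestamp,
    value     double,
    PRIMARY KEY ((series_id), ts)
) WITH compaction = { 'class': 'TimeWindowCompactionStrategy',
                      'compaction_window_unit': 'HOURS',
                      'compaction_window_size': '1' }
  AND default_time_to_live = 259200;  -- 3 days, in seconds]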


When is anti-entropy repair required?

2017-03-27 Thread eugene miretsky
Hi,

Trying to get some clarifications on this post:
https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsRepairNodesWhen.html

As far as I understand it, repairs are there to account for the fact that nodes can
go down (for short or long periods of time).

The 2 main reasons for repairing are:

   1. To make sure data is consistent
   2. To make sure deleted data doesn't creep back

If I have a time series data model, with TWCS compaction where I never
update rows and hence don't care about either of the above (the whole
SSTable just expires after a few days ), do I even need to run repairs?


Re: Weird error: InvalidQueryException: unconfigured table table2

2017-03-27 Thread Vladimir Yudovin
>Just wish that an error like:   "Table x not found in keyspace y"

You are welcome to open a JIRA with type Improvement.





Best regards, Vladimir Yudovin, 

Winguzone - Cloud Cassandra Hosting






 On Sun, 26 Mar 2017 13:31:33 -0400 S G  
wrote 




Thanks, got it working now :)



Just wish that an error like:

   "Table x not found in keyspace y"

would have been much better than:

   "Table x not configured".







On Sat, Mar 25, 2017 at 6:13 AM, Arvydas Jonusonis 
 wrote:






Make sure to prefix the table with the keyspace. 
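
[Editor's note: for example (my_keyspace is a placeholder for the actual keyspace name):

-- Either fully qualify the table in each statement:
SELECT * FROM my_keyspace.table2 WHERE id1 = ? AND id = ?;

-- or set the keyspace for the session and keep unqualified names:
USE my_keyspace;
SELECT * FROM table2 WHERE id1 = ? AND id = ?;]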

On Sat, Mar 25, 2017 at 13:28 Anuj Wadehra  wrote:

Ensure that all the nodes are on the same schema version so that the table2 schema is 
replicated properly on all the nodes.



Thanks

Anuj



Sent from Yahoo Mail on Android






On Sat, Mar 25, 2017 at 3:19 AM, S G

 wrote:



Hi,



I have a keyspace with two tables.



I run a different query for each table:



Table 1:

  Select * from table1 where id = ?



Table 2:

  Select * from table2 where id1 = ? and id = ?





My code using datastax fires above two queries one after the other.

While it never fails for table 1, it never succeeds for table 2

And gives an error:





com.datastax.driver.core.exceptions.InvalidQueryException: unconfigured table table2
    at com.datastax.driver.core.Responses$Error.asException(Responses.java:136)
    at com.datastax.driver.core.DefaultResultSetFuture.onSet(DefaultResultSetFuture.java:179)
    at com.datastax.driver.core.RequestHandler.setFinalResult(RequestHandler.java:177)
    at com.datastax.driver.core.RequestHandler.access$2500(RequestHandler.java:46)
    at com.datastax.driver.core.RequestHandler$SpeculativeExecution.setFinalResult(RequestHandler.java:799)
    at com.datastax.driver.core.RequestHandler$SpeculativeExecution.onSet(RequestHandler.java:633)
    at com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:1070)
    at com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:993)
    at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:328)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:321)
    at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:328)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:321)
    at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:328)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:321)
    at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:293)
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:267)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:328)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:321)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1280)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:328)
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:890)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:564)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:505)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:419)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:391)




Any idea what might be wrong?



I have confirmed that all table names and column names are lowercase.

Datastax Java driver versions tried: 3.1.2 and 3.1.4

Cassandra version: 3.10





Thanks

SG


















Re: Help with data modelling (from MySQL to Cassandra)

2017-03-27 Thread Zoltan Lorincz
Great suggestion! Thanks Avi!

On Mon, Mar 27, 2017 at 3:47 PM, Avi Kivity  wrote:

> You can use static columns and keep just one table:
>
>
> CREATE TABLE documents (
>
> doc_id uuid,
>
> element_id uuid,
>
> description text static,
>
> doc_title text static,
>
> element_title text,
>
> PRIMARY KEY (doc_id, element_id)
>
> );
>
> The static columns are present once per unique doc_id.
>
>
>
> On 03/27/2017 01:08 PM, Zoltan Lorincz wrote:
>
> Hi Alexander,
>
> thank you for your help! I think we found the answer:
>
> CREATE TABLE documents (
> doc_id uuid,
> description text,
> title text,
> PRIMARY KEY (doc_id)
>  );
>
> CREATE TABLE nodes (
> doc_id uuid,
> element_id uuid,
> title text,
> PRIMARY KEY (doc_id, element_id)
> );
>
> We can retrieve all elements with the following query:
>  SELECT * FROM elements WHERE doc_id=131cfa55-181e-431e-7956-fe449139d613
>  UPDATE elements SET title='Hello' WHERE 
> doc_id=131cfa55-181e-431e-7956-fe449139d613
> AND element_id=a5e41c5d-fd69-45d1-959b-2fe7a1578949;
>
> Zoltan.
>
>
> On Mon, Mar 27, 2017 at 9:47 AM, Alexander Dejanovski <
> a...@thelastpickle.com> wrote:
>
>> Hi Zoltan,
>>
>> you must try to avoid multi partition queries as much as possible.
>> Instead, use asynchronous queries to grab several partitions concurrently.
>> Try to send no more than  ~100 queries at the same time to avoid DDOS-ing
>> your cluster.
>> This would leave you roughly with 1000+ async queries groups to run.
>> Performance will really depend on your hardware, consistency level, load
>> balancing policy, partition fragmentation (how many updates you'll run on
>> each element over time) and the SLA you're expecting.
>>
>> If that approach doesn't meet your SLA requirements, you can try to use
>> wide partitions and group elements under buckets :
>>
>> CREATE TABLE elements (
>> doc_id long,
>> bucket long,
>> element_id long,
>> element_content text,
>> PRIMARY KEY((doc_id, bucket), element_id)
>> )
>>
>> The bucket here could be a modulus of the element_id (or of the hash of
>> element_id if it is not a numerical value). This way you can spread
>> elements over the cluster and access them directly if you have the doc_id
>> and the element_id to perform updates.
>> You'll get to run less queries concurrently but they'll take more time
>> than individual ones in the first scenario (1 partition per element). You
>> should benchmark both solutions to see which one gives best performance.
>> Bucket your elements so that your partitions don't grow over 100MB. Large
>> partitions are silent cluster killers (1GB+ partitions are a direct threat
>> to cluster stability)...
>>
>> To ensure best performance, use prepared statements along with the
>> TokenAwarePolicy
>> 
>>  to
>> avoid unnecessary coordination.
>>
>> Cheers,
>>
>>
>> On Mon, Mar 27, 2017 at 4:40 AM Zoltan Lorincz  wrote:
>>
>>> Querying by (doc_id and element_id ) OR just by (element_id) is fine,
>>> but the real question is, will it be efficient to query 100k+ primary keys
>>> in the elements table?
>>> e.g.
>>>
>>> SELECT * FROM elements WHERE element_id IN (element_id1, element_id2,
>>> element_id3,  element_id100K+)  ?
>>>
>>> The elements_id is a primary key.
>>>
>>> Thank you?
>>>
>>>
>>> On Sun, Mar 26, 2017 at 11:35 PM, Matija Gobec 
>>> wrote:
>>>
>>> Have one table hold document metadata (doc_id, title, description, ...)
>>> and have another table elements where partition key is doc_id and
>>> clustering key is element_id.
>>> Only problem here is if you need to query and/or update element just by
>>> element_id but I don't know your queries up front.
>>>
>>> On Sun, Mar 26, 2017 at 10:16 PM, Zoltan Lorincz 
>>> wrote:
>>>
>>> Dear cassandra users,
>>>
>>> We have the following structure in MySql:
>>>
>>> documents->[doc_id(primary key), title, description]
>>> elements->[element_id(primary key), doc_id(index), title, description]
>>>
>>> Notation: table name->[column1(key or index), column2, …]
>>>
>>> We want to transfer the data to Cassandra.
>>>
>>> Each document can contain a large number of elements (between 1 and
>>> 100k+)
>>>
>>> We have two requirements:
>>> a) Load all elements for a given doc_id quickly
>>> b) Update the value of one individual element quickly
>>>
>>>
>>> We were thinking on the following cassandra configurations:
>>>
>>> Option A
>>>
>>> documents->[doc_id(primary key), title, description, elements] (elements
>>> could be a SET or a TEXT, each time new elements are added (they are never
>>> removed) we would append it to this column)
>>> elements->[element_id(primary key), title, description]
>>>
>>> Loading a document:
>>>
>>>  a) Load document with given  and get all element ids
>>> SELECT * from documents where doc_id=‘id’
>>>
>>>  b) Load all elements with the given ids
>>> SELECT * FROM elements where element_id IN 

Understanding DynamicSnitch Scores.

2017-03-27 Thread Pranay akula
Hi,

I am seeing DynamicSnitch scores on my nodes, one with 3.44775023338358
and another with 0.884715810185486, so my current node will send the full request
to the node with value 3.44 and a digest request to the node with value 0.88.

So does a higher value (>1) mean the node is responding the fastest?

Is a node with score 5.08 responding faster than one with 3.44? Please correct
me if I am wrong.



Thanks
Pranay.


Pagination and timeouts

2017-03-27 Thread Tom van den Berge
I have a table with some 1M rows, and I would like to get the partition key
of each row. Using the java driver (2.1.9), I'm executing the query

select distinct key from table;

The result set is paginated automatically. My C* cluster has two
datacenters, and when I run this query using consistency level LOCAL_ONE,
it starts returning results (page by page) as expected. But after some
time, it will give a ReadTimeoutException. This happens anywhere between 30
seconds and a few minutes.
The Java driver's read timeout is set to 50 ms, and the cluster's
read_request_timeout_in_ms is 30 ms.

I'm wondering what is causing this timeout.

What is also not clear to me is whether the driver and server timeouts apply
to a single page or to the entire query.

Thanks,
Tom


Re: How can I scale my read rate?

2017-03-27 Thread Alexander Dejanovski
By default the TokenAwarePolicy does shuffle replicas, and it can be
disabled if you want to only hit the primary replica for the token range
you're querying :
http://docs.datastax.com/en/drivers/java/3.0/com/datastax/driver/core/policies/TokenAwarePolicy.html

On Mon, Mar 27, 2017 at 9:41 AM Avi Kivity  wrote:

> Is the driver doing the right thing by directing all reads for a given
> token to the same node?  If that node fails, then all of those reads will
> be directed at other nodes, all of whom will be cache-cold for the
> failed node's primary token range.  Seems like the driver should distribute
> reads among all the replicas for a token, at least as an option, to
> keep the caches warm for latency-sensitive loads.
>
> On 03/26/2017 07:46 PM, Eric Stevens wrote:
>
> Yes, throughput for a given partition key cannot be improved with
> horizontal scaling.  You can increase RF to theoretically improve
> throughput on that key, but actually in this case smart clients might hold
> you back, because they're probably token aware, and will try to serve that
> read off the key's primary replica, so all reads would be directed at a
> single node for that key.
>
> If you're reading at CL=QUORUM, there's a chance that increasing RF will
> actually reduce performance rather than improve it, because you've
> increased the total amount of work to serve the read (as well as the
> write).  If you're reading at CL=ONE, increasing RF will increase the
> chances of falling afoul of eventual consistency.
>
> However that's not really a real-world scenario.  Or if it is, Cassandra
> is probably the wrong tool to satisfy that kind of workload.
>
> On Thu, Mar 23, 2017 at 11:43 PM Alain Rastoul 
> wrote:
>
> On 24/03/2017 01:00, Eric Stevens wrote:
> > Assuming an even distribution of data in your cluster, and an even
> > distribution across those keys by your readers, you would not need to
> > increase RF with cluster size to increase read performance.  If you have
> > 3 nodes with RF=3, and do 3 million reads, with good distribution, each
> > node has served 1 million read requests.  If you increase to 6 nodes and
> > keep RF=3, then each node now owns half as much data and serves only
> > 500,000 reads.  Or more meaningfully in the same time it takes to do 3
> > million reads under the 3 node cluster you ought to be able to do 6
> > million reads under the 6 node cluster since each node is just
> > responsible for 1 million total reads.
> >
> Hi Eric,
>
> I think I got your point.
> In the case of really evenly distributed reads it may (or should?) not make
> any difference.
>
> But when you do not distribute the reads well (and in that case only),
> my understanding about RF was that it could help spread the load:
> in that case, with RF=4 instead of 3, with several clients accessing the
> same key ranges, a coordinator could pick one node to handle the request
> from 4 replicas instead of picking one node from 3, thus having
> more "workers" to handle a request?
>
> Am I wrong here?
>
> Thank you for the clarification
>
>
> --
> best,
> Alain
>
>
> --
-
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com
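
[Editor's note: for reference, the RF change discussed in the quoted thread above is a
keyspace-level setting; a sketch, with keyspace and datacenter names as placeholders:

-- Sketch only: keyspace and DC names are placeholders.
ALTER KEYSPACE my_keyspace
  WITH replication = { 'class': 'NetworkTopologyStrategy', 'dc1': 4 };
-- After raising RF, run a full repair so the new replicas receive the existing data.]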


Re: Help with data modelling (from MySQL to Cassandra)

2017-03-27 Thread Avi Kivity

You can use static columns and keep just one table:


CREATE TABLE documents (

doc_id uuid,

element_id uuid,

description text static,

doc_title text static,

element_title text,

PRIMARY KEY (doc_id, element_id)

);


The static columns are present once per unique doc_id.
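
[Editor's note: a short usage sketch, with made-up values reusing the ids from this thread;
the static columns are written once per doc_id, while element_title is written per element:

-- Write the per-document (static) values once; only the partition key is needed:
INSERT INTO documents (doc_id, doc_title, description)
VALUES (131cfa55-181e-431e-7956-fe449139d613, 'My document', 'About something');

-- Write per-element rows; every row in the partition sees the same static values:
INSERT INTO documents (doc_id, element_id, element_title)
VALUES (131cfa55-181e-431e-7956-fe449139d613, a5e41c5d-fd69-45d1-959b-2fe7a1578949, 'Hello');

-- One query returns both the document-level and the element-level columns:
SELECT doc_title, element_id, element_title
FROM documents
WHERE doc_id = 131cfa55-181e-431e-7956-fe449139d613;]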


On 03/27/2017 01:08 PM, Zoltan Lorincz wrote:

Hi Alexander,

thank you for your help! I think we found the answer:

CREATE TABLE documents (
doc_id uuid,
description text,
title text,
PRIMARY KEY (doc_id)
 );

CREATE TABLE nodes (
doc_id uuid,
element_id uuid,
title text,
PRIMARY KEY (doc_id, element_id)
);

We can retrieve all elements with the following query:
 SELECT * FROM elements WHERE doc_id=131cfa55-181e-431e-7956-fe449139d613
 UPDATE elements SET title='Hello' WHERE 
doc_id=131cfa55-181e-431e-7956-fe449139d613 AND 
element_id=a5e41c5d-fd69-45d1-959b-2fe7a1578949;


Zoltan.


On Mon, Mar 27, 2017 at 9:47 AM, Alexander Dejanovski  wrote:


Hi Zoltan,

you must try to avoid multi partition queries as much as possible.
Instead, use asynchronous queries to grab several partitions
concurrently.
Try to send no more than  ~100 queries at the same time to avoid
DDOS-ing your cluster.
This would leave you roughly with 1000+ async queries groups to
run. Performance will really depend on your hardware, consistency
level, load balancing policy, partition fragmentation (how many
updates you'll run on each element over time) and the SLA you're
expecting.

If that approach doesn't meet your SLA requirements, you can try
to use wide partitions and group elements under buckets :

CREATE TABLE elements (
doc_id long,
bucket long,
element_id long,
element_content text,
PRIMARY KEY((doc_id, bucket), element_id)
)

The bucket here could be a modulus of the element_id (or of the
hash of element_id if it is not a numerical value). This way you
can spread elements over the cluster and access them directly if
you have the doc_id and the element_id to perform updates.
You'll get to run less queries concurrently but they'll take more
time than individual ones in the first scenario (1 partition per
element). You should benchmark both solutions to see which one
gives best performance.
Bucket your elements so that your partitions don't grow over
100MB. Large partitions are silent cluster killers (1GB+
partitions are a direct threat to cluster stability)...

To ensure best performance, use prepared statements along with the
TokenAwarePolicy


 to
avoid unnecessary coordination.

Cheers,


On Mon, Mar 27, 2017 at 4:40 AM Zoltan Lorincz  wrote:

Querying by (doc_id and element_id ) OR just by (element_id)
is fine, but the real question is, will it be efficient to
query 100k+ primary keys in the elements table?
e.g.

SELECT * FROM elements WHERE element_id IN (element_id1,
element_id2, element_id3,  element_id100K+)  ?

The elements_id is a primary key.

Thank you?


On Sun, Mar 26, 2017 at 11:35 PM, Matija Gobec  wrote:

Have one table hold document metadata (doc_id, title,
description, ...) and have another table elements where
partition key is doc_id and clustering key is element_id.
Only problem here is if you need to query and/or update
element just by element_id but I don't know your queries
up front.

On Sun, Mar 26, 2017 at 10:16 PM, Zoltan Lorincz  wrote:

Dear cassandra users,

We have the following structure in MySql:

documents->[doc_id(primary key), title, description]
elements->[element_id(primary key), doc_id(index),
title, description]

Notation: table name->[column1(key or index), column2, …]

We want to transfer the data to Cassandra.

Each document can contain a large number of elements
(between 1 and 100k+)

We have two requirements:
a) Load all elements for a given doc_id quickly
b) Update the value of one individual element quickly


We were thinking on the following cassandra
configurations:

Option A

documents->[doc_id(primary key), title, description,
elements] (elements could be a SET or a TEXT, each
time new elements are added (they are never removed)
we would append it to this column)
elements->[element_id(primary key), title, des

Re: why does it still have to search the SSTables when getting data from the memtable in the read flow?

2017-03-27 Thread Bhuvan Rawal
Also, Cassandra's working unit is the cell, so within a partition some cells of a
row may be present in the memtable while others are located in SSTables, hence
the need to reconcile the partition data.

@Jason's point is valid too - user-defined timestamps may put SSTable cells
ahead of memtable ones.
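
[Editor's note: to illustrate the user-defined timestamp case with a made-up table, a cell
already flushed to an SSTable can carry a higher write timestamp than a newer write sitting
in the memtable, so the read has to merge both and let the highest timestamp win:

-- Sketch only: table and values are made up.
CREATE TABLE kv (id int PRIMARY KEY, val text);

-- Written first (and later flushed to an SSTable) with an explicit far-future timestamp (microseconds):
INSERT INTO kv (id, val) VALUES (1, 'sstable cell') USING TIMESTAMP 1600000000000000;

-- Written afterwards, sits in the memtable, but carries the current (smaller) timestamp:
INSERT INTO kv (id, val) VALUES (1, 'memtable cell');

-- Last-write-wins by timestamp: the SSTable cell is returned, which is why the
-- memtable alone cannot be trusted to hold the latest value.
SELECT val, WRITETIME(val) FROM kv WHERE id = 1;]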

Thanks,
Bhuvan

On Mon, Mar 27, 2017 at 5:29 PM, jason zhao yang <
zhaoyangsingap...@gmail.com> wrote:

> Hi,
>
> Cassandra uses a last-write-wins strategy based on write time.
>
> Data being in memory doesn't mean it is the latest data, because of custom write
> times; if the data is also in an SSTable, Cassandra has to read it and reconcile.
>
> Jasonstack
>
> On Mon, 27 Mar 2017 at 7:53 PM, 赵豫峰  wrote:
>
>> Hello, I read the statement that "If the memtable has the desired partition
>> data, then the data is read and then merged with the data from the
>> SSTables. The SSTable data is accessed as shown in the following steps."
>> in the "How is data read?" chapter at
>> http://docs.datastax.com/en/archived/cassandra/2.2/cassandra/dml/dmlAboutReads.html.
>>
>> I do not understand why Cassandra still has to read the SSTables when it has
>> already found the target data in the memtable. If the data is in the memtable,
>> doesn't that mean it is the latest version? Is there any other reason it still
>> has to search the SSTables?
>>
>> Thanks!
>>
>>
>> --
>> 赵豫峰
>>
>> 环信即时通讯云/研发
>>
>>
>


Re: why does it still have to search the SSTables when getting data from the memtable in the read flow?

2017-03-27 Thread jason zhao yang
Hi,

Cassandra uses a last-write-wins strategy based on write time.

Data being in memory doesn't mean it is the latest data, because of custom write times;
if the data is also in an SSTable, Cassandra has to read it and reconcile.

Jasonstack
On Mon, 27 Mar 2017 at 7:53 PM, 赵豫峰  wrote:

> Hello, I read the statement that "If the memtable has the desired partition
> data, then the data is read and then merged with the data from the
> SSTables. The SSTable data is accessed as shown in the following steps."
> in the "How is data read?" chapter at
> http://docs.datastax.com/en/archived/cassandra/2.2/cassandra/dml/dmlAboutReads.html.
>
> I do not understand why Cassandra still has to read the SSTables when it has
> already found the target data in the memtable. If the data is in the memtable,
> doesn't that mean it is the latest version? Is there any other reason it still
> has to search the SSTables?
>
> Thanks!
>
>
> --
> 赵豫峰
>
> 环信即时通讯云/研发
>
>


why does it still have to search the SSTables when getting data from the memtable in the read flow?

2017-03-27 Thread 赵豫峰
Hello, I read the statement that "If the memtable has the desired partition data, 
then the data is read and then merged with the data from the SSTables. The 
SSTable data is accessed as shown in the following steps." in the "How is data 
read?" chapter at 
http://docs.datastax.com/en/archived/cassandra/2.2/cassandra/dml/dmlAboutReads.html.


I do not understand why Cassandra still has to read the SSTables when it has 
already found the target data in the memtable. If the data is in the memtable, 
doesn't that mean it is the latest version? Is there any other reason it still 
has to search the SSTables?


Thanks!


--


赵豫峰



环信即时通讯云/研发

Re: Help with data modelling (from MySQL to Cassandra)

2017-03-27 Thread Zoltan Lorincz
Thank you Matija! Because I am a newbie, it was not clear to me that I am
able to query by the partition key alone (without providing the clustering key),
sorry about that!
Zoltan.

On Mon, Mar 27, 2017 at 1:54 PM, Matija Gobec  wrote:

> That's exactly what I described. IN queries can be used sometimes, but I
> usually run parallel async queries as Alexander explained.
>
> On Mon, Mar 27, 2017 at 12:08 PM, Zoltan Lorincz  wrote:
>
>> Hi Alexander,
>>
>> thank you for your help! I think we found the answer:
>>
>> CREATE TABLE documents (
>> doc_id uuid,
>> description text,
>> title text,
>> PRIMARY KEY (doc_id)
>>  );
>>
>> CREATE TABLE nodes (
>> doc_id uuid,
>> element_id uuid,
>> title text,
>> PRIMARY KEY (doc_id, element_id)
>> );
>>
>> We can retrieve all elements with the following query:
>>  SELECT * FROM elements WHERE doc_id=131cfa55-181e-431e-7956-fe449139d613
>>  UPDATE elements SET title='Hello' WHERE 
>> doc_id=131cfa55-181e-431e-7956-fe449139d613
>> AND element_id=a5e41c5d-fd69-45d1-959b-2fe7a1578949;
>>
>> Zoltan.
>>
>>
>> On Mon, Mar 27, 2017 at 9:47 AM, Alexander Dejanovski <
>> a...@thelastpickle.com> wrote:
>>
>>> Hi Zoltan,
>>>
>>> you must try to avoid multi partition queries as much as possible.
>>> Instead, use asynchronous queries to grab several partitions concurrently.
>>> Try to send no more than  ~100 queries at the same time to avoid
>>> DDOS-ing your cluster.
>>> This would leave you roughly with 1000+ async queries groups to run.
>>> Performance will really depend on your hardware, consistency level, load
>>> balancing policy, partition fragmentation (how many updates you'll run on
>>> each element over time) and the SLA you're expecting.
>>>
>>> If that approach doesn't meet your SLA requirements, you can try to use
>>> wide partitions and group elements under buckets :
>>>
>>> CREATE TABLE elements (
>>> doc_id long,
>>> bucket long,
>>> element_id long,
>>> element_content text,
>>> PRIMARY KEY((doc_id, bucket), element_id)
>>> )
>>>
>>> The bucket here could be a modulus of the element_id (or of the hash of
>>> element_id if it is not a numerical value). This way you can spread
>>> elements over the cluster and access them directly if you have the doc_id
>>> and the element_id to perform updates.
>>> You'll get to run less queries concurrently but they'll take more time
>>> than individual ones in the first scenario (1 partition per element). You
>>> should benchmark both solutions to see which one gives best performance.
>>> Bucket your elements so that your partitions don't grow over 100MB.
>>> Large partitions are silent cluster killers (1GB+ partitions are a direct
>>> threat to cluster stability)...
>>>
>>> To ensure best performance, use prepared statements along with the
>>> TokenAwarePolicy
>>> 
>>>  to
>>> avoid unnecessary coordination.
>>>
>>> Cheers,
>>>
>>>
>>> On Mon, Mar 27, 2017 at 4:40 AM Zoltan Lorincz  wrote:
>>>
 Querying by (doc_id and element_id ) OR just by (element_id) is fine,
 but the real question is, will it be efficient to query 100k+ primary keys
 in the elements table?
 e.g.

 SELECT * FROM elements WHERE element_id IN (element_id1, element_id2,
 element_id3,  element_id100K+)  ?

 The elements_id is a primary key.

 Thank you?


 On Sun, Mar 26, 2017 at 11:35 PM, Matija Gobec 
 wrote:

 Have one table hold document metadata (doc_id, title, description, ...)
 and have another table elements where partition key is doc_id and
 clustering key is element_id.
 Only problem here is if you need to query and/or update element just by
 element_id but I don't know your queries up front.

 On Sun, Mar 26, 2017 at 10:16 PM, Zoltan Lorincz 
 wrote:

 Dear cassandra users,

 We have the following structure in MySql:

 documents->[doc_id(primary key), title, description]
 elements->[element_id(primary key), doc_id(index), title, description]

 Notation: table name->[column1(key or index), column2, …]

 We want to transfer the data to Cassandra.

 Each document can contain a large number of elements (between 1 and
 100k+)

 We have two requirements:
 a) Load all elements for a given doc_id quickly
 b) Update the value of one individual element quickly


 We were thinking on the following cassandra configurations:

 Option A

 documents->[doc_id(primary key), title, description, elements]
 (elements could be a SET or a TEXT, each time new elements are added (they
 are never removed) we would append it to this column)
 elements->[element_id(primary key), title, description]

 Loading a document:

  a) Load document with given  and get all element ids
 SELECT * from documents where doc_id=‘id’

  b) Load all

Re: Help with data modelling (from MySQL to Cassandra)

2017-03-27 Thread Matija Gobec
That's exactly what I described. IN queries can be used sometimes, but I
usually run parallel async queries as Alexander explained.

On Mon, Mar 27, 2017 at 12:08 PM, Zoltan Lorincz  wrote:

> Hi Alexander,
>
> thank you for your help! I think we found the answer:
>
> CREATE TABLE documents (
> doc_id uuid,
> description text,
> title text,
> PRIMARY KEY (doc_id)
>  );
>
> CREATE TABLE nodes (
> doc_id uuid,
> element_id uuid,
> title text,
> PRIMARY KEY (doc_id, element_id)
> );
>
> We can retrieve all elements with the following query:
>  SELECT * FROM elements WHERE doc_id=131cfa55-181e-431e-7956-fe449139d613
>  UPDATE elements SET title='Hello' WHERE 
> doc_id=131cfa55-181e-431e-7956-fe449139d613
> AND element_id=a5e41c5d-fd69-45d1-959b-2fe7a1578949;
>
> Zoltan.
>
>
> On Mon, Mar 27, 2017 at 9:47 AM, Alexander Dejanovski <
> a...@thelastpickle.com> wrote:
>
>> Hi Zoltan,
>>
>> you must try to avoid multi partition queries as much as possible.
>> Instead, use asynchronous queries to grab several partitions concurrently.
>> Try to send no more than  ~100 queries at the same time to avoid DDOS-ing
>> your cluster.
>> This would leave you roughly with 1000+ async queries groups to run.
>> Performance will really depend on your hardware, consistency level, load
>> balancing policy, partition fragmentation (how many updates you'll run on
>> each element over time) and the SLA you're expecting.
>>
>> If that approach doesn't meet your SLA requirements, you can try to use
>> wide partitions and group elements under buckets :
>>
>> CREATE TABLE elements (
>> doc_id long,
>> bucket long,
>> element_id long,
>> element_content text,
>> PRIMARY KEY((doc_id, bucket), element_id)
>> )
>>
>> The bucket here could be a modulus of the element_id (or of the hash of
>> element_id if it is not a numerical value). This way you can spread
>> elements over the cluster and access them directly if you have the doc_id
>> and the element_id to perform updates.
>> You'll get to run less queries concurrently but they'll take more time
>> than individual ones in the first scenario (1 partition per element). You
>> should benchmark both solutions to see which one gives best performance.
>> Bucket your elements so that your partitions don't grow over 100MB. Large
>> partitions are silent cluster killers (1GB+ partitions are a direct threat
>> to cluster stability)...
>>
>> To ensure best performance, use prepared statements along with the
>> TokenAwarePolicy
>> 
>>  to
>> avoid unnecessary coordination.
>>
>> Cheers,
>>
>>
>> On Mon, Mar 27, 2017 at 4:40 AM Zoltan Lorincz  wrote:
>>
>>> Querying by (doc_id and element_id ) OR just by (element_id) is fine,
>>> but the real question is, will it be efficient to query 100k+ primary keys
>>> in the elements table?
>>> e.g.
>>>
>>> SELECT * FROM elements WHERE element_id IN (element_id1, element_id2,
>>> element_id3,  element_id100K+)  ?
>>>
>>> The elements_id is a primary key.
>>>
>>> Thank you?
>>>
>>>
>>> On Sun, Mar 26, 2017 at 11:35 PM, Matija Gobec 
>>> wrote:
>>>
>>> Have one table hold document metadata (doc_id, title, description, ...)
>>> and have another table elements where partition key is doc_id and
>>> clustering key is element_id.
>>> Only problem here is if you need to query and/or update element just by
>>> element_id but I don't know your queries up front.
>>>
>>> On Sun, Mar 26, 2017 at 10:16 PM, Zoltan Lorincz 
>>> wrote:
>>>
>>> Dear cassandra users,
>>>
>>> We have the following structure in MySql:
>>>
>>> documents->[doc_id(primary key), title, description]
>>> elements->[element_id(primary key), doc_id(index), title, description]
>>>
>>> Notation: table name->[column1(key or index), column2, …]
>>>
>>> We want to transfer the data to Cassandra.
>>>
>>> Each document can contain a large number of elements (between 1 and
>>> 100k+)
>>>
>>> We have two requirements:
>>> a) Load all elements for a given doc_id quickly
>>> b) Update the value of one individual element quickly
>>>
>>>
>>> We were thinking on the following cassandra configurations:
>>>
>>> Option A
>>>
>>> documents->[doc_id(primary key), title, description, elements] (elements
>>> could be a SET or a TEXT, each time new elements are added (they are never
>>> removed) we would append it to this column)
>>> elements->[element_id(primary key), title, description]
>>>
>>> Loading a document:
>>>
>>>  a) Load document with given  and get all element ids
>>> SELECT * from documents where doc_id=‘id’
>>>
>>>  b) Load all elements with the given ids
>>> SELECT * FROM elements where element_id IN (ids loaded from query a)
>>>
>>>
>>> Option B
>>>
>>> documents->[doc_id(primary key), title, description]
>>> elements->[element_id(primary key), doc_id(secondary index), title,
>>> description]
>>>
>>> Loading a document:
>>>  a) SELECT * from elements where doc_id=‘id’
>>>
>>>
>>> N

Re: Help with data modelling (from MySQL to Cassandra)

2017-03-27 Thread Zoltan Lorincz
Hi Alexander,

thank you for your help! I think we found the answer:

CREATE TABLE documents (
doc_id uuid,
description text,
title text,
PRIMARY KEY (doc_id)
 );

CREATE TABLE nodes (
doc_id uuid,
element_id uuid,
title text,
PRIMARY KEY (doc_id, element_id)
);

We can retrieve all elements with the following query:
 SELECT * FROM elements WHERE doc_id=131cfa55-181e-431e-7956-fe449139d613
 UPDATE elements SET title='Hello' WHERE
doc_id=131cfa55-181e-431e-7956-fe449139d613 AND
element_id=a5e41c5d-fd69-45d1-959b-2fe7a1578949;

Zoltan.


On Mon, Mar 27, 2017 at 9:47 AM, Alexander Dejanovski <
a...@thelastpickle.com> wrote:

> Hi Zoltan,
>
> you must try to avoid multi partition queries as much as possible.
> Instead, use asynchronous queries to grab several partitions concurrently.
> Try to send no more than  ~100 queries at the same time to avoid DDOS-ing
> your cluster.
> This would leave you roughly with 1000+ async queries groups to run.
> Performance will really depend on your hardware, consistency level, load
> balancing policy, partition fragmentation (how many updates you'll run on
> each element over time) and the SLA you're expecting.
>
> If that approach doesn't meet your SLA requirements, you can try to use
> wide partitions and group elements under buckets :
>
> CREATE TABLE elements (
> doc_id long,
> bucket long,
> element_id long,
> element_content text,
> PRIMARY KEY((doc_id, bucket), element_id)
> )
>
> The bucket here could be a modulus of the element_id (or of the hash of
> element_id if it is not a numerical value). This way you can spread
> elements over the cluster and access them directly if you have the doc_id
> and the element_id to perform updates.
> You'll get to run less queries concurrently but they'll take more time
> than individual ones in the first scenario (1 partition per element). You
> should benchmark both solutions to see which one gives best performance.
> Bucket your elements so that your partitions don't grow over 100MB. Large
> partitions are silent cluster killers (1GB+ partitions are a direct threat
> to cluster stability)...
>
> To ensure best performance, use prepared statements along with the
> TokenAwarePolicy
> 
>  to
> avoid unnecessary coordination.
>
> Cheers,
>
>
> On Mon, Mar 27, 2017 at 4:40 AM Zoltan Lorincz  wrote:
>
>> Querying by (doc_id and element_id ) OR just by (element_id) is fine, but
>> the real question is, will it be efficient to query 100k+ primary keys in
>> the elements table?
>> e.g.
>>
>> SELECT * FROM elements WHERE element_id IN (element_id1, element_id2,
>> element_id3,  element_id100K+)  ?
>>
>> The elements_id is a primary key.
>>
>> Thank you?
>>
>>
>> On Sun, Mar 26, 2017 at 11:35 PM, Matija Gobec 
>> wrote:
>>
>> Have one table hold document metadata (doc_id, title, description, ...)
>> and have another table elements where partition key is doc_id and
>> clustering key is element_id.
>> Only problem here is if you need to query and/or update element just by
>> element_id but I don't know your queries up front.
>>
>> On Sun, Mar 26, 2017 at 10:16 PM, Zoltan Lorincz 
>> wrote:
>>
>> Dear cassandra users,
>>
>> We have the following structure in MySql:
>>
>> documents->[doc_id(primary key), title, description]
>> elements->[element_id(primary key), doc_id(index), title, description]
>>
>> Notation: table name->[column1(key or index), column2, …]
>>
>> We want to transfer the data to Cassandra.
>>
>> Each document can contain a large number of elements (between 1 and
>> 100k+)
>>
>> We have two requirements:
>> a) Load all elements for a given doc_id quickly
>> b) Update the value of one individual element quickly
>>
>>
>> We were thinking on the following cassandra configurations:
>>
>> Option A
>>
>> documents->[doc_id(primary key), title, description, elements] (elements
>> could be a SET or a TEXT, each time new elements are added (they are never
>> removed) we would append it to this column)
>> elements->[element_id(primary key), title, description]
>>
>> Loading a document:
>>
>>  a) Load document with given  and get all element ids
>> SELECT * from documents where doc_id=‘id’
>>
>>  b) Load all elements with the given ids
>> SELECT * FROM elements where element_id IN (ids loaded from query a)
>>
>>
>> Option B
>>
>> documents->[doc_id(primary key), title, description]
>> elements->[element_id(primary key), doc_id(secondary index), title,
>> description]
>>
>> Loading a document:
>>  a) SELECT * from elements where doc_id=‘id’
>>
>>
>> Neither solutions doesn’t seem to be good, in Option A, even if we are
>> querying by Primary keys, the second query will have 100k+ primary key id’s
>> in the WHERE clause, and the second solution looks like an anti pattern in
>> cassandra.
>>
>> Could anyone give any advice how would we create a model for our use case?
>>
>> Thank you in advance,
>
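
[Editor's note: for the bucketed elements table suggested in the quoted advice above, reads
and updates address one (doc_id, bucket) partition at a time; a minimal sketch with made-up
key values and an assumed bucket count of 16 (the "long" type above maps to bigint in CQL):

-- Sketch only: key values and bucket count are made up; the bucket is computed client-side.
SELECT * FROM elements
WHERE doc_id = 42 AND bucket = 7;               -- bucket = hash(element_id) % 16

UPDATE elements
SET element_content = 'new content'
WHERE doc_id = 42 AND bucket = 7 AND element_id = 1001;]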

Re: How can I scale my read rate?

2017-03-27 Thread Avi Kivity
Is the driver doing the right thing by directing all reads for a given 
token to the same node?  If that node fails, then all of those reads 
will be directed at other nodes, all of whom will be cache-cold for the 
failed node's primary token range.  Seems like the driver should 
distribute reads among all the replicas for a token, at least as an 
option, to keep the caches warm for latency-sensitive loads.



On 03/26/2017 07:46 PM, Eric Stevens wrote:
Yes, throughput for a given partition key cannot be improved with 
horizontal scaling.  You can increase RF to theoretically improve 
throughput on that key, but actually in this case smart clients might 
hold you back, because they're probably token aware, and will try to 
serve that read off the key's primary replica, so all reads would be 
directed at a single node for that key.


If you're reading at CL=QUORUM, there's a chance that increasing RF 
will actually reduce performance rather than improve it, because 
you've increased the total amount of work to serve the read (as well 
as the write).  If you're reading at CL=ONE, increasing RF will 
increase the chances of falling afoul of eventual consistency.


However that's not really a real-world scenario.  Or if it is, 
Cassandra is probably the wrong tool to satisfy that kind of workload.


On Thu, Mar 23, 2017 at 11:43 PM Alain Rastoul wrote:


On 24/03/2017 01:00, Eric Stevens wrote:
> Assuming an even distribution of data in your cluster, and an even
> distribution across those keys by your readers, you would not
need to
> increase RF with cluster size to increase read performance.  If
you have
> 3 nodes with RF=3, and do 3 million reads, with good
distribution, each
> node has served 1 million read requests.  If you increase to 6
nodes and
> keep RF=3, then each node now owns half as much data and serves only
> 500,000 reads.  Or more meaningfully in the same time it takes
to do 3
> million reads under the 3 node cluster you ought to be able to do 6
> million reads under the 6 node cluster since each node is just
> responsible for 1 million total reads.
>
Hi Eric,

I think I got your point.
In the case of really evenly distributed reads it may (or should?) not make
any difference.

But when you do not distribute the reads well (and in that case only),
my understanding about RF was that it could help spread the load:
in that case, with RF=4 instead of 3, with several clients accessing the
same key ranges, a coordinator could pick one node to handle the request
from 4 replicas instead of picking one node from 3, thus having
more "workers" to handle a request?

Am I wrong here?

Thank you for the clarification


--
best,
Alain