[ https://issues.apache.org/jira/browse/CASSANDRA-11138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15530531#comment-15530531 ]
Yap Sok Ann edited comment on CASSANDRA-11138 at 9/28/16 7:02 PM:
------------------------------------------------------------------

I encountered a similar problem with both the current trunk (with or without the patch) and 2.1.15. With the following sample config:

{code:none}
table: table1
table_definition: |
  CREATE TABLE table1 (
    pk1 text,
    pk2 int,
    col1 timestamp,
    col2 text,
    col3 blob,
    PRIMARY KEY ((pk1, pk2), col1, col2)
  );
columnspec:
  - name: col1
    cluster: uniform(1..3)
  - name: col2
    cluster: uniform(1..4)
{code}

After inserting, there are duplicate values for {{col2}} *and* {{col3}}:

{code:sql}
cqlsh:stress> select * from table1 where pk1 = 'VFk.mZLR' and pk2 = 1772149447;

 pk1      | pk2        | col1                     | col2     | col3
----------+------------+--------------------------+----------+--------------------
 VFk.mZLR | 1772149447 | 1994-05-17 08:23:01+0000 | QyCJtb6` | 0x5728b1b79dd2372a
 VFk.mZLR | 1772149447 | 2010-11-24 11:19:30+0000 | QyCJtb6` | 0x5728b1b79dd2372a

(2 rows)
{code}

If I just remove {{col2}} from the clustering key, there are no duplicates:

{code:sql}
cqlsh:stress> select * from table1 where pk1 = 'VFk.mZLR' and pk2 = 1772149447;

 pk1      | pk2        | col1                     | col2     | col3
----------+------------+--------------------------+----------+----------------
 VFk.mZLR | 1772149447 | 1994-05-17 08:23:01+0000 | '{9j\(;  | 0x1080f88c325e
 VFk.mZLR | 1772149447 | 2010-11-24 11:19:30+0000 | sA0wlY>' | 0x763588f2f5a8

(2 rows)
{code}

Is this how it is supposed to work? Of particular concern is that {{col3}} remains the same across rows and thus becomes highly compressible.
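For reference, the "remove {{col2}} from the clustering key" variant above is simply the same profile with the primary key changed, roughly:

{code:none}
table_definition: |
  CREATE TABLE table1 (
    pk1 text,
    pk2 int,
    col1 timestamp,
    col2 text,
    col3 blob,
    PRIMARY KEY ((pk1, pk2), col1)
  );
{code}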
> cassandra-stress tool - clustering key values not distributed
> --------------------------------------------------------------
>
>                 Key: CASSANDRA-11138
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11138
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Tools
>         Environment: Cassandra 2.2.4, Centos 6.5, Java 8
>            Reporter: Ralf Steppacher
>              Labels: stress
>         Attachments: 11138-trunk.patch
>
> I am trying to get the stress tool to generate random values for three clustering keys. I am trying to simulate collecting events per user id (text, partition key). Events have a session type (text), event type (text), and creation time (timestamp) (clustering keys, in that order). For testing purposes I ended up with the following column spec:
> {noformat}
> columnspec:
>   - name: created_at
>     cluster: uniform(10..10)
>   - name: event_type
>     size: uniform(5..10)
>     population: uniform(1..30)
>     cluster: uniform(1..30)
>   - name: session_type
>     size: fixed(5)
>     population: uniform(1..4)
>     cluster: uniform(1..4)
>   - name: user_id
>     size: fixed(15)
>     population: uniform(1..1000000)
>   - name: message
>     size: uniform(10..100)
>     population: uniform(1..100B)
> {noformat}
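The ticket does not include the {{table_definition}} itself; based on the description and columnspec above, and the table name used in the sample query further down, it would look roughly like this (the type of {{message}} is an assumption):

{code:none}
table: batch_too_large
table_definition: |
  CREATE TABLE batch_too_large (
    user_id text,
    session_type text,
    event_type text,
    created_at timestamp,
    message text,
    PRIMARY KEY ((user_id), session_type, event_type, created_at)
  );
{code}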
> My expectation was that this would lead to anywhere between 10 and 1200 rows (10 x 1 x 1 up to 10 x 30 x 4) being created per partition key. But it seems that exactly 10 rows are created, with the {{created_at}} timestamp being the only variable that is assigned varying values (per partition key). The {{session_type}} and {{event_type}} variables are assigned fixed values. This is the case even if I set the cluster distributions to uniform(30..30) and uniform(4..4) respectively. With that setting I expected 1200 rows per partition key to be created, as announced when running the stress tool, but it is still 10.
> {noformat}
> [rsteppac@centos bin]$ ./cassandra-stress user profile=../batch_too_large.yaml ops\(insert=1\) -log level=verbose file=~/centos_eventy_patient_session_event_timestamp_insert_only.log -node 10.211.55.8
> …
> Created schema. Sleeping 1s for propagation.
> Generating batches with [1..1] partitions and [1..1] rows (of [1200..1200] total rows in the partitions)
> Improvement over 4 threadCount: 19%
> ...
> {noformat}
> Sample of generated data:
> {noformat}
> cqlsh> select user_id, event_type, session_type, created_at from stresscql.batch_too_large LIMIT 30 ;
>
>  user_id | event_type | session_type | created_at
> -----------------------------+------------------+--------------+--------------------------
>  %\x7f\x03/.d29<i\$u\x114 | Y ?\x1eR|\x13\t| | P+|u\x0b | 2012-10-19 08:14:11+0000
>  %\x7f\x03/.d29<i\$u\x114 | Y ?\x1eR|\x13\t| | P+|u\x0b | 2004-11-08 04:04:56+0000
>  %\x7f\x03/.d29<i\$u\x114 | Y ?\x1eR|\x13\t| | P+|u\x0b | 2002-10-15 00:39:23+0000
>  %\x7f\x03/.d29<i\$u\x114 | Y ?\x1eR|\x13\t| | P+|u\x0b | 1999-08-31 19:56:30+0000
>  %\x7f\x03/.d29<i\$u\x114 | Y ?\x1eR|\x13\t| | P+|u\x0b | 1999-04-02 20:46:26+0000
>  %\x7f\x03/.d29<i\$u\x114 | Y ?\x1eR|\x13\t| | P+|u\x0b | 1990-10-08 03:27:17+0000
>  %\x7f\x03/.d29<i\$u\x114 | Y ?\x1eR|\x13\t| | P+|u\x0b | 1984-03-31 23:30:34+0000
>  %\x7f\x03/.d29<i\$u\x114 | Y ?\x1eR|\x13\t| | P+|u\x0b | 1975-11-16 02:41:28+0000
>  %\x7f\x03/.d29<i\$u\x114 | Y ?\x1eR|\x13\t| | P+|u\x0b | 1970-04-07 07:23:48+0000
>  %\x7f\x03/.d29<i\$u\x114 | Y ?\x1eR|\x13\t| | P+|u\x0b | 1970-03-08 23:23:04+0000
>  N!\x0eUA7^r7d\x06J<v< | \x1bm/c/Th\x07U | E}P^k | 2015-10-12 17:48:51+0000
>  N!\x0eUA7^r7d\x06J<v< | \x1bm/c/Th\x07U | E}P^k | 2010-10-28 06:21:13+0000
>  N!\x0eUA7^r7d\x06J<v< | \x1bm/c/Th\x07U | E}P^k | 2005-06-28 03:34:41+0000
>  N!\x0eUA7^r7d\x06J<v< | \x1bm/c/Th\x07U | E}P^k | 2005-01-29 05:26:21+0000
>  N!\x0eUA7^r7d\x06J<v< | \x1bm/c/Th\x07U | E}P^k | 2003-03-27 01:31:24+0000
>  N!\x0eUA7^r7d\x06J<v< | \x1bm/c/Th\x07U | E}P^k | 2002-03-29 14:22:43+0000
>  N!\x0eUA7^r7d\x06J<v< | \x1bm/c/Th\x07U | E}P^k | 2000-06-15 14:54:29+0000
>  N!\x0eUA7^r7d\x06J<v< | \x1bm/c/Th\x07U | E}P^k | 1998-03-08 13:31:54+0000
>  N!\x0eUA7^r7d\x06J<v< | \x1bm/c/Th\x07U | E}P^k | 1988-01-21 06:38:40+0000
>  N!\x0eUA7^r7d\x06J<v< | \x1bm/c/Th\x07U | E}P^k | 1975-08-03 21:16:47+0000
>  oy\x1c0077H"i\x07\x13_%\x06 | | \nz@Qj\x1cB | E}P^k | 2014-11-23 17:05:45+0000
>  oy\x1c0077H"i\x07\x13_%\x06 | | \nz@Qj\x1cB | E}P^k | 2012-02-23 23:20:54+0000
>  oy\x1c0077H"i\x07\x13_%\x06 | | \nz@Qj\x1cB | E}P^k | 2012-02-19 12:05:15+0000
>  oy\x1c0077H"i\x07\x13_%\x06 | | \nz@Qj\x1cB | E}P^k | 2005-10-17 04:22:45+0000
>  oy\x1c0077H"i\x07\x13_%\x06 | | \nz@Qj\x1cB | E}P^k | 2003-02-24 19:45:06+0000
>  oy\x1c0077H"i\x07\x13_%\x06 | | \nz@Qj\x1cB | E}P^k | 1996-12-18 06:18:31+0000
>  oy\x1c0077H"i\x07\x13_%\x06 | | \nz@Qj\x1cB | E}P^k | 1991-06-10 22:07:45+0000
>  oy\x1c0077H"i\x07\x13_%\x06 | | \nz@Qj\x1cB | E}P^k | 1983-05-05 12:29:09+0000
>  oy\x1c0077H"i\x07\x13_%\x06 | | \nz@Qj\x1cB | E}P^k | 1972-04-17 21:24:52+0000
>  oy\x1c0077H"i\x07\x13_%\x06 | | \nz@Qj\x1cB | E}P^k | 1971-05-09 23:00:02+0000
>
> (30 rows)
> cqlsh>
> {noformat}
> If I remove the {{created_at}} clustering key, then the other two clustering keys are assigned varying values per partition key.
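Similarly, the variant mentioned in the last sentence ({{created_at}} no longer part of the clustering key) would correspond, under the same assumptions as the sketch above, to:

{code:none}
table_definition: |
  CREATE TABLE batch_too_large (
    user_id text,
    session_type text,
    event_type text,
    created_at timestamp,
    message text,
    PRIMARY KEY ((user_id), session_type, event_type)
  );
{code}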