Re: Cassandra Collections performance issue
Hi Daemeon,

We tried changing the behavior "we overwrite every value" to update only 1 element in the map, and we still saw the same performance degradation.

Thanks,
Pratik

From: daemeon reiydelle
Reply-To: user@cassandra.apache.org
Date: Tuesday, February 9, 2016 at 11:39 AM
To: user@cassandra.apache.org
Cc: "Peddi, Praveen"
Subject: Re: Cassandra Collections performance issue

> I think the key to your problem might be around "we overwrite every value". You are creating a large number of tombstones, forcing many reads to pull current results.
Re: Cassandra Collections performance issue
Just to help other users reading along here, what is your access pattern with maps? Do you typically have a large or small number of keys set? Are you mostly adding keys or deleting them? Do you add one key at a time, or add and delete many in a single request? And are you indexing map columns, keys, or values?

-- Jack Krupansky

On Thu, Feb 11, 2016 at 10:44 AM, Clint Martin wrote:
> I have experienced excessive performance issues while using collections as well. Mostly my issue was due to the excessive number of cells per partition that having a modest map size requires.
Re: Cassandra Collections performance issue
I have experienced excessive performance issues while using collections as well. Mostly my issue was due to the excessive number of cells per partition that having a modest map size requires.

Since you are reading and writing the entire map, you can probably gain some performance the same way I did: convert your map to a frozen map. This essentially puts you in the same place as folks who migrate to a blob of JSON, but it puts the onus on Cassandra to manage serializing and deserializing the map.

It does have limitations compared with a regular map: you can't append values, you can't selectively TTL entries, and reading a single key requires deserializing the whole collection. Basically, anything besides reading and writing the whole collection becomes a little harder. But it is considerably faster due to the lower cell count and management overhead.

Clint

On Feb 8, 2016 5:11 PM, "Agrawal, Pratik" wrote:
> Currently we read every field from Map and overwrite map values. Map is of size 3. We saw that writes are 30-40% slower while reads are 70-80% slower.
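The cell-count difference described above can be sketched with a small Python toy model. This is not Cassandra code: it only assumes, as the message says, that a regular map is stored as one cell per entry while a frozen map is stored as a single serialized value. The function names are hypothetical, and a sorted tuple stands in for Cassandra's own binary serialization.

```python
def cells_for_regular_map(m):
    """Toy model: a non-frozen map is stored as one cell per entry."""
    return [(key, value) for key, value in sorted(m.items())]


def cells_for_frozen_map(m):
    """Toy model: a frozen map is stored as one cell holding the whole value.

    A sorted tuple of items stands in for the real serialized form.
    """
    return [tuple(sorted(m.items()))]


def read_key_from_frozen(cells, key):
    """Reading one key means deserializing the entire frozen value first."""
    whole_map = dict(cells[0])
    return whole_map[key]


settings = {"a": "1", "b": "2", "c": "3"}
regular = cells_for_regular_map(settings)
frozen = cells_for_frozen_map(settings)

print(len(regular), len(frozen))
print(read_key_from_frozen(frozen, "b"))
```

The trade-off is visible in `read_key_from_frozen`: the frozen form has fewer cells to manage, but every single-key read or change has to go through the whole value.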
Re: Cassandra Collections performance issue
If the overwrites are per map key, no tombstones are generated; tombstones are created only if the whole map is re-imaged, and prior to 3.0 this can indeed be a major problem if done frequently.

Prior to 3.0, collections also prevent certain optimisations to cell comparisons, and as a result can cause an appreciable performance decline when they're added to a table. Unfortunately, dropping the collection won't resolve the performance degradation, as its prior presence continues to haunt the table. To restore performance you will need to recreate your table without the collection column and reinsert your data. Or upgrade to 3.0.

On 9 February 2016 at 16:39, daemeon reiydelle wrote:
> I think the key to your problem might be around "we overwrite every value". You are creating a large number of tombstones, forcing many reads to pull current results.
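The distinction between re-imaging the whole map and overwriting a single key can be illustrated with a toy write log in Python. This is only a sketch of the behaviour described above, not Cassandra's storage engine: the log format and function names are hypothetical, and the model places each tombstone one timestamp tick before the new cells so that it shadows older entries but not the ones just written.

```python
def overwrite_whole_map(log, new_map, ts):
    """Model of `SET m = {...}`: a tombstone for older entries, then one cell per entry."""
    log.append(("tombstone", None, None, ts - 1))
    for key, value in new_map.items():
        log.append(("cell", key, value, ts))


def update_one_key(log, key, value, ts):
    """Model of `SET m['k'] = v`: a single cell and no tombstone."""
    log.append(("cell", key, value, ts))


def count_tombstones(log):
    return sum(1 for kind, _, _, _ in log if kind == "tombstone")


def read_map(log):
    """Merge the log: newest cell per key, ignoring cells shadowed by a tombstone."""
    cutoff = max((ts for kind, _, _, ts in log if kind == "tombstone"), default=-1)
    newest_ts, result = {}, {}
    for kind, key, value, ts in log:
        if kind == "cell" and ts > cutoff and ts >= newest_ts.get(key, -1):
            newest_ts[key] = ts
            result[key] = value
    return result


# 100 whole-map overwrites of a 3-entry map.
whole = []
for i in range(100):
    overwrite_whole_map(whole, {"a": i, "b": i, "c": i}, ts=10 * (i + 1))

# Initial values, then 100 single-key overwrites.
per_key = []
for key in ("a", "b", "c"):
    update_one_key(per_key, key, 0, ts=1)
for i in range(100):
    update_one_key(per_key, "a", i, ts=10 * (i + 1))

print(count_tombstones(whole), len(whole))
print(count_tombstones(per_key), len(per_key))
```

In this model every whole-map overwrite adds a tombstone that a later read has to merge past, while per-key overwrites add only cells.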
Re: Cassandra Collections performance issue
I think the key to your problem might be around "we overwrite every value". You are creating a large number of tombstones, forcing many reads to pull current results. You would do well to rethink why you are having to overwrite values all the time under the same key. You would be better off figuring out how to add values under a key and then age off the old values. I would say that (at least at scale) you have a classic anti-pattern in play.

...

Daemeon C.M. Reiydelle
USA (+1) 415.501.0198
London (+44) (0) 20 8144 9872

On Mon, Feb 8, 2016 at 5:23 PM, Robert Coli wrote:
> On Mon, Feb 8, 2016 at 2:10 PM, Agrawal, Pratik wrote:
>> Currently we read every field from Map and overwrite map values. Map is of size 3.
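The tombstone claim can be checked against the cfstats figures posted in the original message. The Python sketch below parses a few of those lines and computes tombstones scanned per live cell; the helper names are hypothetical, and the thread does not say which of the two tables holds the map column, so the numbers are only a rough indicator.

```python
# Lines copied from the cfstats output in the original message.
TABLE1 = """
Local read latency: 1.921 ms
Average live cells per slice (last five minutes): 17.536775249005437
Average tombstones per slice (last five minutes): 34.99979575985972
"""

TABLE2 = """
Local read latency: 45.421 ms
Average live cells per slice (last five minutes): 1963.0632242863855
Average tombstones per slice (last five minutes): 0.0
"""


def parse_cfstats(text):
    """Turn 'name: value [unit]' lines into a dict of floats."""
    stats = {}
    for line in text.strip().splitlines():
        name, _, value = line.partition(":")
        stats[name.strip()] = float(value.split()[0])
    return stats


def tombstones_per_live_cell(stats):
    live = stats["Average live cells per slice (last five minutes)"]
    dead = stats["Average tombstones per slice (last five minutes)"]
    return dead / live if live else float("inf")


ratio1 = tombstones_per_live_cell(parse_cfstats(TABLE1))
ratio2 = tombstones_per_live_cell(parse_cfstats(TABLE2))
print(round(ratio1, 2), round(ratio2, 2))
```

For table1 this works out to roughly two tombstones read for every live cell, while table2 reports no tombstones at all.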
Re: Cassandra Collections performance issue
On Mon, Feb 8, 2016 at 2:10 PM, Agrawal, Pratik wrote:
> Recently we added one of the table fields as a Map in Cassandra 2.1.11. Currently we read every field from Map and overwrite map values. Map is of size 3. We saw that writes are 30-40% slower while reads are 70-80% slower. Please find below some metrics that can help.
>
> My question is: are there any known issues with Cassandra map performance? As I understand it, each CQL3 Map entry maps to a column in Cassandra; under that assumption we are just creating 3 columns, right? Any insight on this issue would be helpful.

I have previously heard reports along similar lines, but in the other direction, e.g. "I moved from a collection to a TEXT column with JSON in it, and my reads and writes both became much faster!"

I'm not sure if the issue has been raised as an Apache Cassandra Jira, i.e. whether it is a known and expected limitation as opposed to just a performance issue.

If I were you, I would consider filing a repro case as a Jira ticket, and responding to this thread with its URL. :D

=Rob
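The "TEXT column with JSON in it" approach mentioned above amounts to serializing the map on the client and storing it as one value. A minimal Python sketch of that client-side step follows; the helper names are hypothetical, and the actual read and write of the TEXT column through a driver is left out.

```python
import json


def encode_map(m):
    """Serialize the whole map into the string that would go in the TEXT column."""
    return json.dumps(m, sort_keys=True, separators=(",", ":"))


def decode_map(text):
    """Parse the stored string back into a dict."""
    return json.loads(text)


def update_entry(stored_text, key, value):
    """Changing one entry is a read-modify-write of the entire value."""
    m = decode_map(stored_text)
    m[key] = value
    return encode_map(m)


stored = encode_map({"a": "1", "b": "2", "c": "3"})
updated = update_entry(stored, "b", "20")
print(stored)
print(updated)
```

Because the whole value is rewritten on every change, concurrent writers to different keys can overwrite each other, which is one reason this suits the read-everything, write-everything access pattern described in the thread better than selective updates.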
Re: Cassandra Collections performance issue
Hello all,

Recently we added one of the table fields as a Map in Cassandra 2.1.11. Currently we read every field from Map and overwrite map values. Map is of size 3. We saw that writes are 30-40% slower while reads are 70-80% slower. Please find below some metrics that can help.

My question is: are there any known issues with Cassandra map performance? As I understand it, each CQL3 Map entry maps to a column in Cassandra; under that assumption we are just creating 3 columns, right? Any insight on this issue would be helpful.

Datastax Java Driver 2.1.6.
Machine: Amazon C3 2x large
CPU – pretty much same as before (around 30%)
Memory – max around 4.8 GB

CFSTATS:

Keyspace: Keyspace
    Read Count: 28359044
    Read Latency: 2.847392469259542 ms.
    Write Count: 1152765
    Write Latency: 0.14778018590085576 ms.
    Pending Flushes: 0

    Table: table1
        SSTable count: 1
        SSTables in each level: [1, 0, 0, 0, 0, 0, 0, 0, 0]
        Space used (live): 4119699
        Space used (total): 4119699
        Space used by snapshots (total): 90323640
        Off heap memory used (total): 2278
        SSTable Compression Ratio: 0.23172161124142604
        Number of keys (estimate): 14
        Memtable cell count: 6437
        Memtable data size: 872912
        Memtable off heap memory used: 0
        Memtable switch count: 7626
        Local read count: 27754634
        Local read latency: 1.921 ms
        Local write count: 1113668
        Local write latency: 0.142 ms
        Pending flushes: 0
        Bloom filter false positives: 0
        Bloom filter false ratio: 0.0
        Bloom filter space used: 96
        Bloom filter off heap memory used: 88
        Index summary off heap memory used: 46
        Compression metadata off heap memory used: 2144
        Compacted partition minimum bytes: 315853
        Compacted partition maximum bytes: 4055269
        Compacted partition mean bytes: 2444011
        Average live cells per slice (last five minutes): 17.536775249005437
        Maximum live cells per slice (last five minutes): 1225.0
        Average tombstones per slice (last five minutes): 34.99979575985972
        Maximum tombstones per slice (last five minutes): 3430.0

    Table: table2
        SSTable count: 1
        SSTables in each level: [1, 0, 0, 0, 0, 0, 0, 0, 0]
        Space used (live): 869900
        Space used (total): 869900
        Space used by snapshots (total): 17279824
        Off heap memory used (total): 387
        SSTable Compression Ratio: 0.3999013540551859
        Number of keys (estimate): 2
        Memtable cell count: 1958
        Memtable data size: 8
        Memtable off heap memory used: 0
        Memtable switch count: 7484
        Local read count: 604412
        Local read latency: 45.421 ms
        Local write count: 39097
        Local write latency: 0.337 ms
        Pending flushes: 0
        Bloom filter false positives: 0
        Bloom filter false ratio: 0.0
        Bloom filter space used: 96
        Bloom filter off heap memory used: 88
        Index summary off heap memory used: 35
        Compression metadata off heap memory used: 264
        Compacted partition minimum bytes: 1955667
        Compacted partition maximum bytes: 2346799
        Compacted partition mean bytes: 2346799
        Average live cells per slice (last five minutes): 1963.0632242863855
        Maximum live cells per slice (last five minutes): 5001.0
        Average tombstones per slice (last five minutes): 0.0
        Maximum tombstones per slice (last five minutes): 0.0

NETSTATS:

Mode: NORMAL
Not sending any streams.
Read Repair Statistics:
Attempted: 2853996
Mismatch (Blocking): 67386
Mismatch (Background): 9233

Pool Name    Active   Pending   Completed
Commands     n/a      0         33953165
Responses    n/a      0         370301

IOSTAT:

avg-cpu:  %user   %nice   %system   %iowait   %steal   %idle
          15.20   0.83    0.56      0.10      0.04     83.27

Device:   tps     Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
xvda      2.79    0.47         69.86        553719     82619304
xvdb      14.49   3.39         775.56       4009600    917227536
xvdc      15.13   2.98         819.93       3522250    969708944
dm-0      49.67   6.36         1595.49      7525858    1886936320

TPSTAT:

Pool Name               Active   Pending   Completed   Blocked   All time blocked
MutationStage           0        0         1199683     0         0
ReadStage               0        0         28449207    0         0
RequestResponseStage    0        0         33983356    0         0
ReadRepairStage         0        0         2865749     0         0
CounterMutationStage    0        0         0           0         0
MiscStage               0        0         0           0         0
HintedHandoff           0        0         2           0         0
GossipStage             0        0         270364      0         0
CacheCleanupExecutor    0        0         0           0         0
InternalResponseStage   0        0         0           0         0
CommitLogArchiver       0        0         0           0         0
CompactionExecutor