[jira] [Commented] (CASSANDRA-8371) DateTieredCompactionStrategy is always compacting

2014-12-06 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237065#comment-14237065
 ] 

Jonathan Shook commented on CASSANDRA-8371:
---

I tend to agree with Tupshin on the first point, which is to say that an 
occasional side-effect of a needed repair should be small compared to the 
over-arching benefit of having a (tunably) lower steady-state compaction load. 
My rationale, in detail, is below. If I have made a mistake somewhere in this 
explanation, please correct me. 

It is true that the boundary increases geometrically, but not necessarily true 
that this means compaction load will be lower as the windows get larger. There 
is a distinction for the most recent intervals simply because that is where 
memtable flushing and DTCS meet up, with the expected variance in sstable 
sizes. I'll assume the implications of this are obvious and only focus for now 
on later compactions.
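
For reference, these are the main DTCS knobs in play (a sketch only; the table 
name is hypothetical, the option names are the DTCS compaction options):

{code}
ALTER TABLE ks.events WITH compaction = {
    'class': 'DateTieredCompactionStrategy',
    'base_time_seconds': '3600',     -- size of the smallest (most recent) window
    'min_threshold': '4',            -- same-size windows that merge into the next size up
    'max_sstable_age_days': '365'    -- sstables older than this are no longer compacted
};
{code}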

If we had ideal scheduling of later compactions, each sstable would be 
coalesced exactly once per interval size. This isn't what we will expect to see 
as a rule, but we have to start somewhere for a reasonable estimate on the 
bounds. This means that our average compaction load would tend towards a 
constant over time for each window, and that the average compaction load for 
all active interval sizes would stack linearly depending on how many windows 
were accounted for. This means that the compaction load is super-linear over 
time in the case of no max age.  Even though the stacking effect does slow down 
over time, it's merely a slowing of increased load, not the base load itself.

In contrast, given a max age and an average ingestion rate, the average 
steady-state compaction load increases as each larger interval becomes active, 
but levels out at a maximum. If the max age is low enough, then the effect can 
be significant. Considering that the load stacking effect occurs more quickly 
in recent time but less quickly as time progresses, the adjustment of max age 
closer to now() has the most visible effect. In other words, a max adjustment 
which deactivates compaction at the 4th smallest interval size will have a less 
obvious effect than one that deactivates it at the 3rd or 2nd.
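
To make the shape of this concrete, here is a back-of-the-envelope model (my 
own sketch, not from the ticket): assume the ideal once-per-window scheduling 
above, base window b, tiering factor k, constant ingest rate r, and max age m.

{noformat}
rewrites per sstable by age A:        ~ log_k(A / b)
steady-state load with max age m:     ~ r * log_k(m / b)       (a constant)
cumulative work by time T, no cap:    ~ r * T * log_k(T / b)   (super-linear in T)
{noformat}

Each window size the cap drops subtracts an additive r from the steady-state 
load, so the relative savings grow as the cap moves toward now(): going from 
4 active windows down to 3 saves 25%, from 2 down to 1 saves 50%.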

Reducing the steady-state compaction load has significant advantages across the 
board in a well-balanced system. Testing can easily show the correlation 
between higher average compaction load and lower op rates and worsening latency 
spikes.

Requiring that the max be higher than the time it takes for a scheduled repair 
cycle would rule out these types of adjustments. As well, the boundary between 
those two settings is pretty fuzzy, considering that most automated repair 
schedules take a week or more.

There are also remedies, if you see that repairs are significantly affecting 
your larger intervals. If you want to have it be perfectly compacted 
(probably not that important, in all honesty), simply adjust the max age, let 
DTCS recompact the higher intervals, and then adjust it back, or not. If I were 
having a significant amount of data being repaired on a routine basis, I'd 
probably be scaling or tuning the system at that point, anyway. Repairs that 
have to stream enough data to really become a problem for larger intervals 
should be considered a bad thing-- a sign that there are other pressures in the 
system that need to be addressed. However, a limited amount of data being 
repaired, as in a healthy cluster, can be handled quite well by IntervalTree, 
BloomFilter and friends.
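
In cqlsh that remedy is just two statements (a sketch; the table name and the 
restored value are hypothetical, max_sstable_age_days is the DTCS option):

{code}
-- raise the max age so DTCS recompacts the larger intervals again
ALTER TABLE ks.events WITH compaction = {
    'class': 'DateTieredCompactionStrategy',
    'max_sstable_age_days': '365'
};
-- then put the lower steady-state cap back, or not
ALTER TABLE ks.events WITH compaction = {
    'class': 'DateTieredCompactionStrategy',
    'max_sstable_age_days': '10'
};
{code}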

I'm not advocating specifically for a low default max, but I did want to 
explain the rationale for not ruling it out as a valid choice in certain cases.

> DateTieredCompactionStrategy is always compacting 
> --
>
> Key: CASSANDRA-8371
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8371
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: mck
>Assignee: Björn Hegerfors
>  Labels: compaction, performance
> Attachments: java_gc_counts_rate-month.png, 
> read-latency-recommenders-adview.png, read-latency.png, 
> sstables-recommenders-adviews.png, sstables.png, vg2_iad-month.png
>
>
> Running 2.0.11 and having switched a table to 
> [DTCS|https://issues.apache.org/jira/browse/CASSANDRA-6602] we've seen that 
> disk IO and gc count increase, along with the number of reads happening in 
> the "compaction" hump of cfhistograms.
> Data, and generally performance, looks good, but compactions are always 
> happening, and pending compactions are building up.
> The schema for this is 
> {code}CREATE TABLE search (
>   loginid text,
>   searchid timeuuid,
>   description text,
>   searchkey text,
>   searchurl text,
>   PRIMARY KEY ((loginid), searchid)
> )

[jira] [Created] (CASSANDRA-8435) Cassandra LIST with UDT fails to create index in SOLR throws JDBC error

2014-12-06 Thread madheswaran (JIRA)
madheswaran created CASSANDRA-8435:
--

 Summary: Cassandra LIST with UDT fails to create index in SOLR 
throws JDBC error
 Key: CASSANDRA-8435
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8435
 Project: Cassandra
  Issue Type: Bug
  Components: Drivers (now out of tree)
Reporter: madheswaran
Priority: Critical


16767 [qtp297774990-12] INFO org.apache.solr.handler.dataimport.DataImporter – 
Loading DIH Configuration: dataconfigCassandra.xml
16779 [qtp297774990-12] INFO org.apache.solr.handler.dataimport.DataImporter – 
Data Configuration loaded successfully
16788 [Thread-15] INFO org.apache.solr.handler.dataimport.DataImporter – 
Starting Full Import
16789 [qtp297774990-12] INFO org.apache.solr.core.SolrCore – [Entity_dev] 
webapp=/solr path=/dataimport params=
{optimize=false&indent=true&clean=true&commit=true&verbose=false&command=full-import&debug=false&wt=json}
status=0 QTime=27
16810 [qtp297774990-12] INFO org.apache.solr.core.SolrCore – [Entity_dev] 
webapp=/solr path=/dataimport params=
{indent=true&command=status&_=1416042006354&wt=json}
status=0 QTime=0
16831 [Thread-15] INFO 
org.apache.solr.handler.dataimport.SimplePropertiesWriter – Read 
dataimport.properties
16917 [Thread-15] INFO org.apache.solr.search.SolrIndexSearcher – Opening 
Searcher@6214b0dc[Entity_dev] realtime
16945 [Thread-15] INFO org.apache.solr.handler.dataimport.JdbcDataSource – 
Creating a connection for entity Entity with URL: 
jdbc:cassandra://10.234.31.153:9160/galaxy_dev
17082 [Thread-15] INFO org.apache.solr.handler.dataimport.JdbcDataSource – Time 
taken for getConnection(): 136
17429 [Thread-15] ERROR org.apache.solr.handler.dataimport.DocBuilder – 
Exception while processing: Entity document : SolrInputDocument(fields: 
[]):org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to 
execute query: select * from entity Processing Document # 1
at
org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:71)
at 
org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:283)
at 
org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:240)
at 
org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:44)
at 
org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
at 
org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
at 
org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:476)
at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:415)
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:330)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:232)
at 
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:416)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:480)
at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:461)
Caused by: java.lang.NullPointerException
at org.apache.cassandra.cql.jdbc.ListMaker.compose(ListMaker.java:61)
at org.apache.cassandra.cql.jdbc.TypedColumn.<init>(TypedColumn.java:68)
at 
org.apache.cassandra.cql.jdbc.CassandraResultSet.createColumn(CassandraResultSet.java:1174)
at 
org.apache.cassandra.cql.jdbc.CassandraResultSet.populateColumns(CassandraResultSet.java:240)
at 
org.apache.cassandra.cql.jdbc.CassandraResultSet.<init>(CassandraResultSet.java:200)
at 
org.apache.cassandra.cql.jdbc.CassandraStatement.doExecute(CassandraStatement.java:169)
at 
org.apache.cassandra.cql.jdbc.CassandraStatement.execute(CassandraStatement.java:205)
at 
org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:276)
... 12 more


UDT:
=
 DESCRIBE TYPE fieldmap ;
CREATE TYPE galaxy_dev.fieldmap (
key text,
value text
);


Table:
=
CREATE TABLE galaxy_dev.entity (
entity_id uuid PRIMARY KEY,
begining int,
domain text,
domain_type text,
entity_template_name text,
field_values list<frozen<fieldmap>>,
global_entity_type text,
revision_time timeuuid,
status_key int,
status_name text,
uuid timeuuid
);




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8421) Cassandra 2.1.1 UDT not returning value for LIST type as UDT

2014-12-06 Thread madheswaran (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237056#comment-14237056
 ] 

madheswaran commented on CASSANDRA-8421:


This issue is not reproduced with fewer records (4 or 5 records); when the 
record count increases, it behaves abnormally. 

> Cassandra 2.1.1 UDT not returning value for LIST type as UDT
> 
>
> Key: CASSANDRA-8421
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8421
> Project: Cassandra
>  Issue Type: Bug
>  Components: API
> Environment: single node cassandra 
>Reporter: madheswaran
>Assignee: Philip Thompson
> Fix For: 3.0, 2.1.3
>
>
> I am using a List and its data type is a UDT.
> UDT:
> {code}
> CREATE TYPE
> fieldmap (
>  key text,
>  value text
> );
> {code}
> TABLE:
> {code}
> CREATE TABLE entity (
>   entity_id uuid PRIMARY KEY,
>   begining int,
>   domain text,
>   domain_type text,
>   entity_template_name text,
> field_values list<fieldmap>,
>   global_entity_type text,
>   revision_time timeuuid,
>   status_key int,
>   status_name text,
>   uuid timeuuid
>   ) {code}
> INDEX:
> {code}
> CREATE INDEX entity_domain_idx_1 ON galaxy_dev.entity (domain);
> CREATE INDEX entity_field_values_idx_1 ON galaxy_dev.entity (field_values);
> CREATE INDEX entity_global_entity_type_idx_1 ON galaxy_dev.entity (gen_type );
> {code}
> QUERY
> {code}
> SELECT * FROM entity WHERE status_key < 3 and field_values contains {key: 
> 'userName', value: 'Sprint5_22'} and gen_type = 'USER' and domain = 
> 'S4_1017.abc.com' allow filtering;
> {code}
> The above query returns values for some rows but not for many others, even 
> though those rows and their data exist.
> Observation:
> If I execute the query with columns other than field_maps, then it returns 
> values. I suspect the problem is with LIST with UDT.
> I have a single-node Cassandra DB. Please let me know why this strange 
> behavior occurs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8421) Cassandra 2.1.1 UDT not returning value for LIST type as UDT

2014-12-06 Thread madheswaran (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237054#comment-14237054
 ] 

madheswaran commented on CASSANDRA-8421:


It is a typo. I used the frozen keyword for the UDT.

CREATE TYPE galaxy_dev.fieldmap (
key text,
value text
);

CREATE TABLE galaxy_dev.entity (
entity_id uuid PRIMARY KEY,
begining int,
domain text,
domain_type text,
entity_template_name text,
field_values list<frozen<fieldmap>>,
global_entity_type text,
revision_time timeuuid,
status_key int,
status_name text,
uuid timeuuid
) 

Even I tried with latest release Cassandra 2.1.2 and the issue is reproduced. :(

Total Records inserted was 20
>>cqlsh:galaxy_dev> SELECT count(*) from entity;

 count
---
20


>>SELECT * from entity where status_key = 3 allow filtering;
it returns 17 records

>>SELECT * from entity where field_values CONTAINS {key: 'state', value: 'Dubai'};
it returns 19 records

>>SELECT * from entity where field_values CONTAINS {key: 'state', value: 'Dubai'} and status_key = 3 allow filtering;
returns only 1 record (but 17 records were expected).

Sample data that was inserted:

>>cqlsh:master_db> SELECT * FROM  entity limit 1;
 entity_id | begining | domain | domain_type | entity_template_name | field_values
 [sample row output truncated in the original message]
 

[jira] [Commented] (CASSANDRA-4987) Support more queries when ALLOW FILTERING is used.

2014-12-06 Thread Rajanarayanan Thottuvaikkatumana (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237000#comment-14237000
 ] 

Rajanarayanan Thottuvaikkatumana commented on CASSANDRA-4987:
-

[~slebresne], the changes are not as trivial as I explained above. Upon further 
testing of the changes, I have found that if the table has data, and if I have 
made only the above changes, it throws some exceptions. I need to look further 
into the places to make the appropriate changes. If you know it off the top of 
your head, please let me know. Otherwise I will find it out. Thanks

> Support more queries when ALLOW FILTERING is used.
> --
>
> Key: CASSANDRA-4987
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4987
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Sylvain Lebresne
>Assignee: Rajanarayanan Thottuvaikkatumana
>  Labels: cql
> Fix For: 3.0
>
>
> Even after CASSANDRA-4915, there is still a bunch of queries that we don't 
> support even if {{ALLOW FILTERING}} is used. Typically, pretty much any 
> queries with restriction on a non-primary-key column unless we have one of 
> those restriction that is an EQ on an indexed column.
> If {{ALLOW FILTERING}} is used, we could allow those queries out of 
> convenience.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8429) Stress on trunk fails mixed workload on missing keys

2014-12-06 Thread Ariel Weisberg (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-8429:
--
Attachment: cluster.conf

> Stress on trunk fails mixed workload on missing keys
> 
>
> Key: CASSANDRA-8429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8429
> Project: Cassandra
>  Issue Type: Bug
> Environment: Ubuntu 14.04
>Reporter: Ariel Weisberg
>Assignee: Ariel Weisberg
> Attachments: cluster.conf
>
>
> Starts as part of merge commit 25be46497a8df46f05ffa102bc645bfd684ea48a
> Stress will say that a key wasn't validated because it isn't returned even 
> though it's loaded. The key will eventually appear and can be queried using 
> cqlsh.
> Reproduce with
> #!/bin/sh
> rm /tmp/fark
> ROWCOUNT=1000
> SCHEMA='-col n=fixed(1) -schema 
> compaction(strategy=LeveledCompactionStrategy) compression=LZ4Compressor'
> ./cassandra-stress write n=$ROWCOUNT -node xh61 -pop seq=1..$ROWCOUNT no-wrap 
> -rate threads=25 $SCHEMA
> ./cassandra-stress mixed "ratio(read=2)" n=1 -node xh61 -pop 
> "dist=extreme(1..$ROWCOUNT,0.6)" -rate threads=25 $SCHEMA



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8429) Stress on trunk fails mixed workload on missing keys

2014-12-06 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14236997#comment-14236997
 ] 

Ariel Weisberg commented on CASSANDRA-8429:
---

Ok... bisecting off of trunk points to 9c316e7858f6dbf9df892aff78431044aa104ed9

I ran the commit before e60a06cc866e5e85d3e58f25b98f8c048d07ad24 to make 
sure it doesn't fail. It usually fails in two tries. I have done four and it 
seems to work.

Two others have tried to reproduce the issue and not had any luck. I seem to be 
doing something different to cause it. I am using CCM to start the one node 
cluster with the attached cluster.conf. Server is on Ubuntu 14.04 on a 
dedicated quad-core desktop with SSD, client is on a quad-core laptop, network 
is gig-E.

> Stress on trunk fails mixed workload on missing keys
> 
>
> Key: CASSANDRA-8429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8429
> Project: Cassandra
>  Issue Type: Bug
> Environment: Ubuntu 14.04
>Reporter: Ariel Weisberg
>Assignee: Ariel Weisberg
>
> Starts as part of merge commit 25be46497a8df46f05ffa102bc645bfd684ea48a
> Stress will say that a key wasn't validated because it isn't returned even 
> though it's loaded. The key will eventually appear and can be queried using 
> cqlsh.
> Reproduce with
> #!/bin/sh
> rm /tmp/fark
> ROWCOUNT=1000
> SCHEMA='-col n=fixed(1) -schema 
> compaction(strategy=LeveledCompactionStrategy) compression=LZ4Compressor'
> ./cassandra-stress write n=$ROWCOUNT -node xh61 -pop seq=1..$ROWCOUNT no-wrap 
> -rate threads=25 $SCHEMA
> ./cassandra-stress mixed "ratio(read=2)" n=1 -node xh61 -pop 
> "dist=extreme(1..$ROWCOUNT,0.6)" -rate threads=25 $SCHEMA



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-4986) Allow finer control of ALLOW FILTERING behavior

2014-12-06 Thread Michał Michalski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14236972#comment-14236972
 ] 

Michał Michalski commented on CASSANDRA-4986:
-

From what I understand:

LIMIT defines the maximum number of rows we want to return. If there are rows 
matching your query, they're guaranteed to be returned (up to the LIMIT), but 
it may take a long time to find them all, depending on the dataset size. You 
will get a correct result, but there's no guarantee on the execution time.

MAX defines the maximum number of rows we want to "iterate over" (even if none 
of them matches your query). Even if there are rows matching your query, they 
might not be returned if finding them requires C* to iterate over too many 
(> MAX) rows. This guarantees that the execution time of your query will not 
be worse than what it takes to iterate over MAX rows, but you might get an 
inaccurate result (assuming the more useful implementation; see point 1 in the 
description).
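
Concretely (a sketch of the intended semantics; the table and values are 
hypothetical, and MAX is only the syntax proposed in this ticket, not 
implemented):

{code}
-- LIMIT bounds the result: scan until 100 matching rows are found,
-- however many rows have to be inspected along the way
SELECT * FROM events WHERE status = 'failed' LIMIT 100 ALLOW FILTERING;

-- the proposed MAX would bound the scan itself: stop after inspecting
-- 500 rows, matching or not, trading completeness for bounded time
SELECT * FROM events WHERE status = 'failed' LIMIT 100 ALLOW FILTERING MAX 500;
{code}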


> Allow finer control of ALLOW FILTERING behavior
> ---
>
> Key: CASSANDRA-4986
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4986
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Sylvain Lebresne
>Priority: Minor
> Fix For: 3.0
>
>
> CASSANDRA-4915 added {{ALLOW FILTERING}} to warn people when they do 
> potentially inefficient queries. However, as discussed in the former issue it 
> would be interesting to allow controlling that mode more precisely by 
> allowing something like:
> {noformat}
> ... ALLOW FILTERING MAX 500
> {noformat}
> whose behavior would be that the query would be short-circuited if it filters 
> (i.e. read but discard from the ResultSet) more than 500 CQL3 rows.
> There are however 2 details I'm not totally clear on:
> # what to do exactly when we reach the max filtering allowed. Do we return 
> what we have so far, but then we need to have a way to say in the result set 
> that the query was short-circuited. Or do we just throw an exception 
> TooManyFiltered (simpler but maybe a little bit less useful).
> # what about deleted records? Should we count them as 'filtered'? Imho the 
> logical thing is to not count them as filtered, since after all we "filter 
> them out" in the normal path (i.e. even when ALLOW FILTERING is not used).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-4986) Allow finer control of ALLOW FILTERING behavior

2014-12-06 Thread Rajanarayanan Thottuvaikkatumana (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14236960#comment-14236960
 ] 

Rajanarayanan Thottuvaikkatumana commented on CASSANDRA-4986:
-

There is already a LIMIT keyword in the SELECT statement. Why introduce 
another one, such as MAX? Thanks

> Allow finer control of ALLOW FILTERING behavior
> ---
>
> Key: CASSANDRA-4986
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4986
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Sylvain Lebresne
>Priority: Minor
> Fix For: 3.0
>
>
> CASSANDRA-4915 added {{ALLOW FILTERING}} to warn people when they do 
> potentially inefficient queries. However, as discussed in the former issue it 
> would be interesting to allow controlling that mode more precisely by 
> allowing something like:
> {noformat}
> ... ALLOW FILTERING MAX 500
> {noformat}
> whose behavior would be that the query would be short-circuited if it filters 
> (i.e. read but discard from the ResultSet) more than 500 CQL3 rows.
> There are however 2 details I'm not totally clear on:
> # what to do exactly when we reach the max filtering allowed. Do we return 
> what we have so far, but then we need to have a way to say in the result set 
> that the query was short-circuited. Or do we just throw an exception 
> TooManyFiltered (simpler but maybe a little bit less useful).
> # what about deleted records? Should we count them as 'filtered'? Imho the 
> logical thing is to not count them as filtered, since after all we "filter 
> them out" in the normal path (i.e. even when ALLOW FILTERING is not used).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-8414) Avoid loops over array backed iterators that call iter.remove()

2014-12-06 Thread Jimmy Mårdell (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Mårdell updated CASSANDRA-8414:
-
Attachment: cassandra-2.0-8414-1.txt

> Avoid loops over array backed iterators that call iter.remove()
> ---
>
> Key: CASSANDRA-8414
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8414
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Richard Low
>  Labels: performance
> Fix For: 2.1.3
>
> Attachments: cassandra-2.0-8414-1.txt
>
>
> I noticed from sampling that sometimes compaction spends almost all of its 
> time in iter.remove() in ColumnFamilyStore.removeDeletedStandard. It turns 
> out that the cf object is using ArrayBackedSortedColumns, so deletes are from 
> an ArrayList. If the majority of your columns are GCable tombstones then this 
> is O(n^2). The data structure should be changed or a copy made to avoid this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-8434) L0 should have a separate configurable bloom filter false positive ratio

2014-12-06 Thread Benedict (JIRA)
Benedict created CASSANDRA-8434:
---

 Summary: L0 should have a separate configurable bloom filter false 
positive ratio
 Key: CASSANDRA-8434
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8434
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Benedict
 Fix For: 2.0.12, 2.1.3


In follow-up to CASSANDRA-5371: we now perform size-tiered file selection for 
compaction if L0 gets too far behind, however as far as I can tell we stick 
with the CF-configured false positive ratio, likely inflating substantially the 
number of files we visit on average until L0 is cleaned up. Having a different 
bloom filter false positive ratio for L0 would solve this problem without 
introducing any significant burden when L0 is not overloaded.
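
For context, the per-table knob that exists today (a sketch; the table name is 
hypothetical, and an L0-specific ratio would be a new option):

{code}
-- today this single ratio applies to every sstable in the table, L0 included
ALTER TABLE ks.events WITH bloom_filter_fp_chance = 0.01;
{code}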



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8414) Avoid loops over array backed iterators that call iter.remove()

2014-12-06 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14236760#comment-14236760
 ] 

Aleksey Yeschenko commented on CASSANDRA-8414:
--

Actually Richard's issue is with 1.2 and 2.0.

I'm not sure how much of an issue in practice it really is for compaction in 
2.1, w/ only LazilyCompactedRow there, and PreCompactedRow gone.

> Avoid loops over array backed iterators that call iter.remove()
> ---
>
> Key: CASSANDRA-8414
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8414
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Richard Low
>  Labels: performance
> Fix For: 2.1.3
>
>
> I noticed from sampling that sometimes compaction spends almost all of its 
> time in iter.remove() in ColumnFamilyStore.removeDeletedStandard. It turns 
> out that the cf object is using ArrayBackedSortedColumns, so deletes are from 
> an ArrayList. If the majority of your columns are GCable tombstones then this 
> is O(n^2). The data structure should be changed or a copy made to avoid this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8400) nodetool cfstats is missing "Number of Keys (estimate)"

2014-12-06 Thread Lyuben Todorov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14236725#comment-14236725
 ] 

Lyuben Todorov commented on CASSANDRA-8400:
---

It was removed during the update of nodetool to use the then-new metrics API. 
The commit is a4fc13c052fd816585bcb15793495ca74643aa2c for 
[CASSANDRA-5871|https://issues.apache.org/jira/browse/CASSANDRA-5871]. Not sure 
why I didn't add it back at the time, so I'm going with an oversight. I think 
the confusion came from adding the avg. key count for each percentile to 
cfhistograms at the same time. 

> nodetool cfstats is missing "Number of Keys (estimate)"
> ---
>
> Key: CASSANDRA-8400
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8400
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tools
> Environment: C* 2.1.2
>Reporter: Sebastian Estevez
>Assignee: Lyuben Todorov
>Priority: Minor
>  Labels: tools
> Fix For: 2.1.3
>
>
> Expected result:
> :~$nodetool version
> ReleaseVersion: 2.0.11.83
> :~$ nodetool cfstats system.schema_keyspaces|grep keys
>   Table: schema_keyspaces
>   Number of keys (estimate): 384
> Result in C* 2.1:
> $ bin/nodetool version
> ReleaseVersion: 2.1.2
> $ bin/nodetool cfstats system|grep key
> Table: schema_keyspaces



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)