[jira] [Commented] (CASSANDRA-4718) More-efficient ExecutorService for improved throughput

2014-05-14 Thread Lior Golan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13997497#comment-13997497
 ] 

Lior Golan commented on CASSANDRA-4718:
---

But there are use cases where the full working set is memory resident, or close 
to it. Improving performance in these use cases would reduce the need for 
caching in front of Cassandra.

 More-efficient ExecutorService for improved throughput
 --

 Key: CASSANDRA-4718
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4718
 Project: Cassandra
  Issue Type: Improvement
Reporter: Jonathan Ellis
Assignee: Benedict
Priority: Minor
  Labels: performance
 Fix For: 2.1.0

 Attachments: 4718-v1.patch, PerThreadQueue.java, aws.svg, 
 aws_read.svg, backpressure-stress.out.txt, baq vs trunk.png, 
 belliotsmith_branches-stress.out.txt, jason_read.svg, jason_read_latency.svg, 
 jason_write.svg, op costs of various queues.ods, stress op rate with various 
 queues.ods, v1-stress.out


 Currently all our execution stages dequeue tasks one at a time.  This can 
 result in contention between producers and consumers (although we do our best 
 to minimize this by using LinkedBlockingQueue).
 One approach to mitigating this would be to make consumer threads do more 
 work in bulk instead of just one task per dequeue.  (Producer threads tend 
 to be single-task oriented by nature, so I don't see an equivalent 
 opportunity there.)
 BlockingQueue has a drainTo(collection, int) method that would be perfect for 
 this.  However, no ExecutorService in the jdk supports using drainTo, nor 
 could I google one.
 What I would like to do here is create just such a beast and wire it into (at 
 least) the write and read stages.  (Other possible candidates for such an 
 optimization, such as the CommitLog and OutboundTCPConnection, are not 
 ExecutorService-based and will need to be one-offs.)
 AbstractExecutorService may be useful.  The implementations of 
 ICommitLogExecutorService may also be useful. (Despite the name these are not 
 actual ExecutorServices, although they share the most important properties of 
 one.)
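
 A rough illustration of the batching idea (a hypothetical sketch, not the 
 attached patch; the class name, batch size, and bare Runnable tasks are all 
 made up here): a consumer thread blocks for one task, then opportunistically 
 drains whatever else is queued before touching the queue again, so producers 
 and consumers contend less often.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;

// Hypothetical batch-draining consumer loop; a real stage executor would also
// need thread pooling, shutdown handling, and metrics.
public class BatchDrainingWorker implements Runnable
{
    private static final int MAX_BATCH = 32; // illustrative batch size

    private final BlockingQueue<Runnable> queue;

    public BatchDrainingWorker(BlockingQueue<Runnable> queue)
    {
        this.queue = queue;
    }

    @Override
    public void run()
    {
        List<Runnable> batch = new ArrayList<>(MAX_BATCH);
        while (!Thread.currentThread().isInterrupted())
        {
            try
            {
                // Block for the first task, then drain up to MAX_BATCH - 1 more
                // in a single call instead of taking them one at a time.
                batch.add(queue.take());
                queue.drainTo(batch, MAX_BATCH - 1);
                for (Runnable task : batch)
                    task.run();
            }
            catch (InterruptedException e)
            {
                Thread.currentThread().interrupt();
            }
            finally
            {
                batch.clear();
            }
        }
    }
}
{code}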



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-5862) Switch to adler checksum for sstables

2013-08-08 Thread Lior Golan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13733784#comment-13733784
 ] 

Lior Golan commented on CASSANDRA-5862:
---

Are you computing the checksum on the data before or after compression? If you 
are computing the checksum of the uncompressed data, then it might be useful to 
switch to checksumming the compressed data instead (less data to checksum).
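
For illustration only (the 64 KB chunk size is made up, and the JDK Deflater 
stands in for whatever compressor is configured), checksumming the compressed 
bytes rather than the uncompressed chunk would look roughly like this:

{code:java}
import java.util.zip.Adler32;
import java.util.zip.Deflater;

public class CompressedChunkChecksum
{
    public static void main(String[] args)
    {
        byte[] uncompressed = new byte[64 * 1024];               // pretend this is one chunk
        byte[] compressed = new byte[uncompressed.length + 64];  // generous output buffer

        // Compress the chunk first. (A real implementation would loop until
        // deflater.finished(); one call is enough for this toy input.)
        Deflater deflater = new Deflater();
        deflater.setInput(uncompressed);
        deflater.finish();
        int compressedLength = deflater.deflate(compressed);
        deflater.end();

        // Checksum only the compressed bytes -- typically far fewer bytes to push through Adler32.
        Adler32 adler = new Adler32();
        adler.update(compressed, 0, compressedLength);

        System.out.printf("compressed %d -> %d bytes, adler32=%08x%n",
                          uncompressed.length, compressedLength, adler.getValue());
    }
}
{code}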

 Switch to adler checksum for sstables
 -

 Key: CASSANDRA-5862
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5862
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Jonathan Ellis
Assignee: T Jake Luciani
 Fix For: 2.0.1


 Adler is significantly faster than CRC32: 
 http://java-performance.info/java-crc32-and-adler32/
 (Adler is weaker for short inputs, so we should leave the commitlog alone, as 
 it checksums each mutation individually.)
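
 A quick, unscientific way to sanity-check the speed claim on a given JVM (a 
 proper comparison would use a harness such as JMH; the 64 KB buffer and 
 iteration count below are arbitrary):

{code:java}
import java.util.Random;
import java.util.zip.Adler32;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

public class ChecksumSpeed
{
    public static void main(String[] args)
    {
        byte[] data = new byte[64 * 1024];
        new Random(42).nextBytes(data);

        // Run the same buffer through both checksums and time each one.
        for (Checksum checksum : new Checksum[]{ new Adler32(), new CRC32() })
        {
            long start = System.nanoTime();
            for (int i = 0; i < 10_000; i++)
            {
                checksum.reset();
                checksum.update(data, 0, data.length);
            }
            double elapsedMs = (System.nanoTime() - start) / 1e6;
            System.out.printf("%s: %.1f ms for 10,000 x 64 KB%n",
                              checksum.getClass().getSimpleName(), elapsedMs);
        }
    }
}
{code}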

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-5862) Switch to adler checksum for sstables

2013-08-08 Thread Lior Golan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13733814#comment-13733814
 ] 

Lior Golan commented on CASSANDRA-5862:
---

What's the typical size of a compressed chunk (after compression)? If it's 
above a few hundred bytes, then Adler should be OK. In addition, compressed 
data has higher entropy, which improves the quality of the Adler32 checksum.

 Switch to adler checksum for sstables
 -

 Key: CASSANDRA-5862
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5862
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Jonathan Ellis
Assignee: T Jake Luciani
 Fix For: 2.0.1


 Adler is significantly faster than CRC32: 
 http://java-performance.info/java-crc32-and-adler32/
 (Adler is weaker for short inputs, so we should leave the commitlog alone, as 
 it checksums each mutation individually.)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-5014) Convert timeout from milli second to micro second.

2013-06-26 Thread Lior Golan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694091#comment-13694091
 ] 

Lior Golan commented on CASSANDRA-5014:
---

Why not keep the timeouts in milliseconds (in the configuration) but make them 
doubles, and then convert them to microseconds internally? This way you keep 
compatibility with old timeout values while allowing users to set 
microsecond-level timeouts if they wish.
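
A minimal sketch of the suggested conversion (the helper below is hypothetical, 
not Cassandra's actual config code): the configured value stays in 
milliseconds, may be fractional, and is widened to long microseconds 
internally.

{code:java}
import java.util.concurrent.TimeUnit;

public final class TimeoutConfig
{
    // Hypothetical helper: a configured value such as 10 or 0.25 (milliseconds)
    // becomes 10000 or 250 microseconds internally.
    public static long timeoutMicros(double timeoutMillis)
    {
        return Math.round(timeoutMillis * TimeUnit.MILLISECONDS.toMicros(1));
    }

    public static void main(String[] args)
    {
        System.out.println(timeoutMicros(10));   // legacy whole-millisecond value -> 10000
        System.out.println(timeoutMicros(0.25)); // new-style fractional value     -> 250
    }
}
{code}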

 Convert timeout from milli second to micro second.
 --

 Key: CASSANDRA-5014
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5014
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Affects Versions: 2.0 beta 1
Reporter: Vijay
Assignee: Vijay
Priority: Trivial
 Fix For: 2.0 beta 1

 Attachments: 0001-CASSANDRA-5014.patch


 Convert all the timeouts to microseconds.
 Jonathan's comment from CASSANDRA-4705
 {quote}
 millis may be too coarse a grain here, especially for Custom settings. 
 Currently an in-memory read will typically be under 2ms and it's quite 
 possible we can get that down to 1 if we can purge some of the latency 
 between stages. Might as well use micros since Timer gives it to us for free
 {quote}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-3647) Support set and map value types in CQL

2012-06-12 Thread Lior Golan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293579#comment-13293579
 ] 

Lior Golan commented on CASSANDRA-3647:
---

Talking about Hive - a question about how you envision Lists/Sets/Maps and Hive 
integration: will it be possible to perform a Hive query that joins against 
any/all values in a List/Set/Map?

For example let's say I have the following column families:

1. Users CF - with row key = user id and a groups column for the Set of 
groups the user belongs to
2. Groups CF - with row key = group id and a name column for group name

And let's say I want to have a query for the number of users per group (name). 
In a relational database this would be supported by factoring the relationship 
between users and groups out into a third table (users_groups), and performing 
an inner join between groups and users_groups, grouping by groups.name.

How will this be supported in Hive (over Cassandra) if the mapping between 
users and groups is stored as a single Set column in the users CF?

 Support set and map value types in CQL
 --

 Key: CASSANDRA-3647
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3647
 Project: Cassandra
  Issue Type: New Feature
  Components: API, Core
Reporter: Jonathan Ellis
Assignee: Sylvain Lebresne
  Labels: cql
 Fix For: 1.2


 Composite columns introduce the ability to have arbitrarily nested data in a 
 Cassandra row.  We should expose this through CQL.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

2011-08-05 Thread Lior Golan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079934#comment-13079934
 ] 

Lior Golan commented on CASSANDRA-1717:
---

Seems like in terms of overhead (which based on HADOOP-6148 is potentially very 
significant in both storage and CPU) - block-level checksums are much better.

I understand you believe block-level checksums are easy in the compressed case 
but not easy in the non-compressed case. So can't you just implement a no-op 
compression option that utilizes what you're doing for compression in terms of 
block structure and block-level checksums? That would be easy if you already 
designed the compression algorithm to be pluggable. And if the compression 
algorithm is not pluggable yet - adding that would have an obvious side benefit 
besides an easier implementation of block-level checksums.
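
A sketch of such a no-op option (the interface below is invented for 
illustration and is not Cassandra's actual compressor API): a "compressor" that 
copies bytes verbatim, so the same block framing and per-block checksums apply 
to uncompressed sstables.

{code:java}
import java.nio.ByteBuffer;

// Hypothetical pluggable-compressor interface, for illustration only.
interface BlockCompressor
{
    int compress(ByteBuffer input, ByteBuffer output);
    int uncompress(ByteBuffer input, ByteBuffer output);
}

// No-op implementation: blocks keep their structure and checksums, bytes are copied as-is.
final class IdentityCompressor implements BlockCompressor
{
    public int compress(ByteBuffer input, ByteBuffer output)
    {
        int length = input.remaining();
        output.put(input);
        return length;
    }

    public int uncompress(ByteBuffer input, ByteBuffer output)
    {
        int length = input.remaining();
        output.put(input);
        return length;
    }
}
{code}

With something like this, the non-compressed path could reuse the same chunked, 
checksummed writer and reader as the compressed path.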

 Cassandra cannot detect corrupt-but-readable column data
 

 Key: CASSANDRA-1717
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Assignee: Pavel Yaskevich
 Fix For: 1.0

 Attachments: checksums.txt


 Most corruptions of on-disk data due to bitrot render the column (or row) 
 unreadable, so the data can be replaced by read repair or anti-entropy.  But 
 if the corruption keeps column data readable we do not detect it, and if it 
 corrupts to a higher timestamp value can even resist being overwritten by 
 newer values.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

2011-08-05 Thread Lior Golan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079934#comment-13079934
 ] 

Lior Golan edited comment on CASSANDRA-1717 at 8/5/11 12:31 PM:


Seems like in terms of overhead (which based on HADOOP-6148 is potentially very 
significant in both storage and CPU) - block-level checksums are much better.

I understand you believe block-level checksums are easy in the compressed case 
but not easy in the non-compressed case.

So can't you just implement a no-op compression option that utilizes what 
you're doing / planning to do for compression in terms of block structure and 
block-level checksums?
That would be easy if you already designed the compression algorithm to be 
pluggable. And if the compression algorithm is not pluggable yet - adding that 
would have an obvious side benefit besides an easier implementation of 
block-level checksums.

  was (Author: liorgo2):
Seems like in terms of overhead (which based on HADOOP-6148 is potentially 
very significant in both storage and CPU) - block level checksums is much 
better.

I understand you believe block level checksums are easy in the compressed case 
but to not easy in the non-compressed case. So can't you just implement a no-op 
compression option that will utilize what you're doing for compression in terms 
of block structure and block level checksums. That would be easy if you already 
designed for the compression algorithm to be plugable. And if the compression 
algorithm is not plugable yet - adding that would have an obvious side benefit 
besides an easier implementation of block level checksums.   
  
 Cassandra cannot detect corrupt-but-readable column data
 

 Key: CASSANDRA-1717
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Assignee: Pavel Yaskevich
 Fix For: 1.0

 Attachments: checksums.txt


 Most corruptions of on-disk data due to bitrot render the column (or row) 
 unreadable, so the data can be replaced by read repair or anti-entropy.  But 
 if the corruption keeps column data readable we do not detect it, and if it 
 corrupts to a higher timestamp value can even resist being overwritten by 
 newer values.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2034) Make Read Repair unnecessary when Hinted Handoff is enabled

2011-08-05 Thread Lior Golan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079936#comment-13079936
 ] 

Lior Golan commented on CASSANDRA-2034:
---

Writes are fast, but you also have network latency for communicating with all 
the nodes. What would happen in the worst-of-N case for multi-datacenter 
deployments?

 Make Read Repair unnecessary when Hinted Handoff is enabled
 ---

 Key: CASSANDRA-2034
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2034
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Jonathan Ellis
Assignee: Patricio Echague
 Fix For: 1.0

 Attachments: 2034-formatting.txt, CASSANDRA-2034-trunk-v2.patch, 
 CASSANDRA-2034-trunk-v3.patch, CASSANDRA-2034-trunk-v4.patch, 
 CASSANDRA-2034-trunk-v5.patch, CASSANDRA-2034-trunk-v6.patch, 
 CASSANDRA-2034-trunk-v7.patch, CASSANDRA-2034-trunk.patch

   Original Estimate: 8h
  Remaining Estimate: 8h

 Currently, HH is purely an optimization -- if a machine goes down, enabling 
 HH means RR/AES will have less work to do, but you can't disable RR entirely 
 in most situations since HH doesn't kick in until the FailureDetector does.
 Let's add a scheduled task to the mutate path, such that we return to the 
 client normally after ConsistencyLevel is achieved, but after RpcTimeout we 
 check the responseHandler write acks and write local hints for any missing 
 targets.
 This would make disabling RR when HH is enabled a much more reasonable 
 option, which has a huge impact on read throughput.
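
 A very rough sketch of that scheduled check (every name below is invented for 
 illustration; the real change would hook into Cassandra's existing write 
 response handlers and hint storage):

{code:java}
import java.net.InetAddress;
import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: after RpcTimeout, hint any replica that never acked the write.
final class HintOnTimeout
{
    interface WriteResponseTracker
    {
        Set<InetAddress> targets();       // all replicas the mutation was sent to
        Set<InetAddress> ackedReplicas(); // replicas that have responded so far
    }

    interface HintWriter
    {
        void writeLocalHint(InetAddress target); // store a hint to replay later
    }

    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor();

    void scheduleHintCheck(WriteResponseTracker tracker, HintWriter hints, long rpcTimeoutMillis)
    {
        // The client has already been answered once ConsistencyLevel was met;
        // this runs later, purely to cover replicas that silently dropped the write.
        scheduler.schedule(() -> {
            for (InetAddress target : tracker.targets())
                if (!tracker.ackedReplicas().contains(target))
                    hints.writeLocalHint(target);
        }, rpcTimeoutMillis, TimeUnit.MILLISECONDS);
    }
}
{code}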

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data

2011-08-05 Thread Lior Golan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13080035#comment-13080035
 ] 

Lior Golan commented on CASSANDRA-1717:
---

If you're afraid of people getting confused by compression options that have 
nothing to do with compression, why not give it a more generic name, like 
encoding options? e.g. encoding options = (snappy-with-checksum, checksum-only, 
none)

 Cassandra cannot detect corrupt-but-readable column data
 

 Key: CASSANDRA-1717
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Jonathan Ellis
Assignee: Pavel Yaskevich
 Fix For: 1.0

 Attachments: checksums.txt


 Most corruptions of on-disk data due to bitrot render the column (or row) 
 unreadable, so the data can be replaced by read repair or anti-entropy.  But 
 if the corruption keeps column data readable we do not detect it, and if it 
 corrupts to a higher timestamp value can even resist being overwritten by 
 newer values.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira