[jira] [Commented] (CASSANDRA-8757) IndexSummaryBuilder should construct itself offheap, and share memory between the result of each build() invocation

2015-02-12 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14318575#comment-14318575
 ] 

Benedict commented on CASSANDRA-8757:
-

Patch available 
[here|github.com/belliottsmith/cassandra/tree/8757-offheapsummarybuilder]

The approach is pretty straightforward in principle: we split the offheap 
memory for the summary into two allocations, the summary offsets and the 
summary entries, each entry composed of the key and its offset in the index 
file. The offsets now index from zero, instead of from the end of the offsets 
themselves. To maintain compatibility we do not change the serialization 
format; on read/write we simply subtract/add the necessary offset. This split 
permits us to have a separate chunk of memory for each that we can append to in 
the writer, so that a prefix of both can be used to open a summary before we've 
finished writing. This permits us to share memory between all early instances 
of a table.
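
As a rough illustration of the split described above (this is not the code from the patch): a minimal Java sketch with two append-only offheap buffers, where each build() returns a read-only view over the prefix written so far and shares the underlying memory. The class and method names (SharedSummarySketch, View, append) and the fixed buffer capacities are assumptions made for the example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: names and layout are illustrative, not the patch's classes.
final class SharedSummarySketch {
    // Two separate append-only offheap regions (fixed capacity keeps the sketch short).
    private final ByteBuffer offsets = ByteBuffer.allocateDirect(1 << 12);
    private final ByteBuffer entries = ByteBuffer.allocateDirect(1 << 16);

    // Append one summary entry: the key followed by its position in the index file.
    void append(byte[] key, long indexFilePosition) {
        offsets.putInt(entries.position()); // zero-based offset into the entries region
        entries.put(key);
        entries.putLong(indexFilePosition);
    }

    // Open a summary over what has been written so far; no bytes are copied.
    View build() {
        ByteBuffer o = offsets.duplicate();
        ByteBuffer e = entries.duplicate();
        o.flip(); // limit = bytes written so far, position = 0
        e.flip();
        return new View(o.asReadOnlyBuffer(), e.asReadOnlyBuffer());
    }

    static final class View {
        private final ByteBuffer offsets;
        private final ByteBuffer entries;

        View(ByteBuffer offsets, ByteBuffer entries) {
            this.offsets = offsets;
            this.entries = entries;
        }

        int size() {
            return offsets.limit() / Integer.BYTES;
        }

        // End of entry i: the start of entry i + 1, or the end of the visible prefix.
        private int end(int i) {
            return i + 1 < size() ? offsets.getInt((i + 1) * Integer.BYTES) : entries.limit();
        }

        String key(int i) {
            int start = offsets.getInt(i * Integer.BYTES);
            byte[] key = new byte[end(i) - start - Long.BYTES];
            ByteBuffer d = entries.duplicate();
            d.position(start);
            d.get(key);
            return new String(key, StandardCharsets.UTF_8);
        }

        long indexFilePosition(int i) {
            return entries.getLong(end(i) - Long.BYTES);
        }
    }
}
```

A view built early keeps its own limit, so later appends by the writer do not change what it sees, while every view reads from the same two allocations.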

> IndexSummaryBuilder should construct itself offheap, and share memory between 
> the result of each build() invocation
> ---
>
> Key: CASSANDRA-8757
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8757
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Benedict
>Assignee: Benedict
> Fix For: 2.1.4
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8757) IndexSummaryBuilder should construct itself offheap, and share memory between the result of each build() invocation

2015-02-17 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14324492#comment-14324492
 ] 

Benedict commented on CASSANDRA-8757:
-

Just to explain why I consider this a priority for 2.1: if you have users with 
very large STCS compactions, we can have some fairly pathological behaviour. 
Let's say our target file is 500GB, and 20% of the data is the partition key. 
This means the summary will be approximately 800MB, assuming defaults. If we 
re-open the result every 50MB (default behaviour) we will allocate a total of 
4TB of memory for summaries over the duration of the compaction. Not all of 
this will be in use at once; ideally, in fact, we would only ever have maybe 
1.6GB allocated. But there is no guarantee, and longer running operations like 
compactions could retain copies of multiple different instances indefinitely, 
so we could see several GB of summary floating around in this pathological 
case. If there is reluctance to introduce this into 2.1, another option might 
be either to disable early reopening entirely for very large files, or to reopen 
far less frequently, say at even intervals of sqrt(N) where N is the expected 
end size, or at logarithmically spaced intervals. But the advantage of 
reopening vanishes if we do this, so without this patch we may as well just not 
do it for such files.
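
The arithmetic behind these figures can be reproduced with a small Java sketch. It rests on assumptions not spelled out above: a default index interval of 128 (one summary entry per 128 keys), a summary size dominated by the sampled key bytes, decimal units, and each reopen copying a summary proportional to the data written so far. The class and method names are hypothetical.

```java
// Back-of-envelope estimate only; not code from the patch.
final class SummaryAllocationEstimate {
    // Final summary size: the key bytes in the file, sampled once per indexInterval keys.
    static long summaryBytes(long fileBytes, int keyPercent, int indexInterval) {
        return fileBytes * keyPercent / 100 / indexInterval;
    }

    // Total bytes allocated if every reopen builds a fresh summary of the data so far.
    // The k-th of n reopens copies about finalSummary * k / n bytes, and
    // sum_{k=1..n} k / n = (n + 1) / 2.
    static long totalAllocatedBytes(long fileBytes, long reopenIntervalBytes, long finalSummaryBytes) {
        long reopens = fileBytes / reopenIntervalBytes;
        return finalSummaryBytes * (reopens + 1) / 2;
    }

    public static void main(String[] args) {
        long file = 500_000_000_000L;                      // 500GB target file
        long summary = summaryBytes(file, 20, 128);        // roughly 0.8GB
        long total = totalAllocatedBytes(file, 50_000_000L, summary); // roughly 4TB
        System.out.println(summary + " bytes per final summary, " + total + " bytes allocated in total");
    }
}
```

Under these assumptions the final summary comes to about 780MB and the cumulative allocation to about 3.9TB across 10,000 reopens, in line with the rounded 800MB and 4TB quoted above.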



[jira] [Commented] (CASSANDRA-8757) IndexSummaryBuilder should construct itself offheap, and share memory between the result of each build() invocation

2015-02-20 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329186#comment-14329186
 ] 

T Jake Luciani commented on CASSANDRA-8757:
---

Looking at this in conjunction with CASSANDRA-8689. There is some issue with 
this patch: all tests are hanging in the SchemaLoader.

{code}
"main" prio=10 tid=0x7f9ab000f000 nid=0x11d5 waiting on condition [0x7f9ab7447000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0xecb941f8> (a java.util.concurrent.FutureTask)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
        at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:425)
        at java.util.concurrent.FutureTask.get(FutureTask.java:187)
        at org.apache.cassandra.utils.FBUtilities.waitOnFuture(FBUtilities.java:398)
        at org.apache.cassandra.service.MigrationManager.announce(MigrationManager.java:375)
        at org.apache.cassandra.service.MigrationManager.announceNewKeyspace(MigrationManager.java:231)
        at org.apache.cassandra.service.MigrationManager.announceNewKeyspace(MigrationManager.java:220)
        at org.apache.cassandra.service.MigrationManager.announceNewKeyspace(MigrationManager.java:215)
        at org.apache.cassandra.SchemaLoader.loadSchema(SchemaLoader.java:65)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
        at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
        at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
        at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:27)
        at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
        at org.junit.runners.ParentRunner.run(ParentRunner.java:220)
        at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:39)
        at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:532)
        at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1179)
        at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:1030)
{code}



[jira] [Commented] (CASSANDRA-8757) IndexSummaryBuilder should construct itself offheap, and share memory between the result of each build() invocation

2015-02-21 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14330143#comment-14330143
 ] 

Benedict commented on CASSANDRA-8757:
-

Broken by the rebase, since 8792's behaviour was changed. The 
IndexSummaryBuilder calculates a maxExpectedEntries count of zero, which was 
perfectly safe when we accepted zero length allocations. The new version of 
8792 does not support this, so we now have to ensure maxExpectedEntries is at 
least 1 to avoid allocating a zero length region of memory. Pushed an update.
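
The kind of guard described here might look like the following Java sketch. The method, its parameters, and the division by an index interval are assumptions for illustration; the thread does not show how the patch actually derives maxExpectedEntries.

```java
// Hypothetical illustration of clamping the entry count; not the patch's code.
final class ExpectedEntries {
    // Never return zero, so the caller never requests a zero length allocation.
    static long maxExpectedEntries(long expectedKeys, int minIndexInterval) {
        return Math.max(1, expectedKeys / minIndexInterval);
    }
}
```

With an index interval of 128, an sstable expected to hold fewer than 128 keys (including an empty one) would then size its summary for one entry rather than zero.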



[jira] [Commented] (CASSANDRA-8757) IndexSummaryBuilder should construct itself offheap, and share memory between the result of each build() invocation

2015-02-26 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339391#comment-14339391
 ] 

Ariel Weisberg commented on CASSANDRA-8757:
---

I didn't understand the offset business so the comment probably doesn't provide 
the right context. After agonizing for a while I figured out what it meant. It 
could be lack of sleep.

It might be clearer if it described the mismatch between the in-memory and 
on-disk representations (in a way I grok). The in-memory representation is a 
set of offsets into a separate zero-indexed array, while the disk-based 
representation is a set of offsets to entries appended after the offsets 
section, so every offset needs to be recalculated.
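
To make that mismatch concrete, here is a small Java sketch of the recalculation. It assumes 4-byte offsets and that on-disk offsets are measured from the start of the offsets section, so the first entry sits at the byte length of that section; the class and method names are hypothetical and the real serializer handles more than this.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of rebasing offsets between the two layouts.
final class SummaryOffsetRebase {
    // Flatten the two in-memory regions into one on-disk layout: offsets, then entries.
    static ByteBuffer serialize(int[] offsets, byte[] entries) {
        int base = offsets.length * Integer.BYTES; // byte length of the offsets section
        ByteBuffer out = ByteBuffer.allocate(base + entries.length);
        for (int offset : offsets)
            out.putInt(offset + base); // in memory: zero-based; on disk: past the offsets
        out.put(entries);
        out.flip();
        return out;
    }

    // Recover the zero-based in-memory offsets from the on-disk layout.
    static int[] deserializeOffsets(ByteBuffer in, int count) {
        int base = count * Integer.BYTES;
        int[] offsets = new int[count];
        for (int i = 0; i < count; i++)
            offsets[i] = in.getInt(i * Integer.BYTES) - base;
        return offsets;
    }
}
```

For two entries starting at in-memory offsets 0 and 13, the serialized offsets would be 8 and 21, and reading them back subtracts 8 again.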

I think "serialization point" didn't parse for me as being the point in the 
file after the offsets.
{quote}
because we serialize/deserialize in native
+// int/long format,
{quote}
And that doesn't seem to be the cause of this mess. It's not the native 
int/long formatness of it. It's that the offsets are into an array and the two 
have to be flattened into one file.

SSTableReader line 747 has a random semi-colon, and IndexSummaryBuilder line 
216 has an extra semi-colon.

SafeMemoryWriter has no unit test.

Otherwise I am +1



[jira] [Commented] (CASSANDRA-8757) IndexSummaryBuilder should construct itself offheap, and share memory between the result of each build() invocation

2015-03-03 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345066#comment-14345066
 ] 

Benedict commented on CASSANDRA-8757:
-

OK, I've pushed a new version to the repository that improves the comments and 
integrates SafeMemoryWriter with DataOutputTest (also slightly changing the 
behaviour of SafeMemoryWriter to support this, but in a way that is probably 
generally sensible anyway).



[jira] [Commented] (CASSANDRA-8757) IndexSummaryBuilder should construct itself offheap, and share memory between the result of each build() invocation

2015-03-04 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347070#comment-14347070
 ] 

Benedict commented on CASSANDRA-8757:
-

Got ahead of myself and thought I'd had the final +1 for this, so I've already 
committed. If you could still check the final changes to confirm you're OK with 
them and I don't need to roll back, that would be appreciated.



[jira] [Commented] (CASSANDRA-8757) IndexSummaryBuilder should construct itself offheap, and share memory between the result of each build() invocation

2015-03-09 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353187#comment-14353187
 ] 

Ariel Weisberg commented on CASSANDRA-8757:
---

+1. The new comment makes sense to me, although it might be because I already 
know what is going on.
