[GitHub] spark pull request: [SPARK-13980][WIP] Incrementally serialize blo...

2016-03-20 Thread SparkQA
GitHub user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11791#issuecomment-198218345
  
**[Test build #53496 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53496/consoleFull)** for PR 11791 at commit [`5489748`](https://github.com/apache/spark/commit/5489748d7e8b0d4aa7a0e7331a1a4a02f65d977f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13980][WIP] Incrementally serialize blo...

2016-03-19 Thread SparkQA
GitHub user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11791#issuecomment-198086746
  
**[Test build #53456 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53456/consoleFull)** for PR 11791 at commit [`7dc3623`](https://github.com/apache/spark/commit/7dc362331f3f549670ecd9488db456b4136a3ad7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-13980][WIP] Incrementally serialize blo...

2016-03-19 Thread AmplabJenkins
GitHub user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11791#issuecomment-198218448
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53496/





[GitHub] spark pull request: [SPARK-13980][WIP] Incrementally serialize blo...

2016-03-19 Thread JoshRosen
GitHub user JoshRosen opened a pull request:

https://github.com/apache/spark/pull/11791

[SPARK-13980][WIP] Incrementally serialize blocks while unrolling them in MemoryStore

When a block is persisted in the MemoryStore at a serialized storage level, 
the current MemoryStore.putIterator() code unrolls the entire iterator as 
Java objects in memory, then serializes an iterator obtained from the 
unrolled array. This is inefficient and doubles our peak memory requirements.

Instead, I think that we should incrementally serialize blocks while 
unrolling them.
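
To illustrate the difference, here is a minimal sketch in Python rather than Spark's Scala, with `pickle` standing in for Spark's serializer. The function names are hypothetical and are not taken from the patch; the sketch only contrasts the two strategies described above.

```python
import io
import pickle


def unroll_then_serialize(records):
    """Materialize every record first, then serialize the whole list.

    Mirrors the existing approach: the unrolled objects and the
    serialized bytes are both alive at the same time.
    """
    unrolled = list(records)  # the entire block is held as objects
    buf = io.BytesIO()
    for rec in unrolled:
        pickle.dump(rec, buf)
    return buf.getvalue()


def serialize_incrementally(records):
    """Serialize each record as it is pulled from the iterator.

    Only one deserialized record is held at a time; the serialized
    buffer is the only structure that grows with the block.
    """
    buf = io.BytesIO()
    for rec in records:
        pickle.dump(rec, buf)
    return buf.getvalue()
```

Both functions produce the same bytes for the same input; only the amount of live memory during the call differs.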

A downside to incremental serialization is that we will need to deserialize 
the partially-unrolled data if there is not enough space to unroll the block 
and the block cannot be dropped to disk. However, I'm hoping that the memory 
efficiency improvements will outweigh any performance losses from the extra 
serialization work in that hopefully-rare case.

Diff containing only this patch's changes: 
https://github.com/JoshRosen/spark/compare/chunked-block-serialization...JoshRosen:serialize-incrementally?expand=1

This patch is marked as WIP because it's rebased on top of / blocked by 
#11748.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/JoshRosen/spark serialize-incrementally

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/11791.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #11791


commit 735eca68d8efcd150d47631644cf848b4d98603e
Author: Josh Rosen 
Date:   2016-03-15T04:57:16Z

Split MemoryEntry into two separate classes (serialized and deserialized)

commit 8f0828986b72ce722cfe0360ae863971547fc58b
Author: Josh Rosen 
Date:   2016-03-15T18:53:54Z

Add ChunkedByteBuffer and use it in storage layer.

commit 79b1a6a31236b81c444dda1e8ee1cfdf2f3c36ae
Author: Josh Rosen 
Date:   2016-03-15T20:53:27Z

Add test cases and fix bug in ChunkedByteBuffer.toInputStream()

commit 7dbcd5a9ef0c669f5db97990af944d8b63300e97
Author: Josh Rosen 
Date:   2016-03-15T22:05:23Z

WIP towards understanding destruction.

commit 3fbec212d9f714386121b4aed791d6c9fb1359a2
Author: Josh Rosen 
Date:   2016-03-15T22:39:27Z

Small fixes to dispose behavior.

commit e5e663f22094333dac6e184c78176ee658e3441e
Author: Josh Rosen 
Date:   2016-03-15T22:49:24Z

Modify BlockManager.dataSerialize to write ChunkedByteBuffers.

commit de62f0d0a5f128dd91173e73b214a3297dd203d4
Author: Josh Rosen 
Date:   2016-03-16T06:47:21Z

Merge remote-tracking branch 'origin/master' into chunked-block-serialization

commit 0a347fdd9ec0e94eab17eb0f33c93acd1afbdcfb
Author: Josh Rosen 
Date:   2016-03-16T06:56:02Z

Fix test compilation in streaming.

commit 6852c482a4935b992c199810f1156952f1e93a8c
Author: Josh Rosen 
Date:   2016-03-16T20:47:45Z

Merge remote-tracking branch 'origin/master' into chunked-block-serialization

commit 43f8fa6ae5ba093655cdbd55ca56959a7652de56
Author: Josh Rosen 
Date:   2016-03-16T20:54:55Z

Allow ChunkedByteBuffer to contain no chunks.

commit 25e68841541b45d7eedc0447cc8154d746ee8db2
Author: Josh Rosen 
Date:   2016-03-16T21:00:21Z

Document toByteBuffer() and toArray() size limitations.

commit 325c83d8909472428ae65620033fff4887c36e06
Author: Josh Rosen 
Date:   2016-03-16T21:07:42Z

Move dispose() from BlockManager to StorageUtils.

It was a static method before, but its location was confusing.

commit 4f5074ece49030a6e7134f7ece706ed441c02ee4
Author: Josh Rosen 
Date:   2016-03-16T21:11:14Z

Better documentation for dispose() methods.

commit b6ddf3ed40cc90ec94b7e4917808f8a726b597ee
Author: Josh Rosen 
Date:   2016-03-16T21:12:39Z

Rename limit to size.

commit 719ad3c4e9e942ce62cbcf288788aca785690a7e
Author: Josh Rosen 
Date:   2016-03-16T21:20:08Z

Implement missing InputStream methods.

commit 23006076dcb73095a9eaa7e2524a10c048bae646
Author: Josh Rosen 
Date:   2016-03-16T22:00:10Z

More comments.

commit 3fc0b66981aa2d45be129986f0dc5bd595e08b22
Author: Josh Rosen 
Date:   2016-03-16T22:02:42Z

Fix confusing getChunks().head

commit c747c8546ff248b8b08285e92afad2fe71875acd
Author: Josh Rosen 
Date:   2016-03-17T18:08:34Z

Merge remote-tracking branch 'origin/master' into chunked-block-serialization

commit 

[GitHub] spark pull request: [SPARK-13980][WIP] Incrementally serialize blo...

2016-03-19 Thread AmplabJenkins
GitHub user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11791#issuecomment-198087062
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-13980][WIP] Incrementally serialize blo...

2016-03-19 Thread AmplabJenkins
GitHub user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11791#issuecomment-198218447
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-13980][WIP] Incrementally serialize blo...

2016-03-19 Thread AmplabJenkins
GitHub user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11791#issuecomment-198087068
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53456/





[GitHub] spark pull request: [SPARK-13980][WIP] Incrementally serialize blo...

2016-03-19 Thread SparkQA
GitHub user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11791#issuecomment-198037121
  
**[Test build #53456 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53456/consoleFull)** for PR 11791 at commit [`7dc3623`](https://github.com/apache/spark/commit/7dc362331f3f549670ecd9488db456b4136a3ad7).





[GitHub] spark pull request: [SPARK-13980][WIP] Incrementally serialize blo...

2016-03-19 Thread rxin
GitHub user rxin commented on the pull request:

https://github.com/apache/spark/pull/11791#issuecomment-198648221
  
Still WIP?






[GitHub] spark pull request: [SPARK-13980][WIP] Incrementally serialize blo...

2016-03-18 Thread SparkQA
GitHub user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11791#issuecomment-198185758
  
**[Test build #53496 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53496/consoleFull)** for PR 11791 at commit [`5489748`](https://github.com/apache/spark/commit/5489748d7e8b0d4aa7a0e7331a1a4a02f65d977f).





[GitHub] spark pull request: [SPARK-13980][WIP] Incrementally serialize blo...

2016-03-18 Thread JoshRosen
GitHub user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/11791#issuecomment-198584688
  
/cc @rxin @andrewor14, this is the next most important patch to review on 
the path to off-heap caching. Once these changes are in, we'll be able to use 
off-heap memory for the unroll memory in off-heap caching, which greatly 
simplifies things. Without this change, the on-heap unroll array needs to be 
accounted for properly even if the final cache destination is off-heap, which 
makes caching more OOM-prone and complicates the accounting logic (since it 
then differs between the two modes).

