[GitHub] spark pull request #18775: [SPARK-8577][SparkR] Eliminate needless synchroni...

2017-07-29 Thread SereneAnt
GitHub user SereneAnt opened a pull request:

https://github.com/apache/spark/pull/18775

[SPARK-8577][SparkR] Eliminate needless synchronization in java-R serialization

## What changes were proposed in this pull request?
Remove surplus synchronized blocks.

## How was this patch tested?
Unit tests run OK.
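The patched SerDe code is not quoted in this message, so here is a minimal hypothetical sketch (names and structure are illustrative, not taken from the PR) of the kind of surplus lock being removed: a `synchronized` block guarding a stream that is only ever touched by a single connection-handler thread.

```scala
import java.io.{ByteArrayOutputStream, DataOutputStream}

// Illustrative only: a stream confined to one handler thread.
val bytes = new ByteArrayOutputStream()
val out = new DataOutputStream(bytes)

// Before: a needless lock around thread-confined writes.
def writeIntLocked(v: Int): Unit = out.synchronized { out.writeInt(v) }

// After: the lock is dropped; behavior is unchanged because no other
// thread ever shares `out`, so there is nothing to synchronize against.
def writeInt(v: Int): Unit = out.writeInt(v)

writeInt(42)
println(bytes.size) // one Int = 4 bytes either way
```

When the stream is genuinely thread-confined, the two versions are observationally identical and the uncontended-lock overhead simply disappears.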


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/SereneAnt/spark eliminate_unnecessary_synchronization_in_java-R_serialization

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18775.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18775


commit eafad7d0cdf73c9fab9a03f2898039de7b127bc7
Author: iurii.ant <serene...@gmail.com>
Date:   2017-07-30T05:04:06Z

[SPARK-8577][SparkR] Eliminate needless synchronization in java-R serialization




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18693: [SPARK-21491][GraphX] Enhance GraphX performance:...

2017-07-20 Thread SereneAnt
Github user SereneAnt commented on a diff in the pull request:

https://github.com/apache/spark/pull/18693#discussion_r128637486
  
--- Diff: graphx/src/main/scala/org/apache/spark/graphx/lib/PageRank.scala ---
@@ -199,10 +199,10 @@ object PageRank extends Logging {
 require(sources.max <= Int.MaxValue.toLong,
   s"This implementation currently only works for source vertex ids at most ${Int.MaxValue}")
 val zero = Vectors.sparse(sources.size, List()).asBreeze
-val sourcesInitMap = sources.zipWithIndex.map { case (vid, i) =>
+val sourcesInitMap: Map[VertexId, BV[Double]] = sources.zipWithIndex.map { case (vid, i) =>
   val v = Vectors.sparse(sources.size, Array(i), Array(1.0)).asBreeze
   (vid, v)
-}.toMap
+}(collection.breakOut)
--- End diff --

This optimization is already used elsewhere in the Spark code:

https://github.com/apache/spark/blob/c4008480b781379ac0451b9220300d83c054c60d/sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala#L517




[GitHub] spark pull request #18693: [SPARK-21491][GraphX] Enhance GraphX performance:...

2017-07-20 Thread SereneAnt
Github user SereneAnt commented on a diff in the pull request:

https://github.com/apache/spark/pull/18693#discussion_r128635631
  
--- Diff: graphx/src/main/scala/org/apache/spark/graphx/lib/PageRank.scala ---
@@ -199,10 +199,10 @@ object PageRank extends Logging {
 require(sources.max <= Int.MaxValue.toLong,
   s"This implementation currently only works for source vertex ids at most ${Int.MaxValue}")
 val zero = Vectors.sparse(sources.size, List()).asBreeze
-val sourcesInitMap = sources.zipWithIndex.map { case (vid, i) =>
+val sourcesInitMap: Map[VertexId, BV[Double]] = sources.zipWithIndex.map { case (vid, i) =>
   val v = Vectors.sparse(sources.size, Array(i), Array(1.0)).asBreeze
   (vid, v)
-}.toMap
+}(collection.breakOut)
--- End diff --

The principle is the same: `sources.zipWithIndex.map {...}` allocates an intermediate collection of tuples, and `.toMap` then iterates over that collection to build the map. `breakOut` is an implementation of `CanBuildFrom`, the implicit parameter passed to the `Traversable.map` method. When supplied, it lets `map` populate the resulting map directly, without the intermediate collection of tuples.
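The two construction paths can be sketched as follows. Note that `collection.breakOut` itself only compiles on Scala 2.12 and earlier (it was removed in 2.13), so this version-neutral sketch uses an iterator to show the same "no intermediate strict collection" effect; the data here is made up for illustration.

```scala
val sources = Array(10L, 20L, 30L)

// Path 1: map first materializes an intermediate Array[(Long, Int)],
// then .toMap iterates it a second time to build the Map.
val viaToMap: Map[Long, Int] =
  sources.zipWithIndex.map { case (vid, i) => (vid, i) }.toMap

// Path 2 (the 2.13-friendly analogue of breakOut): going through an
// iterator feeds pairs straight into the Map builder, with no
// intermediate strict collection in between.
val direct: Map[Long, Int] =
  sources.iterator.zipWithIndex.map { case (vid, i) => (vid, i) }.toMap

println(viaToMap == direct) // identical result; only the allocations differ
```

Either way the resulting map is the same; `breakOut` (or the iterator) only changes how many intermediate collections get allocated along the way.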




[GitHub] spark pull request #18693: [SPARK-21491][GraphX] Enhance GraphX performance:...

2017-07-20 Thread SereneAnt
GitHub user SereneAnt opened a pull request:

https://github.com/apache/spark/pull/18693

[SPARK-21491][GraphX] Enhance GraphX performance: breakOut instead of .toMap

## What changes were proposed in this pull request?

`Traversable.toMap` is replaced with `collection.breakOut`, which eliminates the creation of an intermediate tuple collection; see this [Stack Overflow answer](https://stackoverflow.com/questions/1715681/scala-2-8-breakout).

## How was this patch tested?
Existing unit tests pass.
No performance tests have been run yet.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/SereneAnt/spark performance_toMap-breakOut

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18693.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18693


commit d7b73b544116314439a536ac944d66a8d7ee1113
Author: iurii.ant <serene...@gmail.com>
Date:   2017-07-20T21:03:43Z

[SPARK-21491][GraphX] Enhance GraphX performance: eliminate intermediate collections creation with breakOut



