I’ve implemented this in the suggested manner. When I build Spark and attach the new spark-core jar to my Eclipse project, I am able to use the new method. To run the experiments, I need to launch my app on a cluster, and I am using EC2. When I set up my master and slaves with the EC2 setup scripts, they install Spark, but I think my custom-built spark-core jar is not being used. How do I set things up on EC2 so that my custom version of spark-core is used?
Thanks,
Raghav

> On Jun 9, 2015, at 7:41 PM, DB Tsai <dbt...@dbtsai.com> wrote:
>
> Having the following code in RDD.scala works for me. PS, in the following
> code, I merge the smaller queue into larger one. I wonder if this will help
> performance. Let me know when you do the benchmark.
>
> def treeTakeOrdered(num: Int)(implicit ord: Ordering[T]): Array[T] = withScope {
>   if (num == 0) {
>     Array.empty
>   } else {
>     val mapRDDs = mapPartitions { items =>
>       // Priority keeps the largest elements, so let's reverse the ordering.
>       val queue = new BoundedPriorityQueue[T](num)(ord.reverse)
>       queue ++= util.collection.Utils.takeOrdered(items, num)(ord)
>       Iterator.single(queue)
>     }
>     if (mapRDDs.partitions.length == 0) {
>       Array.empty
>     } else {
>       mapRDDs.treeReduce { (queue1, queue2) =>
>         if (queue1.size > queue2.size) {
>           queue1 ++= queue2
>           queue1
>         } else {
>           queue2 ++= queue1
>           queue2
>         }
>       }.toArray.sorted(ord)
>     }
>   }
> }
>
> def treeTop(num: Int)(implicit ord: Ordering[T]): Array[T] = withScope {
>   treeTakeOrdered(num)(ord.reverse)
> }
>
> Sincerely,
>
> DB Tsai
> ----------------------------------------------------------
> Blog: https://www.dbtsai.com
> PGP Key ID: 0xAF08DF8D
> <https://pgp.mit.edu/pks/lookup?search=0x59DF55B8AF08DF8D>
>
> On Tue, Jun 9, 2015 at 10:09 AM, raggy <raghav0110...@gmail.com> wrote:
>
> I am trying to implement top-k in scala within apache spark. I am aware that
> spark has a top action. But, top() uses reduce(). Instead, I would like to
> use treeReduce(). I am trying to compare the performance of reduce() and
> treeReduce().
>
> The main issue I have is that I cannot use these 2 lines of code which are
> used in the top() action within my Spark application.
>
>     val queue = new BoundedPriorityQueue[T](num)(ord.reverse)
>     queue ++= util.collection.Utils.takeOrdered(items, num)(ord)
>
> How can I go about implementing top() using treeReduce()?
>
> --
> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Implementing-top-using-treeReduce-tp23227.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
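For the original question of doing this from application code without patching RDD.scala: the two quoted lines most likely fail outside Spark because `BoundedPriorityQueue` and `util.collection.Utils` are marked `private[spark]`. Below is a minimal sketch, assuming only the public `scala.collection.mutable.PriorityQueue`, of the same per-partition top-k plus merge pattern. The names `TreeTopSketch`, `topOfPartition` and `mergeTop` are hypothetical and not part of Spark. The Spark wiring appears only in a comment and has not been benchmarked.

```scala
import scala.collection.mutable
import scala.reflect.ClassTag

// Hypothetical helper (not a Spark API): public-collection replacement for
// BoundedPriorityQueue + Utils.takeOrdered when computing top-k.
object TreeTopSketch {
  // Keep the k largest elements of one partition, returned in descending order.
  def topOfPartition[T](items: Iterator[T], k: Int)(implicit ord: Ordering[T], ct: ClassTag[T]): Array[T] = {
    // With the reversed ordering, heap.head is the smallest element kept so far.
    val heap = mutable.PriorityQueue.empty[T](ord.reverse)
    items.foreach { x =>
      if (heap.size < k) heap.enqueue(x)
      else if (k > 0 && ord.gt(x, heap.head)) { heap.dequeue(); heap.enqueue(x) }
    }
    heap.toArray.sorted(ord.reverse)
  }

  // Merge two descending arrays, keeping only the k largest elements.
  // The result's values do not depend on argument order (ties aside),
  // which is what treeReduce expects of its combine function.
  def mergeTop[T](a: Array[T], b: Array[T], k: Int)(implicit ord: Ordering[T], ct: ClassTag[T]): Array[T] = {
    val out = Array.newBuilder[T]
    var i = 0
    var j = 0
    var n = 0
    while (n < k && (i < a.length || j < b.length)) {
      if (j >= b.length || (i < a.length && ord.gteq(a(i), b(j)))) { out += a(i); i += 1 }
      else { out += b(j); j += 1 }
      n += 1
    }
    out.result()
  }

  // Hypothetical wiring inside a Spark application (not compiled here):
  //   val top = rdd
  //     .mapPartitions(it => Iterator.single(topOfPartition(it, k)))
  //     .treeReduce((a, b) => mergeTop(a, b, k))
  // treeReduce fails on an RDD with no partitions, so guard that case
  // the way the treeTakeOrdered version above does.
}
```

Swapping `treeReduce` for `reduce` in the commented wiring gives the baseline for the reduce() vs treeReduce() comparison.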