RE: Custom partitioning of DStream
You can use "transform", which gives you access to each of the RDDs in the DStream; on each of those RDDs you can then apply partitionBy. transform also returns another DStream, while foreach doesn't.

Btw, what do you mean by "foreach killing the performance by not distributing the workload"? Every function applied to an RDD within foreach is distributed across the cluster (provided it is not an action that pulls the data back to the driver), since it gets applied to an RDD.

From: davidkl [via Apache Spark User List] [mailto:ml-node+s1001560n22630...@n3.nabble.com]
Sent: Thursday, April 23, 2015 10:13 AM
To: Evo Eftimov
Subject: Re: Custom partitioning of DStream

> Hello Evo, Ranjitiyer,
>
> I am also looking for the same thing. Using foreach is not useful for me, as
> processing the RDD as a whole won't be distributed across workers, and that
> would kill performance in my application :-/
>
> Let me know if you find a solution for this.
>
> Regards

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Custom-paritioning-of-DSTream-tp22574p22631.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
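The transform suggestion in the reply above can be sketched roughly as follows. This is only an illustration and assumes the PySpark API (the thread does not say which language binding is in use) and a DStream of (key, value) tuples. The names key_to_partition, custom_partition and NUM_PARTITIONS, and the "eu-123" key format, are hypothetical and not from the thread.

```python
from zlib import crc32

NUM_PARTITIONS = 8  # hypothetical value; tune for your cluster


def key_to_partition(key):
    """Hypothetical partition function: a stable hash of the key's prefix.

    RDD.partitionBy takes this result modulo the number of partitions,
    so all keys sharing a prefix land in the same partition.
    """
    prefix = str(key).split("-", 1)[0]  # e.g. "eu-123" -> "eu"
    return crc32(prefix.encode("utf-8"))


def custom_partition(pair_dstream, num_partitions=NUM_PARTITIONS):
    """Return a new DStream whose batch RDDs are partitioned by key_to_partition.

    pair_dstream is assumed to be a DStream of (key, value) tuples.
    """
    # transform applies an RDD-to-RDD function to every batch and
    # returns another DStream, so further DStream operations can follow.
    return pair_dstream.transform(
        lambda rdd: rdd.partitionBy(num_partitions, key_to_partition)
    )
```

Because custom_partition returns a DStream, it can be chained with further DStream operations, which is the difference from foreach that the reply points out.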
Re: Custom partitioning of DStream
Hello Evo, Ranjitiyer,

I am also looking for the same thing. Using foreach is not useful for me, as processing the RDD as a whole won't be distributed across workers, and that would kill performance in my application :-/

Let me know if you find a solution for this.

Regards

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Custom-paritioning-of-DSTream-tp22574p22630.html
Re: Custom partitioning of DStream
I think DStream.transform is the one that you are looking for.

Thanks
Best Regards

On Mon, Apr 20, 2015 at 9:42 PM, Evo Eftimov wrote:

> Is the only way to implement custom partitioning of a DStream the foreach
> approach, so as to gain access to the actual RDDs comprising the DStream and
> hence their partitionBy method?
>
> DStream has only a "repartition" method, which accepts only the number of
> partitions, BUT not the method of partitioning.
Custom partitioning of DStream
Is the only way to implement custom partitioning of a DStream the foreach approach, so as to gain access to the actual RDDs comprising the DStream and hence their partitionBy method?

DStream has only a "repartition" method, which accepts only the number of partitions, BUT not the method of partitioning.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Custom-paritioning-of-DSTream-tp22574.html
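For reference, the foreach approach described in this question might look roughly like the sketch below. It assumes the PySpark API, where the per-batch output operation is called foreachRDD (taken here to be what the thread calls foreach), and a DStream of (key, value) tuples. The names handle_partition and process_batches are hypothetical, and the partition function is supplied by the caller.

```python
def handle_partition(records):
    """Hypothetical per-partition sink; Spark runs it on an executor.

    records is an iterator over the (key, value) pairs of one partition.
    """
    count = 0
    for _key, _value in records:
        count += 1  # replace with a real write to storage, a queue, etc.
    return count


def process_batches(pair_dstream, num_partitions, partition_func):
    """Output-only variant: repartition each batch RDD, then consume it."""

    def process(rdd):
        # partitionBy is a transformation; foreachPartition is the action
        # that triggers the job and calls handle_partition once per partition.
        rdd.partitionBy(num_partitions, partition_func).foreachPartition(
            handle_partition
        )

    # foreachRDD registers an output operation and does not return a DStream.
    pair_dstream.foreachRDD(process)
```

The function passed to foreachRDD is invoked on the driver once per batch, but the RDD operations it calls are still scheduled across the cluster. The practical limitation is that nothing can be chained after it as a DStream.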