RE: Custom partitioning of DStream

2015-04-23 Thread Evo Eftimov
You can use "transform", which gives you access to each of the RDDs in the
DStream; on each of those RDDs you can then apply partitionBy. Also,
transform returns another DStream, while foreach doesn't.
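A minimal Scala sketch of the transform approach, assuming a DStream of (String, Int) pairs and Spark's built-in HashPartitioner. The object and method names (TransformPartitioning, partitionBatch, partitionStream) are hypothetical. partitionBy is only available on key-value RDDs. Because transform returns a new DStream, further operations can be chained after it.

```scala
import org.apache.spark.HashPartitioner
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.dstream.DStream

object TransformPartitioning {
  // Hash-partition one micro-batch RDD by key. partitionBy comes from
  // PairRDDFunctions, which Spark adds implicitly to RDD[(K, V)].
  def partitionBatch(numPartitions: Int)(rdd: RDD[(String, Int)]): RDD[(String, Int)] =
    rdd.partitionBy(new HashPartitioner(numPartitions))

  // transform applies partitionBatch to every batch RDD and, unlike
  // foreach/foreachRDD, returns a new DStream that can be chained further.
  def partitionStream(stream: DStream[(String, Int)], numPartitions: Int): DStream[(String, Int)] =
    stream.transform(rdd => partitionBatch(numPartitions)(rdd))
}
```

Keeping the per-batch logic as a plain RDD => RDD function means it can be checked against an ordinary RDD without starting a StreamingContext.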

 

By the way, what do you mean by "foreach killing the performance by not
distributing the workload"? Every function (provided it is not an action)
applied to an RDD within foreach is distributed across the cluster, because
it is applied to an RDD whose partitions are processed by the workers.
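To illustrate the point about foreach, here is a sketch assuming a DStream of (String, Int) pairs; the names (ForeachRddDistribution, handleBatch, register) are hypothetical. The closure passed to foreachRDD runs on the driver once per batch, but the RDD operations called inside it are split into tasks that run on the executors, so the per-batch work is still distributed.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.dstream.DStream

object ForeachRddDistribution {
  // Called on the driver once per batch. The filter and count below are
  // nevertheless executed as tasks on the executors, one per partition;
  // only the final Long travels back to the driver.
  def handleBatch(rdd: RDD[(String, Int)]): Long =
    rdd.filter { case (_, v) => v > 0 }.count()

  // foreachRDD returns Unit, so nothing can be chained after it.
  def register(stream: DStream[(String, Int)]): Unit =
    stream.foreachRDD { rdd =>
      val positives = handleBatch(rdd) // distributed work
      println(s"positive records in this batch: $positives") // runs on the driver
    }
}
```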

 

From: davidkl [via Apache Spark User List]
[mailto:ml-node+s1001560n22630...@n3.nabble.com] 
Sent: Thursday, April 23, 2015 10:13 AM
To: Evo Eftimov
Subject: Re: Custom partitioning of DStream

 

Hello Evo, Ranjitiyer, 

I am also looking for the same thing. Using foreach is not useful for me as
processing the RDD as a whole won't be distributed across workers and that
would kill performance in my application :-/ 

Let me know if you find a solution for this. 

Regards 


--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Custom-paritioning-of-DSTream-tp22574p22631.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Custom partitioning of DStream

2015-04-23 Thread davidkl
Hello Evo, Ranjitiyer,

I am also looking for the same thing. Using foreach is not useful for me as
processing the RDD as a whole won't be distributed across workers and that
would kill performance in my application :-/

Let me know if you find a solution for this. 

Regards




-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Custom partitioning of DStream

2015-04-21 Thread Akhil Das
I think DStream.transform is the one that you are looking for.
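If the goal is a custom partitioning scheme rather than just a different partition count, one option is to subclass org.apache.spark.Partitioner and pass an instance to partitionBy inside transform. The sketch below is hypothetical: FirstCharPartitioner, which groups String keys by their first character, and the CustomPartitioning helper are only illustrations.

```scala
import org.apache.spark.Partitioner
import org.apache.spark.streaming.dstream.DStream

// Hypothetical partitioner: String keys that share a first character
// always land in the same partition.
class FirstCharPartitioner(override val numPartitions: Int) extends Partitioner {
  require(numPartitions > 0, "numPartitions must be positive")

  override def getPartition(key: Any): Int = key match {
    case s: String if s.nonEmpty => s.charAt(0).toInt % numPartitions
    case _                       => 0 // empty strings and non-String keys
  }

  // Spark compares partitioners to decide whether two RDDs are already
  // co-partitioned, so instances with the same configuration compare equal.
  override def equals(other: Any): Boolean = other match {
    case p: FirstCharPartitioner => p.numPartitions == numPartitions
    case _                       => false
  }

  override def hashCode(): Int = numPartitions
}

object CustomPartitioning {
  // Apply the custom partitioner to every batch via transform.
  def partitionByFirstChar(stream: DStream[(String, Int)], n: Int): DStream[(String, Int)] =
    stream.transform(rdd => rdd.partitionBy(new FirstCharPartitioner(n)))
}
```

Overriding equals and hashCode is optional, but without it Spark cannot recognise that two RDDs partitioned by equivalent instances are co-partitioned, and may shuffle again on joins.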

Thanks
Best Regards

On Mon, Apr 20, 2015 at 9:42 PM, Evo Eftimov  wrote:

> Is the only way to implement custom partitioning of a DStream the foreach
> approach, which gives access to the actual RDDs comprising the DStream and
> hence to their partitionBy method?
>
> DStream has only a "repartition" method, which accepts only the number of
> partitions, BUT not the partitioning method.


Custom partitioning of DStream

2015-04-20 Thread Evo Eftimov
Is the only way to implement custom partitioning of a DStream the foreach
approach, which gives access to the actual RDDs comprising the DStream and
hence to their partitionBy method?

DStream has only a "repartition" method, which accepts only the number of
partitions, BUT not the partitioning method.
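The difference described above can be sketched as follows, assuming a DStream[String] and a hypothetical key-extraction function keyFn; the object and method names are also hypothetical. DStream.repartition takes only a partition count, whereas partitionBy takes a Partitioner but exists only on key-value RDDs, so plain records are keyed with keyBy first.

```scala
import org.apache.spark.Partitioner
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.dstream.DStream

object RepartitionVsPartitionBy {
  // repartition only controls how many partitions each batch has; which
  // record ends up in which partition is not under the caller's control.
  def byCount(stream: DStream[String], n: Int): DStream[String] =
    stream.repartition(n)

  // partitionBy lets the caller choose the Partitioner, but it needs
  // key-value records, so each String is keyed first with keyFn.
  def byPartitioner(rdd: RDD[String], keyFn: String => String, p: Partitioner): RDD[(String, String)] =
    rdd.keyBy(keyFn).partitionBy(p)

  // The same per-batch logic applied to a stream through transform.
  def byPartitionerStream(stream: DStream[String], keyFn: String => String, p: Partitioner): DStream[(String, String)] =
    stream.transform(rdd => byPartitioner(rdd, keyFn, p))
}
```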


