I don't know what your notation really means. I'm very much unclear on why you can't use the filter method for 1. If you're talking about splitting/bucketing rather than filtering as such, I think that is a specific lacuna in Spark's API.
I've generally found the join API to be entirely adequate for my needs, so I don't really have a comment on 2.

On Thursday, April 28, 2016, <ioannis.deligian...@nomura.com> wrote:

> One example pattern we have is doing joins or filters based on two
> datasets. E.g.
>
> 1 Filter –multiple- RddB for a given set extracted from RddA
> (keyword here is multiple times)
>
> a. RddA -> keyBy -> distinct -> collect() to Set A;
>
> b. RddB -> Filter using Set A;
>
> 2 “Join” using composition on executor (again multiple times)
>
> a. RddA -> filter by XYZ -> keyBy join attribute -> collectAsMap
> -> Broadcast MapA;
>
> b. RddB -> map (Broadcast<Map<K,V>> MapA);
>
> The first use case might not be that common, but joining a large RDD with
> a small (reference) RDD is quite common and much faster than using the
> “join” method.
>
> *From:* Marcin Tustin [mailto:mtus...@handybook.com]
> *Sent:* 28 April 2016 12:08
> *To:* Deligiannis, Ioannis (UK)
> *Cc:* dev@spark.apache.org
> *Subject:* Re: RDD.broadcast
>
> Why would you ever need to do this? I'm genuinely curious. I view collects
> as being solely for interactive work.
>
> On Thursday, April 28, 2016, <ioannis.deligian...@nomura.com> wrote:
>
> Hi,
>
> It is a common pattern to process an RDD, collect (typically a subset) to
> the driver and then broadcast back.
>
> Adding an RDD method that can do that using the torrent broadcast
> mechanics would be much more efficient. In addition, it would not require
> the Driver to also utilize its Heap holding this broadcast.
>
> I guess this can become complicated if the resulting broadcast is required
> to keep lineage information, but assuming a torrent distribution, once the
> broadcast is synced then lineage would not be required.
> I’d also expect the call to rdd.broadcast to be an action that eagerly
> distributes the broadcast and returns when the operation has succeeded.
>
> Is this something that could be implemented, or are there any reasons that
> prohibit this?
>
> Thanks
>
> Ioannis
>
> This e-mail (including any attachments) is private and confidential, may
> contain proprietary or privileged information and is intended for the named
> recipient(s) only. Unintended recipients are strictly prohibited from
> taking action on the basis of information in this e-mail and must contact
> the sender immediately, delete this e-mail (and all attachments) and
> destroy any hard copies. Nomura will not accept responsibility or liability
> for the accuracy or completeness of, or the presence of any virus or
> disabling code in, this e-mail. If verification is sought please request a
> hard copy. Any reference to the terms of executed transactions should be
> treated as preliminary only and subject to formal written confirmation by
> Nomura. Nomura reserves the right to retain, monitor and intercept e-mail
> communications through its networks (subject to and in accordance with
> applicable laws). No confidentiality or privilege is waived or lost by
> Nomura by any mistransmission of this e-mail. Any reference to "Nomura" is
> a reference to any entity in the Nomura Holdings, Inc. group. Please read
> our Electronic Communications Legal Notice which forms part of this e-mail:
> http://www.Nomura.com/email_disclaimer.htm

--
Want to work at Handy? Check out our culture deck and open roles
<http://www.handy.com/careers>
Latest news <http://www.handy.com/press> at Handy
Handy just raised $50m
<http://venturebeat.com/2015/11/02/on-demand-home-service-handy-raises-50m-in-round-led-by-fidelity/>
led by Fidelity
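
For readers trying to follow the two patterns in the quoted message, here is a minimal sketch of what they seem to describe. It is plain Python with no Spark dependency: nested lists stand in for RDD partitions, and a set or dict stands in for the collected and broadcast value. The record shape, the function names and the sample data are all hypothetical. In PySpark the corresponding steps would presumably use `keyBy`, `distinct`, `collect`, `collectAsMap` and `SparkContext.broadcast`; the sketch does not model the proposed `rdd.broadcast`, which is not an existing method.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical record shape: (join key, payload).
Record = Tuple[str, int]
# A list of records stands in for one RDD partition.
Partition = List[Record]


def collect_key_set(rdd_a: List[Partition]) -> Set[str]:
    """Pattern 1a: keyBy -> distinct -> collect() into a set on the driver."""
    return {key for part in rdd_a for key, _ in part}


def filter_by_key_set(rdd_b: List[Partition], keys: Set[str]) -> List[Partition]:
    """Pattern 1b: filter every partition of RddB against the collected set."""
    return [[rec for rec in part if rec[0] in keys] for part in rdd_b]


def collect_as_map(rdd_a: List[Partition],
                   keep: Callable[[int], bool]) -> Dict[str, int]:
    """Pattern 2a: filter -> keyBy -> collectAsMap; a later value overwrites
    an earlier one for the same key."""
    return {key: value for part in rdd_a for key, value in part if keep(value)}


def map_side_join(rdd_b: List[Partition],
                  lookup: Dict[str, int]) -> List[List[Tuple[str, int, int]]]:
    """Pattern 2b: map over RddB and probe the (broadcast) map, dropping
    records whose key is absent, i.e. inner-join semantics."""
    return [[(k, v, lookup[k]) for k, v in part if k in lookup] for part in rdd_b]


# Made-up sample data: two partitions each.
rdd_a = [[("a", 1), ("b", 2)], [("a", 3), ("c", 4)]]
rdd_b = [[("a", 10), ("d", 20)], [("c", 30), ("b", 40)]]

key_set = collect_key_set(rdd_a)
filtered = filter_by_key_set(rdd_b, key_set)

map_a = collect_as_map(rdd_a, keep=lambda v: v > 1)
joined = map_side_join(rdd_b, map_a)
```

In both cases the small side is materialised once and then reused against RddB as many times as needed, which is the "multiple times" point made in the quoted message. The trade-off under discussion in this thread is that in Spark the small side currently has to pass through the driver before it can be broadcast.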