Hi,
Conceptually I understand the Spark joins below, but when it comes to implementation I can't find much information on Google. Please help me with code/pseudocode for these joins using Java-Spark or Scala-Spark.

*Replicated Join:* Given two datasets, where one is small enough to fit into memory, perform a replicated join using Spark. Note: I need a program that justifies why this fits a replicated join.

*Semi-Join:* Given a huge dataset, do a semi-join using Spark. Note that with a semi-join, one dataset must be filtered and projected down so that it fits into the cache. Note: I need a program that justifies why this fits a semi-join.

*Composite Join:* Given a dataset that is still too big to fit into memory even after filtering, perform a composite join on pre-sorted and pre-partitioned data using Spark. Note: I need a program that justifies why this fits a composite join.

*Repartition Join:* Join two datasets by performing a repartition join in Spark. Note: I need a program that justifies why this fits a repartition join.

Thanks,
Aakash.
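For reference, the four patterns above can be sketched in Scala-Spark roughly as follows. This is a minimal sketch, not a full solution: the datasets (`orders`, `users`), the column `userId`, and the table names are made up for illustration, and the composite-join step is shown only as commented-out writes because bucketed tables need a metastore.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

val spark = SparkSession.builder()
  .appName("join-sketches")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Hypothetical sample data: a "large" orders side and a small users side.
val orders = Seq((1, "o100"), (2, "o200"), (1, "o300")).toDF("userId", "orderId")
val users  = Seq((1, "alice"), (2, "bob"), (3, "carol")).toDF("userId", "name")

// 1. Replicated (broadcast) join: the small side is copied to every
//    executor, so the large side is joined map-side with no shuffle.
val replicated = orders.join(broadcast(users), "userId")

// 2. Semi-join: project the lookup side down to just the join key so it
//    stays cache-sized, then keep only the matching rows of the big side.
val semi = orders.join(users.select("userId"), Seq("userId"), "left_semi")

// 3. Repartition (reduce-side) join: both sides are shuffled by the join
//    key; this is Spark's default strategy when neither side is broadcast.
val repartitioned = orders.join(users, "userId")

// 4. Composite join: persist both sides pre-partitioned and pre-sorted by
//    the key (bucketed tables), so a later join can proceed bucket-by-bucket
//    without a shuffle at read time. Sketched only:
// orders.write.bucketBy(8, "userId").sortBy("userId").saveAsTable("orders_b")
// users.write.bucketBy(8, "userId").sortBy("userId").saveAsTable("users_b")
```

To "justify" which pattern applies, the usual check is the size of the smaller side after filtering/projection: if it fits in executor memory, broadcast it (replicated join); if only its keys fit, use the semi-join; if neither fits, fall back to the shuffle-based repartition join, or pre-bucket both sides for a composite join.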