Re: Benchmarks for Many-to-Many Joins

2021-04-22 Thread Mohamadreza Rostami
What kind of benchmark do you need to run? Do you want to benchmark Spark's many-to-many joins specifically, or another aspect of Spark or the cluster (such as network or disk I/O)? If you only want to exercise a many-to-many join, you can use a cross join or repartition the data with another
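To illustrate why a cross join works as a many-to-many stress test, here is a minimal plain-Python sketch, not Spark code. The helper names `make_table`, `hash_join` and `expected_size` are hypothetical. The sketch builds two tables in which every join key is duplicated on both sides and joins them with a naive hash join. The output size is the sum over keys of |L_k| * |R_k|. A cross join is the extreme case where every row shares a single key, so the output is |L| * |R|. In Spark itself, the equivalent would be `DataFrame.join` on a duplicated key column or `DataFrame.crossJoin`.

```python
from collections import Counter, defaultdict


def make_table(num_keys, dup_per_key, tag):
    """Hypothetical generator: each key in [0, num_keys) appears dup_per_key times."""
    return [(k, f"{tag}{k}_{i}") for k in range(num_keys) for i in range(dup_per_key)]


def hash_join(left, right):
    """Naive equi-join on the first column: build a hash index on right, probe with left."""
    index = defaultdict(list)
    for row in right:
        index[row[0]].append(row)
    # Every left row pairs with every right row sharing its key (many-to-many).
    return [(l, r) for l in left for r in index.get(l[0], ())]


def expected_size(left, right):
    """Output cardinality of the join: sum over keys of |L_k| * |R_k|."""
    lc = Counter(row[0] for row in left)
    rc = Counter(row[0] for row in right)
    return sum(n * rc[k] for k, n in lc.items())


if __name__ == "__main__":
    # 100 keys, 3 copies per key on the left and 5 on the right: 100 * 3 * 5 rows.
    left, right = make_table(100, 3, "a"), make_table(100, 5, "b")
    print(len(hash_join(left, right)), expected_size(left, right))
    # A single shared key turns the join into a cross join: 30 * 50 rows.
    cl, cr = make_table(1, 30, "a"), make_table(1, 50, "b")
    print(len(hash_join(cl, cr)))
```

Tuning `dup_per_key` on each side controls the fan-out per key. This makes it possible to sweep from one-to-many, where one side has a single copy per key, up to a full cross join without needing a dedicated benchmark dataset.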

Benchmarks for Many-to-Many Joins

2021-04-21 Thread Dhruv Kumar
Hi, I wanted to ask if anyone knows of any datasets or benchmarks that I can use to evaluate many-to-many joins (as depicted in the attached snapshot). I looked at the TPC-H and TPC-DS benchmarks but, surprisingly, they mostly have one-to-many