If KM is kilometre then you must replace val distance = atan2(sqrt(a), sqrt (-a + 1)) * 2 * 6371 to val distance = atan2(sqrt(a), sqrt(-a + 1)) * 2 * 12742
Have a look at this gnist Spherical distance calcualtion based on latitude and longitude with Apache Spark <https://gist.github.com/pavlov99/bd265be244f8a84e291e96c5656ceb5c> tir. 7. jun. 2022 kl. 19:39 skrev Chetan Khatri <chetan.opensou...@gmail.com >: > Hi Dear Spark Users, > > It has been many years that I have worked on Spark, Please help me. Thanks > much > > I have different cities and their co-ordinates in DataFrame[Row], I want > to find distance in KMs and then show only those records /cities which are > 10 KMs far. > > I have a function created that can find the distance in KMs given two > co-coordinates. But I don't know how to apply it to rows, like one to many > and calculate the distance. > > Some code that I wrote, Sorry for the basic code. > > lass HouseMatching { > def main(args: Array[String]): Unit = { > > val search_property_id = args(0) > > // list of columns where the condition should be exact match > val groupOneCriteria = List( > "occupied_by_tenant", > "water_index", > "electricity_index", > "elevator_index", > "heating_index", > "nb_bathtubs", > "nb_showers", > "nb_wc", > "nb_rooms", > "nb_kitchens" > ) > // list of columns where the condition should be matching 80% > val groupTwoCriteria = List( > "area", > "home_condition", > "building_age" > ) > // list of columns where the condition should be found using Euclidean > distance > val groupThreeCriteria = List( > "postal_code" > ) > > val region_or_city = "region" > > def haversineDistance(destination_latitude: Column, > destination_longitude: Column, origin_latitude: Column, > origin_longitude: Column): Column = { > val a = pow(sin(radians(destination_latitude - origin_latitude) / 2), > 2) + > cos(radians(origin_latitude)) * cos(radians(destination_latitude)) * > pow(sin(radians(destination_longitude - origin_longitude) / 2), 2) > val distance = atan2(sqrt(a), sqrt(-a + 1)) * 2 * 6371 > distance > } > > val spark = SparkSession.builder().appName("real-estate-property-matcher") > .getOrCreate() > > val housingDataDF = > spark.read.csv("~/Downloads/real-estate-sample-data.csv") > > // searching for the property by `ref_id` > val searchPropertyDF = housingDataDF.filter(col("ref_id") === > search_property_id) > > // Similar house in the same city (same postal code) and group one > condition > val similarHouseAndSameCity = housingDataDF.join(searchPropertyDF, > groupThreeCriteria ++ groupOneCriteria, > "inner") > > // Similar house not in the same city but 10km range > > -- Bjørn Jørgensen Vestre Aspehaug 4, 6010 Ålesund Norge +47 480 94 297