2nd try

From: Anil Dasari <adas...@guidewire.com>
Date: Sunday, September 5, 2021 at 10:42 AM
To: "user@spark.apache.org" <user@spark.apache.org>
Subject: Spark Pair RDD write to Hive
Hello,

I have a use case where users are grouped by group ID and persisted to Hive tables. Pseudo code looks like below:

    usersRDD = sc.parallelize(..)
    usersPairRDD = usersRDD.map(u => (u.groupId, u))
    groupedUsers = usersPairRDD.groupByKey()

Can I save the groupedUsers RDD into Hive tables, where the table name is the key of each groupedUsers entry?

I want to avoid the approach below, as it is not a scalable solution: its parallelism is limited by the driver cores.

    groupIds = usersRDD.map(u => u.groupId).distinct.collect.toList
    groupIds.par.map(id => {
      rdd = usersRDD.filter(u => u.groupId == id).cache
      // create dataframe
      // persist df to hive table using df.write.saveAsTable
    })

Could you suggest a better approach? Thanks in advance.

- Anil
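For readers following the grouping step, the key-grouping semantics of map(u => (u.groupId, u)) followed by groupByKey() can be sketched with plain Scala collections, no cluster required. This is only an illustration of what the pair-RDD grouping produces, not the distributed code; the User case class and the sample records here are made up for the sketch.

    // Plain-Scala sketch of the key-grouping step above.
    // User and the sample data are hypothetical; on a real cluster
    // this corresponds to usersPairRDD.groupByKey().
    case class User(groupId: String, name: String)

    object GroupSketch {
      def main(args: Array[String]): Unit = {
        val users = Seq(
          User("g1", "alice"),
          User("g2", "bob"),
          User("g1", "carol")
        )

        // Equivalent of map(u => (u.groupId, u)).groupByKey():
        // one entry per group id, each holding that group's users --
        // the per-group collections a "table per key" write would consume.
        val groupedUsers: Map[String, Seq[User]] = users.groupBy(_.groupId)

        groupedUsers.foreach { case (id, members) =>
          println(s"$id -> ${members.map(_.name).mkString(",")}")
        }
      }
    }

Note the result is one collection per key; the scalability question in the mail is about turning each of those per-key collections into its own Hive table without collecting the keys to the driver first.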