Re: Get distinct column data from grouped data

2016-08-09 Thread Selvam Raman
My friend suggested this approach:

    val fil = sc.textFile("hdfs:///user/vijayc/data/test-spk.tx")
    val res = fil.map(l => l.split(",")).map(l => (l(0), l(1))).groupByKey.map(rd => (rd._1, rd._2.toList.distinct))

Another useful function is *collect_set* in the DataFrame API.

Thanks,
Selvam R

On Tue, Aug 9, 2016
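For reference, the collect_set route mentioned above might look roughly like the following sketch. It is not code from the thread: the column names "key" and "value", the object name DistinctPerGroup, and the local SparkSession setup are all assumptions made for illustration, and SparkSession itself requires Spark 2.0 or later. Note that collect_set does not guarantee the order of the elements it returns.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.collect_set

object DistinctPerGroup {
  // Group by the (hypothetical) "key" column and gather the distinct
  // entries of the (hypothetical) "value" column into an array column.
  def distinctPerKey(df: DataFrame): DataFrame =
    df.groupBy("key").agg(collect_set("value").as("values"))

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; a real job would be configured differently.
    val spark = SparkSession.builder()
      .appName("distinct-per-group")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // The sample rows from the original question.
    val df = Seq(
      ("sel1", "test"),
      ("sel1", "test"),
      ("sel1", "ok"),
      ("sel2", "ok"),
      ("sel2", "test")
    ).toDF("key", "value")

    // One row per key, each holding the set of distinct values seen for it.
    distinctPerKey(df).show(false)

    spark.stop()
  }
}
```

Compared with the RDD version, this keeps the data in a DataFrame and lets Spark deduplicate during aggregation instead of building a full list per key and calling distinct on it afterwards.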

Get distinct column data from grouped data

2016-08-09 Thread Selvam Raman
Example:

    sel1 test
    sel1 test
    sel1 ok
    sel2 ok
    sel2 test

Expected result:

    sel1, [test,ok]
    sel2, [test,ok]

How can I achieve the above result using a Spark DataFrame? Please suggest an approach.

--
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"