Hi all, I have a 5 GB CSV dataset with 69 columns. For each column, I need to count how many times each distinct value occurs. What is an optimized way to do this with Spark Scala?
Example CSV:

a,b,c,d
a,c,b,a
b,b,c,d
b,b,c,a
c,b,b,a

Expected output:

(a,2),(b,2),(c,1)  # first column value counts
(b,4),(c,1)        # second column value counts
(c,3),(b,2)        # third column value counts
(d,2),(a,3)        # fourth column value counts

Thanks in advance!
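One way to keep this to a single scan of the 5 GB file, regardless of the column count, is to flatMap every row into ((columnIndex, value), 1) pairs and reduce by key. This is a sketch, not a tested solution; the input path and the no-header assumption are placeholders you would adjust for your data:

```scala
import org.apache.spark.sql.SparkSession

object ColumnValueCounts {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("ColumnValueCounts")
      .getOrCreate()

    // Hypothetical path; the example assumes the CSV has no header row.
    val df = spark.read
      .option("header", "false")
      .csv("/path/to/data.csv")

    // Emit one ((columnIndex, value), 1) pair per cell, then sum per key.
    // The whole file is read once, and the shuffle carries only the
    // per-partition partial counts.
    val counts = df.rdd
      .flatMap { row =>
        row.toSeq.zipWithIndex.map { case (value, idx) => ((idx, value), 1L) }
      }
      .reduceByKey(_ + _)

    // Regroup by column index so each column's value counts print together,
    // mirroring the expected output above.
    counts
      .map { case ((idx, value), cnt) => (idx, (value, cnt)) }
      .groupByKey()
      .sortByKey()
      .collect()
      .foreach { case (idx, valueCounts) =>
        println(s"Column $idx: " + valueCounts.mkString(","))
      }

    spark.stop()
  }
}
```

An alternative is one `df.groupBy(col).count()` job per column, which is simpler but launches 69 jobs; caching the DataFrame first (`df.cache()`) would at least avoid re-reading the file each time. Also note that `collect()` at the end assumes the number of distinct values per column fits in driver memory; for very high-cardinality columns you would write the result out instead.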