Thanks Yanbo,

Thanks for the help. But I'm not able to find the countDistinct or
approxCountDistinct functions. Are these functions defined on DataFrame, or
in some other package?

On Tue, Jan 5, 2016 at 3:24 PM, Yanbo Liang <yblia...@gmail.com> wrote:

> Hi Arunkumar,
>
> You can use datasetDF.select(countDistinct(col1, col2, col3, ...)) or
> approxCountDistinct for an approximate result.
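>
> A minimal sketch, assuming Spark 1.x and the datasetDF built in the
> snippet quoted below: both functions are defined in the
> org.apache.spark.sql.functions object, not on DataFrame itself, so they
> need an import. Note that countDistinct with several columns counts
> distinct combinations of those columns; for one count per column, build
> one aggregate per column. The names exactCounts and approxCounts are
> only illustrative.
>
> ```scala
> import org.apache.spark.sql.functions.{approxCountDistinct, col, countDistinct}
>
> // Exact distinct count for every column: one aggregate expression per column,
> // aliased back to the column name so the output is easy to read.
> val exactCounts = datasetDF.columns.map(c => countDistinct(col(c)).alias(c))
> datasetDF.select(exactCounts: _*).show()
>
> // Approximate variant, usually cheaper on large data.
> val approxCounts = datasetDF.columns.map(c => approxCountDistinct(col(c)).alias(c))
> datasetDF.select(approxCounts: _*).show()
> ```
>
> Each select returns a single row with one distinct count per column.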
>
> 2016-01-05 17:11 GMT+08:00 Arunkumar Pillai <arunkumar1...@gmail.com>:
>
>> Hi
>>
>> Are there any functions to find the distinct count of all the variables in
>> a DataFrame?
>>
>> val sc = new SparkContext(conf) // spark context
>> val options = Map("header" -> "true", "delimiter" -> delimiter, 
>> "inferSchema" -> "true")
>> val sqlContext = new org.apache.spark.sql.SQLContext(sc) // sql context
>> val datasetDF = 
>> sqlContext.read.format("com.databricks.spark.csv").options(options).load(inputFile)
>>
>>
>> We are able to get the schema and the variable data types. Is there any
>> method to get the distinct count?
>>
>>
>>
>> --
>> Thanks and Regards
>>         Arun
>>
>
>


-- 
Thanks and Regards
        Arun
