Re: finding distinct count using dataframe

Arunkumar Pillai Tue, 05 Jan 2016 02:56:22 -0800

Thanks Yanbo,

Thanks for the help. But I'm not able to find the countDistinct or
approxCountDistinct functions. Are these functions defined on DataFrame, or
in some other package?
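Both functions are defined in the `org.apache.spark.sql.functions` object rather than on `DataFrame` itself, so `datasetDF.select(countDistinct(...))` only compiles after importing them (for example `import org.apache.spark.sql.functions._`). Below is a minimal sketch, assuming the Spark 1.x `SQLContext` API used in this thread, a local master, and a hypothetical three-column sample standing in for the CSV-loaded `datasetDF`; `perColumn` is a hypothetical helper name. It builds one `countDistinct` aggregate per column, so the exact distinct count of every variable comes back from a single aggregation job:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.functions.{col, countDistinct}

object DistinctCounts {
  // Hypothetical helper: exact distinct count of every column, in one aggregation
  def perColumn(df: DataFrame): Map[String, Long] = {
    // One countDistinct expression per column, aliased with the column name
    val exprs = df.columns.map(c => countDistinct(col(c)).alias(c))
    // agg takes a first Column plus varargs, so split head/tail
    val row = df.agg(exprs.head, exprs.tail: _*).head()
    df.columns.zipWithIndex.map { case (c, i) => c -> row.getLong(i) }.toMap
  }

  val counts: Map[String, Long] = {
    val conf = new SparkConf().setAppName("distinct-counts").setMaster("local[*]")
    val sc = SparkContext.getOrCreate(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._
    // Hypothetical sample data standing in for the CSV-loaded datasetDF
    val datasetDF = Seq(("a", 1, 1.0), ("a", 2, 1.0), ("b", 3, 1.0)).toDF("c1", "c2", "c3")
    try perColumn(datasetDF) finally sc.stop()
  }

  def main(args: Array[String]): Unit = println(counts)
}
```

For this sample the result maps c1 to 2, c2 to 3 and c3 to 1. Note that `countDistinct` ignores nulls, so a column containing nulls reports the number of distinct non-null values.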


On Tue, Jan 5, 2016 at 3:24 PM, Yanbo Liang <yblia...@gmail.com> wrote:

> Hi Arunkumar,
>
> You can use datasetDF.select(countDistinct(col1, col2, col3, ...)), or
> approxCountDistinct for an approximate result.
>
> 2016-01-05 17:11 GMT+08:00 Arunkumar Pillai <arunkumar1...@gmail.com>:
>
>> Hi
>>
>> Is there any function to find the distinct count of all the variables in a
>> dataframe?
>>
>> val sc = new SparkContext(conf) // spark context
>> val options = Map("header" -> "true", "delimiter" -> delimiter, 
>> "inferSchema" -> "true")
>> val sqlContext = new org.apache.spark.sql.SQLContext(sc) // sql context
>> val datasetDF = 
>> sqlContext.read.format("com.databricks.spark.csv").options(options).load(inputFile)
>>
>>
>> We are able to get the schema and each variable's data type. Is there any
>> method to get the distinct count?
>>
>>
>>
>> --
>> Thanks and Regards
>>         Arun
>>
>
>


-- 
Thanks and Regards
        Arun
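One caveat on the suggestion quoted above: countDistinct(col1, col2, col3) returns a single number, the count of distinct combinations of those columns, not one count per column. approxCountDistinct takes one column (plus an optional `rsd`, the maximum allowed relative standard deviation, 0.05 by default) and returns a HyperLogLog-based estimate, so for approximate per-column counts it has to be applied column by column. A sketch under the same assumptions as before (Spark 1.x `SQLContext`, local master, hypothetical two-column sample data in place of the CSV file):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions.{approxCountDistinct, col, countDistinct}

object ApproxDistinctCounts {
  private val results: (Long, Map[String, Long]) = {
    val conf = new SparkConf().setAppName("approx-distinct-counts").setMaster("local[*]")
    val sc = SparkContext.getOrCreate(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._
    try {
      // Hypothetical sample data; replace with the CSV-loaded datasetDF
      val datasetDF = Seq(("a", 1), ("a", 2), ("b", 2)).toDF("c1", "c2")

      // Several columns in one countDistinct: number of distinct (c1, c2) pairs
      val pairs = datasetDF.select(countDistinct(col("c1"), col("c2"))).head().getLong(0)

      // One approximate aggregate per column; 0.05 is the maximum relative standard deviation
      val exprs = datasetDF.columns.map(c => approxCountDistinct(col(c), 0.05).alias(c))
      val row = datasetDF.agg(exprs.head, exprs.tail: _*).head()
      val perColumn = datasetDF.columns.zipWithIndex.map { case (c, i) => c -> row.getLong(i) }.toMap

      (pairs, perColumn)
    } finally sc.stop()
  }

  val distinctPairs: Long = results._1
  val approxPerColumn: Map[String, Long] = results._2

  def main(args: Array[String]): Unit = println(s"pairs=$distinctPairs approx=$approxPerColumn")
}
```

Here the combined countDistinct gives 3 (three distinct pairs), while each column on its own has only 2 distinct values. Spark 2.1 and later deprecate approxCountDistinct in favour of approx_count_distinct; the exact countDistinct is unchanged.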
