Can you attach the output of the following?

eventDF.filter($"entityType" === "user").select("entityId").distinct.explain(true)
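
A minimal sketch of how the plans could be captured, assuming the same spark-shell session as in the quoted message (`eventDF` already defined, `sqlContext.implicits._` imported). The names `viaEquality` and `viaInequality` are only illustrative. Printing the plan for one of the variants that returns 2091 alongside the failing one may make the difference easier to spot:

```scala
// Assumption: run inside spark-shell, where `eventDF` from the quoted
// session already exists. Nothing here is specific to the reporter's data.
import org.apache.spark.sql.functions.col

// Variant that returned the unexpected count (1219) on 1.5.1.
val viaEquality =
  eventDF.filter(col("entityType") === "user").select("entityId").distinct

// Variant that returned the expected count (2091).
val viaInequality =
  eventDF.filter(col("entityType") !== "ib_user").select("entityId").distinct

// explain(true) prints the logical plans (parsed, analyzed, optimized)
// followed by the physical plan for each query.
viaEquality.explain(true)
viaInequality.explain(true)
```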

Thanks,

Yin

On Thu, Nov 5, 2015 at 1:12 AM, 千成徳 <s.c...@opt.ne.jp> wrote:

> Hi All,
>
> I have a DataFrame like the one below.
>
> Filtering with an equality expression does not work in 1.5.1, but works as
> expected in 1.4.0. What is the difference?
>
> scala> eventDF.printSchema()
> root
>  |-- id: string (nullable = true)
>  |-- event: string (nullable = true)
>  |-- entityType: string (nullable = true)
>  |-- entityId: string (nullable = true)
>  |-- targetEntityType: string (nullable = true)
>  |-- targetEntityId: string (nullable = true)
>  |-- properties: string (nullable = true)
>
> scala> eventDF.groupBy("entityType").agg(countDistinct("entityId")).show
> +----------+------------------------+
> |entityType|COUNT(DISTINCT entityId)|
> +----------+------------------------+
> |   ib_user|                    4751|
> |      user|                    2091|
> +----------+------------------------+
>
>
> ----- does not work (bug?)
> scala> eventDF.filter($"entityType" === "user").select("entityId").distinct.count
> res151: Long = 1219
>
> scala> eventDF.filter(eventDF("entityType") === "user").select("entityId").distinct.count
> res153: Long = 1219
>
> scala> eventDF.filter($"entityType" equalTo "user").select("entityId").distinct.count
> res149: Long = 1219
>
> ----- works as expected
> scala> eventDF.map{ e => (e.getAs[String]("entityId"), e.getAs[String]("entityType")) }.filter(x => x._2 == "user").map(_._1).distinct.count
> res150: Long = 2091
>
> scala> eventDF.filter($"entityType" in "user").select("entityId").distinct.count
> warning: there were 1 deprecation warning(s); re-run with -deprecation for details
> res155: Long = 2091
>
> scala> eventDF.filter($"entityType" !== "ib_user").select("entityId").distinct.count
> res152: Long = 2091
>
>
> However, all of the code above works as expected in 1.4.0.
>
> Thanks.
>
>
