n Owen [mailto:so...@cloudera.com]
> Sent: Wednesday, March 2, 2016 3:37 AM
> To: Dirceu Semighini Filho <dirceu.semigh...@gmail.com>
> Cc: user <user@spark.apache.org>
> Subject: Re: SparkR Count vs Take performance
>
> Yeah one surprising result is that you can't call i
fetch.
-Original Message-
From: Sean Owen [mailto:so...@cloudera.com]
Sent: Wednesday, March 2, 2016 3:37 AM
To: Dirceu Semighini Filho <dirceu.semigh...@gmail.com>
Cc: user <user@spark.apache.org>
Subject: Re: SparkR Count vs Take performance
Yeah one surprising result is th
Yeah one surprising result is that you can't call isEmpty on an RDD of
nonserializable objects. You can't do much with an RDD of
nonserializable objects anyway, but they can exist as an intermediate
stage.
We could fix that pretty easily with a little copy and paste of the
take() code; right now
Great, I didn't noticed this isEmpty method.
Well serialization is been a problem in this project, we have noticed a lot
of time been spent in serializing and deserializing things to send and get
from the cluster.
2016-03-01 15:47 GMT-03:00 Sean Owen :
> There is an "isEmpty"
There is an "isEmpty" method that basically does exactly what your
second version does.
I have seen it be unusually slow at times because it must copy 1
element to the driver, and it's possible that's slow. It still
shouldn't be slow in general, and I'd be surprised if it's slower than
a count in