Hello!

I am fairly sure this has been asked many times before, but I cannot find the 
question in the mailing list archive.

I need to check whether a DataFrame is empty. I receive the DataFrame from a 
third-party library; it may be empty, but it may also be huge, with millions of 
rows. I want to skip some logic when the DataFrame is empty. How can I check 
this efficiently?

Right now I am doing it in the following way:

import org.apache.spark.sql.DataFrame

// True when no DataFrame was supplied or when it contains no rows.
private def isEmpty(df: Option[DataFrame]): Boolean = {
  df.forall(_.limit(1).rdd.isEmpty())
}

But this check is very slow for large DataFrames. I would be grateful for any 
suggestions.

Thank you in advance.


Best regards,

Artem
