Reasoning in terms of files (rather than datasets, as I first read this
question), I think this is more appropriate in Spark:

> org.apache.spark.util.Utils.getFileLength(new File("filePath"), null);

It will yield the same result as

> new File("filePath").length();
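
A minimal, self-contained sketch of the plain java.io.File route (the path
is a placeholder):

  import java.io.File

  // Size in bytes of a local file; File.length() returns 0L if the file
  // does not exist or cannot be read.
  val sizeInBytes: Long = new File("somefile.txt").length()
  println(s"$sizeInBytes bytes")

Note that org.apache.spark.util.Utils is a Spark-internal object
(private[spark]), so the first form is generally not callable from user code.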


On Sun, Jun 19, 2022 at 11:11 AM Enrico Minack <i...@enrico.minack.dev>
wrote:

> Maybe a
>
>   .as[String].mapPartitions(it => if (it.hasNext) Iterator(it.next) else Iterator.empty)
>
> might be faster than the
>
>   .distinct.as[String]
>
>
> Enrico
>
>
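
A sketch of how this variant could slot into the pipeline quoted below,
assuming `spark` is an active SparkSession and `ds` is the Dataset in
question (the trailing .distinct is an added safeguard, not part of the
original suggestion):

  import java.io.File
  import java.net.URI
  import org.apache.spark.sql.functions.{input_file_name, sum}
  import spark.implicits._

  ds.select(input_file_name.as("filename"))
    .as[String]
    // Emit at most one filename per partition; this assumes all rows of a
    // partition come from the same file. A file split across several
    // partitions still yields duplicates, hence the cheap .distinct below.
    .mapPartitions(it => if (it.hasNext) Iterator(it.next()) else Iterator.empty)
    .distinct
    .map(filename => new File(new URI(filename).getPath).length)
    .select(sum($"value"))
    .show()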
> On 19.06.22 08:59, Enrico Minack wrote:
>
> Given that you already know your input files (via input_file_name), why not
> get their sizes and sum them up?
>
> import java.io.File
> import java.net.URI
> import org.apache.spark.sql.functions.{input_file_name, sum}
>
> ds.select(input_file_name.as("filename"))
>   .distinct.as[String]
>   .map(filename => new File(new URI(filename).getPath).length)
>   .select(sum($"value"))
>   .show()
>
>
> Enrico
>
>
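
One caveat with the java.io.File approach above: it resolves paths only on
the local filesystem. For distributed storage (HDFS, S3, ...), a hypothetical
variant via the Hadoop FileSystem API could look like this (fileLength is an
illustrative helper, not a Spark API):

  import java.net.URI
  import org.apache.hadoop.conf.Configuration
  import org.apache.hadoop.fs.{FileSystem, Path}

  // Look up a file's size through the Hadoop FileSystem API, which resolves
  // the scheme (file://, hdfs://, s3a://, ...) from the URI.
  def fileLength(filename: String, conf: Configuration): Long = {
    val uri = new URI(filename)
    val fs = FileSystem.get(uri, conf)
    fs.getFileStatus(new Path(uri)).getLen
  }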
> On 19.06.22 03:16, Yong Walt wrote:
>
> import java.io.File
> val someFile = new File("somefile.txt")
> val fileSize = someFile.length
>
> This one?
>
>
> On Sun, Jun 19, 2022 at 4:33 AM mbreuer <msbre...@gmail.com> wrote:
>
>> Hello Community,
>>
>> I am working on optimizations for file sizes and the number of files. In
>> the DataFrame API there is a function, input_file_name, which returns the
>> file name. I am missing a counterpart to get the size of the file. Just
>> the size, like "ls -l" returns. Is there something like that?
>>
>> Kind regards,
>> Markus
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>
>>
>
>
