Reasoning in terms of files (rather than datasets, as I first read this question), I think this is more adequate in Spark:
    org.apache.spark.util.Utils.getFileLength(new File("filePath"), null);

It will yield the same result as

    new File("filePath").length();

On Sun, Jun 19, 2022 at 11:11, Enrico Minack <i...@enrico.minack.dev> wrote:

> Maybe a
>
>     .as[String].mapPartitions(it => if (it.hasNext) Iterator(it.next) else Iterator.empty)
>
> might be faster than the
>
>     .distinct.as[String]
>
> Enrico
>
> On 19.06.22 at 08:59, Enrico Minack wrote:
>
> Given you already know your input files (input_file_name), why not get their sizes and sum them up?
>
>     import java.io.File
>     import java.net.URI
>     import org.apache.spark.sql.functions.input_file_name
>
>     ds.select(input_file_name.as("filename"))
>       .distinct.as[String]
>       .map(filename => new File(new URI(filename).getPath).length)
>       .select(sum($"value"))
>       .show()
>
> Enrico
>
> On 19.06.22 at 03:16, Yong Walt wrote:
>
>     import java.io.File
>     val someFile = new File("somefile.txt")
>     val fileSize = someFile.length
>
> This one?
>
> On Sun, Jun 19, 2022 at 4:33 AM mbreuer <msbre...@gmail.com> wrote:
>
>> Hello Community,
>>
>> I am working on optimizations for file sizes and number of files. In the
>> data frame there is a function input_file_name which returns the file
>> name. I miss a counterpart to get the size of the file. Just the size,
>> like "ls -l" returns. Is there something like that?
>>
>> Kind regards,
>> Markus
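
For completeness, here is a minimal, self-contained sketch of the same idea that resolves the sizes through the Hadoop FileSystem API instead of java.io.File, so it also works when the input files live on HDFS or S3 rather than the local filesystem. It assumes a SparkSession named `spark` and a Dataset `ds` that was read from files; the variable names are only for illustration.

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.Path
    import org.apache.spark.sql.functions.input_file_name

    import spark.implicits._  // for .as[String]; already in scope in spark-shell

    // Distinct file names backing the Dataset, collected to the driver.
    val files: Array[String] = ds
      .select(input_file_name().as("filename"))
      .distinct
      .as[String]
      .collect()

    // Resolve each path against the cluster's Hadoop configuration and sum the lengths.
    val hadoopConf: Configuration = spark.sparkContext.hadoopConfiguration
    val totalBytes: Long = files.map { name =>
      val path = new Path(name)
      path.getFileSystem(hadoopConf).getFileStatus(path).getLen
    }.sum

    println(s"total input size: $totalBytes bytes")

Since the distinct file names are collected to the driver first, the Hadoop calls run on the driver and nothing in the closure has to be serialized to executors; for jobs with a very large number of input files, doing the lookup inside a map on the executors (as in Enrico's example) may be preferable.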