I tried that already; same exception. I also tried using an accumulator to collect all filenames. The filenames are not the problem.
Even this crashes with the same exception:

    sc.parallelize(files.value)
      .map { fileName =>
        try {
          println(s"Scanning $fileName")
          sc.textFile(fileName).take(1)
          s"Successfully scanned $fileName"
        } catch {
          case t: Throwable => s"Failed to process $fileName, reason ${t.getStackTrace.head}"
        }
      }
      .saveAsTextFile(output)

The output file contains "Failed to process…" for each file. (A sketch that avoids calling sc.textFile inside the closure follows below the quoted thread.)

On 26.10.2014, at 00:10, Buttler, David <buttl...@llnl.gov> wrote:

> This sounds like expected behavior to me. The foreach call should be
> distributed on the workers. Perhaps you want to use map instead, and then
> collect the failed file names locally, or save the whole thing out to a file.
> ________________________________________
> From: Marius Soutier [mps....@gmail.com]
> Sent: Friday, October 24, 2014 6:35 AM
> To: user@spark.apache.org
> Subject: scala.collection.mutable.ArrayOps$ofRef$.length$extension since Spark 1.1.0
>
> Hi,
>
> I’m running a job whose only task is to find files that cannot be read
> (sometimes our gz files are corrupted).
>
> With 1.0.x, this worked perfectly. Since 1.1.0 however, I’m getting an exception:
>
> scala.collection.mutable.ArrayOps$ofRef$.length$extension(ArrayOps.scala:114)
>
> sc.wholeTextFiles(input)
>   .foreach { case (fileName, _) =>
>     try {
>       println(s"Scanning $fileName")
>       sc.textFile(fileName).take(1)
>       println(s"Successfully scanned $fileName")
>     } catch {
>       case t: Throwable => println(s"Failed to process $fileName, reason ${t.getStackTrace.head}")
>     }
>   }
>
> Also since 1.1.0, the printlns are no longer visible on the console, only in
> the Spark UI worker output.
>
> Thanks for any help
> - Marius
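
For reference, here is an untested sketch of how the same scan could run without calling sc.textFile inside the map closure; the SparkContext only exists on the driver and cannot be used from code running on the workers, which is likely what breaks here. It opens each file directly through the Hadoop FileSystem API on the executors, and assumes the files live on a Hadoop-compatible filesystem and that `files` holds the list of filenames as in the snippet above:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}
    import org.apache.hadoop.io.compress.CompressionCodecFactory
    import scala.io.Source

    sc.parallelize(files.value)
      .map { fileName =>
        try {
          val path  = new Path(fileName)
          val conf  = new Configuration()
          val fs    = FileSystem.get(path.toUri, conf)
          // Pick the decompressor for .gz (or any other codec) from the file extension.
          val codec = new CompressionCodecFactory(conf).getCodec(path)
          val in    = if (codec != null) codec.createInputStream(fs.open(path)) else fs.open(path)
          try {
            // Force a read of the first line; a corrupt gz file should throw here.
            Source.fromInputStream(in).getLines().take(1).toList
            s"Successfully scanned $fileName"
          } finally {
            in.close()
          }
        } catch {
          case t: Throwable => s"Failed to process $fileName, reason ${t.getStackTrace.head}"
        }
      }
      .saveAsTextFile(output)

The map still runs distributed on the workers; only the nested use of sc goes away. Collecting the failed names on the driver with .collect() instead of saveAsTextFile would also work, as David suggested.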