I tried that already, same exception. I also tried using an accumulator to 
collect all filenames. The filename is not the problem.

Even this crashes with the same exception:

sc.parallelize(files.value)
  .map { fileName =>
    println(s"Scanning $fileName")
    try {
      sc.textFile(fileName).take(1)
      s"Successfully scanned $fileName"
    } catch {
      case t: Throwable => s"Failed to process $fileName, reason ${t.getStackTrace.head}"
    }
  }
  .saveAsTextFile(output)

The output file contains "Failed to process…" for each file.
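
For what it's worth, a driver-side variant that keeps every sc call on the driver (instead of inside a closure that runs on the workers) does not need SparkContext on the executors. This is only a rough sketch, assuming files.value is a plain local Seq[String] of paths; the other names are the same as above:

    // Rough sketch (assumption: files.value is a local Seq[String] of paths):
    // scan the files on the driver, where sc is valid, and only distribute
    // the resulting status strings for writing.
    val results = files.value.map { fileName =>
      try {
        sc.textFile(fileName).take(1)
        s"Successfully scanned $fileName"
      } catch {
        case t: Throwable => s"Failed to process $fileName, reason ${t.getStackTrace.head}"
      }
    }
    sc.parallelize(results).saveAsTextFile(output)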


On 26.10.2014, at 00:10, Buttler, David <buttl...@llnl.gov> wrote:

> This sounds like expected behavior to me. The foreach call should be 
> distributed on the workers. Perhaps you want to use map instead, and then 
> collect the failed file names locally, or save the whole thing out to a file.
> ________________________________________
> From: Marius Soutier [mps....@gmail.com]
> Sent: Friday, October 24, 2014 6:35 AM
> To: user@spark.apache.org
> Subject: scala.collection.mutable.ArrayOps$ofRef$.length$extension since Spark 1.1.0
> 
> Hi,
> 
> I’m running a job whose only task is to find files that cannot be read 
> (sometimes our gz files are corrupted).
> 
> With 1.0.x, this worked perfectly. Since 1.1.0, however, I’m getting an 
> exception: 
> scala.collection.mutable.ArrayOps$ofRef$.length$extension(ArrayOps.scala:114)
> 
>    sc.wholeTextFiles(input)
>      .foreach { case (fileName, _) =>
>        try {
>          println(s"Scanning $fileName")
>          sc.textFile(fileName).take(1)
>          println(s"Successfully scanned $fileName")
>        } catch {
>          case t: Throwable => println(s"Failed to process $fileName, reason ${t.getStackTrace.head}")
>        }
>      }
> 
> 
> Also since 1.1.0, the printlns are no longer visible on the console, only in 
> the Spark UI worker output.
> 
> 
> Thanks for any help
> - Marius
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
