So, apparently `wholeTextFiles` runs the job again, passing null as the argument 
list, which in turn blows up my argument-parsing logic. I never thought I'd have 
to check for null again in a pure Scala environment ;)
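
For what it's worth, a minimal sketch of one way to structure this so the SparkContext is only ever used from the driver and never inside an RDD closure. The paths, the ScanCorruptFiles object name, and listing the files via Hadoop's FileSystem.globStatus instead of wholeTextFiles are assumptions for illustration, not something this thread settled on:

import scala.util.{Failure, Success, Try}

import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.{SparkConf, SparkContext}

object ScanCorruptFiles {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("scan-gz-files"))

    // Hypothetical paths -- adjust to the real input/output locations.
    val input  = "hdfs:///data/incoming/*.gz"
    val output = "hdfs:///data/scan-report"

    // List the files on the driver via the Hadoop FileSystem API, so that
    // no SparkContext call ends up inside an RDD closure.
    val fs = FileSystem.get(sc.hadoopConfiguration)
    val fileNames = fs.globStatus(new Path(input)).map(_.getPath.toString)

    // Driver-side loop: each take(1) launches its own small job, and a
    // corrupt gz file surfaces as an exception here rather than on a worker.
    val report = fileNames.map { fileName =>
      Try(sc.textFile(fileName).take(1)) match {
        case Success(_) => s"Successfully scanned $fileName"
        case Failure(t) => s"Failed to process $fileName, reason ${t.getStackTrace.head}"
      }
    }

    // Persist the per-file results, as suggested further down in the thread.
    sc.parallelize(report.toSeq).saveAsTextFile(output)
    sc.stop()
  }
}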

On 26.10.2014, at 11:57, Marius Soutier <mps....@gmail.com> wrote:

> I tried that already, same exception. I also tried using an accumulator to 
> collect all filenames. The filename is not the problem.
> 
> Even this crashes with the same exception:
> 
> sc.parallelize(files.value)
>   .map { fileName =>
>     println(s"Scanning $fileName")
>     try {
>       sc.textFile(fileName).take(1)
>       s"Successfully scanned $fileName"
>     } catch {
>       case t: Throwable => s"Failed to process $fileName, reason ${t.getStackTrace.head}"
>     }
>   }
>   .saveAsTextFile(output)
> 
> The output file contains “Failed to process…” for each file.
> 
> 
> On 26.10.2014, at 00:10, Buttler, David <buttl...@llnl.gov> wrote:
> 
>> This sounds like expected behavior to me. The foreach call will be 
>> distributed to the workers. Perhaps you want to use map instead, and then 
>> collect the failed file names locally, or save the whole thing out to a file.
>> ________________________________________
>> From: Marius Soutier [mps....@gmail.com]
>> Sent: Friday, October 24, 2014 6:35 AM
>> To: user@spark.apache.org
>> Subject: scala.collection.mutable.ArrayOps$ofRef$.length$extension since 
>> Spark 1.1.0
>> 
>> Hi,
>> 
>> I’m running a job whose only task is to find files that cannot be read 
>> (sometimes our gz files are corrupted).
>> 
>> With 1.0.x, this worked perfectly. Since 1.1.0, however, I’m getting an 
>> exception: 
>> scala.collection.mutable.ArrayOps$ofRef$.length$extension(ArrayOps.scala:114)
>> 
>>   sc.wholeTextFiles(input)
>>     .foreach { case (fileName, _) =>
>>       try {
>>         println(s"Scanning $fileName")
>>         sc.textFile(fileName).take(1)
>>         println(s"Successfully scanned $fileName")
>>       } catch {
>>         case t: Throwable => println(s"Failed to process $fileName, reason ${t.getStackTrace.head}")
>>       }
>>     }
>> 
>> 
>> Also since 1.1.0, the printlns are no longer visible on the console, only in 
>> the Spark UI worker output.
>> 
>> 
>> Thanks for any help
>> - Marius
>> 
>> 
>> 
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
