Re: ArrayIndexOutOfBoundsException when reading bzip2 files

2014-06-09 Thread Akhil Das
Can you paste the piece of code?

Thanks
Best Regards


On Mon, Jun 9, 2014 at 5:24 PM, MEETHU MATHEW meethu2...@yahoo.co.in wrote:

 Hi,
 I am getting an ArrayIndexOutOfBoundsException while reading from bz2 files
 in HDFS. I have come across the same issue in JIRA at
 https://issues.apache.org/jira/browse/SPARK-1861, but it seems to be
 resolved. I have tried the suggested workaround (SPARK_WORKER_CORES=1), but
 it is still showing the error. What may be the possible reason that I am
 getting the same error again?
 I am using Spark 1.0.0 with Hadoop 1.2.1.
 java.lang.ArrayIndexOutOfBoundsException: 90
 at org.apache.hadoop.io.compress.bzip2.CBZip2InputStream.getAndMoveToFrontDecode(CBZip2InputStream.java:897)
 at org.apache.hadoop.io.compress.bzip2.CBZip2InputStream.initBlock(CBZip2InputStream.java:499)
 at org.apache.hadoop.io.compress.bzip2.CBZip2InputStream.changeStateToProcessABlock(CBZip2InputStream.java:330)
 at org.apache.hadoop.io.compress.bzip2.CBZip2InputStream.read(CBZip2InputStream.java:394)
 at org.apache.hadoop.io.compress.BZip2Codec$BZip2CompressionInputStream.read(BZip2Codec.java:422)
 at java.io.InputStream.read(InputStream.java:101)
 at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:205)
 at org.apache.hadoop.util.LineReader.readLine(LineReader.java:169)
 at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:176)
 at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:43)
 at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:198)
 at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:181)
 at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
 at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
 at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
 at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:350)
 at scala.collection.Iterator$class.foreach(Iterator.scala:727)
 at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
 at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:303)
 at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply$mcV$sp(PythonRDD.scala:200)
 at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:175)
 at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:175)
 at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1160)
 at org.apache.spark.api.python.PythonRDD$WriterThread.run(PythonRDD.scala:174)

 Thanks & Regards,
 Meethu M



Re: ArrayIndexOutOfBoundsException when reading bzip2 files

2014-06-09 Thread MEETHU MATHEW
Hi Akhil,
Please find the code below.

 from operator import add  # 'add' is used by the reduce below

 x = sc.textFile("hdfs:///**")
 x = x.filter(lambda z: z.split(",")[0] != ' ')
 x = x.filter(lambda z: z.split(",")[3] != ' ')
 z = x.reduce(add)
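
 As a quick sanity check that the archive itself is intact, the same file can
 be decompressed outside Hadoop. A minimal sketch, assuming the file has first
 been copied out of HDFS (e.g. via hadoop fs -get) and using a hypothetical
 local filename:

 import bz2

 # Stream-decompress the whole file; a clean pass suggests the .bz2 data
 # itself is fine and points at the Hadoop codec path instead.
 with bz2.BZ2File("data.bz2") as f:
     for line in f:
         pass
 print("archive decompressed without errors")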
 
Thanks & Regards,
Meethu M



Re: ArrayIndexOutOfBoundsException when reading bzip2 files

2014-06-09 Thread Sean Owen
Have a search online / at the Spark JIRA. This was a known upstream
bug in Hadoop.

https://issues.apache.org/jira/browse/SPARK-1861
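
Until Spark runs against a Hadoop build that contains the fix, the workaround
from the JIRA is to allow only one concurrent task per worker JVM
(SPARK_WORKER_CORES=1 on a standalone cluster). A minimal sketch of the same
idea for a local shell, with a hypothetical HDFS path:

from pyspark import SparkContext

# local[1] runs one task at a time in the JVM, so bzip2 splits are never
# decompressed concurrently (the failure mode behind SPARK-1861).
sc = SparkContext("local[1]", "bzip2-workaround")
x = sc.textFile("hdfs:///path/to/data.bz2")  # hypothetical path
print(x.count())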



Re: ArrayIndexOutOfBoundsException when reading bzip2 files

2014-06-09 Thread MEETHU MATHEW
Hi Sean,

Thank you for the fast response.
 
Thanks & Regards,
Meethu M



Re: ArrayIndexOutOfBoundsException when reading bzip2 files

2014-06-09 Thread sam
Any idea when they will release it? Also, I'm unsure what we will need to do
to fix the shell. Will we have to reinstall Spark, or reinstall Hadoop?
(I'm not a devops person, so maybe this question sounds silly.)
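
Since the bug is on the Hadoop side, one quick check is which Hadoop version a
running shell is actually linked against. A sketch, assuming a live PySpark
shell (sc._jvm is a private PySpark handle, so treat this as debugging only):

# Ask the JVM for the Hadoop build in use; the bzip2 fix ships with
# Hadoop, so that is the component that would need upgrading.
print(sc._jvm.org.apache.hadoop.util.VersionInfo.getVersion())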


