drop() is a Scala collections method (an operation on Array and other collections), not a Spark RDD operation.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-ArrayIndexOutofBoundsException-tp15639p28127.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
-- Forwarded message --
From: Liquan Pei liquan...@gmail.com
Date: Thu, Oct 2, 2014 at 3:42 PM
Subject: Re: Spark SQL: ArrayIndexOutofBoundsException
To: SK skrishna...@gmail.com
There is only one place where you use index 1. One possible issue is that your array
may have only one element.
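A minimal sketch of that failure mode, using hypothetical sample lines (a "|"-separated header over ","-separated data rows):

```scala
// Why the header line throws: it uses a different separator, so
// split(",") returns the whole line as one element, and indexing (1)
// raises java.lang.ArrayIndexOutOfBoundsException.
// "user_id|score" and "alice,42" are made-up sample lines.
val headerFields = "user_id|score".split(",")
val rowFields    = "alice,42".split(",")

println(headerFields.length) // 1 -- headerFields(1) would throw
println(rowFields(1))        // 42
```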
Thanks for the help. Yes, I did not realize that the first header line has a different separator.
By the way, is there a way to drop the first line that contains the header?
Something along the following lines:
sc.textFile(inp_file)
  .drop(1) // or tail() to drop the header
You can do a filter with startsWith?
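A sketch of that startsWith approach, assuming the header begins with a known token ("user_id" here is hypothetical, as are `sc` and `inp_file`):

```scala
// Keep every line that does not begin with the header's first token.
// "user_id" is a hypothetical header prefix for this example.
val isHeader: String => Boolean = _.startsWith("user_id")

// With a live SparkContext `sc` and input path `inp_file`:
//   val data = sc.textFile(inp_file).filter(line => !isHeader(line))

// Plain-Scala check of the predicate on sample lines:
println(isHeader("user_id|score")) // true  -- filtered out
println(isHeader("alice,42"))      // false -- kept
```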
On Thu, Oct 2, 2014 at 4:04 PM, SK skrishna...@gmail.com wrote:
Thanks for the help. Yes, I did not realize that the first header line has a different separator.
By the way, is there a way to drop the first line that contains the header?
Something along
This is hard to do in general, but you can get what you are asking for by
putting the following class in scope.
implicit class BetterRDD[A: scala.reflect.ClassTag](rdd: org.apache.spark.rdd.RDD[A]) {
  def dropOne = rdd.mapPartitionsWithIndex((i, iter) =>
    if (i == 0 && iter.hasNext) { iter.next(); iter } else iter)
}
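The per-partition logic of dropOne can be exercised without a Spark cluster by applying the same function to plain iterators (a sketch; it assumes partition 0 holds the start of the file, which is where the header lands with textFile):

```scala
// Mimic mapPartitionsWithIndex: drop the first element of partition 0 only.
def dropFirst[A](i: Int, iter: Iterator[A]): Iterator[A] =
  if (i == 0 && iter.hasNext) { iter.next(); iter } else iter

// Two fake "partitions"; the header sits at the front of partition 0.
val partitions = Seq(Iterator("header", "a"), Iterator("b", "c"))
val result = partitions.zipWithIndex.map { case (it, i) => dropFirst(i, it) }
println(result.flatten.toList) // List(a, b, c)
```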