Re: Fwd: Spark SQL: ArrayIndexOutofBoundsException

2016-11-24 Thread cossy
The drop() function is a Scala collections method (e.g. on Array), not a Spark RDD operation.
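
A minimal RDD-level sketch of the workaround, assuming sc and inp_file as in
the thread below, and that the goal is just to skip the first line:

val noHeader = sc.textFile(inp_file)
  .zipWithIndex()                       // pair each line with its global index
  .filter { case (_, idx) => idx > 0 }  // drop index 0, i.e. the header line
  .map { case (line, _) => line }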






Fwd: Spark SQL: ArrayIndexOutofBoundsException

2014-10-02 Thread Liquan Pei
-- Forwarded message --
From: Liquan Pei liquan...@gmail.com
Date: Thu, Oct 2, 2014 at 3:42 PM
Subject: Re: Spark SQL: ArrayIndexOutofBoundsException
To: SK skrishna...@gmail.com


There is only one place where you use index 1. One possible issue is that
some lines may have only one element after you split by "\t".

Can you try running the following code to make sure every line has at least
two elements?

val badLineCount = sc.textFile(inp_file)
   .map(_.split("\t"))
   .filter(x => x.length < 2)
   .count()

It returns a nonzero count if your data contains lines with fewer than
two values.
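
A defensive version of the original pipeline, sketched here under the same
assumptions (the TUser case class and a tab-separated inp_file), would skip
such malformed lines before indexing into them:

val tusers = sc.textFile(inp_file)
   .map(_.split("\t"))
   .filter(_.length >= 2)  // skip lines that split into fewer than two fields
   .map(p => TUser(p(0), p(1).trim.toInt))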

Liquan

On Thu, Oct 2, 2014 at 3:35 PM, SK skrishna...@gmail.com wrote:

 Hi,

 I am trying to extract the number of distinct users from a file using Spark
 SQL, but I am getting the following error:


 ERROR Executor: Exception in task 1.0 in stage 8.0 (TID 15)
 java.lang.ArrayIndexOutOfBoundsException: 1


 I am following the code in examples/sql/RDDRelation.scala. My code is as
 follows; the error appears when the SQL statement executes. I am new to
 Spark SQL and would like to know how I can fix this issue.

 thanks for your help.


  val sql_cxt = new SQLContext(sc)
  import sql_cxt._

  // read the data using the schema and create a schema RDD
  val tusers = sc.textFile(inp_file)
    .map(_.split("\t"))
    .map(p => TUser(p(0), p(1).trim.toInt))

  // register the RDD as a table
  tusers.registerTempTable("tusers")

  // get the number of unique users
  val unique_count = sql_cxt.sql("SELECT COUNT(DISTINCT userid) FROM tusers")
    .collect().head.getLong(0)

  println(unique_count)







-- 
Liquan Pei
Department of Physics
University of Massachusetts Amherst





Re: Fwd: Spark SQL: ArrayIndexOutofBoundsException

2014-10-02 Thread SK
Thanks for the help. Yes, I did not realize that the first line, the header,
has a different separator.

By the way, is there a way to drop the first line, which contains the header?
Something along the following lines:

  sc.textFile(inp_file)
  .drop(1)  // or tail() to drop the header line
  .map(...) // rest of the processing

I could not find a drop() function or a way to take the bottom n elements of
an RDD. Alternatively, a way to create the case class schema from the header
line of the file and use the rest as data would be useful, just as a
suggestion. Currently I am deleting the header line manually before
processing the file in Spark.


thanks








Re: Fwd: Spark SQL: ArrayIndexOutofBoundsException

2014-10-02 Thread Sunny Khatri
You can do a filter with startsWith?
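
For example, a sketch under the assumption that the header begins with the
token "userid" (the actual header text is not shown in this thread):

val noHeader = sc.textFile(inp_file)
  .filter(line => !line.startsWith("userid"))  // "userid" is a placeholder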




Re: Fwd: Spark SQL: ArrayIndexOutofBoundsException

2014-10-02 Thread Michael Armbrust
This is hard to do in general, but you can get what you are asking for by
putting the following class in scope.

implicit class BetterRDD[A: scala.reflect.ClassTag](rdd: org.apache.spark.rdd.RDD[A]) {
  // On the first partition only, advance past the first element (the header line).
  def dropOne = rdd.mapPartitionsWithIndex((i, iter) =>
    if (i == 0 && iter.hasNext) { iter.next; iter } else iter)
}
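
With that implicit in scope, a usage sketch on the original pipeline (same
TUser schema assumed) might look like:

val tusers = sc.textFile(inp_file)
  .dropOne                // drop the header line from the first partition
  .map(_.split("\t"))
  .map(p => TUser(p(0), p(1).trim.toInt))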
