The bug is likely in your data. Do you have lines in your input file that do not contain the "\t" character? If so, .split("\t") will return a single-element array for those lines, and p(1) in the second .map() will throw java.lang.ArrayIndexOutOfBoundsException: 1.
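A minimal sketch of a guard against such lines (plain Scala collection here, but the same .filter works on the RDD; the case class and field names are hypothetical stand-ins for yours):

```scala
object SplitGuard {
  // Stand-in for your TUser case class; the second field name is assumed.
  case class TUser(userid: String, visits: Int)

  def main(args: Array[String]): Unit = {
    // "bob" has no tab, so .split("\t") yields a single-element array;
    // indexing p(1) on it is what throws ArrayIndexOutOfBoundsException.
    val lines = Seq("alice\t3", "bob", "carol\t7")

    val users = lines
      .map(_.split("\t"))
      .filter(_.length >= 2)   // drop malformed lines before indexing p(1)
      .map(p => TUser(p(0), p(1).trim.toInt))

    users.foreach(println)
  }
}
```

Note that Java's split also drops trailing empty strings, so a line like "alice\t" (tab present but second field empty) still splits to length 1 and would hit the same exception without the filter.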
On Thu, Oct 2, 2014 at 3:35 PM, SK <skrishna...@gmail.com> wrote:
> Hi,
>
> I am trying to extract the number of distinct users from a file using Spark
> SQL, but I am getting the following error:
>
> ERROR Executor: Exception in task 1.0 in stage 8.0 (TID 15)
> java.lang.ArrayIndexOutOfBoundsException: 1
>
> I am following the code in examples/sql/RDDRelation.scala. My code is as
> follows. The error is appearing when it executes the SQL statement. I am
> new to Spark SQL. I would like to know how I can fix this issue.
>
> Thanks for your help.
>
>     val sql_cxt = new SQLContext(sc)
>     import sql_cxt._
>
>     // read the data using the schema and create a schema RDD
>     val tusers = sc.textFile(inp_file)
>                    .map(_.split("\t"))
>                    .map(p => TUser(p(0), p(1).trim.toInt))
>
>     // register the RDD as a table
>     tusers.registerTempTable("tusers")
>
>     // get the number of unique users
>     val unique_count = sql_cxt.sql("SELECT COUNT (DISTINCT userid) FROM tusers")
>                               .collect().head.getLong(0)
>
>     println(unique_count)
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-ArrayIndexOutofBoundsException-tp15639.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.