Hi All,

The Word2Vec and TF-IDF algorithms in Spark MLlib 1.1.0 work only in
local mode. In distributed mode they throw a NullPointerException.
Is this a bug in Spark 1.1.0?

*The code is as follows:*
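  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.mllib.feature.{HashingTF, IDF}

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf
    val sc = new SparkContext(conf)
    // Read the input and split each line into a sequence of terms
    val documents = sc.textFile("hdfs://IMPETUS-DSRV02:9000/nlp/sampletext").map(_.split(" ").toSeq)
    // Term frequencies; cached because IDF needs two passes (fit, then transform)
    val hashingTF = new HashingTF()
    val tf = hashingTF.transform(documents)
    tf.cache()
    val idf = new IDF().fit(tf)
    val tfidf = idf.transform(tf)
    // Print each TF-IDF vector (this runs on the executors) and map it to a constant
    val rdd = tfidf.map { vec =>
      println("vector is...." + vec)
      10
    }
    rdd.saveAsTextFile("/home/padma/usecase")
  }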

*Exception thrown:*

15/01/06 12:36:09 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
15/01/06 12:36:10 INFO cluster.SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkexecu...@impetus-dsrv05.impetus.co.in:33898/user/Executor#-1525890167] with ID 0
15/01/06 12:36:10 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, IMPETUS-DSRV05.impetus.co.in, NODE_LOCAL, 1408 bytes)
15/01/06 12:36:10 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, IMPETUS-DSRV05.impetus.co.in, NODE_LOCAL, 1408 bytes)
15/01/06 12:36:10 INFO storage.BlockManagerMasterActor: Registering block manager IMPETUS-DSRV05.impetus.co.in:35130 with 2.1 GB RAM
15/01/06 12:36:12 INFO network.ConnectionManager: Accepted connection from [IMPETUS-DSRV05.impetus.co.in/192.168.145.195:46888]
15/01/06 12:36:12 INFO network.SendingConnection: Initiating connection to [IMPETUS-DSRV05.impetus.co.in/192.168.145.195:35130]
15/01/06 12:36:12 INFO network.SendingConnection: Connected to [IMPETUS-DSRV05.impetus.co.in/192.168.145.195:35130], 1 messages pending
15/01/06 12:36:12 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on IMPETUS-DSRV05.impetus.co.in:35130 (size: 2.1 KB, free: 2.1 GB)
15/01/06 12:36:12 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on IMPETUS-DSRV05.impetus.co.in:35130 (size: 10.1 KB, free: 2.1 GB)
15/01/06 12:36:13 INFO storage.BlockManagerInfo: Added rdd_3_1 in memory on IMPETUS-DSRV05.impetus.co.in:35130 (size: 280.0 B, free: 2.1 GB)
15/01/06 12:36:13 INFO storage.BlockManagerInfo: Added rdd_3_0 in memory on IMPETUS-DSRV05.impetus.co.in:35130 (size: 416.0 B, free: 2.1 GB)
15/01/06 12:36:13 WARN scheduler.TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1, IMPETUS-DSRV05.impetus.co.in): java.lang.NullPointerException:
        org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
        org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
        org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
        org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
        org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
        org.apache.spark.scheduler.Task.run(Task.scala:54)
        org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
        java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        java.lang.Thread.run(Thread.java:722)


Thanks,
Padma Ch
