Spark mllib throwing error

2014-04-24 Thread John King
./spark-shell: line 153: 17654 Killed
$FWDIR/bin/spark-class org.apache.spark.repl.Main $@


Any ideas?


Re: Spark mllib throwing error

2014-04-24 Thread Xiangrui Meng
Could you share the command you used and more of the error message?
Also, is it an MLlib-specific problem? -Xiangrui


Re: Spark mllib throwing error

2014-04-24 Thread John King
Last command was:

val model = new NaiveBayes().run(points)




Re: Spark mllib throwing error

2014-04-24 Thread Xiangrui Meng
Do you mind sharing more code and error messages? The information you
provided is too little to identify the problem. -Xiangrui



Re: Spark mllib throwing error

2014-04-24 Thread John King
In the other thread I had an issue with Python. In this issue, I tried
switching to Scala. The code is:

import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.linalg.SparseVector
import org.apache.spark.mllib.classification.NaiveBayes

import scala.collection.mutable.ArrayBuffer

// Note: despite the name, this returns true for non-blank lines.
def isEmpty(a: String): Boolean =
  a != null && !a.replaceAll("(?m)\\s+$", "").isEmpty()

def parsePoint(a: String): LabeledPoint = {
  val values = a.split('\t')
  val feat = values(1).split(' ')
  val indices = ArrayBuffer.empty[Int]
  val featValues = ArrayBuffer.empty[Double]
  for (f <- feat) {
    val q = f.split(':')
    if (q.length == 2) {
      indices += q(0).toInt
      featValues += q(1).toDouble
    }
  }
  val vector = new SparseVector(2357815, indices.toArray, featValues.toArray)
  LabeledPoint(values(0).toDouble, vector)
}

val data = sc.textFile("data.txt")
val empty = data.filter(isEmpty)
val points = empty.map(parsePoint)
points.cache()
val model = new NaiveBayes().run(points)
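For reference, parsePoint above assumes each input line is a label, a tab, then space-separated "index:value" feature pairs. A minimal, Spark-free sketch of parsing that format (the sample line is made up):

```scala
// Hypothetical sample line in the format parsePoint expects:
// label<TAB>index:value index:value ...
val line = "1.0\t3:0.5 7:1.25 12:2.0"

val parts = line.split('\t')
val label = parts(0).toDouble

// Keep only well-formed "index:value" pairs, as parsePoint does.
val pairs = parts(1).split(' ').map(_.split(':')).collect {
  case Array(i, v) => (i.toInt, v.toDouble)
}
```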



Re: Spark mllib throwing error

2014-04-24 Thread Xiangrui Meng
I don't see anything wrong with your code. Could you run points.count()
to see how many training examples you have? Also, make sure you don't
have negative feature values. The error message you sent does not say
NaiveBayes went wrong; it says the Spark shell was killed. -Xiangrui
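The two checks suggested here can be sketched over plain collections; on the actual RDD they would be points.count() and a filter over the feature values. The (label, features) pairs below are hypothetical:

```scala
// Hypothetical parsed examples: (label, Seq of (index, value) pairs).
val parsed = Seq(
  (1.0, Seq((3, 0.5), (7, 1.25))),
  (0.0, Seq((2, 2.0), (9, -0.1)))
)

// Check 1: how many training examples are there?
val numExamples = parsed.size

// Check 2: are there any negative feature values?
val numNegative = parsed.flatMap(_._2).count { case (_, v) => v < 0 }
```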




Re: Spark mllib throwing error

2014-04-24 Thread John King
It just displayed this error and stopped on its own. Do the lines of code
mentioned in the error have anything to do with it?




Re: Spark mllib throwing error

2014-04-24 Thread Xiangrui Meng
I only see one risk: if your feature indices are not sorted, you may
get undefined behavior. Other than that, I don't see anything
suspicious. -Xiangrui
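The fix for the risk above is to sort the (index, value) pairs by index before building the SparseVector. A sketch, using made-up feature pairs:

```scala
// Unsorted feature pairs as parsePoint might produce them (hypothetical data).
val unsorted = Array((7, 1.25), (3, 0.5), (12, 2.0))

// Sort by index, then split back into the parallel arrays
// that the SparseVector constructor takes.
val (sortedIndices, sortedValues) = unsorted.sortBy(_._1).unzip
```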
