Re: Parallel LogisticRegression?

2014-06-20 Thread Kyle Ellrott
It looks like I was running into
https://issues.apache.org/jira/browse/SPARK-2204
The issue went away when I switched to spark.mesos.coarse.
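
For reference, a minimal sketch of the change that resolved it, assuming a
standard SparkConf-based setup; the Mesos master URL and application name
below are placeholders:

    import org.apache.spark.{SparkConf, SparkContext}

    // Coarse-grained Mesos mode: Spark holds long-lived executors instead of
    // going through the fine-grained MesosSchedulerBackend, which is the code
    // path that raised the UnsupportedOperationException in killTask above.
    val conf = new SparkConf()
      .setMaster("mesos://zk://mesos-master:2181/mesos") // placeholder master URL
      .setAppName("ParallelLogisticRegression")          // placeholder app name
      .set("spark.mesos.coarse", "true")
    val sc = new SparkContext(conf)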

Kyle


On Fri, Jun 20, 2014 at 10:36 AM, Kyle Ellrott 
wrote:

> I've tried to parallelize the separate regressions using
> allResponses.toParArray.map( x => run logistic regression against the labels in x ),
> but I start to see messages like
> 14/06/20 10:10:26 WARN scheduler.TaskSetManager: Lost TID 4193 (task
> 363.0:4)
> 14/06/20 10:10:27 WARN scheduler.TaskSetManager: Loss was due to fetch
> failure from null
> and finally
> 14/06/20 10:10:26 ERROR scheduler.TaskSetManager: Task 363.0:4 failed 4
> times; aborting job
>
> Then
> 14/06/20 10:10:26 ERROR scheduler.DAGSchedulerActorSupervisor:
> eventProcesserActor failed due to the error null; shutting down SparkContext
> 14/06/20 10:10:26 ERROR actor.OneForOneStrategy:
> java.lang.UnsupportedOperationException
> at
> org.apache.spark.scheduler.SchedulerBackend$class.killTask(SchedulerBackend.scala:32)
> at
> org.apache.spark.scheduler.cluster.mesos.MesosSchedulerBackend.killTask(MesosSchedulerBackend.scala:41)
>  at
> org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply$mcVJ$sp(TaskSchedulerImpl.scala:185)
>
>
> This doesn't happen when I don't use toParArray. I read that Spark is
> thread-safe, but I seem to be running into problems. Am I doing something
> wrong?
>
> Kyle
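
For context, a minimal sketch of the driver-side parallel submission being
described, assuming each response set has already been joined with the shared
features into an RDD[LabeledPoint]; the helper name, collection shape, and
iteration count are illustrative, and .par plays the same role here as the
toParArray call above:

    import org.apache.spark.rdd.RDD
    import org.apache.spark.mllib.regression.LabeledPoint
    import org.apache.spark.mllib.classification.{LogisticRegressionModel, LogisticRegressionWithSGD}

    // Submit the independent training runs from multiple driver threads.
    // Each train() call still executes as ordinary Spark jobs on the cluster;
    // the driver-side parallelism only overlaps their scheduling.
    def trainAll(allResponses: Seq[RDD[LabeledPoint]]): Seq[LogisticRegressionModel] =
      allResponses.par.map { responseRDD =>
        LogisticRegressionWithSGD.train(responseRDD, 100) // illustrative iteration count
      }.seq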
>
>
>
> On Thu, Jun 19, 2014 at 11:21 AM, Kyle Ellrott 
> wrote:
>
>>
>> I'm working on a problem that involves learning several different sets of
>> responses against the same set of training features. Right now I've written
>> the program to cycle through all of the different label sets, attach each one
>> to the training data, and run LogisticRegressionWithSGD on each of them, i.e.
>>
>> foreach curResponseSet in allResponses:
>>     currentRDD: RDD[LabeledPoint] = curResponseSet joined with trainingData
>>     LogisticRegressionWithSGD.train(currentRDD)
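
To make the loop above concrete, here is a minimal sketch under assumed
shapes: trainingData as (id, feature vector) pairs and each response set as
(id, label) pairs; the key type, helper name, and iteration count are
illustrative, not from the original post:

    import org.apache.spark.SparkContext._ // pair-RDD operations such as join (Spark 1.x)
    import org.apache.spark.rdd.RDD
    import org.apache.spark.mllib.linalg.Vector
    import org.apache.spark.mllib.regression.LabeledPoint
    import org.apache.spark.mllib.classification.{LogisticRegressionModel, LogisticRegressionWithSGD}

    def trainSequentially(trainingData: RDD[(Long, Vector)],
                          allResponses: Seq[RDD[(Long, Double)]]): Seq[LogisticRegressionModel] = {
      trainingData.cache() // the shared features are reused by every join below
      for (curResponseSet <- allResponses) yield {
        // Attach this label set to the shared features to form LabeledPoints.
        val currentRDD: RDD[LabeledPoint] =
          curResponseSet.join(trainingData).values
            .map { case (label, features) => LabeledPoint(label, features) }
        LogisticRegressionWithSGD.train(currentRDD, 100) // illustrative iteration count
      }
    }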
>>
>>
>> Each of the different training runs is independent, so it seems like I
>> should be able to parallelize them as well.
>> Is there a better way to do this?
>>
>>
>> Kyle
>>
>
>

