Parallel LogisticRegression?

2014-06-19 Thread Kyle Ellrott
I'm working on a problem that involves learning several different sets of
responses against the same set of training features. Right now I've written the
program to cycle through all of the different label sets, attach them to
the training data, and run LogisticRegressionWithSGD on each of them, i.e.

foreach curResponseSet in allResponses:
 currentRDD : RDD[LabeledPoint] = curResponseSet joined with trainingData
 LogisticRegressionWithSGD.train(currentRDD)

Each of the training runs is independent. It seems like I should
be able to parallelize them as well.
Is there a better way to do this?
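
For readers following along, the loop described above might look roughly like this in Scala against the Spark 1.x MLlib API. This is only a sketch under assumptions that are not in the original message: the features and labels are taken to be pair RDDs keyed by a String sample ID, and the names TrainAllResponses, trainAll and numIterations are hypothetical.

```scala
import org.apache.spark.SparkContext._ // pair-RDD operations such as join (needed on Spark 1.x)
import org.apache.spark.rdd.RDD
import org.apache.spark.mllib.classification.{LogisticRegressionModel, LogisticRegressionWithSGD}
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.mllib.regression.LabeledPoint

object TrainAllResponses {
  // Assumed layout: features and each label set are keyed by a sample ID.
  def trainAll(
      trainingData: RDD[(String, Vector)],
      allResponses: Seq[RDD[(String, Double)]],
      numIterations: Int = 100): Seq[LogisticRegressionModel] = {
    // The same features are reused by every run, so keep them in memory.
    trainingData.cache()
    allResponses.map { curResponseSet =>
      // Attach this label set to the shared features by sample ID.
      val currentRDD: RDD[LabeledPoint] =
        curResponseSet.join(trainingData).map { case (_, (label, features)) =>
          LabeledPoint(label, features)
        }
      // One independent model per label set; the runs happen one after another.
      LogisticRegressionWithSGD.train(currentRDD, numIterations)
    }
  }
}
```

Each call to train distributes its own work across the cluster, but the label sets are still processed sequentially from the driver, which is the behaviour the question is about.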


Re: Parallel LogisticRegression?

2014-06-20 Thread Kyle Ellrott
I've tried to parallelize the separate regressions using x=> do logistic regression against labels in x)
But I start to see messages like
14/06/20 10:10:26 WARN scheduler.TaskSetManager: Lost TID 4193 (task 363.0:4)
14/06/20 10:10:27 WARN scheduler.TaskSetManager: Loss was due to fetch
failure from null
and finally
14/06/20 10:10:26 ERROR scheduler.TaskSetManager: Task 363.0:4 failed 4
times; aborting job

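Then
14/06/20 10:10:26 ERROR scheduler.DAGSchedulerActorSupervisor:
eventProcesserActor failed due to the error null; shutting down SparkContext
14/06/20 10:10:26 ERROR actor.OneForOneStrategy:
java.lang.UnsupportedOperationException
 at org.apache.spark.scheduler.SchedulerBackend$class.killTask(SchedulerBackend.scala:32)
 at org.apache.spark.scheduler.cluster.mesos.MesosSchedulerBackend.killTask(MesosSchedulerBackend.scala:41)
 at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply$mcVJ$sp(TaskSchedulerImpl.scala:185)

This doesn't happen when I don't use toParArray. I read that Spark was
thread safe, but I seem to be running into problems. Am I doing something
wrong?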


On Thu, Jun 19, 2014 at 11:21 AM, Kyle Ellrott wrote:

> I'm working on a problem that involves learning several different sets of
> responses against the same set of training features. Right now I've written the
> program to cycle through all of the different label sets, attach them to
> the training data, and run LogisticRegressionWithSGD on each of them, i.e.
> foreach curResponseSet in allResponses:
>  currentRDD : RDD[LabeledPoint] = curResponseSet joined with trainingData
>  LogisticRegressionWithSGD.train(currentRDD)
> Each of the training runs is independent. It seems like I
> should be able to parallelize them as well.
> Is there a better way to do this?
> Kyle

Re: Parallel LogisticRegression?

2014-06-20 Thread Kyle Ellrott
It looks like I was running into
The issues went away when I enabled coarse-grained Mesos mode (spark.mesos.coarse).
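
For reference, one way to turn that setting on is through SparkConf when the context is created. This is a minimal sketch, not the poster's actual configuration: the object name and application name are hypothetical, and the master URL is assumed to be supplied by whatever launches the job.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CoarseGrainedContext {
  // Builds a SparkContext with coarse-grained Mesos mode enabled.
  // The master URL is assumed to come from the launcher (e.g. spark-submit).
  def create(): SparkContext = {
    val conf = new SparkConf()
      .setAppName("parallel-logistic-regression") // hypothetical application name
      .set("spark.mesos.coarse", "true")
    new SparkContext(conf)
  }
}
```

The same property can also be set in the cluster's Spark configuration instead of in code.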


On Fri, Jun 20, 2014 at 10:36 AM, Kyle Ellrott wrote:

> I've tried to parallelize the separate regressions using
> x=> do logistic regression against labels in x)
> But I start to see messages like
> 14/06/20 10:10:26 WARN scheduler.TaskSetManager: Lost TID 4193 (task
> 363.0:4)
> 14/06/20 10:10:27 WARN scheduler.TaskSetManager: Loss was due to fetch
> failure from null
> and finally
> 14/06/20 10:10:26 ERROR scheduler.TaskSetManager: Task 363.0:4 failed 4
> times; aborting job
> Then
> 14/06/20 10:10:26 ERROR scheduler.DAGSchedulerActorSupervisor:
> eventProcesserActor failed due to the error null; shutting down SparkContext
> 14/06/20 10:10:26 ERROR actor.OneForOneStrategy:
> java.lang.UnsupportedOperationException
>  at org.apache.spark.scheduler.SchedulerBackend$class.killTask(SchedulerBackend.scala:32)
>  at org.apache.spark.scheduler.cluster.mesos.MesosSchedulerBackend.killTask(MesosSchedulerBackend.scala:41)
>  at org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply$mcVJ$sp(TaskSchedulerImpl.scala:185)
> This doesn't happen when I don't use toParArray. I read that Spark was
> thread safe, but I seem to be running into problems. Am I doing something
> wrong?
> Kyle