date:20131207

MapReduce Training phase

2013-12-07 Thread unmesha sreeveni

I want to know more about how algorithms like SVM are made parallel:
whether MapReduce (MR) is used for the training phase, the prediction phase, or both.


In normal cases the training phase is a good fit for MR, as it takes a lot of time.
Do we need to use MR for prediction too?


-- 
*Thanks & Regards*

Unmesha Sreeveni U.B

*Junior Developer*
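One common way to think about this question, sketched here under stated assumptions: distributed SVM training is typically iterative and needs coordination between workers, but prediction with an already-trained model is embarrassingly parallel, because each record is scored independently. That is why batch prediction is often written as a map-only job (zero reducers). The Python below only simulates that structure on one machine; the model format (a weight vector plus bias), the names `predict`, `map_task` and `map_only_job`, and the toy data are all hypothetical, not Mahout or Hadoop APIs.

```python
from typing import Iterable, List, Sequence, Tuple

# Hypothetical model format: weights and bias of an already-trained linear classifier
# (a linear SVM and logistic regression both predict by thresholding w . x + b).
Model = Tuple[List[float], float]
Record = Tuple[str, Sequence[float]]  # (document id, feature vector)


def predict(model: Model, x: Sequence[float]) -> int:
    """Predict class 1 if w . x + b > 0, else class 0."""
    w, b = model
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


def map_task(model: Model, split: Iterable[Record]) -> List[Tuple[str, int]]:
    """One 'mapper': scores its own input split with a read-only copy of the model."""
    return [(doc_id, predict(model, x)) for doc_id, x in split]


def map_only_job(model: Model, splits: List[List[Record]]) -> List[Tuple[str, int]]:
    """Simulated map-only job: no shuffle and no reduce; outputs are just concatenated.

    On a real cluster each map_task call would run in parallel on a different node.
    """
    output: List[Tuple[str, int]] = []
    for split in splits:
        output.extend(map_task(model, split))
    return output


if __name__ == "__main__":
    model: Model = ([1.0, -1.0], 0.0)
    records = [("d1", [2.0, 1.0]), ("d2", [0.0, 3.0]), ("d3", [5.0, 4.0]), ("d4", [1.0, 2.0])]
    splits = [records[:2], records[2:]]  # pretend the input was cut into two splits
    print(map_only_job(model, splits))  # [('d1', 1), ('d2', 0), ('d3', 1), ('d4', 0)]
```

Because no record's prediction depends on any other, the result is the same however the input is split. Whether MR is worth it for prediction then depends mostly on data volume: for a few MB of test data, a single machine may be as fast once job start-up overhead is counted.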

Re: SVM Implementation for mahout?

2013-12-07 Thread Fernando Santos

Thanks Manuel.

It seems that these two patches (https://issues.apache.org/jira/browse/MAHOUT-334
and https://issues.apache.org/jira/browse/MAHOUT-232) might work,
although not in parallel.

Has anyone already used either of these patches successfully, and could you
share some comments about it?

Thanks


2013/12/6 Manuel Blechschmidt manuel.blechschm...@gmx.de

 Hi Fernando,
 there are some patches and some discussions:

 SVM:
 https://issues.apache.org/jira/browse/MAHOUT-334
 https://issues.apache.org/jira/browse/MAHOUT-232
 https://issues.apache.org/jira/browse/MAHOUT-14
 https://issues.apache.org/jira/browse/MAHOUT-227

 /Manuel

 On 06.12.2013, at 19:14, Fernando Santos wrote:

  Hello,
 
  Is there any tested SVM implementation for Mahout?
 
  Mahout in Action says there is a sequential implementation, but it is
  still experimental. I couldn't find this implementation.
 
  Thanks
 
  --
  Fernando Santos
  +55 61 8129 8505

 --
 Manuel Blechschmidt
 M.Sc. IT Systems Engineering
 Dortustr. 57
 14467 Potsdam
 Mobil: 0173/6322621
 Twitter: http://twitter.com/Manuel_B





Re: Test naivebayes task running really slowly and not in distributed mode

2013-12-07 Thread Fernando Santos

I figured out what the problem was.

First of all, the data was not big enough to split the job into more than one
task: the training file was 30MB and my block size was 64MB.

Besides that, I set the number of map (mapred.map.tasks) and reduce
(mapred.reduce.tasks) tasks in Hadoop's mapred-site.xml file.

After that the algorithm started running in an acceptable time.
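The one-task behaviour described above follows from how Hadoop turns files into map tasks: roughly one map task per input split, and a split normally does not exceed one HDFS block, so a 30MB file under a 64MB block size yields a single split. The sketch below is a simplified, single-file model of the old `org.apache.hadoop.mapred.FileInputFormat` split logic in Hadoop 0.20/1.x, where `mapred.map.tasks` acts as a hint that can shrink the split size. It is an approximation (it ignores multiple files, compression and locality), and `count_splits` is a hypothetical name. Jobs written against the newer `org.apache.hadoop.mapreduce` API compute splits from the block size and min/max split settings instead, so the hint may have no effect there; which API each Mahout job uses is not checked here.

```python
# Old-API FileInputFormat lets the last split be up to 10% larger than split_size.
SPLIT_SLOP = 1.1

MB = 1024 * 1024


def count_splits(file_size: int, block_size: int, map_tasks_hint: int = 1, min_size: int = 1) -> int:
    """Simplified, single-file model of old-API FileInputFormat split sizing.

    Each split becomes one map task. Assumptions: one uncompressed, splittable file;
    map_tasks_hint plays the role of mapred.map.tasks; min_size plays the role of
    the minimum split size setting.
    """
    goal_size = file_size // max(map_tasks_hint, 1)
    # A split is never larger than one block (unless min_size forces it).
    split_size = max(min_size, min(goal_size, block_size))
    splits = 0
    remaining = file_size
    while remaining / split_size > SPLIT_SLOP:
        splits += 1
        remaining -= split_size
    if remaining > 0:
        splits += 1  # the leftover tail becomes the last split
    return splits


print(count_splits(30 * MB, 64 * MB))                    # 1 -> a single map task
print(count_splits(30 * MB, 64 * MB, map_tasks_hint=4))  # 4
print(count_splits(200 * MB, 64 * MB))                   # 4 (64 + 64 + 64 + 8 MB)
```

In this model a small file only gets more than one map task when something lowers the split size below the file size, which is consistent with setting the map-task count making a difference here.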

2013/12/2 Fernando Santos fernandoleandro1...@gmail.com

The training and test sets are each in a single file (part-r-0). The training
file is 30MB and the test file is 2MB.

2013/12/2 Fernando Santos fernandoleandro1...@gmail.com

Hello Ted,

No, the training also ran on one machine. What sometimes happens is that
each box executes one job at a time, but the boxes never run jobs
simultaneously. For example, if there are 3 jobs, the first runs on box1, the
next on box2 and the next on box1 again.

The full dataset is a CSV file of around 70MB. I turned it into a sequence
file, applied seq2sparse, then split it and trained. The training task was
quite fast, taking a few minutes to execute. But the test is really slow, as I
said, and also runs on one machine.

2013/12/1 Ted Dunning ted.dunn...@gmail.com

Did the training run use both machines?

How large is the input for the test run?

Is it contained in a single file?

On Sat, Nov 30, 2013 at 11:22 AM, Fernando Santos
fernandoleandro1...@gmail.com wrote:

Hello everyone,

I'm trying to do a text classification task. My dataset is not that big; I
have around 700,000 small comments.

Following the 20newsgroups example, I created the vectors from the text,
split them and trained the model. Now I'm trying to test it, but it is
really slow and I also cannot make it run on the cluster. Whatever I do, it
always runs on just one machine. I think the testnb algorithm is supposed to
run using MapReduce, right?

I also tried this example:

http://chimpler.wordpress.com/2013/06/24/using-the-mahout-naive-bayes-classifier-to-automatically-classify-twitter-messages-part-2-distribute-classification-with-hadoop/

but again, the other box in the cluster is not executing any task. In fact,
when I run testnb, or the MapReduceClassifier proposed in the tutorial above,
I get one job executing one task, and this task runs really slowly (about 6
minutes to reach 0.13% of the task).

I think I must be doing something wrong, since the cluster is not working
the way it is supposed to.

I have a cluster with 2 boxes configured with Hadoop 0.20.205.0, and I'm
using Mahout 0.8.

I also tried Mahout versions 0.7 and 0.6, but nothing changed.

Any help would be appreciated.

The logs I have from this task:

*stdout logs*

Java HotSpot(TM) 64-Bit Server VM warning: You have loaded library /usr/local/hadoop/lib/libhadoop.so which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c libfile', or link it with '-z noexecstack'.

*syslog logs*

2013-11-30 17:09:19,191 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2013-11-30 17:09:19,400 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already exists!
2013-11-30 17:09:19,472 INFO org.apache.hadoop.util.ProcessTree: setsid exited with exit code 0
2013-11-30 17:09:19,474 INFO org.apache.hadoop.mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@5810d963
2013-11-30 17:09:19,543 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb = 100
2013-11-30 17:09:19,569 INFO org.apache.hadoop.mapred.MapTask: data buffer = 79691776/99614720
2013-11-30 17:09:19,569 INFO org.apache.hadoop.mapred.MapTask: record buffer = 262144/327680


Re: SVM Implementation for mahout?

2013-12-07 Thread Suneel Marthi

Is there a specific reason you are looking only for an SVM implementation?
Are you sure those patches are still relevant to the current codebase?

Re: SVM Implementation for mahout?

2013-12-07 Thread Fernando Santos

Hello Suneel,

I want to check whether SVM gives better performance.

I've been using naive Bayes, but my data is quite unbalanced, so I'm getting
pretty bad results with it. I also tried complementary naive Bayes, but got
the same bad results. I read about a performance difference between the Weka
and Mahout naive Bayes implementations, and maybe that's the cause (
http://mail-archives.apache.org/mod_mbox/mahout-user/201109.mbox/%3ccabdaxxijtfv9nhqxxpyd72rrsv-h60ps13h0pund2injx70...@mail.gmail.com%3E
).
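Complement naive Bayes, mentioned above, was designed partly to cope with skewed training data: instead of estimating each class's term distribution from that class's own (possibly few) documents, it estimates it from all documents outside the class, then picks the class whose complement explains the document worst. The sketch below is a simplified version of that idea with additive smoothing; it omits the weight normalization and TF-IDF steps that fuller implementations may apply, so it is not Mahout's implementation, and `train_cnb` / `predict_cnb` and the toy data are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


def train_cnb(docs: Sequence[Sequence[float]], labels: Sequence[str], alpha: float = 1.0) -> Dict[str, List[float]]:
    """Simplified complement naive Bayes.

    For each class c, estimate term probabilities from all documents NOT in c,
    with additive smoothing alpha. Returns log(theta~_{c,i}) per class and term.
    """
    n_features = len(docs[0])
    # Per-class term counts.
    counts: Dict[str, List[float]] = defaultdict(lambda: [0.0] * n_features)
    for x, y in zip(docs, labels):
        for i, v in enumerate(x):
            counts[y][i] += v
    totals = [sum(col) for col in zip(*counts.values())]  # term counts over all classes
    model: Dict[str, List[float]] = {}
    for c, cnt in counts.items():
        comp = [t - v for t, v in zip(totals, cnt)]  # counts from the complement of c
        denom = sum(comp) + alpha * n_features
        model[c] = [math.log((v + alpha) / denom) for v in comp]
    return model


def predict_cnb(model: Dict[str, List[float]], x: Sequence[float]) -> str:
    """Pick the class whose complement fits x worst (lowest complement log-likelihood)."""
    return max(model, key=lambda c: -sum(v * w for v, w in zip(x, model[c])))


docs = [[2, 0, 1], [3, 0, 0], [0, 2, 1], [0, 1, 2]]  # toy term counts
labels = ["a", "a", "b", "c"]                          # class "a" is over-represented
model = train_cnb(docs, labels)
print([predict_cnb(model, x) for x in ([1, 0, 0], [0, 1, 0], [0, 0, 1])])  # ['a', 'b', 'c']
```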

I also tried logistic regression and got around 77% accuracy, so maybe SVM
could do better.
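With unbalanced data, a single accuracy figure can hide poor performance on the minority classes, so per-class recall (or its mean, balanced accuracy) is a fairer way to compare naive Bayes, logistic regression and SVM. The sketch below uses a hypothetical 90/10 test set and a classifier that always predicts the majority class; it illustrates the metric only and says nothing about what happened in this particular dataset.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of all predictions that are correct."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_recall(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Fraction of each true class that was predicted correctly."""
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / n for c, n in support.items()}


def balanced_accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Mean of per-class recalls: every class counts equally, whatever its size."""
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


# Hypothetical 90/10 imbalanced test set and a classifier that always says "neg".
y_true = ["neg"] * 90 + ["pos"] * 10
y_pred = ["neg"] * 100
print(accuracy(y_true, y_pred))           # 0.9
print(per_class_recall(y_true, y_pred))   # {'neg': 1.0, 'pos': 0.0}
print(balanced_accuracy(y_true, y_pred))  # 0.5
```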


Re: SVM Implementation for mahout?

2013-12-07 Thread Lucas Fernandes Brunialti

Hello Fernando,

The naive Bayes approach assumes that your features are independent (given
the class). If your features are highly correlated, naive Bayes won't be a
good choice.
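For reference, the independence assumption is the standard naive Bayes factorization: for a document with features $x_1, \dots, x_n$ and class $y$, each class is scored as

```latex
P(y \mid x_1, \dots, x_n) \;\propto\; P(y) \prod_{i=1}^{n} P(x_i \mid y),
\qquad
\hat{y} = \arg\max_{y} \Big[ \log P(y) + \sum_{i=1}^{n} \log P(x_i \mid y) \Big]
```

The product is only exact when the features are conditionally independent given the class; strongly correlated features contribute what is essentially the same evidence several times, which can push the scores toward overconfident or wrong decisions.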

I would advise you to try neural networks (MLP); they can learn a better,
nonlinear decision surface than logistic regression.
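The point about decision surfaces comes down to linearity: logistic regression predicts by thresholding w·x + b, so its decision boundary is a single hyperplane. The XOR toy problem below (not from this thread) is the classic case no hyperplane can separate. The network weights are hand-picked rather than trained, threshold units stand in for the smooth activations a real MLP would be trained with, and the grid search over linear rules only illustrates, without proving, the known fact that no linear rule gets all four points right.

```python
import itertools
from typing import List, Sequence, Tuple


def step(z: float) -> int:
    """Threshold activation: 1 if z > 0, else 0."""
    return 1 if z > 0 else 0


def mlp_xor(x1: int, x2: int) -> int:
    """Tiny 2-2-1 network with hand-picked (not trained) weights."""
    h_or = step(x1 + x2 - 0.5)        # hidden unit 1 fires if at least one input is 1
    h_and = step(x1 + x2 - 1.5)       # hidden unit 2 fires only if both inputs are 1
    return step(h_or - h_and - 0.5)   # "OR but not AND" is exactly XOR


def best_linear_accuracy(points: Sequence[Tuple[int, int]], labels: Sequence[int], grid: List[float]) -> float:
    """Best accuracy of any linear rule step(w1*x1 + w2*x2 + b) with weights from grid.

    This is the kind of boundary logistic regression draws at its 0.5 threshold.
    """
    best = 0.0
    for w1, w2, b in itertools.product(grid, repeat=3):
        correct = sum(step(w1 * x1 + w2 * x2 + b) == y for (x1, x2), y in zip(points, labels))
        best = max(best, correct / len(points))
    return best


points = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 1, 1, 0]  # XOR
grid = [i / 2 for i in range(-6, 7)]  # -3.0 to 3.0 in steps of 0.5

print([mlp_xor(x1, x2) for x1, x2 in points])      # [0, 1, 1, 0]
print(best_linear_accuracy(points, labels, grid))  # 0.75
```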

Best.

Lucas.
