MapReduce Training phase
I would like to know more about how algorithms such as SVM are made parallel: is the training phase implemented with MapReduce (MR), the prediction phase, or both? Normally the training phase is a good fit for MR because it takes a long time. Do we need to run prediction with MR as well?

-- *Thanks Regards* Unmesha Sreeveni U.B *Junior Developer*
Re: SVM Implementation for mahout?
Thanks Manuel.

It seems that these two patches (https://issues.apache.org/jira/browse/MAHOUT-334 and https://issues.apache.org/jira/browse/MAHOUT-232) might work, although not in parallel. Has anyone successfully used either of these patches, and could you share some comments about it?

Thanks

--
Fernando Santos
+55 61 8129 8505

2013/12/6 Manuel Blechschmidt manuel.blechschm...@gmx.de

Hi Fernando,

there are some patches and some discussions:

SVM:
https://issues.apache.org/jira/browse/MAHOUT-334
https://issues.apache.org/jira/browse/MAHOUT-232
https://issues.apache.org/jira/browse/MAHOUT-14
https://issues.apache.org/jira/browse/MAHOUT-227

/Manuel

--
Manuel Blechschmidt
M.Sc. IT Systems Engineering
Dortustr. 57
14467 Potsdam
Mobil: 0173/6322621
Twitter: http://twitter.com/Manuel_B

On 06.12.2013, at 19:14, Fernando Santos wrote:

Hello,

Is there any tested SVM implementation for Mahout? Mahout in Action says there is a sequential implementation, but that it is still experimental. I couldn't find this implementation.

Thanks
Re: Test naivebayes task running really slowly and not in distributed mode
I figured out what the problem was. First of all, the data was not big enough for the job to be split into more than one task: the training file was 30MB and my block size was 64MB. In addition, I set the number of map tasks (mapred.map.tasks) and reduce tasks (mapred.reduce.tasks) in Hadoop's mapred-site.xml file. After that, the algorithm ran in an acceptable time.

--
Fernando Santos
+55 61 8129 8505

2013/12/2 Fernando Santos fernandoleandro1...@gmail.com

The train and test sets are each in a single file (part-r-0). The training file is 30MB and the testing file is 2MB.

2013/12/2 Fernando Santos fernandoleandro1...@gmail.com

Hello Ted,

No, the training also ran on one machine. What sometimes happens is that the boxes execute jobs one at a time rather than together. For example, if there are 3 jobs, the first runs on box 1, the next on box 2, and the next on box 1 again.

The full dataset is a CSV of around 70MB. I converted it into a sequence file, applied seq2sparse, then split it and trained. The training task was quite fast, taking a few minutes to execute. But the test is really slow, as I said, and it also runs on one machine.

2013/12/1 Ted Dunning ted.dunn...@gmail.com

Did the training run use both machines? How large is the input for the test run? Is it contained in a single file?

On Sat, Nov 30, 2013 at 11:22 AM, Fernando Santos fernandoleandro1...@gmail.com wrote:

Hello everyone,

I'm trying to do a text classification task. My dataset is not that big: I have around 700,000 small comments. Following the 20newsgroups example, I created the vectors from the text, split them, and trained the model. Now I'm trying to test it, but it is really slow and I cannot get it to run on the cluster. Whatever I do, it always runs on just one machine. And I think the testnb algorithm is supposed to run using MapReduce, right?

I also tried this example ( http://chimpler.wordpress.com/2013/06/24/using-the-mahout-naive-bayes-classifier-to-automatically-classify-twitter-messages-part-2-distribute-classification-with-hadoop/ ), but again the other box in the cluster does not execute any task. In fact, when I run testnb or use the MapReduceClassifier proposed in that tutorial, I get one job executing one task, and this task runs really slowly (about 6 minutes to reach 0.13% of the task).

I think I must be doing something wrong, so that the cluster is not working the way it is supposed to. I have a cluster of 2 boxes configured with Hadoop 0.20.205.0 and Mahout 0.8. I also tried Mahout versions 0.7 and 0.6, but nothing changed.

Any help would be appreciated.

The logs I have from this task:

*stdout logs*

Java HotSpot(TM) 64-Bit Server VM warning: You have loaded library /usr/local/hadoop/lib/libhadoop.so which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c libfile', or link it with '-z noexecstack'.

*syslog logs*

2013-11-30 17:09:19,191 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2013-11-30 17:09:19,400 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already exists!
2013-11-30 17:09:19,472 INFO org.apache.hadoop.util.ProcessTree: setsid exited with exit code 0
2013-11-30 17:09:19,474 INFO org.apache.hadoop.mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@5810d963
2013-11-30 17:09:19,543 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb = 100
2013-11-30 17:09:19,569 INFO org.apache.hadoop.mapred.MapTask: data buffer = 79691776/99614720
2013-11-30 17:09:19,569 INFO org.apache.hadoop.mapred.MapTask: record buffer = 262144/327680
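The resolution reported at the top of this thread (a 30MB input against a 64MB block size producing a single map task, and more tasks after mapred.map.tasks was set) can be illustrated with a rough sketch. It assumes the split-size rule of Hadoop's old-API FileInputFormat, max(minSize, min(goalSize, blockSize)), where goalSize is the total input size divided by the requested number of map tasks. Whether the Mahout job in question actually goes through that code path is an assumption, and the class and method names below are hypothetical. The estimate also ignores the small slack Hadoop allows on the last split.

```java
public class SplitEstimate {
    static final long MB = 1024L * 1024L;

    // Split-size rule assumed here (old-API FileInputFormat):
    // max(minSize, min(goalSize, blockSize)).
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    // Simplified estimate of the number of input splits (and so map tasks)
    // for a single file. requestedMaps stands in for the mapred.map.tasks hint.
    static long estimateSplits(long fileSize, long blockSize, int requestedMaps, long minSize) {
        long goalSize = fileSize / Math.max(1, requestedMaps);
        long splitSize = computeSplitSize(goalSize, minSize, blockSize);
        return (fileSize + splitSize - 1) / splitSize; // ceiling division
    }

    public static void main(String[] args) {
        long file = 30 * MB;   // training file size from the thread
        long block = 64 * MB;  // block size from the thread

        // With no hint (one requested map), the whole file is one split.
        System.out.println(estimateSplits(file, block, 1, 1)); // 1

        // Requesting 4 maps lowers the goal size to 7.5MB, giving 4 splits.
        System.out.println(estimateSplits(file, block, 4, 1)); // 4
    }
}
```

Under this model a file smaller than one block is never split unless the requested number of map tasks lowers the goal size, which is consistent with the behaviour described in the thread.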
Re: SVM Implementation for mahout?
Are there any specific reasons you are looking for an SVM implementation only? Are you sure that those patches are still relevant given the codebase today?

On Saturday, December 7, 2013 2:58 PM, Fernando Santos fernandoleandro1...@gmail.com wrote:

Thanks Manuel.

It seems that these two patches (https://issues.apache.org/jira/browse/MAHOUT-334 and https://issues.apache.org/jira/browse/MAHOUT-232) might work, although not in parallel. Has anyone successfully used either of these patches, and could you share some comments about it?

Thanks
Re: SVM Implementation for mahout?
Hello Suneel,

I want to check whether SVM gives better performance. I've been using naive Bayes, but my data is quite unbalanced, so I'm getting pretty bad results with it. I also tried complementary naive Bayes, but got the same bad results. I read about a difference in naive Bayes performance between the Weka and Mahout implementations, and maybe that's the cause ( http://mail-archives.apache.org/mod_mbox/mahout-user/201109.mbox/%3ccabdaxxijtfv9nhqxxpyd72rrsv-h60ps13h0pund2injx70...@mail.gmail.com%3E ). I also tried logistic regression and got around 77% accuracy. So maybe SVM could do better.

--
Fernando Santos
+55 61 8129 8505

2013/12/7 Suneel Marthi suneel_mar...@yahoo.com

Are there any specific reasons you are looking for an SVM implementation only? Are you sure that those patches are still relevant given the codebase today?
Re: SVM Implementation for mahout?
Hello Fernando,

The naive Bayes approach assumes that your features are independent. If your features are highly correlated, naive Bayes won't be a good choice. I would advise you to try neural networks (MLP), which can give a better decision surface than logistic regression.

Best,
Lucas

On Dec 7, 2013 6:53 PM, Fernando Santos fernandoleandro1...@gmail.com wrote:

Hello Suneel,

I want to check whether SVM gives better performance. I've been using naive Bayes, but my data is quite unbalanced, so I'm getting pretty bad results with it. I also tried complementary naive Bayes, but got the same bad results. I read about a difference in naive Bayes performance between the Weka and Mahout implementations, and maybe that's the cause ( http://mail-archives.apache.org/mod_mbox/mahout-user/201109.mbox/%3ccabdaxxijtfv9nhqxxpyd72rrsv-h60ps13h0pund2injx70...@mail.gmail.com%3E ). I also tried logistic regression and got around 77% accuracy. So maybe SVM could do better.
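Lucas's point about the independence assumption can be made concrete with a small worked example. The sketch below is a toy two-class calculation, not Mahout's implementation; the class name, the prior of 0.5, and the likelihood ratio of 3 are made-up illustrative values. It shows that when the same evidence appears twice (two perfectly correlated features), naive Bayes multiplies its likelihood ratio in twice and becomes overconfident.

```java
public class NaiveBayesDoubleCount {
    // Posterior P(class | features) for a two-class naive Bayes model,
    // given the class prior and one likelihood ratio per feature:
    // ratio = P(feature | class) / P(feature | other class).
    static double posterior(double prior, double... likelihoodRatios) {
        double odds = prior / (1.0 - prior);
        for (double ratio : likelihoodRatios) {
            // Independence assumption: every feature's ratio is multiplied in,
            // whether or not the features carry the same information.
            odds *= ratio;
        }
        return odds / (1.0 + odds);
    }

    public static void main(String[] args) {
        // One informative feature with likelihood ratio 3.
        System.out.println(posterior(0.5, 3.0));      // 0.75

        // The same feature duplicated: no new information, but the
        // posterior rises because the evidence is counted twice.
        System.out.println(posterior(0.5, 3.0, 3.0)); // 0.9
    }
}
```

With correlated text features, for example words that almost always occur together, this double counting pushes the scores toward the extremes, which is one reason a discriminative model may behave better on such data.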