Hi Eduardo,

It depends very much on the dataset and the cluster setup. In my setup, covType takes roughly the same time in local and cluster mode (up to a parallelism of 8).
There are some inefficiencies with serialization that we are aware of, but they should not slow performance down by an order of magnitude. Have you validated your cluster setup?

Cheers,

-- Gianmarco

On Mon, Mar 28, 2016 at 3:05 PM, Eduardo Costa <[email protected]> wrote:
> Hi Gianmarco,
>
> Yes, it helped me!
> I put Storm, Hadoop and SAMOA in the same cluster, and it worked! However,
> the execution seems too slow to me.
> With the same task and the covtypeNorm.arff dataset, SAMOA in local mode
> takes 18 seconds, whereas in cluster mode it takes several minutes. Is this normal?
>
> Best regards,
> Eduardo.
>
> 2016-03-13 4:48 GMT-03:00 Gianmarco De Francisci Morales <[email protected]>:
>
> > Hi Eduardo,
> >
> > As long as you can access the HDFS cluster from the machines composing the
> > Storm cluster, there should be no problem.
> > However, you need to figure out how to set the environment variables to
> > point to the right installation of Hadoop (set the HADOOP_HOME variable).
> > You just need to set your configuration files (e.g., hdfs-site.xml) to
> > point to the correct Hadoop cluster.
> >
> > Hope it helps,
> >
> > -- Gianmarco
> >
> > On Sat, Mar 12, 2016 at 4:21 PM, Eduardo Costa <[email protected]> wrote:
> >
> > > Hi, Gianmarco!
> > > Thank you so much for your response!
> > > Now I have another question: I run SAMOA (in cluster mode) on a different
> > > machine (cluster) from the Hadoop cluster, because I run SAMOA on top of a
> > > Storm cluster. Is there some way to read ARFF files from this remote Hadoop
> > > cluster when running SAMOA on top of the Storm cluster?
> > > Sorry for bothering you so much, but I need this to continue my master's
> > > thesis in Brazil at the Federal University of the State of Rio de Janeiro
> > > (UNIRIO). As previously mentioned, I'm trying to build a rudimentary
> > > anomaly detection system using SAMOA, but I am a layman when it comes to
> > > SAMOA.
> > >
> > > Best regards,
> > > Eduardo.
> > >
> > > 2016-03-06 8:59 GMT-03:00 Gianmarco De Francisci Morales <[email protected]>:
> > >
> > > > Hi Eduardo,
> > > > Yes, it is possible to read ARFF files from HDFS.
> > > > However, right now it is way more complicated than it should be, and it's
> > > > not documented at all.
> > > > Thanks for asking the question.
> > > >
> > > > I managed to do it with this command line:
> > > >
> > > > ./bin/samoa local target/SAMOA-Local-0.4.0-incubating-SNAPSHOT.jar \
> > > >   "PrequentialEvaluation -s (org.apache.samoa.streams.ArffFileStream -s HDFSFileStreamSource -f /user/$USER/covtypeNorm.arff)"
> > > >
> > > > But I had to make a small modification to HDFSFileStreamSource to make it
> > > > work, by adding this line after line 61:
> > > >
> > > > config.set("fs.hdfs.impl",
> > > >     org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
> > > >
> > > > Things to notice:
> > > > - We rely on HADOOP_HOME being set to your Hadoop installation. This should
> > > >   be made more robust.
> > > > - I explicitly used org.apache.samoa.streams.ArffFileStream, as the normal
> > > >   ArffFileStream does not support HDFS (this is related to SAMOA-14
> > > >   <https://issues.apache.org/jira/browse/SAMOA-14>, and I plan to fix it
> > > >   ASAP).
> > > > - I will add the snippet of code above in the same patch for SAMOA-14.
> > > >
> > > > Hope it helps,
> > > >
> > > > -- Gianmarco
> > > >
> > > > On Fri, Feb 12, 2016 at 6:45 PM, Eduardo Costa <[email protected]> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > Could I pass ARFF files from Hadoop HDFS to SAMOA through the "-s" argument?
> > > > > If so, how do I do it?
> > > > >
> > > > > Best regards,
> > > > > Eduardo.
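One point in the thread is that reading from HDFS relies on HADOOP_HOME being set, and that this "should be made more robust". A minimal sketch of a more defensive lookup follows. It is NOT SAMOA's code: the class HadoopConfLocator and its methods are hypothetical. It assumes the Hadoop 2.x directory layout, where the client configuration lives in $HADOOP_HOME/etc/hadoop, and lets the standard HADOOP_CONF_DIR variable override that location. It resolves the two files that normally point a client at the right cluster: core-site.xml, which holds fs.defaultFS, and hdfs-site.xml. If neither variable is set, it fails with a clear message instead of failing later inside the stream source.

```java
import java.io.File;
import java.util.Map;

/**
 * Hypothetical helper (not part of SAMOA): locates the Hadoop client
 * configuration that an HDFS-backed stream source would load.
 * Assumes the Hadoop 2.x layout ($HADOOP_HOME/etc/hadoop), with
 * HADOOP_CONF_DIR taking precedence when it is set.
 */
public class HadoopConfLocator {

    /** Returns the configuration directory, or throws if no variable is set. */
    public static File resolveConfDir(Map<String, String> env) {
        String confDir = env.get("HADOOP_CONF_DIR");
        if (confDir != null && !confDir.isEmpty()) {
            return new File(confDir); // explicit override wins
        }
        String home = env.get("HADOOP_HOME");
        if (home == null || home.isEmpty()) {
            throw new IllegalStateException(
                "Neither HADOOP_CONF_DIR nor HADOOP_HOME is set; cannot locate the Hadoop configuration");
        }
        // Assumed Hadoop 2.x layout: <HADOOP_HOME>/etc/hadoop
        return new File(new File(home, "etc"), "hadoop");
    }

    /** The two files that normally point a client at the correct cluster. */
    public static File[] configFiles(Map<String, String> env) {
        File dir = resolveConfDir(env);
        return new File[] {new File(dir, "core-site.xml"), new File(dir, "hdfs-site.xml")};
    }

    public static void main(String[] args) {
        try {
            // Report whether each expected file exists on this machine.
            for (File f : configFiles(System.getenv())) {
                System.out.println(f.getPath() + (f.isFile() ? " (found)" : " (missing)"));
            }
        } catch (IllegalStateException e) {
            System.err.println(e.getMessage());
        }
    }
}
```

Because the source reads the file from the machines of the Storm cluster, a check like this would presumably need to pass on every worker machine, not only on the one that submits the job.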
