Hi Eduardo,
Yes, it is possible to read ARFF files from HDFS.
However, right now it is way more complicated than it should be, and it's
not documented at all.
Thanks for asking the question.
I managed to do it with this command line:
    ./bin/samoa local target/SAMOA-Local-0.4.0-incubating-SNAPSHOT.jar \
      "PrequentialEvaluation -s (org.apache.samoa.streams.ArffFileStream -s HDFSFileStreamSource -f /user/$USER/covtypeNorm.arff)"
However, I had to make a small modification to HDFSFileStreamSource to get it
to work: I added this line after line 61:
    config.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
Things to note:
- We rely on HADOOP_HOME being set to your Hadoop installation. This should
be made more robust.
- I explicitly used org.apache.samoa.streams.ArffFileStream, as the normal
ArffFileStream does not support HDFS (this is related to SAMOA-14
<https://issues.apache.org/jira/browse/SAMOA-14>, and I plan to fix it
ASAP).
- I will add the snippet of code above to the same patch for SAMOA-14.
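On the first point, here is a minimal sketch of what a more robust lookup could look like. It is not the current SAMOA code: the class and method names (HadoopConfLocator, findCoreSite) are hypothetical, and it assumes that the configuration lives either in the directory named by HADOOP_CONF_DIR or under etc/hadoop inside HADOOP_HOME, which may not match every installation. It uses only the JDK, so the resulting path would still need to be handed to the Hadoop Configuration.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper, not part of SAMOA: locates core-site.xml
// from environment variables instead of assuming HADOOP_HOME is set.
public class HadoopConfLocator {

  // Returns the first existing core-site.xml, checking HADOOP_CONF_DIR
  // first and then HADOOP_HOME/etc/hadoop (assumed layout).
  public static Optional<Path> findCoreSite(Map<String, String> env) {
    String confDir = env.get("HADOOP_CONF_DIR");
    if (confDir != null && !confDir.isEmpty()) {
      Path candidate = Paths.get(confDir, "core-site.xml");
      if (Files.isRegularFile(candidate)) {
        return Optional.of(candidate);
      }
    }
    String home = env.get("HADOOP_HOME");
    if (home != null && !home.isEmpty()) {
      Path candidate = Paths.get(home, "etc", "hadoop", "core-site.xml");
      if (Files.isRegularFile(candidate)) {
        return Optional.of(candidate);
      }
    }
    // Neither variable points at a usable configuration file.
    return Optional.empty();
  }

  public static void main(String[] args) {
    Optional<Path> coreSite = findCoreSite(System.getenv());
    if (coreSite.isPresent()) {
      System.out.println("Using " + coreSite.get());
    } else {
      System.err.println("No core-site.xml found; set HADOOP_HOME or HADOOP_CONF_DIR");
    }
  }
}
```

Taking the environment as a Map keeps the lookup easy to test, and returning an empty Optional lets the caller fail with a clear message rather than silently falling back to the local file system.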
Hope it helps,
-- Gianmarco
On Fri, Feb 12, 2016 at 6:45 PM, Eduardo Costa <[email protected]>
wrote:
> Hi,
>
> Can I pass ARFF files from Hadoop HDFS to SAMOA via the "-s" argument? If
> so, how do I do it?
>
> Best regards,
> Eduardo.
>