Hi yaphet, it seems the code you pasted comes from LibSVM rather than SVM. Do I misunderstand?
For LibSVMDataSource:

1. If numFeatures is unspecified, only one input file is valid:

   val df = spark.read.format("libsvm")
     .load("data/mllib/sample_libsvm_data.txt")

2. Otherwise, multiple files are OK:

   val df = spark.read.format("libsvm")
     .option("numFeatures", "780")
     .load("data/mllib/sample_libsvm_data.txt")

For more, see:
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.ml.source.libsvm.LibSVMDataSource

On Mon, Jun 12, 2017 at 11:46 AM, darion.yaphet <fly...@163.com> wrote:
> Hi team:
>
> Currently, when we use SVM to train on a dataset, we found that the input
> is limited to a single file.
>
> The source code is as follows:
>
> val path = if (dataFiles.length == 1) {
>   dataFiles.head.getPath.toUri.toString
> } else if (dataFiles.isEmpty) {
>   throw new IOException("No input path specified for libsvm data")
> } else {
>   throw new IOException("Multiple input paths are not supported for libsvm data.")
> }
>
> Files stored on a distributed file system such as HDFS are split into
> multiple pieces, so I think this limit is not necessary. I'm not sure whether
> this is a bug or whether I'm using it incorrectly.
>
> thanks a lot ~~~
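To make the branching in the quoted check easy to follow, here is a minimal standalone sketch of the same logic in plain Scala, with no Spark dependency. The function name `selectSinglePath` and the `Seq[String]` signature are hypothetical simplifications (the real code operates on Hadoop `FileStatus` objects); only the control flow mirrors the quoted snippet.

```scala
import java.io.IOException

// Hypothetical standalone mirror of the quoted path-selection logic:
// when numFeatures is not given, exactly one input file is accepted,
// so the header can be inferred by scanning that single file.
def selectSinglePath(dataFiles: Seq[String]): String = {
  if (dataFiles.length == 1) {
    dataFiles.head
  } else if (dataFiles.isEmpty) {
    throw new IOException("No input path specified for libsvm data")
  } else {
    throw new IOException("Multiple input paths are not supported for libsvm data.")
  }
}
```

This is why specifying numFeatures matters: it removes the need to scan a single file to infer the feature dimension, so the single-path restriction no longer applies.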