Hi yaphet, it seems the code you pasted comes from LibSVM rather than SVM. Do I misunderstand?
For LibSVMDataSource:

1. If numFeatures is unspecified, only one input file is valid:

   val df = spark.read.format("libsvm")
     .load("data/mllib/sample_libsvm_data.txt")

2. Otherwise, multiple files are OK:

   val df = spark.read.format("libsvm")
     .option("numFeatures", "780")
     .load("data/mllib/sample_libsvm_data.txt")

For more, see:
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.ml.source.libsvm.LibSVMDataSource

On Mon, Jun 12, 2017 at 11:46 AM, darion.yaphet <fly...@163.com> wrote:
> Hi team:
>
> Currently, when we use SVM to train on a dataset, we found that the input
> is limited to a single file.
>
> The source code is as follows:
>
> val path = if (dataFiles.length == 1) {
>   dataFiles.head.getPath.toUri.toString
> } else if (dataFiles.isEmpty) {
>   throw new IOException("No input path specified for libsvm data")
> } else {
>   throw new IOException("Multiple input paths are not supported for libsvm data.")
> }
>
> Files stored on a distributed file system such as HDFS are split into
> multiple pieces, so I think this limit is not necessary. I'm not sure whether
> this is a bug or whether I'm using it incorrectly.
>
> thanks a lot ~~~
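To make the branching in the quoted check easy to follow, here is a minimal standalone sketch of the same logic in plain Scala, with no Spark dependency. The function name `selectSinglePath` and the `Seq[String]` signature are hypothetical simplifications (the real code operates on Hadoop `FileStatus` objects); only the control flow mirrors the quoted snippet.

```scala
import java.io.IOException

// Hypothetical standalone mirror of the quoted path-selection logic:
// when numFeatures is not given, exactly one input file is accepted,
// so the header can be inferred by scanning that single file.
def selectSinglePath(dataFiles: Seq[String]): String = {
  if (dataFiles.length == 1) {
    dataFiles.head
  } else if (dataFiles.isEmpty) {
    throw new IOException("No input path specified for libsvm data")
  } else {
    throw new IOException("Multiple input paths are not supported for libsvm data.")
  }
}
```

This is why specifying numFeatures matters: it removes the need to scan a single file to infer the feature dimension, so the single-path restriction no longer applies.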