[ https://issues.apache.org/jira/browse/SPARK-21066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16055328#comment-16055328 ]
Yan Facai (颜发才) edited comment on SPARK-21066 at 6/20/17 8:12 AM:
------------------------------------------------------------------

Hi, [~darion]. If `numFeatures` is specified, multiple files are OK.
{code}
val df = spark.read.format("libsvm")
  .option("numFeatures", "780")
  .load("data/mllib/sample_libsvm_data.txt")
{code}
see: http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.ml.source.libsvm.LibSVMDataSource

was (Author: facai):
Hi, [~darion]. If `numFeature` is specified, multiple files are OK.
{code}
val df = spark.read.format("libsvm")
  .option("numFeatures", "780")
  .load("data/mllib/sample_libsvm_data.txt")
{code}
see: http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.ml.source.libsvm.LibSVMDataSource

> LibSVM load just one input file
> -------------------------------
>
>                 Key: SPARK-21066
>                 URL: https://issues.apache.org/jira/browse/SPARK-21066
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>    Affects Versions: 2.1.1
>            Reporter: darion yaphet
>
> Currently, when we use SVM to train a dataset, we found that only one
> input file is accepted.
> A file stored on a distributed file system such as HDFS is split into
> multiple pieces, and I think this limit is unnecessary.
> We could join the input paths into a single string separated by commas.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
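
For completeness, the multiple-file case the comment refers to can be sketched as follows (the part-file paths below are hypothetical; this assumes an active SparkSession named `spark` and that `numFeatures` matches the data). `DataFrameReader.load` accepts multiple path arguments, so the split pieces on HDFS can be read together once `numFeatures` is fixed up front:

{code}
// Sketch only: hypothetical HDFS part files from a single split dataset.
// Specifying numFeatures avoids a scan to infer the dimension, which is
// what makes reading several files at once work.
val df = spark.read.format("libsvm")
  .option("numFeatures", "780")
  .load("hdfs:///data/part-00000", "hdfs:///data/part-00001")
{code}

Equivalently, the paths could be joined into one comma-separated string, as the reporter suggests, e.g. `Seq("hdfs:///data/part-00000", "hdfs:///data/part-00001").mkString(",")`.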