Here's a decent GitBook: Mastering Apache Spark <https://www.gitbook.com/book/jaceklaskowski/mastering-apache-spark/details>.
I'm new at Scala too. I found it very helpful to study the Scala language without Spark. The documentation at <http://docs.scala-lang.org/index.html> is excellent.

Pete

On Wed, Sep 7, 2016 at 1:39 AM, 颜发才(Yan Facai) <yaf...@gmail.com> wrote:

> Hi Peter,
> I'm familiar with Pandas / NumPy in Python, while Spark / Scala is
> totally new to me.
> Pandas provides detailed documentation, such as how to slice data, parse files,
> and use the apply and filter functions.
>
> Does Spark have more detailed documentation?
>
> On Tue, Sep 6, 2016 at 9:58 PM, Peter Figliozzi <pete.figlio...@gmail.com>
> wrote:
>
>> Hi Yan, I think you'll have to map the features column to a new numerical
>> features column.
>>
>> Here's one way to do the individual transform:
>>
>> scala> val x = "[1, 2, 3, 4, 5]"
>> x: String = [1, 2, 3, 4, 5]
>>
>> scala> val y:Array[Int] = x slice(1, x.length - 1) replace(",", "") split(" ") map(_.toInt)
>> y: Array[Int] = Array(1, 2, 3, 4, 5)
>>
>> If you don't know about the Scala command line, just type "scala" in a
>> terminal window. It's a good place to try things out.
>>
>> You can make a function out of this transformation and apply it to your
>> features column to make a new column. Then add this with
>> Dataset.withColumn.
>>
>> See here
>> <http://stackoverflow.com/questions/35227568/applying-function-to-spark-dataframe-column>
>> on how to apply a function to a Column to make a new column.
>>
>> On Tue, Sep 6, 2016 at 1:56 AM, 颜发才(Yan Facai) <yaf...@gmail.com> wrote:
>>
>>> Hi,
>>> I have a csv file like:
>>> uid mid features label
>>> 123 5231 [0, 1, 3, ...] True
>>>
>>> Both "features" and "label" columns are used for GBTClassifier.
>>>
>>> However, when I read the file:
>>> Dataset<Row> samples = sparkSession.read().csv(file);
>>> The type of samples.select("features") is String.
>>>
>>> My question is:
>>> How can I map samples.select("features") to a Vector or another appropriate type,
>>> so that I can use it for training, like this:
>>> GBTClassifier gbdt = new GBTClassifier()
>>>     .setLabelCol("label")
>>>     .setFeaturesCol("features")
>>>     .setMaxIter(2)
>>>     .setMaxDepth(7);
>>>
>>> Thanks.
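Since the question's code is in Java, here is a minimal Java sketch of the per-row transform described in the thread. It parses a string such as "[0, 1, 3]" into a double[] rather than the Int array from the Scala REPL example, because Spark ML feature vectors hold doubles. The class FeatureParser and method parseFeatures are hypothetical names. The sketch assumes each cell holds plain bracketed, comma-separated numbers. Unlike the slice/replace/split one-liner, it also tolerates missing spaces after commas and an empty list.

```java
import java.util.Arrays;

public class FeatureParser {

    // Hypothetical helper: turns a string such as "[0, 1, 3]" into a double[].
    // Doubles are used because Spark ML feature vectors store doubles.
    static double[] parseFeatures(String raw) {
        String s = raw.trim();
        // Drop the surrounding brackets if they are present.
        if (s.startsWith("[")) {
            s = s.substring(1);
        }
        if (s.endsWith("]")) {
            s = s.substring(0, s.length() - 1);
        }
        s = s.trim();
        // An empty list such as "[]" becomes an empty array.
        if (s.isEmpty()) {
            return new double[0];
        }
        // Split on commas with any surrounding whitespace, so "1,2" and "1, 2" both work.
        String[] parts = s.split("\\s*,\\s*");
        double[] values = new double[parts.length];
        for (int i = 0; i < parts.length; i++) {
            values[i] = Double.parseDouble(parts[i]);
        }
        return values;
    }

    public static void main(String[] args) {
        double[] v = parseFeatures("[0, 1, 3]");
        System.out.println(Arrays.toString(v)); // prints [0.0, 1.0, 3.0]
    }
}
```

To use this inside Spark, one option, assuming the Spark 2.x Java API, would be to register a UDF1<String, Vector> through sparkSession.udf().register(...). That UDF would return Vectors.dense(parseFeatures(s)) and declare SQLDataTypes.VectorType() as its return type (both from org.apache.spark.ml.linalg). You would then add the result as a new column with Dataset.withColumn and point setFeaturesCol at that column. GBTClassifier also expects a numeric label column, so a string label such as True would likely need converting to 1.0/0.0 as well.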