In Spark-Scala, how to copy Array of Lists into new DataFrame?

2016-09-25 Thread Dan Bikle
Hello World, I am familiar with Python and I am learning Spark-Scala. I want to build a DataFrame whose structure is described by this syntax:

    // Prepare training data from a list of (label, features) tuples.
    val training = spark.createDataFrame(Seq(
      (1.1, Vectors.dense(1.1, 0.1)),
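A minimal sketch of what this question is after, assuming a `spark-shell` session where a `SparkSession` named `spark` is already in scope (the extra rows and column names are illustrative, not from the thread):

```scala
// Build a (label, features) DataFrame from a Seq of tuples.
// Requires Spark 2.x; in spark-shell the imports below are all you need.
import org.apache.spark.ml.linalg.Vectors

val training = spark.createDataFrame(Seq(
  (1.1, Vectors.dense(1.1, 0.1)),
  (0.0, Vectors.dense(2.0, 1.0)),
  (1.0, Vectors.dense(0.0, 1.3))
)).toDF("label", "features")   // rename the default _1/_2 columns

training.show()
```

`createDataFrame` accepts a `Seq` of case classes or tuples directly; `toDF` just supplies friendlier column names.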

How to use Spark-Scala to download a CSV file from the web?

2016-09-25 Thread Dan Bikle
hello spark-world, How to use Spark-Scala to download a CSV file from the web and load the file into a spark-csv DataFrame? Currently I depend on curl in a shell command to get my CSV file. Here is the syntax I want to enhance:

    /* fb_csv.scala
    This script should load FB
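One way to drop the `curl` dependency is to fetch the file with `scala.io.Source` and hand the local copy to Spark's CSV reader. A sketch, assuming Spark 2.x (where CSV support is built in, replacing the separate spark-csv package) and with the URL and paths as placeholders:

```scala
// Fetch a CSV over HTTP, stage it locally, then read it into a DataFrame.
import java.io.PrintWriter
import scala.io.Source

val csvText = Source.fromURL("https://example.com/fb.csv").mkString
new PrintWriter("/tmp/fb.csv") { write(csvText); close() }

val df = spark.read
  .option("header", "true")      // first line holds column names
  .option("inferSchema", "true") // guess column types from the data
  .csv("/tmp/fb.csv")
```

Note the staging step: Spark reads from a filesystem path (local, HDFS, S3, ...), not from an arbitrary HTTP URL, so something has to land the bytes on disk first.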

With spark DataFrame, how to write to existing folder?

2016-09-23 Thread Dan Bikle
spark-world, I am walking through the example here: https://github.com/databricks/spark-csv#scala-api The example complains if I try to write a DataFrame to an existing folder:

    val selectedData = df.select("year", "model")
    selectedData.write
      .format("com.databricks.spark.csv")
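By default `DataFrameWriter` refuses to write into a path that already exists. The usual fix is to set a save mode; a sketch extending the README snippet (the output path is illustrative):

```scala
// SaveMode.Overwrite replaces the existing folder's contents;
// "append" and "ignore" are the other common choices.
val selectedData = df.select("year", "model")
selectedData.write
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .mode("overwrite")
  .save("newcars.csv")
```

`.mode("overwrite")` is equivalent to `.mode(SaveMode.Overwrite)`; the default, `ErrorIfExists`, is what produces the complaint.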

databricks spark-csv: linking coordinates are what?

2016-09-23 Thread Dan Bikle
hello world-of-spark, I am learning spark today. I want to understand the spark code in this repo: https://github.com/databricks/spark-csv In the README.md I see this info:

    Linking
    You can link against this library in your program at the following coordinates:
    Scala 2.10
    groupId:
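The "linking coordinates" are Maven coordinates: a groupId, artifactId, and version that build tools use to resolve the library. A sketch of how they are consumed in an sbt build (the version number is illustrative; check the README for the current one):

```scala
// build.sbt fragment: the artifactId carries the Scala version suffix,
// so pick _2.10 or _2.11 to match your Scala build.
libraryDependencies += "com.databricks" % "spark-csv_2.10" % "1.5.0"
```

For quick experiments the same coordinates work on the command line, e.g. `spark-shell --packages com.databricks:spark-csv_2.10:1.5.0`, which downloads the jar and puts it on the classpath for the session.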

Optimal/Expected way to run demo spark-scala scripts?

2016-09-23 Thread Dan Bikle
hello spark-world, I am new to spark and want to learn how to use it. I come from the Python world. I see an example at the url below: http://spark.apache.org/docs/latest/ml-pipeline.html#example-estimator-transformer-and-param What would be an optimal way to run the above example? In the
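For running doc examples like that one, two workflows are common; a sketch, assuming Spark is installed and the example code has been pasted into a local file (the filenames and class name below are placeholders):

```shell
# 1. Interactive: spark-shell evaluates the script line by line,
#    with `spark` and `sc` already defined.
spark-shell -i estimator_example.scala

# 2. Batch: wrap the code in an object with a main method, build a jar
#    with sbt, and submit it to a cluster (or local master).
spark-submit --class EstimatorExample --master "local[2]" \
  target/scala-2.11/estimator-example_2.11-0.1.jar
```

Coming from Python, `spark-shell` plays the role of the interactive interpreter and is the lower-friction way to walk through the pipeline example.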

In Spark-scala, how to fill Vectors.dense in DataFrame from CSV?

2016-09-22 Thread Dan Bikle
hello spark-world, I am new to spark. I noticed this online example: http://spark.apache.org/docs/latest/ml-pipeline.html I am curious about this syntax:

    // Prepare training data from a list of (label, features) tuples.
    val training = spark.createDataFrame(Seq(
      (1.0,
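To get that `(label, features)` shape from a CSV rather than a hand-written `Seq`, the idiomatic route is `VectorAssembler`, which packs numeric columns into a single vector column. A sketch, assuming Spark 2.x and a CSV with columns named `label`, `f1`, `f2` (all names and the path are assumptions about your file):

```scala
// Read the CSV, then assemble feature columns into a Vectors.dense column.
import org.apache.spark.ml.feature.VectorAssembler

val raw = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("training.csv")

val assembler = new VectorAssembler()
  .setInputCols(Array("f1", "f2"))  // columns to pack into the vector
  .setOutputCol("features")

val training = assembler.transform(raw).select("label", "features")
```

The result has the same schema as the doc example's `createDataFrame(Seq(...))` output, so it can be fed straight into the estimator from the ml-pipeline page.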