If you have used spark-sas7bdat package to transform SAS data set to Spark, please be aware

2016-10-27 Thread Shi Yu
I found some major issues and wrote them up on my blog: https://eilianyu.wordpress.com/2016/10/27/be-aware-of-hidden-data-errors-using-spark-sas7bdat-pacakge-to-ingest-sas-datasets-to-spark/

Spark Beginner Question

2016-07-26 Thread Shi Yu
Hello, *Question 1:* I am new to Spark. I am trying to train a classification model on a Spark DataFrame, using PySpark. I created a Spark DataFrame object in df: from pyspark.sql.types import * query = """select * from table""" df = sqlContext.sql(query) My question is

Re: obtain cluster assignment in K-means

2015-02-12 Thread Shi Yu
, 2nd is RDD[Vector] Robin On 12 Feb 2015, at 06:37, Shi Yu shiyu@gmail.com wrote: Hi there, I am new to Spark. When training a K-means model with the following code, how do I obtain the cluster assignment in the next step? val clusters = KMeans.train(parsedData, numClusters

obtain cluster assignment in K-means

2015-02-11 Thread Shi Yu
Hi there, I am new to Spark. When training a K-means model with the following code, how do I obtain the cluster assignment in the next step? val clusters = KMeans.train(parsedData, numClusters, numIterations) I searched through many examples, but they mostly calculate the WSSSE. I am
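
For readers of this thread, here is a minimal sketch of what "cluster assignment" means once the centers are known: each point gets the index of its nearest center. This is plain Python rather than Spark code, and the function name `assign_clusters` and the sample data are hypothetical, chosen only for illustration. In MLlib itself, `KMeansModel.predict` plays this role and accepts either a single `Vector` or an `RDD[Vector]`.

```python
def assign_clusters(points, centers):
    """Return, for each point, the index of its nearest center.

    Uses squared Euclidean distance. Illustrative helper only; it is not
    part of Spark or MLlib.
    """
    def sq_dist(p, c):
        return sum((a - b) ** 2 for a, b in zip(p, c))

    return [
        min(range(len(centers)), key=lambda k: sq_dist(p, centers[k]))
        for p in points
    ]


# Hypothetical data: two centers, as a trained model might hold them,
# and three points to assign.
centers = [(0.0, 0.0), (10.0, 10.0)]
points = [(1.0, 1.0), (9.0, 8.0), (0.5, -0.2)]

assignments = assign_clusters(points, centers)
print(assignments)
```

Pairing each point with its index, for example with `zip(points, assignments)`, gives the per-point assignment that the WSSSE examples do not show.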