Running R code in SparkR

2016-05-31 Thread Arunkumar Pillai
Hi, I have a basic question about SparkR: can we run R code in Spark using SparkR, or only Spark functionality that is executed in Spark through R? -- Thanks and Regards Arun

Calculation of histogram bins and frequency in Apache spark 1.6

2016-02-23 Thread Arunkumar Pillai
Hi, is there any predefined method to calculate histogram bins and frequencies in Spark? Currently I take the range, derive the bins, and then count the frequency in each bin using a SQL query. Is there a better way?
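The manual approach described in the question (take the range, derive equal-width bins, count per bin) can be sketched in plain Scala without any Spark dependency. The object and method names below are hypothetical and only illustrate the arithmetic. Spark's `RDD[Double]` also offers a `histogram(bucketCount)` method that returns the bucket boundaries and counts, which may avoid the SQL round trip.

```scala
// Hypothetical helper, plain Scala: equal-width histogram over [min, max].
object HistogramSketch {
  /** Returns (bin edges, counts) for `bins` equal-width bins. */
  def histogram(xs: Seq[Double], bins: Int): (Vector[Double], Vector[Long]) = {
    require(xs.nonEmpty && bins > 0, "need data and at least one bin")
    val lo = xs.min
    val hi = xs.max
    val width = (hi - lo) / bins
    // bins + 1 edges delimit `bins` intervals.
    val edges = Vector.tabulate(bins + 1)(i => lo + i * width)
    val counts = Array.fill(bins)(0L)
    for (x <- xs) {
      // The last bin is closed on the right so that the maximum is counted.
      val idx =
        if (width == 0.0) 0
        else math.min(((x - lo) / width).toInt, bins - 1)
      counts(idx) += 1
    }
    (edges, counts.toVector)
  }
}
```

Every bin except the last is half-open on the right; that is a design choice of this sketch, not something the thread specifies.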

Percentile calculation in spark 1.6

2016-02-23 Thread Arunkumar Pillai
How do I calculate a percentile in Spark 1.6? -- Thanks and Regards Arun
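For reference, one common percentile definition (linear interpolation between the two closest ranks) looks like this in plain Scala. This is only a sketch of the arithmetic on a local collection, with hypothetical names; it is not a Spark API, and other tools may use a different interpolation rule.

```scala
// Hypothetical helper, plain Scala: percentile by linear interpolation.
object PercentileSketch {
  /** Percentile p in [0, 100], interpolating between the closest ranks. */
  def percentile(xs: Seq[Double], p: Double): Double = {
    require(xs.nonEmpty, "need at least one value")
    require(p >= 0.0 && p <= 100.0, "p must be in [0, 100]")
    val sorted = xs.sorted.toVector
    // Fractional zero-based rank of the requested percentile.
    val rank = p / 100.0 * (sorted.length - 1)
    val lower = math.floor(rank).toInt
    val upper = math.ceil(rank).toInt
    val fraction = rank - lower
    sorted(lower) + fraction * (sorted(upper) - sorted(lower))
  }
}
```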

Re: spark 1.6 Not able to start spark

2016-02-23 Thread Arunkumar Pillai
nk this may be some permission issue. Check your spark conf for
> hadoop related.
>
> --
> fightf...@163.com
>
> *From:* Arunkumar Pillai <arunkumar1...@gmail.com>
> *Date:* 2016-02-23 14:08
> *To:* user <user@spark.apache.org>

spark 1.6 Not able to start spark

2016-02-22 Thread Arunkumar Pillai
Hi, when I try to start spark-shell I get the following error:
Exception in thread "main" java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
    at

Logistic Regression using ML Pipeline

2016-02-18 Thread Arunkumar Pillai
Hi, I'm trying to build a logistic regression using an ML Pipeline:
  val lr = new LogisticRegression()
  lr.setFitIntercept(true)
  lr.setMaxIter(100)
  val model = lr.fit(data)
  println(model.summary)
I get the coefficients, but I'm not able to get the predicted and probability values.

Extract all the values from describe

2016-02-08 Thread Arunkumar Pillai
Hi, I have a DataFrame df and I use df.describe() to get the summary statistics, but I'm not able to parse the result and extract each individual value. Please help. -- Thanks and Regards Arun

[Spark 1.5.1] percentile in spark

2016-02-08 Thread Arunkumar Pillai
Hi, I'm using a SQL query to find the percentile value. Are there any predefined functions for percentile calculation? -- Thanks and Regards Arun

[Spark 1.6] Mismatch in kurtosis values

2016-02-04 Thread Arunkumar Pillai
Hi, I have observed that the kurtosis values coming from Apache Spark differ by 3. Excel and R give the same value (11.333), but the kurtosis value coming from Spark 1.6 is lower by 3 (8.333). Please let me know if I'm doing something wrong. I'm executing via
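A gap of exactly 3 is what separates the two common kurtosis conventions: Pearson kurtosis m4 / m2^2 and excess kurtosis m4 / m2^2 - 3 (so a normal distribution scores 0). Spark's lower value is consistent with it reporting excess kurtosis, though the thread does not confirm which convention each tool uses. A minimal plain-Scala sketch of both, with hypothetical names and the population (divide-by-n) moments assumed:

```scala
// Hypothetical helper, plain Scala: the two kurtosis conventions side by side.
object KurtosisSketch {
  /** k-th central moment using the population (divide-by-n) convention. */
  private def centralMoment(xs: Seq[Double], k: Int): Double = {
    val mean = xs.sum / xs.length
    xs.map(x => math.pow(x - mean, k)).sum / xs.length
  }

  /** Pearson kurtosis: equals 3 for a normal distribution. */
  def pearsonKurtosis(xs: Seq[Double]): Double = {
    val m2 = centralMoment(xs, 2)
    centralMoment(xs, 4) / (m2 * m2)
  }

  /** Excess kurtosis: Pearson kurtosis shifted so a normal distribution gives 0. */
  def excessKurtosis(xs: Seq[Double]): Double = pearsonKurtosis(xs) - 3.0
}
```

Sample-corrected estimators (as some spreadsheet functions use) add further factors of n, so small differences beyond the constant 3 are possible on real data.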

Need to use univariate summary stats

2016-02-04 Thread Arunkumar Pillai
Hi, I'm currently using the query sqlContext.sql("SELECT MAX(variablesArray) FROM " + tableName) to extract mean, max and min. Is there a better optimized way? In the example I saw df.groupBy("key").agg(skewness("a"), kurtosis("a")), but I don't have a key anywhere in the data. How to extract the

[Spark 1.6] Univariate Stats using apache spark

2016-02-04 Thread Arunkumar Pillai
Hi, currently after creating a DataFrame I query it for max, min and mean to get the result: sqlContext.sql("SELECT MAX(variablesArray) FROM " + tableName). Is this an optimized way? I'm not able to find all the stats such as min, max, mean, variance, skewness and kurtosis directly from a DataFrame. Please

AIC in Linear Regression in ml pipeline

2016-01-15 Thread Arunkumar Pillai
Hi, is it possible to get the AIC value in Linear Regression using the ml pipeline? If so, please help me. -- Thanks and Regards Arun

LogisticRegression in ML pipeline help page

2016-01-06 Thread Arunkumar Pillai
Hi, I need the help page for Logistic Regression in the ML pipeline. When I browse, I get the 1.6 help page. Please help me. -- Thanks and Regards Arun

finding distinct count using dataframe

2016-01-05 Thread Arunkumar Pillai
Hi, are there any functions to find the distinct count of all the variables in a DataFrame?
  val sc = new SparkContext(conf) // spark context
  val options = Map("header" -> "true", "delimiter" -> delimiter, "inferSchema" -> "true")
  val sqlContext = new org.apache.spark.sql.SQLContext(sc) // sql context

Re: finding distinct count using dataframe

2016-01-05 Thread Arunkumar Pillai
atasetDF.select(countDistinct(col1, col2, col3, ...)) or
> approxCountDistinct for an approximate result.
>
> 2016-01-05 17:11 GMT+08:00 Arunkumar Pillai <arunkumar1...@gmail.com>:
>
>> Hi
>>
>> Is there any functions to find distinct count of all the variables in

Re: GLM in ml pipeline

2016-01-03 Thread Arunkumar Pillai
;:
>
>> keyStoneML could be an alternative.
>>
>> Ardo.
>>
>> On 03 Jan 2016, at 15:50, Arunkumar Pillai <arunkumar1...@gmail.com>
>> wrote:
>>
>> Is there any road map for glm in pipeline?
>>
>
-- Thanks and Regards Arun

GLM in ml pipeline

2016-01-03 Thread Arunkumar Pillai
Is there any road map for GLM in the ml pipeline?

Re: Extract SSerr SStot from Linear Regression using ml package

2015-12-22 Thread Arunkumar Pillai
, meanSquaredError,
> rootMeanSquaredError and r2 as metrics of LinearRegression.
> Although actually you can get SSerr, SStot and SSreg from the composition
> of above metrics.
>
> Yanbo
>
> 2015-12-22 12:23 GMT+08:00 Arunkumar Pillai <arunkumar1...@gmail.com>:
>
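The composition mentioned in the reply can be worked out from the standard definitions, assuming meanSquaredError = SSerr / n and r2 = 1 - SSerr / SStot, and assuming an ordinary least squares fit with an intercept so that SSreg = SStot - SSerr. A minimal plain-Scala sketch; the names and the need to supply the row count n separately are assumptions of this example, not part of the Spark API:

```scala
// Hypothetical helper, plain Scala: recover sums of squares from MSE and r2.
object SumOfSquaresSketch {
  final case class SumsOfSquares(ssErr: Double, ssTot: Double, ssReg: Double)

  /** n is the number of observations the metrics were computed on. */
  def fromMetrics(n: Long, meanSquaredError: Double, r2: Double): SumsOfSquares = {
    require(n > 0, "need at least one observation")
    require(r2 < 1.0, "r2 = 1 makes SStot undetermined when SSerr is 0")
    val ssErr = n * meanSquaredError   // MSE = SSerr / n
    val ssTot = ssErr / (1.0 - r2)     // r2 = 1 - SSerr / SStot
    val ssReg = ssTot - ssErr          // holds for OLS with an intercept
    SumsOfSquares(ssErr, ssTot, ssReg)
  }
}
```

For example, 10 observations with an MSE of 2.0 and an r2 of 0.75 give SSerr = 20, SStot = 80 and SSreg = 60.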

Extract SSerr SStot from Linear Regression using ml package

2015-12-21 Thread Arunkumar Pillai
Hi, I'm using Linear Regression from the ml package. I'm able to see SSerr, SStot and SSreg from
  val model = lr.fit(dat1)
  model.summary.metric
but this metric is not accessible. It would be good if we could get those values. Any suggestions? -- Thanks and Regards Arun

Creating vectors from a dataframe

2015-12-20 Thread Arunkumar Pillai
Hi, I'm trying to use Linear Regression from the ml library, but the problem is that the independent variables should be a vector. My code snippet is as follows:
  var dataDF = sqlContext.emptyDataFrame
  dataDF = sqlContext.sql("SELECT "+ dependentVariable+","+independentVariables +" FROM " +

Getting estimates and standard error using ml.LinearRegression

2015-12-20 Thread Arunkumar Pillai
Hi, I'm using the ml.LinearRegression package. How do I get the estimates and standard errors for the coefficients? Please find below the code snippet:
  val lr = new LinearRegression()
  lr.setMaxIter(10)
    .setRegParam(0.01)
    .setFitIntercept(true)
  val model = lr.fit(test)
  val estimates = model.summary

Matrix Inverse

2015-12-17 Thread Arunkumar Pillai
Hi, I want to find the matrix inverse of (XTranspose * X). Please find below my code. This code does not work for even a slightly larger dataset. Please tell me whether the approach is correct.
  val sqlQuery = "SELECT column1,column2 ,column3 FROM " + tableName
  val matrixDF = sqlContext.sql(sqlQuery)
  var

Intercept in Linear Regression

2015-12-15 Thread Arunkumar Pillai
How do I get the intercept in a Linear Regression model? LinearRegressionWithSGD.train(parsedData, numIterations) -- Thanks and Regards Arun

Need clarifications in Regression

2015-12-15 Thread Arunkumar Pillai
Hi, the regression algorithm in MLlib uses a loss function to calculate the regression estimates, whereas R uses the matrix method. I see some differences between the results of Spark and R. I was using the following class: LinearRegressionWithSGD.train(parsedData,

Linear Regression with OLS

2015-12-14 Thread Arunkumar Pillai
Hi, I need an example of Linear Regression using OLS:
  val data = sc.textFile("data/mllib/ridge-data/lpsa.data")
  val parsedData = data.map { line =>
    val parts = line.split(',')
    LabeledPoint(parts(0).toDouble, Vectors.dense(parts(1).split(' ').map(_.toDouble)))
  }.cache()
  // Building the model

Inverse of the matrix

2015-12-10 Thread Arunkumar Pillai
Hi, I need to find the inverse of the (X(Transpose) * X) matrix. I have computed X transpose and the matrix multiplication. Is there any way to find the inverse of the matrix? -- Thanks and Regards Arun
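Since X^T * X is only k x k, where k is the number of columns, it is usually small enough to invert on a single machine once it has been computed. The sketch below is plain Scala with hypothetical names; it uses Gauss-Jordan elimination with partial pivoting, which is one textbook technique and not necessarily what Spark or R use internally.

```scala
// Hypothetical helper, plain Scala: form X^T * X and invert it locally.
object MatrixInverseSketch {
  type Matrix = Array[Array[Double]]

  /** Computes X^T * X for a row-major matrix X (n rows, k columns). */
  def gram(x: Matrix): Matrix = {
    val k = x.head.length
    Array.tabulate(k, k)((i, j) => x.map(row => row(i) * row(j)).sum)
  }

  /** Inverts a square matrix by Gauss-Jordan elimination with partial pivoting. */
  def inverse(a: Matrix): Matrix = {
    val n = a.length
    require(a.forall(_.length == n), "matrix must be square")
    // Augment A with the identity: [A | I].
    val m = Array.tabulate(n, 2 * n) { (i, j) =>
      if (j < n) a(i)(j) else if (j - n == i) 1.0 else 0.0
    }
    for (col <- 0 until n) {
      // Pick the row with the largest pivot to limit round-off error.
      val pivot = (col until n).maxBy(r => math.abs(m(r)(col)))
      require(math.abs(m(pivot)(col)) > 1e-12, "matrix is singular or nearly so")
      val tmp = m(col); m(col) = m(pivot); m(pivot) = tmp
      // Scale the pivot row so the pivot becomes 1.
      val p = m(col)(col)
      for (j <- 0 until 2 * n) m(col)(j) /= p
      // Eliminate this column from every other row.
      for (r <- 0 until n if r != col) {
        val factor = m(r)(col)
        for (j <- 0 until 2 * n) m(r)(j) -= factor * m(col)(j)
      }
    }
    // The right half of the augmented matrix now holds the inverse.
    m.map(_.drop(n))
  }
}
```

When the columns of X are nearly collinear, X^T * X is ill-conditioned and an explicit inverse can be numerically unreliable; solving the normal equations with a QR or Cholesky factorization is the usual alternative.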

GLM in apache spark in MLlib

2015-12-09 Thread Arunkumar Pillai
Hi, I've started using Apache Spark version 1.5.2. I can see GLM in SparkR, but it is not there in MLlib. Are there any plans or a road map for that? -- Thanks and Regards Arun