TensorFlow

2016-12-12 Thread Meeraj Kunnumpurath
Hello, Is there anything available in Spark similar to TensorFlow? I am looking for a mechanism for performing nearest neighbour search on vectorized image data. Regards -- *Meeraj Kunnumpurath* *Director and Executive Principal, Service Symphony Ltd, 00 44 7702 693597* *00 971 50 409 0169 mee

Logistic regression using gradient ascent

2016-11-30 Thread Meeraj Kunnumpurath
. https://github.com/kunnum/sandbox/blob/master/classification/src/main/scala/com/ss/ml/classification/lr/LRWithGradientAscent.scala Regards

Re: UDF for gradient ascent

2016-11-26 Thread Meeraj Kunnumpurath
One thing I noticed inside the UDF is that the original column names from the data frame have disappeared and the columns are called col1, col2, etc. Regards Meeraj On Sat, Nov 26, 2016 at 7:31 PM, Meeraj Kunnumpurath < mee...@servicesymphony.com> wrote: > Hello, > > I have a data

UDF for gradient ascent

2016-11-26 Thread Meeraj Kunnumpurath
line 2445 in the generated code, /* 2445 */ Object project_arg = scan_isNull1 ? null : project_converter2.apply(scan_value1); Many thanks

Re: Logistic Regression Match Error

2016-11-19 Thread Meeraj Kunnumpurath
Thank you, it was the escape character: option("escape", "\"") Regards On Sat, Nov 19, 2016 at 11:10 PM, Meeraj Kunnumpurath < mee...@servicesymphony.com> wrote: > I tried .option("quote", "\""), which I believe is the default, still
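
The fix reported in this reply can be sketched as follows. This is a minimal illustration, assuming Spark 2.x with the built-in CSV reader; the object name, the helper `readReviews`, and the sample row are hypothetical and not taken from the thread.

```scala
import java.nio.file.Files
import org.apache.spark.sql.{DataFrame, SparkSession}

object QuotedCsv {
  // Reads a CSV file whose quoted fields escape embedded quotes by doubling them.
  def readReviews(spark: SparkSession, path: String): DataFrame =
    spark.read
      .option("quote", "\"")   // the default quote character, shown for clarity
      .option("escape", "\"")  // treat a doubled quote inside a quoted field as a literal quote
      .csv(path)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("quoted-csv").getOrCreate()
    // Hypothetical sample row: review text with an embedded comma and doubled quotes, then a rating.
    val file = Files.createTempFile("reviews", ".csv")
    Files.write(file, "\"Good, but \"\"pricey\"\" overall\",2\n".getBytes("UTF-8"))
    readReviews(spark, file.toString).show(truncate = false)
    spark.stop()
  }
}
```

Setting only `quote` leaves the escape character at its default backslash, which is consistent with the earlier attempt in the thread not helping.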

Re: Logistic Regression Match Error

2016-11-19 Thread Meeraj Kunnumpurath
I should have done in the first place.",2 On Sat, Nov 19, 2016 at 10:59 PM, Meeraj Kunnumpurath < mee...@servicesymphony.com> wrote: > Digging through it looks like an issue with reading CSV. Some of the data > have embedded commas in them, these fields are rightly quoted. However,

Re: Logistic Regression Match Error

2016-11-19 Thread Meeraj Kunnumpurath
the fields, otherwise they are unquoted. Regards Meeraj On Sat, Nov 19, 2016 at 10:10 PM, Meeraj Kunnumpurath < mee...@servicesymphony.com> wrote: > Hello, > > I have the following code that trains a mapping of review text to ratings. > I use a tokenizer to get all the wor

Logistic Regression Match Error

2016-11-19 Thread Meeraj Kunnumpurath
anager.scala:910)
at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:866)
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:910)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:668)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:330)
Many thanks

Re: Nearest neighbour search

2016-11-13 Thread Meeraj Kunnumpurath
.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140

Re: Nearest neighbour search

2016-11-13 Thread Meeraj Kunnumpurath
mA) * Math.sqrt(normB))) }
def dorProduct(vectorA: Vector, vectorB: Vector) = {
  var dp = 0.0
  var index = vectorA.size - 1
  for (i <- 0 to index) { dp += vectorA(i) * vectorB(i) }
  dp
}
On Sun, Nov 13, 2016 at 7:04 PM, Meeraj Kunnumpurath < mee...@servicesy
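
The quoted fragment divides a dot product by the product of two square-rooted norms, which is the usual cosine similarity. A self-contained sketch of that calculation is given below; it uses plain arrays rather than Spark's `Vector`, and the object and method names are hypothetical stand-ins for the helpers in the message.

```scala
object CosineSimilarity {
  // Sum of element-wise products of two equally sized vectors.
  def dotProduct(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "vectors must have the same length")
    a.indices.map(i => a(i) * b(i)).sum
  }

  // Cosine of the angle between a and b; defined here as 0.0 when either vector is all zeros.
  def cosineSimilarity(a: Array[Double], b: Array[Double]): Double = {
    val normA = math.sqrt(dotProduct(a, a))
    val normB = math.sqrt(dotProduct(b, b))
    if (normA == 0.0 || normB == 0.0) 0.0 else dotProduct(a, b) / (normA * normB)
  }
}
```

Returning 0.0 for a zero vector is a design choice made for this sketch to avoid a division by zero; the original message does not say how that case is handled.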

Nearest neighbour search

2016-11-13 Thread Meeraj Kunnumpurath
Col("_c2").setOutputCol("words")
val tf = new HashingTF().setInputCol("words").setOutputCol("tf")
val idf = new IDF().setInputCol("tf").setOutputCol("tf-idf")
val df1 = tf.transform(tk.transform(df))
idf.fit(df1).transform(df1

Re: RowMatrix from DenseVector

2016-10-13 Thread Meeraj Kunnumpurath
Apologies, my oversight: I had a mix of mllib and ml imports. On Thu, Oct 13, 2016 at 2:27 PM, Meeraj Kunnumpurath < mee...@servicesymphony.com> wrote: > Hello, > > How do I create a row matrix from a dense vector? The following code > doesn't compile. > > val

RowMatrix from DenseVector

2016-10-13 Thread Meeraj Kunnumpurath
er error
Error:(24, 33) type mismatch;
 found   : org.apache.spark.rdd.RDD[org.apache.spark.ml.linalg.Vector]
 required: org.apache.spark.rdd.RDD[org.apache.spark.mllib.linalg.Vector]
val rowMatrix = new RowMatrix(features, features.count(), 2)
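
The mismatch in this thread arises because `RowMatrix` belongs to the older `mllib` package and expects `mllib` vectors, while the features were `ml` vectors. One way to bridge the two, assuming Spark 2.0 or later, is `Vectors.fromML` from the `mllib` package. The sketch below is only an illustration; the object name, the helper `toRowMatrix`, and the sample data are hypothetical.

```scala
import org.apache.spark.ml.linalg.{Vector => MLVector, Vectors => MLVectors}
import org.apache.spark.mllib.linalg.{Vectors => OldVectors}
import org.apache.spark.mllib.linalg.distributed.RowMatrix
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object RowMatrixFromMl {
  // Converts each ml vector to its mllib counterpart before building the RowMatrix.
  def toRowMatrix(features: RDD[MLVector], numCols: Int): RowMatrix =
    new RowMatrix(features.map(v => OldVectors.fromML(v)), features.count(), numCols)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("row-matrix").getOrCreate()
    // Hypothetical two-row, two-column feature set.
    val features: RDD[MLVector] = spark.sparkContext.parallelize(
      Seq(MLVectors.dense(1.0, 2.0), MLVectors.dense(3.0, 4.0)))
    val rowMatrix = toRowMatrix(features, 2)
    println(s"${rowMatrix.numRows()} x ${rowMatrix.numCols()}")
    spark.stop()
  }
}
```

Using `mllib` imports consistently, as the reply in this thread did, avoids the conversion altogether.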

Re: Linear Regression Error

2016-10-12 Thread Meeraj Kunnumpurath
If I drop the last feature on the third model, the error seems to go away. On Wed, Oct 12, 2016 at 11:52 PM, Meeraj Kunnumpurath < mee...@servicesymphony.com> wrote: > Hello, > > I have some code trying to compare linear regression coefficients with > three sets of featur

Linear Regression Error

2016-10-12 Thread Meeraj Kunnumpurath
n(App.scala:76)
at com.ss.ml.regression.MultipleRegression$.main(MultipleRegression.scala:12)
at com.ss.ml.regression.MultipleRegression.main(MultipleRegression.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
Does anyone know what is going wrong here? Many thanks

Matrix Operations

2016-10-12 Thread Meeraj Kunnumpurath
Hello, Does anyone have examples of doing matrix operations (multiplication, transpose, inverse, etc.) using the Spark ML API? Many thanks
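
For two of the operations asked about here, a minimal sketch with the local `DenseMatrix` type from `org.apache.spark.ml.linalg` might look like the following. It assumes Spark 2.x; the object name and the sample values are hypothetical. Inverse is left out because these local matrix types do not appear to offer one.

```scala
import org.apache.spark.ml.linalg.DenseMatrix

object LocalMatrixOps {
  // DenseMatrix stores its values in column-major order, so this is the matrix [[1, 2], [3, 4]].
  val a = new DenseMatrix(2, 2, Array(1.0, 3.0, 2.0, 4.0))

  // Transpose: [[1, 3], [2, 4]].
  val aT: DenseMatrix = a.transpose

  // Matrix product of a with its own transpose.
  val product: DenseMatrix = a.multiply(aT)

  def main(args: Array[String]): Unit = {
    println(aT)
    println(product)
  }
}
```

These are local, single-machine matrices; distributed matrices such as `RowMatrix` live in the `mllib` package and have a different API.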

Re: UDF on multiple columns

2016-10-12 Thread Meeraj Kunnumpurath
"sales") spark.sql("select bedrooms * bedrooms, bedrooms * bathrooms, lat + long, log(sqft_living), price from sales") } On Wed, Oct 12, 2016 at 9:56 PM, Meeraj Kunnumpurath < mee...@servicesymphony.com> wrote: > Hello, > > How do I write a UDF that operate on two

UDF on multiple columns

2016-10-12 Thread Meeraj Kunnumpurath
Hello, How do I write a UDF that operates on two columns? For example, how do I introduce a new column that is the product of two columns already on the dataframe? Many thanks Meeraj
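
One possible answer to this question, sketched under the assumption of Spark 2.x and numeric (Double) columns: a UDF built from a two-argument function can be applied to two columns with `withColumn`. The object name, the helper names, the output column name `bed_bath`, and the sample data are hypothetical; the input column names are borrowed from the reply in this thread.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, udf}

object TwoColumnUdf {
  // A UDF that takes two Double columns and returns their product.
  val product = udf((a: Double, b: Double) => a * b)

  // Adds a new column computed by the UDF from two existing columns.
  def withProductUdf(df: DataFrame): DataFrame =
    df.withColumn("bed_bath", product(col("bedrooms"), col("bathrooms")))

  // For plain arithmetic, the built-in column operator gives the same result without a UDF.
  def withProductOperator(df: DataFrame): DataFrame =
    df.withColumn("bed_bath", col("bedrooms") * col("bathrooms"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("two-column-udf").getOrCreate()
    import spark.implicits._
    // Hypothetical sample data.
    val sales = Seq((3.0, 1.0), (4.0, 2.5)).toDF("bedrooms", "bathrooms")
    withProductUdf(sales).show()
    withProductOperator(sales).show()
    spark.stop()
  }
}
```

The SQL expression used in the reply (`bedrooms * bathrooms`) is the same idea as the operator version here.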