Re: Weight column values not used in Binary Logistic Regression Summary

2017-12-09 Thread Sea aj
Hello everyone, I have a data frame with two columns: ids and features. Each cell in the features column is an array of Vectors.dense values, like: [(DenseVector([0.5692]),), (DenseVector([0.5086]),)] I need to train a new model for every single row of my data frame. How can I do it?
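A minimal sketch of one common approach, assuming each row's data is small enough to fit on a single executor: fit a small local (non-Spark) model inside a function applied to every row, rather than calling a Spark ML `fit` once per row, which would launch a separate distributed job each time. The function name `fit_row_model` and the Gaussian (mean/variance) "model" are hypothetical placeholders, and plain lists stand in for `DenseVector` so the sketch runs without Spark.

```python
from statistics import mean, pvariance


def fit_row_model(row_id, features):
    """Fit a small local model to one row's features.

    Hypothetical placeholder model: a 1-D Gaussian summarised by its mean and
    population variance. Replace it with whatever per-row model is needed.
    """
    # Each element looks like (DenseVector([x]),) in the question:
    # a 1-tuple wrapping a 1-element vector, hence item[0][0].
    values = [item[0][0] for item in features]
    return row_id, {"mean": mean(values), "variance": pvariance(values)}


# Plain lists stand in for DenseVector([...]) so this runs without Spark.
rows = [
    (1, [([0.5692],), ([0.5086],)]),
    (2, [([0.1],), ([0.3],), ([0.5],)]),
]
models = [fit_row_model(row_id, feats) for row_id, feats in rows]
```

In PySpark the same function could be mapped over the rows, e.g. `df.rdd.map(lambda r: fit_row_model(r[0], r[1]))`, assuming the columns are in the order ids, features; `DenseVector` supports `v[0]` indexing just like the lists used here.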

Re: Training A ML Model on a Huge Dataframe

2017-08-23 Thread Sea aj
> > On 23 August 2017 at 14:27, Sea aj <saj3...@gmail.com> wrote:
> >> Hi,
> >>
> >> I am trying to feed a huge dataframe to an ML algorithm in Spark, but it
> >> crashes due to a shortage of memory.
> >>
> >> Is there a way to train the model on a subset

Training A ML Model on a Huge Dataframe

2017-08-23 Thread Sea aj
Hi, I am trying to feed a huge dataframe to an ML algorithm in Spark, but it crashes because it runs out of memory. Is there a way to train the model on a subset of the data in multiple steps? Thanks
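One way to train "in multiple steps" is incremental training: keep the model parameters between steps and update them with each chunk of data, as in mini-batch stochastic gradient descent, so only one chunk has to be in memory at a time. The sketch below is plain Python for a one-feature linear model on hypothetical synthetic data; it illustrates the idea and is not a Spark API. Spark's RDD-based MLlib has streaming variants built on the same principle, such as `StreamingLinearRegressionWithSGD` in `pyspark.mllib.regression`, which may be worth a look.

```python
import random


def sgd_on_chunk(w, b, chunk, lr=0.1, epochs=5):
    """Run a few epochs of plain SGD for y ~ w*x + b on one chunk."""
    for _ in range(epochs):
        for x, y in chunk:
            err = (w * x + b) - y  # prediction error for this sample
            w -= lr * err * x      # gradient of 0.5*err^2 with respect to w
            b -= lr * err          # gradient of 0.5*err^2 with respect to b
    return w, b


def train_in_chunks(chunks, lr=0.1, epochs=5):
    """Carry (w, b) from chunk to chunk so only one chunk is held at a time."""
    w, b = 0.0, 0.0
    for chunk in chunks:
        w, b = sgd_on_chunk(w, b, chunk, lr, epochs)
    return w, b


# Hypothetical data: y = 2x + 1 plus a little noise, delivered in 10 chunks.
rng = random.Random(0)


def make_chunk(n):
    out = []
    for _ in range(n):
        x = rng.uniform(-1, 1)
        out.append((x, 2 * x + 1 + rng.gauss(0, 0.05)))
    return out


chunks = (make_chunk(200) for _ in range(10))  # generator: chunks built lazily
w, b = train_in_chunks(chunks)
```

The learning rate and number of epochs per chunk are arbitrary choices for this toy data; with real data they would need tuning.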

Re: UI for spark machine learning.

2017-08-22 Thread Sea aj
> st linear regression? Probably your model would do equally well with far fewer samples. Have you checked bias and variance if you use far fewer random samples?
>
> On 22. Aug 2017, at 12:58, Sea aj <saj3...@gmail.com> wrote:
> > I have a large dataframe of 1 billio
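The reply's suggestion can be checked with a simple learning curve: fit the same model on random subsamples of increasing size and compare the error on a fixed held-out set. The sketch below is plain Python with a hypothetical synthetic dataset; on a Spark DataFrame the subsample would typically come from something like `df.sample(False, 0.01, seed=42)`.

```python
import random


def fit_ols(points):
    """Closed-form least squares for y ~ w*x + b."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    w = sxy / sxx
    return w, my - w * mx


def mse(model, points):
    """Mean squared error of a (w, b) model on a list of (x, y) points."""
    w, b = model
    return sum((w * x + b - y) ** 2 for x, y in points) / len(points)


rng = random.Random(42)


def sample_point():
    x = rng.uniform(0, 10)
    return x, 3 * x - 2 + rng.gauss(0, 1)  # hypothetical data-generating process


population = [sample_point() for _ in range(100_000)]
test_set = [sample_point() for _ in range(5_000)]

# Train on increasingly large random subsamples; evaluate on the same test set.
curve = {}
for size in (100, 1_000, 10_000, 100_000):
    subsample = rng.sample(population, size)
    curve[size] = mse(fit_ols(subsample), test_set)
```

If the held-out error stops improving well before the full dataset is used, training on a sample is likely enough for that model.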

Re: UI for spark machine learning.

2017-08-22 Thread Sea aj
I have a large dataframe of 1 billion rows of type LabeledPoint. I tried to train a linear regression model on the df, but it failed due to lack of memory, although I'm using 9 slaves, each with 100 GB of RAM and 16 CPU cores. I decided to split my data into multiple chunks and train the model in
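For plain (unregularised) least squares, chunked training can give exactly the same answer as one big fit: the normal equations depend only on sums over the rows, so each chunk can be reduced to a few numbers, the numbers added up, and the system solved once at the end. The sketch below uses one feature plus an intercept and a hypothetical `SuffStats` class; with d features the same idea needs a d-by-d matrix and a length-d vector per chunk. It illustrates the technique and is not how Spark's own `LinearRegression` is implemented.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass
class SuffStats:
    """Sufficient statistics for least squares y ~ w*x + b; they add across chunks."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0

    def update(self, chunk):
        for x, y in chunk:
            self.n += 1
            self.sx += x
            self.sy += y
            self.sxx += x * x
            self.sxy += x * y
        return self

    def merge(self, other):
        return SuffStats(self.n + other.n, self.sx + other.sx, self.sy + other.sy,
                         self.sxx + other.sxx, self.sxy + other.sxy)

    def solve(self):
        # Normal equations:  [sxx sx] [w]   [sxy]
        #                    [sx  n ] [b] = [sy ]   solved by Cramer's rule.
        det = self.sxx * self.n - self.sx * self.sx
        w = (self.sxy * self.n - self.sx * self.sy) / det
        b = (self.sxx * self.sy - self.sx * self.sxy) / det
        return w, b


data = [(float(x), 0.5 * x + 4.0) for x in range(1000)]  # hypothetical noise-free line
chunks = [data[i:i + 100] for i in range(0, len(data), 100)]

# Each chunk is summarised independently (e.g. one per partition), then merged.
merged = reduce(SuffStats.merge, (SuffStats().update(c) for c in chunks))
w_chunked, b_chunked = merged.solve()
w_full, b_full = SuffStats().update(data).solve()
```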

Re: SPARK Issue in Standalone cluster

2017-08-22 Thread Sea aj
Hi everyone, I have a huge dataframe with 1 billion rows, and each row is a nested list. I want to train some ML models on this df, but due to its huge size I get an out-of-memory error on one of my nodes when I run the fit function. Currently, my configuration is: 144 cores, 16 cores for
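A rough back-of-the-envelope check of why 1 billion rows can exhaust memory: the raw payload of dense double-precision features is rows × features × 8 bytes. The number of features per row is not given in the snippet, so 100 below is hypothetical, and the 9-node figure is an assumption borrowed from the 2017-08-22 thread (9 nodes with 16 cores each, which would also account for 144 cores in total).

```python
def dense_payload_gb(rows, features_per_row, bytes_per_value=8):
    """Raw size in GB of `rows` dense float64 vectors; ignores JVM object overhead."""
    return rows * features_per_row * bytes_per_value / 1e9


ROWS = 1_000_000_000  # from the thread
FEATURES = 100        # hypothetical: the snippet does not say how wide each row is
NODES = 9             # assumption: 9 nodes x 16 cores, as in the 2017-08-22 thread

total_gb = dense_payload_gb(ROWS, FEATURES)  # raw data size across the cluster
per_node_gb = total_gb / NODES               # share per node if spread evenly
```

With these assumptions the raw data alone is 800 GB, roughly 89 GB per node. Spark's tuning guide notes that deserialized Java objects can take several times more space than the raw data, so the real footprint is likely larger than this estimate.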

Reading csv.gz files

2017-07-05 Thread Sea aj
I need to import a set of files with the csv.gz extension into Spark. Each file contains a table of data. Does anyone know how to read them?
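Spark picks the decompression codec from the file extension, so a call like `spark.read.csv("path/to/files/*.csv.gz", header=True)` (the path is a placeholder) should read the files directly; gzip is not splittable, so each file is read by a single task. To inspect a file's header and delimiter locally before loading it into Spark, a small stdlib-only sketch (the helper name `read_csv_gz` is hypothetical) could look like this:

```python
import csv
import gzip
import io


def read_csv_gz(source, delimiter=","):
    """Read a gzip-compressed CSV (path or binary file object) into a list of dicts."""
    with gzip.open(source, mode="rt", newline="", encoding="utf-8") as fh:
        return list(csv.DictReader(fh, delimiter=delimiter))


# Self-contained demo: build a small .csv.gz in memory instead of reading from disk.
sample = gzip.compress(b"id,value\n1,0.5692\n2,0.5086\n")
rows = read_csv_gz(io.BytesIO(sample))
```

For a file on disk, pass its path instead, e.g. `read_csv_gz("data/part-0001.csv.gz")` (hypothetical file name).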

How does Spark deal with Data Skewness?

2017-06-22 Thread Sea aj
Hi everyone, I have read about some interesting ideas on how to manage skew, but I am not sure whether any of these techniques are used in Spark 2.x. To name a few, "Salting the Data" and "Dynamic Repartitioning" are techniques presented in Spark Summit talks. I am really curious to
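A minimal sketch of the salting idea for a skewed aggregation, in plain Python rather than Spark: rows with a known hot key get a random salt, the data is aggregated on (key, salt) so the hot key's work is split across several groups, and a second, much smaller aggregation removes the salt. The function `salted_count`, the hot-key set, and `num_salts` are hypothetical; in Spark SQL the same shape would be a `groupBy` on key and salt followed by a `groupBy` on key.

```python
import random
from collections import Counter


def salted_count(records, hot_keys, num_salts=4, seed=0):
    """Two-stage count per key, where hot keys are split across num_salts sub-keys."""
    rng = random.Random(seed)
    # Stage 1: salt hot keys so their rows spread over several (key, salt) groups.
    partial = Counter()
    for key in records:
        salt = rng.randrange(num_salts) if key in hot_keys else 0
        partial[(key, salt)] += 1
    # Stage 2: strip the salt and combine the partial counts per original key.
    totals = Counter()
    for (key, _salt), count in partial.items():
        totals[key] += count
    return partial, totals


records = ["a"] * 1000 + ["b"] * 10 + ["c"] * 5  # "a" is the skewed (hot) key
partial, totals = salted_count(records, hot_keys={"a"})
```

The final totals match an unsalted count; the trade-off is an extra aggregation step, so salting is usually applied only to keys known to be skewed.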