Fwd: Problem with Execution plan using loop

2017-04-15 Thread Javier Rey
fg", "perimetro_abdominal", "presion_sistolica", "presion_diastolica", "imc", "peso", "talla", "frecuencia_cardiaca", "saturacion_oxigeno", "porcentaje_grasa"] -- df = create_lag_columns(df, 6, columns_to_lag) Thanks, Javier Rey

Re: Sum array values by row in new column

2016-08-16 Thread Javier Rey
…er <m...@flexiblecreations.com> wrote: Assuming you know the number of elements in the list, this should work: df.withColumn('total', df["_1"].getItem(0) + df["_1"].getItem(1) + df["_1"].getItem(2))

Sum array values by row in new column

2016-08-15 Thread Javier Rey
Hi everyone, I have a dataframe with one column that is an array of numbers. How can I sum each array by row and obtain a new column with the sum, in PySpark? Example:

+------------+
|     numbers|
+------------+
|[10, 20, 30]|
|[40, 50, 60]|
|[70, 80, 90]|
+------------+

The idea is to obtain

Re: na.fill doesn't work

2016-08-11 Thread Javier Rey
Thanks Aseem, I'll check this. Samir. On Aug 11, 2016 4:39 AM, "Aseem Bansal" <asmbans...@gmail.com> wrote: > Check the schema of the data frame. It may be that your columns are String. You are trying to give a default for numerical data. > On Thu, Aug 11, 2016

na.fill doesn't work

2016-08-10 Thread Javier Rey
Hi everybody, I have a data frame after many transformations; my final task is to fill NAs with zeros, but when I run this command: df_fil1 = df_fil.na.fill(0), it doesn't work: the nulls don't disappear. I did a toy test and there it works correctly. I don't understand what happened. Thanks in

Random forest binary classification: H2O vs. Spark difference

2016-08-07 Thread Javier Rey
Hi everybody. I have run RF on H2O and had no trouble with null values, but in contrast, in Spark using DataFrames and the ML library I get this error (I know my dataframe contains nulls, but I understood that Random Forest supports null values): "Values to assemble cannot be null" Any

Add column sum as new column in PySpark dataframe

2016-08-04 Thread Javier Rey
Hi everybody, Sorry, the last message I sent was incomplete; this is the complete version: I'm using PySpark and I have a Spark dataframe with a bunch of numeric columns. I want to add a column that is the sum of all the other columns. Suppose my dataframe had columns "a", "b", and "c". I know I can do this:

Spark crashes with two parquet files

2016-07-10 Thread Javier Rey
Hi everybody, I installed Spark 1.6.1. I have two parquet files, but when I try to show records after a unionAll, Spark crashes and I don't understand what happens. When I use show() on only one parquet file, it works correctly. Code with fault: path = '/data/train_parquet/' train_df =