Fwd: Problem with Execution plan using loop

2017-04-15 Thread Javier Rey
fg", "perimetro_abdominal", "presion_sistolica", "presion_diastolica", "imc", "peso", "talla", "frecuencia_cardiaca", "saturacion_oxigeno", "porcentaje_grasa"] -- df = create_lag_columns(df, 6, columns_to_lag) Thanks, Javier Rey

Re: Sum array values by row in new column

2016-08-16 Thread Javier Rey
…er <m...@flexiblecreations.com> wrote: Assuming you know the number of elements in the list, this should work: df.withColumn('total', df["_1"].getItem(0) + df["_1"].getItem(1) + df["_1"].getItem(2))

Sum array values by row in new column

2016-08-15 Thread Javier Rey
Hi everyone, I have a dataframe with one column that is an array of numbers. How can I sum each array by row and obtain a new column with the sum, in PySpark? Example:

+------------+
|     numbers|
+------------+
|[10, 20, 30]|
|[40, 50, 60]|
|[70, 80, 90]|
+------------+

The idea is to obtain

Re: na.fill doesn't work

2016-08-11 Thread Javier Rey
Thanks Aseem, I'll check this. Samir. On Aug 11, 2016 4:39 AM, "Aseem Bansal" <asmbans...@gmail.com> wrote: > Check the schema of the data frame. It may be that your columns are String. You are trying to give a default for numerical data. > On Thu, Aug 11, 2016

na.fill doesn't work

2016-08-10 Thread Javier Rey
Hi everybody, I have a data frame after many transformations; my final task is to fill NAs with zeros, but when I run this command: df_fil1 = df_fil.na.fill(0), it doesn't work: the nulls don't disappear. I did a toy test and there it works correctly. I don't understand what happened. Thanks in

Random forest binary classification: H2O vs. Spark difference

2016-08-07 Thread Javier Rey
Hi everybody. I have run RF on H2O and had no trouble with null values, but in contrast, in Spark using DataFrames and the ML library I get this error (I know my dataframe contains nulls, but I understood that Random Forest supports null values): "Values to assemble cannot be null" Any

Add column sum as new column in PySpark dataframe

2016-08-04 Thread Javier Rey
Hi everybody, Sorry, the last message I sent was incomplete; this is the complete version: I'm using PySpark and I have a Spark dataframe with a bunch of numeric columns. I want to add a column that is the sum of all the other columns. Suppose my dataframe had columns "a", "b", and "c". I know I can do this:

Spark crashes with two parquet files

2016-07-10 Thread Javier Rey
Hi everybody, I installed Spark 1.6.1. I have two parquet files, but when I try to show records after a unionAll, Spark crashes and I don't understand what happens. When I use show() on only one parquet file, it works correctly. Code with fault: path = '/data/train_parquet/' train_df =