Cloudera 5.8.0 and Spark 2.1.1

2017-05-17 Thread issues solution
Hi, is it possible to use a prebuilt version of Spark 2.1 inside Cloudera 5.8, where Scala is 2.10 not 2.11 and Java is 1.7 not 1.8? Why? I am in a corporate environment and I want to test the latest version of Spark, but my problem is that I don't know whether Spark 2.1.1 can work with this

Re: save Spark ML

2017-05-15 Thread issues solution
Hi, I need help with the question below. 2017-05-15 10:32 GMT+02:00 issues solution <issues.solut...@gmail.com>: > Hi, > I am on PySpark 1.6 and I want to save my model to an HDFS file, like Parquet. > How can I do this? > > My model is a Rando

save Spark ML

2017-05-15 Thread issues solution
Hi, I am on PySpark 1.6 and I want to save my model to an HDFS file, like Parquet. How can I do this? My model is a RandomForestClassifier tuned with cross-validation, like this: rf_csv2 = CrossValidator() How can I save it? Thanks in advance
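A sketch of the usual answer, assuming the cluster can move to Spark >= 2.0 (`cv_model` and `path` are hypothetical names):

```python
def save_best_model(cv_model, path):
    """Persist the winning model from a fitted CrossValidatorModel.
    Assumes Spark >= 2.0: Python ml models only gained save()/load() in 2.0,
    so on PySpark 1.6 the DataFrame-based RandomForest model cannot be saved
    from Python (only the RDD-based mllib model has save(sc, path))."""
    best = cv_model.bestModel   # the estimator refit on the whole training set
    best.save(path)             # writes metadata plus Parquet data, e.g. to HDFS

def load_best_model(path):
    """Reload it later with the matching model class."""
    from pyspark.ml.classification import RandomForestClassificationModel
    return RandomForestClassificationModel.load(path)
```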

CROSSVALIDATION and hypothetical failure

2017-05-12 Thread issues solution
Hi, we often perform a grid search with cross-validation under PySpark to find the best parameters, but sometimes you hit an error related not to the computation but to the network or anything else. HOW CAN WE SAVE INTERMEDIATE RESULTS, particularly when a large job runs for 3 or 4 days

CrossValidator and StackOverflowError

2017-05-10 Thread issues solution
Hi, when I try to run CrossValidator I get a StackOverflowError. I have already performed all the necessary transformations (StringIndexer, vector assembly) and saved the data frame to HDFS as Parquet; after that I load it all into a new data frame and split it into train and test sets. When I try fit(train_set) I get

URGENT :

2017-05-10 Thread issues solution
Hi, I know you are busy with questions, but I don't understand: 1- why don't we have feature importances inside PySpark's features? 2- why can't we use a cached data frame with cross-validation? 3- why is the documentation not clear when we talk about PySpark? You can understand

features importance

2017-05-10 Thread issues solution
Hi, can someone tell me whether we have feature importances inside PySpark 1.6.0? thx
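For what it's worth, this looks like a version issue rather than a missing concept: the underlying Scala model computes importances, but the attribute only reached the Python API in Spark 2.0. A hedged sketch assuming Spark >= 2.0 (`model` is a fitted RandomForestClassificationModel):

```python
def feature_importances(model):
    """Assumes Spark >= 2.0, where RandomForestClassificationModel exposes
    featureImportances to Python; on PySpark 1.6.0 the attribute does not
    exist.  Returns a Vector with one weight per feature, summing to 1."""
    return model.featureImportances
```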

Spark RandomForestClassifier and balancing classes

2017-05-09 Thread issues solution
HI, I have already asked this question but I am still without an answer. Can someone help me figure out how I can balance my classes when I use the fit method of RandomForestClassifier? Thanks in advance.

Crossvalidator after fit

2017-05-05 Thread issues solution
Hi, I get the following error after trying to perform grid search and cross-validation on a RandomForest estimator for classification: rf = RandomForestClassifier(labelCol="Labeld",featuresCol="features") evaluator = BinaryClassificationEvaluator(metricName="F1 Score") rf_cv =
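The quoted snippet fails before any fitting happens: BinaryClassificationEvaluator only accepts metricName "areaUnderROC" or "areaUnderPR", so "F1 Score" is rejected. A sketch of a setup that should validate, using MulticlassClassificationEvaluator's "f1" metric instead (column names copied from the thread; `train_df` is hypothetical):

```python
def build_and_fit_cv(train_df):
    """F1 lives in MulticlassClassificationEvaluator (metricName='f1'), not in
    BinaryClassificationEvaluator, whose only metrics are areaUnderROC and
    areaUnderPR."""
    from pyspark.ml.classification import RandomForestClassifier
    from pyspark.ml.evaluation import MulticlassClassificationEvaluator
    from pyspark.ml.tuning import ParamGridBuilder, CrossValidator

    rf = RandomForestClassifier(labelCol="Labeld", featuresCol="features")
    evaluator = MulticlassClassificationEvaluator(labelCol="Labeld",
                                                  metricName="f1")
    grid = ParamGridBuilder().addGrid(rf.maxDepth, [5, 10]).build()
    cv = CrossValidator(estimator=rf, estimatorParamMaps=grid,
                        evaluator=evaluator, numFolds=3)
    return cv.fit(train_df)   # returns a CrossValidatorModel
```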

imbalanced classes inside RANDOMFOREST CLASSIFIER

2017-05-05 Thread issues solution
Hi, in scikit-learn we have the sample_weight option that allows us to pass an array to balance the class categories, by calling it like this: rf.fit(X,Y,sample_weight=[10 10 10 ...1 1 10 ]) I am wondering whether an equivalent exists inside the ml or mllib classes??? If yes, may I ask for a reference or an example? thx for
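RandomForestClassifier did not accept a weightCol until Spark 3.0, so on 1.6 there is no direct sample_weight equivalent in its fit. Two common workarounds are resampling (df.sampleBy) or computing per-class weights for estimators that do take weightCol, such as LogisticRegression. A sketch with hypothetical names:

```python
from collections import Counter

def class_weights(labels):
    """Pure Python: inverse-frequency weights, similar in spirit to
    scikit-learn's class_weight='balanced' heuristic."""
    counts = Counter(labels)
    total = float(len(labels))
    return {c: total / (len(counts) * n) for c, n in counts.items()}

def add_weight_column(df, label_col, weights):
    """Attach a 'weight' column computed from the class of each row, for
    estimators that support weightCol (e.g. LogisticRegression in 1.6);
    RandomForestClassifier itself ignores weights before Spark 3.0."""
    from pyspark.sql import functions as F
    from pyspark.sql.types import DoubleType
    to_weight = F.udf(lambda l: float(weights[l]), DoubleType())
    return df.withColumn("weight", to_weight(F.col(label_col)))
```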

Normalize column items for OneHotEncoder

2017-05-04 Thread issues solution
Hi, I have 3 data frames that do not have the same items inside the labeled column, I mean: data frame 1, collabled: a b c; data frame 2, collabled: a w z. When I encode the first data frame I get collabled | a b c; a | 1 0 0; b | 0 1 0; c
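One way to keep the encodings aligned is to fit a single StringIndexer on the union of the frames and reuse the fitted model everywhere; a sketch with hypothetical names (`unionAll` is the 1.6 spelling, renamed `union` in 2.0):

```python
def fit_shared_indexer(df1, df2, col_name):
    """Fit ONE StringIndexer over both frames so a, b, c, w, z all get stable
    indices, then transform each frame with the same model; a OneHotEncoder
    applied afterwards will then produce compatible columns."""
    from pyspark.ml.feature import StringIndexer
    union = df1.select(col_name).unionAll(df2.select(col_name))
    model = StringIndexer(inputCol=col_name,
                          outputCol=col_name + "_idx").fit(union)
    return model.transform(df1), model.transform(df2)
```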

Create multiple columns in PySpark in one shot

2017-05-04 Thread issues solution
Hi, how can we create multiple columns iteratively? I mean, how can you create empty columns inside a loop? Because with for i in listl : df = df.withColumn(i,F.lit(0)) we get a stack overflow. How can we do that over a list of columns, like df.select([F.col(i).lit(0) for i in
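The loop version grows the logical plan by one projection per withColumn call, which is what eventually overflows the stack; a single select adds them all at once. A minimal sketch (names hypothetical):

```python
def add_constant_columns(df, names, value=0):
    """Add many constant columns in ONE select instead of a withColumn loop:
    one projection node in the plan instead of len(names) nested ones."""
    from pyspark.sql import functions as F
    return df.select("*", *[F.lit(value).alias(n) for n in names])
```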

spark 1.6.0 and GridSearchCV

2017-05-03 Thread issues solution
Hi, I wonder whether we have a method under PySpark 1.6 to perform GridSearchCV? If yes, may I ask for an example please. thx
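PySpark 1.6 does ship a grid-search equivalent: ParamGridBuilder plus CrossValidator in pyspark.ml.tuning. A minimal sketch assuming a DataFrame with "features" and "label" columns (`train_df` is hypothetical):

```python
def grid_search(train_df):
    """The PySpark 1.6 counterpart of GridSearchCV: enumerate a parameter
    grid with ParamGridBuilder and let CrossValidator pick the best model."""
    from pyspark.ml.classification import RandomForestClassifier
    from pyspark.ml.evaluation import BinaryClassificationEvaluator
    from pyspark.ml.tuning import ParamGridBuilder, CrossValidator

    rf = RandomForestClassifier(labelCol="label", featuresCol="features")
    grid = (ParamGridBuilder()
            .addGrid(rf.numTrees, [20, 50])
            .addGrid(rf.maxDepth, [5, 10])
            .build())
    cv = CrossValidator(estimator=rf, estimatorParamMaps=grid,
                        evaluator=BinaryClassificationEvaluator(),
                        numFolds=3)
    return cv.fit(train_df)   # CrossValidatorModel; .bestModel is the winner
```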

Re: java.lang.java.lang.UnsupportedOperationException

2017-04-19 Thread issues solution
PySpark 1.6 on Cloudera 5.5 (YARN) 2017-04-19 13:42 GMT+02:00 issues solution <issues.solut...@gmail.com>: > Hi, > can someone tell me why I get the following error with a udf applied like this: > > def replaceCempty(x): > if x is None : > return "&q

java.lang.java.lang.UnsupportedOperationException

2017-04-19 Thread issues solution
Hi, can someone tell me why I get the following error when applying a udf like this: def replaceCempty(x): if x is None : return "" else : return x.encode('utf-8') udf_replaceCempty = F.udf(replaceCempty,StringType()) dfTotaleNormalize53 = dfTotaleNormalize52.select([i if i
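In this case the Python function itself is usually fine; "Cannot evaluate expression: PythonUDF#..." tends to mean the UDF's unevaluated result ended up in a plan node that cannot run Python in Spark 1.6 (a join condition, certain filters, and similar spots). The usual fix is to materialize the UDF output as an ordinary column in its own projection first; a sketch with hypothetical names:

```python
def replace_cempty(x):
    """The null-safe encoder from the thread (Spark passes None for SQL NULL)."""
    return "" if x is None else x.encode("utf-8")

def materialize_udf_column(df, col_name):
    """Apply the UDF in its own select/withColumn step so downstream operators
    see a plain column rather than an unevaluated PythonUDF expression."""
    from pyspark.sql import functions as F
    from pyspark.sql.types import StringType
    u = F.udf(replace_cempty, StringType())
    return df.withColumn(col_name, u(F.col(col_name)))
```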

create a column with a map function applied to a dataframe

2017-04-14 Thread issues solution
Hi, how can you create a column inside a map function, like df.map(lambda l : len(l)), but instead of returning an rdd, create the column inside the data frame?
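df.map returns an RDD because map is an RDD operation; to keep the result inside the DataFrame, express it as a column. For length specifically there has been a built-in since Spark 1.5, so no UDF is needed; a sketch with hypothetical names:

```python
def add_length_column(df, col_name):
    """withColumn with pyspark.sql.functions.length keeps everything in the
    DataFrame API, unlike df.map(lambda l: len(l)) which yields an RDD."""
    from pyspark.sql import functions as F
    return df.withColumn(col_name + "_len", F.length(F.col(col_name)))
```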

checkpoint

2017-04-14 Thread issues solution
Hi, can someone give me a complete example of working with checkpoint under PySpark 1.6? thx regards

how to master cache and checkpoint for pyspark

2017-04-13 Thread issues solution
hi, can I ask you for a complete example where you apply udfs multiple times, one after another, and cache your data frame after that, or checkpoint the dataframe at the appropriate steps (cache or checkpoint)? thanks
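A sketch of the pattern, assuming PySpark 1.6: DataFrame.checkpoint only arrived in 2.1, so writing Parquet and reading it back is the usual stand-in for checkpointing there (all names hypothetical):

```python
def apply_udf_stages(df, sqlContext, stages, tmp_dir, break_every=10):
    """Apply a chain of UDF stages (each stage: DataFrame -> DataFrame), and
    every `break_every` stages materialize to Parquet and reload.  That cuts
    the lineage the way checkpoint() would and avoids StackOverflowError on
    long chains.  Use df.cache() instead when you only need reuse of an
    intermediate result, not lineage truncation."""
    for i, stage in enumerate(stages, 1):
        df = stage(df)
        if i % break_every == 0:
            path = "{0}/stage_{1}".format(tmp_dir, i)
            df.write.mode("overwrite").parquet(path)
            df = sqlContext.read.parquet(path)   # fresh plan, short lineage
    return df
```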

Number of column in data frame

2017-04-13 Thread issues solution
Hi, what is the number of columns that Spark can handle without fuss? regards

How to correct code after java.lang.stackoverflow

2017-04-13 Thread issues solution
Hi, I wonder if we have a solution to correct code after getting a StackOverflowError. I mean, you have df <- transformation 1, df <- transformation 2, df <- transformation 3, df <- transformation 4 ... df <- transformation n, and then at df <- transformation n+1 you get the stack overflow error. How

checkpoint: how to use checkpoint correctly with udf

2017-04-13 Thread issues solution
Hi, can someone explain to me how to use checkpoint in PYSPARK, not in Scala? Because I have a lot of udfs to apply to a large data frame and I don't understand how I can use checkpoint to break the lineage and prevent java.lang.StackOverflowError. regards

why can't we apply a udf on an rdd ???

2017-04-13 Thread issues solution
hi, what is the origin of this error ??? java.lang.UnsupportedOperationException: Cannot evaluate expression: PythonUDF#Grappra(input[410, StringType]) regards

checkpoint

2017-04-13 Thread issues solution
Hi, I am new to Spark and I want to ask what is wrong with checkpoint on PySpark 1.6.0. I don't understand what happens after I try to use it on a dataframe: dfTotaleNormalize24 = dfTotaleNormalize23.select([i if i not in listrapcot else udf_Grappra(F.col(i)).alias(i) for i in