from:"ndjido"

Re: unit testing in spark

2016-12-08 Thread ndjido

Hi Pseudo, Just use unittest: https://docs.python.org/2/library/unittest.html > On 8 Dec 2016, at 19:14, pseudo oduesp wrote: > > Can someone tell me how I can write unit tests for PySpark? > (book, tutorial ...)
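A minimal sketch of the unittest suggestion, not taken from the thread. It assumes the per-record logic of a job lives in a plain function (here a hypothetical `parse_record`) so that it can be tested without starting a SparkContext; the function and test names are illustrative only.

```python
import unittest


def parse_record(line):
    """Hypothetical helper: turn a 'key,value' line into a (key, int) pair.

    In a PySpark job this kind of function would typically be passed to
    rdd.map(); keeping it free of Spark objects makes it easy to test.
    """
    key, value = line.split(",")
    return key.strip(), int(value)


class ParseRecordTest(unittest.TestCase):
    def test_parses_key_and_value(self):
        self.assertEqual(parse_record("a, 1"), ("a", 1))

    def test_rejects_non_numeric_value(self):
        with self.assertRaises(ValueError):
            parse_record("a,x")


def run_tests():
    """Run the suite programmatically and return the unittest result."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ParseRecordTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_tests()
```

Tests that need a real DataFrame or RDD would additionally have to create a local Spark session in `setUp`; that part is left out here because it depends on the Spark version in use.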

Re: how to generate a column using mapParition and then add it back to the df?

2016-08-08 Thread ndjido

Hi MoTao, What about broadcasting the model? Cheers, Ndjido. > On 08 Aug 2016, at 11:00, MoTao <mo...@sensetime.com> wrote: > > Hi all, > > I'm trying to append a column to a df. > I understand that the new column must be created by > 1) using literals, > 2) t

Re: add spark-csv jar to ipython notbook without packages flags

2016-07-25 Thread Ndjido Ardo BAR

Hi Pseudo, try this: export SPARK_SUBMIT_OPTIONS="--jars spark-csv_2.10-1.4.0.jar,commons-csv-1.1.jar" This has been working for me for a long time ;-) both in Zeppelin (for Spark Scala) and IPython Notebook (for PySpark). cheers, Ardo On Mon, Jul 25, 2016 at 1:28 PM, pseudo oduesp

Re: lift coefficien

2016-07-22 Thread ndjido

Just apply the formula Lift = Recall / Support with respect to a given threshold on your population distribution. The computation is quite straightforward. Cheers, Ardo > On 20 Jul 2016, at 15:05, pseudo oduesp wrote: > > Hi , > how we can calculate lift coeff from
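A small worked sketch of the formula in plain Python, under one stated assumption: "support" is taken to mean the fraction of the whole population scored at or above the threshold. The function name and the sample scores are made up for illustration and do not come from the thread.

```python
def lift_at_threshold(scores, labels, threshold):
    """Return recall / support for the records scored at or above threshold.

    Assumption: support = (number of selected records) / (population size).
    labels are 0/1 ground-truth values aligned with scores.
    """
    population = len(labels)
    positives = sum(labels)
    selected = [label for score, label in zip(scores, labels)
                if score >= threshold]
    if population == 0 or positives == 0 or not selected:
        raise ValueError("lift is undefined for this input")
    recall = sum(selected) / positives      # share of all positives captured
    support = len(selected) / population    # share of population selected
    return recall / support


# Made-up example: 8 records, 3 of them positive.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 0]
lift = lift_at_threshold(scores, labels, 0.5)
```

With these numbers the threshold 0.5 selects 3 of 8 records and captures 2 of 3 positives, so the lift is (2/3) / (3/8) = 16/9. Selecting the whole population always gives a lift of 1.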

Re: add multiple columns

2016-06-26 Thread ndjido

Hi! I'm afraid you have to loop... Updating the logical plan is getting faster in Spark. Cheers, Ardo. Sent from my iPhone > On 26 Jun 2016, at 14:20, pseudo oduesp wrote: > > Hi, how can I add multiple columns to a data frame? > > withColumn allows adding one

Re: Labeledpoint

2016-06-21 Thread Ndjido Ardo BAR

To answer your question more accurately, the model.fit(df) method takes in a DataFrame of Row(label=double, features=Vectors.dense([...])). cheers, Ardo. On Tue, Jun 21, 2016 at 6:44 PM, Ndjido Ardo BAR <ndj...@gmail.com> wrote: > Hi, > > You can use a RDD of LabelPoints to

Re: Labeledpoint

2016-06-21 Thread Ndjido Ardo BAR

Hi, You can use an RDD of LabeledPoints to fit your model. Check the docs for more examples: http://spark.apache.org/docs/latest/api/python/pyspark.ml.html?highlight=transform#pyspark.ml.classification.RandomForestClassificationModel.transform cheers, Ardo. On Tue, Jun 21, 2016 at 6:12 PM, pseudo

Re: H2O + Spark Streaming?

2016-05-05 Thread ndjido

Sure! Check the following working example : https://github.com/h2oai/qcon2015/tree/master/05-spark-streaming/ask-craig-streaming-app Cheers. Ardo Sent from my iPhone > On 05 May 2016, at 17:26, diplomatic Guru wrote: > > Hello all, I was wondering if it is

Re: Mllib using model to predict probability

2016-05-05 Thread ndjido

You can use the BinaryClassificationEvaluator class to get both predicted classes (0/1) and probabilities. Check the following Spark doc: https://spark.apache.org/docs/latest/mllib-evaluation-metrics.html Cheers, Ardo Sent from my iPhone > On 05 May 2016, at 07:59, colin

Re: prefix column Spark

2016-04-19 Thread Ndjido Ardo BAR

This can help:

    import org.apache.spark.sql.DataFrame

    def prefixDf(dataFrame: DataFrame, prefix: String): DataFrame = {
      val colNames = dataFrame.columns
      colNames.foldLeft(dataFrame) { (df, colName) =>
        df.withColumnRenamed(colName, s"${prefix}_${colName}")
      }
    }

Re: Calling Python code from Scala

2016-04-18 Thread Ndjido Ardo BAR

Hi Didier, I think with PySpark you can wrap your legacy Python functions into UDFs and use them in your DataFrames. But you have to use DataFrames instead of RDDs. cheers, Ardo On Mon, Apr 18, 2016 at 7:13 PM, didmar wrote: > Hi, > > I have a Spark project in Scala and I

Re: How to estimate the size of dataframe using pyspark?

2016-04-09 Thread Ndjido Ardo BAR

What's the size of your driver? On Sat, 9 Apr 2016 at 20:33, Buntu Dev wrote: > Actually, df.show() works displaying 20 rows but df.count() is the one > which is causing the driver to run out of memory. There are just 3 INT > columns. > > Any idea what could be the reason? >

Re: Sample project on Image Processing

2016-02-22 Thread ndjido

Hi folks, KeystoneML has some image processing features: http://keystone-ml.org/examples.html Cheers, Ardo Sent from my iPhone > On 22 Feb 2016, at 14:34, Sainath Palla wrote: > > Here is one simple example of Image classification in Java. > >

Re: Pyspark - How to add new column to dataframe based on existing column value

2016-02-10 Thread ndjido

Hi Viktor, Try to create a UDF. It's quite simple! Ardo. > On 10 Feb 2016, at 10:34, Viktor ARDELEAN wrote: > > Hello, > > I want to add a new String column to the dataframe based on an existing > column values: > > from pyspark.sql.functions import lit >

Issue with spark-shell in yarn mode

2016-01-26 Thread ndjido

Hi folks, On Spark 1.6.0, I submitted 2 lines of code via spark-shell in Yarn-client mode: 1) sc.parallelize(Array(1,2,3,3,3,3,4)).collect() 2) sc.parallelize(Array(1,2,3,3,3,3,4)).map( x => (x, 1)).collect() 1) works well whereas 2) raises the following exception: Driver stacktrace:

Re: GLM I'm ml pipeline

2016-01-03 Thread ndjido

KeystoneML could be an alternative. Ardo. > On 03 Jan 2016, at 15:50, Arunkumar Pillai wrote: > > Is there any road map for glm in pipeline?

Re: Can't filter

2015-12-10 Thread Ndjido Ardo Bar

Please send your call stack with the full description of the exception . > On 10 Dec 2015, at 12:10, Бобров Виктор wrote: > > Hi, I can’t filter my rdd. > > def filter1(tp: ((Array[String], Int), (Array[String], Int))): Boolean= { > tp._1._2 > tp._2._2 > } > val mail_rdd =

Re: RDD functions

2015-12-04 Thread Ndjido Ardo BAR

Hi Michal, I think the following link could interest you. You'll find a lot of examples there! http://homepage.cs.latrobe.edu.au/zhe/ZhenHeSparkRDDAPIExamples.html cheers, Ardo On Fri, Dec 4, 2015 at 2:31 PM, Michal Klos wrote: >

Re: Grid search with Random Forest

2015-12-01 Thread Ndjido Ardo BAR

Thanks for the clarification. I'll test that and give you feedback. Ndjido On Tue, 1 Dec 2015 at 19:29, Joseph Bradley <jos...@databricks.com> wrote: > You can do grid search if you set the evaluator to a > MulticlassClassificationEvaluator, which expects a predi

Re: Grid search with Random Forest

2015-11-30 Thread Ndjido Ardo BAR

hould work with 1.5+. > > On Thu, Nov 26, 2015 at 12:53 PM, Ndjido Ardo Bar <ndj...@gmail.com> > wrote: > >> >> Hi folks, >> >> Does anyone know whether the Grid Search capability is enabled since the >> issue spark-9011 of version 1.4.0 ? I'm

Re: Grid search with Random Forest

2015-11-30 Thread Ndjido Ardo BAR

Hi Benjamin, Thanks, the documentation you sent is clear. Is there any other way to perform a Grid Search with GBT? Ndjido On Tue, 1 Dec 2015 at 08:32, Benjamin Fradet <benjamin.fra...@gmail.com> wrote: > Hi Ndjido, > > This is because GBTClassifier doesn't yet have a rawPre

Re: Debug Spark

2015-11-29 Thread Ndjido Ardo BAR

ek Galstyan > > Նարեկ Գալստյան > > On 29 November 2015 at 20:51, Ndjido Ardo BAR <ndj...@gmail.com> wrote: > >> Masf, the following link sets the basics to start debugging your spark >> apps in local mode: >> >> >> https://medium.com/large-scal

Re: Debug Spark

2015-11-29 Thread Ndjido Ardo BAR

com> wrote: > Hi Ardo > > > Some tutorial to debug with Intellij? > > Thanks > > Regards. > Miguel. > > > On Sun, Nov 29, 2015 at 5:32 PM, Ndjido Ardo BAR <ndj...@gmail.com> wrote: > >> hi, >> >> IntelliJ is just great for that! >&

Re: Debug Spark

2015-11-29 Thread Ndjido Ardo BAR

hi, IntelliJ is just great for that! cheers, Ardo. On Sun, Nov 29, 2015 at 5:18 PM, Masf wrote: > Hi > > Is it possible to debug spark locally with IntelliJ or another IDE? > > Thanks > > -- > Regards. > Miguel Ángel >

Grid search with Random Forest

2015-11-26 Thread Ndjido Ardo Bar

Hi folks, Does anyone know whether the Grid Search capability has been enabled since issue SPARK-9011 in version 1.4.0? I'm getting the "rawPredictionCol column doesn't exist" error when trying to perform a grid search with Spark 1.4.0. Cheers, Ardo

Re: can I use Spark as alternative for gem fire cache ?

2015-10-17 Thread Ndjido Ardo Bar

Hi Kali, If I understand you correctly, Tachyon (http://tachyon-project.org) can be a good alternative. You can use the Spark API to load and persist data into Tachyon. Hope that helps. Ardo > On 17 Oct 2015, at 15:28, "kali.tumm...@gmail.com" > wrote: > > Hi All, >

Re: Scala api end points

2015-09-24 Thread Ndjido Ardo BAR

Hi Masoom Alam, I successfully experimented with the following project on GitHub: https://github.com/erisa85/WikiSparkJobServer . I do recommend it to you. cheers, Ardo. On Thu, Sep 24, 2015 at 5:20 PM, masoom alam wrote: > Hi everyone > > I am new to Scala. I have a

Re: Small File to HDFS

2015-09-03 Thread Ndjido Ardo Bar

Hi Nibiau, HBase seems to be a good solution to your problems. As you may know, storing your messages as key-value pairs in HBase saves you the overhead of manually resizing blocks of data using zip files. The added advantage, along with the fact that HBase uses HDFS for storage, is the

28 matches
