Re: FPGrowth Model is taking too long to generate frequent item sets

2017-03-07 Thread Eli Super
On Tue, Mar 7, 2017 at 3:39 AM, Raju Bairishetti <r...@apache.org> wrote: > @Eli, thanks for the suggestion. If you do not mind, can you please elaborate on the approaches? > On Mon, Mar 6, 2017 at 7:29 PM, Eli Super <eli.su...@gmail.com> wrote: >> Hi >> Try to implement...

Re: FPGrowth Model is taking too long to generate frequent item sets

2017-03-06 Thread Eli Super
Hi, try to implement binning and/or feature engineering (smart feature selection, for example). Good luck. On Mon, Mar 6, 2017 at 6:56 AM, Raju Bairishetti wrote: > Hi, I am new to Spark MLlib. I am using the FPGrowth model for finding related items. > Number of transactions...
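[Editor's sketch, not from the thread: one way the "binning" suggestion translates into PySpark's RDD-based FPGrowth. The file path, separator, minSupport value and partition count are illustrative assumptions; the right values depend on the poster's data.]

from pyspark.mllib.fpm import FPGrowth

# Transactions as lists of items; deduplicate items within each transaction,
# since FPGrowth requires unique items per basket.
transactions = (sc.textFile("transactions.txt")
                  .map(lambda line: line.strip().split(" "))
                  .map(lambda items: list(set(items))))

# Fewer distinct items (e.g. bucketed values instead of raw ones) and a higher
# minSupport shrink the candidate itemset space, which is usually what makes
# FPGrowth tractable on large inputs.
model = FPGrowth.train(transactions, minSupport=0.05, numPartitions=10)

for itemset in model.freqItemsets().take(10):
    print(itemset.items, itemset.freq)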

I'm trying to understand how to compile Spark

2016-07-19 Thread Eli Super
Hi, I have a Windows laptop and I just downloaded the Spark 1.4.1 source code. I am trying to compile *org.apache.spark.mllib.fpm* with *mvn*. My goal is to replace the *original* org\apache\spark\mllib\fpm\* classes in *spark-assembly-1.4.1-hadoop2.6.0.jar*. As I understand from this link...

Graphical representation of Spark Decision Tree . How to do it ?

2016-04-12 Thread Eli Super
Hi Spark users, I need your help. I have some output after running DecisionTree. I work with a Jupyter notebook and Python 2.7. How can I create a graphical representation of the Decision Tree model? In sklearn I can use tree.export_graphviz, and in R I can see the Decision Tree output as well.
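[Editor's sketch, assuming the RDD-based MLlib API of that era: there is no built-in export_graphviz equivalent, but toDebugString() returns the whole tree as text, which can be printed in a Jupyter cell or parsed into a DOT file by hand. The data path and parameters below are made up for illustration.]

from pyspark.mllib.tree import DecisionTree
from pyspark.mllib.util import MLUtils

# Hypothetical LIBSVM-formatted training data
data = MLUtils.loadLibSVMFile(sc, "data/sample_libsvm_data.txt")

model = DecisionTree.trainClassifier(data, numClasses=2,
                                     categoricalFeaturesInfo={},
                                     impurity='gini', maxDepth=5)

# Text (if/else) representation of the fitted tree
print(model.toDebugString())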

Is there a way to save csv file fast ?

2016-02-10 Thread Eli Super
Hi, I work with PySpark and Spark 1.5.2. Currently, saving an RDD into a CSV file is very slow and uses only 2% CPU. I use: my_dd.write.format("com.databricks.spark.csv").option("header", "false").save('file:///my_folder') Is there a way to save CSV faster? Many thanks
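[Editor's sketch, not from the thread: a single writer stuck at ~2% CPU often means the data has collapsed into one partition, so only one task does the CSV serialization. Repartitioning before the write lets several tasks write part-files in parallel. The partition count below is an arbitrary example.]

my_dd.repartition(16) \
    .write.format("com.databricks.spark.csv") \
    .option("header", "false") \
    .save('file:///my_folder')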

ZlibFactor warning

2016-01-27 Thread Eli Super
Hi, I'm running Spark locally on a Windows 2012 R2 server, with no Hadoop installed. I'm getting the following warning: *WARN ZlibFactory: Failed to load/initialize native-zlib library* Is it something to worry about? Thanks!

Re: How to discretize Continuous Variable with Spark DataFrames

2016-01-25 Thread Eli Super
"Bucketizer transforms a column of continuous features to a > column of feature buckets, where the buckets are specified by users." > > [1]: http://spark.apache.org/docs/latest/ml-features.html#bucketizer > > On Mon, Jan 25, 2016 at 5:34 AM, Eli Super <eli.su...@gmail.com&

SparkSQL : "select non null values from column"

2016-01-25 Thread Eli Super
Hi, I am trying to select all non-NULL values from a column that contains NULL values, with sqlContext.sql("select my_column from my_table where my_column <> null ").show(15) or sqlContext.sql("select my_column from my_table where my_column != null ").show(15) but I get an empty result. Thanks!
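[Editor's note, based on standard SQL NULL semantics rather than this thread: any comparison with NULL evaluates to NULL, so "<> null" and "!= null" never match a row. The usual fix is IS NOT NULL.]

sqlContext.sql("select my_column from my_table where my_column IS NOT NULL").show(15)

# or the equivalent DataFrame API call:
# df.filter(df.my_column.isNotNull()).select("my_column").show(15)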

How to discretize Continuous Variable with Spark DataFrames

2016-01-25 Thread Eli Super
Hi, what is the best way to discretize a continuous variable within Spark DataFrames? I want to discretize some variable 1) by equal frequency, 2) by k-means. I usually use R for this purpose: http://www.inside-r.org/packages/cran/arules/docs/discretize R code, for example: ### equal frequency
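[Editor's sketch of equal-frequency binning that works on older Spark versions: compute quantile split points on (a sample of) the column with numpy, then feed them to Bucketizer. Newer Spark releases also ship QuantileDiscretizer, which does this in one step, and k-means binning could be built similarly with pyspark.ml.clustering.KMeans. Column names and fractions here are assumptions.]

import numpy as np
from pyspark.ml.feature import Bucketizer

# Sample the column to keep the quantile computation cheap
values = df.select("x").rdd.map(lambda r: r[0]).sample(False, 0.1).collect()

# 4 equal-frequency bins; assumes the quantiles are distinct values,
# since Bucketizer requires strictly increasing splits
quantiles = np.percentile(values, [25, 50, 75]).tolist()
splits = [-float("inf")] + quantiles + [float("inf")]

binned = Bucketizer(splits=splits, inputCol="x", outputCol="x_bin").transform(df)
binned.show(5)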

Re: Spark SQL . How to enlarge output rows ?

2016-01-24 Thread Eli Super
> And yeah that looks like Python – I'm not hot with Python but it may be capitalised as False or FALSE? > *From:* Eli Super [mailto:eli.su...@gmail.com] *Sent:* 21 January 2016 14:48 *To:* Spencer, Alex (Santander) *Cc:* user@spark.apa...
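[Editor's note: in Python the boolean is capitalised as False, so the show() call discussed in this thread would look like the sketch below; the query is just an example.]

sqlContext.sql("select day_time from my_table limit 10").show(10, False)
# equivalently: .show(n=10, truncate=False)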

question about query SparkSQL

2016-01-21 Thread Eli Super
Hi, I am trying to save parts of a large table as CSV files. I use the following commands: sqlContext.sql("select * from my_table where trans_time between '2015/12/18 12:00' and '2015/12/18 12:06'").write.format("com.databricks.spark.csv").option("header", "false").save('00_06') and sqlContext.sql("select...

cast column string -> timestamp in Parquet file

2016-01-21 Thread Eli Super
Hi, I have a large Parquet file. I need to cast a whole column from string to timestamp and then save the result. What is the right way to do it? Thanks a lot
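[Editor's sketch, assuming the column holds strings like "2015/12/15 15:52:10"; the paths, column name and format pattern are illustrative assumptions.]

from pyspark.sql.functions import unix_timestamp, col

df = sqlContext.read.parquet("file:///path/to/input.parquet")

# Parse the string column, cast to timestamp, and overwrite the column in place
df2 = df.withColumn(
    "day_time",
    unix_timestamp(col("day_time"), "yyyy/MM/dd HH:mm:ss").cast("timestamp"))

df2.write.parquet("file:///path/to/output.parquet")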

Re: Spark SQL . How to enlarge output rows ?

2016-01-21 Thread Eli Super
(int numRows, boolean truncate) > Kind regards, Alex. > *From:* Eli Super [mailto:eli.su...@gmail.com] *Sent:* 14 January 2016 13:09 *To:* user@spark.apache.org *Subject:* Spark SQL. How to enlarge output rows?

Re: a lot of warnings when build spark 1.6.0

2016-01-20 Thread Eli Super
I just build a local Spark, only with the CSV package and the Thrift server. What Hadoop version should I use to avoid warnings? Thanks a lot! On Thu, Jan 21, 2016 at 9:08 AM, Eli Super <eli.su...@gmail.com> wrote: > Hi, I get WARNINGS when I try to build Spark 1.6.0. Overall I get S...

a lot of warnings when build spark 1.6.0

2016-01-20 Thread Eli Super
Hi, I get WARNINGS when I try to build Spark 1.6.0; overall I get a SUCCESS message on all projects. The command I used: mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Dscala-2.10 -Phive -Phive-thriftserver -DskipTests clean package From pom.xml: 2.10.5 / 2.10. Example of warnings: [INFO]...

Spark SQL . How to enlarge output rows ?

2016-01-14 Thread Eli Super
Hi, after executing sqlContext.sql("select day_time from my_table limit 10").show() my output looks like:

+--------------------+
|            day_time|
+--------------------+
|2015/12/15 15:52:...|
|2015/12/15 15:53:...|
|2015/12/15 15:52:...|
|2015/12/15 15:52:...|
|2015/12/15 15:52:...|