Data Format for Running Collaborative Filtering in Spark MLlib

2016-10-03 Thread Baktaawar
Hi, I am working on building a recommender system on learning-content data. My data format is a user-item matrix of view counts, similar to the one below
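MLlib's ALS trainer expects long-format (user, item, rating) triples rather than a wide user-item matrix. A minimal sketch of the reshaping step using pandas — the IDs, item names, and counts below are illustrative, not taken from the thread:

```python
import pandas as pd

# Illustrative wide user-item view matrix (users as rows, items as
# columns); the IDs and counts are made up, not from the thread.
wide = pd.DataFrame(
    {"item_a": [3, 0], "item_b": [0, 5]},
    index=pd.Index([101, 102], name="user_id"),
)

# ALS wants (user, item, rating) triples, so melt the wide matrix
# into long format and drop the zero (unobserved) cells.
ratings = wide.reset_index().melt(
    id_vars="user_id", var_name="item_id", value_name="views"
)
ratings = ratings[ratings["views"] > 0].reset_index(drop=True)
print(ratings)
```

The resulting triples can then be mapped into whatever rating type the chosen ALS API takes; for implicit view counts, implicit-feedback ALS is usually the better fit than treating counts as explicit ratings.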

Re: Spark Java Heap Error

2016-09-13 Thread Baktaawar
These are the settings I have:

# Example:
# spark.master              spark://master:7077
# spark.eventLog.enabled    true
# spark.eventLog.dir        hdfs://namenode:8021/directory
# spark.serializer          org.apache.spark.serializer.KryoSerializer
spark.driver.memory
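For context, a filled-in version of that memory section might look like the sketch below. The property names (`spark.driver.memory`, `spark.executor.memory`, `spark.serializer`) are standard Spark configuration keys, but the values are illustrative assumptions, not the poster's actual settings:

```
spark.driver.memory     6g
spark.executor.memory   2g
spark.serializer        org.apache.spark.serializer.KryoSerializer
```

In local mode the driver and executor share one JVM, so `spark.driver.memory` is the setting that matters most there, and it must be set before the JVM starts (in spark-defaults.conf or on the spark-submit command line, not after the SparkContext exists).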

Re: Spark Java Heap Error

2016-09-13 Thread Baktaawar
The data set is not big: 56K x 9K. It does have long strings as column names. It fits very easily in Pandas, which is also an in-memory tool, so I am not sure memory is the issue here. If Pandas can fit it easily and work on it fast, then Spark shouldn't have problems either, right? On
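Fitting in pandas does not guarantee fitting in Spark's driver JVM: object headers, string columns, serialization buffers, and the crosstab result all add overhead on top of the raw data. A back-of-envelope sizing sketch (the dense 8-byte-double layout is an assumption; the real data may be sparse, and JVM strings cost more):

```python
# Rough sizing for the 56K x 9K matrix from the message above,
# assuming dense 8-byte doubles (an assumption; the real data may be
# sparse, and string columns cost considerably more on the JVM).
rows, cols = 56_000, 9_000
dense_bytes = rows * cols * 8
print(f"{dense_bytes / 1024**3:.2f} GiB")  # ~3.76 GiB before any JVM overhead
```

A few GiB of raw data can plausibly double or worse once materialized as JVM objects, which is why a heap that comfortably holds the pandas copy can still hit OOM in Spark.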

Re: Spark Java Heap Error

2016-09-13 Thread Baktaawar
I put driver memory as 6 GB instead of 8 (half of 16). But does 2 GB make this difference? On Tuesday, September 13, 2016, neil90 [via Apache Spark User List] < ml-node+s1001560n27704...@n3.nabble.com> wrote: > Double check your Driver Memory in your Spark Web UI, make sure the driver > Memory is

Re: Spark Java Heap Error

2016-09-12 Thread Baktaawar
Hi, I even tried dataframe.cache() before carrying out the crosstab transformation. However, I still get the same OOM error. recommender_ct.cache() --- Py4JJavaError Traceback (most recent
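Caching does not shrink the result: a crosstab's width is the number of distinct values in its second column, so with a catalog of thousands of items the crosstab itself is enormous no matter what is cached. A small pandas sketch of the shape behavior (pd.crosstab mirrors PySpark's DataFrame.crosstab here; toy data, illustrative only):

```python
import pandas as pd

# pd.crosstab, like PySpark's DataFrame.crosstab, yields one output
# column per distinct value of the second argument, so caching the
# input cannot help if the crosstab output itself is too wide.
df = pd.DataFrame({"user": [1, 1, 2], "item": ["a", "b", "a"]})
ct = pd.crosstab(df["user"], df["item"])
print(ct.shape)  # one row per distinct user, one column per distinct item
```

For a recommender pipeline, building (user, item, count) triples with a groupBy/count and feeding those to ALS avoids materializing the wide crosstab at all.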

Re: Spark Java Heap Error

2016-09-09 Thread Baktaawar
Hi, thanks, I tried that but got an OOM error again. I am not sure what to do now. For spark.driver.maxResultSize I kept 2g; the rest I did as mentioned above, 16 GB for the driver and 2 GB for the executor. I have a 16 GB Mac. Please help. I am very delayed on my work because of this and not able to move ahead.
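As an aside on the sizing itself: giving the driver all 16 GB of a 16 GB machine leaves nothing for the OS, Python, and JVM off-heap overhead, which can itself cause failures. One hedged split, keeping spark.driver.maxResultSize below spark.driver.memory so a collected result cannot exhaust the heap — the values below are illustrative assumptions, not a recommendation from the thread:

```
spark.driver.memory        8g
spark.driver.maxResultSize 4g
```

If collect-like operations still overflow with such headroom, the usual next step is to avoid collecting the wide intermediate result at all rather than to raise the limits further.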