Re: Spark UI crashes on Large Workloads

2017-07-18 Thread Saatvik Shah
I solved many issues just by sizing the app, that is, I would first check memory size, CPU allocations and so on.

Best,

On Tue, Jul 18, 2017 at 3:30 PM, Saatvik Shah <saatvikshah1...@gmail.com> wrote:
> Hi Riccardo,
> Yes, thanks for suggesting ...

View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-UI-crashes-on-Large-Workloads-tp28873.html
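
The sizing advice above is about driver/executor resources rather than anything UI-specific. A minimal sketch of where those knobs live in PySpark, assuming a fresh session; the figures are placeholders rather than values from this thread, and in client mode driver memory has to be set before the JVM starts (e.g. via --driver-memory on spark-submit):

    from pyspark.sql import SparkSession

    # Placeholder sizing values -- tune to the actual cluster and workload.
    spark = (
        SparkSession.builder
        .appName("large-workload")
        .config("spark.executor.memory", "8g")     # heap per executor
        .config("spark.executor.cores", "4")       # CPU allocation per executor
        .config("spark.ui.retainedJobs", "200")    # limit UI bookkeeping on long-running jobs
        .config("spark.ui.retainedStages", "200")
        .getOrCreate()
    )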

Re: PySpark working with Generators

2017-07-05 Thread Saatvik Shah
... and Regards,
Saatvik Shah

On Fri, Jun 30, 2017 at 10:16 AM, Jörn Franke <jornfra...@gmail.com> wrote:
> In this case I do not see so many benefits of using Spark. Is the data volume high?
> Alternatively, I recommend converting the proprietary format into a format Spark understands ...
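
A short sketch of that one-time conversion; `df` below is a stand-in for whatever DataFrame the custom reader produces from the proprietary files, and the paths are illustrative:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("convert-once").getOrCreate()

    # Stand-in for the DataFrame built from the proprietary files.
    df = spark.range(10)

    # Write the parsed data once in a format Spark reads natively ...
    df.write.mode("overwrite").parquet("hdfs:///data/converted/")

    # ... so later jobs can skip the custom parsing entirely.
    converted = spark.read.parquet("hdfs:///data/converted/")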

Re: PySpark working with Generators

2017-06-30 Thread Saatvik Shah
... Saatvik Shah

On Fri, Jun 30, 2017 at 12:50 AM, Mahesh Sawaiker <mahesh_sawai...@persistent.com> wrote:
> Wouldn't this work if you load the files into HDFS and let the partitions be equal to the amount of parallelism you want?
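
A hedged sketch of that suggestion, assuming the files are already in HDFS; the partition counts are placeholders for whatever parallelism is wanted:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("load-with-parallelism").getOrCreate()
    sc = spark.sparkContext

    # Ask for at least as many partitions as the desired parallelism. Note that
    # wholeTextFiles treats each file as a single record, so very small files
    # still end up at most one per partition.
    rdd = sc.wholeTextFiles("hdfs:///data/raw/*", minPartitions=64)

    # If the initial split is still too coarse, redistribute explicitly.
    rdd = rdd.repartition(64)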

Re: PySpark working with Generators

2017-06-29 Thread Saatvik Shah
Hey Ayan,
This isn't a typical text file - it's a proprietary data format for which a native Spark reader is not available.
Thanks and Regards,
Saatvik Shah

On Thu, Jun 29, 2017 at 6:48 PM, ayan guha <guha.a...@gmail.com> wrote:
> If your files are in the same location you can use sc.wholeTextFiles ...
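
For the generator angle in the subject line, one possible pairing (a sketch, not code from the thread) is sc.wholeTextFiles plus a Python generator drained by flatMap; parse_records() below is a hypothetical stand-in for the proprietary-format parser:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("generator-parsing").getOrCreate()
    sc = spark.sparkContext

    def parse_records(path_and_content):
        # Hypothetical generator: lazily yield one record at a time from a
        # file's contents instead of materializing the whole parsed file.
        path, content = path_and_content
        for line in content.splitlines():
            yield (path, line)   # replace with real decoding of the proprietary format

    # Each file becomes one (path, content) pair; flatMap drains the generator.
    records = sc.wholeTextFiles("hdfs:///data/raw/*").flatMap(parse_records)
    df = records.toDF(["path", "record"])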

Re: Merging multiple Pandas dataframes

2017-06-22 Thread Saatvik Shah
> What I would do is persist every iteration, then after some (say 5) I would write to disk and reload. At that point you should call unpersist to free the memory as it is no longer relevant.
>
> Thanks,
> Assaf.
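
A rough PySpark sketch of that persist / write-out / reload / unpersist cycle; the chunk loader, batch size of 5, and paths are illustrative only:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("incremental-merge").getOrCreate()

    def load_chunk(i):
        # Hypothetical loader that returns one DataFrame per input chunk.
        return spark.range(i * 1000, (i + 1) * 1000)

    merged = None
    for i in range(20):
        chunk = load_chunk(i)
        merged = chunk if merged is None else merged.union(chunk)
        merged.persist()

        # Every 5 iterations, materialize to disk and reload to cut the lineage,
        # then unpersist since the in-memory copy is no longer needed. A fresh
        # path per checkpoint avoids overwriting a path that is still being read.
        if (i + 1) % 5 == 0:
            path = "hdfs:///tmp/merged_checkpoint_{}".format(i)
            merged.write.mode("overwrite").parquet(path)
            merged.unpersist()
            merged = spark.read.parquet(path)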

Re: Merging multiple Pandas dataframes

2017-06-20 Thread Saatvik Shah
--
Saatvik Shah,
1st Year,
Masters in the School of Computer Science,
Carnegie Mellon University
https://saatvikshah1994.github.io/

Re: Best alternative for Category Type in Spark Dataframe

2017-06-17 Thread Saatvik Shah
> ... extends Transformer {
>   override def transform(inputData: DataFrame): DataFrame = {
>     inputData.select("col1").filter("col1 in ('happy')")
>   }
>   override def copy(extra: ParamMap): Transformer = ???
> ...

Re: Best alternative for Category Type in Spark Dataframe

2017-06-16 Thread Saatvik Shah
Hi Pralabh,
I want the ability to create a column such that its values are restricted to a specific set of predefined values. For example, suppose I have a column called EMOTION: I want to ensure each row value is one of HAPPY, SAD, ANGRY, NEUTRAL, NA.
Thanks and Regards,
Saatvik Shah

On Fri, Jun 16 ...
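
Spark DataFrames have no built-in enum or category column type, so one possible check (a sketch, not something quoted from the thread) is to validate values against the allowed set explicitly:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("category-check").getOrCreate()

    ALLOWED = ["HAPPY", "SAD", "ANGRY", "NEUTRAL", "NA"]

    df = spark.createDataFrame(
        [("r1", "HAPPY"), ("r2", "CONFUSED")], ["id", "EMOTION"]
    )

    # Rows whose EMOTION falls outside the predefined set; raise or drop as needed.
    invalid = df.filter(~F.col("EMOTION").isin(ALLOWED))
    invalid.show()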

Re: Best alternative for Category Type in Spark Dataframe

2017-06-16 Thread Saatvik Shah
... Regards,
Saatvik Shah

On Fri, Jun 16, 2017 at 1:42 AM, 颜发才(Yan Facai) <facai@gmail.com> wrote:
> You can use some Transformers to handle categorical data. For example, StringIndexer encodes a string column of labels to a column of label indices:
> http://spar ...
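
A minimal sketch of the StringIndexer suggestion applied to the EMOTION example from earlier in this thread; the sample rows are made up:

    from pyspark.sql import SparkSession
    from pyspark.ml.feature import StringIndexer

    spark = SparkSession.builder.appName("string-indexer").getOrCreate()

    df = spark.createDataFrame(
        [("r1", "HAPPY"), ("r2", "SAD"), ("r3", "NA")], ["id", "EMOTION"]
    )

    # StringIndexer maps each label to a numeric index (most frequent label -> 0.0).
    indexer = StringIndexer(inputCol="EMOTION", outputCol="EMOTION_IDX")
    indexed = indexer.fit(df).transform(df)
    indexed.show()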