Date: 2016-04-30

Re: Dataframe saves for a large set but throws OOM for a small dataset

2016-04-30, Bijay Pathak

Sorry for the confusion; this was supposed to be an answer to another thread. Bijay. On Sat, Apr 30, 2016 at 2:37 PM, Bijay Kumar Pathak wrote: > Hi, > > I was facing the same issue on Spark 1.6. My data size was around 100 GB > and was writing in the partition Hive table. > > I

Re: Dataframe saves for a large set but throws OOM for a small dataset

2016-04-30, Bijay Kumar Pathak

Hi, I was facing the same issue on Spark 1.6. My data size was around 100 GB and I was writing to a partitioned Hive table. I was able to solve this issue by raising executor memory step by step from 6 GB up to 15 GB per executor, with 2 GB of memory overhead, and by partitioning the DataFrame before doing
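A minimal sketch of the final executor sizing described above (15 GB of executor memory plus 2 GB of overhead), expressed with SparkConf. It assumes a YARN deployment, where Spark 1.6 reads the overhead from spark.yarn.executor.memoryOverhead in MiB (Spark 2.3 and later prefer spark.executor.memoryOverhead and log a deprecation warning for the old key). The app name is hypothetical, and the values are the ones from this message, not general recommendations.

```scala
import org.apache.spark.SparkConf

object ExecutorSizing {
  // Settings from the thread: 15 GB heap per executor plus 2 GB of overhead.
  // Assumption: running on YARN, using the Spark 1.6-era overhead key (value in MiB).
  def conf: SparkConf =
    new SparkConf()
      .setAppName("partitioned-hive-write") // hypothetical app name
      .set("spark.executor.memory", "15g")
      .set("spark.yarn.executor.memoryOverhead", "2048")

  def main(args: Array[String]): Unit = {
    // Print the resulting settings; a real job would pass conf to its SparkContext.
    conf.getAll.sortBy(_._1).foreach { case (k, v) => println(s"$k=$v") }
  }
}
```

The same settings are more commonly passed on the command line (spark-submit --executor-memory 15g plus a --conf for the overhead); either way they must be in place before the SparkContext starts, because executor sizes are fixed when executors launch.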

Re: Dataframe saves for a large set but throws OOM for a small dataset

2016-04-30, Brandon White

randomSplit instead of randomSample.

On Apr 30, 2016 1:51 PM, "Brandon White" wrote:
> val df = globalDf
> val filteredDfs = filterExpressions.map { expr =>
>   val filteredDf = df.filter(expr)
>   val samples = filteredDf.randomSample([.7, .3])
>   (samples(0),
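A sketch of the corrected snippet: randomSplit takes its weights as an Array[Double] (Scala has no [.7, .3] literal) and returns an array of DataFrames, one per weight. globalDf and filterExpressions are not shown in the thread, so a hypothetical toy DataFrame and filter list stand in for them. The sketch assumes Spark 2.x or later for SparkSession; randomSplit has the same (weights, seed) form on the Spark 1.6 DataFrame.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object SplitPerFilter {
  // Hypothetical stand-ins for globalDf and filterExpressions from the thread.
  def splits(spark: SparkSession): Seq[(DataFrame, DataFrame)] = {
    val globalDf = spark.range(0, 1000).toDF("id")
    val filterExpressions = Seq(col("id") % 2 === 0, col("id") >= 500)

    filterExpressions.map { expr =>
      val filteredDf = globalDf.filter(expr)
      // randomSplit, not randomSample: weights are an Array[Double];
      // the fixed seed (42L) makes the split repeatable across runs.
      val samples = filteredDf.randomSplit(Array(0.7, 0.3), 42L)
      (samples(0), samples(1))
    }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("split-per-filter").getOrCreate()
    try {
      // Print the approximate 70/30 row counts for each filter.
      splits(spark).foreach { case (train, test) =>
        println(s"train=${train.count()} test=${test.count()}")
      }
    } finally spark.stop()
  }
}
```

Every row of the filtered DataFrame lands in exactly one of the two splits, so the split sizes always add up to the filtered count, while the 70/30 ratio is only approximate. Without the seed argument, Spark draws a new random seed on each call.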

Re: Dataframe saves for a large set but throws OOM for a small dataset

2016-04-30, Ted Yu

Can you provide a bit more information? Does the smaller dataset have skew? Which release of Spark are you using? How much memory did you specify? Thanks. On Sat, Apr 30, 2016 at 1:17 PM, Brandon White wrote: > Hello, > > I am writing two datasets. One dataset is

Dataframe saves for a large set but throws OOM for a small dataset

2016-04-30, Brandon White

Hello, I am writing two datasets. One dataset is twice as large as the other. Both datasets are written to Parquet in exactly the same way, using df.write.mode("Overwrite").parquet(outputFolder). The smaller dataset OOMs while the larger dataset writes perfectly fine. Here is the stack trace: Any ideas

Error in spark-xml

2016-04-30, Sourav Mazumder

Hi, it looks like there is a problem in spark-xml when the XML has multiple attributes and no child element. For example, say the XML has a nested object as below: bk_113 bk_114. Now if I create a DataFrame with rowTag bkval and then do a select on that DataFrame, it

Re: GraphFrames and IPython notebook issue - No module named graphframes

2016-04-30, Felix Cheung

Please see http://stackoverflow.com/questions/36397136/importing-pyspark-packages On Mon, Apr 25, 2016 at 2:39 AM -0700, "Camelia Elena Ciolac" wrote: Hello, I work locally on my laptop, not using Databricks Community Edition. I downloaded
