Sorry for the confusion, this was supposed to be an answer for another thread.
Bijay
On Sat, Apr 30, 2016 at 2:37 PM, Bijay Kumar Pathak wrote:
Hi,
I was facing the same issue on Spark 1.6. My data size was around 100 GB,
and I was writing into a partitioned Hive table.
I was able to solve this issue by raising executor memory, starting from 6 GB
and going up to 15 GB per executor with a 2 GB memory overhead, and by
repartitioning the DataFrame before doing randomSplit instead of randomSample.
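A minimal sketch of that workaround, assuming a Spark 1.6-style sqlContext; the input path and partition count here are illustrative placeholders, and the memory settings described above would be supplied at submit time:

```scala
// Sketch only: the input path and partition count are placeholders, not
// from the original message. The memory settings are passed at submit
// time, e.g.:
//   spark-submit --executor-memory 15g \
//     --conf spark.yarn.executor.memoryOverhead=2048 ...
val df = sqlContext.read.parquet("/path/to/input")

// Repartition first so the split operates on evenly sized partitions.
val repartitioned = df.repartition(200)

// DataFrame.randomSplit takes an Array of weights.
val Array(sample70, sample30) = repartitioned.randomSplit(Array(0.7, 0.3))
```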
On Apr 30, 2016 1:51 PM, "Brandon White" wrote:
> val df = globalDf
> val filteredDfs = filterExpressions.map { expr =>
>   val filteredDf = df.filter(expr)
>   val samples = filteredDf.randomSample([.7, .3])
>   (samples(0),
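For reference, a hedged reconstruction of the truncated snippet above using DataFrame.randomSplit (a bracketed list is not valid Scala, and randomSample is not a DataFrame method); the second tuple element, samples(1), is an assumption since the original line is cut off:

```scala
val df = globalDf
val filteredDfs = filterExpressions.map { expr =>
  val filteredDf = df.filter(expr)
  // randomSplit takes an Array of weights and returns an Array of DataFrames.
  val samples = filteredDf.randomSplit(Array(0.7, 0.3))
  (samples(0), samples(1)) // samples(1) assumed; the original is truncated
}
```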
Can you provide a bit more information:
Does the smaller dataset have skew?
Which release of Spark are you using?
How much memory did you specify?
Thanks
On Sat, Apr 30, 2016 at 1:17 PM, Brandon White wrote:
Hello,
I am writing two datasets. One dataset is 2x larger than the other. Both
datasets are written to Parquet the exact same way using
df.write.mode("Overwrite").parquet(outputFolder)
The smaller dataset OOMs while the larger dataset writes perfectly fine.
Here is the stack trace. Any ideas?
Hi,
Looks like there is a problem in spark-xml if the XML has multiple
attributes with no child element.
For example, say the XML has a nested object as below:
bk_113
bk_114
Now if I create a DataFrame starting with rowTag bkval, and then I do a
select on that DataFrame, it
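A minimal sketch of the spark-xml read being described, assuming the spark-xml package is on the classpath and a Spark 1.6-style sqlContext; the file path is a placeholder:

```scala
// Hedged sketch: the path is a placeholder, not from the original message.
val df = sqlContext.read
  .format("com.databricks.spark.xml")
  .option("rowTag", "bkval")
  .load("/path/to/books.xml")

// XML attributes surface as columns with spark-xml's default "_" prefix;
// selecting from a row tag whose elements carry only attributes (no child
// elements) is where the reported problem shows up.
df.printSchema()
df.show()
```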
Please see
http://stackoverflow.com/questions/36397136/importing-pyspark-packages
On Mon, Apr 25, 2016 at 2:39 AM -0700, "Camelia Elena Ciolac" wrote:
Hello,
I work locally on my laptop, not using Databricks Community Edition.
I downloaded