bq. Read data from hbase using custom DefaultSource (implemented using TableScan)
Did you use the DefaultSource from the hbase-spark module in the HBase master branch? If you wrote your own, would you mind sharing the related code? Thanks

On Thu, Jun 9, 2016 at 2:53 AM, raaggarw <raagg...@adobe.com> wrote:
> Hi,
>
> I was trying to port my code from Spark 1.5.2 to Spark 2.0, but I ran into
> some OutOfMemory issues. On drilling down, I could see that the OOM is
> caused by the join, because removing the join fixes the issue. I then
> created a small Spark app to reproduce it:
>
> (48 cores, 300 GB RAM - divided among 4 workers)
>
> line1: df1 = read a set of parquet files into a DataFrame
> line2: df1.count
> line3: df2 = read data from HBase using a custom DefaultSource (implemented
> using TableScan)
> line4: df2.count
> line5: df3 = df1.join(df2, df1("field1") === df2("field2"), "inner")
> line6: df3.count -> *this is where it fails in Spark 2.0 and runs fine in
> Spark 1.5.2*
>
> Problem:
> First a lot of WARN messages:
> 2016-06-09 08:14:18,884 WARN [broadcast-exchange-0]
> memory.TaskMemoryManager (TaskMemoryManager.java:allocatePage(264)) - Failed
> to allocate a page (1048576 bytes), try again.
> And then the OOM.
>
> I then tried dumping the data fetched from HBase into S3 and creating df2
> from S3 rather than HBase; that worked fine in Spark 2.0 as well.
>
> Could someone please guide me through the next steps?
>
> Thanks
> Ravi
> Computer Scientist @ Adobe
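For reference, a "custom DefaultSource implemented using TableScan" as mentioned above typically follows the Spark 1.x/2.0 data source API shape below. This is a minimal sketch, not the poster's actual code: the package name, the `table` parameter, and the column names (`field2`, `value`) are hypothetical, and the HBase scan itself is elided.

```scala
package com.example.hbase

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.sources.{BaseRelation, RelationProvider, TableScan}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Spark looks up a class literally named DefaultSource in the package passed
// to .format(...), e.g. sqlContext.read.format("com.example.hbase").load()
class DefaultSource extends RelationProvider {
  override def createRelation(
      sqlContext: SQLContext,
      parameters: Map[String, String]): BaseRelation =
    new HBaseScanRelation(parameters("table"))(sqlContext)  // "table" key is an assumption
}

class HBaseScanRelation(table: String)(@transient val sqlContext: SQLContext)
    extends BaseRelation with TableScan {

  // Fixed schema for illustration; a real source would derive it from
  // HBase column-family metadata or a user-supplied mapping.
  override def schema: StructType =
    StructType(Seq(
      StructField("field2", StringType),
      StructField("value", StringType)))

  // TableScan means Spark always requests the full table: no column pruning
  // and no filter pushdown happen at this level.
  override def buildScan(): RDD[Row] = {
    // A real implementation would run an HBase Scan (e.g. via
    // newAPIHadoopRDD) and map each Result to a Row; elided here.
    sqlContext.sparkContext.emptyRDD[Row]
  }
}
```

With `TableScan`, every query materializes the whole HBase table before the join, which is worth keeping in mind when comparing against the S3-backed variant that worked.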