Hi John, I think this relates to driver memory more than the other things you mentioned. Can you try giving the driver more memory? Something like the spark-submit line below.
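Note that driver heap has to be set when the job is launched (via spark-submit or spark-defaults.conf); setting spark.driver.memory inside the application after the driver JVM has started has no effect. A minimal sketch, where 8g and the jar name are only placeholders to size to your cluster:

    spark-submit \
      --driver-memory 8g \
      ... \
      your-job.jar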
> On Jul 1, 2016, at 9:03 AM, johnzeng <jo...@fossil.com> wrote:
>
> I am trying to load a 1 TB collection into a Spark cluster from Mongo, but I
> keep getting a stack overflow error after running for a while.
>
> I have posted a question on stackoverflow.com and tried all the advice
> offered there; nothing works:
>
> how to load large database into spark
> <http://stackoverflow.com/questions/38096502/how-to-load-large-table-in-spark>
>
> I have tried:
> 1. Using persist to make it MemoryAndDisk: same error after running for the
>    same time.
> 2. Adding more instances: same error after running for the same time.
> 3. Running this script on another, much smaller collection: everything is
>    fine, so I think my code is all right.
> 4. Removing the reduce step: same error after running for the same time.
> 5. Removing the map step: same error after running for the same time.
> 6. Changing the SQL I used: it's faster, but the same error appears after
>    running for a shorter time.
> 7. Retrieving "_id" instead of "u_at" and "c_at": same error after running
>    for the same time.
>
> Does anyone know how many resources I need to handle this 1 TB database? I
> only retrieve two fields from it, and these fields are about 1% of a
> document (because each document has an array containing 90+ embedded
> documents).
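If it helps, here is a minimal sketch of loading just those two fields with a spill-to-disk persist. I am assuming the MongoDB Spark Connector on Spark 2.x here; the field names come from your message, everything else (app name, connection URI) is a placeholder:

    import com.mongodb.spark.MongoSpark
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.storage.StorageLevel

    // Assumes spark.mongodb.input.uri already points at the 1 TB collection.
    val spark = SparkSession.builder().appName("load-two-fields").getOrCreate()

    // Load as a DataFrame and keep only the two fields; with the DataFrame
    // API the connector can push the projection down to Mongo, so the 90+
    // embedded documents in each document never reach Spark.
    val df = MongoSpark.load(spark)
      .select("u_at", "c_at")
      .persist(StorageLevel.MEMORY_AND_DISK)

    println(df.count())

The important part is doing the select before any action, so the pruning happens at the source rather than after a full load.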