I am trying to load a 1 TB collection from MongoDB into a Spark cluster, but I keep getting a StackOverflowError after the job runs for a while.
I posted a question on stackoverflow.com and tried all the advice provided there, but nothing works: how to load a large table into Spark <http://stackoverflow.com/questions/38096502/how-to-load-large-table-in-spark>

So far I have tried:

1. Using persist with MEMORY_AND_DISK: same error after the same running time.
2. Adding more instances: same error after the same running time.
3. Running the same script on a much smaller collection: everything works fine, so I believe my code is correct.
4. Removing the reduce step: same error after the same running time.
5. Removing the map step: same error after the same running time.
6. Changing the SQL I use: it runs faster, but fails with the same error after a shorter time.
7. Retrieving "_id" instead of "u_at" and "c_at": same error after the same running time.

Does anyone know how many resources are needed to handle this 1 TB collection? I only retrieve two fields from it, and those fields are about 1% of each document (each document contains an array of about 90+ embedded documents). A rough sketch of the load follows below.
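For reference, here is a minimal sketch of the kind of job I am describing, assuming the official MongoDB Spark Connector's Scala RDD API; the connection URI and app name are placeholders, and "u_at"/"c_at" are the two fields mentioned above:

    import com.mongodb.spark.MongoSpark
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.storage.StorageLevel
    import org.bson.Document

    // Placeholder URI -- point this at your own db.collection.
    val conf = new SparkConf()
      .setAppName("load-1tb-collection")
      .set("spark.mongodb.input.uri", "mongodb://host:27017/mydb.mycollection")
    val sc = new SparkContext(conf)

    // Push a $project stage down to MongoDB so only the two small fields
    // cross the wire, instead of full documents with the 90+ element array.
    val rdd = MongoSpark.load(sc)
      .withPipeline(Seq(Document.parse("{ $project: { u_at: 1, c_at: 1 } }")))
      .persist(StorageLevel.MEMORY_AND_DISK)  // attempt 1 from the list above

    println(rdd.count())

The $project stage runs inside MongoDB's aggregation pipeline, so the large embedded array is dropped server-side before the data ever reaches Spark; even with only these two fields coming across, the StackOverflowError still occurs.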