Re: using spark to load a data warehouse in real time

2017-02-28 Thread Henry Tremblay
do what I want? Adaryl "Bob" Wakefield, MBA Principal Mass Street Analytics, LLC 913.938.6685 www.massstreet.net <http://www.massstreet.net> www.linkedin.com/in/bobwakefieldmba <http://www.linkedin.com/in/bobwakefieldmba> Twitter: @BobLovesData -- Henry Tremblay Robert Half Technology

Re: Spark runs out of memory with small file

2017-02-28 Thread Henry Tremblay
-mappartitions On Tue, Feb 28, 2017 at 2:17 AM, Henry Tremblay <paulhtremb...@gmail.com <mailto:paulhtremb...@gmail.com>> wrote: Thanks! That works: def process_file(my_iter): the_id = "init" final = [] for chunk in my_iter: line
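The snippet above truncates the `process_file` helper mid-definition. A minimal runnable reconstruction is sketched below; only the function shape (`process_file`, `the_id = "init"`, the `final` list, the loop over the iterator) comes from the thread, while the per-line parsing logic is an assumption for illustration:

```python
# Sketch of the mapPartitions handler from the thread: carry state
# (the current id) across lines within one partition. The "id:" marker
# convention is hypothetical; only the function skeleton is from the post.
def process_file(my_iter):
    the_id = "init"
    final = []
    for line in my_iter:
        if line.startswith("id:"):          # hypothetical marker line
            the_id = line.split(":", 1)[1].strip()
        else:
            final.append((the_id, line))
    return final

# Pure-Python stand-in for rdd.mapPartitions(process_file):
rows = process_file(iter(["id: 42", "hello", "world"]))
```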

Re: Spark runs out of memory with small file

2017-02-27 Thread Henry Tremblay
: Hi, Tremblay, map processes text line by line, so it is not the method you need. However, mapPartitions and an iterator can help you maintain state, like: http://homepage.cs.latrobe.edu.au/zhe/ZhenHeSparkRDDAPIExamples.html#mapPartitions On Mon, Feb 27, 2017 at 4:24 PM, Henry Tremblay <
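The distinction drawn in this reply — `map` sees one line at a time, `mapPartitions` sees the whole partition as an iterator, so a local variable can carry state from line to line — can be sketched with a pure-Python stand-in (the header-counting task is a hypothetical example, not from the thread):

```python
# map-style: each line is handled independently, so no state survives
# between lines. mapPartitions-style: the whole partition arrives as one
# iterator, so a local variable can accumulate state across lines.
def count_header_lines(partition_iter):
    # Hypothetical task: count lines until the first blank line.
    count = 0
    for line in partition_iter:
        if line == "":
            break
        count += 1
    yield count

# Pure-Python stand-in for rdd.mapPartitions(count_header_lines):
result = list(count_header_lines(iter(["a", "b", "", "c"])))
```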

Re: Spark runs out of memory with small file

2017-02-27 Thread Henry Tremblay
est, Pavel On Mon, 27 Feb 2017, 06:28 Henry Tremblay, <paulhtremb...@gmail.com <mailto:paulhtremb...@gmail.com>> wrote: Not sure where you want me to put yield. My first try caused an error in Spark that it could not pickle generator objects. On 02/26/2017 03:25 PM, ay
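The "could not pickle generator objects" error mentioned here typically arises when a generator *object* is returned from `map`; with `mapPartitions` the function itself may contain `yield`, because Spark serializes the function and only calls it on the executor. A minimal stand-in sketch (the `add_line_numbers` name and task are assumed for illustration):

```python
# A function passed to mapPartitions may be a generator function:
# Spark pickles the function, not a generator object, then calls it
# on each partition's iterator on the executor.
def add_line_numbers(partition_iter):
    for i, line in enumerate(partition_iter):
        yield (i, line)

# Pure-Python stand-in for rdd.mapPartitions(add_line_numbers):
numbered = list(add_line_numbers(iter(["a", "b"])))
```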

Re: Spark runs out of memory with small file

2017-02-26 Thread Henry Tremblay
solve problems which exist rather than problems which do not exist any more. Please let me know in case I can be of any further help. Regards, Gourav On Sun, Feb 26, 2017 at 7:09 PM, Henry Tremblay <paulhtremb...@gmail.com

Re: Spark runs out of memory with small file

2017-02-26 Thread Henry Tremblay
s rather than problems which do not exist any more. Please let me know in case I can be of any further help. Regards, Gourav On Sun, Feb 26, 2017 at 7:09 PM, Henry Tremblay <paulhtremb...@gmail.com <mailto:paulhtremb...@gmail.com>> wrote: The file is so small that a stand alone

Re: Spark runs out of memory with small file

2017-02-26 Thread Henry Tremblay
Perhaps repartition method could solve it, I guess. On Sun, Feb 26, 2017 at 3:33 AM, Henry Tremblay <paulhtremb...@gmail.com <mailto:paulhtremb...@gmail.com>> wrote: I am reading in a single small file from hadoop with wholeText. If I process each line and create a row with two
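The pattern described in this entry — reading one file with `wholeTextFiles` and turning each line into a two-column row — can be sketched in pure Python. `wholeTextFiles` yields `(path, entire_file_contents)` pairs, so the per-line split must be done by the caller (the helper name `to_rows` and the sample path are assumptions):

```python
# wholeTextFiles returns (path, whole_file_contents) pairs; to get one
# row per line with two columns, split the contents and pair each line
# with the file path.
def to_rows(path_content):
    path, content = path_content
    return [(path, line) for line in content.splitlines()]

# Pure-Python stand-in for sc.wholeTextFiles(p).flatMap(to_rows):
rows = to_rows(("hdfs:///example.txt", "line1\nline2"))
```

Because a single `wholeTextFiles` entry lands in one partition, a `repartition` after the flatMap (as suggested in the reply) spreads the resulting rows across executors.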

Spark runs out of memory with small file

2017-02-25 Thread Henry Tremblay
ExecutorLostFailure (executor 5 exited caused by one of the running tasks) Reason: Container killed by YARN for exceeding memory limits. 10.3 GB of 10.3 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
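The YARN log quoted above suggests raising `spark.yarn.executor.memoryOverhead` (the off-heap headroom YARN adds on top of executor heap). A hedged sketch of how that is typically passed; the 2048 MB value, executor memory, and script name are illustrative, not from the thread:

```shell
# Raise the YARN memory-overhead headroom for each executor container.
# Values here are examples only; tune against the "10.3 GB of 10.3 GB"
# limit reported in the actual failure.
spark-submit \
  --conf spark.yarn.executor.memoryOverhead=2048 \
  --executor-memory 8g \
  my_job.py
```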

Re: wholeTextFiles fails, but textFile succeeds for same path

2017-02-11 Thread Henry Tremblay
e-mail: user-unsubscr...@spark.apache.org <mailto:user-unsubscr...@spark.apache.org> -- Henry Tremblay Robert Half Technology