do what I want?
Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics, LLC
913.938.6685
www.massstreet.net
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData
--
Henry Tremblay
Robert Half Technology
On Tue, Feb 28, 2017 at 2:17 AM, Henry Tremblay
<paulhtremb...@gmail.com> wrote:
Thanks! That works:
def process_file(my_iter):
    the_id = "init"
    final = []
    for chunk in my_iter:
        # (body truncated in the archive; it walked the lines of each
        # chunk, updating the_id and appending rows to final)
        for line in chunk[1].splitlines():
            final.append((the_id, line))
    return iter(final)
Hi, Tremblay,
map processes text line by line, so it is not the method you need.
However, mapPartitions with an iterator can help you maintain state
across lines. For example:
http://homepage.cs.latrobe.edu.au/zhe/ZhenHeSparkRDDAPIExamples.html#mapPartitions
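A minimal sketch of that pattern, in plain Python so it can be tried
without a cluster (the function and the "id:" line format are
illustrative; in Spark you would pass the same function to
rdd.mapPartitions):

```python
# Illustrative sketch of keeping state across lines with mapPartitions.
# In Spark this function would be passed to rdd.mapPartitions(...);
# here it is exercised as a plain generator, so no cluster is needed.
def process_partition(lines):
    current_id = "init"              # state that survives across lines
    for line in lines:
        if line.startswith("id:"):   # a header line updates the state
            current_id = line.split(":", 1)[1].strip()
        else:                        # data lines are tagged with the state
            yield (current_id, line)

rows = list(process_partition(["id: a", "x", "y", "id: b", "z"]))
# rows is [('a', 'x'), ('a', 'y'), ('b', 'z')]
```

The key point is that local variables inside the function live for the
whole partition, which plain map cannot give you.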
On Mon, Feb 27, 2017 at 4:24 PM, Henry Tremblay
<paulhtremb...@gmail.com> wrote:
Best,
Pavel
On Mon, 27 Feb 2017, 06:28 Henry Tremblay, <paulhtremb...@gmail.com> wrote:
Not sure where you want me to put yield. My first try caused an
error in Spark saying it could not pickle generator objects.
On 02/26/2017 03:25 PM, ay
solve problems which exist rather than problems
which do not exist any more.
Please let me know in case I can be of any further help.
Regards,
Gourav
On Sun, Feb 26, 2017 at 7:09 PM, Henry Tremblay
<paulhtremb...@gmail.com> wrote:
The file is so small that a stand alone
Perhaps the repartition method could solve it, I guess.
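A sketch of that idea (PySpark; it needs a live SparkContext, so this
is an outline only, with an illustrative path and partition count):

    # Sketch only: requires a running SparkContext (sc); the path and
    # the count of 8 are illustrative. repartition spreads records over
    # more partitions so one executor does not hold the whole file.
    rdd = sc.wholeTextFiles("hdfs:///some/path")  # (filename, content) pairs
    rdd = rdd.repartition(8)

Note that wholeTextFiles keeps each file as a single record, so
repartition redistributes records, not lines within a record.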
On Sun, Feb 26, 2017 at 3:33 AM, Henry Tremblay
<paulhtremb...@gmail.com> wrote:
I am reading in a single small file from hadoop with wholeTextFiles. If
I process each line and create a row with two
ExecutorLostFailure (executor 5 exited caused by one of the running
tasks) Reason: Container killed by YARN for exceeding memory limits.
10.3 GB of 10.3 GB physical memory used. Consider boosting
spark.yarn.executor.memoryOverhead.
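That setting would presumably be raised at submit time, something like
the following (the flag name is Spark's own; the sizes are only
illustrative and depend on the cluster):

```shell
# Illustrative values; size to your cluster. The overhead is off-heap
# memory YARN grants on top of spark.executor.memory.
spark-submit \
  --conf spark.yarn.executor.memoryOverhead=2048 \
  --conf spark.executor.memory=8g \
  your_job.py
```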
--
Henry Tremblay
Robert Half Technology