I am fiddling around with GAE mapreduce and have one question:

Is it possible to set a variable that is scoped to one specific mapreduce
job?

The reason I am asking is:

The input CSV and the output CSV of my mapreduce job are supposed to have
the same header row. The header row does show up in the output CSV, but
somewhere in the middle, never at the top. To get it to the top, I added a
counter to my reduce function that tracks how many times reduce has been
called; when the counter is 0, the function passes the hard-coded header
row to the pipeline first. The counter is reset when the output CSV is
stored in the blobstore.

The problem: more often than not, the counter resets itself seemingly at
random, probably because I had to define it as a global variable
("reduce_counter = 0") outside of the function.

Is there any way to attach a variable/parameter to a specific job, or is
there a better way to get the header row to the top of the output?

I don't think I can use csv.DictReader or the rest of the csv module,
because the output is stored in the blobstore and, as far as I know,
blobstore objects cannot be modified.

You can find my code at www.github.com/jvdheyden/ste in the main.py file.

Thanks!


-- 
Jonas von der Heyden
+49 163 2464010
http://de.linkedin.com/in/jvheyden
