[google-appengine] Re: Blobstore filename created in MapReduce job too long to create BlobKey

2013-03-06 Thread Ryan Huebsch
Hi,

We've filed https://code.google.com/p/googleappengine/issues/detail?id=8932 
to track the issue and are investigating the problem.


On Friday, March 1, 2013 8:25:17 PM UTC-8, Jamie Niemasik wrote:
>
> I've been receiving intermittent errors from MapReduce jobs. I'm running 
> Python 2.7.
>
> The specific error is "BadValueError: name must be under 500 bytes", which 
> is raised when calling datastore.Key.from_path() within 
> blobstore.get_blob_key(); the filename being provided is way too long to 
> make a key from.
>
> This all occurs within the code in the mapreduce package… nothing in my 
> code seems to affect it.
>
> Some of the filenames are 288 bytes long, while others are 992. The M/R spec 
> name and id are nearly the same in each case and very short, so I don't see 
> where this variance comes from.
>
> The sequence of events is this:
> 1. mapreduce.output_writers.init_job() creates a reasonable, short filename 
>    and passes it to files.blobstore.create().
> 2. create() calls files.file._create('blobstore', …, filename).
> 3. _create() sets up an RPC with that filename and calls 
>    _make_call('Create', ...).
>
> And that call sometimes returns a filename that's 288 bytes, sometimes 
> 992. I have no idea why or how to work around this — any help would be 
> appreciated.
>
> Thanks,
> Jamie
>
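The error itself comes down to a byte-length check on the key name. A minimal sketch in plain Python that mirrors that check, assuming the limit is exactly the 500 bytes quoted in the error message; `check_key_name` and `key_name_size` are hypothetical helpers, not App Engine APIs, and they do not call `datastore.Key.from_path()` themselves:

```python
# Limit quoted in the error: "BadValueError: name must be under 500 bytes".
MAX_KEY_NAME_BYTES = 500


def key_name_size(name):
    """Return the size of `name` in bytes (text is measured as UTF-8)."""
    if isinstance(name, bytes):
        return len(name)
    return len(name.encode("utf-8"))


def check_key_name(name):
    """Return `name` if it fits in a key name, else raise ValueError.

    Hypothetical helper: it only mirrors the length check described in
    this thread instead of calling datastore.Key.from_path().
    """
    size = key_name_size(name)
    if size >= MAX_KEY_NAME_BYTES:
        raise ValueError(
            "name must be under %d bytes (got %d)" % (MAX_KEY_NAME_BYTES, size))
    return name
```

Against the two lengths reported above, a 288-byte name passes and a 992-byte name is rejected, which would explain why the failures look intermittent.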

-- 
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to google-appengine+unsubscr...@googlegroups.com.
To post to this group, send email to google-appengine@googlegroups.com.
Visit this group at http://groups.google.com/group/google-appengine?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.




Re: [google-appengine] Re: Blobstore filename created in MapReduce job too long to create BlobKey

2013-03-06 Thread Jamie Niemasik
: )  I tried the same thing last night, but wasn't ready to declare victory
because I saw new errors. Happily, it turns out those were from old tasks
(with retry counts nearing 1000) whose states were not compatible with the
new MR code. I've purged the queue and cleaned out the associated blobs and
everything's humming along now.

When I first started using MR, I had to make a lot of modifications to get
it working with py2.7 and NDB, using two different versions of the MR lib
that I found. I was hesitant to touch that code since it had been working
perfectly for so long, until sometime in February when I started
experiencing these errors. I'm still not sure why that happened.

But, happily, all I had to do to the svn version of MR this time around was
change the handler script from mapreduce/main.py to mapreduce.main.APP in
include.yaml. Nice!

On Wed, Mar 6, 2013 at 5:13 AM, bmurr  wrote:

> Well, I think I have it solved.
>
> I was using an older version of the mapreduce library, which uses
> mapreduce.lib.files to interact with the blobstore.
> The newer version of mapreduce uses google.appengine.api.files instead,
> which doesn't cause this problem.
>
> These two libraries seem pretty similar -- so I'm not sure what precisely
> was causing the issue.
>




[google-appengine] Re: Blobstore filename created in MapReduce job too long to create BlobKey

2013-03-06 Thread bmurr
Well, I think I have it solved.

I was using an older version of the mapreduce library, which uses 
mapreduce.lib.files to interact with the blobstore.
The newer version of mapreduce uses google.appengine.api.files instead, 
which doesn't cause this problem.

These two libraries seem pretty similar -- so I'm not sure what precisely 
was causing the issue.
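If you are not sure which copy of the library a project bundles, one quick check is to search it for the old module. A minimal sketch, assuming the old code refers to `mapreduce.lib.files` either by that dotted path or as `from mapreduce.lib import files` (the second spelling is a guess, not confirmed in this thread); `find_old_files_usage` is a hypothetical helper:

```python
import io
import os
import re

# Two assumed spellings of a reference to the old bundled files module.
OLD_FILES_IMPORT = re.compile(
    r"mapreduce\.lib\.files|from\s+mapreduce\.lib\s+import\s+files")


def find_old_files_usage(root):
    """Return sorted paths of .py files under `root` that reference the old module.

    A plain regex search: it also matches comments and strings, and it
    misses relative or aliased imports.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if not filename.endswith(".py"):
                continue
            path = os.path.join(dirpath, filename)
            with io.open(path, encoding="utf-8", errors="replace") as handle:
                if OLD_FILES_IMPORT.search(handle.read()):
                    hits.append(path)
    return sorted(hits)
```

An empty result suggests the bundled library already goes through google.appengine.api.files; a non-empty one lists the files still on the older path.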







Re: [google-appengine] Re: Blobstore filename created in MapReduce job too long to create BlobKey

2013-03-05 Thread Jamie Niemasik
Thanks Alex. Yes, the long filenames do start with 'writable'; however, I
can't find any errors in the logs other than the name errors themselves (and
those occur right after "Final result for job … is 'success'"). The
mapreduce code gets through files.finalize(filename) but blows up in
get_blob_key because the filename is no shorter than before.

Ben, I wish I could do that, but the mapreduce lib creates a
__BlobFileIndex__ datastore key using this filename as the id, so I don't
know what sort of change I could make there. Unfortunately, it's not
something I'm storing as a property on my own model.

Jamie

On Tue, Mar 5, 2013 at 4:48 PM, Alex Burgel  wrote:

> On Friday, March 1, 2013 11:25:17 PM UTC-5, Jamie Niemasik wrote:
>
>> Some of the filenames are 288 bytes long, while some are 992. The M/R
>> spec name and id in each case is nearly the same and is very short; I don't
>> see where this variance comes from.
>>
>
> Have you noticed if the long file names contain the word 'writable' at the
> beginning?
>
> If so, it might be similar to an issue that I had (my issue was with Google
> Storage, not blobstore, but their APIs are similar):
>
>
> https://groups.google.com/forum/?fromgroups=#!topic/app-engine-pipeline-api/vfWakN0NKSw
>
> It seems that when a file is writable, it's in a special state with a
> filename that is very long. When the MR job finishes, it finalizes the file
> and gets another, shorter filename. My issue had to do with the MR job not
> finishing properly: some of my code was throwing exceptions, which kept the
> job from finalizing properly, so it never got the shorter filename.
>
> I would take a look at your logs to see if there are any errors. They may
> be keeping the MR job from finishing properly, so that it returns
> unfinalized filenames.
>
> --Alex
>




[google-appengine] Re: Blobstore filename created in MapReduce job too long to create BlobKey

2013-03-05 Thread Alex Burgel
On Friday, March 1, 2013 11:25:17 PM UTC-5, Jamie Niemasik wrote:

> Some of the filenames are 288 bytes long, while some are 992. The M/R spec 
> name and id in each case is nearly the same and is very short; I don't see 
> where this variance comes from.
>

Have you noticed if the long file names contain the word 'writable' at the 
beginning?

If so, it might be similar to an issue that I had (my issue was with Google 
Storage, not blobstore, but their APIs are similar):

https://groups.google.com/forum/?fromgroups=#!topic/app-engine-pipeline-api/vfWakN0NKSw

It seems that when a file is writable, it's in a special state with a 
filename that is very long. When the MR job finishes, it finalizes the file 
and gets another, shorter filename. My issue had to do with the MR job not 
finishing properly: some of my code was throwing exceptions, which kept the 
job from finalizing properly, so it never got the shorter filename.

I would take a look at your logs to see if there are any errors. They may 
be keeping the MR job from finishing properly, so that it returns 
unfinalized filenames.

--Alex
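Alex's explanation suggests a cheap sanity check before a name is used to look up a blob key: refuse names that still look writable. A minimal sketch, assuming only what the thread reports (unfinalized names carry 'writable' at the beginning); the exact name layout, including the "/blobstore/writable:..." shape, is an assumption, and `looks_unfinalized` and `require_finalized` are hypothetical helpers, not App Engine APIs:

```python
def looks_unfinalized(filename):
    """Heuristic: True if `filename` still looks like a writable file name.

    Assumption: unfinalized names start with "writable" either at the very
    beginning or right after one leading directory, e.g. "/blobstore/writable:...".
    """
    segments = [segment for segment in filename.split("/") if segment]
    return any(segment.startswith("writable") for segment in segments[:2])


def require_finalized(filename):
    """Return `filename` unchanged, or raise RuntimeError if it looks writable."""
    if looks_unfinalized(filename):
        raise RuntimeError(
            "%r looks unfinalized; check the logs for errors that stopped "
            "the job before the file was finalized" % filename[:40])
    return filename
```

The guard cannot finalize anything by itself; it only turns the opaque BadValueError into a message that points at the likely cause.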





[google-appengine] Re: Blobstore filename created in MapReduce job too long to create BlobKey

2013-03-05 Thread Ben Fox
I've been having a similar problem. As far as I know, we didn't run into 
this problem until 1st March. Before then, I am fairly certain all 
filenames were 288 characters long; since then they have been 992 
characters.

A solution (at least in Java; presumably Python has something similar) is 
to use a com.google.appengine.api.datastore.Text object, which works with 
the longer filenames. This isn't the ideal solution for us, though, because 
of backwards-compatibility issues with our datastore, which is full of 
string values.





[google-appengine] Re: Blobstore filename created in MapReduce job too long to create BlobKey

2013-03-05 Thread bmurr
I've also been having this problem.

Did you manage to find any workarounds?



