[google-appengine] Re: Duplicate records in Backup Data?

2015-09-21 Thread Nick (Cloud Platform Support)
Hey Oliver, 

If you're experiencing an issue, I recommend posting to the BigQuery public 
issue tracker, since an old thread like this probably won't get much 
activity, and the public issue tracker is a more responsive way to report 
an issue. 

Best wishes,

Nick


-- 
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to google-appengine+unsubscr...@googlegroups.com.
To post to this group, send email to google-appengine@googlegroups.com.
Visit this group at http://groups.google.com/group/google-appengine.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/google-appengine/1d925c32-ce98-4619-8d1b-f135fd3e3510%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[google-appengine] Re: Duplicate records in Backup Data?

2015-09-09 Thread Oliver Urs Lenz
I can confirm that more than two years later, this is still an issue.. :-(





[google-appengine] Re: Duplicate records in Backup Data?

2013-05-06 Thread Mike
Great - thanks Arie. Any idea when this will be ready? Even a rough 
estimate would be appreciated, e.g. 1 month, 6 months, 1 year?






[google-appengine] Re: Duplicate records in Backup Data?

2013-05-02 Thread Arie Ozarov


On Wednesday, May 1, 2013 3:39:06 PM UTC-7, Jason Collins wrote:
>
> On reflection, I suspect it has more to do with Map-Reduce task retries 
> than some race condition.

Correct. This is not an issue for backup/restore, but it is a known issue 
for BigQuery imports.
We plan to eliminate duplicates at the MR level. 







[google-appengine] Re: Duplicate records in Backup Data?

2013-05-01 Thread Mike
I would think the BigQuery team could discard duplicates when running the 
import. That's probably going to be the easiest solution.
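
As a rough illustration of discarding duplicates at import time, the sketch 
below keeps one row per entity key before the rows are loaded. The row shape 
(a dict with `key_id` and `created` fields) and the function name 
`dedupe_by_key` are hypothetical; this is not how the BigQuery import 
actually works, only one way the idea could look.

```python
from typing import Dict, Iterable, List


def dedupe_by_key(rows: Iterable[dict], key_field: str = "key_id") -> List[dict]:
    """Keep the last row seen for each key, in first-seen key order."""
    latest: Dict[object, dict] = {}
    for row in rows:
        # A later duplicate replaces the earlier row for the same key.
        latest[row[key_field]] = row
    return list(latest.values())


# Hypothetical backup rows; the third one repeats key 1.
rows = [
    {"key_id": 1, "created": "2013-04-29"},
    {"key_id": 2, "created": "2013-04-29"},
    {"key_id": 1, "created": "2013-04-29"},
]
unique_rows = dedupe_by_key(rows)
print(len(rows), "->", len(unique_rows))
```

Keeping the last row per key mirrors what a restore does, where one duplicate 
simply overwrites the other.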








[google-appengine] Re: Duplicate records in Backup Data?

2013-05-01 Thread Jason Collins
On reflection, I suspect it has more to do with Map-Reduce task retries 
than some race condition.
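
To make the retry hypothesis concrete, here is a toy simulation. It assumes a 
task that appends output as it goes and is re-run from the start after a 
failure; the function names and the failure model are made up for 
illustration, and the real Map-Reduce library may behave differently.

```python
from typing import List


def run_shard(entities: List[int], output: List[int], fail_after: int = -1) -> bool:
    """Append each entity id to output; optionally fail part-way through."""
    for i, entity_id in enumerate(entities):
        if i == fail_after:
            # Simulated crash: rows already written stay in the output.
            return False
        output.append(entity_id)
    return True


def run_with_retry(entities: List[int], output: List[int], fail_after: int) -> None:
    """Run the shard once with a simulated failure, then retry it in full."""
    if not run_shard(entities, output, fail_after):
        # The retry re-emits the rows the first attempt already wrote.
        run_shard(entities, output)


backup_rows: List[int] = []
run_with_retry([101, 102, 103, 104], backup_rows, fail_after=2)
duplicates = len(backup_rows) - len(set(backup_rows))
print(backup_rows, duplicates)
```

In this model the number of duplicates equals the number of rows the failed 
attempt wrote before it stopped.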

j






[google-appengine] Re: Duplicate records in Backup Data?

2013-04-30 Thread Jason Collins
We have seen the same phenomenon. 

It's likely due to some kind of race condition in the backup tool itself, 
but it is not a problem there because, when restoring, one of the dups will 
just overwrite the other. It does become a problem once the data is 
ingested into BigQuery.
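
The difference between the two cases can be sketched in a few lines. This is 
only a model: a dict stands in for the keyed datastore and a list for an 
append-only table, and the sample records are invented.

```python
# Hypothetical backup contents; "key-1" appears twice.
backup_records = [
    ("key-1", {"name": "a"}),
    ("key-2", {"name": "b"}),
    ("key-1", {"name": "a"}),
]

# Restore: writes are keyed, so the second "key-1" overwrites the first.
datastore = {}
for key, entity in backup_records:
    datastore[key] = entity

# Append-only load: every record becomes its own row, duplicates included.
table_rows = []
for key, entity in backup_records:
    table_rows.append({"key": key, **entity})

print(len(datastore), len(table_rows))
```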

j

On Monday, 29 April 2013 20:10:34 UTC-6, Mike wrote:
>
> Hi there
>
> I've noticed there may be duplicate records in the Backup data that 
> AppEngine produces.
>
> I can verify this because I'm loading the Backups into BigQuery. When I 
> search one of my tables, I can see the duplicates:
>
> SELECT __key__.id AS X_id, COUNT(__key__.id) AS X_count, created
> FROM [TableId]
> GROUP BY X_id, created HAVING X_count > 1
> ORDER BY created DESC;
>
> This shows there are 5,807 duplicates in a table of ~2 million entries 
> (~0.2%)
>
> I can give Google employees access to our BigQuery and Google Storage 
> accounts if that helps track down the issue.
>
> Cheers
> Mike
>
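
For anyone who wants to run the same check on exported rows outside BigQuery, 
a minimal Python equivalent of the query above might look like this. The 
field names `key_id` and `created` and the sample rows are assumptions made 
for the example.

```python
from collections import Counter

# Hypothetical rows; key 1 appears twice with the same created timestamp.
rows = [
    {"key_id": 1, "created": "2013-04-29 10:00:00"},
    {"key_id": 2, "created": "2013-04-29 10:05:00"},
    {"key_id": 1, "created": "2013-04-29 10:00:00"},
    {"key_id": 3, "created": "2013-04-29 10:10:00"},
]

# GROUP BY (key id, created), then keep groups with COUNT > 1.
counts = Counter((row["key_id"], row["created"]) for row in rows)
duplicated = {group: n for group, n in counts.items() if n > 1}

# ORDER BY created DESC.
for (key_id, created), n in sorted(duplicated.items(), key=lambda kv: kv[0][1], reverse=True):
    print(key_id, n, created)
```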
