[ https://issues.apache.org/jira/browse/BEAM-8825?focusedWorklogId=359511&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-359511 ]

ASF GitHub Bot logged work on BEAM-8825:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 13/Dec/19 17:13
            Start Date: 13/Dec/19 17:13
    Worklog Time Spent: 10m 
    Work Description: udim commented on issue #10380: [BEAM-8825] Add limit on number of mutated rows to batching/sorting stages.
URL: https://github.com/apache/beam/pull/10380#issuecomment-565523737
 
 
   Run Java PreCommit
 


Issue Time Tracking
-------------------

    Worklog Id:     (was: 359511)
    Time Spent: 1h 10m  (was: 1h)

> OOM when writing large numbers of 'narrow' rows
> -----------------------------------------------
>
>                 Key: BEAM-8825
>                 URL: https://issues.apache.org/jira/browse/BEAM-8825
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-gcp
>    Affects Versions: 2.9.0, 2.10.0, 2.11.0, 2.12.0, 2.13.0, 2.14.0, 2.15.0, 2.16.0, 2.17.0
>            Reporter: Niel Markwick
>            Assignee: Niel Markwick
>            Priority: Major
>             Fix For: 2.18.0
>
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> SpannerIO can OOM when writing large numbers of 'narrow' rows. 
>  
> SpannerIO puts input mutations into batches for efficient writing.
> These batches are limited by the number of cells mutated and the size of
> data written (5000 cells, 1MB of data). SpannerIO buffers enough mutations
> to build 1000 of these batches (5M cells, 1GB of data), then sorts them and
> splits them into batches.
> When each mutation is very small (<5 cells, <100 bytes), filling that buffer
> takes more than a million mutations (5M cells at under 5 cells each), and the
> memory overhead of storing millions of mutations for batching is significant
> enough to cause OOMs.
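The arithmetic above, and the effect of the row limit named in the pull request title, can be illustrated with a small sketch. It is not SpannerIO code: `Mutation`, `batch`, `buffered_count` and the `max_rows` parameter are hypothetical names, 1MB is taken as 1,000,000 bytes, and the real implementation sorts by encoded key and groups differently. It only shows that without a row cap, narrow rows make the pre-sort buffer hold millions of mutations, while a per-batch row cap bounds that count.

```python
from typing import List, NamedTuple, Optional

# Per-batch limits quoted in BEAM-8825 (1MB assumed to mean 1,000,000 bytes).
MAX_CELLS = 5000
MAX_BYTES = 1_000_000
# The issue says enough mutations for 1000 batches are buffered before sorting.
GROUPING_FACTOR = 1000


class Mutation(NamedTuple):
    """Hypothetical stand-in for a Spanner mutation (not the SpannerIO class)."""
    key: int
    cells: int
    size_bytes: int


def batch(mutations: List[Mutation], max_rows: Optional[int] = None) -> List[List[Mutation]]:
    """Sort by key, then split into batches bounded by cells, bytes and, optionally, rows."""
    batches: List[List[Mutation]] = []
    current: List[Mutation] = []
    cells = size = 0
    for m in sorted(mutations, key=lambda mut: mut.key):
        too_big = cells + m.cells > MAX_CELLS or size + m.size_bytes > MAX_BYTES
        too_many = max_rows is not None and len(current) >= max_rows
        # Close the current batch if adding m would break any limit.
        if current and (too_big or too_many):
            batches.append(current)
            current, cells, size = [], 0, 0
        current.append(m)
        cells += m.cells
        size += m.size_bytes
    if current:
        batches.append(current)
    return batches


def buffered_count(cells_per_row: int, bytes_per_row: int, max_rows: Optional[int] = None) -> int:
    """Number of identical mutations buffered before the sort/batch step runs."""
    if cells_per_row <= 0 or bytes_per_row <= 0:
        raise ValueError("cells_per_row and bytes_per_row must be positive")
    limits = [
        MAX_CELLS * GROUPING_FACTOR // cells_per_row,   # cell budget of the buffer
        MAX_BYTES * GROUPING_FACTOR // bytes_per_row,   # byte budget of the buffer
    ]
    if max_rows is not None:
        limits.append(max_rows * GROUPING_FACTOR)       # hypothetical row budget
    return min(limits)
```

For example, `buffered_count(4, 100)` is 1,250,000 mutations, and with a hypothetical `max_rows=500` it drops to 500,000, independent of how narrow the rows are.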



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
