[ 
https://issues.apache.org/jira/browse/ARROW-17473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sasha Sirovica updated ARROW-17473:
-----------------------------------
          Component/s:     (was: Go)
    Affects Version/s:     (was: 9.0.0)
          Description:     (was: When using `arrow.BinaryTypes.String` in a 
schema, appending multiple strings, and then writing a record out to Parquet, 
the program's memory usage increases continuously.

 

I took a heap dump on my computer midway through the program, and most of the 
allocations come from `StringBuilder.Append`. Memory usage approached 16 GB of 
RAM before I terminated the program.

 

I was not able to replicate this behavior with just PrimitiveTypes. Another 
interesting point: if the records are created but never written with pqarrow, 
there is also no memory leak. In the program below, commenting out 
`w.Write(rec)` avoids the memory growth.
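
One way to compare the two cases (with and without `w.Write(rec)`) is to log 
the Go heap size after each batch. The sketch below uses only the standard 
library; `heapInUseMiB` is a hypothetical helper, and the byte slices stand in 
for memory retained between batches. It assumes the allocations of interest 
are made on the Go heap, which may not hold for every Arrow allocator.

```go
package main

import (
	"fmt"
	"runtime"
)

// heapInUseMiB (hypothetical helper) forces a GC and reports the size of
// in-use heap spans in MiB, so steady growth across batches is visible.
func heapInUseMiB() float64 {
	runtime.GC()
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return float64(m.HeapInuse) / (1 << 20)
}

func main() {
	// Stand-in for memory that stays referenced after each batch.
	var retained [][]byte
	for batch := 1; batch <= 3; batch++ {
		retained = append(retained, make([]byte, 8<<20))
		fmt.Printf("batch %d: heap in use %.1f MiB\n", batch, heapInUseMiB())
	}
	_ = retained
}
```

In the reproduction program, the same call could be placed right after 
`rec.Release()`; a value that keeps rising from batch to batch would be 
consistent with the growth described here.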

 

Example program which causes memory to leak:
{code:java}
package main

import (
   "os"

   "github.com/apache/arrow/go/v9/arrow"
   "github.com/apache/arrow/go/v9/arrow/array"
   "github.com/apache/arrow/go/v9/arrow/memory"
   "github.com/apache/arrow/go/v9/parquet"
   "github.com/apache/arrow/go/v9/parquet/compress"
   "github.com/apache/arrow/go/v9/parquet/pqarrow"
)

func main() {
   f, _ := os.Create("/tmp/test.parquet")

   arrowProps := pqarrow.DefaultWriterProps()
   schema := arrow.NewSchema(
      []arrow.Field{
         {Name: "aString", Type: arrow.BinaryTypes.String},
      },
      nil,
   )
   w, _ := pqarrow.NewFileWriter(schema, f, 
parquet.NewWriterProperties(parquet.WithCompression(compress.Codecs.Snappy)), 
arrowProps)

   builder := array.NewRecordBuilder(memory.DefaultAllocator, schema)
   for i := 1; i < 50000000; i++ {
      builder.Field(0).(*array.StringBuilder).Append("HelloWorld!")
      if i%2000000 == 0 {
         // Write row groups out every 2M times
         rec := builder.NewRecord()
         w.Write(rec)
         rec.Release()
      }
   }
   w.Close()
}{code})
              Summary: .  (was: [Go] String Binary Builder Leaks Memory When 
Writing to Parquet)

> .
> -
>
>                 Key: ARROW-17473
>                 URL: https://issues.apache.org/jira/browse/ARROW-17473
>             Project: Apache Arrow
>          Issue Type: Bug
>         Environment: Mac
>            Reporter: Sasha Sirovica
>            Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)
