[ 
https://issues.apache.org/jira/browse/ARROW-18274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laurent Querel updated ARROW-18274:
-----------------------------------
    Description: 
There is a bug with union of structs in V10.

The first unit test crashes with a panic (invalid memory address or nil 
pointer dereference). The second test works as expected.

 
{code:go}
func TestDoesNotWork(t *testing.T) {
   dt1 := arrow.SparseUnionOf([]arrow.Field{
      {Name: "c", Type: arrow2.DictU16String},
   }, []arrow.UnionTypeCode{0})
   dt2 := arrow.StructOf(
      arrow.Field{Name: "b", Type: dt1},
   )
   dt3 := arrow.SparseUnionOf([]arrow.Field{
      {Name: "a", Type: dt2},
   }, []arrow.UnionTypeCode{0})
   pool := memory.NewGoAllocator()

   builder := array.NewSparseUnionBuilder(pool, dt3)
   arr := builder.NewArray()
   assert.Equal(t, 0, arr.Len())
}

func TestWorksAsExpected(t *testing.T) {
   dt1 := arrow.SparseUnionOf([]arrow.Field{
      {Name: "c", Type: &arrow.DictionaryType{
         IndexType: arrow.PrimitiveTypes.Uint16,
         ValueType: arrow.BinaryTypes.String,
         Ordered:   false,
      }},
   }, []arrow.UnionTypeCode{0})
   dt2 := arrow.SparseUnionOf([]arrow.Field{
      {Name: "a", Type: dt1},
   }, []arrow.UnionTypeCode{0})
   pool := memory.NewGoAllocator()

   builder := array.NewSparseUnionBuilder(pool, dt2)
   arr := builder.NewArray()
   assert.Equal(t, 0, arr.Len())
}
{code}
 

*Analysis:*
 - `NewSparseUnionBuilder` creates the builder for each variant and also 
defers a call to that builder's Release method. 
 - The Struct Release method calls the Release method of every field even if 
its own refCount is not 0, so the Release method of the second union is called, 
followed by the Release method of the dictionary. 
 - Although the union builder is returned without error, it is not usable.
 - This bug doesn't happen with 2 nested unions, as the internal counter is 
properly checked in that case.
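
The difference described in the list above can be illustrated with a minimal, self-contained sketch. It does not use the Arrow API: `node`, `newNode`, `build`, and the `guarded` flag are hypothetical names that only mimic a ref-counted builder with child builders, assuming the enclosing union retains its child before the constructor's deferred Release runs.

```go
package main

import "fmt"

// node is a hypothetical stand-in for a ref-counted builder.
// It is NOT an Arrow type; it only models the counting behaviour.
type node struct {
	name     string
	refCount int
	children []*node
	guarded  bool // true: release children only when refCount reaches 0
	freed    bool // true once this node's own resources were dropped
}

func newNode(name string, guarded bool, children ...*node) *node {
	return &node{name: name, refCount: 1, guarded: guarded, children: children}
}

func (n *node) Retain() { n.refCount++ }

func (n *node) Release() {
	n.refCount--
	if n.refCount == 0 {
		n.freed = true // e.g. a dictionary memo table would be dropped here
	}
	if n.guarded && n.refCount != 0 {
		return // still referenced: leave the children alone
	}
	// Unguarded variant: children are released on every call.
	for _, c := range n.children {
		c.Release()
	}
}

// build mimics a union constructor: the parent is created, retained by
// the enclosing union, and then the temporary reference is released.
func build(guarded bool) (parent, leaf *node) {
	leaf = newNode("dict", guarded)
	parent = newNode("struct", guarded, leaf)
	parent.Retain()  // the enclosing union takes its own reference
	parent.Release() // the constructor drops its temporary reference
	return parent, leaf
}

func main() {
	_, leaf := build(false)
	fmt.Println("unguarded: leaf freed =", leaf.freed)
	_, leaf = build(true)
	fmt.Println("guarded: leaf freed =", leaf.freed)
}
```

In the unguarded variant the leaf is freed although the parent is still referenced, which matches the symptom reported here; in the guarded variant the leaf stays alive until the parent's count reaches 0.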

 

First, I don't understand why the Release method of each variant is called 
right after the Union builder is created. I also don't understand why the 
Release method of the Struct calls the Release method of each field 
regardless of the value of the internal refCount. This looks like a bug to me, 
but I'm not quite sure yet what the right fix is.

 

Any idea?

  was:
Union of structs is currently buggy in V10. See the following example.

 
{code:go}
dt1 := arrow.SparseUnionOf([]arrow.Field{
   {Name: "c", Type: &arrow.DictionaryType{
      IndexType: arrow.PrimitiveTypes.Uint16,
      ValueType: arrow.BinaryTypes.String,
      Ordered:   false,
   }},
}, []arrow.UnionTypeCode{0})
dt2 := arrow.SparseUnionOf([]arrow.Field{
   {Name: "a", Type: dt1},
}, []arrow.UnionTypeCode{0})
pool := memory.NewGoAllocator()
array := array.NewSparseUnionBuilder(pool, dt2)
{code}
 

The created array is unusable because the memo table of the dictionary builder 
(field 'c') is nil.

When I replace the struct with a second union (so 2 nested unions), the 
dictionary builder is properly initialized.

 

*First analysis:*
 - `NewSparseUnionBuilder` creates the builder for each variant and also 
defers a call to that builder's Release method. 
 - The Struct Release method calls the Release method of every field even if 
its own refCount is not 0, so the Release method of the second union is called, 
followed by the Release method of the dictionary. 

 

This bug doesn't happen with 2 nested unions as the internal counter is 
properly tested.

 

In the first place, I don't understand why the Release method of each variant 
is called just after the creation of the Union builder. I also don't understand 
why the Release method of the Struct calls the Release method of each field 
independently of the value of the internal refCount.

 

Any idea?


> [Go] Sparse union of structs is buggy
> -------------------------------------
>
>                 Key: ARROW-18274
>                 URL: https://issues.apache.org/jira/browse/ARROW-18274
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Go
>    Affects Versions: 10.0.0
>            Reporter: Laurent Querel
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
