[jira] [Updated] (ARROW-3698) Segmentation fault when using a large table in Gandiva

2018-11-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-3698:
--
Labels: pull-request-available  (was: )

> Segmentation fault when using a large table in Gandiva
> --
>
> Key: ARROW-3698
> URL: https://issues.apache.org/jira/browse/ARROW-3698
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++, Gandiva
> Reporter: Siyuan Zhuang
> Priority: Major
> Labels: pull-request-available
>
> {code}
> >>> import pyarrow as pa
> Registry has 519 pre-compiled functions
> >>> import pandas as pd
> >>> import numpy as np
> >>> import pyarrow.gandiva as gandiva
> >>> import timeit
> >>>
> >>> from matplotlib import pyplot as plt
> >>> for scale in range(25, 26):
> ...     frame_data = 1.0 * np.random.randint(0, 100, size=(2**scale, 2))
> ...     df = pd.DataFrame(frame_data).add_prefix("col")
> ...     table = pa.Table.from_pandas(df)
> ...
> >>>
> >>> def float64_add(table):
> ...     builder = gandiva.TreeExprBuilder()
> ...     node_a = builder.make_field(table.schema.field_by_name("col0"))
> ...     node_b = builder.make_field(table.schema.field_by_name("col1"))
> ...     sum = builder.make_function(b"add", [node_a, node_b], pa.float64())
> ...     field_result = pa.field("c", pa.float64())
> ...     expr = builder.make_expression(sum, field_result)
> ...     projector = gandiva.make_projector(table.schema, [expr], pa.default_memory_pool())
> ...     return projector
> ...
> >>> projector = float64_add(table)
> >>> projector.evaluate(table.to_batches()[0])
> [1] 36393 segmentation fault python
> {code}
> The crash is caused by an integer overflow in Gandiva:
> [https://github.com/apache/arrow/blob/1a6545aa51f5f41f0233ee0a11ef87d21127c5ed/cpp/src/gandiva/projector.cc#L141]
> The type used there should be `int64_t` instead of `int`.
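
To see why a table of this size could overflow a 32-bit `int`, here is a minimal Python sketch of the arithmetic. It assumes the overflowing quantity is a buffer size in bits, computed as row count times column bit width; the report only states that an `int` should be `int64_t`, so that formula and the names `as_int32`, `num_records`, `bit_width` and `max_safe_rows` are illustrative assumptions, not taken from the Gandiva source. `ctypes.c_int32` is used to mimic two's-complement wrapping; in C++ signed overflow is undefined behavior, so the real result may differ.

```python
import ctypes

INT32_MAX = 2**31 - 1


def as_int32(value):
    """Wrap a Python int to a signed 32-bit value (two's complement)."""
    return ctypes.c_int32(value).value


# Values from the report: 2**25 rows of float64 data.
num_records = 2**25
bit_width = 64  # assumption: size is computed in bits for a float64 column

# Python ints are arbitrary precision, so this is the exact product.
size_in_bits = num_records * bit_width

print(size_in_bits > INT32_MAX)            # True: one past the 32-bit limit
print(as_int32(size_in_bits))              # -2147483648 after wrapping
print(ctypes.c_int64(size_in_bits).value)  # 2147483648, fits in 64 bits

# Largest row count whose bit size still fits in a signed 32-bit int,
# under the same assumption.
max_safe_rows = INT32_MAX // bit_width
print(max_safe_rows)                       # 33554431, i.e. 2**25 - 1
```

Under this assumption, 2**25 rows is the smallest power-of-two table size that triggers the problem, which would match the `range(25, 26)` used in the reproduction.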



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-3698) Segmentation fault when using a large table in Gandiva

2018-11-03 Thread Siyuan Zhuang (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siyuan Zhuang updated ARROW-3698:
-
Summary: Segmentation fault when using a large table in Gandiva  (was: 
Segmentation fault when using large table in Gandiva)

> Segmentation fault when using a large table in Gandiva
> --
>
> Key: ARROW-3698
> URL: https://issues.apache.org/jira/browse/ARROW-3698
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++, Gandiva
> Reporter: Siyuan Zhuang
> Priority: Major


