[ https://issues.apache.org/jira/browse/ARROW-3698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wes McKinney reassigned ARROW-3698:
-----------------------------------

    Assignee: Siyuan Zhuang

> [C++] Segmentation fault when using a large table in Gandiva
> ------------------------------------------------------------
>
>                 Key: ARROW-3698
>                 URL: https://issues.apache.org/jira/browse/ARROW-3698
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, Gandiva
>            Reporter: Siyuan Zhuang
>            Assignee: Siyuan Zhuang
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.12.0
>
>          Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> {code}
> >>> import pyarrow as pa
> Registry has 519 pre-compiled functions
> >>> import pandas as pd
> >>> import numpy as np
> >>> import pyarrow.gandiva as gandiva
> >>> import timeit
> >>>
> >>> from matplotlib import pyplot as plt
> >>> for scale in range(25, 26):
> ...     frame_data = 1.0 * np.random.randint(0, 100, size=(2**scale, 2))
> ...     df = pd.DataFrame(frame_data).add_prefix("col")
> ...     table = pa.Table.from_pandas(df)
> ...
> >>>
> >>> def float64_add(table):
> ...     builder = gandiva.TreeExprBuilder()
> ...     node_a = builder.make_field(table.schema.field_by_name("col0"))
> ...     node_b = builder.make_field(table.schema.field_by_name("col1"))
> ...     sum = builder.make_function(b"add", [node_a, node_b], pa.float64())
> ...     field_result = pa.field("c", pa.float64())
> ...     expr = builder.make_expression(sum, field_result)
> ...     projector = gandiva.make_projector(table.schema, [expr], pa.default_memory_pool())
> ...     return projector
> ...
> >>> projector = float64_add(table)
> >>> projector.evaluate(table.to_batches()[0])
> [1] 36393 segmentation fault python
> {code}
> It is because there is an integer overflow in Gandiva:
> [https://github.com/apache/arrow/blob/1a6545aa51f5f41f0233ee0a11ef87d21127c5ed/cpp/src/gandiva/projector.cc#L141]
> It should be `int64_t` instead of `int`.
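
The arithmetic behind the reported overflow can be sketched in plain Python. This is only an illustration, not the Gandiva code. It assumes the overflowing quantity is the row count multiplied by the bit width of a float64 value, which the report does not state explicitly. The helper `as_int32` is hypothetical and mimics the wraparound commonly seen when a C++ `int` is 32 bits wide. Signed overflow is formally undefined behaviour in C++, so the wrapped value is typical rather than guaranteed.

```python
import ctypes

INT32_MAX = 2**31 - 1


def as_int32(value: int) -> int:
    """Hypothetical helper: wrap a Python int to a signed 32-bit value."""
    return ctypes.c_int32(value).value


# Values taken from the reproduction: 2**25 rows of float64 data.
num_records = 2**25
bit_width = 64  # bits per float64 value

# Assumption: the size computation multiplies rows by bit width first.
bits_needed = num_records * bit_width
wrapped_bits = as_int32(bits_needed)

# The byte count itself would fit in 32 bits; only the bit count does not.
bytes_needed = (bits_needed + 7) // 8

print(bits_needed > INT32_MAX)   # the product no longer fits in a 32-bit int
print(wrapped_bits)              # what a wrapped 32-bit int would hold
print(bytes_needed <= INT32_MAX)
```

Under this assumption, 2**25 rows is exactly the smallest power-of-two table size at which the product exceeds the 32-bit signed range, which matches the reproduction using `range(25, 26)`. A 64-bit type such as `int64_t` holds the product without wrapping.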