Jiangtao Peng created ARROW-17721:
-------------------------------------

             Summary: [C++][Gandiva] Expression Evaluation Performance 
Improvement using Mimalloc
                 Key: ARROW-17721
                 URL: https://issues.apache.org/jira/browse/ARROW-17721
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++ - Gandiva
            Reporter: Jiangtao Peng


Arrow use jemalloc as default memory allocator. For some reason, I am going to 
use mimalloc instead. But there seems have big performance difference between 
two memory allocators.

Here are my steps.

I use simple compile options:
{code:java}
-DCMAKE_BUILD_TYPE=debug
-DARROW_JEMALLOC=OFF|ON
-DARROW_MIMALLOC=ON|OFF
-DARROW_GANDIVA=ON
-DARROW_GANDIVA_STATIC_LIBSTDCPP=ON
-DARROW_BUILD_TESTS=ON
{code}
 
Then I write a simple case:
{code:cpp}
#include <gtest/gtest.h>
#include "arrow/memory_pool.h"
#include "arrow/status.h"

#include "gandiva/projector.h"
#include "gandiva/tests/test_util.h"
#include "gandiva/tree_expr_builder.h"

#include <chrono>
#include <iostream>

namespace gandiva {

using arrow::boolean;
using arrow::date64;
using arrow::int32;
using arrow::int64;
using arrow::utf8;

class TestUtf8Perf : public ::testing::Test {
 public:
  void SetUp() { pool_ = arrow::default_memory_pool(); }

 protected:
  arrow::MemoryPool* pool_;
};

void TestPerf(int64_t char_length, int64_t num_records) {
  // schema for input fields
  auto field_a = field("a", utf8());
  auto schema = arrow::schema({field_a});

  // output fields
  auto res = field("res", utf8());

  auto node_a = TreeExprBuilder::MakeField(field_a);
  auto upper_a = TreeExprBuilder::MakeFunction("upper", {node_a}, utf8());
  auto expr = TreeExprBuilder::MakeExpression(upper_a, res);

  // Build a projector for the expressions.
  std::shared_ptr<Projector> projector;
  auto status = Projector::Make(schema, {expr}, TestConfiguration(), 
&projector);
  EXPECT_TRUE(status.ok()) << status.message();

  std::string val = std::string(char_length, 'a');
  arrow::StringBuilder builder;
  for (int i = 0; i < num_records; i++) {
    auto _ = builder.Append(val);
  }
  std::shared_ptr<arrow::StringArray> array_a;
  auto _ = builder.Finish(&array_a);

  // prepare input record batch
  auto in_batch = arrow::RecordBatch::Make(schema, num_records, {array_a});

  auto start_epoch = std::chrono::duration_cast<std::chrono::milliseconds>(
                         std::chrono::system_clock::now().time_since_epoch())
                         .count();
  // Evaluate expression
  arrow::ArrayVector outputs;
  status = projector->Evaluate(*in_batch, pool_, &outputs);
  EXPECT_TRUE(status.ok()) << status.message();

  std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(
                   std::chrono::system_clock::now().time_since_epoch())
                       .count() -
                   start_epoch
            << "ms" << std::endl;
}
TEST_F(TestUtf8Perf, TestMemoryAllocsPerf) {
  TestPerf(20, 10000);
  TestPerf(20, 100000);
  TestPerf(200, 10000);
  TestPerf(200, 100000);
  TestPerf(2000, 10000);
}

}  // namespace gandiva
{code}
this case is going to calculate expression {*}upper(a){*}, *a* has different 
size with 20/200/2000. Evaluation time results are:
|char_length|num_records|Using Mimalloc (ms)|Using Jemalloc(ms)|
|20|10000|29|3|
|20|100000|2686|26|
|200|10000|954|11|
|200|100000|220153|118|
|2000|10000|21162|89|

 
Is this performance gap expected? Or any other compile options should I note? 
How to make performance better using mimalloc?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to