[ https://issues.apache.org/jira/browse/ARROW-2351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16420090#comment-16420090 ]
ASF GitHub Bot commented on ARROW-2351: --------------------------------------- gaolizhou commented on a change in pull request #1803: ARROW-2351 [C++] StringBuilder::append(vector<string>...) not impleme… URL: https://github.com/apache/arrow/pull/1803#discussion_r178223472 ########## File path: cpp/src/arrow/builder.cc ########## @@ -1385,6 +1385,28 @@ const uint8_t* BinaryBuilder::GetValue(int64_t i, int32_t* out_length) const { StringBuilder::StringBuilder(MemoryPool* pool) : BinaryBuilder(utf8(), pool) {} +Status StringBuilder::Append(const std::vector<std::string>& values, + uint8_t* null_bytes) { + std::size_t total_length = std::accumulate( + values.begin(), values.end(), 0ULL, + [](uint64_t sum, const std::string& str) { return sum + str.size(); }); Review comment: Tested with below code, for-loop-solution and accumulate-solution almost has same performance.To count 100000000 strings, for-loop-solution takes 212.312 ms, accumulate-solution takes 215.120 ms. log: ``` lizgao@lizgao-ubuntu:~/CLionProjects/cmake-test$ ./cmake_test WARNING: Logging before InitGoogleLogging() is written to STDERR I0330 10:29:56.864248 3989 accmulate-lambda-performance.cc:15] Begin for_loop_solution I0330 10:29:57.076560 3989 accmulate-lambda-performance.cc:19] End for_loop_solution = 8000000000 I0330 10:29:57.076603 3989 accmulate-lambda-performance.cc:23] Begin accumulate_solution I0330 10:29:57.291723 3989 accmulate-lambda-performance.cc:27] End accumulate_solution = 8000000000 ``` ``` // // Created by lizgao on 3/30/18. // #include <cstdint> #include <numeric> #include <string> #include <vector> #include <glog/logging.h> namespace { void for_loop_solution(const std::vector<std::string> &values) { std::size_t total_length = 0; std::size_t cnt = values.size(); LOG(INFO) << "Begin for_loop_solution"; for (std::size_t i = 0; i < cnt; ++i) { total_length += values[i].size(); } LOG(INFO) << "End for_loop_solution = " << total_length; } void accumulate_solution(const std::vector<std::string> &values) { LOG(INFO) << "Begin accumulate_solution"; std::size_t total_length = std::accumulate( values.begin(), values.end(), 0ULL, [](uint64_t sum, const std::string& str) { return sum + str.size(); }); LOG(INFO) << "End accumulate_solution = " << total_length; } } void accumulate_lambda_performance_test() { std::vector<std::string> str_list(100000000, std::string(80, '*')); for_loop_solution(str_list); accumulate_solution(str_list); } ``` ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > [C++] StringBuilder::append(vector<string>...) not implemented > -------------------------------------------------------------- > > Key: ARROW-2351 > URL: https://issues.apache.org/jira/browse/ARROW-2351 > Project: Apache Arrow > Issue Type: Bug > Components: C++ > Affects Versions: 0.9.0 > Reporter: Rares Vernica > Priority: Major > Labels: pull-request-available > Fix For: 0.10.0 > > > For {{StringBuilder}} an {{append(vector<string>, uint8_t*)}} function is > [declared|https://github.com/apache/arrow/blob/7b2c79765cf92760e1f8cca079159d9613b86412/cpp/src/arrow/builder.h#L721] > and > [documented|http://arrow.apache.org/docs/cpp/classarrow_1_1_string_builder.html#a59be34b5e11017a392b4ee019d90da3c] > but it does not seem to be implemented. > {code:java} > undefined reference to `arrow::StringBuilder::Append(std::vector<std::string, > std::allocator<std::string> > const&, unsigned char*)' > collect2: error: ld returned 1 exit status > {code} > Also worth noting is that the similar function in {{NumericBuilder}} uses > {{vector<bool>}} for the null values instead of {{uint8_t*}}. It might be > worth making them consistent. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)