[GitHub] [incubator-doris] morningman commented on a change in pull request #3641: [Bug] Ignore loading DELETE status tablet error when restarting BE
morningman commented on a change in pull request #3641:
URL: https://github.com/apache/incubator-doris/pull/3641#discussion_r428472871

## File path: docs/en/administrator-guide/operation/tablet-meta-tool.md

@@ -93,14 +93,39 @@ Order:

 ### Delete header

-In order to realize the function of deleting a tablet from a disk of a be.
+In order to realize the function of deleting a tablet from a disk of a BE.

 Support single delete and batch delete.

-Order:
+Single delete:

 ```
 ./lib/meta_tool --operation=delete_meta --root_path=/path/to/root_path --tablet_id=xxx --schema_hash=xxx`
 ```

+Batch delete:
+
+```
+./lib/meta_tool --operation=batch_delete_meta --tablet_file=/path/to/tablet_file.txt

Review comment: Maybe at another PR

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org
[GitHub] [incubator-doris] morningman commented on a change in pull request #3641: [Bug] Ignore loading DELETE status tablet error when restarting BE
morningman commented on a change in pull request #3641:
URL: https://github.com/apache/incubator-doris/pull/3641#discussion_r428472197

## File path: be/src/tools/meta_tool.cpp

@@ -122,13 +126,131 @@ void delete_meta(DataDir* data_dir) {
     std::cout << "delete meta successfully" << std::endl;
 }

+Status init_data_dir(const std::string dir, DataDir** ret) {
+    std::string root_path;
+    Status st = FileUtils::canonicalize(dir, &root_path);
+    if (!st.ok()) {
+        std::cout << "invalid root path:" << FLAGS_root_path
+                  << ", error: " << st.to_string() << std::endl;
+        return Status::InternalError("invalid root path");
+    }
+    doris::StorePath path;
+    auto res = parse_root_path(root_path, &path);
+    if (res != OLAP_SUCCESS) {
+        std::cout << "parse root path failed:" << root_path << std::endl;
+        return Status::InternalError("parse root path failed");
+    }
+
+    *ret = new (std::nothrow) DataDir(path.path, path.capacity_bytes, path.storage_medium);
+    if (*ret == nullptr) {
+        std::cout << "new data dir failed" << std::endl;
+        return Status::InternalError("new data dir failed");
+    }
+    st = (*ret)->init();
+    if (!st.ok()) {
+        std::cout << "data_dir load failed" << std::endl;
+        return Status::InternalError("data_dir load failed");
+    }
+
+    return Status::OK();
+}
+
+void batch_delete_meta(const std::string tablet_file) {
+    // each line in tablet file indicate a tablet to delete, format is:
+    // data_dir,tablet_id,schema_hash
+    // eg:
+    // /data1/palo.HDD,100010,11212389324
+    // /data2/palo.HDD,100010,23049230234
+    std::ifstream infile(tablet_file);
+    std::string line;
+    int err_num = 0;
+    int delete_num = 0;
+    int total_num = 0;
+    std::unordered_map<std::string, std::unique_ptr<DataDir>> dir_map;
+    while (std::getline(infile, line)) {
+        total_num++;
+        vector<string> v = strings::Split(line, ",");
+        if (v.size() != 3) {
+            std::cout << "invalid line in tablet_file: " << line << std::endl;
+            err_num++;
+            continue;
+        }
+        // 1. get dir
+        std::string dir;
+        Status st = FileUtils::canonicalize(v[0], &dir);
+        if (!st.ok()) {
+            std::cout << "invalid root dir in tablet_file: " << line << std::endl;
+            err_num++;
+            continue;
+        }
+
+        if (dir_map.find(dir) == dir_map.end()) {
+            // new data dir, init it
+            DataDir* data_dir_p = nullptr;
+            Status st = init_data_dir(dir, &data_dir_p);
+            if (!st.ok()) {
+                std::cout << "invalid root path:" << FLAGS_root_path
+                          << ", error: " << st.to_string() << std::endl;
+                err_num++;
+                continue;
+            }
+            dir_map[dir] = std::unique_ptr<DataDir>(data_dir_p);
+            std::cout << "get a new data dir: " << dir << std::endl;
+        }
+        DataDir* data_dir = dir_map[dir].get();
+        if (data_dir == nullptr) {
+            std::cout << "failed to get data dir: " << line << std::endl;
+            err_num++;
+            continue;
+        }
+
+        // 2. get tablet id/schema_hash
+        int64_t tablet_id;
+        if (!safe_strto64(v[1].c_str(), &tablet_id)) {
+            std::cout << "invalid tablet id: " << line << std::endl;
+            err_num++;
+            continue;
+        }
+        int64_t schema_hash;
+        if (!safe_strto64(v[2].c_str(), &schema_hash)) {
+            std::cout << "invalid schema hash: " << line << std::endl;
+            err_num++;
+            continue;
+        }
+
+        OLAPStatus s = TabletMetaManager::remove(data_dir, tablet_id, schema_hash);
+        if (s != OLAP_SUCCESS) {
+            std::cout << "delete tablet meta failed for tablet_id:"
+                      << FLAGS_tablet_id << ", schema_hash:" << FLAGS_schema_hash

Review comment: OK
[GitHub] [incubator-doris] morningman commented on a change in pull request #3641: [Bug] Ignore loading DELETE status tablet error when restarting BE
morningman commented on a change in pull request #3641:
URL: https://github.com/apache/incubator-doris/pull/3641#discussion_r428472083

## File path: be/src/tools/meta_tool.cpp

@@ -122,13 +126,131 @@ void delete_meta(DataDir* data_dir) {
     std::cout << "delete meta successfully" << std::endl;
 }

+Status init_data_dir(const std::string dir, DataDir** ret) {
+    std::string root_path;
+    Status st = FileUtils::canonicalize(dir, &root_path);
+    if (!st.ok()) {
+        std::cout << "invalid root path:" << FLAGS_root_path
+                  << ", error: " << st.to_string() << std::endl;
+        return Status::InternalError("invalid root path");
+    }
+    doris::StorePath path;
+    auto res = parse_root_path(root_path, &path);
+    if (res != OLAP_SUCCESS) {
+        std::cout << "parse root path failed:" << root_path << std::endl;
+        return Status::InternalError("parse root path failed");
+    }
+
+    *ret = new (std::nothrow) DataDir(path.path, path.capacity_bytes, path.storage_medium);
+    if (*ret == nullptr) {
+        std::cout << "new data dir failed" << std::endl;
+        return Status::InternalError("new data dir failed");
+    }
+    st = (*ret)->init();
+    if (!st.ok()) {
+        std::cout << "data_dir load failed" << std::endl;
+        return Status::InternalError("data_dir load failed");
+    }
+
+    return Status::OK();
+}
+
+void batch_delete_meta(const std::string tablet_file) {
+    // each line in tablet file indicate a tablet to delete, format is:
+    // data_dir,tablet_id,schema_hash
+    // eg:
+    // /data1/palo.HDD,100010,11212389324
+    // /data2/palo.HDD,100010,23049230234
+    std::ifstream infile(tablet_file);
+    std::string line;

Review comment: OK
[GitHub] [incubator-doris] morningman commented on a change in pull request #3641: [Bug] Ignore loading DELETE status tablet error when restarting BE
morningman commented on a change in pull request #3641:
URL: https://github.com/apache/incubator-doris/pull/3641#discussion_r428471970

## File path: be/src/tools/meta_tool.cpp

@@ -122,13 +126,131 @@ void delete_meta(DataDir* data_dir) {
     std::cout << "delete meta successfully" << std::endl;
 }

+Status init_data_dir(const std::string dir, DataDir** ret) {
+    std::string root_path;
+    Status st = FileUtils::canonicalize(dir, &root_path);
+    if (!st.ok()) {
+        std::cout << "invalid root path:" << FLAGS_root_path
+                  << ", error: " << st.to_string() << std::endl;
+        return Status::InternalError("invalid root path");
+    }
+    doris::StorePath path;
+    auto res = parse_root_path(root_path, &path);
+    if (res != OLAP_SUCCESS) {
+        std::cout << "parse root path failed:" << root_path << std::endl;
+        return Status::InternalError("parse root path failed");
+    }
+
+    *ret = new (std::nothrow) DataDir(path.path, path.capacity_bytes, path.storage_medium);
+    if (*ret == nullptr) {
+        std::cout << "new data dir failed" << std::endl;
+        return Status::InternalError("new data dir failed");
+    }
+    st = (*ret)->init();
+    if (!st.ok()) {
+        std::cout << "data_dir load failed" << std::endl;
+        return Status::InternalError("data_dir load failed");
+    }
+
+    return Status::OK();
+}
+
+void batch_delete_meta(const std::string tablet_file) {

Review comment: OK
[GitHub] [incubator-doris] morningman commented on a change in pull request #3641: [Bug] Ignore loading DELETE status tablet error when restarting BE
morningman commented on a change in pull request #3641:
URL: https://github.com/apache/incubator-doris/pull/3641#discussion_r428470241

## File path: be/src/olap/data_dir.cpp

@@ -709,13 +716,25 @@ OLAPStatus DataDir::load() {
         return true;
     };
     OLAPStatus load_tablet_status = TabletMetaManager::traverse_headers(_meta, load_tablet_func);
-    if (failed_tablet_ids.size() != 0 && !config::ignore_load_tablet_failure) {
-        LOG(FATAL) << "load tablets from header failed, failed tablets size: " << failed_tablet_ids.size();
+    if (failed_tablet_ids.size() != 0) {
+        LOG(WARNING) << "load tablets from header failed"
+                     << ", loaded tablet: " << tablet_ids.size()
+                     << ", error tablet: " << failed_tablet_ids.size()
+                     << ", path: " << _path;
+        if (!config::ignore_load_tablet_failure) {
+            LOG(FATAL) << "load tablets encounter failure. stop BE process. path: " << _path;
+        }
     }
     if (load_tablet_status != OLAP_SUCCESS) {
-        LOG(WARNING) << "there is failure when loading tablet headers, path:" << _path;
+        LOG(WARNING) << "there is failure when loading tablet headers"
+                     << ", loaded tablet: " << tablet_ids.size()
+                     << ", error tablet: " << failed_tablet_ids.size()
+                     << ", path: " << _path;
     } else {
-        LOG(INFO) << "load rowset from meta finished, data dir: " << _path;
+        LOG(INFO) << "load rowset from meta finished"

Review comment: OK
[GitHub] [incubator-doris] morningman commented on a change in pull request #3641: [Bug] Ignore loading DELETE status tablet error when restarting BE
morningman commented on a change in pull request #3641: URL: https://github.com/apache/incubator-doris/pull/3641#discussion_r428470336 ## File path: be/src/tools/meta_tool.cpp ## @@ -122,13 +126,131 @@ void delete_meta(DataDir* data_dir) { std::cout << "delete meta successfully" << std::endl; } +Status init_data_dir(const std::string dir, DataDir** ret) { Review comment: I will change it to unique_ptr
[GitHub] [incubator-doris] kangpinghuang commented on a change in pull request #3641: [Bug] Ignore loading DELETE status tablet error when restarting BE
kangpinghuang commented on a change in pull request #3641:
URL: https://github.com/apache/incubator-doris/pull/3641#discussion_r428463285

## File path: be/src/olap/data_dir.cpp

@@ -709,13 +716,25 @@ OLAPStatus DataDir::load() {
         return true;
     };
     OLAPStatus load_tablet_status = TabletMetaManager::traverse_headers(_meta, load_tablet_func);
-    if (failed_tablet_ids.size() != 0 && !config::ignore_load_tablet_failure) {
-        LOG(FATAL) << "load tablets from header failed, failed tablets size: " << failed_tablet_ids.size();
+    if (failed_tablet_ids.size() != 0) {
+        LOG(WARNING) << "load tablets from header failed"
+                     << ", loaded tablet: " << tablet_ids.size()
+                     << ", error tablet: " << failed_tablet_ids.size()
+                     << ", path: " << _path;
+        if (!config::ignore_load_tablet_failure) {
+            LOG(FATAL) << "load tablets encounter failure. stop BE process. path: " << _path;
+        }
     }
     if (load_tablet_status != OLAP_SUCCESS) {
-        LOG(WARNING) << "there is failure when loading tablet headers, path:" << _path;
+        LOG(WARNING) << "there is failure when loading tablet headers"
+                     << ", loaded tablet: " << tablet_ids.size()
+                     << ", error tablet: " << failed_tablet_ids.size()
+                     << ", path: " << _path;
     } else {
-        LOG(INFO) << "load rowset from meta finished, data dir: " << _path;
+        LOG(INFO) << "load rowset from meta finished"

Review comment:
```suggestion
        LOG(INFO) << "load tablet meta finished"
```

## File path: be/src/tools/meta_tool.cpp

@@ -122,13 +126,131 @@ void delete_meta(DataDir* data_dir) {
     std::cout << "delete meta successfully" << std::endl;
 }

+Status init_data_dir(const std::string dir, DataDir** ret) {

Review comment:
```suggestion
Status init_data_dir(const std::string& dir, DataDir** ret) {
```

## File path: be/src/tools/meta_tool.cpp

@@ -122,13 +126,131 @@ void delete_meta(DataDir* data_dir) {
     std::cout << "delete meta successfully" << std::endl;
 }

+Status init_data_dir(const std::string dir, DataDir** ret) {
+    std::string root_path;
+    Status st = FileUtils::canonicalize(dir, &root_path);
+    if (!st.ok()) {
+        std::cout << "invalid root path:" << FLAGS_root_path
+                  << ", error: " << st.to_string() << std::endl;
+        return Status::InternalError("invalid root path");
+    }
+    doris::StorePath path;
+    auto res = parse_root_path(root_path, &path);
+    if (res != OLAP_SUCCESS) {
+        std::cout << "parse root path failed:" << root_path << std::endl;
+        return Status::InternalError("parse root path failed");
+    }
+
+    *ret = new (std::nothrow) DataDir(path.path, path.capacity_bytes, path.storage_medium);
+    if (*ret == nullptr) {
+        std::cout << "new data dir failed" << std::endl;
+        return Status::InternalError("new data dir failed");
+    }
+    st = (*ret)->init();
+    if (!st.ok()) {
+        std::cout << "data_dir load failed" << std::endl;
+        return Status::InternalError("data_dir load failed");
+    }
+
+    return Status::OK();
+}
+
+void batch_delete_meta(const std::string tablet_file) {
+    // each line in tablet file indicate a tablet to delete, format is:
+    // data_dir,tablet_id,schema_hash
+    // eg:
+    // /data1/palo.HDD,100010,11212389324
+    // /data2/palo.HDD,100010,23049230234
+    std::ifstream infile(tablet_file);
+    std::string line;
+    int err_num = 0;
+    int delete_num = 0;
+    int total_num = 0;
+    std::unordered_map<std::string, std::unique_ptr<DataDir>> dir_map;
+    while (std::getline(infile, line)) {
+        total_num++;
+        vector<string> v = strings::Split(line, ",");
+        if (v.size() != 3) {
+            std::cout << "invalid line in tablet_file: " << line << std::endl;
+            err_num++;
+            continue;
+        }
+        // 1. get dir
+        std::string dir;
+        Status st = FileUtils::canonicalize(v[0], &dir);
+        if (!st.ok()) {
+            std::cout << "invalid root dir in tablet_file: " << line << std::endl;
+            err_num++;
+            continue;
+        }
+
+        if (dir_map.find(dir) == dir_map.end()) {
+            // new data dir, init it
+            DataDir* data_dir_p = nullptr;
+            Status st = init_data_dir(dir, &data_dir_p);
+            if (!st.ok()) {
+                std::cout << "invalid root path:" << FLAGS_root_path
+                          << ", error: " << st.to_string() << std::endl;
+                err_num++;
+                continue;
+            }
+            dir_map[dir] = std::unique_ptr<DataDir>(data_dir_p);
+            std::cout << "get a new data dir: " << dir << std::endl;
+        }
+        DataDir* data_dir = dir_map[dir].get();
+        if (data_dir == nullptr) {
+            std::cout << "failed to get data dir: " << line << std::endl;
+            err_num++;
+
[GitHub] [incubator-doris] morningman opened a new pull request #3647: [Enhancement] Add detail msg to show the reason of publish failure.
morningman opened a new pull request #3647:
URL: https://github.com/apache/incubator-doris/pull/3647

Add a new column `ErrMsg` to show errors that happen during the transaction process. It can be seen by executing: `SHOW PROC "/transactions/dbId/"`.

Currently it only records errors that happen in the publish phase, which can help us find out which txn is blocked.

Fix #3646
[GitHub] [incubator-doris] morningman opened a new issue #3646: [Enhancement][Txn] Add more info to find why publish failed.
morningman opened a new issue #3646:
URL: https://github.com/apache/incubator-doris/issues/3646

Sometimes users may see a lot of txns blocked in the COMMITTED state, not published for a long time. It is usually due to some txns failing to publish on certain tablets, which then blocks all the other txns behind them. When we meet this issue, it is hard to see which txn is the root cause and which tablet failed to be published.
[GitHub] [incubator-doris] worker24h opened a new pull request #3645: Add error code into error message
worker24h opened a new pull request #3645: URL: https://github.com/apache/incubator-doris/pull/3645 Add error code into error message
[GitHub] [incubator-doris] morningman commented on a change in pull request #3584: [OUTFILE] Support `INTO OUTFILE` to export query result
morningman commented on a change in pull request #3584:
URL: https://github.com/apache/incubator-doris/pull/3584#discussion_r428444348

## File path: be/src/runtime/file_result_writer.h

@@ -0,0 +1,132 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#pragma once
+
+#include "runtime/result_writer.h"
+#include "runtime/runtime_state.h"
+#include "gen_cpp/DataSinks_types.h"
+
+namespace doris {
+
+class ExprContext;
+class FileWriter;
+class ParquetWriterWrapper;
+class RowBatch;
+class RuntimeProfile;
+class TupleRow;
+
+struct ResultFileOptions {
+    bool is_local_file;
+    std::string file_path;
+    TFileFormatType::type file_format;
+    std::string column_separator;
+    std::string line_delimiter;
+    size_t max_file_size_bytes = 1 * 1024 * 1024 * 1024; // 1GB
+    std::vector<TNetworkAddress> broker_addresses;
+    std::map<std::string, std::string> broker_properties;

Review comment: This map will be assigned from a Thrift map object. If I used unordered_map here, I would have to convert it. And these properties have only a few elements, so I think it's not a big deal.
[GitHub] [incubator-doris] kangkaisen commented on a change in pull request #3638: Supoort utf-8 encoding in instr, locate, locate_pos, lpad, rpad
kangkaisen commented on a change in pull request #3638:
URL: https://github.com/apache/incubator-doris/pull/3638#discussion_r428454426

## File path: be/src/exprs/string_functions.cpp

@@ -196,28 +196,56 @@ StringVal StringFunctions::lpad(
     if (str.is_null || len.is_null || pad.is_null || len.val < 0) {
         return StringVal::null();
     }
+
+    size_t str_char_size = 0;
+    size_t pad_char_size = 0;
+    size_t byte_pos = 0;
+    std::vector<size_t> str_index;
+    std::vector<size_t> pad_index;
+    for (size_t i = 0, char_size = 0; i < str.len; i += char_size) {
+        char_size = get_utf8_byte_length((unsigned)(str.ptr)[i]);
+        str_index.push_back(byte_pos);
+        byte_pos += char_size;
+        ++str_char_size;
+    }
+    byte_pos = 0;
+    for (size_t i = 0, char_size = 0; i < pad.len; i += char_size) {
+        char_size = get_utf8_byte_length((unsigned)(pad.ptr)[i]);
+        pad_index.push_back(byte_pos);
+        byte_pos += char_size;
+        ++pad_char_size;
+    }
+
     // Corner cases: Shrink the original string, or leave it alone.
     // TODO: Hive seems to go into an infinite loop if pad.len == 0,
     // so we should pay attention to Hive's future solution to be compatible.
-    if (len.val <= str.len || pad.len == 0) {
-        return StringVal(str.ptr, len.val);
+    if (len.val <= str_char_size || pad.len == 0) {
+        if (len.val >= str_index.size()) {
+            return StringVal::null();
+        }
+        return StringVal(str.ptr, str_index.at(len.val));
     }

     // TODO pengyubing
     // StringVal result = StringVal::create_temp_string_val(context, len.val);
-    StringVal result(context, len.val);
+    int32_t pad_byte_len = 0;
+    int32_t pad_times = (len.val - str_char_size) / pad_char_size;
+    int32_t pad_remainder = (len.val - str_char_size) % pad_char_size;
+    pad_byte_len = pad_times * pad.len;
+    pad_byte_len += pad_index.at(pad_remainder);

Review comment: I know operator[] isn't bounds-checked. But: 1. All the code here is under your control — couldn't we make sure there is no out-of-range access? 2. If you use at(), you should try/catch the out_of_range exception; otherwise there is no point in using at(), because operator[] performs better than at().
[GitHub] [incubator-doris] kangkaisen commented on a change in pull request #3584: [OUTFILE] Support `INTO OUTFILE` to export query result
kangkaisen commented on a change in pull request #3584: URL: https://github.com/apache/incubator-doris/pull/3584#discussion_r428451196 ## File path: docs/zh-CN/administrator-guide/outfile.md ## @@ -0,0 +1,183 @@ +--- +{ +"title": "导出查询结果集", +"language": "zh-CN" +} +--- + + + +# 导出查询结果集 + +本文档介绍如何使用 `SELECT INTO OUTFILE` 命令进行查询结果的导入操作。 + +## 语法 + +`SELECT INTO OUTFILE` 语句可以将查询结果导出到文件中。目前仅支持通过 Broker 进程导出到远端存储,如 HDFS,S3,BOS 上。语法如下 + +``` +query_stmt +INTO OUTFILE "file_path" +[format_as] +WITH BROKER `broker_name` +[broker_properties] +[other_properties] +``` + +* `file_path` + +`file_path` 指向文件存储的路径以及文件前缀。如 `hdfs://path/to/my_file`。 + +最终的文件名将由 `my_file`,文件序号以及文件格式后缀组成。其中文件序号由0开始,数量为文件被分割的数量。如: + +``` +my_file_0.csv +my_file_1.csv +my_file_2.csv +``` + +* `[format_as]` + +``` +FORMAT AS CSV +``` + +指定导出格式。默认为 CSV。 + +* `[broker_properties]` + +``` +("broker_prop_key" = "broker_prop_val", ...) +``` + +Broker 相关的一些参数,如 HDFS 的 认证信息等。具体参阅[Broker 文档](./broker.html)。 + +* `[other_properties]` + +``` +("key1" = "val1", "key2" = "val2", ...) +``` + +其他属性,目前支持以下属性: + +* `column_separator`:列分隔符,仅对 CSV 格式适用。默认为 `\t`。 +* `line_delimiter`:行分隔符,仅对 CSV 格式适用。默认为 `\n`。 +* `max_file_size_bytes`:单个文件的最大大小。默认为 1GB。取值范围在 5MB 到 2GB 之间。超过这个大小的文件将会被切分。 + +1. 示例1 + +将简单查询结果导出到文件 `hdfs:/path/to/result.txt`。指定导出格式为 CSV。使用 `my_broker` 并设置 kerberos 认证信息。指定列分隔符为 `,`,行分隔符为 `\n`。 + +``` +SELECT * FROM tbl +INTO OUTFILE "hdfs:/path/to/result" +FORMAT AS CSV +WITH BROKER "my_broker" +( +"hadoop.security.authentication" = "kerberos", +"kerberos_principal" = "do...@your.com", +"kerberos_keytab" = "/home/doris/my.keytab" +) +PROPERTIELS +( +"column_separator" = ",", +"line_delimiter" = "\n", +"max_file_size_bytes" = "100MB" +); +``` + +最终生成文件如如果不大于 100MB,则为:`result_0.csv`。 + +如果大于 100MB,则可能为 `result_0.csv, result_1.csv, ...`。 + +2. 
示例2 + +将 CTE 语句的查询结果导出到文件 `hdfs:/path/to/result.txt`。默认导出格式为 CSV。使用 `my_broker` 并设置 hdfs 高可用信息。使用默认的行列分隔符。 + +``` +WITH +x1 AS +(SELECT k1, k2 FROM tbl1), +x2 AS +(SELECT k3 FROM tbl2) +SELEC k1 FROM x1 UNION SELECT k3 FROM x2 +INTO OUTFILE "hdfs:/path/to/result.txt" +WITH BROKER "my_broker" +( +"username"="user", +"password"="passwd", +"dfs.nameservices" = "my_ha", +"dfs.ha.namenodes.my_ha" = "my_namenode1, my_namenode2", +"dfs.namenode.rpc-address.my_ha.my_namenode1" = "nn1_host:rpc_port", +"dfs.namenode.rpc-address.my_ha.my_namenode2" = "nn2_host:rpc_port", +"dfs.client.failover.proxy.provider" = "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider" +); +``` + +最终生成文件如如果不大于 1GB,则为:`result_0.csv`。 + +如果大于 1GB,则可能为 `result_0.csv, result_1.csv, ...`。 + +3. 示例3 + +将 UNION 语句的查询结果导出到文件 `bos://bucket/result.txt`。指定导出格式为 PARQUET。使用 `my_broker` 并设置 hdfs 高可用信息。PARQUET 格式无需指定列分割符。 + +``` +SELECT k1 FROM tbl1 UNION SELECT k2 FROM tbl1 +INTO OUTFILE "bos://bucket/result.txt" +FORMAT AS PARQUET +WITH BROKER "my_broker" +( +"bos_endpoint" = "http://bj.bcebos.com";, +"bos_accesskey" = "xx", +"bos_secret_accesskey" = "yy" +) +``` + +最终生成文件如如果不大于 1GB,则为:`result_0.parquet`。 + +如果大于 1GB,则可能为 `result_0.parquet, result_1.parquet, ...`。 + +## 返回结果 + +导出命令为同步命令。命令返回,即表示操作结束。 + +如果正常导出并返回,则结果如下: + +``` +mysql> SELECT * FROM tbl INTO OUTFILE ... Query OK, 10 row affected (5.86 sec) +``` + +其中 `10 row affected` 表示导出的结果集行数。 + +如果执行错误,则会返回错误信息,如: + +``` +mysql> SELECT * FROM tbl INTO OUTFILE ... ERROR 1064 (HY000): errCode = 2, detailMessage = Open broker writer failed ... 
+```
+
+## 注意事项
+
+* 查询结果是由单个 BE 节点,单线程导出的。因此导出时间和导出结果集大小正相关。
+* 导出命令不会检查文件及文件路径是否存在。是否会自动创建路径、或是否会覆盖已存在文件,完全由远端存储系统的语义决定。
+* 如果在导出过程中出现错误,可能会有导出文件残留在远端存储系统上。Doris 不会清理这些文件。需要用户手动清理。
+* 导出命令的超时时间同查询的超时时间。可以通过 `SET query_timeout=xxx` 进行设置。
+* 对于结果集为空的查询,依然后产生一个大小为0的文件。

Review comment: I just recall that the MapReduce writer supports this kind of configuration, since the need may exist. But we can support it later once users have a real need for it; no need to consider it in this PR.

## File path: docs/zh-CN/administrator-guide/outfile.md

@@ -0,0 +1,183 @@
+---
+{
+"title": "导出查询结果集",
+"language": "zh-CN"
+}
+---
+
+
+
+# 导出查询结果
[incubator-doris] branch master updated (792307a -> ef8fd1f)
This is an automated email from the ASF dual-hosted git repository. morningman pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/incubator-doris.git. from 792307a [CMake] Different cmake build directories for different build types (#3623) (#3629) add ef8fd1f [Load] Support load json-data into Doris by RoutineLoad or StreamLoad (#3553) No new revisions were added by this update. Summary of changes: be/src/exec/CMakeLists.txt | 1 + be/src/exec/broker_reader.cpp | 5 + be/src/exec/broker_reader.h| 1 + be/src/exec/broker_scan_node.cpp | 9 + be/src/exec/broker_scanner.cpp | 2 + be/src/exec/file_reader.h | 12 + be/src/exec/json_scanner.cpp | 550 + be/src/exec/json_scanner.h | 147 ++ be/src/exec/local_file_reader.cpp | 20 +- be/src/exec/local_file_reader.h| 1 + be/src/exec/parquet_scanner.cpp| 1 + be/src/exprs/json_functions.cpp| 216 +--- be/src/exprs/json_functions.h | 62 ++- be/src/http/action/stream_load.cpp | 16 +- be/src/http/http_common.h | 2 + .../runtime/routine_load/data_consumer_group.cpp | 20 +- be/src/runtime/routine_load/kafka_consumer_pipe.h | 5 +- .../routine_load/routine_load_task_executor.cpp| 4 +- be/src/runtime/stream_load/stream_load_context.h | 5 + be/src/runtime/stream_load/stream_load_pipe.h | 37 +- be/test/exec/CMakeLists.txt| 2 + ...quet_scanner_test.cpp => json_scanner_test.cpp} | 195 +++- ...est.cpp => json_scanner_test_with_jsonpath.cpp} | 198 +++- .../exec/test_data/json_scanner/test_array.json| 4 + .../exec/test_data/json_scanner/test_simple2.json | 5 + be/test/exprs/json_function_test.cpp | 95 +++- .../Data Manipulation/ROUTINE LOAD.md | 99 +++- .../Data Manipulation/STREAM LOAD.md | 37 ++ .../Data Manipulation/ROUTINE LOAD.md | 120 - .../Data Manipulation/STREAM LOAD.md | 73 ++- .../org/apache/doris/analysis/CreateFileStmt.java | 11 +- .../doris/analysis/CreateRoutineLoadStmt.java | 53 +- .../doris/load/routineload/KafkaTaskInfo.java | 6 + .../doris/load/routineload/RoutineLoadJob.java | 74 ++- 
.../apache/doris/planner/StreamLoadScanNode.java | 13 +- .../apache/doris/service/FrontendServiceImpl.java | 10 +- .../java/org/apache/doris/task/StreamLoadTask.java | 47 +- .../doris/planner/StreamLoadPlannerTest.java | 4 +- .../doris/planner/StreamLoadScanNodeTest.java | 14 +- gensrc/thrift/BackendService.thrift| 2 + gensrc/thrift/FrontendService.thrift | 2 + gensrc/thrift/PlanNodes.thrift | 6 +- 42 files changed, 1736 insertions(+), 450 deletions(-) create mode 100644 be/src/exec/json_scanner.cpp create mode 100644 be/src/exec/json_scanner.h copy be/test/exec/{parquet_scanner_test.cpp => json_scanner_test.cpp} (69%) copy be/test/exec/{parquet_scanner_test.cpp => json_scanner_test_with_jsonpath.cpp} (63%) create mode 100644 be/test/exec/test_data/json_scanner/test_array.json create mode 100644 be/test/exec/test_data/json_scanner/test_simple2.json
[GitHub] [incubator-doris] morningman merged pull request #3553: Support load json-data into Doris by RoutineLoad or StreamLoad
morningman merged pull request #3553: URL: https://github.com/apache/incubator-doris/pull/3553
[GitHub] [incubator-doris] morningman commented on pull request #3637: [Memory Engine] Add MemSubTablet, MemTablet, WriteTx, PartialRowBatch
morningman commented on pull request #3637:
URL: https://github.com/apache/incubator-doris/pull/3637#issuecomment-631880632

> https://hornad.fei.tuke.sk/~genci/Vyucba/SRBDp/Externisti/Zdroje/physical-database-design-the-database-professionals-guide-to-exploiting-indexes-views-storage-and-more.9780123693891.28944.pdf

I see. Thanks for your reply.
[GitHub] [incubator-doris] morningman commented on a change in pull request #3637: [Memory Engine] Add MemSubTablet, MemTablet, WriteTx, PartialRowBatch
morningman commented on a change in pull request #3637: URL: https://github.com/apache/incubator-doris/pull/3637#discussion_r428444871 ## File path: be/src/olap/memory/write_tx.h ## @@ -0,0 +1,58 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +#pragma once + +#include "olap/memory/common.h" +#include "olap/memory/partial_row_batch.h" +#include "olap/memory/schema.h" + +namespace doris { +namespace memory { + +class PartialRowBatch; + +// Class for write transaction +// +// Note: Currently it stores all its operations in memory, to make things simple, +// so we can quickly complete the whole create/read/write pipeline. The data structure may +// change as the project evolves. +// +// TODO: add write to/load from WritexTx files in future. +class WriteTx { Review comment: Alright. However, other parts of the code use `txn` as an abbreviation, so it is recommended to unify. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
[GitHub] [incubator-doris] imay commented on pull request #3626: Update Compiling environment of BE to Support C++14/17
imay commented on pull request #3626: URL: https://github.com/apache/incubator-doris/pull/3626#issuecomment-631879704 @morningman I don't quite understand your point. I think we can include this feature in our next release, so we can merge it right now. What's your opinion on this?
[GitHub] [incubator-doris] morningman commented on a change in pull request #3584: [OUTFILE] Support `INTO OUTFILE` to export query result
morningman commented on a change in pull request #3584: URL: https://github.com/apache/incubator-doris/pull/3584#discussion_r42809 ## File path: be/src/runtime/file_result_writer.h ## @@ -0,0 +1,132 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +#pragma once + +#include "runtime/result_writer.h" +#include "runtime/runtime_state.h" +#include "gen_cpp/DataSinks_types.h" + +namespace doris { + +class ExprContext; +class FileWriter; +class ParquetWriterWrapper; +class RowBatch; +class RuntimeProfile; +class TupleRow; + +struct ResultFileOptions { +bool is_local_file; +std::string file_path; +TFileFormatType::type file_format; +std::string column_separator; +std::string line_delimiter; +size_t max_file_size_bytes = 1 * 1024 * 1024 * 1024; // 1GB +std::vector broker_addresses; +std::map broker_properties; + +ResultFileOptions(const TResultFileSinkOptions& t_opt) { +file_path = t_opt.file_path; +file_format = t_opt.file_format; +column_separator = t_opt.__isset.column_separator ? t_opt.column_separator : "\t"; +line_delimiter = t_opt.__isset.line_delimiter ? t_opt.line_delimiter : "\n"; +max_file_size_bytes = t_opt.__isset.max_file_size_bytes ? 
+t_opt.max_file_size_bytes : max_file_size_bytes; + +is_local_file = true; +if (t_opt.__isset.broker_addresses) { +broker_addresses = t_opt.broker_addresses; +is_local_file = false; +} +if (t_opt.__isset.broker_properties) { +broker_properties = t_opt.broker_properties; +} +} +}; + +// write result to file +class FileResultWriter : public ResultWriter { Review comment: OK This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org
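The `ResultFileOptions` constructor above defaults each optional Thrift field through its `__isset` flag. A minimal, self-contained sketch of that defaulting pattern (the `TOptions` stand-in struct here is hypothetical, not the Thrift-generated type):

```cpp
#include <string>

// Hypothetical stand-in for a Thrift-generated struct: each optional field
// has a matching __isset flag telling us whether the client actually set it.
struct TOptions {
    struct { bool column_separator = false; bool line_delimiter = false; } __isset;
    std::string column_separator;
    std::string line_delimiter;
};

// Mirrors the defaulting pattern used by ResultFileOptions: fall back to
// "\t" / "\n" whenever the optional field was not set by the caller.
struct FileOptions {
    std::string column_separator;
    std::string line_delimiter;

    explicit FileOptions(const TOptions& t)
            : column_separator(t.__isset.column_separator ? t.column_separator : "\t"),
              line_delimiter(t.__isset.line_delimiter ? t.line_delimiter : "\n") {}
};
```

The same check-the-flag-then-default shape applies to every optional field, including `max_file_size_bytes` in the quoted code.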
[GitHub] [incubator-doris] morningman commented on a change in pull request #3584: [OUTFILE] Support `INTO OUTFILE` to export query result
morningman commented on a change in pull request #3584: URL: https://github.com/apache/incubator-doris/pull/3584#discussion_r428444339 ## File path: be/src/runtime/mysql_result_writer.h ## @@ -0,0 +1,70 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +#pragma once + +#include "runtime/result_writer.h" +#include "runtime/runtime_state.h" + +namespace doris { + +class TupleRow; +class RowBatch; +class ExprContext; +class MysqlRowBuffer; +class BufferControlBlock; +class RuntimeProfile; + +// convert the row batch to mysql protol row +class MysqlResultWriter : public ResultWriter { Review comment: OK ## File path: be/src/runtime/file_result_writer.h ## @@ -0,0 +1,132 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. 
You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +#pragma once + +#include "runtime/result_writer.h" +#include "runtime/runtime_state.h" +#include "gen_cpp/DataSinks_types.h" + +namespace doris { + +class ExprContext; +class FileWriter; +class ParquetWriterWrapper; +class RowBatch; +class RuntimeProfile; +class TupleRow; + +struct ResultFileOptions { +bool is_local_file; +std::string file_path; +TFileFormatType::type file_format; +std::string column_separator; +std::string line_delimiter; +size_t max_file_size_bytes = 1 * 1024 * 1024 * 1024; // 1GB +std::vector broker_addresses; +std::map broker_properties; Review comment: OK This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org
[GitHub] [incubator-doris] morningman commented on a change in pull request #3584: [OUTFILE] Support `INTO OUTFILE` to export query result
morningman commented on a change in pull request #3584: URL: https://github.com/apache/incubator-doris/pull/3584#discussion_r428443963 ## File path: be/src/runtime/mysql_result_writer.h ## @@ -0,0 +1,70 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +#pragma once + +#include "runtime/result_writer.h" +#include "runtime/runtime_state.h" + +namespace doris { + +class TupleRow; +class RowBatch; +class ExprContext; +class MysqlRowBuffer; +class BufferControlBlock; +class RuntimeProfile; + +// convert the row batch to mysql protol row +class MysqlResultWriter : public ResultWriter { +public: +MysqlResultWriter(BufferControlBlock* sinker, +const std::vector& output_expr_ctxs, +RuntimeProfile* parent_profile); +virtual ~MysqlResultWriter(); + +virtual Status init(RuntimeState* state) override; +// convert one row batch to mysql result and +// append this batch to the result sink +virtual Status append_row_batch(const RowBatch* batch) override; + +virtual Status close() override; + +private: +void _init_profile(); +// convert one tuple row +Status _add_one_row(TupleRow* row); + +private: +// The expressions that are run to create tuples to be written to hbase. 
Review comment: Removed
[GitHub] [incubator-doris] morningman commented on a change in pull request #3584: [OUTFILE] Support `INTO OUTFILE` to export query result
morningman commented on a change in pull request #3584: URL: https://github.com/apache/incubator-doris/pull/3584#discussion_r428443827 ## File path: docs/zh-CN/administrator-guide/outfile.md ## @@ -0,0 +1,183 @@ +--- +{ +"title": "导出查询结果集", +"language": "zh-CN" +} +--- + + + +# 导出查询结果集 + +本文档介绍如何使用 `SELECT INTO OUTFILE` 命令进行查询结果的导入操作。 + +## 语法 + +`SELECT INTO OUTFILE` 语句可以将查询结果导出到文件中。目前仅支持通过 Broker 进程导出到远端存储,如 HDFS,S3,BOS 上。语法如下 + +``` +query_stmt +INTO OUTFILE "file_path" +[format_as] +WITH BROKER `broker_name` +[broker_properties] +[other_properties] +``` + +* `file_path` + +`file_path` 指向文件存储的路径以及文件前缀。如 `hdfs://path/to/my_file`。 + +最终的文件名将由 `my_file`,文件序号以及文件格式后缀组成。其中文件序号由0开始,数量为文件被分割的数量。如: + +``` +my_file_0.csv +my_file_1.csv +my_file_2.csv +``` + +* `[format_as]` + +``` +FORMAT AS CSV +``` + +指定导出格式。默认为 CSV。 + +* `[broker_properties]` + +``` +("broker_prop_key" = "broker_prop_val", ...) +``` + +Broker 相关的一些参数,如 HDFS 的 认证信息等。具体参阅[Broker 文档](./broker.html)。 + +* `[other_properties]` + +``` +("key1" = "val1", "key2" = "val2", ...) +``` + +其他属性,目前支持以下属性: + +* `column_separator`:列分隔符,仅对 CSV 格式适用。默认为 `\t`。 +* `line_delimiter`:行分隔符,仅对 CSV 格式适用。默认为 `\n`。 +* `max_file_size_bytes`:单个文件的最大大小。默认为 1GB。取值范围在 5MB 到 2GB 之间。超过这个大小的文件将会被切分。 + +1. 示例1 + +将简单查询结果导出到文件 `hdfs:/path/to/result.txt`。指定导出格式为 CSV。使用 `my_broker` 并设置 kerberos 认证信息。指定列分隔符为 `,`,行分隔符为 `\n`。 + +``` +SELECT * FROM tbl +INTO OUTFILE "hdfs:/path/to/result" +FORMAT AS CSV +WITH BROKER "my_broker" +( +"hadoop.security.authentication" = "kerberos", +"kerberos_principal" = "do...@your.com", +"kerberos_keytab" = "/home/doris/my.keytab" +) +PROPERTIELS +( +"column_separator" = ",", +"line_delimiter" = "\n", +"max_file_size_bytes" = "100MB" +); +``` + +最终生成文件如如果不大于 100MB,则为:`result_0.csv`。 + +如果大于 100MB,则可能为 `result_0.csv, result_1.csv, ...`。 + +2. 
示例2 + +将 CTE 语句的查询结果导出到文件 `hdfs:/path/to/result.txt`。默认导出格式为 CSV。使用 `my_broker` 并设置 hdfs 高可用信息。使用默认的行列分隔符。 + +``` +WITH +x1 AS +(SELECT k1, k2 FROM tbl1), +x2 AS +(SELECT k3 FROM tbl2) +SELEC k1 FROM x1 UNION SELECT k3 FROM x2 +INTO OUTFILE "hdfs:/path/to/result.txt" +WITH BROKER "my_broker" +( +"username"="user", +"password"="passwd", +"dfs.nameservices" = "my_ha", +"dfs.ha.namenodes.my_ha" = "my_namenode1, my_namenode2", +"dfs.namenode.rpc-address.my_ha.my_namenode1" = "nn1_host:rpc_port", +"dfs.namenode.rpc-address.my_ha.my_namenode2" = "nn2_host:rpc_port", +"dfs.client.failover.proxy.provider" = "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider" +); +``` + +最终生成文件如如果不大于 1GB,则为:`result_0.csv`。 + +如果大于 1GB,则可能为 `result_0.csv, result_1.csv, ...`。 + +3. 示例3 + +将 UNION 语句的查询结果导出到文件 `bos://bucket/result.txt`。指定导出格式为 PARQUET。使用 `my_broker` 并设置 hdfs 高可用信息。PARQUET 格式无需指定列分割符。 + +``` +SELECT k1 FROM tbl1 UNION SELECT k2 FROM tbl1 +INTO OUTFILE "bos://bucket/result.txt" +FORMAT AS PARQUET +WITH BROKER "my_broker" +( +"bos_endpoint" = "http://bj.bcebos.com";, +"bos_accesskey" = "xx", +"bos_secret_accesskey" = "yy" +) +``` + +最终生成文件如如果不大于 1GB,则为:`result_0.parquet`。 + +如果大于 1GB,则可能为 `result_0.parquet, result_1.parquet, ...`。 + +## 返回结果 + +导出命令为同步命令。命令返回,即表示操作结束。 + +如果正常导出并返回,则结果如下: + +``` +mysql> SELECT * FROM tbl INTO OUTFILE ... Query OK, 10 row affected (5.86 sec) +``` + +其中 `10 row affected` 表示导出的结果集行数。 + +如果执行错误,则会返回错误信息,如: + +``` +mysql> SELECT * FROM tbl INTO OUTFILE ... ERROR 1064 (HY000): errCode = 2, detailMessage = Open broker writer failed ... +``` + +## 注意事项 + +* 查询结果是由单个 BE 节点,单线程导出的。因此导出时间和导出结果集大小正相关。 Review comment: For now this simply reuses the logic for returning query results. Supporting multiple threads would be a bit more troublesome. For example, we would need to check whether the select statement contains order by and similar clauses; if it does, we can only write sequentially with a single thread (because the results are returned in order). Even without order by, the current query framework still returns results in a single thread, so at most we could switch to writing files with multiple threads. But many remote systems do not support writing at a specified offset, so multiple threads could only write to multiple files, which is also troublesome. I'm not sure how other systems actually implement exporting select results.
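The docs quoted above describe how an export is split into `result_0.csv`, `result_1.csv`, ... once a file would exceed `max_file_size_bytes`. A minimal sketch of that naming/rotation rule as described in the docs (the `FileRotator` type and method names are illustrative, not Doris's actual implementation):

```cpp
#include <cstdint>
#include <string>

// Illustrative sketch of the splitting rule from the docs: output files are
// named <prefix>_<index>.<suffix>, starting at index 0, and a new file is
// started once the current one would exceed max_file_size_bytes.
struct FileRotator {
    std::string prefix;            // e.g. "result"
    std::string suffix;            // e.g. "csv"
    uint64_t max_file_size_bytes;  // e.g. 100MB in the docs' first example
    uint64_t written_bytes = 0;
    int file_index = 0;

    std::string current_file_name() const {
        return prefix + "_" + std::to_string(file_index) + "." + suffix;
    }

    // Returns the file the next `bytes` should go to, rotating if needed.
    std::string file_for_next_write(uint64_t bytes) {
        if (written_bytes > 0 && written_bytes + bytes > max_file_size_bytes) {
            ++file_index;       // start result_1.csv, result_2.csv, ...
            written_bytes = 0;
        }
        written_bytes += bytes;
        return current_file_name();
    }
};
```

So a result set that fits under the limit yields a single `result_0.csv`; a larger one yields a numbered sequence, matching the behavior documented above.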
[GitHub] [incubator-doris] morningman commented on a change in pull request #3584: [OUTFILE] Support `INTO OUTFILE` to export query result
morningman commented on a change in pull request #3584: URL: https://github.com/apache/incubator-doris/pull/3584#discussion_r428442688 ## File path: docs/zh-CN/administrator-guide/outfile.md ## @@ -0,0 +1,183 @@ +--- +{ +"title": "导出查询结果集", +"language": "zh-CN" +} +--- + + + +# 导出查询结果集 + +本文档介绍如何使用 `SELECT INTO OUTFILE` 命令进行查询结果的导入操作。 + +## 语法 + +`SELECT INTO OUTFILE` 语句可以将查询结果导出到文件中。目前仅支持通过 Broker 进程导出到远端存储,如 HDFS,S3,BOS 上。语法如下 + +``` +query_stmt +INTO OUTFILE "file_path" +[format_as] +WITH BROKER `broker_name` +[broker_properties] +[other_properties] +``` + +* `file_path` + +`file_path` 指向文件存储的路径以及文件前缀。如 `hdfs://path/to/my_file`。 + +最终的文件名将由 `my_file`,文件序号以及文件格式后缀组成。其中文件序号由0开始,数量为文件被分割的数量。如: + +``` +my_file_0.csv +my_file_1.csv +my_file_2.csv +``` + +* `[format_as]` + +``` +FORMAT AS CSV +``` + +指定导出格式。默认为 CSV。 + +* `[broker_properties]` + +``` +("broker_prop_key" = "broker_prop_val", ...) +``` + +Broker 相关的一些参数,如 HDFS 的 认证信息等。具体参阅[Broker 文档](./broker.html)。 + +* `[other_properties]` + +``` +("key1" = "val1", "key2" = "val2", ...) +``` + +其他属性,目前支持以下属性: + +* `column_separator`:列分隔符,仅对 CSV 格式适用。默认为 `\t`。 +* `line_delimiter`:行分隔符,仅对 CSV 格式适用。默认为 `\n`。 +* `max_file_size_bytes`:单个文件的最大大小。默认为 1GB。取值范围在 5MB 到 2GB 之间。超过这个大小的文件将会被切分。 + +1. 示例1 + +将简单查询结果导出到文件 `hdfs:/path/to/result.txt`。指定导出格式为 CSV。使用 `my_broker` 并设置 kerberos 认证信息。指定列分隔符为 `,`,行分隔符为 `\n`。 + +``` +SELECT * FROM tbl +INTO OUTFILE "hdfs:/path/to/result" +FORMAT AS CSV +WITH BROKER "my_broker" +( +"hadoop.security.authentication" = "kerberos", +"kerberos_principal" = "do...@your.com", +"kerberos_keytab" = "/home/doris/my.keytab" +) +PROPERTIELS +( +"column_separator" = ",", +"line_delimiter" = "\n", +"max_file_size_bytes" = "100MB" +); +``` + +最终生成文件如如果不大于 100MB,则为:`result_0.csv`。 + +如果大于 100MB,则可能为 `result_0.csv, result_1.csv, ...`。 + +2. 
示例2 + +将 CTE 语句的查询结果导出到文件 `hdfs:/path/to/result.txt`。默认导出格式为 CSV。使用 `my_broker` 并设置 hdfs 高可用信息。使用默认的行列分隔符。 + +``` +WITH +x1 AS +(SELECT k1, k2 FROM tbl1), +x2 AS +(SELECT k3 FROM tbl2) +SELEC k1 FROM x1 UNION SELECT k3 FROM x2 +INTO OUTFILE "hdfs:/path/to/result.txt" +WITH BROKER "my_broker" +( +"username"="user", +"password"="passwd", +"dfs.nameservices" = "my_ha", +"dfs.ha.namenodes.my_ha" = "my_namenode1, my_namenode2", +"dfs.namenode.rpc-address.my_ha.my_namenode1" = "nn1_host:rpc_port", +"dfs.namenode.rpc-address.my_ha.my_namenode2" = "nn2_host:rpc_port", +"dfs.client.failover.proxy.provider" = "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider" +); +``` + +最终生成文件如如果不大于 1GB,则为:`result_0.csv`。 + +如果大于 1GB,则可能为 `result_0.csv, result_1.csv, ...`。 + +3. 示例3 + +将 UNION 语句的查询结果导出到文件 `bos://bucket/result.txt`。指定导出格式为 PARQUET。使用 `my_broker` 并设置 hdfs 高可用信息。PARQUET 格式无需指定列分割符。 + +``` +SELECT k1 FROM tbl1 UNION SELECT k2 FROM tbl1 +INTO OUTFILE "bos://bucket/result.txt" +FORMAT AS PARQUET +WITH BROKER "my_broker" +( +"bos_endpoint" = "http://bj.bcebos.com";, +"bos_accesskey" = "xx", +"bos_secret_accesskey" = "yy" +) +``` + +最终生成文件如如果不大于 1GB,则为:`result_0.parquet`。 + +如果大于 1GB,则可能为 `result_0.parquet, result_1.parquet, ...`。 + +## 返回结果 + +导出命令为同步命令。命令返回,即表示操作结束。 + +如果正常导出并返回,则结果如下: + +``` +mysql> SELECT * FROM tbl INTO OUTFILE ... Query OK, 10 row affected (5.86 sec) +``` + +其中 `10 row affected` 表示导出的结果集行数。 + +如果执行错误,则会返回错误信息,如: + +``` +mysql> SELECT * FROM tbl INTO OUTFILE ... ERROR 1064 (HY000): errCode = 2, detailMessage = Open broker writer failed ... 
+``` + +## 注意事项 + +* 查询结果是由单个 BE 节点,单线程导出的。因此导出时间和导出结果集大小正相关。 +* 导出命令不会检查文件及文件路径是否存在。是否会自动创建路径、或是否会覆盖已存在文件,完全由远端存储系统的语义决定。 +* 如果在导出过程中出现错误,可能会有导出文件残留在远端存储系统上。Doris 不会清理这些文件。需要用户手动清理。 +* 导出命令的超时时间同查询的超时时间。可以通过 `SET query_timeout=xxx` 进行设置。 +* 对于结果集为空的查询,依然后产生一个大小为0的文件。 Review comment: I feel this kind of config option would be of little value? It adds complexity to the feature. Producing an empty file at least shows the query "was run", and the user can check whether the result set is indeed empty. If there is no output at all, we cannot tell whether a bug occurred mid-way or the result set was really empty.
[GitHub] [incubator-doris] morningman commented on a change in pull request #3584: [OUTFILE] Support `INTO OUTFILE` to export query result
morningman commented on a change in pull request #3584: URL: https://github.com/apache/incubator-doris/pull/3584#discussion_r428442267 ## File path: docs/zh-CN/administrator-guide/outfile.md ## @@ -0,0 +1,183 @@ +--- +{ +"title": "导出查询结果集", +"language": "zh-CN" +} +--- + + + +# 导出查询结果集 + +本文档介绍如何使用 `SELECT INTO OUTFILE` 命令进行查询结果的导入操作。 Review comment: OK
[GitHub] [incubator-doris] sduzh closed issue #3623: palo_be should be compiled in different directories for different build types.
sduzh closed issue #3623: URL: https://github.com/apache/incubator-doris/issues/3623
[GitHub] [incubator-doris] vagetablechicken opened a new issue #3644: mem limit in NodeChannel
vagetablechicken opened a new issue #3644: URL: https://github.com/apache/incubator-doris/issues/3644 We do the mem limit check here: https://github.com/apache/incubator-doris/blob/792307ae54ee9dbe1be8b7e6fe30b4ef90b2cca9/be/src/exec/tablet_sink.cpp#L182-L185 The sink node's memory is only reduced by sending batches, and add_row() is a sequential process. If we are blocked in one channel's add_row(), it's possible that no channel is going to reduce the memory, e.g. some channels have no pending batches and some channels are cancelled in the mem check loop. It would be an endless loop. So we need to consider the cancelled status and pending batches in the mem check loop: 1. The mem check loop needs to check _cancelled to break out of the loop. 2. We allow adding when the node channel has no pending batch, so there are two situations: (a) if the channel already has pending batches, the memory will be reduced after they are sent; (b) if there are no pending batches, add_row() can still generate a pending batch, going back to situation (a).
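The stuck-loop condition described in the issue can be sketched as follows. This is an illustrative model only (the `NodeChannel` struct and function names here are hypothetical, not the real `tablet_sink.cpp` types): the mem-check loop must skip cancelled channels and recognize when no live channel has pending batches, otherwise it spins forever waiting for memory that will never be released.

```cpp
#include <vector>

// Hypothetical model of a NodeChannel for illustrating the proposed fix.
struct NodeChannel {
    bool cancelled = false;
    int pending_batches = 0;  // batches queued but not yet sent
};

// Returns true if waiting can make progress (some live channel has a pending
// batch whose send will release memory); false if every remaining channel is
// cancelled or idle, in which case the check loop must not keep spinning.
bool mem_check_can_progress(const std::vector<NodeChannel>& channels) {
    for (const auto& ch : channels) {
        if (ch.cancelled) continue;              // fix 1: ignore cancelled channels
        if (ch.pending_batches > 0) return true; // sending will reduce memory
    }
    // fix 2: no live channel has pending batches; allow add_row() to proceed,
    // since it can itself generate a pending batch and restore progress.
    return false;
}
```

In the real sink the `false` case corresponds to letting add_row() continue (or breaking out on cancellation) instead of blocking on the memory limit.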
[GitHub] [incubator-doris] decster commented on pull request #3637: [Memory Engine] Add MemSubTablet, MemTablet, WriteTx, PartialRowBatch
decster commented on pull request #3637: URL: https://github.com/apache/incubator-doris/pull/3637#issuecomment-631862702 > Hi @decster, Could you explain more about `MemSubTablet`? In your design doc you said this is for Multi Dimension Cluster. But I am not sure what it is. Basically it's a sub-partition of the data in a tablet, e.g. a (id, sex, city, uv, pv) tablet can be sub-partitioned by sex&city, so filters on sex/city only need to read a fraction of the data; the effect is similar to sort+zonemap. I read about this in the physical database design book, chapter 8: https://hornad.fei.tuke.sk/~genci/Vyucba/SRBDp/Externisti/Zdroje/physical-database-design-the-database-professionals-guide-to-exploiting-indexes-views-storage-and-more.9780123693891.28944.pdf
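The sub-partitioning idea decster describes can be sketched in a few lines. This is a toy illustration of the concept, not Doris's MemSubTablet implementation: rows of a (id, sex, city, uv, pv) tablet are grouped by the (sex, city) key, so a predicate on those columns only touches the matching group instead of scanning the whole tablet.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy row matching the (id, sex, city, uv, pv) example from the comment.
struct Row { int id; std::string sex; std::string city; long uv; long pv; };

using SubKey = std::pair<std::string, std::string>;  // (sex, city)

// Rows land in per-(sex, city) groups; a filter on sex/city reads one group.
struct SubPartitionedTablet {
    std::map<SubKey, std::vector<Row>> groups;

    void insert(Row r) { groups[{r.sex, r.city}].push_back(std::move(r)); }

    // Counts rows matching a (sex, city) predicate without a full scan.
    size_t count_rows(const std::string& sex, const std::string& city) const {
        auto it = groups.find({sex, city});
        return it == groups.end() ? 0 : it->second.size();
    }
};
```

The pruning effect is the point: like sort + zonemap, a selective filter on the sub-partition key skips most of the data entirely.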
[GitHub] [incubator-doris] vagetablechicken opened a new pull request #3643: fix mem limit in NodeChannel
vagetablechicken opened a new pull request #3643: URL: https://github.com/apache/incubator-doris/pull/3643
[GitHub] [incubator-doris] hww123 removed a comment on issue #3634: [Schema change] throw error 'Can not change default value'
hww123 removed a comment on issue #3634: URL: https://github.com/apache/incubator-doris/issues/3634#issuecomment-631857164 I followed your advice, but it throws the error 'Error : errCode = 2, detailMessage = Invalid number format: '
[GitHub] [incubator-doris] hww123 commented on issue #3634: [Schema change] throw error 'Can not change default value'
hww123 commented on issue #3634: URL: https://github.com/apache/incubator-doris/issues/3634#issuecomment-631857255 > try > `ALTER TABLE test.test1 MODIFY COLUMN field4 bigint(20) default ""` I followed your advice, but it throws the error 'Error : errCode = 2, detailMessage = Invalid number format: '
[GitHub] [incubator-doris] hww123 commented on issue #3634: [Schema change] throw error 'Can not change default value'
hww123 commented on issue #3634: URL: https://github.com/apache/incubator-doris/issues/3634#issuecomment-631857164 I followed your advice, but it throws the error 'Error : errCode = 2, detailMessage = Invalid number format: '
[GitHub] [incubator-doris] decster commented on a change in pull request #3637: [Memory Engine] Add MemSubTablet, MemTablet, WriteTx, PartialRowBatch
decster commented on a change in pull request #3637: URL: https://github.com/apache/incubator-doris/pull/3637#discussion_r428422061 ## File path: be/src/olap/memory/write_tx.h ## @@ -0,0 +1,58 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +#pragma once + +#include "olap/memory/common.h" +#include "olap/memory/partial_row_batch.h" +#include "olap/memory/schema.h" + +namespace doris { +namespace memory { + +class PartialRowBatch; + +// Class for write transaction +// +// Note: Currently it stores all its operations in memory, to make things simple, +// so we can quickly complete the whole create/read/write pipeline. The data structure may +// change as the project evolves. +// +// TODO: add write to/load from WritexTx files in future. +class WriteTx { Review comment: Both are valid abbreviations. ![img](https://www.allacronyms.com/4871909rbot.png) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
[GitHub] [incubator-doris] imay commented on issue #3600: large bitmap serialization failed because of overflow of size
imay commented on issue #3600: URL: https://github.com/apache/incubator-doris/issues/3600#issuecomment-631851926 Hi, @kangpinghuang Can you explain the scenario that causes this problem? I think it will cause many problems if the body is so large. And if we are going to support it, we should review all the related places.
[GitHub] [incubator-doris] imay commented on a change in pull request #3637: [Memory Engine] Add MemSubTablet, MemTablet, WriteTx, PartialRowBatch
imay commented on a change in pull request #3637: URL: https://github.com/apache/incubator-doris/pull/3637#discussion_r428414008 ## File path: be/src/olap/memory/partial_row_batch.h ## @@ -0,0 +1,140 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +#pragma once + +#include "olap/memory/common.h" +#include "olap/memory/schema.h" + +namespace doris { +namespace memory { + +// A chunk of memory that stores a batch of serialized partial rows +// +// Serialization format for a batch: +// 4 byte len | serialized partial row +// 4 byte len | serialized partial row +// ... +// 4 byte len | serialized partial row +// +// Serialization format for a partial row +// bit vector(se + null) byte size (2 byte) | +// bit vector mark set cells | +// bit vector mark nullable cells' null value | +// 8bit padding +// serialized not null cells +// +// Note: currently only fixed length column types are supported. All length and scalar types store +// in native byte order(little endian in x86-64). +// +// Note: The serialization format is simple, it only provides basic functionalities +// so we can quickly complete the whole create/read/write pipeline. The format may change +// as the project evolves. 
+class PartialRowBatch { +public: +static const size_t DEFAULT_BYTE_CAPACITY = 1 << 20; +static const size_t DEFAULT_ROW_CAPACIT = 1 << 16; + +PartialRowBatch(scoped_refptr* schema, size_t byte_capacity = DEFAULT_BYTE_CAPACITY, +size_t row_capacity = DEFAULT_ROW_CAPACIT); +~PartialRowBatch(); + +const Schema& schema() const { return *_schema.get(); } + +size_t row_size() const { return _row_offsets.size(); } +size_t row_capacity() const { return _row_capacity; } +size_t byte_size() const { return _bsize; } +size_t byte_capacity() const { return _byte_capacity; } + +const uint8_t* get_row(size_t idx) const; + +private: +friend class PartialRowWriter; +friend class PartialRowReader; +scoped_refptr _schema; +vector _row_offsets; +uint8_t* _data; +size_t _bsize; +size_t _byte_capacity; +size_t _row_capacity; +}; + +// Writer for PartialRowBatch Review comment: Give an example about how to use this class ## File path: be/src/olap/memory/mem_sub_tablet.h ## @@ -0,0 +1,101 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. 
+ +#pragma once + +#include "olap/memory/common.h" +#include "olap/memory/schema.h" + +namespace doris { +namespace memory { + +class HashIndex; +class ColumnReader; +class PartialRowReader; +class Column; +class ColumnWriter; + +// A MemTablet can contain multiple MemSubTablets (currently only one). +// MemSubTablet hold a HashIndex and a collection of columns. +// It supports single-writer multi-reader concurrently. +class MemSubTablet { Review comment: Better to give a example about how to use this class. ## File path: be/src/olap/memory/partial_row_batch.cpp ## @@ -0,0 +1,267 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/
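The PartialRowBatch header comment quoted above describes the batch layout as `4 byte len | serialized partial row`, repeated, with lengths in native byte order. A standalone sketch of writing and reading that length-prefixed layout (this models only the outer batch framing, not the per-row bit-vector format, and the function names are illustrative):

```cpp
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Append one serialized row to a batch buffer using the layout from the
// header comment: 4-byte little-endian (native) length, then the row bytes.
void append_row(std::vector<uint8_t>* batch, const std::string& row) {
    uint32_t len = static_cast<uint32_t>(row.size());
    const uint8_t* p = reinterpret_cast<const uint8_t*>(&len);
    batch->insert(batch->end(), p, p + sizeof(len));
    batch->insert(batch->end(), row.begin(), row.end());
}

// Walk the batch, reading each 4-byte length prefix and the row that follows.
std::vector<std::string> read_rows(const std::vector<uint8_t>& batch) {
    std::vector<std::string> rows;
    size_t pos = 0;
    while (pos + sizeof(uint32_t) <= batch.size()) {
        uint32_t len;
        std::memcpy(&len, batch.data() + pos, sizeof(len));
        pos += sizeof(len);
        rows.emplace_back(batch.begin() + pos, batch.begin() + pos + len);
        pos += len;
    }
    return rows;
}
```

Because the lengths are stored in native byte order, such a batch is not portable across endianness; the header comment notes the format is deliberately simple and may change as the project evolves.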
[GitHub] [incubator-doris] yangzhg commented on a change in pull request #3638: Support utf-8 encoding in instr, locate, locate_pos, lpad, rpad
yangzhg commented on a change in pull request #3638: URL: https://github.com/apache/incubator-doris/pull/3638#discussion_r428406815 ## File path: be/src/exprs/string_functions.cpp ## @@ -196,28 +196,56 @@ StringVal StringFunctions::lpad( if (str.is_null || len.is_null || pad.is_null || len.val < 0) { return StringVal::null(); } + +size_t str_char_size = 0; +size_t pad_char_size = 0; +size_t byte_pos = 0; +std::vector<size_t> str_index; +std::vector<size_t> pad_index; +for (size_t i = 0, char_size = 0; i < str.len; i += char_size) { +char_size = get_utf8_byte_length((unsigned)(str.ptr)[i]); +str_index.push_back(byte_pos); +byte_pos += char_size; +++str_char_size; +} +byte_pos = 0; +for (size_t i = 0, char_size = 0; i < pad.len; i += char_size) { +char_size = get_utf8_byte_length((unsigned)(pad.ptr)[i]); +pad_index.push_back(byte_pos); +byte_pos += char_size; +++pad_char_size; +} + // Corner cases: Shrink the original string, or leave it alone. // TODO: Hive seems to go into an infinite loop if pad.len == 0, // so we should pay attention to Hive's future solution to be compatible. -if (len.val <= str.len || pad.len == 0) { -return StringVal(str.ptr, len.val); +if (len.val <= str_char_size || pad.len == 0) { +if (len.val >= str_index.size()) { +return StringVal::null(); +} +return StringVal(str.ptr, str_index.at(len.val)); } // TODO pengyubing // StringVal result = StringVal::create_temp_string_val(context, len.val); -StringVal result(context, len.val); +int32_t pad_byte_len = 0; +int32_t pad_times = (len.val - str_char_size) / pad_char_size; +int32_t pad_remainder = (len.val - str_char_size) % pad_char_size; +pad_byte_len = pad_times * pad.len; +pad_byte_len += pad_index.at(pad_remainder); Review comment: operator[] isn't bounds-checked; if the requested position is out of range, it will hide the problem and make it difficult to track down when the code changes. This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org
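The patch above derives each character's byte offset from its UTF-8 leading byte and records the offsets in an index vector. A rough sketch of that indexing logic follows; `utf8_byte_length` and `build_char_index` are hypothetical stand-ins for illustration, not Doris's actual `get_utf8_byte_length` implementation:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for get_utf8_byte_length(): the length of a UTF-8
// sequence is encoded in its leading byte.
inline size_t utf8_byte_length(unsigned char b) {
    if (b < 0x80) return 1;           // 0xxxxxxx: single-byte ASCII
    if ((b & 0xE0) == 0xC0) return 2; // 110xxxxx: 2-byte sequence
    if ((b & 0xF0) == 0xE0) return 3; // 1110xxxx: 3-byte sequence
    if ((b & 0xF8) == 0xF0) return 4; // 11110xxx: 4-byte sequence
    return 1;                         // invalid leading byte: skip one byte
}

// Build the byte offset of every character, as the lpad/rpad patch does,
// so that character index i maps to byte offset index[i].
inline std::vector<size_t> build_char_index(const std::string& s) {
    std::vector<size_t> index;
    for (size_t i = 0; i < s.size(); i += utf8_byte_length((unsigned char)s[i])) {
        index.push_back(i);
    }
    return index;
}
```

With such an index, truncating a string to `n` characters becomes a lookup of `index[n]` in bytes, which is what the patched corner case uses `str_index.at(len.val)` for.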
[GitHub] [incubator-doris] imay commented on pull request #3642: Fix some unit test failed
imay commented on pull request #3642: URL: https://github.com/apache/incubator-doris/pull/3642#issuecomment-631817163 Hi @yangzhg, can you explain in the commit message why glog causes the unit tests to fail?
[GitHub] [incubator-doris] kangkaisen commented on a change in pull request #3584: [OUTFILE] Support `INTO OUTFILE` to export query result
kangkaisen commented on a change in pull request #3584: URL: https://github.com/apache/incubator-doris/pull/3584#discussion_r428132193 ## File path: docs/zh-CN/administrator-guide/outfile.md ## @@ -0,0 +1,183 @@ +--- +{ +"title": "Export Query Result Set", +"language": "zh-CN" +} +--- + + + +# Export Query Result Set + +This document describes how to use the `SELECT INTO OUTFILE` command to import query results. Review comment: ```suggestion This document describes how to use the `SELECT INTO OUTFILE` command to export query results. ``` ## File path: docs/zh-CN/administrator-guide/outfile.md ## @@ -0,0 +1,183 @@ +--- +{ +"title": "Export Query Result Set", +"language": "zh-CN" +} +--- + + + +# Export Query Result Set + +This document describes how to use the `SELECT INTO OUTFILE` command to import query results. + +## Syntax + +The `SELECT INTO OUTFILE` statement exports query results to files. Currently it only supports exporting to remote storage, such as HDFS, S3, or BOS, via the Broker process. The syntax is as follows: + +``` +query_stmt +INTO OUTFILE "file_path" +[format_as] +WITH BROKER `broker_name` +[broker_properties] +[other_properties] +``` + +* `file_path` + +`file_path` specifies the storage path and the file-name prefix, e.g. `hdfs://path/to/my_file`. + +The final file name consists of `my_file`, a file sequence number, and a format suffix. The sequence number starts from 0, and the count equals the number of files the result is split into. For example: + +``` +my_file_0.csv +my_file_1.csv +my_file_2.csv +``` + +* `[format_as]` + +``` +FORMAT AS CSV +``` + +Specifies the export format. The default is CSV. + +* `[broker_properties]` + +``` +("broker_prop_key" = "broker_prop_val", ...) +``` + +Broker-related parameters, such as HDFS authentication information. See the [Broker documentation](./broker.html) for details. + +* `[other_properties]` + +``` +("key1" = "val1", "key2" = "val2", ...) +``` + +Other properties. The following are currently supported: + +* `column_separator`: column separator, applicable to CSV only. The default is `\t`. +* `line_delimiter`: line delimiter, applicable to CSV only. The default is `\n`. +* `max_file_size_bytes`: maximum size of a single file. The default is 1GB; the valid range is 5MB to 2GB. A file exceeding this size will be split. + +1. Example 1 + +Export the result of a simple query to the file `hdfs:/path/to/result.txt`, in CSV format, using `my_broker` with Kerberos authentication. The column separator is `,` and the line delimiter is `\n`. + +``` +SELECT * FROM tbl +INTO OUTFILE "hdfs:/path/to/result" +FORMAT AS CSV +WITH BROKER "my_broker" +( +"hadoop.security.authentication" = "kerberos", +"kerberos_principal" = "do...@your.com", +"kerberos_keytab" = "/home/doris/my.keytab" +) +PROPERTIES +( +"column_separator" = ",", +"line_delimiter" = "\n", +"max_file_size_bytes" = "100MB" +); +``` + +If the generated file is no larger than 100MB, it will be `result_0.csv`. + +If it is larger than 100MB, there may be `result_0.csv, result_1.csv, ...`. + +2. Example 2 + +Export the result of a CTE query to the file `hdfs:/path/to/result.txt`, in the default CSV format, using `my_broker` with HDFS high-availability settings and the default row and column separators. + +``` +WITH +x1 AS +(SELECT k1, k2 FROM tbl1), +x2 AS +(SELECT k3 FROM tbl2) +SELECT k1 FROM x1 UNION SELECT k3 FROM x2 +INTO OUTFILE "hdfs:/path/to/result.txt" +WITH BROKER "my_broker" +( +"username"="user", +"password"="passwd", +"dfs.nameservices" = "my_ha", +"dfs.ha.namenodes.my_ha" = "my_namenode1, my_namenode2", +"dfs.namenode.rpc-address.my_ha.my_namenode1" = "nn1_host:rpc_port", +"dfs.namenode.rpc-address.my_ha.my_namenode2" = "nn2_host:rpc_port", +"dfs.client.failover.proxy.provider" = "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider" +); +``` + +If the generated file is no larger than 1GB, it will be `result_0.csv`. + +If it is larger than 1GB, there may be `result_0.csv, result_1.csv, ...`. + +3. Example 3 + +Export the result of a UNION query to the file `bos://bucket/result.txt`, in PARQUET format, using `my_broker` with BOS credentials. The PARQUET format does not require a column separator. + +``` +SELECT k1 FROM tbl1 UNION SELECT k2 FROM tbl1 +INTO OUTFILE "bos://bucket/result.txt" +FORMAT AS PARQUET +WITH BROKER "my_broker" +( +"bos_endpoint" = "http://bj.bcebos.com", +"bos_accesskey" = "xx", +"bos_secret_accesskey" = "yy" +) +``` + +If the generated file is no larger than 1GB, it will be `result_0.parquet`. + +If it is larger than 1GB, there may be `result_0.parquet, result_1.parquet, ...`. + +## Returned Results + +The export command is synchronous; when the command returns, the operation has finished. + +If the export succeeds, the result looks like: + +``` +mysql> SELECT * FROM tbl INTO OUTFILE ... Query OK, 10 row affected (5.86 sec) +``` + +`10 row affected` indicates the number of rows in the exported result set. + +If execution fails, an error message is returned, e.g.: + +``` +mysql> SELECT * FROM tbl INTO OUTFILE ... ERROR 1064 (HY000): errCode = 2, detailMessage = Open broker writer failed ... +``` + +## Notes + +* The query result is exported by a single BE node in a single thread, so the export time is positively correlated with the size of the result set. +* The export command does not check whether the file or its path exists. Whether paths are created automatically, or whether existing files are overwritten, is entirely determined by the semantics of the remote storage system. +* If an error occurs during export, residual files may be left on the remote storage system. Doris does not clean these up; users must remove them manually. +* The export command's timeout is the same as the query timeout, which can be set via `SET quer
[GitHub] [incubator-doris] morningman commented on a change in pull request #3637: [Memory Engine] Add MemSubTablet, MemTablet, WriteTx, PartialRowBatch
morningman commented on a change in pull request #3637: URL: https://github.com/apache/incubator-doris/pull/3637#discussion_r428102043 ## File path: be/src/olap/memory/write_tx.h ## @@ -0,0 +1,58 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +#pragma once + +#include "olap/memory/common.h" +#include "olap/memory/partial_row_batch.h" +#include "olap/memory/schema.h" + +namespace doris { +namespace memory { + +class PartialRowBatch; + +// Class for write transaction +// +// Note: Currently it stores all its operations in memory, to make things simple, +// so we can quickly complete the whole create/read/write pipeline. The data structure may +// change as the project evolves. +// +// TODO: add write to/load from WriteTx files in future. +class WriteTx { Review comment: The abbreviation of transaction is Txn, so it would be better to rename this to `WriteTxn`.
[GitHub] [incubator-doris] kangkaisen commented on a change in pull request #3638: Support utf-8 encoding in instr, locate, locate_pos, lpad, rpad
kangkaisen commented on a change in pull request #3638: URL: https://github.com/apache/incubator-doris/pull/3638#discussion_r428101660 ## File path: be/src/exprs/string_functions.cpp ## @@ -196,28 +196,56 @@ StringVal StringFunctions::lpad( if (str.is_null || len.is_null || pad.is_null || len.val < 0) { return StringVal::null(); } + +size_t str_char_size = 0; +size_t pad_char_size = 0; +size_t byte_pos = 0; +std::vector<size_t> str_index; +std::vector<size_t> pad_index; +for (size_t i = 0, char_size = 0; i < str.len; i += char_size) { +char_size = get_utf8_byte_length((unsigned)(str.ptr)[i]); +str_index.push_back(byte_pos); +byte_pos += char_size; +++str_char_size; +} +byte_pos = 0; +for (size_t i = 0, char_size = 0; i < pad.len; i += char_size) { +char_size = get_utf8_byte_length((unsigned)(pad.ptr)[i]); +pad_index.push_back(byte_pos); +byte_pos += char_size; +++pad_char_size; +} + // Corner cases: Shrink the original string, or leave it alone. // TODO: Hive seems to go into an infinite loop if pad.len == 0, // so we should pay attention to Hive's future solution to be compatible.
-if (len.val <= str.len || pad.len == 0) { -return StringVal(str.ptr, len.val); +if (len.val <= str_char_size || pad.len == 0) { +if (len.val >= str_index.size()) { +return StringVal::null(); +} +return StringVal(str.ptr, str_index.at(len.val)); } // TODO pengyubing // StringVal result = StringVal::create_temp_string_val(context, len.val); -StringVal result(context, len.val); +int32_t pad_byte_len = 0; +int32_t pad_times = (len.val - str_char_size) / pad_char_size; +int32_t pad_remainder = (len.val - str_char_size) % pad_char_size; +pad_byte_len = pad_times * pad.len; +pad_byte_len += pad_index.at(pad_remainder); Review comment: Should use operator[] ## File path: be/src/exprs/string_functions.cpp ## @@ -196,28 +196,56 @@ StringVal StringFunctions::lpad( if (str.is_null || len.is_null || pad.is_null || len.val < 0) { return StringVal::null(); } + +size_t str_char_size = 0; +size_t pad_char_size = 0; +size_t byte_pos = 0; +std::vector<size_t> str_index; +std::vector<size_t> pad_index; +for (size_t i = 0, char_size = 0; i < str.len; i += char_size) { Review comment: Duplicated code ``` for (size_t i = 0, char_size = 0; i < str.len; i += char_size) { char_size = get_utf8_byte_length((unsigned)(str.ptr)[i]); str_index.push_back(byte_pos); byte_pos += char_size; ++str_char_size; } ```
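The `at()` vs `operator[]` trade-off the reviewers are debating: `at()` performs a bounds check and throws `std::out_of_range`, while `operator[]` with a bad index is undefined behavior that can silently return garbage. A minimal generic C++ illustration (not Doris code; `safe_lookup` is an invented name):

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Look up v[i] with an explicit bounds check via at(); returns false
// instead of invoking undefined behavior when i is out of range.
inline bool safe_lookup(const std::vector<int>& v, size_t i, int* out) {
    try {
        *out = v.at(i); // throws std::out_of_range if i >= v.size()
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}
```

In a hot loop `operator[]` avoids the cost of the check, which is why reviewers sometimes push the other way; here the preference for the checked form is so that misuse fails loudly instead of hiding the bug.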
[GitHub] [incubator-doris] morningman commented on pull request #3637: [Memory Engine] Add MemSubTablet, MemTablet, WriteTx, PartialRowBatch
morningman commented on pull request #3637: URL: https://github.com/apache/incubator-doris/pull/3637#issuecomment-631543723 Hi @decster, could you explain more about `MemSubTablet`? Your design doc says it is for Multi Dimension Cluster, but I am not sure what that is.
[GitHub] [incubator-doris] morningman commented on pull request #3626: Update Compiling environment of BE to Support C++14/17
morningman commented on pull request #3626: URL: https://github.com/apache/incubator-doris/pull/3626#issuecomment-631484829 I suggest merging this PR before we plan to release the next version, so we can have a clean cut of the dev environment.
[GitHub] [incubator-doris] morningman commented on pull request #3638: Support utf-8 encoding in instr, locate, locate_pos, lpad, rpad
morningman commented on pull request #3638: URL: https://github.com/apache/incubator-doris/pull/3638#issuecomment-631483236 Good work. Could you please add docs for these functions?
[incubator-doris] branch master updated: [CMake] Different cmake build directories for different build types (#3623) (#3629)
This is an automated email from the ASF dual-hosted git repository. morningman pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/incubator-doris.git The following commit(s) were added to refs/heads/master by this push: new 792307a [CMake] Different cmake build directories for different build types (#3623) (#3629) 792307a is described below commit 792307ae54ee9dbe1be8b7e6fe30b4ef90b2cca9 Author: sduzh AuthorDate: Wed May 20 21:41:44 2020 +0800 [CMake] Different cmake build directories for different build types (#3623) (#3629) add `CMAKE_BUILD_TYPE` as the suffix of build directory. --- be/CMakeLists.txt | 3 --- build.sh | 12 +++- 2 files changed, 7 insertions(+), 8 deletions(-) diff --git a/be/CMakeLists.txt b/be/CMakeLists.txt index 4f37075..25d0297 100644 --- a/be/CMakeLists.txt +++ b/be/CMakeLists.txt @@ -29,9 +29,6 @@ endif() project(doris CXX C) # set CMAKE_BUILD_TYPE -if (DEFINED ENV{BUILD_TYPE}) -set(CMAKE_BUILD_TYPE $ENV{BUILD_TYPE}) -endif() if (NOT CMAKE_BUILD_TYPE) set(CMAKE_BUILD_TYPE RELEASE) endif() diff --git a/build.sh b/build.sh index 2d208e9..244f628 100755 --- a/build.sh +++ b/build.sh @@ -154,14 +154,16 @@ cd ${DORIS_HOME} # Clean and build Backend if [ ${BUILD_BE} -eq 1 ] ; then -echo "Build Backend" +CMAKE_BUILD_TYPE=${BUILD_TYPE:-Release} +echo "Build Backend: ${CMAKE_BUILD_TYPE}" +CMAKE_BUILD_DIR=${DORIS_HOME}/be/build_${CMAKE_BUILD_TYPE} if [ ${CLEAN} -eq 1 ]; then -rm -rf ${DORIS_HOME}/be/build/ +rm -rf $CMAKE_BUILD_DIR rm -rf ${DORIS_HOME}/be/output/ fi -mkdir -p ${DORIS_HOME}/be/build/ -cd ${DORIS_HOME}/be/build/ +mkdir -p ${CMAKE_BUILD_DIR} +cd ${CMAKE_BUILD_DIR} +${CMAKE_CMD} -DCMAKE_BUILD_TYPE=${CMAKE_BUILD_TYPE} -DMAKE_TEST=OFF -DWITH_MYSQL=${WITH_MYSQL} -DWITH_LZO=${WITH_LZO} ../ make -j${PARALLEL} make install cd ${DORIS_HOME}
[GitHub] [incubator-doris] morningman merged pull request #3629: different cmake build directories for different build types
morningman merged pull request #3629: URL: https://github.com/apache/incubator-doris/pull/3629
[GitHub] [incubator-doris] morningman commented on a change in pull request #3641: [Bug] Ignore loading DELETE status tablet error when restarting BE
morningman commented on a change in pull request #3641: URL: https://github.com/apache/incubator-doris/pull/3641#discussion_r428013303 ## File path: be/src/tools/meta_tool.cpp ## @@ -122,13 +126,131 @@ void delete_meta(DataDir* data_dir) { std::cout << "delete meta successfully" << std::endl; } +Status init_data_dir(const std::string dir, DataDir** ret) { +std::string root_path; +Status st = FileUtils::canonicalize(dir, &root_path); +if (!st.ok()) { +std::cout << "invalid root path:" << FLAGS_root_path +<< ", error: " << st.to_string() << std::endl; +return Status::InternalError("invalid root path"); +} +doris::StorePath path; +auto res = parse_root_path(root_path, &path); +if (res != OLAP_SUCCESS) { +std::cout << "parse root path failed:" << root_path << std::endl; +return Status::InternalError("parse root path failed"); +} + +*ret = new (std::nothrow) DataDir(path.path, path.capacity_bytes, path.storage_medium); +if (*ret == nullptr) { +std::cout << "new data dir failed" << std::endl; +return Status::InternalError("new data dir failed"); +} +st = (*ret)->init(); +if (!st.ok()) { +std::cout << "data_dir load failed" << std::endl; +return Status::InternalError("data_dir load failed"); +} + +return Status::OK(); +} + +void batch_delete_meta(const std::string tablet_file) { +// each line in tablet file indicate a tablet to delete, format is: +// data_dir tablet_id schema_hash +// eg: +// /data1/palo.HDD 100010 11212389324 +// /data2/palo.HDD 100010 23049230234 +std::ifstream infile(tablet_file); +std::string line; +int err_num = 0; +int delete_num = 0; +int total_num = 0; +std::unordered_map> dir_map; +while (std::getline(infile, line)) { +total_num++; +vector v = strings::Split(line, " "); Review comment: ok ## File path: be/src/olap/data_dir.cpp ## @@ -709,13 +716,29 @@ OLAPStatus DataDir::load() { return true; }; OLAPStatus load_tablet_status = TabletMetaManager::traverse_headers(_meta, load_tablet_func); -if (failed_tablet_ids.size() != 0 && 
!config::ignore_load_tablet_failure) { -LOG(FATAL) << "load tablets from header failed, failed tablets size: " << failed_tablet_ids.size(); +if (failed_tablet_ids.size() != 0) { +if (!config::ignore_load_tablet_failure) { Review comment: ok
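The tablet file consumed by `batch_delete_meta` is a line-oriented format, one `data_dir,tablet_id,schema_hash` triple per line once the comma separator suggested in this thread is adopted. A hedged sketch of the per-line validation follows; `TabletRef` and `parse_tablet_line` are invented illustration names, not Doris symbols:

```cpp
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Illustration-only type; not a Doris struct.
struct TabletRef {
    std::string data_dir;
    int64_t tablet_id = 0;
    int64_t schema_hash = 0;
};

// Validate one "data_dir,tablet_id,schema_hash" line the way
// batch_delete_meta does: exactly three comma-separated fields,
// with numeric tablet_id and schema_hash.
inline bool parse_tablet_line(const std::string& line, TabletRef* out) {
    std::vector<std::string> fields;
    std::stringstream ss(line);
    std::string field;
    while (std::getline(ss, field, ',')) {
        fields.push_back(field);
    }
    if (fields.size() != 3) {
        return false; // matches the "invalid line in tablet_file" branch
    }
    try {
        out->data_dir = fields[0];
        out->tablet_id = std::stoll(fields[1]);
        out->schema_hash = std::stoll(fields[2]);
    } catch (const std::exception&) {
        return false;
    }
    return true;
}
```

In the real tool a malformed line only bumps `err_num` and is skipped, so one bad entry does not abort the whole batch.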
[GitHub] [incubator-doris] yangzhg commented on a change in pull request #3625: Support more syntax in case when clause
yangzhg commented on a change in pull request #3625: URL: https://github.com/apache/incubator-doris/pull/3625#discussion_r428011341 ## File path: fe/src/main/java/org/apache/doris/analysis/CaseExpr.java ## @@ -191,7 +191,7 @@ public void analyzeImpl(Analyzer analyzer) throws AnalysisException { throw new AnalysisException("Subquery in case-when must return scala type"); } if (whenExpr.contains(Predicates.instanceOf(Subquery.class)) -&& !((hasCaseExpr() && whenExpr instanceof Subquery || whenExpr instanceof BinaryPredicate))) { +&& !((hasCaseExpr() && whenExpr instanceof Subquery || !checkSubquery(whenExpr)))) { throw new AnalysisException("Only support subquery in binary predicate in case statement."); Review comment: Instead of deleting the BinaryPredicate restriction, we still only support BinaryPredicate; we just move the BinaryPredicate check from the top level so that the children of other exprs are also checked.
[GitHub] [incubator-doris] yangzhg opened a new pull request #3642: Fix some unit test failed
yangzhg opened a new pull request #3642: URL: https://github.com/apache/incubator-doris/pull/3642 Fix some unit tests failing due to glog
[GitHub] [incubator-doris] kangpinghuang closed issue #3552: Support Bitmap Intersect
kangpinghuang closed issue #3552: URL: https://github.com/apache/incubator-doris/issues/3552
[GitHub] [incubator-doris] kangpinghuang merged pull request #3571: Support bitmap_intersect
kangpinghuang merged pull request #3571: URL: https://github.com/apache/incubator-doris/pull/3571
[incubator-doris] branch master updated: Support bitmap_intersect (#3571)
This is an automated email from the ASF dual-hosted git repository. kangpinghuang pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/incubator-doris.git The following commit(s) were added to refs/heads/master by this push: new 0d66e6b Support bitmap_intersect (#3571) 0d66e6b is described below commit 0d66e6bd1578eba0e9a58cf591f05a83f9e2b334 Author: EmmyMiao87 <522274...@qq.com> AuthorDate: Wed May 20 21:12:02 2020 +0800 Support bitmap_intersect (#3571) * Support bitmap_intersect Support aggregate function Bitmap Intersect, which is mainly used to take the intersection of grouped data. The function 'bitmap_intersect(expr)' calculates the intersection of bitmap columns and returns a bitmap object. The definition is as follows: FunctionName: bitmap_intersect, InputType: bitmap, OutputType: bitmap The scenario is as follows: Query which users satisfy the three tags a, b, and c at the same time. ``` select bitmap_to_string(bitmap_intersect(user_id)) from ( select bitmap_union(user_id) user_id from bitmap_intersect_test where tag in ('a', 'b', 'c') group by tag ) a ``` Closed #3552.
* Add docs of bitmap_union and bitmap_intersect * Support null of bitmap_intersect --- be/src/exprs/bitmap_function.cpp | 31 +++ be/src/exprs/bitmap_function.h | 6 ++- be/test/exprs/bitmap_function_test.cpp | 34 docs/.vuepress/sidebar/en.js | 2 + docs/.vuepress/sidebar/zh-CN.js| 2 + .../bitmap-functions/bitmap_intersect.md | 61 + .../sql-functions/bitmap-functions/bitmap_union.md | 58 .../bitmap-functions/bitmap_intersect.md | 62 ++ .../sql-functions/bitmap-functions/bitmap_union.md | 58 .../apache/doris/analysis/FunctionCallExpr.java| 3 +- .../java/org/apache/doris/catalog/FunctionSet.java | 11 11 files changed, 326 insertions(+), 2 deletions(-) diff --git a/be/src/exprs/bitmap_function.cpp b/be/src/exprs/bitmap_function.cpp index 09fdd14..0d9bf25 100644 --- a/be/src/exprs/bitmap_function.cpp +++ b/be/src/exprs/bitmap_function.cpp @@ -302,6 +302,31 @@ void BitmapFunctions::bitmap_union(FunctionContext* ctx, const StringVal& src, S } } +// the dst value could be null +void BitmapFunctions::nullable_bitmap_init(FunctionContext* ctx, StringVal* dst) { +dst->is_null = true; +} + +void BitmapFunctions::bitmap_intersect(FunctionContext* ctx, const StringVal& src, StringVal* dst) { +if (src.is_null) { +return; +} +// if dst is null, the src input is the first value +if (dst->is_null) { +dst->is_null = false; +dst->len = sizeof(BitmapValue); +dst->ptr = (uint8_t*)new BitmapValue((char*) src.ptr); +return; +} +auto dst_bitmap = reinterpret_cast<BitmapValue*>(dst->ptr); +// zero size means the src input is a agg object +if (src.len == 0) { +(*dst_bitmap) &= *reinterpret_cast<BitmapValue*>(src.ptr); +} else { +(*dst_bitmap) &= BitmapValue((char*) src.ptr); +} +} + BigIntVal BitmapFunctions::bitmap_count(FunctionContext* ctx, const StringVal& src) { if (src.is_null) { return 0; @@ -343,12 +368,17 @@ StringVal BitmapFunctions::bitmap_hash(doris_udf::FunctionContext* ctx, const do } StringVal BitmapFunctions::bitmap_serialize(FunctionContext* ctx, const StringVal& src) { +if (src.is_null) { +return src;
+} + auto src_bitmap = reinterpret_cast<BitmapValue*>(src.ptr); StringVal result = serialize(ctx, src_bitmap); delete src_bitmap; return result; } +// This is a init function for intersect_count not for bitmap_intersect. template void BitmapFunctions::bitmap_intersect_init(FunctionContext* ctx, StringVal* dst) { dst->is_null = false; @@ -510,6 +540,7 @@ template void BitmapFunctions::bitmap_update_int( FunctionContext* ctx, const BigIntVal& src, StringVal* dst); +// this is init function for intersect_count not for bitmap_intersect template void BitmapFunctions::bitmap_intersect_init( FunctionContext* ctx, StringVal* dst); template void BitmapFunctions::bitmap_intersect_init( diff --git a/be/src/exprs/bitmap_function.h b/be/src/exprs/bitmap_function.h index b69fc49..5d86228 100644 --- a/be/src/exprs/bitmap_function.h +++ b/be/src/exprs/bitmap_function.h @@ -51,6 +51,9 @@ public: static BigIntVal bitmap_get_value(FunctionContext* ctx, const StringVal& src); static void bitmap_union(FunctionContext* ctx, const StringVal& src, StringVal* dst); +// the dst value could be null +static void nullable_bitmap_init(FunctionContext* ctx, StringVal* dst); +static void bitmap_intersect(FunctionContext* ctx, const StringVal& src, Strin
[GitHub] [incubator-doris] kangpinghuang commented on pull request #3571: Support bitmap_intersect
kangpinghuang commented on pull request #3571: URL: https://github.com/apache/incubator-doris/pull/3571#issuecomment-631463449 LGTM, +1
[GitHub] [incubator-doris] HappenLee commented on pull request #3626: Update Compiling environment of BE to Support C++14/17
HappenLee commented on pull request #3626: URL: https://github.com/apache/incubator-doris/pull/3626#issuecomment-631454781 @kangkaisen OK, I will update the Doris documentation. But as for the Docker image version, maybe we should choose a version and commit id to start using C++17.
[GitHub] [incubator-doris] EmmyMiao87 commented on a change in pull request #3571: Support bitmap_intersect
EmmyMiao87 commented on a change in pull request #3571: URL: https://github.com/apache/incubator-doris/pull/3571#discussion_r427945170 ## File path: docs/en/sql-reference/sql-functions/bitmap-functions/bitmap_intersect.md ## @@ -0,0 +1,54 @@ +--- +{ +"title": "bitmap_intersect", +"language": "en" Review comment: Added ## File path: be/src/exprs/bitmap_function.cpp ## @@ -302,6 +302,31 @@ void BitmapFunctions::bitmap_union(FunctionContext* ctx, const StringVal& src, S } } +// this is the real init function for bitmap_intersect +void BitmapFunctions::bitmap_intersect_init_real(FunctionContext* ctx, StringVal* dst) { +dst->is_null = true; Review comment: The initial result bitmap must be null. Otherwise, the intersection between dst and src will be empty.
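Why the aggregate's init value must be null rather than an empty bitmap: intersecting anything with an empty set yields the empty set, so the first input has to be copied into the accumulator instead of intersected with it. A sketch of that fold, with `std::set<int>` standing in for `BitmapValue` (illustration only, not the Doris implementation):

```cpp
#include <set>

// Accumulator mirroring nullable_bitmap_init() + bitmap_intersect():
// starts "null", copies the first input, intersects with the rest.
struct IntersectAcc {
    bool is_null = true;
    std::set<int> bits;

    void update(const std::set<int>& src) {
        if (is_null) {
            // First value: copy. Starting from an empty set instead
            // would make every final result empty.
            is_null = false;
            bits = src;
            return;
        }
        // Keep only elements present in both sets (bits &= src,
        // in BitmapValue terms).
        std::set<int> result;
        for (int v : bits) {
            if (src.count(v)) {
                result.insert(v);
            }
        }
        bits = result;
    }
};
```

This is the same reason `bitmap_serialize` above needs the `src.is_null` early return: an accumulator that never saw any input stays null rather than becoming an empty bitmap.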
[GitHub] [incubator-doris] WingsGo commented on pull request #2215: Optimize tablet report with expired transaction.
WingsGo commented on pull request #2215: URL: https://github.com/apache/incubator-doris/pull/2215#issuecomment-631388099 The PR did a great job! In our environment, the optimized logic reduced the `tablet report` time from about 35 hours to 15s.
[GitHub] [incubator-doris] wutiangan commented on a change in pull request #3625: Support more syntax in case when clause
wutiangan commented on a change in pull request #3625: URL: https://github.com/apache/incubator-doris/pull/3625#discussion_r427884222 ## File path: fe/src/main/java/org/apache/doris/analysis/CaseExpr.java ## @@ -191,7 +191,7 @@ public void analyzeImpl(Analyzer analyzer) throws AnalysisException { throw new AnalysisException("Subquery in case-when must return scala type"); } if (whenExpr.contains(Predicates.instanceOf(Subquery.class)) -&& !((hasCaseExpr() && whenExpr instanceof Subquery || whenExpr instanceof BinaryPredicate))) { +&& !((hasCaseExpr() && whenExpr instanceof Subquery || !checkSubquery(whenExpr)))) { throw new AnalysisException("Only support subquery in binary predicate in case statement."); Review comment: You need to modify the exception message, since you deleted 'BinaryPredicate'.
[GitHub] [incubator-doris] yangzhg commented on a change in pull request #3584: [OUTFILE] Support `INTO OUTFILE` to export query result
yangzhg commented on a change in pull request #3584: URL: https://github.com/apache/incubator-doris/pull/3584#discussion_r427873732 ## File path: be/src/runtime/result_sink.cpp ## @@ -35,6 +36,17 @@ ResultSink::ResultSink(const RowDescriptor& row_desc, const std::vector<TExpr>& t_output_expr, const TResultSink& sink, int buffer_size) : _row_desc(row_desc), _t_output_expr(t_output_expr), _buf_size(buffer_size) { + +if (!sink.__isset.type || sink.type == TResultSinkType::MYSQL_PROTOCAL) { Review comment: It's better to add a comment here This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org
[GitHub] [incubator-doris] yangzhg commented on a change in pull request #3641: [Bug] Ignore loading DELETE status tablet error when restarting BE
yangzhg commented on a change in pull request #3641: URL: https://github.com/apache/incubator-doris/pull/3641#discussion_r427867432 ## File path: be/src/tools/meta_tool.cpp ## @@ -122,13 +126,131 @@ void delete_meta(DataDir* data_dir) { std::cout << "delete meta successfully" << std::endl; } +Status init_data_dir(const std::string dir, DataDir** ret) { +std::string root_path; +Status st = FileUtils::canonicalize(dir, &root_path); +if (!st.ok()) { +std::cout << "invalid root path:" << FLAGS_root_path +<< ", error: " << st.to_string() << std::endl; +return Status::InternalError("invalid root path"); +} +doris::StorePath path; +auto res = parse_root_path(root_path, &path); +if (res != OLAP_SUCCESS) { +std::cout << "parse root path failed:" << root_path << std::endl; +return Status::InternalError("parse root path failed"); +} + +*ret = new (std::nothrow) DataDir(path.path, path.capacity_bytes, path.storage_medium); +if (*ret == nullptr) { +std::cout << "new data dir failed" << std::endl; +return Status::InternalError("new data dir failed"); +} +st = (*ret)->init(); +if (!st.ok()) { +std::cout << "data_dir load failed" << std::endl; +return Status::InternalError("data_dir load failed"); +} + +return Status::OK(); +} + +void batch_delete_meta(const std::string tablet_file) { +// each line in tablet file indicates a tablet to delete, format is: +// data_dir tablet_id schema_hash +// eg: +// /data1/palo.HDD 100010 11212389324 +// /data2/palo.HDD 100010 23049230234 +std::ifstream infile(tablet_file); +std::string line; +int err_num = 0; +int delete_num = 0; +int total_num = 0; +std::unordered_map<std::string, std::unique_ptr<DataDir>> dir_map; +while (std::getline(infile, line)) { +total_num++; +vector<string> v = strings::Split(line, " "); Review comment: I think it's better to use ',' as the separation character; it is more common, and the file then looks like a CSV file.
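The CSV format the reviewer suggests can be sketched as a small standalone parser for one tablet-file line. This is a hypothetical helper for illustration only (not part of `meta_tool.cpp`), assuming the field order `data_dir,tablet_id,schema_hash` from the comment above:

```cpp
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Split one tablet-file line on ',' into its three fields:
// data_dir, tablet_id, schema_hash. Returns false on malformed input.
bool parse_tablet_line(const std::string& line, std::string* data_dir,
                       int64_t* tablet_id, int64_t* schema_hash) {
    std::vector<std::string> fields;
    std::stringstream ss(line);
    std::string field;
    while (std::getline(ss, field, ',')) {
        fields.push_back(field);
    }
    if (fields.size() != 3) {
        return false;  // mirrors the err_num++ branch in batch_delete_meta
    }
    try {
        *data_dir = fields[0];
        *tablet_id = std::stoll(fields[1]);
        *schema_hash = std::stoll(fields[2]);
    } catch (const std::exception&) {
        return false;  // non-numeric tablet_id or schema_hash
    }
    return true;
}
```

A line such as `/data1/palo.HDD,100010,11212389324` then yields the three typed fields, and any line with a wrong field count or non-numeric ids is counted as an error, matching the error handling in the quoted diff.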
[GitHub] [incubator-doris] yangzhg commented on a change in pull request #3641: [Bug] Ignore loading DELETE status tablet error when restarting BE
yangzhg commented on a change in pull request #3641: URL: https://github.com/apache/incubator-doris/pull/3641#discussion_r427863742 ## File path: be/src/olap/data_dir.cpp ## @@ -709,13 +716,29 @@ OLAPStatus DataDir::load() { return true; }; OLAPStatus load_tablet_status = TabletMetaManager::traverse_headers(_meta, load_tablet_func); -if (failed_tablet_ids.size() != 0 && !config::ignore_load_tablet_failure) { -LOG(FATAL) << "load tablets from header failed, failed tablets size: " << failed_tablet_ids.size(); +if (failed_tablet_ids.size() != 0) { +if (!config::ignore_load_tablet_failure) { Review comment: How about logging a warning first, and then ``` if (!config::ignore_load_tablet_failure) { LOG(FATAL) << "be will shutdown due to ignore_load_tablet_failure is false"; } ```
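The suggested control flow amounts to always reporting the failed tablet set first, and only afterwards deciding whether the failure is fatal. A minimal self-contained sketch of that pattern (plain `std::cerr` stands in for Doris's `LOG` macros; `ignore_load_tablet_failure` is the real config knob, everything else here is illustrative):

```cpp
#include <cstdint>
#include <iostream>
#include <set>

// Illustrative stand-in for config::ignore_load_tablet_failure.
static bool ignore_load_tablet_failure = false;

// Returns true when the process should shut down because tablet
// loading failed and failures are not configured to be ignored.
bool handle_failed_tablets(const std::set<int64_t>& failed_tablet_ids) {
    if (failed_tablet_ids.empty()) {
        return false;
    }
    // 1. Always warn first, so the operator sees the failure count
    //    regardless of how the config is set.
    std::cerr << "load tablets from header failed, failed tablets size: "
              << failed_tablet_ids.size() << std::endl;
    // 2. Only then decide whether this is fatal.
    if (!ignore_load_tablet_failure) {
        std::cerr << "be will shutdown due to ignore_load_tablet_failure "
                  << "is false" << std::endl;
        return true;  // the caller would LOG(FATAL) / abort here
    }
    return false;
}
```

The benefit of this ordering is that the warning with the failed-tablet count reaches the log even when the process is about to abort, which is what the reviewer is asking for.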
[GitHub] [incubator-doris] EmmyMiao87 commented on pull request #3625: support more syntax in case when clause
EmmyMiao87 commented on pull request #3625: URL: https://github.com/apache/incubator-doris/pull/3625#issuecomment-631320732 Please add an issue link and change the title of the PR
[GitHub] [incubator-doris] vagetablechicken commented on issue #3469: [Memory Engine] Write Pipeline: internal data processing and storage
vagetablechicken commented on issue #3469: URL: https://github.com/apache/incubator-doris/issues/3469#issuecomment-631289944 ### WAL file format A WAL file will save multiple partial rows. One write TX's WAL may be split by size, which generates multiple {txn_id}_{segment_id}.wal files. For the related data structs, such as PartialRowBatch, refer to https://github.com/decster/choco/blob/b644430e4e540da8ce0a6d13f07223c818706425/src/choco/partial_row_batch.h#L11 A WAL file has the same structure as PartialRowBatch, shown below. The WAL file size can be bounded by config::write_buffer_size. ![image](https://user-images.githubusercontent.com/24697960/82416937-715a8980-9aad-11ea-8fb4-90a3fabb2176.png) For the schema part, we could reuse the definition in the segment proto. https://github.com/apache/incubator-doris/blob/d2d95bfa841245d734bcd416b4c429ebe98b8321/gensrc/proto/segment_v2.proto#L158
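The `{txn_id}_{segment_id}.wal` naming and the size-based split described above can be sketched as follows. This is a sketch under two assumptions not stated in the issue: segment ids start at 0, and the split is a simple ceiling division of the payload by the size budget (`config::write_buffer_size` in Doris; a plain parameter here):

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Generate the {txn_id}_{segment_id}.wal file names one write TX would
// produce when its WAL payload is split by a fixed size budget.
std::vector<std::string> wal_segment_names(int64_t txn_id,
                                           int64_t payload_bytes,
                                           int64_t write_buffer_size) {
    std::vector<std::string> names;
    // Ceiling division: number of segments needed to hold the payload.
    int64_t num_segments =
        (payload_bytes + write_buffer_size - 1) / write_buffer_size;
    if (num_segments == 0) num_segments = 1;  // an empty TX still gets one file
    for (int64_t seg = 0; seg < num_segments; ++seg) {
        names.push_back(std::to_string(txn_id) + "_" +
                        std::to_string(seg) + ".wal");
    }
    return names;
}
```

For example, a TX with id 42 whose payload is 2.5 buffers long would produce the three files `42_0.wal`, `42_1.wal`, `42_2.wal` under these assumptions.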
[GitHub] [incubator-doris] stalary closed issue #3494: [BUG] Be all hang up
stalary closed issue #3494: URL: https://github.com/apache/incubator-doris/issues/3494
[GitHub] [incubator-doris] morningman opened a new pull request #3641: [Bug] Ignore loading DELETE status tablet error when restarting BE
morningman opened a new pull request #3641: URL: https://github.com/apache/incubator-doris/pull/3641 Fix: #3640. Also adds a `batch delete meta` feature for the `meta tool`. Fix: #3639