[GitHub] [incubator-doris] morningman commented on a change in pull request #3584: [OUTFILE] Support `INTO OUTFILE` to export query result

2020-05-25 Thread GitBox


morningman commented on a change in pull request #3584:
URL: https://github.com/apache/incubator-doris/pull/3584#discussion_r429788998



##
File path: be/src/runtime/query_statistics.h
##
@@ -71,6 +77,9 @@ class QueryStatistics {
 
 int64_t scan_rows;
 int64_t scan_bytes;
+// number rows returned by query.
+// only set once by result sink when closing.
+int64_t returned_rows;

Review comment:
   Done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org
For additional commands, e-mail: commits-h...@doris.apache.org



[GitHub] [incubator-doris] morningman commented on a change in pull request #3584: [OUTFILE] Support `INTO OUTFILE` to export query result

2020-05-22 Thread GitBox


morningman commented on a change in pull request #3584:
URL: https://github.com/apache/incubator-doris/pull/3584#discussion_r429319147



##
File path: docs/zh-CN/administrator-guide/outfile.md
##
@@ -0,0 +1,183 @@
+---
+{
+"title": "导出查询结果集",
+"language": "zh-CN"
+}
+---
+
+
+
+# 导出查询结果集
+
+本文档介绍如何使用 `SELECT INTO OUTFILE` 命令进行查询结果的导出操作。
+
+## 语法
+
+`SELECT INTO OUTFILE` 语句可以将查询结果导出到文件中。目前仅支持通过 Broker 进程导出到远端存储,如 HDFS,S3,BOS 上。语法如下
+
+```
+query_stmt
+INTO OUTFILE "file_path"
+[format_as]
+WITH BROKER `broker_name`

Review comment:
   OK, the new syntax has been updated in the proposal.








[GitHub] [incubator-doris] morningman commented on a change in pull request #3584: [OUTFILE] Support `INTO OUTFILE` to export query result

2020-05-22 Thread GitBox


morningman commented on a change in pull request #3584:
URL: https://github.com/apache/incubator-doris/pull/3584#discussion_r429319050



##
File path: docs/zh-CN/administrator-guide/outfile.md
##
@@ -0,0 +1,183 @@
+---
+{
+"title": "导出查询结果集",
+"language": "zh-CN"
+}
+---
+
+
+
+# 导出查询结果集
+
+本文档介绍如何使用 `SELECT INTO OUTFILE` 命令进行查询结果的导出操作。
+
+## 语法
+
+`SELECT INTO OUTFILE` 语句可以将查询结果导出到文件中。目前仅支持通过 Broker 进程导出到远端存储,如 HDFS,S3,BOS 上。语法如下
+
+```
+query_stmt
+INTO OUTFILE "file_path"
+[format_as]
+WITH BROKER `broker_name`
+[broker_properties]
+[other_properties]
+```
+
+* `file_path`
+
+`file_path` 指向文件存储的路径以及文件前缀。如 `hdfs://path/to/my_file`。
+
+最终的文件名将由 `my_file`,文件序号以及文件格式后缀组成。其中文件序号由0开始,数量为文件被分割的数量。如:
+
+```
+my_file_0.csv
+my_file_1.csv
+my_file_2.csv
+```
+
+* `[format_as]`
+
+```
+FORMAT AS CSV
+```
+
+指定导出格式。默认为 CSV。
+
+* `[broker_properties]`
+
+```
+("broker_prop_key" = "broker_prop_val", ...)
+``` 
+
+Broker 相关的一些参数,如 HDFS 的 认证信息等。具体参阅[Broker 文档](./broker.html)。
+
+* `[other_properties]`
+
+```
+("key1" = "val1", "key2" = "val2", ...)
+```
+
+其他属性,目前支持以下属性:
+
+* `column_separator`:列分隔符,仅对 CSV 格式适用。默认为 `\t`。
+* `line_delimiter`:行分隔符,仅对 CSV 格式适用。默认为 `\n`。
+* `max_file_size_bytes`:单个文件的最大大小。默认为 1GB。取值范围在 5MB 到 2GB 之间。超过这个大小的文件将会被切分。
+
+1. 示例1
+
+将简单查询结果导出到文件 `hdfs:/path/to/result.txt`。指定导出格式为 CSV。使用 `my_broker` 并设置 kerberos 认证信息。指定列分隔符为 `,`,行分隔符为 `\n`。
+
+```
+SELECT * FROM tbl
+INTO OUTFILE "hdfs:/path/to/result"
+FORMAT AS CSV
+WITH BROKER "my_broker"
+(
+"hadoop.security.authentication" = "kerberos",
+"kerberos_principal" = "do...@your.com",
+"kerberos_keytab" = "/home/doris/my.keytab"
+)
+PROPERTIELS
+(
+"column_separator" = ",",
+"line_delimiter" = "\n",
+"max_file_size_bytes" = "100MB"

Review comment:
   OK

##
File path: docs/zh-CN/administrator-guide/outfile.md
##
@@ -0,0 +1,183 @@
+---
+{
+"title": "导出查询结果集",
+"language": "zh-CN"
+}
+---
+
+
+
+# 导出查询结果集
+
+本文档介绍如何使用 `SELECT INTO OUTFILE` 命令进行查询结果的导出操作。
+
+## 语法
+
+`SELECT INTO OUTFILE` 语句可以将查询结果导出到文件中。目前仅支持通过 Broker 进程导出到远端存储,如 HDFS,S3,BOS 上。语法如下
+
+```
+query_stmt
+INTO OUTFILE "file_path"
+[format_as]
+WITH BROKER `broker_name`
+[broker_properties]
+[other_properties]
+```
+
+* `file_path`
+
+`file_path` 指向文件存储的路径以及文件前缀。如 `hdfs://path/to/my_file`。
+
+最终的文件名将由 `my_file`,文件序号以及文件格式后缀组成。其中文件序号由0开始,数量为文件被分割的数量。如:
+
+```
+my_file_0.csv

Review comment:
   OK

##
File path: docs/zh-CN/administrator-guide/outfile.md
##
@@ -0,0 +1,183 @@
+---
+{
+"title": "导出查询结果集",
+"language": "zh-CN"
+}
+---
+
+
+
+# 导出查询结果集
+
+本文档介绍如何使用 `SELECT INTO OUTFILE` 命令进行查询结果的导出操作。
+
+## 语法
+
+`SELECT INTO OUTFILE` 语句可以将查询结果导出到文件中。目前仅支持通过 Broker 进程导出到远端存储,如 HDFS,S3,BOS 上。语法如下
+
+```
+query_stmt
+INTO OUTFILE "file_path"
+[format_as]
+WITH BROKER `broker_name`

Review comment:
   OK








[GitHub] [incubator-doris] morningman commented on a change in pull request #3584: [OUTFILE] Support `INTO OUTFILE` to export query result

2020-05-20 Thread GitBox


morningman commented on a change in pull request #3584:
URL: https://github.com/apache/incubator-doris/pull/3584#discussion_r428444348



##
File path: be/src/runtime/file_result_writer.h
##
@@ -0,0 +1,132 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#pragma once
+
+#include "runtime/result_writer.h"
+#include "runtime/runtime_state.h"
+#include "gen_cpp/DataSinks_types.h"
+
+namespace doris {
+
+class ExprContext;
+class FileWriter;
+class ParquetWriterWrapper;
+class RowBatch;
+class RuntimeProfile;
+class TupleRow;
+
+struct ResultFileOptions {
+bool is_local_file;
+std::string file_path;
+TFileFormatType::type file_format;
+std::string column_separator;
+std::string line_delimiter;
+size_t max_file_size_bytes = 1 * 1024 * 1024 * 1024; // 1GB
+std::vector<TNetworkAddress> broker_addresses;
+std::map<std::string, std::string> broker_properties;

Review comment:
   This map will be assigned from a Thrift map object. If I used unordered_map here, I would have to convert it.
   And these properties only have a few elements, so I think it's not a big deal.
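   To illustrate the trade-off described above, here is a minimal C++ sketch, assuming the Thrift C++ generator produces a `std::map<std::string, std::string>` for a `map<string,string>` field (the `TOptions` struct below is a hypothetical stand-in for `TResultFileSinkOptions`): a `std::map` member can take the Thrift map by plain copy-assignment, while an `unordered_map` member needs a per-element conversion.

```cpp
#include <map>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the Thrift-generated options struct.
struct TOptions {
    std::map<std::string, std::string> broker_properties;
};

struct OptionsWithMap {
    std::map<std::string, std::string> broker_properties;
    explicit OptionsWithMap(const TOptions& t) {
        // Same container type: a single copy-assignment is enough.
        broker_properties = t.broker_properties;
    }
};

struct OptionsWithUnorderedMap {
    std::unordered_map<std::string, std::string> broker_properties;
    explicit OptionsWithUnorderedMap(const TOptions& t)
        // Different container type: the range constructor copies element by element.
        : broker_properties(t.broker_properties.begin(),
                            t.broker_properties.end()) {}
};
```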








[GitHub] [incubator-doris] morningman commented on a change in pull request #3584: [OUTFILE] Support `INTO OUTFILE` to export query result

2020-05-20 Thread GitBox


morningman commented on a change in pull request #3584:
URL: https://github.com/apache/incubator-doris/pull/3584#discussion_r42809



##
File path: be/src/runtime/file_result_writer.h
##
@@ -0,0 +1,132 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#pragma once
+
+#include "runtime/result_writer.h"
+#include "runtime/runtime_state.h"
+#include "gen_cpp/DataSinks_types.h"
+
+namespace doris {
+
+class ExprContext;
+class FileWriter;
+class ParquetWriterWrapper;
+class RowBatch;
+class RuntimeProfile;
+class TupleRow;
+
+struct ResultFileOptions {
+bool is_local_file;
+std::string file_path;
+TFileFormatType::type file_format;
+std::string column_separator;
+std::string line_delimiter;
+size_t max_file_size_bytes = 1 * 1024 * 1024 * 1024; // 1GB
+std::vector<TNetworkAddress> broker_addresses;
+std::map<std::string, std::string> broker_properties;
+
+ResultFileOptions(const TResultFileSinkOptions& t_opt) {
+file_path = t_opt.file_path;
+file_format = t_opt.file_format;
+column_separator = t_opt.__isset.column_separator ? t_opt.column_separator : "\t";
+line_delimiter = t_opt.__isset.line_delimiter ? t_opt.line_delimiter : "\n";
+max_file_size_bytes = t_opt.__isset.max_file_size_bytes ?
+t_opt.max_file_size_bytes : max_file_size_bytes;
+
+is_local_file = true;
+if (t_opt.__isset.broker_addresses) {
+broker_addresses = t_opt.broker_addresses;
+is_local_file = false;
+}
+if (t_opt.__isset.broker_properties) {
+broker_properties = t_opt.broker_properties;
+}
+}
+};
+
+// write result to file
+class FileResultWriter : public ResultWriter {

Review comment:
   OK








[GitHub] [incubator-doris] morningman commented on a change in pull request #3584: [OUTFILE] Support `INTO OUTFILE` to export query result

2020-05-20 Thread GitBox


morningman commented on a change in pull request #3584:
URL: https://github.com/apache/incubator-doris/pull/3584#discussion_r428444339



##
File path: be/src/runtime/mysql_result_writer.h
##
@@ -0,0 +1,70 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#pragma once
+
+#include "runtime/result_writer.h"
+#include "runtime/runtime_state.h"
+
+namespace doris {
+
+class TupleRow;
+class RowBatch;
+class ExprContext;
+class MysqlRowBuffer;
+class BufferControlBlock;
+class RuntimeProfile;
+
+// convert the row batch to mysql protol row
+class MysqlResultWriter : public ResultWriter {

Review comment:
   OK

##
File path: be/src/runtime/file_result_writer.h
##
@@ -0,0 +1,132 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#pragma once
+
+#include "runtime/result_writer.h"
+#include "runtime/runtime_state.h"
+#include "gen_cpp/DataSinks_types.h"
+
+namespace doris {
+
+class ExprContext;
+class FileWriter;
+class ParquetWriterWrapper;
+class RowBatch;
+class RuntimeProfile;
+class TupleRow;
+
+struct ResultFileOptions {
+bool is_local_file;
+std::string file_path;
+TFileFormatType::type file_format;
+std::string column_separator;
+std::string line_delimiter;
+size_t max_file_size_bytes = 1 * 1024 * 1024 * 1024; // 1GB
+std::vector<TNetworkAddress> broker_addresses;
+std::map<std::string, std::string> broker_properties;

Review comment:
   OK








[GitHub] [incubator-doris] morningman commented on a change in pull request #3584: [OUTFILE] Support `INTO OUTFILE` to export query result

2020-05-20 Thread GitBox


morningman commented on a change in pull request #3584:
URL: https://github.com/apache/incubator-doris/pull/3584#discussion_r428443963



##
File path: be/src/runtime/mysql_result_writer.h
##
@@ -0,0 +1,70 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#pragma once
+
+#include "runtime/result_writer.h"
+#include "runtime/runtime_state.h"
+
+namespace doris {
+
+class TupleRow;
+class RowBatch;
+class ExprContext;
+class MysqlRowBuffer;
+class BufferControlBlock;
+class RuntimeProfile;
+
+// convert the row batch to mysql protol row
+class MysqlResultWriter : public ResultWriter {
+public:
+MysqlResultWriter(BufferControlBlock* sinker,
+const std::vector<ExprContext*>& output_expr_ctxs,
+RuntimeProfile* parent_profile);
+virtual ~MysqlResultWriter();
+
+virtual Status init(RuntimeState* state) override;
+// convert one row batch to mysql result and
+// append this batch to the result sink
+virtual Status append_row_batch(const RowBatch* batch) override;
+
+virtual Status close() override;
+
+private:
+void _init_profile();
+// convert one tuple row
+Status _add_one_row(TupleRow* row);
+
+private:
+// The expressions that are run to create tuples to be written to hbase.

Review comment:
   Removed








[GitHub] [incubator-doris] morningman commented on a change in pull request #3584: [OUTFILE] Support `INTO OUTFILE` to export query result

2020-05-20 Thread GitBox


morningman commented on a change in pull request #3584:
URL: https://github.com/apache/incubator-doris/pull/3584#discussion_r428443827



##
File path: docs/zh-CN/administrator-guide/outfile.md
##
@@ -0,0 +1,183 @@
+---
+{
+"title": "导出查询结果集",
+"language": "zh-CN"
+}
+---
+
+
+
+# 导出查询结果集
+
+本文档介绍如何使用 `SELECT INTO OUTFILE` 命令进行查询结果的导入操作。
+
+## 语法
+
+`SELECT INTO OUTFILE` 语句可以将查询结果导出到文件中。目前仅支持通过 Broker 进程导出到远端存储,如 HDFS,S3,BOS 上。语法如下
+
+```
+query_stmt
+INTO OUTFILE "file_path"
+[format_as]
+WITH BROKER `broker_name`
+[broker_properties]
+[other_properties]
+```
+
+* `file_path`
+
+`file_path` 指向文件存储的路径以及文件前缀。如 `hdfs://path/to/my_file`。
+
+最终的文件名将由 `my_file`,文件序号以及文件格式后缀组成。其中文件序号由0开始,数量为文件被分割的数量。如:
+
+```
+my_file_0.csv
+my_file_1.csv
+my_file_2.csv
+```
+
+* `[format_as]`
+
+```
+FORMAT AS CSV
+```
+
+指定导出格式。默认为 CSV。
+
+* `[broker_properties]`
+
+```
+("broker_prop_key" = "broker_prop_val", ...)
+``` 
+
+Broker 相关的一些参数,如 HDFS 的 认证信息等。具体参阅[Broker 文档](./broker.html)。
+
+* `[other_properties]`
+
+```
+("key1" = "val1", "key2" = "val2", ...)
+```
+
+其他属性,目前支持以下属性:
+
+* `column_separator`:列分隔符,仅对 CSV 格式适用。默认为 `\t`。
+* `line_delimiter`:行分隔符,仅对 CSV 格式适用。默认为 `\n`。
+* `max_file_size_bytes`:单个文件的最大大小。默认为 1GB。取值范围在 5MB 到 2GB 之间。超过这个大小的文件将会被切分。
+
+1. 示例1
+
+将简单查询结果导出到文件 `hdfs:/path/to/result.txt`。指定导出格式为 CSV。使用 `my_broker` 并设置 kerberos 认证信息。指定列分隔符为 `,`,行分隔符为 `\n`。
+
+```
+SELECT * FROM tbl
+INTO OUTFILE "hdfs:/path/to/result"
+FORMAT AS CSV
+WITH BROKER "my_broker"
+(
+"hadoop.security.authentication" = "kerberos",
+"kerberos_principal" = "do...@your.com",
+"kerberos_keytab" = "/home/doris/my.keytab"
+)
+PROPERTIELS
+(
+"column_separator" = ",",
+"line_delimiter" = "\n",
+"max_file_size_bytes" = "100MB"
+);
+```
+
+最终生成文件如如果不大于 100MB,则为:`result_0.csv`。
+
+如果大于 100MB,则可能为 `result_0.csv, result_1.csv, ...`。
+
+2. 示例2
+
+将 CTE 语句的查询结果导出到文件 `hdfs:/path/to/result.txt`。默认导出格式为 CSV。使用 `my_broker` 并设置 hdfs 高可用信息。使用默认的行列分隔符。
+
+```
+WITH
+x1 AS
+(SELECT k1, k2 FROM tbl1),
+x2 AS
+(SELECT k3 FROM tbl2)
+SELEC k1 FROM x1 UNION SELECT k3 FROM x2
+INTO OUTFILE "hdfs:/path/to/result.txt"
+WITH BROKER "my_broker"
+(
+"username"="user",
+"password"="passwd",
+"dfs.nameservices" = "my_ha",
+"dfs.ha.namenodes.my_ha" = "my_namenode1, my_namenode2",
+"dfs.namenode.rpc-address.my_ha.my_namenode1" = "nn1_host:rpc_port",
+"dfs.namenode.rpc-address.my_ha.my_namenode2" = "nn2_host:rpc_port",
+"dfs.client.failover.proxy.provider" = 
"org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
+);
+```
+
+最终生成文件如如果不大于 1GB,则为:`result_0.csv`。
+
+如果大于 1GB,则可能为 `result_0.csv, result_1.csv, ...`。
+
+3. 示例3
+
+将 UNION 语句的查询结果导出到文件 `bos://bucket/result.txt`。指定导出格式为 PARQUET。使用 `my_broker` 并设置 hdfs 高可用信息。PARQUET 格式无需指定列分割符。
+
+```
+SELECT k1 FROM tbl1 UNION SELECT k2 FROM tbl1
+INTO OUTFILE "bos://bucket/result.txt"
+FORMAT AS PARQUET
+WITH BROKER "my_broker"
+(
+"bos_endpoint" = "http://bj.bcebos.com;,
+"bos_accesskey" = "xx",
+"bos_secret_accesskey" = "yy"
+)
+```
+
+最终生成文件如如果不大于 1GB,则为:`result_0.parquet`。
+
+如果大于 1GB,则可能为 `result_0.parquet, result_1.parquet, ...`。
+
+## 返回结果
+
+导出命令为同步命令。命令返回,即表示操作结束。
+
+如果正常导出并返回,则结果如下:
+
+```
+mysql> SELECT * FROM tbl INTO OUTFILE ...
+Query OK, 10 row affected (5.86 sec)
+```
+
+其中 `10 row affected` 表示导出的结果集行数。
+
+如果执行错误,则会返回错误信息,如:
+
+```
+mysql> SELECT * FROM tbl INTO OUTFILE ...
+ERROR 1064 (HY000): errCode = 2, detailMessage = Open broker writer failed ...
+```
+
+## 注意事项
+
+* 查询结果是由单个 BE 节点,单线程导出的。因此导出时间和导出结果集大小正相关。

Review comment:
   Here we currently just reuse the existing logic for returning query results.
   Multi-threaded support would be more complicated. For example, we would need to check whether the SELECT statement contains ORDER BY or similar clauses; if it does, we can only write sequentially with a single thread (because the results are returned in order).
   Even without ORDER BY, the current query framework still returns results in a single thread, so at most we could write the files with multiple threads. But many remote systems do not support writing at a specified offset, so multiple threads could only write to separate files, which is also troublesome.

   I am not sure how other systems implement exporting SELECT results.
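   For illustration only, a rough sketch of the multi-threaded variant discussed above (not part of this PR; all names are hypothetical): since many remote storage systems cannot write at a given offset, each writer thread would have to produce its own output file, and the approach only applies when the query has no ORDER BY.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical sketch: each writer thread produces its own file, e.g.
// result_t0_0.csv, result_t1_0.csv, because offset writes are not available.
void export_partition(int thread_id, const std::vector<std::string>& rows,
                      const std::string& prefix) {
    std::string file_name = prefix + "_t" + std::to_string(thread_id) + "_0.csv";
    // ... open `file_name` via the broker and append `rows` ...
    (void)rows;
}

void parallel_export(const std::vector<std::vector<std::string>>& partitions,
                     const std::string& prefix) {
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < partitions.size(); ++i) {
        workers.emplace_back(export_partition, static_cast<int>(i),
                             std::cref(partitions[i]), std::cref(prefix));
    }
    for (auto& w : workers) {
        w.join();
    }
    // Only applicable when the query has no ORDER BY; ordered results must be
    // written sequentially by a single thread.
}
```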






[GitHub] [incubator-doris] morningman commented on a change in pull request #3584: [OUTFILE] Support `INTO OUTFILE` to export query result

2020-05-20 Thread GitBox


morningman commented on a change in pull request #3584:
URL: https://github.com/apache/incubator-doris/pull/3584#discussion_r428442688



##
File path: docs/zh-CN/administrator-guide/outfile.md
##
@@ -0,0 +1,183 @@
+---
+{
+"title": "导出查询结果集",
+"language": "zh-CN"
+}
+---
+
+
+
+# 导出查询结果集
+
+本文档介绍如何使用 `SELECT INTO OUTFILE` 命令进行查询结果的导入操作。
+
+## 语法
+
+`SELECT INTO OUTFILE` 语句可以将查询结果导出到文件中。目前仅支持通过 Broker 进程导出到远端存储,如 HDFS,S3,BOS 上。语法如下
+
+```
+query_stmt
+INTO OUTFILE "file_path"
+[format_as]
+WITH BROKER `broker_name`
+[broker_properties]
+[other_properties]
+```
+
+* `file_path`
+
+`file_path` 指向文件存储的路径以及文件前缀。如 `hdfs://path/to/my_file`。
+
+最终的文件名将由 `my_file`,文件序号以及文件格式后缀组成。其中文件序号由0开始,数量为文件被分割的数量。如:
+
+```
+my_file_0.csv
+my_file_1.csv
+my_file_2.csv
+```
+
+* `[format_as]`
+
+```
+FORMAT AS CSV
+```
+
+指定导出格式。默认为 CSV。
+
+* `[broker_properties]`
+
+```
+("broker_prop_key" = "broker_prop_val", ...)
+``` 
+
+Broker 相关的一些参数,如 HDFS 的 认证信息等。具体参阅[Broker 文档](./broker.html)。
+
+* `[other_properties]`
+
+```
+("key1" = "val1", "key2" = "val2", ...)
+```
+
+其他属性,目前支持以下属性:
+
+* `column_separator`:列分隔符,仅对 CSV 格式适用。默认为 `\t`。
+* `line_delimiter`:行分隔符,仅对 CSV 格式适用。默认为 `\n`。
+* `max_file_size_bytes`:单个文件的最大大小。默认为 1GB。取值范围在 5MB 到 2GB 之间。超过这个大小的文件将会被切分。
+
+1. 示例1
+
+将简单查询结果导出到文件 `hdfs:/path/to/result.txt`。指定导出格式为 CSV。使用 `my_broker` 并设置 kerberos 认证信息。指定列分隔符为 `,`,行分隔符为 `\n`。
+
+```
+SELECT * FROM tbl
+INTO OUTFILE "hdfs:/path/to/result"
+FORMAT AS CSV
+WITH BROKER "my_broker"
+(
+"hadoop.security.authentication" = "kerberos",
+"kerberos_principal" = "do...@your.com",
+"kerberos_keytab" = "/home/doris/my.keytab"
+)
+PROPERTIELS
+(
+"column_separator" = ",",
+"line_delimiter" = "\n",
+"max_file_size_bytes" = "100MB"
+);
+```
+
+最终生成文件如如果不大于 100MB,则为:`result_0.csv`。
+
+如果大于 100MB,则可能为 `result_0.csv, result_1.csv, ...`。
+
+2. 示例2
+
+将 CTE 语句的查询结果导出到文件 `hdfs:/path/to/result.txt`。默认导出格式为 CSV。使用 `my_broker` 并设置 hdfs 高可用信息。使用默认的行列分隔符。
+
+```
+WITH
+x1 AS
+(SELECT k1, k2 FROM tbl1),
+x2 AS
+(SELECT k3 FROM tbl2)
+SELEC k1 FROM x1 UNION SELECT k3 FROM x2
+INTO OUTFILE "hdfs:/path/to/result.txt"
+WITH BROKER "my_broker"
+(
+"username"="user",
+"password"="passwd",
+"dfs.nameservices" = "my_ha",
+"dfs.ha.namenodes.my_ha" = "my_namenode1, my_namenode2",
+"dfs.namenode.rpc-address.my_ha.my_namenode1" = "nn1_host:rpc_port",
+"dfs.namenode.rpc-address.my_ha.my_namenode2" = "nn2_host:rpc_port",
+"dfs.client.failover.proxy.provider" = 
"org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
+);
+```
+
+最终生成文件如如果不大于 1GB,则为:`result_0.csv`。
+
+如果大于 1GB,则可能为 `result_0.csv, result_1.csv, ...`。
+
+3. 示例3
+
+将 UNION 语句的查询结果导出到文件 `bos://bucket/result.txt`。指定导出格式为 PARQUET。使用 `my_broker` 并设置 hdfs 高可用信息。PARQUET 格式无需指定列分割符。
+
+```
+SELECT k1 FROM tbl1 UNION SELECT k2 FROM tbl1
+INTO OUTFILE "bos://bucket/result.txt"
+FORMAT AS PARQUET
+WITH BROKER "my_broker"
+(
+"bos_endpoint" = "http://bj.bcebos.com;,
+"bos_accesskey" = "xx",
+"bos_secret_accesskey" = "yy"
+)
+```
+
+最终生成文件如如果不大于 1GB,则为:`result_0.parquet`。
+
+如果大于 1GB,则可能为 `result_0.parquet, result_1.parquet, ...`。
+
+## 返回结果
+
+导出命令为同步命令。命令返回,即表示操作结束。
+
+如果正常导出并返回,则结果如下:
+
+```
+mysql> SELECT * FROM tbl INTO OUTFILE ...
+Query OK, 10 row affected (5.86 sec)
+```
+
+其中 `10 row affected` 表示导出的结果集行数。
+
+如果执行错误,则会返回错误信息,如:
+
+```
+mysql> SELECT * FROM tbl INTO OUTFILE ...
+ERROR 1064 (HY000): errCode = 2, detailMessage = Open broker writer failed ...
+```
+
+## 注意事项
+
+* 查询结果是由单个 BE 节点,单线程导出的。因此导出时间和导出结果集大小正相关。
+* 导出命令不会检查文件及文件路径是否存在。是否会自动创建路径、或是否会覆盖已存在文件,完全由远端存储系统的语义决定。
+* 如果在导出过程中出现错误,可能会有导出文件残留在远端存储系统上。Doris 不会清理这些文件。需要用户手动清理。
+* 导出命令的超时时间同查询的超时时间。可以通过 `SET query_timeout=xxx` 进行设置。
+* 对于结果集为空的查询,依然后产生一个大小为0的文件。

Review comment:
   I feel this kind of option would not be very meaningful and would add complexity to the feature.
   Producing an empty file at least shows that the query actually ran, so the user can confirm that the result set is indeed empty. If nothing were produced at all, there would be no way to tell whether a bug occurred somewhere in the middle or the result set was really empty.






[GitHub] [incubator-doris] morningman commented on a change in pull request #3584: [OUTFILE] Support `INTO OUTFILE` to export query result

2020-05-20 Thread GitBox


morningman commented on a change in pull request #3584:
URL: https://github.com/apache/incubator-doris/pull/3584#discussion_r428442267



##
File path: docs/zh-CN/administrator-guide/outfile.md
##
@@ -0,0 +1,183 @@
+---
+{
+"title": "导出查询结果集",
+"language": "zh-CN"
+}
+---
+
+
+
+# 导出查询结果集
+
+本文档介绍如何使用 `SELECT INTO OUTFILE` 命令进行查询结果的导入操作。

Review comment:
   OK








[GitHub] [incubator-doris] morningman commented on a change in pull request #3584: [OUTFILE] Support `INTO OUTFILE` to export query result

2020-05-19 Thread GitBox


morningman commented on a change in pull request #3584:
URL: https://github.com/apache/incubator-doris/pull/3584#discussion_r427307837



##
File path: docs/zh-CN/administrator-guide/outfile.md
##
@@ -0,0 +1,183 @@
+---
+{
+"title": "导出查询结果集",
+"language": "zh-CN"
+}
+---
+
+
+
+# 导出查询结果集
+
+本文档介绍如何使用 `SELECT INTO OUTFILE` 命令进行查询结果的导入操作。
+
+## 语法
+
+`SELECT INTO OUTFILE` 语句可以将查询结果导出到文件中。目前仅支持通过 Broker 进程导出到远端存储,如 HDFS,S3,BOS 上。语法如下
+
+```
+query_stmt
+INTO OUFILE "file_path"

Review comment:
   OK








[GitHub] [incubator-doris] morningman commented on a change in pull request #3584: [OUTFILE] Support `INTO OUTFILE` to export query result

2020-05-18 Thread GitBox


morningman commented on a change in pull request #3584:
URL: https://github.com/apache/incubator-doris/pull/3584#discussion_r427019024



##
File path: docs/zh-CN/administrator-guide/outfile.md
##
@@ -0,0 +1,183 @@
+---
+{
+"title": "查询结果集导出",

Review comment:
   ok








[GitHub] [incubator-doris] morningman commented on a change in pull request #3584: [OUTFILE] Support `INTO OUTFILE` to export query result

2020-05-18 Thread GitBox


morningman commented on a change in pull request #3584:
URL: https://github.com/apache/incubator-doris/pull/3584#discussion_r427018439



##
File path: be/src/runtime/result_sink.cpp
##
@@ -35,6 +36,17 @@ ResultSink::ResultSink(const RowDescriptor& row_desc, const std::vector&
 : _row_desc(row_desc),
   _t_output_expr(t_output_expr),
   _buf_size(buffer_size) {
+
+if (!sink.__isset.type || sink.type == TResultSinkType::MYSQL_PROTOCAL) {

Review comment:
   `sink.type` is a newly added field of `TResultSink`.
   So for a request from the old FE planner, this field is not set.
   And for a request from the new FE planner, this field may be set to `MYSQL_PROTOCAL`.
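   A small self-contained sketch of this backward-compatible dispatch (the struct definitions below are hypothetical stand-ins for the Thrift-generated `TResultSinkType`/`TResultSink`, and the `FILE` value is an assumed name for the OUTFILE sink, not necessarily what the patch uses):

```cpp
#include <iostream>

// Hypothetical stand-ins mimicking the Thrift-generated types.
struct TResultSinkType {
    enum type { MYSQL_PROTOCAL = 0, FILE = 1 };
};

struct TResultSink {
    struct { bool type = false; } __isset;
    TResultSinkType::type type = TResultSinkType::MYSQL_PROTOCAL;
};

// Backward-compatible dispatch: an old FE planner never sets `type`, so the
// absence of the field is treated as the MySQL protocol sink.
void create_writer(const TResultSink& sink) {
    TResultSinkType::type sink_type = TResultSinkType::MYSQL_PROTOCAL;
    if (sink.__isset.type) {
        sink_type = sink.type;  // value sent by the new FE planner
    }
    if (sink_type == TResultSinkType::MYSQL_PROTOCAL) {
        std::cout << "use MysqlResultWriter\n";
    } else {
        std::cout << "use FileResultWriter\n";  // the OUTFILE path added by this PR
    }
}
```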

##
File path: build.sh
##
@@ -42,6 +42,7 @@ if [[ ! -f ${DORIS_THIRDPARTY}/installed/lib/libs2.a ]]; then
 fi
 
 PARALLEL=$[$(nproc)/4+1]
+PARALLEL=12

Review comment:
   Removed








[GitHub] [incubator-doris] morningman commented on a change in pull request #3584: [OUTFILE] Support `INTO OUTFILE` to export query result

2020-05-18 Thread GitBox


morningman commented on a change in pull request #3584:
URL: https://github.com/apache/incubator-doris/pull/3584#discussion_r427018048



##
File path: be/src/runtime/mysql_result_writer.h
##
@@ -0,0 +1,71 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#pragma once
+
+#include "runtime/result_writer.h"
+#include "runtime/runtime_state.h"
+
+namespace doris {
+
+class TupleRow;
+class RowBatch;
+class ExprContext;
+class MysqlRowBuffer;
+class BufferControlBlock;
+class RuntimeProfile;
+
+// convert the row batch to mysql protol row
+class MysqlResultWriter : public ResultWriter {
+public:
+MysqlResultWriter(BufferControlBlock* sinker,
+const std::vector<ExprContext*>& output_expr_ctxs,
+RuntimeProfile* parent_profile);
+virtual ~MysqlResultWriter();
+
+virtual Status init(RuntimeState* state) override;
+// convert one row batch to mysql result and
+// append this batch to the result sink
+virtual Status append_row_batch(const RowBatch* batch) override;
+
+virtual Status close() override;
+
+private:
+void _init_profile();
+
+private:
+// convert one tuple row
+Status add_one_row(TupleRow* row);
+
+// The expressions that are run to create tuples to be written to hbase.
+BufferControlBlock* _sinker;
+const std::vector& _output_expr_ctxs;
+MysqlRowBuffer* _row_buffer;

Review comment:
   I didn't change the logic here. This is just copied from the old `result_writer`.
   Since it has worked well for a long time, I think it's better not to change it.








[GitHub] [incubator-doris] morningman commented on a change in pull request #3584: [OUTFILE] Support `INTO OUTFILE` to export query result

2020-05-18 Thread GitBox


morningman commented on a change in pull request #3584:
URL: https://github.com/apache/incubator-doris/pull/3584#discussion_r427017714



##
File path: be/src/runtime/mysql_result_writer.h
##
@@ -0,0 +1,71 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#pragma once
+
+#include "runtime/result_writer.h"
+#include "runtime/runtime_state.h"
+
+namespace doris {
+
+class TupleRow;
+class RowBatch;
+class ExprContext;
+class MysqlRowBuffer;
+class BufferControlBlock;
+class RuntimeProfile;
+
+// convert the row batch to mysql protol row
+class MysqlResultWriter : public ResultWriter {
+public:
+MysqlResultWriter(BufferControlBlock* sinker,
+const std::vector<ExprContext*>& output_expr_ctxs,
+RuntimeProfile* parent_profile);
+virtual ~MysqlResultWriter();
+
+virtual Status init(RuntimeState* state) override;
+// convert one row batch to mysql result and
+// append this batch to the result sink
+virtual Status append_row_batch(const RowBatch* batch) override;
+
+virtual Status close() override;
+
+private:
+void _init_profile();
+
+private:
+// convert one tuple row
+Status add_one_row(TupleRow* row);

Review comment:
   I moved the `add_one_row` method to the first `private`.
   The first `private` section is for private methods; the second is for members.
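   For readers unfamiliar with this layout convention, a minimal illustrative skeleton (not the actual class) showing the two `private` sections:

```cpp
class ExampleResultWriter {
public:
    ExampleResultWriter() = default;

private:
    // first `private` section: private methods
    void _init_profile();
    int _add_one_row();

private:
    // second `private` section: data members
    int _written_rows = 0;
    bool _closed = false;
};
```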








[GitHub] [incubator-doris] morningman commented on a change in pull request #3584: [OUTFILE] Support `INTO OUTFILE` to export query result

2020-05-18 Thread GitBox


morningman commented on a change in pull request #3584:
URL: https://github.com/apache/incubator-doris/pull/3584#discussion_r427017396



##
File path: be/src/runtime/file_result_writer.cpp
##
@@ -0,0 +1,319 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "runtime/file_result_writer.h"
+
+#include "exec/broker_writer.h"
+#include "exec/local_file_writer.h"
+#include "exec/parquet_writer.h"
+#include "exprs/expr.h"
+#include "runtime/primitive_type.h"
+#include "runtime/row_batch.h"
+#include "runtime/tuple_row.h"
+#include "runtime/runtime_state.h"
+#include "util/types.h"
+#include "util/date_func.h"
+#include "util/uid_util.h"
+
+#include "gen_cpp/PaloInternalService_types.h"
+
+namespace doris {
+
+const size_t FileResultWriter::OUTSTREAM_BUFFER_SIZE_BYTES = 1024 * 1024;
+
+FileResultWriter::FileResultWriter(
+const ResultFileOptions* file_opts,
+const std::vector<ExprContext*>& output_expr_ctxs,
+RuntimeProfile* parent_profile) :
+_file_opts(file_opts),
+_output_expr_ctxs(output_expr_ctxs),
+_parent_profile(parent_profile) {
+}
+
+FileResultWriter::~FileResultWriter() {
+close();
+}
+
+Status FileResultWriter::init(RuntimeState* state) {
+_state = state;
+_init_profile();
+
+RETURN_IF_ERROR(_create_file_writer());
+return Status::OK();
+}
+
+void FileResultWriter::_init_profile() {
+RuntimeProfile* profile = _parent_profile->create_child("FileResultWriter", true, true);
+_append_row_batch_timer = ADD_TIMER(profile, "AppendBatchTime");
+_convert_tuple_timer = ADD_CHILD_TIMER(profile, "TupleConvertTime", "AppendBatchTime");
+_file_write_timer = ADD_CHILD_TIMER(profile, "FileWriteTime", "AppendBatchTime");
+_writer_close_timer = ADD_TIMER(profile, "FileWriterCloseTime");
+_written_rows_counter = ADD_COUNTER(profile, "NumWrittenRows", TUnit::UNIT);
+_written_data_bytes = ADD_COUNTER(profile, "WrittenDataBytes", TUnit::BYTES);
+}
+
+Status FileResultWriter::_create_file_writer() {
+std::string file_name = _get_next_file_name();
+if (_file_opts->is_local_file) {
+_file_writer = new LocalFileWriter(file_name, 0 /* start offset */);
+} else {
+_file_writer = new BrokerWriter(_state->exec_env(),
+_file_opts->broker_addresses,
+_file_opts->broker_properties,
+file_name,
+0 /*start offset*/);
+}
+RETURN_IF_ERROR(_file_writer->open());
+
+switch (_file_opts->file_format) {
+case TFileFormatType::FORMAT_CSV_PLAIN:
+// just use file writer is enough
+break;
+case TFileFormatType::FORMAT_PARQUET:
+_parquet_writer = new ParquetWriterWrapper(_file_writer, _output_expr_ctxs);
+break;
+default:
+return Status::InternalError(strings::Substitute("unsupport file format: $0", _file_opts->file_format));
+}
+LOG(INFO) << "create file for exporting query result. file name: " << file_name
+<< ". query id: " << print_id(_state->query_id());
+return Status::OK();
+}
+
+// file name format as: my_prefix_0.csv
+std::string FileResultWriter::_get_next_file_name() {
+std::stringstream ss;
+ss << _file_opts->file_path << "_" << (_file_idx++) << "." << _file_format_to_name();
+return ss.str();
+}
+
+std::string FileResultWriter::_file_format_to_name() {
+switch (_file_opts->file_format) {
+case TFileFormatType::FORMAT_CSV_PLAIN:
+return "csv";
+case TFileFormatType::FORMAT_PARQUET:
+return "parquet";
+default:
+return "unknown";
+}
+}
+
+Status FileResultWriter::append_row_batch(const RowBatch* batch) {
+if (NULL == batch || 0 == batch->num_rows()) {
+return Status::OK();
+}
+
+SCOPED_TIMER(_append_row_batch_timer);
+if (_parquet_writer != nullptr) {
+RETURN_IF_ERROR(_parquet_writer->write(*batch));
+} else {
+RETURN_IF_ERROR(_write_csv_file(*batch));
+}
+
+_written_rows += batch->num_rows();
+return Status::OK();
+}
+
+Status FileResultWriter::_write_csv_file(const RowBatch& batch) {
+int num_rows = 

[GitHub] [incubator-doris] morningman commented on a change in pull request #3584: [OUTFILE] Support `INTO OUTFILE` to export query result

2020-05-18 Thread GitBox


morningman commented on a change in pull request #3584:
URL: https://github.com/apache/incubator-doris/pull/3584#discussion_r427017242



##
File path: be/src/runtime/file_result_writer.cpp
##
@@ -0,0 +1,319 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "runtime/file_result_writer.h"
+
+#include "exec/broker_writer.h"
+#include "exec/local_file_writer.h"
+#include "exec/parquet_writer.h"
+#include "exprs/expr.h"
+#include "runtime/primitive_type.h"
+#include "runtime/row_batch.h"
+#include "runtime/tuple_row.h"
+#include "runtime/runtime_state.h"
+#include "util/types.h"
+#include "util/date_func.h"
+#include "util/uid_util.h"
+
+#include "gen_cpp/PaloInternalService_types.h"
+
+namespace doris {
+
+const size_t FileResultWriter::OUTSTREAM_BUFFER_SIZE_BYTES = 1024 * 1024;
+
+FileResultWriter::FileResultWriter(
+const ResultFileOptions* file_opts,
+const std::vector<ExprContext*>& output_expr_ctxs,
+RuntimeProfile* parent_profile) :
+_file_opts(file_opts),
+_output_expr_ctxs(output_expr_ctxs),
+_parent_profile(parent_profile) {
+}
+
+FileResultWriter::~FileResultWriter() {
+close();
+}
+
+Status FileResultWriter::init(RuntimeState* state) {
+_state = state;
+_init_profile();
+
+RETURN_IF_ERROR(_create_file_writer());
+return Status::OK();
+}
+
+void FileResultWriter::_init_profile() {
+RuntimeProfile* profile = _parent_profile->create_child("FileResultWriter", true, true);
+_append_row_batch_timer = ADD_TIMER(profile, "AppendBatchTime");
+_convert_tuple_timer = ADD_CHILD_TIMER(profile, "TupleConvertTime", "AppendBatchTime");
+_file_write_timer = ADD_CHILD_TIMER(profile, "FileWriteTime", "AppendBatchTime");
+_writer_close_timer = ADD_TIMER(profile, "FileWriterCloseTime");
+_written_rows_counter = ADD_COUNTER(profile, "NumWrittenRows", TUnit::UNIT);
+_written_data_bytes = ADD_COUNTER(profile, "WrittenDataBytes", TUnit::BYTES);
+}
+
+Status FileResultWriter::_create_file_writer() {
+std::string file_name = _get_next_file_name();
+if (_file_opts->is_local_file) {
+_file_writer = new LocalFileWriter(file_name, 0 /* start offset */);
+} else {
+_file_writer = new BrokerWriter(_state->exec_env(),
+_file_opts->broker_addresses,
+_file_opts->broker_properties,
+file_name,
+0 /*start offset*/);
+}
+RETURN_IF_ERROR(_file_writer->open());
+
+switch (_file_opts->file_format) {
+case TFileFormatType::FORMAT_CSV_PLAIN:
+// just use file writer is enough
+break;
+case TFileFormatType::FORMAT_PARQUET:
+_parquet_writer = new ParquetWriterWrapper(_file_writer, _output_expr_ctxs);
+break;
+default:
+return Status::InternalError(strings::Substitute("unsupport file format: $0", _file_opts->file_format));
+}
+LOG(INFO) << "create file for exporting query result. file name: " << file_name
+<< ". query id: " << print_id(_state->query_id());
+return Status::OK();
+}
+
+// file name format as: my_prefix_0.csv
+std::string FileResultWriter::_get_next_file_name() {
+std::stringstream ss;
+ss << _file_opts->file_path << "_" << (_file_idx++) << "." << _file_format_to_name();
+return ss.str();
+}
+
+std::string FileResultWriter::_file_format_to_name() {
+switch (_file_opts->file_format) {
+case TFileFormatType::FORMAT_CSV_PLAIN:
+return "csv";
+case TFileFormatType::FORMAT_PARQUET:
+return "parquet";
+default:
+return "unknown";
+}
+}
+
+Status FileResultWriter::append_row_batch(const RowBatch* batch) {
+if (NULL == batch || 0 == batch->num_rows()) {

Review comment:
   Modified









[GitHub] [incubator-doris] morningman commented on a change in pull request #3584: [OUTFILE] Support `INTO OUTFILE` to export query result

2020-05-18 Thread GitBox


morningman commented on a change in pull request #3584:
URL: https://github.com/apache/incubator-doris/pull/3584#discussion_r427016918



##
File path: be/src/runtime/file_result_writer.cpp
##
@@ -0,0 +1,319 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "runtime/file_result_writer.h"
+
+#include "exec/broker_writer.h"
+#include "exec/local_file_writer.h"
+#include "exec/parquet_writer.h"
+#include "exprs/expr.h"
+#include "runtime/primitive_type.h"
+#include "runtime/row_batch.h"
+#include "runtime/tuple_row.h"
+#include "runtime/runtime_state.h"
+#include "util/types.h"
+#include "util/date_func.h"
+#include "util/uid_util.h"
+
+#include "gen_cpp/PaloInternalService_types.h"
+
+namespace doris {
+
+const size_t FileResultWriter::OUTSTREAM_BUFFER_SIZE_BYTES = 1024 * 1024;
+
+FileResultWriter::FileResultWriter(
+const ResultFileOptions* file_opts,
+const std::vector<ExprContext*>& output_expr_ctxs,
+RuntimeProfile* parent_profile) :
+_file_opts(file_opts),
+_output_expr_ctxs(output_expr_ctxs),
+_parent_profile(parent_profile) {
+}
+
+FileResultWriter::~FileResultWriter() {
+close();
+}
+
+Status FileResultWriter::init(RuntimeState* state) {
+_state = state;
+_init_profile();
+
+RETURN_IF_ERROR(_create_file_writer());
+return Status::OK();
+}
+
+void FileResultWriter::_init_profile() {
+RuntimeProfile* profile = _parent_profile->create_child("FileResultWriter", true, true);
+_append_row_batch_timer = ADD_TIMER(profile, "AppendBatchTime");
+_convert_tuple_timer = ADD_CHILD_TIMER(profile, "TupleConvertTime", "AppendBatchTime");
+_file_write_timer = ADD_CHILD_TIMER(profile, "FileWriteTime", "AppendBatchTime");
+_writer_close_timer = ADD_TIMER(profile, "FileWriterCloseTime");
+_written_rows_counter = ADD_COUNTER(profile, "NumWrittenRows", TUnit::UNIT);
+_written_data_bytes = ADD_COUNTER(profile, "WrittenDataBytes", TUnit::BYTES);
+}
+
+Status FileResultWriter::_create_file_writer() {
+std::string file_name = _get_next_file_name();
+if (_file_opts->is_local_file) {
+_file_writer = new LocalFileWriter(file_name, 0 /* start offset */);
+} else {
+_file_writer = new BrokerWriter(_state->exec_env(),
+_file_opts->broker_addresses,
+_file_opts->broker_properties,
+file_name,
+0 /*start offset*/);
+}
+RETURN_IF_ERROR(_file_writer->open());
+
+switch (_file_opts->file_format) {
+case TFileFormatType::FORMAT_CSV_PLAIN:
+// just use file writer is enough
+break;
+case TFileFormatType::FORMAT_PARQUET:
+_parquet_writer = new ParquetWriterWrapper(_file_writer, _output_expr_ctxs);

Review comment:
   `ParquetOutputStream` is not implemented.








[GitHub] [incubator-doris] morningman commented on a change in pull request #3584: [OUTFILE] Support `INTO OUTFILE` to export query result

2020-05-18 Thread GitBox


morningman commented on a change in pull request #3584:
URL: https://github.com/apache/incubator-doris/pull/3584#discussion_r427016796



##
File path: be/src/exec/parquet_writer.cpp
##
@@ -0,0 +1,91 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "exec/parquet_writer.h"
+
+#include 
+#include 
+#include 
+
+#include "exec/file_writer.h"
+#include "common/logging.h"
+#include "gen_cpp/PaloBrokerService_types.h"
+#include "gen_cpp/TPaloBrokerService.h"
+#include "runtime/broker_mgr.h"
+#include "runtime/client_cache.h"
+#include "runtime/exec_env.h"
+#include "runtime/tuple.h"
+#include "runtime/descriptors.h"
+#include "runtime/mem_pool.h"
+#include "util/thrift_util.h"
+
+namespace doris {
+
+/// ParquetOutputStream
+ParquetOutputStream::ParquetOutputStream(FileWriter* file_writer): _file_writer(file_writer) {
+set_mode(arrow::io::FileMode::WRITE);
+}
+
+ParquetOutputStream::~ParquetOutputStream() {
+Close();
+}
+
+arrow::Status ParquetOutputStream::Write(const void* data, int64_t nbytes) {
+size_t written_len = 0;
+Status st = _file_writer->write(reinterpret_cast<const uint8_t*>(data), nbytes, &written_len);
+if (!st.ok()) {

Review comment:
   Actually, `ParquetOutputStream` is not implemented yet. I will modify this class in the next PR.

##
File path: be/src/exec/parquet_writer.cpp
##
@@ -0,0 +1,91 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "exec/parquet_writer.h"
+
+#include 
+#include 
+#include 
+
+#include "exec/file_writer.h"
+#include "common/logging.h"
+#include "gen_cpp/PaloBrokerService_types.h"
+#include "gen_cpp/TPaloBrokerService.h"
+#include "runtime/broker_mgr.h"
+#include "runtime/client_cache.h"
+#include "runtime/exec_env.h"
+#include "runtime/tuple.h"
+#include "runtime/descriptors.h"
+#include "runtime/mem_pool.h"
+#include "util/thrift_util.h"
+
+namespace doris {
+
+/// ParquetOutputStream
+ParquetOutputStream::ParquetOutputStream(FileWriter* file_writer): _file_writer(file_writer) {
+set_mode(arrow::io::FileMode::WRITE);
+}
+
+ParquetOutputStream::~ParquetOutputStream() {
+Close();
+}
+
+arrow::Status ParquetOutputStream::Write(const void* data, int64_t nbytes) {
+size_t written_len = 0;
+Status st = _file_writer->write(reinterpret_cast<const uint8_t*>(data), nbytes, &written_len);
+if (!st.ok()) {
+return arrow::Status::IOError(st.get_error_msg());
+}
+_cur_pos += written_len;
+return arrow::Status::OK();
+}
+
+arrow::Status ParquetOutputStream::Tell(int64_t* position) const {
+*position = _cur_pos;
+return arrow::Status::OK();
+}
+
+arrow::Status ParquetOutputStream::Close() {
+Status st = _file_writer->close();
+if (!st.ok()) {

Review comment:
   `ParquetOutputStream` is not implemented




