[GitHub] incubator-quickstep issue #360: Fix the inclusion guard of ForemanSingleNode...
Github user jianqiao commented on the issue: https://github.com/apache/incubator-quickstep/pull/360 LGTM! Merging. ---
[GitHub] incubator-quickstep issue #359: Fixed the build issues regarding tmb benchma...
Github user jianqiao commented on the issue: https://github.com/apache/incubator-quickstep/pull/359 LGTM. Merging. ---
[GitHub] incubator-quickstep issue #358: Fix a bug in HashJoinOperator
Github user jianqiao commented on the issue: https://github.com/apache/incubator-quickstep/pull/358 Tests added. The problematic code would fail on the added tests. ---
[GitHub] incubator-quickstep issue #355: QUICKSTEP-127 Data provider thread
Github user jianqiao commented on the issue: https://github.com/apache/incubator-quickstep/pull/355 @hbdeshmukh Hi Harshad, can you rebase the branch so that I can help merge this PR? ---
[GitHub] incubator-quickstep issue #357: Fixed the command execution bug in the distr...
Github user jianqiao commented on the issue: https://github.com/apache/incubator-quickstep/pull/357 LGTM! ---
[GitHub] incubator-quickstep issue #355: QUICKSTEP-127 Data provider thread
Github user jianqiao commented on the issue: https://github.com/apache/incubator-quickstep/pull/355 LGTM! ---
[GitHub] incubator-quickstep pull request #358: Fix a bug in HashJoinOperator
GitHub user jianqiao opened a pull request: https://github.com/apache/incubator-quickstep/pull/358 Fix a bug in HashJoinOperator This PR fixes a bug in `HashJoinOperator` w.r.t. the swapping of probe/build sides in a previous PR. You can merge this pull request into a Git repository by running: $ git pull https://github.com/jianqiao/incubator-quickstep fix-filter-side Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-quickstep/pull/358.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #358 commit c4a072f301593b3f92a009ba752c05b2226a0f32 Author: Jianqiao Zhu Date: 2018-06-03T20:38:36Z Fix a bug of filter side in HashJoinOperator ---
[GitHub] incubator-quickstep issue #347: QUICKSTEP-121: Added the self-join support.
Github user jianqiao commented on the issue: https://github.com/apache/incubator-quickstep/pull/347

LGTM! Merging.

Note that the `concretize` signature in `Physical` plans looks somewhat cumbersome; we may add a `SubstitutionContext` class to wrap these arguments in a future PR.
```
::quickstep::Predicate* concretize(
    const std::unordered_map &substitution_map,
    const std::unordered_set &left_expr_ids = std::unordered_set(),
    const std::unordered_set &right_expr_ids = std::unordered_set()) const override;
```
---
[GitHub] incubator-quickstep issue #353: Minor bug fixes and refactors.
Github user jianqiao commented on the issue: https://github.com/apache/incubator-quickstep/pull/353 LGTM! Merging. ---
[GitHub] incubator-quickstep issue #354: Fixed the union-all elimiation case where so...
Github user jianqiao commented on the issue: https://github.com/apache/incubator-quickstep/pull/354 LGTM! Merging. ---
[GitHub] incubator-quickstep issue #351: Use Exactness info in Catalog stats.
Github user jianqiao commented on the issue: https://github.com/apache/incubator-quickstep/pull/351 The stats can be used to provide an estimation even when they are not exact. ---
[GitHub] incubator-quickstep issue #352: QUICKSTEP-125: Fixed the non-determinism in ...
Github user jianqiao commented on the issue: https://github.com/apache/incubator-quickstep/pull/352 LGTM! Merging. ---
[GitHub] incubator-quickstep issue #350: Fixed the bug regarding EliminateEmptyNode a...
Github user jianqiao commented on the issue: https://github.com/apache/incubator-quickstep/pull/350 LGTM! Merging. ---
[GitHub] incubator-quickstep issue #349: Fixed the bug regarding EliminateEmptyNode o...
Github user jianqiao commented on the issue: https://github.com/apache/incubator-quickstep/pull/349 LGTM! Merging. ---
[GitHub] incubator-quickstep pull request #346: Add a python script to auto fix CMake...
GitHub user jianqiao opened a pull request: https://github.com/apache/incubator-quickstep/pull/346

Add a python script to auto fix CMakeLists files

This PR adds a script intended to improve developer productivity by automatically fixing `CMakeLists.txt` files for the Quickstep project (on a best-effort basis). The script does the following:
- Scans the repo's subdirectories and collects `#include` information from all source code files.
- Parses existing `CMakeLists.txt` files and converts all "recognized" commands into proper intermediate representations -- the "unrecognized" parts are kept as verbatim lines.
- Resolves subdirectories, targets and link dependencies, then adds / deletes / updates the corresponding entries.
- Converts the intermediate representations back to `CMakeLists.txt` files.

**NOTE:** Currently the script is at its initial stage and will not update tests or conditional targets (i.e. those within cmake `if` commands). It is likely to work well if you just create/delete some files or add/remove some `#include`'s -- otherwise additional manual fixes may be needed after applying the script.

You can merge this pull request into a Git repository by running: $ git pull https://github.com/jianqiao/incubator-quickstep autofix-cmake-tool
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-quickstep/pull/346.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #346
commit 73d796dee760a03a91d55cb0fe4d8f073f831237 Author: Jianqiao Zhu Date: 2018-04-27T22:28:51Z Add a python script to auto fix CMakeLists files ---
[GitHub] incubator-quickstep issue #344: QUICKSTEP-123: Fixed the missing 'has_repart...
Github user jianqiao commented on the issue: https://github.com/apache/incubator-quickstep/pull/344 LGTM! Merging. ---
[GitHub] incubator-quickstep pull request #343: Fix all CMakeLists.txt for automated ...
GitHub user jianqiao opened a pull request: https://github.com/apache/incubator-quickstep/pull/343 Fix all CMakeLists.txt for automated processing This PR fixes and adjusts the style of all `CMakeLists.txt` so that they become stable (i.e. well-formatted) to be processed by an automated tool. The above mentioned tool will be proposed in a subsequent PR. It is intended to help improve developer productivity as it scans source code dependencies and automatically fixes cmakelists. You can merge this pull request into a Git repository by running: $ git pull https://github.com/jianqiao/incubator-quickstep autofix-cmake Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-quickstep/pull/343.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #343 commit 3169b5646aecae35485a76f4439a121d9f66b3e2 Author: Jianqiao Zhu Date: 2018-04-18T05:54:31Z Fix and rearrange all CMakeLists.txt so that they are ready to be processed and regenrated by an automation tool. ---
[GitHub] incubator-quickstep issue #342: Quickstep-119: Added the rule that eliminate...
Github user jianqiao commented on the issue: https://github.com/apache/incubator-quickstep/pull/342 LGTM! Merging. ---
[GitHub] incubator-quickstep issue #340: More informative error for BlockNotFound exc...
Github user jianqiao commented on the issue: https://github.com/apache/incubator-quickstep/pull/340 LGTM! Merging. ---
[GitHub] incubator-quickstep issue #340: More informative error for BlockNotFound exc...
Github user jianqiao commented on the issue: https://github.com/apache/incubator-quickstep/pull/340 The change looks good! Some minor fixes are needed. ---
[GitHub] incubator-quickstep pull request #340: More informative error for BlockNotFo...
Github user jianqiao commented on a diff in the pull request: https://github.com/apache/incubator-quickstep/pull/340#discussion_r182211596
--- Diff: storage/StorageErrors.hpp ---
@@ -61,9 +61,16 @@ class BlockMemoryTooSmall : public std::exception {
  **/
 class BlockNotFoundInMemory : public std::exception {
  public:
+  BlockNotFoundInMemory(int block_id) : block_id_(block_id) {}
+
   virtual const char* what() const throw() {
-    return "BlockNotFoundInMemory: The specified block was not found in memory";
+    std::string message = "BlockNotFoundInMemory: The specified block with ID " +
+      std::to_string(block_id_ )+ " was not found in memory";
+    return message.c_str();
--- End diff --
The `message` object is destroyed when this method returns, so the returned pointer will dangle. As a fix, we can store the message string as a member variable. ---
[GitHub] incubator-quickstep pull request #340: More informative error for BlockNotFo...
Github user jianqiao commented on a diff in the pull request: https://github.com/apache/incubator-quickstep/pull/340#discussion_r182212888
--- Diff: storage/StorageErrors.hpp ---
@@ -61,9 +61,16 @@ class BlockMemoryTooSmall : public std::exception {
  **/
 class BlockNotFoundInMemory : public std::exception {
  public:
+  BlockNotFoundInMemory(int block_id) : block_id_(block_id) {}
+
   virtual const char* what() const throw() {
-    return "BlockNotFoundInMemory: The specified block was not found in memory";
+    std::string message = "BlockNotFoundInMemory: The specified block with ID " +
+      std::to_string(block_id_ )+ " was not found in memory";
+    return message.c_str();
   }
+
+ private:
+  int block_id_;
--- End diff --
Suggested fix:
```
const std::string block_id_message_;
```
---
[GitHub] incubator-quickstep pull request #340: More informative error for BlockNotFo...
Github user jianqiao commented on a diff in the pull request: https://github.com/apache/incubator-quickstep/pull/340#discussion_r182210127
--- Diff: storage/StorageErrors.hpp ---
@@ -61,9 +61,16 @@ class BlockMemoryTooSmall : public std::exception {
  **/
 class BlockNotFoundInMemory : public std::exception {
  public:
+  BlockNotFoundInMemory(int block_id) : block_id_(block_id) {}
--- End diff --
Minor style fix:
```
explicit BlockNotFoundInMemory(const int block_id) : ...
```
---
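The three review comments on PR #340 above can be combined into one small sketch. It assumes the message is built once in the constructor and kept in a member (named `block_id_message_`, as in the suggested fix), so the pointer returned by `what()` stays valid for the lifetime of the exception object. Using `noexcept` and `override` in place of the original `throw()` is an assumption made here and is not part of the reviewed diff.

```cpp
#include <cassert>
#include <exception>
#include <string>

// Sketch only: mirrors the class name from the diff, but is not the merged code.
class BlockNotFoundInMemory : public std::exception {
 public:
  // 'explicit' and 'const int' follow the style comment in the review.
  explicit BlockNotFoundInMemory(const int block_id)
      : block_id_message_(
            "BlockNotFoundInMemory: The specified block with ID " +
            std::to_string(block_id) + " was not found in memory") {}

  // The string lives in the exception object, so c_str() does not dangle.
  const char* what() const noexcept override {
    return block_id_message_.c_str();
  }

 private:
  const std::string block_id_message_;
};
```

A caller that catches the exception by reference can read `what()` safely, since the string is owned by the exception rather than by a local variable inside `what()`.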
[GitHub] incubator-quickstep issue #339: Upgrade cmake version.
Github user jianqiao commented on the issue: https://github.com/apache/incubator-quickstep/pull/339 LGTM! Merging. ---
[GitHub] incubator-quickstep pull request #334: Fix iwyu include path
GitHub user jianqiao opened a pull request: https://github.com/apache/incubator-quickstep/pull/334 Fix iwyu include path This PR fixes the third-party library include paths for the iwyu (include-what-you-use) tool. You can merge this pull request into a Git repository by running: $ git pull https://github.com/apache/incubator-quickstep fix-iwyu Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-quickstep/pull/334.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #334 commit c2ed5c69b6b8dad07d7410beb0c8292ea1a746e0 Author: Jianqiao Zhu Date: 2017-09-01T20:07:41Z Fix iwyu include path ---
[GitHub] incubator-quickstep pull request #332: Small adjustments in star schema cost...
Github user jianqiao commented on a diff in the pull request: https://github.com/apache/incubator-quickstep/pull/332#discussion_r170369358
--- Diff: query_optimizer/cost_model/StarSchemaSimpleCostModel.cpp ---
@@ -493,7 +493,7 @@ std::size_t StarSchemaSimpleCostModel::getNumDistinctValues(
       return stat.getNumDistinctValues(rel_attr_id);
     }
   }
-  return estimateCardinalityForTableReference(table_reference);
+  return estimateCardinalityForTableReference(table_reference) * 0.1;
--- End diff --
This estimation ratio can be any decimal number that is not close to `1`. If it were close to `1`, the optimizer would choose bad plans in some situations, because the column would appear to have "unique" values. `0.1` tends to be a reasonable choice -- we could also use `0.05`, `0.2`, etc., and adjust the value later when there is actual demand. ---
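The fallback discussed in the comment above can be illustrated with a small standalone sketch. The function name, its parameters and the constant `kFallbackDistinctRatio` are hypothetical and do not come from the Quickstep sources; only the `0.1` ratio is taken from the diff.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical constant: the fraction of rows assumed to be distinct when
// no catalog statistics are available (0.1 is the value from the diff).
constexpr double kFallbackDistinctRatio = 0.1;

// Sketch of the estimation logic: prefer the catalog statistic when it
// exists, otherwise scale the estimated cardinality by the fallback ratio
// so that the column does not look "unique" to the optimizer.
std::size_t estimateNumDistinctValues(const bool has_stat,
                                      const std::size_t stat_num_distinct,
                                      const std::size_t estimated_cardinality) {
  if (has_stat) {
    return stat_num_distinct;
  }
  return static_cast<std::size_t>(estimated_cardinality *
                                  kFallbackDistinctRatio);
}
```

With a ratio of `1`, the fallback would report as many distinct values as rows, which is the "appears unique" situation the comment warns about.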
[GitHub] incubator-quickstep pull request #333: Fix SeparateChainingHashTable::resize...
GitHub user jianqiao opened a pull request: https://github.com/apache/incubator-quickstep/pull/333

Fix SeparateChainingHashTable::resize()

This PR fixes the problem that Quickstep hangs when resizing `SeparateChainingHashTable` during the execution of `BuildHashOperator`. Here is a sequence of queries that reproduces the problem:
```
CREATE TABLE r(x INT, y INT);
CREATE TABLE s(x INT, y INT);
CREATE TABLE t(x INT, y INT);
INSERT INTO r SELECT 1, 1 FROM generate_series(1, 200) AS g(x);
INSERT INTO s SELECT 1, 1 FROM generate_series(1, 200) AS g(x);
INSERT INTO t SELECT 1, 1 FROM generate_series(1, 1000) AS g(x);
\analyze
SELECT COUNT(*) FROM r, s, t WHERE r.x = s.x AND r.y = s.y AND s.x = t.x AND s.y = t.y;
```
The problem is caused by the [`resize()` call](https://github.com/apache/incubator-quickstep/blob/master/storage/HashTable.hpp#L1514) in `HashTable::putValueAccessorCompositeKey()` when `using_prealloc` is true. In this case, pre-allocation decides to resize the hash table in order to consume all the tuples from the current value accessor. However, `resize()` will always abort if the hash table is not "actually full", causing an infinite loop.

Note that `SimpleScalarSeparateChainingHashTable` does not have the same problem, as its [`isFull` method](https://github.com/apache/incubator-quickstep/blob/master/storage/SimpleScalarSeparateChainingHashTable.hpp#L241) already takes `extra_buckets` into consideration. Also note that `LinearOpenAddressingHashTable` seems to have avoided the hanging problem by using a [`retry_num` check](https://github.com/apache/incubator-quickstep/blob/master/storage/LinearOpenAddressingHashTable.hpp#L1203).
You can merge this pull request into a Git repository by running: $ git pull https://github.com/jianqiao/incubator-quickstep fix-hash-resize Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-quickstep/pull/333.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #333 commit d1dbb0d9bc2d1f001deee4039157b0be464870f4 Author: Jianqiao Zhu Date: 2018-02-18T07:16:07Z Fix the hanging problem of SeparateChainingHashTable::resize() ---
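The hang described in PR #333 above can be reproduced in a toy model. Everything in this sketch (`ToyTable`, `reserveBuckets`, the `count_extra` switch, the attempt limit) is hypothetical and only imitates the interaction between pre-allocation and `resize()`; it is not the Quickstep implementation.

```cpp
#include <cassert>
#include <cstddef>

// Toy stand-in for a hash table with a fixed number of buckets.
struct ToyTable {
  std::size_t capacity;
  std::size_t used;

  // Grows the table unless the fullness check says there is still room.
  // With count_extra == false the check ignores the pending request, which
  // models a resize() that aborts when the table is not "actually full".
  // Returns true if the table grew.
  bool resize(const std::size_t extra_buckets, const bool count_extra) {
    assert(used <= capacity);
    const bool full = count_extra ? (used + extra_buckets > capacity)
                                  : (used >= capacity);
    if (!full) {
      return false;  // Abort: the table appears to have room.
    }
    while (used + extra_buckets > capacity) {
      capacity = (capacity == 0) ? 1 : capacity * 2;
    }
    return true;
  }
};

// Tries to reserve 'needed' buckets, asking for a resize whenever they do
// not fit. Returns the attempt number that succeeded, or 0 if the loop made
// no progress within max_attempts (the stand-in for an infinite loop).
std::size_t reserveBuckets(ToyTable *table, const std::size_t needed,
                           const bool count_extra,
                           const std::size_t max_attempts) {
  for (std::size_t attempt = 1; attempt <= max_attempts; ++attempt) {
    if (table->capacity - table->used >= needed) {
      table->used += needed;
      return attempt;
    }
    table->resize(needed, count_extra);
  }
  return 0;
}
```

With a table of 16 buckets, 10 of them used, a request for 10 more never succeeds when the check ignores the request, and succeeds on the second attempt when the extra buckets are counted. The attempt limit plays a role similar to the `retry_num` check mentioned for `LinearOpenAddressingHashTable`.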
[GitHub] incubator-quickstep pull request #332: Small adjustments in star schema cost...
GitHub user jianqiao opened a pull request: https://github.com/apache/incubator-quickstep/pull/332 Small adjustments in star schema cost model for # distinct values estimation This PR has a small adjustment in star schema cost model for # of distinct values estimation, together with a fix to a potential bug with `impliesUniqueAttributes`. The adjustment is likely to improve query plans _when table stats are not present_. It does not affect SSB/TPC-H performance. You can merge this pull request into a Git repository by running: $ git pull https://github.com/jianqiao/incubator-quickstep adjust-cost Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-quickstep/pull/332.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #332 commit 8e94a8e7ef6c99e5c64d1d96bb7283f9f1154116 Author: Jianqiao Zhu Date: 2018-02-07T21:42:15Z Small adjust in star schema cost model for # distinct values ---
[GitHub] incubator-quickstep pull request #331: Add a cmake option to handle the Trav...
GitHub user jianqiao opened a pull request: https://github.com/apache/incubator-quickstep/pull/331 Add a cmake option to handle the Travis CI timeout problem. This PR adds a cmake option `ENABLE_COMPARISON_INLINE_EXPANSION` to allow disabling of method specialization in various `Comparison`'s. Turning the flag `OFF` will greatly reduce Quickstep compile time -- thus improving development productivity as well as fixing the Travis CI timeout problem. Note that the flag is by default `ON`, and will be [turned off](https://github.com/apache/incubator-quickstep/blob/539e1ebe09b5d1a2d86069ed1fdc6e9fb38c5ce7/.travis.yml#L80) during the Travis test. You can merge this pull request into a Git repository by running: $ git pull https://github.com/apache/incubator-quickstep fix-travis-timeout Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-quickstep/pull/331.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #331 commit 539e1ebe09b5d1a2d86069ed1fdc6e9fb38c5ce7 Author: Jianqiao Zhu Date: 2018-02-02T23:27:59Z Add a flag to allow disabling of Comparison inline expansion to enable acceleration of Quickstep build. (for development productivity as well as solving the Travis CI timeout problem) ---
[GitHub] incubator-quickstep issue #329: IDE Documentation fixes
Github user jianqiao commented on the issue: https://github.com/apache/incubator-quickstep/pull/329 LGTM! Merging. ---
[GitHub] incubator-quickstep issue #330: Upgraded benchmark third party library.
Github user jianqiao commented on the issue: https://github.com/apache/incubator-quickstep/pull/330 LGTM! Merging. ---
[GitHub] incubator-quickstep issue #319: Fixed the bug when partition w/ pruned colum...
Github user jianqiao commented on the issue: https://github.com/apache/incubator-quickstep/pull/319 LGTM! Merging. ---
[GitHub] incubator-quickstep issue #327: QUICKSTEP-113 Remove glog source code from t...
Github user jianqiao commented on the issue: https://github.com/apache/incubator-quickstep/pull/327 Merging. ---
[GitHub] incubator-quickstep issue #327: QUICKSTEP-113 Remove glog source code from t...
Github user jianqiao commented on the issue: https://github.com/apache/incubator-quickstep/pull/327 I have tested Xcode, it works! Will merge the PR later. ---
Re: Support for varchar(max)
There is no TEXT type yet, and it would be good to add one. There are also two issues to be improved for VARCHAR:
(A) The current varchar is restricted to fit within a storage block, i.e. 2MB in the default configuration.
(B) For a varchar with a relatively large size (e.g. varchar(8192)), a storage block will be only partially filled and then marked full -- due to a reservation check during bulk insert -- thus wasting storage space.

2017-12-04 17:20 GMT-06:00 Dylan Bacon : > Varchar(MAX)/TEXT is a construct that lets you put in an arbitrary amount > of text into the string with no defined upper limit unlike varchar(#) where > # is the character limit. It's a small technical limitation to a project > I'm working on if we don't have it but it's easy enough to work around, was > mostly curious if we have that support. I'm shoving email bodies into QS > and having arbitrary text would make that more powerful. > > > > On 12/4/17 5:17 PM, Robert Claus wrote: > >> I've used varchar successfully in Quickstep, but I don't know what >> functions are supported. Is there specific functionality you're looking >> for? >> >> Ex. "CREATE TABLE Child (a int, b int, c varchar(20));" >> >> -Robert >> >> On Mon, Dec 4, 2017 at 5:01 PM, Dylan Bacon wrote: >> >> Hello, >>> >>> Does Quickstep currently have support for arbitrary-length BLOB format >>> varchars? Think TEXT or varchar(MAX) from SQL Server. >>> >>> -- >>> Regards, >>> >>> Dylan Bacon >>> University of Wisconsin - Madison >>> Department of Computer Sciences >>> dba...@wisc.edu >>> >>> >>> >
[GitHub] incubator-quickstep issue #326: QUICKSTEP-112 Get the list of referenced bas...
Github user jianqiao commented on the issue: https://github.com/apache/incubator-quickstep/pull/326 LGTM! Merging. ---
Re: Quickstep Network Mode and C++ Sockets
Yes. NetworkCliClient is quite standalone. The dependencies are:
(1) #include
(2) #include "cli/NetworkCli.grpc.pb.h"
(3) #include "cli/NetworkCli.pb.h"
(4) #include "utility/Macros.hpp"

To write your own client:
(A) Set up gRPC so that you can include the header files and link to it.
(B) Grab NetworkCli.proto (https://github.com/apache/incubator-quickstep/blob/master/cli/NetworkCli.proto), and change the package name if necessary (originally quickstep). Then either (B.1) use the grpc/protobuf tools to compile NetworkCli.proto to generate (2) and (3) -- see https://github.com/apache/incubator-quickstep/blob/master/cli/CMakeLists.txt#L53 -- or (B.2) compile quickstep and grab the files from build/cli/
(C) Copy the NetworkCliClient class from QS into your client code.

(A)/(B) may be somewhat annoying to handle, as you need to search through various pieces of documentation ...

Best, Jianqiao

2017-11-30 17:07 GMT-06:00 Dylan Bacon : > So NetworkCliClient should be something I'm able to include in my program > along with the appropriate dependencies and use as the API? I was thinking > about needing to do that but I wasn't sure if that was a standalone API QS > has implemented or a core part of the system. Unless I'm being mistaken and > you're talking about something from gRPC. This is my first time working > with it. > > > > On 11/30/17 4:58 PM, Jianqiao wrote: > >> Hi Dylan, >> >> Currently the network mode is using gRPC, so you probably need to use the >> corresponding API (see >> https://github.com/apache/incubator-quickstep/blob/master/ >> cli/NetworkCliClientMain.cpp#L42 >> as an example). The raw socket connection won't work unless you hack >> gRPC's >> message exchange protocol .. >> >> Best, >> Jianqiao >> >> 2017-11-30 16:49 GMT-06:00 Dylan Bacon : >> >> Hello, >>> >>> I am attempting to interface with Quickstep using its NetworkCliClient >>> and >>> it's not working as I would expect.
I have the default port and IP set to >>> 3000 and 0.0.0.0 and am attempting to send single queries to be processed >>> over in my test harness. From what I could tell of the code when QS is in >>> network mode it accepts a socket connection and string input from that >>> function and processes it in NetworkCliClient.hpp and >>> NetworkCliClientMain.cpp, and yet this is not happening with my test >>> code. >>> The connection is being established but Quickstep does not seem to be >>> doing >>> anything with the queries that come in. >>> >>> Attached is the test code that I am using. test is just a table by that >>> name, I'm selecting a literal from it so the contents shouldn't matter. >>> I've also attempted to create a table with this but Quickstep did not >>> process that. >>> >>> -- >>> Regards, >>> >>> Dylan Bacon >>> University of Wisconsin - Madison >>> Department of Computer Sciences >>> dba...@wisc.edu >>> >>> >>> >
[GitHub] incubator-quickstep pull request #326: QUICKSTEP-112 Get the list of referen...
Github user jianqiao commented on a diff in the pull request: https://github.com/apache/incubator-quickstep/pull/326#discussion_r154230379 --- Diff: query_optimizer/rules/ReferencedBaseRelations.hpp --- @@ -0,0 +1,78 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + **/ + +#ifndef QUICKSTEP_QUERY_OPTIMIZER_RULES_REFERENCED_BASE_RELATIONS_HPP_ +#define QUICKSTEP_QUERY_OPTIMIZER_RULES_REFERENCED_BASE_RELATIONS_HPP_ + +#include +#include + +#include "catalog/CatalogTypedefs.hpp" +#include "query_optimizer/logical/Logical.hpp" +#include "query_optimizer/rules/DFSTraversal.hpp" +#include "utility/Macros.hpp" + +namespace quickstep { + +class CatalogRelation; + +namespace optimizer { + +class OptimizerContext; + +class ReferencedBaseRelations : public DFSTraversal { --- End diff -- Since this class overrides the `apply` method, we can just inherit `Rule`. ---
[GitHub] incubator-quickstep pull request #326: QUICKSTEP-112 Get the list of referen...
Github user jianqiao commented on a diff in the pull request: https://github.com/apache/incubator-quickstep/pull/326#discussion_r154229622 --- Diff: query_optimizer/rules/ReferencedBaseRelations.hpp --- @@ -0,0 +1,78 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + **/ + +#ifndef QUICKSTEP_QUERY_OPTIMIZER_RULES_REFERENCED_BASE_RELATIONS_HPP_ +#define QUICKSTEP_QUERY_OPTIMIZER_RULES_REFERENCED_BASE_RELATIONS_HPP_ + +#include +#include + +#include "catalog/CatalogTypedefs.hpp" +#include "query_optimizer/logical/Logical.hpp" +#include "query_optimizer/rules/DFSTraversal.hpp" +#include "utility/Macros.hpp" + +namespace quickstep { + +class CatalogRelation; + +namespace optimizer { + +class OptimizerContext; + +class ReferencedBaseRelations : public DFSTraversal { + public: + /** + * @brief Constructor + * @param optimizer_context The optimizer context. + */ + explicit ReferencedBaseRelations(OptimizerContext *optimizer_context) + : optimizer_context_(optimizer_context) { + } + + std::string getName() const override { return "ReferencedBaseRelations"; } + + TreeNodePtr apply(const TreeNodePtr &tree) override; + + /** + * @brief Get the base relations referenced in a query. 
+ */ + const std::vector getReferencedBaseRelations() const { --- End diff -- It would be better to remove the leading `const`, as the method returns a temporary object (the result is returned by value). ---
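One concrete effect of a `const`-qualified by-value return, as flagged in the review comment above, is that assignment from the result falls back to a copy instead of a move. The sketch below shows this with a hypothetical `Tracker` type standing in for the returned vector; none of these names come from the Quickstep sources.

```cpp
#include <cassert>

// Hypothetical type that counts how it has been assigned to.
struct Tracker {
  int copies = 0;
  int moves = 0;
  Tracker() = default;
  Tracker(const Tracker&) = default;
  Tracker(Tracker&&) = default;
  Tracker& operator=(const Tracker&) { ++copies; return *this; }
  Tracker& operator=(Tracker&&) noexcept { ++moves; return *this; }
};

// Returns a const temporary: it cannot bind to Tracker&&.
const Tracker makeConst() { return Tracker(); }

// Returns a plain temporary: it binds to Tracker&& and can be moved from.
Tracker makePlain() { return Tracker(); }

// Assigns from each factory and returns the target so that the counters
// can be inspected by the caller.
Tracker assignFromConst() {
  Tracker target;
  target = makeConst();  // Selects the copy assignment operator.
  return target;
}

Tracker assignFromPlain() {
  Tracker target;
  target = makePlain();  // Selects the move assignment operator.
  return target;
}
```

For a container such as `std::vector`, the copy path duplicates every element, whereas the move path only transfers ownership of the buffer.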
Re: Quickstep Network Mode and C++ Sockets
Hi Dylan, Currently the network mode is using gRPC, so you probably need to use the corresponding API (see https://github.com/apache/incubator-quickstep/blob/master/cli/NetworkCliClientMain.cpp#L42 as an example). The raw socket connection won't work unless you hack gRPC's message exchange protocol .. Best, Jianqiao 2017-11-30 16:49 GMT-06:00 Dylan Bacon : > Hello, > > I am attempting to interface with Quickstep using its NetworkCliClient and > it's not working as I would expect. I have the default port and IP set to > 3000 and 0.0.0.0 and am attempting to send single queries to be processed > over in my test harness. From what I could tell of the code when QS is in > network mode it accepts a socket connection and string input from that > function and processes it in NetworkCliClient.hpp and > NetworkCliClientMain.cpp, and yet this is not happening with my test code. > The connection is being established but Quickstep does not seem to be doing > anything with the queries that come in. > > Attached is the test code that I am using. test is just a table by that > name, I'm selecting a literal from it so the contents shouldn't matter. > I've also attempted to create a table with this but Quickstep did not > process that. > > -- > Regards, > > Dylan Bacon > University of Wisconsin - Madison > Department of Computer Sciences > dba...@wisc.edu > >
Re: problem with build quickstep
Hi Song, It seems to be a problem with higher versions of gcc/clang. As a temporary fix, please comment out (or remove) the following two lines in incubator/CMakeLists.txt and see if it works: https://github.com/apache/incubator-quickstep/compare/disable-flags Best, Jianqiao 2017-11-28 18:57 GMT-06:00 Song Zhao : > hi Harshad > > My cmake version is 3.9.6, CMAKE_CXX_COMPILER:FILEPATH=/usr/bin/c++ > > > Thank you, > Song >
[GitHub] incubator-quickstep issue #300: QUICKSTEP-106: Hash-Join-Fuse: Feature added...
Github user jianqiao commented on the issue: https://github.com/apache/incubator-quickstep/pull/300 @zuyu Currently `HashJoinOperator` performance is sensitive to the _number of build blocks per probe block_ due to the concurrency bottleneck within LRU policy enforcer. Consider the situation that the build side relation has `N` blocks and the number of blocks decreases to `M` after applying the predicate, where `N` is very large but `M` is small. Then materializing the filtered build-side relation incurs only a small overhead, but it dramatically reduces the _number of build blocks per probe block_. ---
[GitHub] incubator-quickstep issue #300: QUICKSTEP-106: Hash-Join-Fuse: Feature added...
Github user jianqiao commented on the issue: https://github.com/apache/incubator-quickstep/pull/300 Merging. ---
[GitHub] incubator-quickstep issue #325: DO NOT MERGE: Concurrent queries transaction...
Github user jianqiao commented on the issue: https://github.com/apache/incubator-quickstep/pull/325 The design looks good to me! It would be better to fast forward to subsequent PRs to see the actual usage. ---
[GitHub] incubator-quickstep issue #300: QUICKSTEP-106: Hash-Join-Fuse: Feature added...
Github user jianqiao commented on the issue: https://github.com/apache/incubator-quickstep/pull/300 @dylanpbacon Hi Dylan, I updated the LIP related stuff and put it in this [branch](https://github.com/apache/incubator-quickstep/tree/Hash-Join-Fuse). You may just fetch the changes to reset your repo's `Hash-Join-Fuse` branch, then `git push -f`, and then we can merge this PR. **Note:** The optimization is enabled by default, and I added an extra gflag `fuse-hash-select-threshold` that is set to one million (`100u`) by default. A fusion transformation is applied only when the estimated cardinality of the build-side selection is _greater_ than the threshold. Overall, the fuse-hash-select optimization is especially beneficial when _the build-side selection has large output cardinality_ (e.g. TPC-H Q21), and the benefits come from two aspects: (1) smaller memory footprint, (2) avoiding materialization of the selection's output. However, due to some issues in the current implementation of HashJoinOperator (and the buffer manager), the fusion may slow down some queries (e.g. TPC-H Q02, 400ms -> 700ms with LIP fixed). The `fuse-hash-select-threshold` prevents those situations. ---
[GitHub] incubator-quickstep issue #321: Fix number of work orders generated for inse...
Github user jianqiao commented on the issue: https://github.com/apache/incubator-quickstep/pull/321 LGTM! Merging. ---
[GitHub] incubator-quickstep issue #323: Temporary Build Support for OS X 10.13
Github user jianqiao commented on the issue: https://github.com/apache/incubator-quickstep/pull/323 Merged ---
[GitHub] incubator-quickstep issue #323: Temporary Build Support for OS X 10.13
Github user jianqiao commented on the issue: https://github.com/apache/incubator-quickstep/pull/323 LGTM! Can you squash the two commits into one so that I can merge this PR? (To squash the two commits, go to the repo, run `git rebase -i HEAD~2`, change the second `pick` to `fixup`, then save & exit.) ---
Re: cmake error
Hi Om, As a quick fix you can comment out (by adding '#' in front, or just remove the line) the following two lines in incubator-quickstep/CMakeLists.txt: Line 294 <https://github.com/apache/incubator-quickstep/blob/master/CMakeLists.txt#L294> : set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Werror") Line 588 <https://github.com/apache/incubator-quickstep/blob/master/CMakeLists.txt#L588> : set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-return-type-c-linkage") This should fix the problem for gcc-6.3.. we may add systematic fixes for different compiler versions later. Best, Jianqiao 2017-11-09 18:21 GMT-06:00 Om Jadhav : > Hi, > > It’s gcc version 6.3.0. > > gcc -v > Using built-in specs. > COLLECT_GCC=gcc > COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/6/lto-wrapper > Target: x86_64-linux-gnu > Configured with: ../src/configure -v --with-pkgversion='Ubuntu/Linaro > 6.3.0-18ubuntu2~16.04' --with-bugurl=file:///usr/share/doc/gcc-6/README.Bugs > --enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr > --program-suffix=-6 --program-prefix=x86_64-linux-gnu- --enable-shared > --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext > --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ > --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes > --with-default-libstdcxx-abi=new --enable-gnu-unique-object > --disable-vtable-verify --enable-libmpx --enable-plugin --with-system-zlib > --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo > --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-6-amd64/jre > --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-6-amd64 > --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-6-amd64 > --with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar > --with-target-system-zlib --enable-objc-gc=auto --enable-multiarch > --disable-werror --with-arch-32=i686 --with-abi=m64 > --with-multilib-list=m32,m64,mx32 --enable-multilib 
--with-tune=generic > --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu > --target=x86_64-linux-gnu > Thread model: posix > gcc version 6.3.0 20170519 (Ubuntu/Linaro 6.3.0-18ubuntu2~16.04) > > > On 09/11/17, 5:04 PM, "Jianqiao" wrote: > > It seems to be a problem related to C++ compiler version. Can you > check its > version by using command: > gcc -v > > The fix should be a few lines of changes in root directory's/glog's > CMakeLists.txt. > > Best, > Jianqiao > > > 2017-11-08 15:30 GMT-06:00 Om Jadhav : > > > Hi Jianqiao, > > > > Please find the make error below: > > > > [ 7%] Completed 'libtcmalloc_ext' > > [ 7%] Built target libtcmalloc_ext > > [ 7%] Building CXX object third_party/googletest/ > > googletest/CMakeFiles/gtest.dir/src/gtest-all.cc.o > > [ 7%] Linking CXX static library libgtest.a > > [ 7%] Built target gtest > > [ 7%] Building CXX object third_party/gflags/CMakeFiles/ > > gflags_nothreads-static.dir/src/gflags.cc.o > > /home/omjadhav/quickstep/third_party/src/gflags/src/gflags.cc:443:5: > > error: ‘int google::{anonymous}::FlagValue::ValueSize() const’ > defined > > but not used [-Werror=unused-function] > > int FlagValue::ValueSize() const { > > ^ > > cc1plus: error: unrecognized command line option > > ‘-Wno-return-type-c-linkage’ [-Werror] > > cc1plus: all warnings being treated as errors > > third_party/gflags/CMakeFiles/gflags_nothreads-static.dir/ > build.make:62: > > recipe for target 'third_party/gflags/CMakeFiles/gflags_nothreads- > static.dir/src/gflags.cc.o' > > failed > > make[2]: *** [third_party/gflags/CMakeFiles/gflags_nothreads- > static.dir/src/gflags.cc.o] > > Error 1 > > CMakeFiles/Makefile2:939: recipe for target 'third_party/gflags/ > > CMakeFiles/gflags_nothreads-static.dir/all' failed > > make[1]: *** [third_party/gflags/CMakeFiles/gflags_nothreads- > static.dir/all] > > Error 2 > > Makefile:138: recipe for target 'all' failed > > make: *** [all] Error 2 > > > > > > Thanks > > Om > > > > On 06/11/17, 
3:45 PM, "Jianqiao" wrote: > > > > Hi Om, > > > > It seems that your "cmake" output is okay. Can you also provide > the > > "make" > > error message? > > > >
Re: cmake error
It seems to be a problem related to C++ compiler version. Can you check its version by using command: gcc -v The fix should be a few lines of changes in root directory's/glog's CMakeLists.txt. Best, Jianqiao 2017-11-08 15:30 GMT-06:00 Om Jadhav : > Hi Jianqiao, > > Please find the make error below: > > [ 7%] Completed 'libtcmalloc_ext' > [ 7%] Built target libtcmalloc_ext > [ 7%] Building CXX object third_party/googletest/ > googletest/CMakeFiles/gtest.dir/src/gtest-all.cc.o > [ 7%] Linking CXX static library libgtest.a > [ 7%] Built target gtest > [ 7%] Building CXX object third_party/gflags/CMakeFiles/ > gflags_nothreads-static.dir/src/gflags.cc.o > /home/omjadhav/quickstep/third_party/src/gflags/src/gflags.cc:443:5: > error: ‘int google::{anonymous}::FlagValue::ValueSize() const’ defined > but not used [-Werror=unused-function] > int FlagValue::ValueSize() const { > ^ > cc1plus: error: unrecognized command line option > ‘-Wno-return-type-c-linkage’ [-Werror] > cc1plus: all warnings being treated as errors > third_party/gflags/CMakeFiles/gflags_nothreads-static.dir/build.make:62: > recipe for target > 'third_party/gflags/CMakeFiles/gflags_nothreads-static.dir/src/gflags.cc.o' > failed > make[2]: *** > [third_party/gflags/CMakeFiles/gflags_nothreads-static.dir/src/gflags.cc.o] > Error 1 > CMakeFiles/Makefile2:939: recipe for target 'third_party/gflags/ > CMakeFiles/gflags_nothreads-static.dir/all' failed > make[1]: *** [third_party/gflags/CMakeFiles/gflags_nothreads-static.dir/all] > Error 2 > Makefile:138: recipe for target 'all' failed > make: *** [all] Error 2 > > > Thanks > Om > > On 06/11/17, 3:45 PM, "Jianqiao" wrote: > > Hi Om, > > It seems that your "cmake" output is okay. Can you also provide the > "make" > error message? > > Best, > Jianqiao > > 2017-11-06 11:34 GMT-06:00 Harshad Deshmukh : > > > Hi Om, > > > > What's your build setup? Did you download the prerequisites and > > initialized the git submodules? 
> > > > Get Outlook for Android<https://aka.ms/ghei36> > > > > > > From: Om Jadhav > > Sent: Friday, November 3, 2017 3:42:05 PM > > To: dev@quickstep.incubator.apache.org > > Subject: cmake error > > > > Hello, > > > > I am trying to cmake, and I am getting most of the things failed for > the > > first time. And also the build is failing after this cmake. > > > > o/p: > > > > Vector copy elision level set to: single-relation selection > > -- git Version: v0.0.0 > > -- Version: 0.0.0 > > -- Performing Test HAVE_STD_REGEX > > -- Performing Test HAVE_STD_REGEX -- success > > -- Performing Test HAVE_GNU_POSIX_REGEX > > -- Performing Test HAVE_GNU_POSIX_REGEX -- failed to compile > > -- Performing Test HAVE_POSIX_REGEX > > -- Performing Test HAVE_POSIX_REGEX -- success > > -- Performing Test HAVE_STEADY_CLOCK > > -- Performing Test HAVE_STEADY_CLOCK -- success > > -- Checking program counter fetch from ucontext_t member: > > uc_mcontext.gregs[REG_EIP] > > -- Performing Test PC_FROM_UCONTEXT_COMPILES > > -- Performing Test PC_FROM_UCONTEXT_COMPILES - Failed > > -- Checking program counter fetch from ucontext_t member: > > uc_mcontext.gregs[REG_RIP] > > -- Performing Test PC_FROM_UCONTEXT_COMPILES > > -- Performing Test PC_FROM_UCONTEXT_COMPILES - Failed > > -- Checking program counter fetch from ucontext_t member: > uc_mcontext.sc_ip > > -- Performing Test PC_FROM_UCONTEXT_COMPILES > > -- Performing Test PC_FROM_UCONTEXT_COMPILES - Failed > > -- Checking program counter fetch from ucontext_t member: > > uc_mcontext.uc_regs->gregs[PT_NIP] > > -- Performing Test PC_FROM_UCONTEXT_COMPILES > > -- Performing Test PC_FROM_UCONTEXT_COMPILES - Failed > > -- Checking program counter fetch from ucontext_t member: > > uc_mcontext.gregs[R15] > > -- Performing Test PC_FROM_UCONTEXT_COMPILES > > -- Performing Test PC_FROM_UCONTEXT_COMPILES - Failed > > -- Checking program counter fetch from ucontext_t member: > > uc_mcontext.arm_pc > > -- Performing Test PC_FROM_UCONTEXT_COMPILES > > 
-- Performing Test PC_FROM_UCONTEXT_COMPILES - Failed > > -- Checking program counter fetch from ucontext_t member: > > uc_mcontext.mc_eip
Re: cmake error
Hi Om, It seems that your "cmake" output is okay. Can you also provide the "make" error message? Best, Jianqiao 2017-11-06 11:34 GMT-06:00 Harshad Deshmukh : > Hi Om, > > What's your build setup? Did you download the prerequisites and > initialized the git submodules? > > Get Outlook for Android<https://aka.ms/ghei36> > > > From: Om Jadhav > Sent: Friday, November 3, 2017 3:42:05 PM > To: dev@quickstep.incubator.apache.org > Subject: cmake error > > Hello, > > I am trying to cmake, and I am getting most of the things failed for the > first time. And also the build is failing after this cmake. > > o/p: > > Vector copy elision level set to: single-relation selection > -- git Version: v0.0.0 > -- Version: 0.0.0 > -- Performing Test HAVE_STD_REGEX > -- Performing Test HAVE_STD_REGEX -- success > -- Performing Test HAVE_GNU_POSIX_REGEX > -- Performing Test HAVE_GNU_POSIX_REGEX -- failed to compile > -- Performing Test HAVE_POSIX_REGEX > -- Performing Test HAVE_POSIX_REGEX -- success > -- Performing Test HAVE_STEADY_CLOCK > -- Performing Test HAVE_STEADY_CLOCK -- success > -- Checking program counter fetch from ucontext_t member: > uc_mcontext.gregs[REG_EIP] > -- Performing Test PC_FROM_UCONTEXT_COMPILES > -- Performing Test PC_FROM_UCONTEXT_COMPILES - Failed > -- Checking program counter fetch from ucontext_t member: > uc_mcontext.gregs[REG_RIP] > -- Performing Test PC_FROM_UCONTEXT_COMPILES > -- Performing Test PC_FROM_UCONTEXT_COMPILES - Failed > -- Checking program counter fetch from ucontext_t member: uc_mcontext.sc_ip > -- Performing Test PC_FROM_UCONTEXT_COMPILES > -- Performing Test PC_FROM_UCONTEXT_COMPILES - Failed > -- Checking program counter fetch from ucontext_t member: > uc_mcontext.uc_regs->gregs[PT_NIP] > -- Performing Test PC_FROM_UCONTEXT_COMPILES > -- Performing Test PC_FROM_UCONTEXT_COMPILES - Failed > -- Checking program counter fetch from ucontext_t member: > uc_mcontext.gregs[R15] > -- Performing Test PC_FROM_UCONTEXT_COMPILES > -- Performing Test 
PC_FROM_UCONTEXT_COMPILES - Failed > -- Checking program counter fetch from ucontext_t member: > uc_mcontext.arm_pc > -- Performing Test PC_FROM_UCONTEXT_COMPILES > -- Performing Test PC_FROM_UCONTEXT_COMPILES - Failed > -- Checking program counter fetch from ucontext_t member: > uc_mcontext.mc_eip > -- Performing Test PC_FROM_UCONTEXT_COMPILES > -- Performing Test PC_FROM_UCONTEXT_COMPILES - Failed > -- Checking program counter fetch from ucontext_t member: > uc_mcontext.mc_rip > -- Performing Test PC_FROM_UCONTEXT_COMPILES > -- Performing Test PC_FROM_UCONTEXT_COMPILES - Failed > -- Checking program counter fetch from ucontext_t member: > uc_mcontext.__gregs[_REG_EIP] > -- Performing Test PC_FROM_UCONTEXT_COMPILES > -- Performing Test PC_FROM_UCONTEXT_COMPILES - Failed > -- Checking program counter fetch from ucontext_t member: > uc_mcontext.__gregs[_REG_RIP] > -- Performing Test PC_FROM_UCONTEXT_COMPILES > -- Performing Test PC_FROM_UCONTEXT_COMPILES - Failed > -- Checking program counter fetch from ucontext_t member: > uc_mcontext->ss.eip > -- Performing Test PC_FROM_UCONTEXT_COMPILES > -- Performing Test PC_FROM_UCONTEXT_COMPILES - Failed > -- Checking program counter fetch from ucontext_t member: > uc_mcontext->__ss.__eip > -- Performing Test PC_FROM_UCONTEXT_COMPILES > -- Performing Test PC_FROM_UCONTEXT_COMPILES - Failed > -- Checking program counter fetch from ucontext_t member: > uc_mcontext->ss.rip > -- Performing Test PC_FROM_UCONTEXT_COMPILES > -- Performing Test PC_FROM_UCONTEXT_COMPILES - Failed > -- Checking program counter fetch from ucontext_t member: > uc_mcontext->__ss.__rip > -- Performing Test PC_FROM_UCONTEXT_COMPILES > -- Performing Test PC_FROM_UCONTEXT_COMPILES - Failed > -- Checking program counter fetch from ucontext_t member: > uc_mcontext->ss.srr0 > -- Performing Test PC_FROM_UCONTEXT_COMPILES > -- Performing Test PC_FROM_UCONTEXT_COMPILES - Failed > -- Checking program counter fetch from ucontext_t member: > uc_mcontext->__ss.__srr0 > 
-- Performing Test PC_FROM_UCONTEXT_COMPILES > -- Performing Test PC_FROM_UCONTEXT_COMPILES - Failed > CMake Warning at third_party/src/glog/CMakeLists.txt:185 (message): > Unable to find program counter field in ucontext_t. GLOG signal handler > will not be able to report precise PC position. > > > You appear to be building on a Linux system with HugeTLB support. To take > advantage of this feature, you will need to configure kernel support for > hugepages by setting /proc/sys/vm/nr_hugepages and/or > /proc/sys/vm/nr_overcommit_hugepages as well as running quickstep > executables
[GitHub] incubator-quickstep issue #320: Support Multiple Tuple Inserts
Github user jianqiao commented on the issue: https://github.com/apache/incubator-quickstep/pull/320 The stages before `ExecutionGenerator` may be fine as they are; I think the fix is to revise `ExecutionGenerator::convertInsertTuple()` and make some modifications inside `InsertOperator`. ---
[GitHub] incubator-quickstep issue #316: Support Multiple Tuple Inserts
Github user jianqiao commented on the issue: https://github.com/apache/incubator-quickstep/pull/316 LGTM! Will merge after travis-ci's tests. ---
[GitHub] incubator-quickstep issue #314: Added Vector Aggregation support in the dist...
Github user jianqiao commented on the issue: https://github.com/apache/incubator-quickstep/pull/314 LGTM! Merging. ---
[GitHub] incubator-quickstep pull request #315: [DO NOT MERGE] Refactor type system t...
GitHub user jianqiao opened a pull request: https://github.com/apache/incubator-quickstep/pull/315

[DO NOT MERGE] Refactor type system to provide better extensibility of types and functions

This is a preliminary PR that is not ready to be merged but provides an overall view of the type system refactoring work. Many constructs are at their initial designs and may be further improved. The PR aims at reviewing the refactoring designs at the "architecture" level. Detailed code style and unit test issues may be addressed later in subsequent concrete PRs.

The overall purpose of the refactoring is to improve the extensibility of the existing type/function system (i.e. support more kinds of types/functions and make it easier to add new types and functions), while retaining the performance of the current system.

### Major Changes

Part I. Type System
---

# 1. Categorize all types into four [_memory layouts_](https://github.com/apache/incubator-quickstep/blob/refactor-type/types/TypeID.hpp#L64).

The four memory layouts are:
* __CxxInlinePod__ (C++ plain old data)
* __ParInlinePod__ (Parameterized inline plain old data)
* __ParOutOfLinePod__ (Parameterized out-of-line plain old data)
* __CxxGeneric__ (C++ generic types)

Memory layout decides how the corresponding type's values are stored and represented. Briefly speaking,
* _CxxInlinePod_ corresponds to C++ primitive types or POD structs.
  * E.g. _int_, _double_, _struct { double x; double y; }_.
  * The size of a CxxInlinePod value is known at C++ compile time (e.g. _double_ has size 8, _struct { double x; double y; }_ has size 16).
* _ParInlinePod_ corresponds to database defined "fixed length" types.
  * E.g. _Char(8)_, _Char(20)_.
  * The size of such a type's values is not known at C++ compile time. Instead, the type is parameterized by an unsigned integer, where the parameter's value is known at SQL query compile time (which is C++ run-time).
* _ParOutOfLinePod_ corresponds to database defined "variable length" types.
  * E.g. _Varchar(20)_.
  * The size of such a type's values is not known until SQL query run-time.
* _CxxGeneric_ corresponds to C++ general types (i.e. any C++ type).
  * E.g. _std::set<int>_, _std::vector<const Type*>_.
  * Such types have to implement serialization/deserialization methods to have storage support.

---

# 2. Use [_TypeIDTrait_](https://github.com/apache/incubator-quickstep/blob/refactor-type/types/TypeRegistrar.hpp#L59) to make much of a type's information available at compile time.

With this per-type trait information, we can avoid much boilerplate code for each subclass of _Type_ by using template techniques and specializing on the memory layout. See [_TypeSynthesizer_](https://github.com/apache/incubator-quickstep/blob/refactor-type/types/TypeSynthesizer.hpp) and [_TypeFactory_](https://github.com/apache/incubator-quickstep/blob/refactor-type/types/TypeFactory.cpp#L69).

_TypeIDTrait_ is also extensively used in many other places as it provides all the required compile-time information about a type.

---

# 3. Support more types.

Details will be written later about how to add a new type into the Quickstep system. The current PR has some example types added:
* The [_Bool_](https://github.com/apache/incubator-quickstep/blob/refactor-type/types/BoolType.hpp) type. It will be used later for connecting scalar functions and predicates.
* The [_Text_](https://github.com/apache/incubator-quickstep/blob/refactor-type/types/TextType.hpp) type. A general non-parameterized string type.
  * __TODO:__ We need some updates in the storage block module (potentially also other places) to handle the "infinite maximum byte size" types.
* The [_MetaType_](https://github.com/apache/incubator-quickstep/blob/refactor-type/types/MetaType-decl.hpp) type. It is the "type of type", i.e. a value of _MetaType_ has C++ type _const Type*_.
* The [_Array_](https://github.com/apache/incubator-quickstep/blob/refactor-type/types/ArrayType.hpp) type. A generic type that represents an array. This type takes a MetaType value as parameter, where the parameter specifies the array's element type.
  * __TODO__: We need specialized array types such as _IntArray_ and _TextArray_ for performance reasons.

---

# 4. Improve the type casting mechanism.

Type casting (coercion) is an important feature that is needed in practice from time to time. This PR's design defines an overall [template](https://github.com/apache/incubator-quickstep/blob/refactor-type/types/operations/unary_operations/CastFunctorOverloads.hpp#L41)
```
template struct CastFunctor;
```
which is then
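
The four memory layouts listed under item 1 of the PR description can be summarized in a small C++ sketch. This is only an illustration of the categories as described in the text: the names `MemoryLayout`, `SizeKnownAt`, `WhenSizeIsKnown` and `Point` are assumptions made here and are not taken from the PR's `TypeID.hpp`. The description does not say when the size of a _CxxGeneric_ value is known, so the sketch marks that case as not stated.

```cpp
#include <type_traits>

// Illustrative names only; this does not reproduce the PR's TypeID.hpp.
enum class MemoryLayout {
  kCxxInlinePod,     // C++ primitive types or POD structs
  kParInlinePod,     // database "fixed length" types, e.g. Char(8)
  kParOutOfLinePod,  // database "variable length" types, e.g. Varchar(20)
  kCxxGeneric        // arbitrary C++ types, e.g. std::set<int>
};

// The point at which the byte size of a value becomes known.
enum class SizeKnownAt {
  kCxxCompileTime,
  kQueryCompileTime,  // SQL query compile time, i.e. C++ run-time
  kQueryRunTime,
  kNotStated          // the description gives no rule for CxxGeneric
};

// Maps each layout to the moment its value size is known, per the description.
constexpr SizeKnownAt WhenSizeIsKnown(const MemoryLayout layout) {
  return layout == MemoryLayout::kCxxInlinePod      ? SizeKnownAt::kCxxCompileTime
         : layout == MemoryLayout::kParInlinePod    ? SizeKnownAt::kQueryCompileTime
         : layout == MemoryLayout::kParOutOfLinePod ? SizeKnownAt::kQueryRunTime
                                                    : SizeKnownAt::kNotStated;
}

// A POD struct of the kind the description files under CxxInlinePod.
struct Point {
  double x;
  double y;
};

static_assert(std::is_trivially_copyable<Point>::value &&
                  std::is_standard_layout<Point>::value,
              "Point is expected to be a plain-old-data struct");
```

Because the mapping is `constexpr`, code that specializes on the layout could in principle branch on it at C++ compile time, which is the kind of use the description attributes to _TypeIDTrait_.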
[GitHub] incubator-quickstep issue #304: Added a new set API for TupleIdSequence.
Github user jianqiao commented on the issue: https://github.com/apache/incubator-quickstep/pull/304 LGTM! Merging. ---
[GitHub] incubator-quickstep pull request #299: QUICKSTEP-104 Fix the problem that Li...
Github user jianqiao commented on a diff in the pull request: https://github.com/apache/incubator-quickstep/pull/299#discussion_r140061423 --- Diff: cli/LineReader.cpp --- @@ -171,7 +173,7 @@ std::string LineReader::getNextCommand() { case '.': case '\\': // Fall Through. // If the dot or forward slash begins the line, begin a command search. - if (scan_position == 0) { + if (special_char_location == multiline_buffer.find_first_not_of(" \t\r\n")) { --- End diff -- Yes it should cover all the possible situations. ---
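
The condition in the diff above compares the position of the special character with the first non-whitespace position in the buffer, instead of with position 0. A minimal standalone sketch of that idea follows; `StartsWithCommandPrefix` is a hypothetical helper written for illustration and is not part of `LineReader`.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not Quickstep code: reports whether the first
// non-whitespace character of `buffer` is a command prefix ('.' or '\').
bool StartsWithCommandPrefix(const std::string &buffer) {
  // Skip the same whitespace set the diff uses: space, tab, CR, LF.
  const std::size_t first = buffer.find_first_not_of(" \t\r\n");
  if (first == std::string::npos) {
    return false;  // Empty or whitespace-only input holds no command.
  }
  return buffer[first] == '.' || buffer[first] == '\\';
}
```

With the old check (`scan_position == 0`), an input such as two spaces followed by `\d` would not count as a command; under the sketch above it does, while a backslash that appears after other text still does not.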
[GitHub] incubator-quickstep pull request #299: QUICKSTEP-104 Fix the problem that Li...
Github user jianqiao commented on a diff in the pull request: https://github.com/apache/incubator-quickstep/pull/299#discussion_r140060711 --- Diff: cli/tests/command_executor/D.test --- @@ -69,6 +71,7 @@ INSERT INTO foo3 values(5, 1, 1.0, 1.0, 'XYZZ'); col4 | Float col5 | Char(5) == + --- End diff -- Probably no. The test input does not go through `LineReader`. ---
[GitHub] incubator-quickstep pull request #299: QUICKSTEP-104 Fix the problem that Li...
GitHub user jianqiao opened a pull request: https://github.com/apache/incubator-quickstep/pull/299 QUICKSTEP-104 Fix the problem that LineReader cannot recognize a command if there are whitespaces before it. This PR fixes a bug where the Quickstep REPL fails to recognize a command (e.g. `\d`, `\analyze`) when whitespace or empty lines precede it. You can merge this pull request into a Git repository by running: $ git pull https://github.com/jianqiao/incubator-quickstep fix-extract-command Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-quickstep/pull/299.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #299 commit 77960a42dcfb3d27de5601548a04d81a6be79375 Author: Jianqiao Zhu Date: 2017-09-20T03:02:02Z Fix a bug in LineReader for recognizing command ---
[GitHub] incubator-quickstep issue #298: Prune columns after partition rule.
Github user jianqiao commented on the issue: https://github.com/apache/incubator-quickstep/pull/298 LGTM. Merging. ---
[GitHub] incubator-quickstep issue #297: Fixed a bug in partitioned NLJ.
Github user jianqiao commented on the issue: https://github.com/apache/incubator-quickstep/pull/297 LGTM. Merging. ---
[GitHub] incubator-quickstep issue #271: QUICKSTEP-95: Fixed the exception due to zer...
Github user jianqiao commented on the issue: https://github.com/apache/incubator-quickstep/pull/271 LGTM. Merging. ---
[GitHub] incubator-quickstep issue #296: QUICKSTEP-78: Displayed Partition Info using...
Github user jianqiao commented on the issue: https://github.com/apache/incubator-quickstep/pull/296 LGTM. Merging. ---
[GitHub] incubator-quickstep issue #293: Added Partition rules for Sort.
Github user jianqiao commented on the issue: https://github.com/apache/incubator-quickstep/pull/293 LGTM. Merging. ---
[GitHub] incubator-quickstep pull request #292: Redirect stdout and stderr in network...
GitHub user jianqiao opened a pull request: https://github.com/apache/incubator-quickstep/pull/292 Redirect stdout and stderr in network mode. This PR redirects `stdout / stderr` to `io_handle->out() / io_handle->err()` (see [here](https://github.com/apache/incubator-quickstep/blob/fb9f856a78d947647406f661f3f3291e294bd266/cli/QuickstepCli.cpp#L309)) for transmitting the standard stream outputs to client in the network mode. It allows `quickstep_client` to use `COPY ... TO stdout ...` command to retrieve result table in CSV format. You can merge this pull request into a Git repository by running: $ git pull https://github.com/apache/incubator-quickstep redirect-stream Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-quickstep/pull/292.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #292 commit fb9f856a78d947647406f661f3f3291e294bd266 Author: Jianqiao Zhu Date: 2017-09-01T17:07:13Z Redirect stdout and stderr in network mode. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-quickstep pull request #291: Add "COPY TO" operator for exporting ...
Github user jianqiao commented on a diff in the pull request: https://github.com/apache/incubator-quickstep/pull/291#discussion_r136199350 --- Diff: relational_operators/TextScanOperator.cpp --- @@ -82,14 +81,19 @@ static bool ValidateTextScanTextSegmentSize(const char *flagname, return true; } -static const volatile bool text_scan_text_segment_size_dummy = gflags::RegisterFlagValidator( -&FLAGS_textscan_text_segment_size, &ValidateTextScanTextSegmentSize); +static const volatile bool text_scan_text_segment_size_dummy = +gflags::RegisterFlagValidator( +&FLAGS_textscan_text_segment_size, &ValidateTextScanTextSegmentSize); namespace { -size_t getFileSize(const string &file_name) { +static std::size_t GetFileSize(const std::string &file_name) { --- End diff -- Updated. ---
[GitHub] incubator-quickstep pull request #291: Add "COPY TO" operator for exporting ...
Github user jianqiao commented on a diff in the pull request: https://github.com/apache/incubator-quickstep/pull/291#discussion_r136198329 --- Diff: query_optimizer/resolver/Resolver.cpp --- @@ -143,6 +145,45 @@ namespace E = ::quickstep::optimizer::expressions; namespace L = ::quickstep::optimizer::logical; namespace S = ::quickstep::serialization; +namespace { + +static attribute_id GetAttributeIdFromName( --- End diff -- Updated. ---
[GitHub] incubator-quickstep pull request #291: Add "COPY TO" operator for exporting ...
Github user jianqiao commented on a diff in the pull request: https://github.com/apache/incubator-quickstep/pull/291#discussion_r136197963 --- Diff: query_optimizer/logical/LogicalType.hpp --- @@ -34,6 +34,7 @@ namespace logical { enum class LogicalType { kAggregate, --- End diff -- Updated. ---
[GitHub] incubator-quickstep pull request #291: Add "COPY TO" operator for exporting ...
Github user jianqiao commented on a diff in the pull request: https://github.com/apache/incubator-quickstep/pull/291#discussion_r136197848 --- Diff: query_optimizer/logical/CopyTo.hpp --- @@ -0,0 +1,141 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + **/ + +#ifndef QUICKSTEP_QUERY_OPTIMIZER_LOGICAL_COPY_TO_HPP_ +#define QUICKSTEP_QUERY_OPTIMIZER_LOGICAL_COPY_TO_HPP_ + +#include +#include +#include + +#include "query_optimizer/OptimizerTree.hpp" +#include "query_optimizer/expressions/AttributeReference.hpp" +#include "query_optimizer/logical/Logical.hpp" +#include "query_optimizer/logical/LogicalType.hpp" +#include "utility/BulkIOConfiguration.hpp" +#include "utility/Macros.hpp" + +#include "glog/logging.h" + +namespace quickstep { +namespace optimizer { +namespace logical { + +/** \addtogroup OptimizerLogical + * @{ + */ + +class CopyTo; +typedef std::shared_ptr CopyToPtr; + +/** + * @brief Represents an operation that copies data from a relation to a text file. 
+ */ +class CopyTo : public Logical { + public: + LogicalType getLogicalType() const override { +return LogicalType::kCopyTo; + } + + std::string getName() const override { +return "CopyTo"; + } + + /** + * @return The input relation whose data is to be exported. + */ + const LogicalPtr& input() const { +return input_; + } + + /** + * @return The name of the file to write the data to. + */ + const std::string& file_name() const { +return file_name_; + } + + /** + * @return The options for this COPY TO statement. + */ + BulkIOConfigurationPtr options() const { --- End diff -- Updated. ---
[GitHub] incubator-quickstep pull request #291: Add "COPY TO" operator for exporting ...
Github user jianqiao commented on a diff in the pull request: https://github.com/apache/incubator-quickstep/pull/291#discussion_r136180815 --- Diff: query_optimizer/logical/CopyFrom.hpp --- @@ -66,20 +67,14 @@ class CopyFrom : public Logical { const std::string& file_name() const { return file_name_; } /** - * @return The delimiter used in the text file to separate columns. + * @return The options for this COPY FROM statement. */ - const char column_delimiter() const { return column_delimiter_; } - - /** - * @return Whether to decode escape sequences in the text file. - */ - bool escape_strings() const { return escape_strings_; } + BulkIOConfigurationPtr options() const { return options_; } --- End diff -- Updated. ---
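
The diff above replaces the separate `column_delimiter()` and `escape_strings()` accessors with a single options object. A rough standalone sketch of that pattern is shown here, with validation that mirrors the error messages in this PR's Resolver changes. `CopyOptions` and `ParseCopyOptions` are hypothetical names, not the PR's `BulkIOConfiguration` API; the sketch assumes keys are already lower-cased and that boolean values arrive as the strings "true" or "false".

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for an options object; field names are assumptions.
struct CopyOptions {
  char delimiter = '\t';       // default column delimiter from the old code
  bool escape_strings = true;  // default from the old code
};

// Builds a CopyOptions from key/value pairs, rejecting invalid settings.
CopyOptions ParseCopyOptions(const std::map<std::string, std::string> &params) {
  CopyOptions options;
  for (const auto &param : params) {
    const std::string &key = param.first;
    const std::string &value = param.second;
    if (key == "delimiter") {
      if (value.size() != 1) {
        throw std::invalid_argument("DELIMITER is not a single character");
      }
      options.delimiter = value.front();
    } else if (key == "escape_strings") {
      if (value != "true" && value != "false") {
        throw std::invalid_argument("ESCAPE_STRINGS is not a boolean");
      }
      options.escape_strings = (value == "true");
    } else if (key != "format") {
      // "format" is assumed to be handled separately, so it is skipped here.
      throw std::invalid_argument("Unsupported copy option: " + key);
    }
  }
  return options;
}
```

Bundling the settings this way means a new option only touches the options type and its parser, rather than every constructor and accessor that previously carried one argument per setting.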
[GitHub] incubator-quickstep pull request #291: Add "COPY TO" operator for exporting ...
Github user jianqiao commented on a diff in the pull request: https://github.com/apache/incubator-quickstep/pull/291#discussion_r136178849 --- Diff: parser/ParseStatement.hpp --- @@ -771,122 +776,135 @@ class ParseStatementInsertSelection : public ParseStatementInsert { DISALLOW_COPY_AND_ASSIGN(ParseStatementInsertSelection); }; + /** - * @brief Optional parameters for a COPY FROM statement. + * @brief The parsed representation of a COPY FROM/COPY TO statement. **/ -struct ParseCopyFromParams : public ParseTreeNode { - /** - * @brief Constructor, sets default values. - **/ - ParseCopyFromParams(const int line_number, const int column_number) - : ParseTreeNode(line_number, column_number), -escape_strings(true) { - } - - std::string getName() const override { return "CopyFromParams"; } - +class ParseStatementCopy : public ParseStatement { + public: /** - * @brief Sets the column delimiter. - * - * @param delimiter_in The column delimiter string. + * @brief Copy direction (FROM text file/TO text file). */ - void set_delimiter(ParseString* delimiter_in) { -delimiter.reset(delimiter_in); - } - - /** - * @brief The string which terminates individual attribute values in the - *input file. Can be NULL. - **/ - std::unique_ptr delimiter; + enum CopyDirection { +kFrom, --- End diff -- Updated. ---
[GitHub] incubator-quickstep pull request #291: Add "COPY TO" operator for exporting ...
Github user jianqiao commented on a diff in the pull request: https://github.com/apache/incubator-quickstep/pull/291#discussion_r136198737 --- Diff: query_optimizer/resolver/Resolver.cpp --- @@ -418,27 +455,157 @@ L::LogicalPtr Resolver::resolve(const ParseStatement &parse_query) { } L::LogicalPtr Resolver::resolveCopyFrom( -const ParseStatementCopyFrom &copy_from_statement) { - // Default parameters. - std::string column_delimiter_ = "\t"; - bool escape_strings_ = true; +const ParseStatementCopy &copy_from_statement) { + DCHECK(copy_from_statement.getCopyDirection() == ParseStatementCopy::kFrom); + const PtrList<ParseKeyValue> *params = copy_from_statement.params(); - const ParseCopyFromParams *params = copy_from_statement.params(); + BulkIOFormat file_format = BulkIOFormat::kText; if (params != nullptr) { -if (params->delimiter != nullptr) { - column_delimiter_ = params->delimiter->value(); - if (column_delimiter_.size() != 1) { -THROW_SQL_ERROR_AT(params->delimiter) -<< "DELIMITER is not a single character"; +for (const ParseKeyValue &param : *params) { + const std::string &key = ToLower(param.key()->value()); + if (key == "format") { +const ParseString *parse_format = GetKeyValueString(param); +const std::string format = ToLower(parse_format->value()); +// TODO(jianqiao): Support other bulk load formats such as CSV. +if (format != "text") { + THROW_SQL_ERROR_AT(parse_format) << "Unsupported file format: " << format; +} +// Update file_format when other formats get supported. 
+break; + } +} + } + + std::unique_ptr<BulkIOConfiguration> options = + std::make_unique<BulkIOConfiguration>(file_format); + if (params != nullptr) { +for (const ParseKeyValue &param : *params) { + const std::string &key = ToLower(param.key()->value()); + if (key == "delimiter") { +const ParseString *parse_delimiter = GetKeyValueString(param); +const std::string &delimiter = parse_delimiter->value(); +if (delimiter.size() != 1) { + THROW_SQL_ERROR_AT(parse_delimiter) + << "DELIMITER is not a single character"; +} +options->setDelimiter(delimiter.front()); + } else if (key == "escape_strings") { +options->setEscapeStrings(GetKeyValueBool(param)); + } else if (key != "format") { +THROW_SQL_ERROR_AT(&param) << "Unsupported copy option: " << key; } } -escape_strings_ = params->escape_strings; } return L::CopyFrom::Create(resolveRelationName(copy_from_statement.relation_name()), - copy_from_statement.source_filename()->value(), - column_delimiter_[0], - escape_strings_); + copy_from_statement.file_name()->value(), + BulkIOConfigurationPtr(options.release())); +} + +L::LogicalPtr Resolver::resolveCopyTo( +const ParseStatementCopy &copy_to_statement) { + DCHECK(copy_to_statement.getCopyDirection() == ParseStatementCopy::kTo); + const PtrList<ParseKeyValue> *params = copy_to_statement.params(); + + // Check if copy format is explicitly specified. 
+ BulkIOFormat file_format = BulkIOFormat::kText; + bool format_specified = false; + if (params != nullptr) { +for (const ParseKeyValue &param : *params) { + const std::string &key = ToLower(param.key()->value()); + if (key == "format") { +const ParseString *parse_format = GetKeyValueString(param); +const std::string format = ToLower(parse_format->value()); +if (format == "csv") { + file_format = BulkIOFormat::kCSV; +} else if (format == "text") { + file_format = BulkIOFormat::kText; +} else { + THROW_SQL_ERROR_AT(parse_format) << "Unsupported file format: " << format; +} +format_specified = true; +break; + } +} + } + + const std::string &file_name = copy_to_statement.file_name()->value(); + if (file_name.length() <= 1) { +THROW_SQL_ERROR_AT(copy_to_statement.file_name()) +<< "File name can not be empty"; + } + + // Infer copy format from file name extension. + if (!format_specified) { +if (file_name.length() > 4) { + if (ToLowe
[GitHub] incubator-quickstep pull request #291: Add "COPY TO" operator for exporting ...
Github user jianqiao commented on a diff in the pull request: https://github.com/apache/incubator-quickstep/pull/291#discussion_r136198572 --- Diff: query_optimizer/resolver/Resolver.cpp --- @@ -418,27 +455,157 @@ L::LogicalPtr Resolver::resolve(const ParseStatement &parse_query) { } L::LogicalPtr Resolver::resolveCopyFrom( -const ParseStatementCopyFrom &copy_from_statement) { - // Default parameters. - std::string column_delimiter_ = "\t"; - bool escape_strings_ = true; +const ParseStatementCopy &copy_from_statement) { + DCHECK(copy_from_statement.getCopyDirection() == ParseStatementCopy::kFrom); + const PtrList *params = copy_from_statement.params(); - const ParseCopyFromParams *params = copy_from_statement.params(); + BulkIOFormat file_format = BulkIOFormat::kText; if (params != nullptr) { -if (params->delimiter != nullptr) { - column_delimiter_ = params->delimiter->value(); - if (column_delimiter_.size() != 1) { -THROW_SQL_ERROR_AT(params->delimiter) -<< "DELIMITER is not a single character"; +for (const ParseKeyValue &param : *params) { + const std::string &key = ToLower(param.key()->value()); + if (key == "format") { +const ParseString *parse_format = GetKeyValueString(param); +const std::string format = ToLower(parse_format->value()); +// TODO(jianqiao): Support other bulk load formats such as CSV. +if (format != "text") { + THROW_SQL_ERROR_AT(parse_format) << "Unsupported file format: " << format; +} +// Update file_format when other formats get supported.
+break; + } +} + } + + std::unique_ptr options = + std::make_unique(file_format); + if (params != nullptr) { +for (const ParseKeyValue &param : *params) { + const std::string &key = ToLower(param.key()->value()); + if (key == "delimiter") { +const ParseString *parse_delimiter = GetKeyValueString(param); +const std::string &delimiter = parse_delimiter->value(); +if (delimiter.size() != 1) { + THROW_SQL_ERROR_AT(parse_delimiter) + << "DELIMITER is not a single character"; +} +options->setDelimiter(delimiter.front()); + } else if (key == "escape_strings") { +options->setEscapeStrings(GetKeyValueBool(param)); + } else if (key != "format") { +THROW_SQL_ERROR_AT(&param) << "Unsupported copy option: " << key; } } -escape_strings_ = params->escape_strings; } return L::CopyFrom::Create(resolveRelationName(copy_from_statement.relation_name()), - copy_from_statement.source_filename()->value(), - column_delimiter_[0], - escape_strings_); + copy_from_statement.file_name()->value(), + BulkIOConfigurationPtr(options.release())); +} + +L::LogicalPtr Resolver::resolveCopyTo( +const ParseStatementCopy &copy_to_statement) { + DCHECK(copy_to_statement.getCopyDirection() == ParseStatementCopy::kTo); + const PtrList *params = copy_to_statement.params(); + + // Check if copy format is explicitly specified.
+ BulkIOFormat file_format = BulkIOFormat::kText; + bool format_specified = false; + if (params != nullptr) { +for (const ParseKeyValue &param : *params) { + const std::string &key = ToLower(param.key()->value()); + if (key == "format") { +const ParseString *parse_format = GetKeyValueString(param); +const std::string format = ToLower(parse_format->value()); +if (format == "csv") { + file_format = BulkIOFormat::kCSV; +} else if (format == "text") { + file_format = BulkIOFormat::kText; +} else { + THROW_SQL_ERROR_AT(parse_format) << "Unsupported file format: " << format; +} +format_specified = true; +break; + } +} + } + + const std::string &file_name = copy_to_statement.file_name()->value(); + if (file_name.length() <= 1) { --- End diff -- Updated. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-quickstep pull request #291: Add "COPY TO" operator for exporting ...
Github user jianqiao commented on a diff in the pull request: https://github.com/apache/incubator-quickstep/pull/291#discussion_r136199193 --- Diff: relational_operators/TableExportOperator.hpp --- @@ -0,0 +1,268 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ **/ + +#ifndef QUICKSTEP_RELATIONAL_OPERATORS_TABLE_EXPORT_OPERATOR_HPP_ +#define QUICKSTEP_RELATIONAL_OPERATORS_TABLE_EXPORT_OPERATOR_HPP_ + +#include +#include +#include +#include +#include +#include + +#include "catalog/CatalogRelation.hpp" +#include "catalog/CatalogTypedefs.hpp" +#include "query_execution/QueryContext.hpp" +#include "relational_operators/RelationalOperator.hpp" +#include "relational_operators/WorkOrder.hpp" +#include "storage/StorageBlockInfo.hpp" +#include "threading/SpinMutex.hpp" +#include "utility/BulkIOConfiguration.hpp" +#include "utility/Macros.hpp" + +#include "glog/logging.h" + +#include "tmb/id_typedefs.h" + +namespace tmb { class MessageBus; } + +namespace quickstep { + +class CatalogRelationSchema; +class StorageManager; +class ValueAccessor; +class WorkOrderProtosContainer; +class WorkOrdersContainer; + +namespace serialization { class WorkOrder; } + +/** \addtogroup RelationalOperators + * @{ + */ + +class TableExportOperator : public RelationalOperator { + public: + /** + * @brief Feedback message to Foreman when a TableExportToStringWorkOrder has + *completed writing a block to the string buffer. + */ + enum FeedbackMessageType : WorkOrder::FeedbackMessageType { + kBlockOutputMessage, + }; + + /** + * @brief Constructor. + * + * @param query_id The ID of the query to which this operator belongs. + * @param input_relation The relation to export. + * @param input_relation_is_stored If input_relation is a stored relation and + *is fully available to the operator before it can start generating + *workorders. + * @param file_name The name of the file to export the relation to. + * @param options The options that specify the detailed format of the output + *file. 
+ */ + TableExportOperator(const std::size_t query_id, + const CatalogRelation &input_relation, + const bool input_relation_is_stored, + const std::string &file_name, + const BulkIOConfigurationPtr &options) + : RelationalOperator(query_id), +input_relation_(input_relation), +input_relation_is_stored_(input_relation_is_stored), +file_name_(file_name), +options_(options), +input_relation_block_ids_(input_relation_is_stored + ? input_relation.getBlocksSnapshot() + : std::vector()), +num_workorders_generated_(0), +started_(false), +num_blocks_written_(0), +file_(nullptr) {} + + ~TableExportOperator() override {} + + OperatorType getOperatorType() const override { +return kTableExport; + } + + std::string getName() const override { +return "TableExportOperator"; + } + + /** + * @return The relation to export. + */ + const CatalogRelation& input_relation() const { +return input_relation_; + } + + bool getAllWorkOrders(WorkOrdersContainer *container, +QueryContext *query_context, +StorageManager *storage_manager, +const tmb::client_id scheduler_client_id, +tmb::MessageBus *bus) override; + + bool getAllWorkOrderProtos
[GitHub] incubator-quickstep pull request #291: Add "COPY TO" operator for exporting ...
Github user jianqiao commented on a diff in the pull request: https://github.com/apache/incubator-quickstep/pull/291#discussion_r136198182 --- Diff: query_optimizer/physical/CopyFrom.hpp --- @@ -68,22 +69,14 @@ class CopyFrom : public Physical { const std::string& file_name() const { return file_name_; } /** - * @return The delimiter used in the text file to separate columns. + * @return The options for this COPY FROM statement. */ - const char column_delimiter() const { return column_delimiter_; } - - /** - * @return Whether to decode escape sequences in the text file. - */ - bool escape_strings() const { return escape_strings_; } + BulkIOConfigurationPtr options() const { return options_; } --- End diff -- Updated.
[GitHub] incubator-quickstep pull request #291: Add "COPY TO" operator for exporting ...
Github user jianqiao commented on a diff in the pull request: https://github.com/apache/incubator-quickstep/pull/291#discussion_r136198306 --- Diff: query_optimizer/physical/PhysicalType.hpp --- @@ -34,6 +34,7 @@ namespace physical { enum class PhysicalType { kAggregate, --- End diff -- Updated.
[GitHub] incubator-quickstep pull request #291: Add "COPY TO" operator for exporting ...
Github user jianqiao commented on a diff in the pull request: https://github.com/apache/incubator-quickstep/pull/291#discussion_r136198462 --- Diff: query_optimizer/resolver/Resolver.cpp --- @@ -418,27 +455,157 @@ L::LogicalPtr Resolver::resolve(const ParseStatement &parse_query) { } L::LogicalPtr Resolver::resolveCopyFrom( -const ParseStatementCopyFrom &copy_from_statement) { - // Default parameters. - std::string column_delimiter_ = "\t"; - bool escape_strings_ = true; +const ParseStatementCopy &copy_from_statement) { + DCHECK(copy_from_statement.getCopyDirection() == ParseStatementCopy::kFrom); + const PtrList *params = copy_from_statement.params(); - const ParseCopyFromParams *params = copy_from_statement.params(); + BulkIOFormat file_format = BulkIOFormat::kText; if (params != nullptr) { -if (params->delimiter != nullptr) { - column_delimiter_ = params->delimiter->value(); - if (column_delimiter_.size() != 1) { -THROW_SQL_ERROR_AT(params->delimiter) -<< "DELIMITER is not a single character"; +for (const ParseKeyValue &param : *params) { + const std::string &key = ToLower(param.key()->value()); + if (key == "format") { +const ParseString *parse_format = GetKeyValueString(param); +const std::string format = ToLower(parse_format->value()); +// TODO(jianqiao): Support other bulk load formats such as CSV. +if (format != "text") { + THROW_SQL_ERROR_AT(parse_format) << "Unsupported file format: " << format; +} +// Update file_format when other formats get supported. +break; + } +} + } + + std::unique_ptr options = + std::make_unique(file_format); + if (params != nullptr) { +for (const ParseKeyValue &param : *params) { + const std::string &key = ToLower(param.key()->value()); --- End diff -- Updated.
[GitHub] incubator-quickstep pull request #291: Add "COPY TO" operator for exporting ...
Github user jianqiao commented on a diff in the pull request: https://github.com/apache/incubator-quickstep/pull/291#discussion_r136198547 --- Diff: query_optimizer/resolver/Resolver.cpp --- @@ -418,27 +455,157 @@ L::LogicalPtr Resolver::resolve(const ParseStatement &parse_query) { } L::LogicalPtr Resolver::resolveCopyFrom( -const ParseStatementCopyFrom &copy_from_statement) { - // Default parameters. - std::string column_delimiter_ = "\t"; - bool escape_strings_ = true; +const ParseStatementCopy &copy_from_statement) { + DCHECK(copy_from_statement.getCopyDirection() == ParseStatementCopy::kFrom); + const PtrList *params = copy_from_statement.params(); - const ParseCopyFromParams *params = copy_from_statement.params(); + BulkIOFormat file_format = BulkIOFormat::kText; if (params != nullptr) { -if (params->delimiter != nullptr) { - column_delimiter_ = params->delimiter->value(); - if (column_delimiter_.size() != 1) { -THROW_SQL_ERROR_AT(params->delimiter) -<< "DELIMITER is not a single character"; +for (const ParseKeyValue &param : *params) { + const std::string &key = ToLower(param.key()->value()); + if (key == "format") { +const ParseString *parse_format = GetKeyValueString(param); +const std::string format = ToLower(parse_format->value()); +// TODO(jianqiao): Support other bulk load formats such as CSV. +if (format != "text") { + THROW_SQL_ERROR_AT(parse_format) << "Unsupported file format: " << format; +} +// Update file_format when other formats get supported. +break; + } +} + } + + std::unique_ptr options = + std::make_unique(file_format); + if (params != nullptr) { +for (const ParseKeyValue &param : *params) { + const std::string &key = ToLower(param.key()->value()); + if (key == "delimiter") { +const ParseString *parse_delimiter = GetKeyValueString(param); +const std::string &delimiter = parse_delimiter->value(); +if (delimiter.size() != 1) { --- End diff -- Updated.
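The option handling quoted in the diff above (a one-character DELIMITER, a boolean escape_strings, "format" handled in a separate pass, everything else rejected) can be sketched in isolation as follows. This is a minimal sketch, not the PR's code: `CopyOptions`, `ToLowerAscii` and `ParseCopyOptions` are hypothetical names, plain string pairs stand in for `ParseKeyValue`, the defaults are taken from the removed lines of the diff, and the "true"/"false" handling is an assumption because `GetKeyValueBool` is not shown.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the PR's BulkIOConfiguration; only the two
// options visible in the diff are modeled. Defaults mirror the removed lines.
struct CopyOptions {
  char delimiter = '\t';
  bool escape_strings = true;
};

// Lower-cases an ASCII string (stand-in for Quickstep's ToLower).
inline std::string ToLowerAscii(std::string s) {
  std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });
  return s;
}

// Validates key/value pairs following the shape of the diff: DELIMITER must
// be exactly one character, "format" is skipped here, unknown keys are errors.
inline CopyOptions ParseCopyOptions(
    const std::vector<std::pair<std::string, std::string>> &params) {
  CopyOptions options;
  for (const auto &param : params) {
    const std::string key = ToLowerAscii(param.first);
    if (key == "delimiter") {
      if (param.second.size() != 1) {
        throw std::invalid_argument("DELIMITER is not a single character");
      }
      options.delimiter = param.second.front();
    } else if (key == "escape_strings") {
      // Assumption: the value is spelled "true" or "false".
      options.escape_strings = (ToLowerAscii(param.second) == "true");
    } else if (key != "format") {
      throw std::invalid_argument("Unsupported copy option: " + key);
    }
  }
  return options;
}
```

Keys are compared case-insensitively, as in the diff, so `DELIMITER` and `delimiter` are treated alike.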
[GitHub] incubator-quickstep pull request #291: Add "COPY TO" operator for exporting ...
Github user jianqiao commented on a diff in the pull request: https://github.com/apache/incubator-quickstep/pull/291#discussion_r136198255 --- Diff: query_optimizer/physical/CopyTo.hpp --- @@ -0,0 +1,147 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + **/ + +#ifndef QUICKSTEP_QUERY_OPTIMIZER_PHYSICAL_COPY_TO_HPP_ +#define QUICKSTEP_QUERY_OPTIMIZER_PHYSICAL_COPY_TO_HPP_ + +#include +#include +#include + +#include "query_optimizer/OptimizerTree.hpp" +#include "query_optimizer/expressions/AttributeReference.hpp" +#include "query_optimizer/physical/Physical.hpp" +#include "query_optimizer/physical/PhysicalType.hpp" +#include "utility/BulkIOConfiguration.hpp" +#include "utility/Macros.hpp" + +#include "glog/logging.h" + +namespace quickstep { +namespace optimizer { +namespace physical { + +/** \addtogroup OptimizerPhysical + * @{ + */ + +class CopyTo; +typedef std::shared_ptr CopyToPtr; + +/** + * @brief Represents an operation that copies data from a relation to a text file. 
+ */ +class CopyTo : public Physical { + public: + PhysicalType getPhysicalType() const override { +return PhysicalType::kCopyTo; + } + + std::string getName() const override { +return "CopyTo"; + } + + /** + * @return The input relation whose data is to be exported. + */ + const PhysicalPtr& input() const { +return input_; + } + + /** + * @return The name of the file to write the data to. + */ + const std::string& file_name() const { +return file_name_; + } + + /** + * @return The options for this COPY TO statement. + */ + BulkIOConfigurationPtr options() const { --- End diff -- Updated.
[GitHub] incubator-quickstep pull request #291: Add "COPY TO" operator for exporting ...
Github user jianqiao commented on a diff in the pull request: https://github.com/apache/incubator-quickstep/pull/291#discussion_r136199940 --- Diff: utility/ExecutionDAGVisualizer.cpp --- @@ -55,12 +55,15 @@ using std::to_string; namespace quickstep { DEFINE_bool(visualize_execution_dag_partition_info, false, -"If true, display the operator partition info in the visualized execution plan DAG." -"Valid iif 'visualize_execution_dag' turns on."); +"If true, display the operator partition info in the visualized " +"execution plan DAG. Valid if 'visualize_execution_dag' turns on."); --- End diff -- Updated.
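The help-string change in this diff is more than a typo fix: C++ concatenates adjacent string literals with nothing in between, so the old text ran two sentences together as "DAG.Valid". A small sketch of the effect, using the two strings from the diff (the function names `OldHelpText` and `NewHelpText` are made up for illustration):

```cpp
#include <cassert>
#include <string>

// Old help text: the first literal ends without a trailing space, so the
// compiler-joined string reads "...plan DAG.Valid iif ...".
inline std::string OldHelpText() {
  return "If true, display the operator partition info in the visualized execution plan DAG."
         "Valid iif 'visualize_execution_dag' turns on.";
}

// New help text: the line break falls after a space inside the first
// literal, so the joined string reads naturally.
inline std::string NewHelpText() {
  return "If true, display the operator partition info in the visualized "
         "execution plan DAG. Valid if 'visualize_execution_dag' turns on.";
}
```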
[GitHub] incubator-quickstep pull request #291: Add "COPY TO" operator for exporting ...
Github user jianqiao commented on a diff in the pull request: https://github.com/apache/incubator-quickstep/pull/291#discussion_r136199148 --- Diff: relational_operators/TableExportOperator.hpp --- @@ -0,0 +1,268 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ **/ + +#ifndef QUICKSTEP_RELATIONAL_OPERATORS_TABLE_EXPORT_OPERATOR_HPP_ +#define QUICKSTEP_RELATIONAL_OPERATORS_TABLE_EXPORT_OPERATOR_HPP_ + +#include +#include +#include +#include +#include +#include + +#include "catalog/CatalogRelation.hpp" +#include "catalog/CatalogTypedefs.hpp" +#include "query_execution/QueryContext.hpp" +#include "relational_operators/RelationalOperator.hpp" +#include "relational_operators/WorkOrder.hpp" +#include "storage/StorageBlockInfo.hpp" +#include "threading/SpinMutex.hpp" +#include "utility/BulkIOConfiguration.hpp" +#include "utility/Macros.hpp" + +#include "glog/logging.h" + +#include "tmb/id_typedefs.h" + +namespace tmb { class MessageBus; } + +namespace quickstep { + +class CatalogRelationSchema; +class StorageManager; +class ValueAccessor; +class WorkOrderProtosContainer; +class WorkOrdersContainer; + +namespace serialization { class WorkOrder; } + +/** \addtogroup RelationalOperators + * @{ + */ + +class TableExportOperator : public RelationalOperator { + public: + /** + * @brief Feedback message to Foreman when a TableExportToStringWorkOrder has + *completed writing a block to the string buffer. + */ + enum FeedbackMessageType : WorkOrder::FeedbackMessageType { + kBlockOutputMessage, + }; + + /** + * @brief Constructor. + * + * @param query_id The ID of the query to which this operator belongs. + * @param input_relation The relation to export. + * @param input_relation_is_stored If input_relation is a stored relation and + *is fully available to the operator before it can start generating + *workorders. + * @param file_name The name of the file to export the relation to. + * @param options The options that specify the detailed format of the output + *file. 
+ */ + TableExportOperator(const std::size_t query_id, + const CatalogRelation &input_relation, + const bool input_relation_is_stored, + const std::string &file_name, + const BulkIOConfigurationPtr &options) + : RelationalOperator(query_id), +input_relation_(input_relation), +input_relation_is_stored_(input_relation_is_stored), +file_name_(file_name), +options_(options), +input_relation_block_ids_(input_relation_is_stored + ? input_relation.getBlocksSnapshot() + : std::vector()), +num_workorders_generated_(0), +started_(false), +num_blocks_written_(0), +file_(nullptr) {} + + ~TableExportOperator() override {} + + OperatorType getOperatorType() const override { +return kTableExport; + } + + std::string getName() const override { +return "TableExportOperator"; + } + + /** + * @return The relation to export. + */ + const CatalogRelation& input_relation() const { +return input_relation_; + } + + bool getAllWorkOrders(WorkOrdersContainer *container, +QueryContext *query_context, +StorageManager *storage_manager, +const tmb::client_id scheduler_client_id, +tmb::MessageBus *bus) override; + + bool getAllWorkOrderProtos
[GitHub] incubator-quickstep pull request #291: Add "COPY TO" operator for exporting ...
Github user jianqiao commented on a diff in the pull request: https://github.com/apache/incubator-quickstep/pull/291#discussion_r136199047 --- Diff: query_optimizer/resolver/Resolver.cpp --- @@ -1595,6 +1742,19 @@ void Resolver::appendProjectIfNeedPrecomputationAfterAggregation( } } +void Resolver::reportIfWithClauseUnused( +const PtrVector &with_list) const { + if (!with_queries_info_.unreferenced_query_indexes.empty()) { +int unreferenced_with_query_index = *with_queries_info_.unreferenced_query_indexes.begin(); --- End diff -- Updated.
[GitHub] incubator-quickstep pull request #291: Add "COPY TO" operator for exporting ...
Github user jianqiao commented on a diff in the pull request: https://github.com/apache/incubator-quickstep/pull/291#discussion_r136180038 --- Diff: parser/ParseStatement.hpp --- @@ -898,24 +916,53 @@ class ParseStatementCopyFrom : public ParseStatement { std::vector *non_container_child_fields, std::vector *container_child_field_names, std::vector> *container_child_fields) const override { -inline_field_names->push_back("relation_name"); -inline_field_values->push_back(relation_name_->value()); +inline_field_names->push_back("direction"); +inline_field_values->push_back(direction_ == kFrom ? "FROM" : "TO"); -inline_field_names->push_back("source_file"); -inline_field_values->push_back(source_filename_->value()); +inline_field_names->push_back("file"); +inline_field_values->push_back(file_name_->value()); + +if (relation_name_ != nullptr) { + inline_field_names->push_back("relation_name"); + inline_field_values->push_back(relation_name_->value()); +} + +if (set_operation_query_ != nullptr) { + non_container_child_field_names->push_back("set_operation_query"); + non_container_child_fields->push_back(set_operation_query_.get()); +} + +if (with_clause_ != nullptr && !with_clause_->empty()) { + container_child_field_names->push_back("with_clause"); + container_child_fields->emplace_back(); + for (const ParseSubqueryTableReference &common_subquery : *with_clause_) { +container_child_fields->back().push_back(&common_subquery); + } +} if (params_ != nullptr) { - non_container_child_field_names->push_back("params"); - non_container_child_fields->push_back(params_.get()); + container_child_field_names->push_back("params"); + container_child_fields->emplace_back(); + for (const ParseKeyValue &param : *params_) { +container_child_fields->back().push_back(&param); + } } } private: + const CopyDirection direction_; + + // NOTE(jianqiao): + // (1) Either relation_name_ or set_operation_query_ has non-null value. + // (2) set_operation_query_ must be null for COPY FROM statement.
--- End diff -- Yes, there are two different constructors, one for each situation. Calling either constructor enforces the constraints.
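The reply above says the invariants in the NOTE are enforced by construction. One way to read that is sketched below; it is a simplified, hypothetical model, not the actual `ParseStatementCopy` (the class name `CopyStatement`, the use of plain strings for the relation name and the query, and the accessor names are all assumptions). Each constructor fills exactly one of the two members, and the query constructor fixes the direction to TO, so no caller can build a COPY FROM with a query.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical, simplified model of a COPY statement node.
class CopyStatement {
 public:
  enum CopyDirection { kFrom, kTo };

  // COPY <relation> FROM/TO <file>: sets the relation name, leaves the
  // query null.
  CopyStatement(const CopyDirection direction,
                const std::string &relation_name,
                const std::string &file_name)
      : direction_(direction),
        file_name_(file_name),
        relation_name_(new std::string(relation_name)) {}

  // COPY (<query>) TO <file>: sets the query, leaves the relation name
  // null, and hard-codes the direction to kTo.
  CopyStatement(const std::string &query_text, const std::string &file_name)
      : direction_(kTo),
        file_name_(file_name),
        query_(new std::string(query_text)) {}

  CopyDirection direction() const { return direction_; }
  const std::string& file_name() const { return file_name_; }
  const std::string* relation_name() const { return relation_name_.get(); }
  const std::string* query() const { return query_.get(); }

 private:
  const CopyDirection direction_;
  const std::string file_name_;
  std::unique_ptr<std::string> relation_name_;  // Null for query-based COPY.
  std::unique_ptr<std::string> query_;          // Null for relation-based COPY.
};
```

Because the members are only ever set in the constructors, the two conditions hold for the lifetime of the object without any runtime check.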
[GitHub] incubator-quickstep pull request #291: Add "COPY TO" operator for exporting ...
Github user jianqiao commented on a diff in the pull request: https://github.com/apache/incubator-quickstep/pull/291#discussion_r136198676 --- Diff: query_optimizer/resolver/Resolver.cpp --- @@ -418,27 +455,157 @@ L::LogicalPtr Resolver::resolve(const ParseStatement &parse_query) { } L::LogicalPtr Resolver::resolveCopyFrom( -const ParseStatementCopyFrom &copy_from_statement) { - // Default parameters. - std::string column_delimiter_ = "\t"; - bool escape_strings_ = true; +const ParseStatementCopy &copy_from_statement) { + DCHECK(copy_from_statement.getCopyDirection() == ParseStatementCopy::kFrom); + const PtrList *params = copy_from_statement.params(); - const ParseCopyFromParams *params = copy_from_statement.params(); + BulkIOFormat file_format = BulkIOFormat::kText; if (params != nullptr) { -if (params->delimiter != nullptr) { - column_delimiter_ = params->delimiter->value(); - if (column_delimiter_.size() != 1) { -THROW_SQL_ERROR_AT(params->delimiter) -<< "DELIMITER is not a single character"; +for (const ParseKeyValue &param : *params) { + const std::string &key = ToLower(param.key()->value()); + if (key == "format") { +const ParseString *parse_format = GetKeyValueString(param); +const std::string format = ToLower(parse_format->value()); +// TODO(jianqiao): Support other bulk load formats such as CSV. +if (format != "text") { + THROW_SQL_ERROR_AT(parse_format) << "Unsupported file format: " << format; +} +// Update file_format when other formats get supported.
+break; + } +} + } + + std::unique_ptr options = + std::make_unique(file_format); + if (params != nullptr) { +for (const ParseKeyValue &param : *params) { + const std::string &key = ToLower(param.key()->value()); + if (key == "delimiter") { +const ParseString *parse_delimiter = GetKeyValueString(param); +const std::string &delimiter = parse_delimiter->value(); +if (delimiter.size() != 1) { + THROW_SQL_ERROR_AT(parse_delimiter) + << "DELIMITER is not a single character"; +} +options->setDelimiter(delimiter.front()); + } else if (key == "escape_strings") { +options->setEscapeStrings(GetKeyValueBool(param)); + } else if (key != "format") { +THROW_SQL_ERROR_AT(&param) << "Unsupported copy option: " << key; } } -escape_strings_ = params->escape_strings; } return L::CopyFrom::Create(resolveRelationName(copy_from_statement.relation_name()), - copy_from_statement.source_filename()->value(), - column_delimiter_[0], - escape_strings_); + copy_from_statement.file_name()->value(), + BulkIOConfigurationPtr(options.release())); +} + +L::LogicalPtr Resolver::resolveCopyTo( +const ParseStatementCopy &copy_to_statement) { + DCHECK(copy_to_statement.getCopyDirection() == ParseStatementCopy::kTo); + const PtrList *params = copy_to_statement.params(); + + // Check if copy format is explicitly specified.
+ BulkIOFormat file_format = BulkIOFormat::kText; + bool format_specified = false; + if (params != nullptr) { +for (const ParseKeyValue &param : *params) { + const std::string &key = ToLower(param.key()->value()); + if (key == "format") { +const ParseString *parse_format = GetKeyValueString(param); +const std::string format = ToLower(parse_format->value()); +if (format == "csv") { + file_format = BulkIOFormat::kCSV; +} else if (format == "text") { + file_format = BulkIOFormat::kText; +} else { + THROW_SQL_ERROR_AT(parse_format) << "Unsupported file format: " << format; +} +format_specified = true; +break; + } +} + } + + const std::string &file_name = copy_to_statement.file_name()->value(); + if (file_name.length() <= 1) { +THROW_SQL_ERROR_AT(copy_to_statement.file_name()) +<< "File name can not be empty"; + } + + // Infer copy format from file name extension. + if (!format_specified) { +if (file_name.length() > 4) { + if (ToLowe
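The quoted diff is cut off just as it starts to infer the format from the file name, so the PR's actual logic is not visible here. The following is only a hypothetical sketch of what extension-based inference could look like: the ".csv" suffix, the function name `InferFormatFromFileName`, and the fallback to `kText` (the default the diff initializes `file_format` to) are assumptions, not a reconstruction of the truncated code.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Mirrors the enum quoted elsewhere in this thread.
enum class BulkIOFormat { kCSV, kText };

// Returns kCSV when the file name ends in ".csv" (case-insensitive) and has
// at least one character before the suffix; otherwise falls back to kText.
inline BulkIOFormat InferFormatFromFileName(const std::string &file_name) {
  const std::string suffix = ".csv";  // Assumed suffix, not from the diff.
  if (file_name.length() > suffix.length()) {
    std::string tail = file_name.substr(file_name.length() - suffix.length());
    std::transform(tail.begin(), tail.end(), tail.begin(), [](unsigned char c) {
      return static_cast<char>(std::tolower(c));
    });
    if (tail == suffix) {
      return BulkIOFormat::kCSV;
    }
  }
  return BulkIOFormat::kText;
}
```

The `length() > suffix.length()` guard plays the same role as the `file_name.length() > 4` check visible in the diff: it keeps `substr` in range and ignores a bare ".csv" name.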
[GitHub] incubator-quickstep pull request #291: Add "COPY TO" operator for exporting ...
Github user jianqiao commented on a diff in the pull request: https://github.com/apache/incubator-quickstep/pull/291#discussion_r136199560 --- Diff: utility/BulkIOConfiguration.hpp --- @@ -0,0 +1,198 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + **/ + +#ifndef QUICKSTEP_UTILITY_BULK_IO_CONFIGURATION_HPP_ +#define QUICKSTEP_UTILITY_BULK_IO_CONFIGURATION_HPP_ + +#include +#include + +#include "utility/Macros.hpp" + +#include "glog/logging.h" + +namespace quickstep { + +/** + * @brief External file format for bulk I/O. + */ +enum class BulkIOFormat { + kCSV, --- End diff -- Updated.
[GitHub] incubator-quickstep pull request #291: Add "COPY TO" operator for exporting ...
Github user jianqiao commented on a diff in the pull request: https://github.com/apache/incubator-quickstep/pull/291#discussion_r136199785
--- Diff: utility/BulkIOConfiguration.hpp ---
@@ -0,0 +1,198 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ **/
+
+#ifndef QUICKSTEP_UTILITY_BULK_IO_CONFIGURATION_HPP_
+#define QUICKSTEP_UTILITY_BULK_IO_CONFIGURATION_HPP_
+
+#include
+#include
+
+#include "utility/Macros.hpp"
+
+#include "glog/logging.h"
+
+namespace quickstep {
+
+/**
+ * @brief External file format for bulk I/O.
+ */
+enum class BulkIOFormat {
+  kCSV,
+  kText
+};
+
+class BulkIOConfiguration;
+typedef std::shared_ptr<BulkIOConfiguration> BulkIOConfigurationPtr;
+
+/**
+ * @brief Detailed file format configuration for bulk I/O (i.e. COPY operations)
+ *        that moves data between Quickstep tables and external files.
+ */
+class BulkIOConfiguration {
--- End diff --
Updated.
---
[GitHub] incubator-quickstep pull request #291: Add "COPY TO" operator for exporting ...
Github user jianqiao commented on a diff in the pull request: https://github.com/apache/incubator-quickstep/pull/291#discussion_r136199029
--- Diff: query_optimizer/resolver/Resolver.cpp ---
@@ -418,27 +455,157 @@ L::LogicalPtr Resolver::resolve(const ParseStatement &parse_query) {
 }

 L::LogicalPtr Resolver::resolveCopyFrom(
-    const ParseStatementCopyFrom &copy_from_statement) {
-  // Default parameters.
-  std::string column_delimiter_ = "\t";
-  bool escape_strings_ = true;
+    const ParseStatementCopy &copy_from_statement) {
+  DCHECK(copy_from_statement.getCopyDirection() == ParseStatementCopy::kFrom);
+  const PtrList<ParseKeyValue> *params = copy_from_statement.params();

-  const ParseCopyFromParams *params = copy_from_statement.params();
+  BulkIOFormat file_format = BulkIOFormat::kText;
   if (params != nullptr) {
-    if (params->delimiter != nullptr) {
-      column_delimiter_ = params->delimiter->value();
-      if (column_delimiter_.size() != 1) {
-        THROW_SQL_ERROR_AT(params->delimiter)
-            << "DELIMITER is not a single character";
+    for (const ParseKeyValue &param : *params) {
+      const std::string &key = ToLower(param.key()->value());
+      if (key == "format") {
+        const ParseString *parse_format = GetKeyValueString(param);
+        const std::string format = ToLower(parse_format->value());
+        // TODO(jianqiao): Support other bulk load formats such as CSV.
+        if (format != "text") {
+          THROW_SQL_ERROR_AT(parse_format) << "Unsupported file format: " << format;
+        }
+        // Update file_format when other formats get supported.
+        break;
+      }
+    }
+  }
+
+  std::unique_ptr<BulkIOConfiguration> options =
+      std::make_unique<BulkIOConfiguration>(file_format);
+  if (params != nullptr) {
+    for (const ParseKeyValue &param : *params) {
+      const std::string &key = ToLower(param.key()->value());
+      if (key == "delimiter") {
+        const ParseString *parse_delimiter = GetKeyValueString(param);
+        const std::string &delimiter = parse_delimiter->value();
+        if (delimiter.size() != 1) {
+          THROW_SQL_ERROR_AT(parse_delimiter)
+              << "DELIMITER is not a single character";
+        }
+        options->setDelimiter(delimiter.front());
+      } else if (key == "escape_strings") {
+        options->setEscapeStrings(GetKeyValueBool(param));
+      } else if (key != "format") {
+        THROW_SQL_ERROR_AT(&param) << "Unsupported copy option: " << key;
       }
     }
-    escape_strings_ = params->escape_strings;
   }

   return L::CopyFrom::Create(resolveRelationName(copy_from_statement.relation_name()),
-                             copy_from_statement.source_filename()->value(),
-                             column_delimiter_[0],
-                             escape_strings_);
+                             copy_from_statement.file_name()->value(),
+                             BulkIOConfigurationPtr(options.release()));
+}
+
+L::LogicalPtr Resolver::resolveCopyTo(
+    const ParseStatementCopy &copy_to_statement) {
+  DCHECK(copy_to_statement.getCopyDirection() == ParseStatementCopy::kTo);
+  const PtrList<ParseKeyValue> *params = copy_to_statement.params();
+
+  // Check if copy format is explicitly specified.
+  BulkIOFormat file_format = BulkIOFormat::kText;
+  bool format_specified = false;
+  if (params != nullptr) {
+    for (const ParseKeyValue &param : *params) {
+      const std::string &key = ToLower(param.key()->value());
+      if (key == "format") {
+        const ParseString *parse_format = GetKeyValueString(param);
+        const std::string format = ToLower(parse_format->value());
+        if (format == "csv") {
+          file_format = BulkIOFormat::kCSV;
+        } else if (format == "text") {
+          file_format = BulkIOFormat::kText;
+        } else {
+          THROW_SQL_ERROR_AT(parse_format) << "Unsupported file format: " << format;
+        }
+        format_specified = true;
+        break;
+      }
+    }
+  }
+
+  const std::string &file_name = copy_to_statement.file_name()->value();
+  if (file_name.length() <= 1) {
+    THROW_SQL_ERROR_AT(copy_to_statement.file_name())
+        << "File name can not be empty";
+  }
+
+  // Infer copy format from file name extension.
+  if (!format_specified) {
+    if (file_name.length() > 4) {
+      if (ToLowe
[GitHub] incubator-quickstep pull request #291: Add "COPY TO" operator for exporting ...
Github user jianqiao commented on a diff in the pull request: https://github.com/apache/incubator-quickstep/pull/291#discussion_r136178749
--- Diff: parser/ParseStatement.hpp ---
@@ -60,16 +60,16 @@ class ParseStatement : public ParseTreeNode {
    * @brief The possible types of SQL statements.
    **/
   enum StatementType {
-    kCreateTable,
+    kCommand,
--- End diff --
Updated.
---
[GitHub] incubator-quickstep pull request #291: Add "COPY TO" operator for exporting ...
Github user jianqiao commented on a diff in the pull request: https://github.com/apache/incubator-quickstep/pull/291#discussion_r136175863
--- Diff: parser/ParseKeyValue.hpp ---
@@ -37,14 +37,15 @@ namespace quickstep {
  */
 class ParseKeyValue : public ParseTreeNode {
  public:
-  enum class KeyValueType {
+  enum KeyValueType {
+    kStringBool,
--- End diff --
Updated.
---
[GitHub] incubator-quickstep pull request #291: Add "COPY TO" operator for exporting ...
Github user jianqiao commented on a diff in the pull request: https://github.com/apache/incubator-quickstep/pull/291#discussion_r136198661
--- Diff: query_optimizer/resolver/Resolver.cpp ---
@@ -418,27 +455,157 @@ L::LogicalPtr Resolver::resolve(const ParseStatement &parse_query) {
 }

 L::LogicalPtr Resolver::resolveCopyFrom(
-    const ParseStatementCopyFrom &copy_from_statement) {
-  // Default parameters.
-  std::string column_delimiter_ = "\t";
-  bool escape_strings_ = true;
+    const ParseStatementCopy &copy_from_statement) {
+  DCHECK(copy_from_statement.getCopyDirection() == ParseStatementCopy::kFrom);
+  const PtrList<ParseKeyValue> *params = copy_from_statement.params();

-  const ParseCopyFromParams *params = copy_from_statement.params();
+  BulkIOFormat file_format = BulkIOFormat::kText;
   if (params != nullptr) {
-    if (params->delimiter != nullptr) {
-      column_delimiter_ = params->delimiter->value();
-      if (column_delimiter_.size() != 1) {
-        THROW_SQL_ERROR_AT(params->delimiter)
-            << "DELIMITER is not a single character";
+    for (const ParseKeyValue &param : *params) {
+      const std::string &key = ToLower(param.key()->value());
+      if (key == "format") {
+        const ParseString *parse_format = GetKeyValueString(param);
+        const std::string format = ToLower(parse_format->value());
+        // TODO(jianqiao): Support other bulk load formats such as CSV.
+        if (format != "text") {
+          THROW_SQL_ERROR_AT(parse_format) << "Unsupported file format: " << format;
+        }
+        // Update file_format when other formats get supported.
+        break;
+      }
+    }
+  }
+
+  std::unique_ptr<BulkIOConfiguration> options =
+      std::make_unique<BulkIOConfiguration>(file_format);
+  if (params != nullptr) {
+    for (const ParseKeyValue &param : *params) {
+      const std::string &key = ToLower(param.key()->value());
+      if (key == "delimiter") {
+        const ParseString *parse_delimiter = GetKeyValueString(param);
+        const std::string &delimiter = parse_delimiter->value();
+        if (delimiter.size() != 1) {
+          THROW_SQL_ERROR_AT(parse_delimiter)
+              << "DELIMITER is not a single character";
+        }
+        options->setDelimiter(delimiter.front());
+      } else if (key == "escape_strings") {
+        options->setEscapeStrings(GetKeyValueBool(param));
+      } else if (key != "format") {
+        THROW_SQL_ERROR_AT(&param) << "Unsupported copy option: " << key;
       }
     }
-    escape_strings_ = params->escape_strings;
   }

   return L::CopyFrom::Create(resolveRelationName(copy_from_statement.relation_name()),
-                             copy_from_statement.source_filename()->value(),
-                             column_delimiter_[0],
-                             escape_strings_);
+                             copy_from_statement.file_name()->value(),
+                             BulkIOConfigurationPtr(options.release()));
+}
+
+L::LogicalPtr Resolver::resolveCopyTo(
+    const ParseStatementCopy &copy_to_statement) {
+  DCHECK(copy_to_statement.getCopyDirection() == ParseStatementCopy::kTo);
+  const PtrList<ParseKeyValue> *params = copy_to_statement.params();
+
+  // Check if copy format is explicitly specified.
+  BulkIOFormat file_format = BulkIOFormat::kText;
+  bool format_specified = false;
+  if (params != nullptr) {
+    for (const ParseKeyValue &param : *params) {
+      const std::string &key = ToLower(param.key()->value());
+      if (key == "format") {
+        const ParseString *parse_format = GetKeyValueString(param);
+        const std::string format = ToLower(parse_format->value());
+        if (format == "csv") {
+          file_format = BulkIOFormat::kCSV;
+        } else if (format == "text") {
+          file_format = BulkIOFormat::kText;
+        } else {
+          THROW_SQL_ERROR_AT(parse_format) << "Unsupported file format: " << format;
+        }
+        format_specified = true;
+        break;
+      }
+    }
+  }
+
+  const std::string &file_name = copy_to_statement.file_name()->value();
+  if (file_name.length() <= 1) {
+    THROW_SQL_ERROR_AT(copy_to_statement.file_name())
+        << "File name can not be empty";
+  }
+
+  // Infer copy format from file name extension.
+  if (!format_specified) {
+    if (file_name.length() > 4) {
--- End diff --
[GitHub] incubator-quickstep pull request #291: Add "COPY TO" operator for exporting ...
GitHub user jianqiao opened a pull request: https://github.com/apache/incubator-quickstep/pull/291

Add "COPY TO" operator for exporting data from Quickstep.

This PR adds support for the "COPY TO" statement for exporting tables from Quickstep. Two formats, `TEXT` and `CSV`, are supported. The currently available copy options are:
- `FORMAT`: Format of the output file, either `TEXT` or `CSV`.
- `DELIMITER`: Separator character of the fields.
- `HEADER`: Whether to add a table header. For `CSV` format only.
- `QUOTE`: The quote character. For `CSV` format only.
- `ESCAPE_STRINGS`: Whether to escape special characters. For `TEXT` format only.
- `NULL_STRING`: The string representation of the `NULL` value.

See the example queries and results [here](https://github.com/apache/incubator-quickstep/blob/a036acb446f137fea263ae218ef12f337f5bc1a1/query_optimizer/tests/execution_generator/Copy.test).

Note that some convenience features are also provided:
- Export the result table from a query.
```
-- (1) --
COPY SELECT x FROM r TO 'data.txt';
-- (2) --
WITH s(v) AS (
  SELECT MIN(y) FROM r GROUP BY x
)
COPY SELECT AVG(v) FROM s TO 'results.csv';
```
- Print to the standard output/error stream, e.g.
```
-- (1) --
COPY r TO stdout;
-- (2) --
COPY SELECT x FROM r TO stderr;
```

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/apache/incubator-quickstep copy-to
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-quickstep/pull/291.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #291

commit a036acb446f137fea263ae218ef12f337f5bc1a1
Author: Jianqiao Zhu
Date: 2017-08-04T21:49:45Z
Add "COPY TO" operator for exporting data from Quickstep.
---
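The quoted Resolver.cpp diffs for this PR stop partway through the step that infers the output format from the file name when `FORMAT` is not given. A minimal sketch of what such an inference could look like follows. It is an illustration only: `ToLowerAscii` and `InferFormatFromFileName` are hypothetical names, the local `BulkIOFormat` enum is a stand-in for the one in the diff, and the rule that a `.csv` suffix selects CSV while everything else falls back to TEXT is an assumption rather than the PR's confirmed behavior.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Stand-in for the BulkIOFormat enum quoted in the review diffs.
enum class BulkIOFormat { kCSV, kText };

// Hypothetical helper: lower-cases an ASCII string (not Quickstep's ToLower).
std::string ToLowerAscii(std::string s) {
  std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });
  return s;
}

// Hypothetical helper: picks a format from the file name when FORMAT was not
// specified. Assumption: a ".csv" suffix means CSV, anything else means TEXT.
BulkIOFormat InferFormatFromFileName(const std::string &file_name) {
  const std::string suffix = ".csv";
  // Require at least one character before the extension, mirroring the
  // `file_name.length() > 4` guard visible in the diff.
  if (file_name.length() > suffix.length()) {
    const std::string tail =
        ToLowerAscii(file_name.substr(file_name.length() - suffix.length()));
    if (tail == suffix) {
      return BulkIOFormat::kCSV;
    }
  }
  return BulkIOFormat::kText;
}
```

Under these assumptions, `COPY ... TO 'results.csv'` would select CSV and `COPY ... TO 'data.txt'` would keep the TEXT default, while an explicit `FORMAT` option would bypass the inference entirely.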
[GitHub] incubator-quickstep issue #289: Minor refactored SortMergeRunOperator.
Github user jianqiao commented on the issue: https://github.com/apache/incubator-quickstep/pull/289 LGTM. Merging. ---
[GitHub] incubator-quickstep issue #290: Fixed the bug that missed assigning 'num_par...
Github user jianqiao commented on the issue: https://github.com/apache/incubator-quickstep/pull/290 LGTM. Merging. ---
[GitHub] incubator-quickstep issue #282: QUICKSTEP-92: Improved ExecutionDAGVisualize...
Github user jianqiao commented on the issue: https://github.com/apache/incubator-quickstep/pull/282 LGTM! Merging. ---
[GitHub] incubator-quickstep issue #280: Removed an unnecessary API in RelationalOper...
Github user jianqiao commented on the issue: https://github.com/apache/incubator-quickstep/pull/280 LGTM! Merging. ---
[GitHub] incubator-quickstep issue #279: Applied WorkOrderSelectionPolicy.
Github user jianqiao commented on the issue: https://github.com/apache/incubator-quickstep/pull/279 I think the PR looks good, and the code structure is convenient for further adjustment. Merging it now so that @zuyu can continue the work on partition-aware scheduling. ---
[GitHub] incubator-quickstep pull request #277: Fix a bug with min/max aggregation.
GitHub user jianqiao opened a pull request: https://github.com/apache/incubator-quickstep/pull/277

Fix a bug with min/max aggregation.

This PR fixes a bug in min/max aggregation that produces incorrect results when the aggregated column is stored with **leading-zero-truncation compression**.

The bug happens because the untyped value (i.e. a `const void *` pointer) obtained from the leading-zero-truncation compressed column's corresponding `ValueAccessor` is [the address of a temporary buffer](https://github.com/apache/incubator-quickstep/blob/master/storage/CompressedColumnStoreValueAccessor.hpp#L112) -- i.e. the value pointer becomes invalid when the next `accessor->next()` gets called. [Keeping the value pointer](https://github.com/apache/incubator-quickstep/blob/master/types/operations/comparisons/LiteralComparators-inl.hpp#L546) across [multiple iterations of](https://github.com/apache/incubator-quickstep/blob/master/types/operations/comparisons/LiteralComparators-inl.hpp#L568) `accessor->next()` causes the problem.

The fix is to copy the value pointer's underlying value to a local variable, so that the value remains valid across iterations of `accessor->next()`.

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/apache/incubator-quickstep fix-compare-aggregate
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-quickstep/pull/277.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #277

commit f3b792bf532aad41724b91cf88c97fcfbbdc1ea9
Author: Jianqiao Zhu
Date: 2017-08-02T20:47:30Z
Fix the bug with min/max aggregation.
---
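The failure mode described in PR #277 can be reproduced in miniature. The sketch below is not Quickstep code: `TempBufferAccessor`, `BuggyMin`, and `FixedMin` are hypothetical names, and the accessor only imitates the relevant property, namely that `getUntypedValue()` returns the address of one reusable buffer that the next `next()` call overwrites. The buggy version keeps the pointer across iterations, and the fixed version copies the value into a local variable, which is the approach the PR describes.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical accessor that, like a decompressing column reader, exposes
// each value through a single reusable buffer.
class TempBufferAccessor {
 public:
  explicit TempBufferAccessor(std::vector<std::int64_t> values)
      : values_(std::move(values)) {}

  // Advances to the next value; returns false when the input is exhausted.
  bool next() {
    if (next_index_ >= values_.size()) return false;
    buffer_ = values_[next_index_++];  // Overwrites the previous value.
    return true;
  }

  // The returned pointer aliases the shared buffer, so its contents change
  // on the following next() call.
  const void *getUntypedValue() const { return &buffer_; }

 private:
  std::vector<std::int64_t> values_;
  std::size_t next_index_ = 0;
  std::int64_t buffer_ = 0;
};

// Buggy pattern: remembers the pointer, which always aliases the latest value,
// so the comparison never sees an older candidate.
std::int64_t BuggyMin(TempBufferAccessor accessor) {
  const std::int64_t *current = nullptr;
  while (accessor.next()) {
    const auto *value =
        static_cast<const std::int64_t *>(accessor.getUntypedValue());
    if (current == nullptr || *value < *current) current = value;
  }
  return current == nullptr ? 0 : *current;
}

// Fixed pattern: copies the pointed-to value into a local variable.
// Simplification: returns 0 for empty input.
std::int64_t FixedMin(TempBufferAccessor accessor) {
  bool has_value = false;
  std::int64_t current = 0;
  while (accessor.next()) {
    const std::int64_t value =
        *static_cast<const std::int64_t *>(accessor.getUntypedValue());
    if (!has_value || value < current) {
      current = value;
      has_value = true;
    }
  }
  return current;
}
```

With the input `{3, 1, 2}`, the buggy version ends up reporting whatever value was read last, because both pointers refer to the same buffer, while the fixed version reports the true minimum.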
[GitHub] incubator-quickstep issue #276: Fixed the check failure if a query does not ...
Github user jianqiao commented on the issue: https://github.com/apache/incubator-quickstep/pull/276 LGTM! Merging. ---
[GitHub] incubator-quickstep issue #274: Determine #InitPartitions for CollisionFreeV...
Github user jianqiao commented on the issue: https://github.com/apache/incubator-quickstep/pull/274 LGTM! Merging. ---
[GitHub] incubator-quickstep issue #273: Determine #Partitions for Aggr State Hash Ta...
Github user jianqiao commented on the issue: https://github.com/apache/incubator-quickstep/pull/273 LGTM! Merging. ---