Re: Drill: Memory Spilling for the Hash Aggregate Operator
The attachment didn’t come through. I’m hoping that you settled on a “hybrid” hash algorithm that can write either to memory or to disk, so that the cost of discovering that the input does not fit in memory is not too great. With Goetz Graefe’s hybrid hash join (which can easily be adapted to hybrid hash aggregate), if the input ALMOST fits in memory you could process most of it in memory, then revisit the stuff you spilled to disk.

> On Jan 13, 2017, at 7:46 PM, Boaz Ben-Zvi wrote:
>
> Hi Drill developers,
>
> Attached is a document describing the design for memory spilling
> implementation for the Hash Aggregate operator.
>
> Please send me any comments or questions,
>
> -- Boaz
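To make the hybrid idea concrete, here is a minimal sketch of a hybrid hash aggregate computing SUM(value) GROUP BY key. It is not taken from the design document (which did not come through) and is not Drill's implementation. The class name, the method `sumByKey`, the `numPartitions` and `maxGroups` parameters, and the use of in-memory lists as stand-ins for spill files are all assumptions made for illustration.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: hash-partitioned aggregation that spills whole
// partitions once more than maxGroups groups are held in memory.
class HybridHashAggSketch {

  static Map<String, Long> sumByKey(List<Map.Entry<String, Long>> input,
                                    int numPartitions, int maxGroups) {
    List<Map<String, Long>> inMemory = new ArrayList<>();
    // Stand-in for per-partition spill files on disk (assumption).
    List<List<Map.Entry<String, Long>>> spillFiles = new ArrayList<>();
    boolean[] spilled = new boolean[numPartitions];
    for (int i = 0; i < numPartitions; i++) {
      inMemory.add(new HashMap<>());
      spillFiles.add(new ArrayList<>());
    }
    int groupsInMemory = 0;

    for (Map.Entry<String, Long> row : input) {
      int p = Math.floorMod(row.getKey().hashCode(), numPartitions);
      if (spilled[p]) {
        // Partition already lives on "disk": append the raw row.
        spillFiles.get(p).add(row);
        continue;
      }
      Map<String, Long> table = inMemory.get(p);
      if (!table.containsKey(row.getKey())) {
        groupsInMemory++;
      }
      table.merge(row.getKey(), row.getValue(), Long::sum);

      if (groupsInMemory > maxGroups) {
        // Over budget: spill the largest in-memory partition.
        int victim = -1;
        for (int i = 0; i < numPartitions; i++) {
          if (!spilled[i]
              && (victim < 0 || inMemory.get(i).size() > inMemory.get(victim).size())) {
            victim = i;
          }
        }
        Map<String, Long> victimTable = inMemory.get(victim);
        // Write the partial sums out; they merge like ordinary rows later.
        for (Map.Entry<String, Long> e : victimTable.entrySet()) {
          spillFiles.get(victim).add(
              new AbstractMap.SimpleImmutableEntry<>(e.getKey(), e.getValue()));
        }
        groupsInMemory -= victimTable.size();
        victimTable.clear();
        spilled[victim] = true;
      }
    }

    Map<String, Long> result = new HashMap<>();
    for (int p = 0; p < numPartitions; p++) {
      if (!spilled[p]) {
        // Partitions that never spilled are already fully aggregated.
        result.putAll(inMemory.get(p));
      } else {
        // Second pass: revisit what was spilled. This sketch assumes each
        // spilled partition fits in memory; a real operator would recurse.
        Map<String, Long> table = new HashMap<>();
        for (Map.Entry<String, Long> row : spillFiles.get(p)) {
          table.merge(row.getKey(), row.getValue(), Long::sum);
        }
        result.putAll(table);
      }
    }
    return result;
  }
}
```

When the input almost fits, only one or two partitions are spilled and the rest are answered entirely from memory, which is the property the hybrid approach is after. The sketch works for SUM because partial sums can be merged; other aggregates would need their own mergeable intermediate state.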
[GitHub] drill pull request #721: DRILL-5172: Display elapsed time for queries in the...
Github user asfgit closed the pull request at: https://github.com/apache/drill/pull/721 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] drill pull request #696: DRILL-4558: BSonReader should prepare buffer size a...
Github user asfgit closed the pull request at: https://github.com/apache/drill/pull/696
[GitHub] drill pull request #695: DRILL-4868: fix how hive function set DrillBuf.
Github user asfgit closed the pull request at: https://github.com/apache/drill/pull/695
[GitHub] drill issue #695: DRILL-4868: fix how hive function set DrillBuf.
Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/695 +1
[GitHub] drill issue #696: DRILL-4558: BSonReader should prepare buffer size as actua...
Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/696 +1
[GitHub] drill pull request #687: DRILL-4996: Parquet Date auto-correction is not wor...
Github user asfgit closed the pull request at: https://github.com/apache/drill/pull/687
[GitHub] drill pull request #714: DRILL-4919: Fix select count(1) / count(*) on csv w...
Github user asfgit closed the pull request at: https://github.com/apache/drill/pull/714
[GitHub] drill pull request #708: DRILL-5152: Enhance the mock data source: better da...
Github user asfgit closed the pull request at: https://github.com/apache/drill/pull/708
[GitHub] drill pull request #715: DRILL-5105: remove buffer size checking in getBuffe...
Github user asfgit closed the pull request at: https://github.com/apache/drill/pull/715
[GitHub] drill issue #714: DRILL-4919: Fix select count(1) / count(*) on csv with hea...
Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/714 +1
[GitHub] drill issue #715: DRILL-5105: remove buffer size checking in getBuffers sinc...
Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/715 +1
[GitHub] drill issue #687: DRILL-4996: Parquet Date auto-correction is not working in...
Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/687 +1
[jira] [Created] (DRILL-5196) Could not run a single MongoDB unit test case through command line or IDE
Chunhui Shi created DRILL-5196:
--
Summary: Could not run a single MongoDB unit test case through command line or IDE
Key: DRILL-5196
URL: https://issues.apache.org/jira/browse/DRILL-5196
Project: Apache Drill
Issue Type: Bug
Reporter: Chunhui Shi
Assignee: Chunhui Shi

A single MongoDB unit test cannot be run through the IDE or the command line. The reason is that when a single test case is run, the MongoDB instance is not started, so a 'table not found' error is raised for 'mongo.employee.empinfo'.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-5195) Publish Operator and MajorFragment Stats in Profile page
Kunal Khatua created DRILL-5195:
---
Summary: Publish Operator and MajorFragment Stats in Profile page
Key: DRILL-5195
URL: https://issues.apache.org/jira/browse/DRILL-5195
Project: Apache Drill
Issue Type: Improvement
Components: Web Server
Affects Versions: 1.9.0
Reporter: Kunal Khatua
Assignee: Kunal Khatua

Currently, we show runtimes for major fragments, and min, max, and avg times for setup, processing, and waiting for various operators. It would be worthwhile to have additional stats for the following:

MajorFragment
  %Busy - Percentage of the active time of all the minor fragments within each major fragment during which they were busy.

Operator Profile
  %Busy - Percentage of the active time of all the fragments within each operator during which they were busy.
  Records - Total number of records propagated out by that operator.
[GitHub] drill pull request #696: DRILL-4558: BSonReader should prepare buffer size a...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/696#discussion_r96048977

--- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/store/bson/TestBsonRecordReader.java ---
@@ -103,6 +103,17 @@ public void testStringType() throws IOException {
   }

   @Test
+  public void testSpecialCharStringType() throws IOException {
+    BsonDocument bsonDoc = new BsonDocument();
+    bsonDoc.append("stringKey", new BsonString("§§§§§§§§§1"));
--- End diff --

Might want to put your value in a string, convert it to a byte buf, and assert that the byte buf length differs from the string length. This will verify that this test is, in fact, testing the problem case.
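The suggested check could look roughly like the sketch below. It assumes the encoding of interest is UTF-8 (the encoding BSON uses for strings); the class name `Utf8LengthCheck` and the helper `utf8Length` are hypothetical and not part of the pull request. The test value is written with `\u00A7` escapes so the example does not depend on the source file's encoding.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper illustrating the reviewer's suggestion: confirm that
// the test string's UTF-8 byte length differs from its character count.
class Utf8LengthCheck {

  // Number of bytes the string occupies when encoded as UTF-8.
  static int utf8Length(String s) {
    return s.getBytes(StandardCharsets.UTF_8).length;
  }

  public static void main(String[] args) {
    // Same value as in the diff: nine section signs followed by '1'.
    String value = "\u00A7\u00A7\u00A7\u00A7\u00A7\u00A7\u00A7\u00A7\u00A71";
    int chars = value.length();
    int bytes = utf8Length(value);
    if (chars == bytes) {
      throw new AssertionError("test value does not exercise the multi-byte case");
    }
    System.out.println("chars=" + chars + ", utf8 bytes=" + bytes);
  }
}
```

Each section sign (U+00A7) encodes to two bytes in UTF-8, so a buffer sized from the character count would be too small for this value, which is the case the test is meant to cover.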
[GitHub] drill issue #714: DRILL-4919: Fix select count(1) / count(*) on csv with hea...
Github user gparai commented on the issue: https://github.com/apache/drill/pull/714 +1 LGTM
[GitHub] drill issue #685: Drill 5043: Function that returns a unique id per session/...
Github user nagarajanchinnasamy commented on the issue: https://github.com/apache/drill/pull/685 Sent a mail to Dev list with the findings... issue yet to be resolved: [Link to the mail](http://mail-archives.apache.org/mod_mbox/drill-dev/201701.mbox/%3ccacd9g6hwrtfckd_2jhmrext_tb0zwcrxtb7j5digkmy9atk...@mail.gmail.com%3E)