I played with the plugin today as well, and overall it's very good. I tried the queries from http://docs.mongodb.org/manual/tutorial/aggregation-zip-code-data-set/ on the zip code dataset, and all the aggregate queries worked.
-----------
select sum(pop) from zipcodes where city='SEATTLE';
select state, city, sum(pop) from zipcodes group by state,city order by sum(pop) asc limit 1;
select state,city,avg(pop) from zipcodes group by state, city;
select city, sum(pop) from zipcodes group by city order by sum(pop) asc limit 1;
select state,sum(pop) from zipcodes group by state having sum(pop) > 10000000;
----------

I did, however, notice issues when querying repeated elements (using the USDA nutrition dataset), especially elements nested more than one level deep, as well as with JOINs (example queries are below).

------------------
0: jdbc:drill:zk=local> SELECT t1.first_name FROM mongo.employee.`empinfo` t1 JOIN mongo.employee.`empinfo` t2 ON t1.`employee_id` = t2.`employee_id`;
Query failed: Failure while setting up Foreman. Internal error: Error while applying rule DrillPushProjIntoScan, args [rel#12606:ProjectRel.NONE.ANY([]).[](child=rel#12598:Subset#0.ENUMERABLE.ANY([]).[],employee_id=$1,first_name=$2), rel#12594:EnumerableTableAccessRel.ENUMERABLE.ANY([]).[](table=[mongo, employee, empinfo])] [08f4eedd-f5c9-4ebf-8d5b-d9249b79ca32]

0: jdbc:drill:zk=local> select t.nutrients from mongo.usda.nutrition t limit 1;
Query failed: Screen received stop request sent. You tried to write a BigInt type when you are using a ValueWriter of type NullableFloat8WriterImpl. [dc44e277-1b1d-4f00-b60e-9f06b883e7c5]
Error: exception while executing query: Failure while trying to get next result batch. (state=,code=0)

0: jdbc:drill:zk=local> select t.nutrients[0].units from mongo.usda.nutrition t limit 1;
Query failed: Screen received stop request sent. You tried to write a BigInt type when you are using a ValueWriter of type NullableFloat8WriterImpl. [a285c85e-4607-48fc-97af-41b5726459e2]
Error: exception while executing query: Failure while trying to get next result batch.
(state=,code=0)

On Fri, Sep 26, 2014 at 6:07 PM, Jinfeng Ni <j...@maprtech.com> wrote:

> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/25996/#review54756
> -----------------------------------------------------------
>
> Ship it!
>
>
> I did not do a detailed code review; I'll leave that task to Steven. I mainly
> played with this Mongo plugin. Overall it looks good.
>
> Basically, I started a mongodb instance, imported the data, and ran several
> single-table queries, and all of them worked perfectly.
>
> Some issues I saw when playing around:
>
> 1. The result of select * does not seem to be the expected answer: it returns a
> map containing all the columns:
>
> SELECT * FROM mongo.employee.`empinfo` limit 2;
> +------------+
> | * |
> +------------+
> | { "employee_id" : 1101 , "full_name" : "Steve Eurich" , "first_name" : "Steve" , "last_name" : "Eurich" , "position_id" : 16 , "position" : "Store T" , "isFTE" : true} |
> | { "employee_id" : 1102 , "full_name" : "Mary Pierson" , "first_name" : "Mary" , "last_name" : "Pierson" , "position_id" : 16 , "position" : "Store T" , "isFTE" : true} |
> +------------+
> 2 rows selected (0.084 seconds)
>
> In contrast, here is the result when Drill queries a .json file:
>
> select * from cp.`employee.json` limit 2;
>
> +-------------+------------+------------+------------+-------------+----------------+------------+---------------+------------+------------+------------+---------------+-----------------+----------------+------------+-----------------+
> | employee_id | full_name | first_name | last_name | position_id | position_title | store_id | department_id | birth_date | hire_date | salary | supervisor_id | education_level | marital_status | gender | management_role |
> +-------------+------------+------------+------------+-------------+----------------+------------+---------------+------------+------------+------------+---------------+-----------------+----------------+------------+-----------------+
> | 1 | Sheri Nowmer | Sheri | Nowmer | 1 | President | 0 | 1 | 1961-08-26 | 1994-12-01 00:00:00.0 | 80000.0 | 0 | Graduate Degree | S | F | Senior Management |
> | 2 | Derrick Whelply | Derrick | Whelply | 2 | VP Country Manager | 0 | 1 | 1915-07-03 | 1994-12-01 00:00:00.0 | 40000.0 | 1 | Graduate Degree | M | M | Senior Management |
> +-------------+------------+------------+------------+-------------+----------------+------------+---------------+------------+------------+------------+---------------+-----------------+----------------+------------+-----------------+
> 2 rows selected (0.39 seconds)
>
>
> 2. Joining two mongodb tables fails.
>
> SELECT t1.first_name, t2.last_name FROM mongo.employee.`empinfo` t1, mongo.employee.`empinfo` t2 where t1.`employee_id` = t2.`employee_id` limit 1;
> Query failed: Failure while setting up Foreman. Internal error: while converting `t1`.`employee_id` = `t2`.`employee_id` [39eb6c88-fd21-4514-8903-48d99210b88d]
>
> 3. Joining a mongodb table with a table from another storage engine fails
> with CanNotPlanException:
>
> SELECT t1.first_name, t2.last_name FROM mongo.employee.`empinfo` t1, mongo.employee.`empinfo` t2 where t1.`employee_id` = t2.`employee_id` limit 1;
> Query failed: Failure while setting up Foreman. Internal error: while converting `t1`.`employee_id` = `t2`.`employee_id` [39eb6c88-fd21-4514-8903-48d99210b88d]
>
> Error: exception while executing query: Failure while trying to get next result batch. (state=,code=0)
> 0: jdbc:drill:zk=local> SELECT t1.first_name, t1.last_name FROM mongo.employee.`empinfo` as t1, cp.`employee.json` t2 where t1.employee_id = t2.employee_id limit 10;
> Query failed: Failure while parsing sql. Node [rel#2496:Subset#5.LOGICAL.ANY([]).[]] could not be implemented; planner state:
>
> Root: rel#2496:Subset#5.LOGICAL.ANY([]).[]
> Original rel:
> ......
>
> 4. Select *, regular_column from mongodb returns the regular_column
> as null.
>
> 0: jdbc:drill:zk=local> SELECT first_name FROM mongo.employee.`empinfo` limit 2;
> +------------+
> | first_name |
> +------------+
> | Steve |
> | Mary |
> +------------+
> 2 rows selected (0.084 seconds)
> 0: jdbc:drill:zk=local> SELECT *, first_name FROM mongo.employee.`empinfo` limit 2;
> +------------+------------+
> | * | first_name |
> +------------+------------+
> | { "employee_id" : 1101 , "full_name" : "Steve Eurich" , "first_name" : "Steve" , "last_name" : "Eurich" , "position_id" : 16 , "position" : "Store T" , "isFTE" : true} | null |
> | { "employee_id" : 1102 , "full_name" : "Mary Pierson" , "first_name" : "Mary" , "last_name" : "Pierson" , "position_id" : 16 , "position" : "Store T" , "isFTE" : true} | null |
> +------------+------------+
>
>
> I think it would be fine to fix those issues in the next release.
>
>
> PS: could you please re-build a patch after rebasing on the recent master
> branch?
>
> - Jinfeng Ni
>
>
> On Sept. 24, 2014, 11:06 a.m., Anil Kumar B wrote:
> >
> > -----------------------------------------------------------
> > This is an automatically generated e-mail. To reply, visit:
> > https://reviews.apache.org/r/25996/
> > -----------------------------------------------------------
> >
> > (Updated Sept. 24, 2014, 11:06 a.m.)
> >
> >
> > Review request for drill, Aditya Kishore, Jacques Nadeau, and Kamesh B.
> >
> >
> > Repository: drill-git
> >
> >
> > Description
> > -------
> >
> > Mongo storage plugin support: The features we implemented as part of this are as follows.
> > 1) Support for sharded (chunk-wise), sharded-replicated (chunk-wise), replicated, and stand-alone deployments
> > 2) Predicate pushdown
> > 3) Mongo PStore
> >
> > MongoRecordReader uses JsonReaderWithState in the case of non-star queries.
> >
> >
> > Diffs
> > -----
> >
> >   contrib/pom.xml 728038a
> >   contrib/storage-mongo/pom.xml PRE-CREATION
> >   contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/DrillMongoConstants.java PRE-CREATION
> >   contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/MongoCnxnManager.java PRE-CREATION
> >   contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/MongoCompareFunctionProcessor.java PRE-CREATION
> >   contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/MongoFilterBuilder.java PRE-CREATION
> >   contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/MongoGroupScan.java PRE-CREATION
> >   contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/MongoPushDownFilterForScan.java PRE-CREATION
> >   contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/MongoRecordReader.java PRE-CREATION
> >   contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/MongoScanBatchCreator.java PRE-CREATION
> >   contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/MongoScanSpec.java PRE-CREATION
> >   contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/MongoStoragePlugin.java PRE-CREATION
> >   contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/MongoStoragePluginConfig.java PRE-CREATION
> >   contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/MongoSubScan.java PRE-CREATION
> >   contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/MongoUtils.java PRE-CREATION
> >   contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/common/ChunkInfo.java PRE-CREATION
> >   contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/common/MongoCompareOp.java PRE-CREATION
> >   contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/config/MongoPStore.java PRE-CREATION
> >   contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/config/MongoPStoreProvider.java PRE-CREATION
> >   contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/schema/MongoDatabaseSchema.java PRE-CREATION
> >   contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/schema/MongoSchemaFactory.java PRE-CREATION
> >   contrib/storage-mongo/src/main/resources/bootstrap-storage-plugins.json PRE-CREATION
> >   contrib/storage-mongo/src/main/resources/drill-module.conf PRE-CREATION
> >   contrib/storage-mongo/src/test/java/org/apache/drill/exec/store/mongo/TestMongoChunkAssignment.java PRE-CREATION
> >   distribution/pom.xml cd5df0d
> >   distribution/src/assemble/bin.xml 86e3802
> >   exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java 933bfbe
> >   exec/java-exec/src/main/java/org/apache/drill/exec/server/options/SystemOptionManager.java 4fa61e1
> >   exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/fn/JsonReader.java 4e12b8b
> >   exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/fn/JsonReaderWithState.java ef995f8
> >
> > Diff: https://reviews.apache.org/r/25996/diff/
> >
> >
> > Testing
> > -------
> >
> > 1) Tested various sets of queries in sharded, replicated, and stand-alone modes.
> >
> > 2) Test environment details: We created a mongo cluster with 2 shards and a collection consisting of 35 chunks (18 chunks on one shard and the remaining chunks on the other shard). Below are a few of the queries we tested in all the environments.
> >
> > a) SELECT * FROM mongo.employee.`empinfo` limit 10;
> >
> > b) SELECT first_name, last_name FROM mongo.employee.`empinfo` limit 10;
> >
> > c) SELECT first_name, last_name FROM mongo.employee.`empinfo` where employee_id = 1111;
> >
> > d) SELECT * FROM mongo.employee.`empinfo` where full_name = 'Phil Munoz';
> >
> > e) SELECT first_name, last_name, position_id FROM mongo.employee.`empinfo` where employee_id = 1111 OR position_id = 16;
> >
> > g) SELECT first_name, last_name FROM mongo.employee.`empinfo` where isFTE = true;
> >
> > h) SELECT first_name, last_name, position_id FROM mongo.employee.`empinfo` where employee_id = 1107 AND position_id = 17 AND last_name = 'Yonce';
> >
> >
> > 3) PStore functionality not fully tested.
> >
> >
> > Thanks,
> >
> > Anil Kumar B