Btw I got the dataset JSON from http://docs.mongodb.org/manual/tutorial/aggregation-zip-code-data-set/
On Tue, Sep 30, 2014 at 4:30 PM, Neeraja Rentachintala < [email protected]> wrote: > Hi Anil, Kamesh > > I was trying MongoDB plugin with the following queries and all of them > succeeded. This is using the zip codes dataset. > However intermittently (now consistently) , these are failing. I built > Kamesh branch and also tried from the latest 0.6 master (which merged the > plugin). > Do you have any insights into this. I can file JIRA if you like. > > select state,sum(pop) from zipcodes group by state having sum(pop) > > 10000000 order by sum(pop) desc; > > select state,city,avg(pop) from zipcodes group by state, city; > > select city, sum(pop) from zipcodes where state is not null group by city > order by sum(pop) desc limit 1; > > select city, sum(pop) from zipcodes where state is not null group by city > order by sum(pop) asc limit 1; > > --------------- > > > Errors > > ------------------- > > select sum(pop) from zipcodes where city=‘CHICAGO’; > > 0: jdbc:drill:zk=local> select state, city, sum(pop) from zipcodes group > by state, city order by sum(pop) limit 5; > > Query failed: Failure while running fragment. Invalid value for boolean: > 15338 [1366240a-125d-4646-962a-1734f84b03b3] > > > Error: exception while executing query: Failure while trying to get next > result batch. (state=,code=0) > > 0: jdbc:drill:zk=local> select state, city, sum(pop) from zipcodes where > state is not null and city is not null group by state, city order by > sum(pop) limit 5; > > Query failed: Failure while running fragment. Invalid value for boolean: > 15338 [53a1c241-82f0-4413-aa09-39f851d7209a] > > > Error: exception while executing query: Failure while trying to get next > result batch. (state=,code=0) > > 0: jdbc:drill:zk=local> select state, city, sum(pop) from zipcodes group > by state,city order by sum(pop) asc limit 1; > > Query failed: Failure while running fragment. Invalid value for boolean: > 15338 [911f48f7-019e-42f2-b0af-1e255416ce76] > > > Error: exception while executing query: Failure while trying to get next > result batch. (state=,code=0) > > 0: jdbc:drill:zk=local> select state,city,avg(pop) from zipcodes group by > state, city; > > Query failed: Failure while running fragment. Invalid value for boolean: > 15338 [54c5e480-972a-49b9-860e-60d4a7429366] > > > Error: exception while executing query: Failure while trying to get next > result batch. (state=,code=0) > > 0: jdbc:drill:zk=local> select city, sum(pop) from zipcodes group by city > order by sum(pop) asc limit 1; > > Query failed: Failure while running fragment. Invalid value for boolean: > 15338 [b5374f4a-3ee6-47ac-9dc1-fcebed555518] > > > Error: exception while executing query: Failure while trying to get next > result batch. (state=,code=0) > > 0: jdbc:drill:zk=local> select state,sum(pop) from zipcodes group by state > having sum(pop) > 10000000; > > Query failed: Failure while running fragment. Invalid value for boolean: > 15338 [5875169d-dcb8-419f-991e-d38d41dddcbb] > > > Error: exception while executing query: Failure while trying to get next > result batch. (state=,code=0) > > On Sat, Sep 27, 2014 at 1:07 AM, Kamesh <[email protected]> wrote: > >> Thanks Jinfeng & Neeraja for looking into this. >> We will look into the above mentioned issues. >> >> >> >> On Sat, Sep 27, 2014 at 8:28 AM, Neeraja Rentachintala < >> [email protected]> wrote: >> >>> I have played with the plugin as well today and overall its very good. >>> >>> I tried the queries >>> http://docs.mongodb.org/manual/tutorial/aggregation-zip-code-data-set/ >>> on the zip code dataset and all the aggregate queries worked. >>> >>> >>> ----------- >>> >>> select sum(pop) from zipcodes where city='SEATTLE’; >>> >>> select state, city, sum(pop) from zipcodes group by state,city order by >>> sum(pop) asc limit 1; >>> >>> select state,city,avg(pop) from zipcodes group by state, city; >>> >>> select city, sum(pop) from zipcodes group by city order by sum(pop) asc >>> limit 1; >>> >>> select state,sum(pop) from zipcodes group by state having sum(pop) > >>> 10000000; >>> >>> >>> ---------- >>> >>> >>> I however noticed issues with querying repeating elements (used USDA >>> nutrition dataset), especially more than one level nested as well as JOINs >>> (example queries are below) >>> >>> ------------------ >>> >>> 0: jdbc:drill:zk=local> SELECT t1.first_name FROM >>> mongo.employee.`empinfo` t1 JOIN mongo.employee.`empinfo` t2 ON >>> t1.`employee_id` = t2.`employee_id`; >>> >>> Query failed: Failure while setting up Foreman. Internal error: Error >>> while applying rule DrillPushProjIntoScan, args >>> [rel#12606:ProjectRel.NONE.ANY([]).[](child=rel#12598:Subset#0.ENUMERABLE.ANY([]).[],employee_id=$1,first_name=$2), >>> rel#12594:EnumerableTableAccessRel.ENUMERABLE.ANY([]).[](table=[mongo, >>> employee, empinfo])] [08f4eedd-f5c9-4ebf-8d5b-d9249b79ca32] >>> >>> >>> 0: jdbc:drill:zk=local> select t.nutrients from mongo.usda.nutrition t >>> limit 1; >>> >>> Query failed: Screen received stop request sent. You tried to write a >>> BigInt type when you are using a ValueWriter of type >>> NullableFloat8WriterImpl. [dc44e277-1b1d-4f00-b60e-9f06b883e7c5] >>> >>> >>> Error: exception while executing query: Failure while trying to get next >>> result batch. (state=,code=0) >>> >>> 0: jdbc:drill:zk=local> select t.nutrients[0].units from >>> mongo.usda.nutrition t limit 1; >>> >>> Query failed: Screen received stop request sent. You tried to write a >>> BigInt type when you are using a ValueWriter of type >>> NullableFloat8WriterImpl. [a285c85e-4607-48fc-97af-41b5726459e2] >>> >>> >>> Error: exception while executing query: Failure while trying to get next >>> result batch. (state=,code=0) >>> >>> >>> >>> On Fri, Sep 26, 2014 at 6:07 PM, Jinfeng Ni <[email protected]> wrote: >>> >>>> >>>> ----------------------------------------------------------- >>>> This is an automatically generated e-mail. To reply, visit: >>>> https://reviews.apache.org/r/25996/#review54756 >>>> ----------------------------------------------------------- >>>> >>>> Ship it! >>>> >>>> >>>> I did not do a detail code review; let that task to Steven. I mainly >>>> played with this Mongo plugin. Overall it looks good. >>>> >>>> Basically, I start a mongodb instance, import the data, and run several >>>> single table queryies, and all of them work perfectly. >>>> >>>> Some issues I saw when playing around : >>>> >>>> 1. The result of select * seems not the expect answer : it would return >>>> a map containing all the columns: >>>> >>>> SELECT * FROM mongo.employee.`empinfo` limit 2; >>>> +------------+ >>>> | * | >>>> +------------+ >>>> | { "employee_id" : 1101 , "full_name" : "Steve Eurich" , "first_name" >>>> : "Steve" , "last_name" : "Eurich" , "position_id" : 16 , "position" : >>>> "Store T" , "isFTE" : true} | >>>> | { "employee_id" : 1102 , "full_name" : "Mary Pierson" , "first_name" >>>> : "Mary" , "last_name" : "Pierson" , "position_id" : 16 , "position" : >>>> "Store T" , "isFTE" : true} | >>>> +------------+ >>>> 2 rows selected (0.084 seconds) >>>> >>>> In contrast, here is the result when Drill queries a .json file: >>>> >>>> select * from cp.`employee.json` limit 2; >>>> >>>> +-------------+------------+------------+------------+-------------+----------------+------------+---------------+------------+------------+------------+---------------+-----------------+----------------+------------+-----------------+ >>>> | employee_id | full_name | first_name | last_name | position_id | >>>> position_title | store_id | department_id | birth_date | hire_date | >>>> salary | supervisor_id | education_level | marital_status | gender | >>>> management_role | >>>> >>>> +-------------+------------+------------+------------+-------------+----------------+------------+---------------+------------+------------+------------+---------------+-----------------+----------------+------------+-----------------+ >>>> | 1 | Sheri Nowmer | Sheri | Nowmer | 1 | >>>> President | 0 | 1 | 1961-08-26 | 1994-12-01 >>>> 00:00:00.0 | 80000.0 | 0 | Graduate Degree | S >>>> | F | Senior Management | >>>> | 2 | Derrick Whelply | Derrick | Whelply | 2 >>>> | VP Country Manager | 0 | 1 | 1915-07-03 | >>>> 1994-12-01 00:00:00.0 | 40000.0 | 1 | Graduate Degree | M >>>> | M | Senior Management | >>>> >>>> +-------------+------------+------------+------------+-------------+----------------+------------+---------------+------------+------------+------------+---------------+-----------------+----------------+------------+-----------------+ >>>> 2 rows selected (0.39 seconds) >>>> >>>> >>>> 2. Join two mongodb tables would fail. >>>> >>>> SELECT t1.first_name, t2.last_name FROM mongo.employee.`empinfo` t1, >>>> mongo.employee.`empinfo` t2 where t1.`employee_id` = t2.`employee_id` limit >>>> 1; >>>> Query failed: Failure while setting up Foreman. Internal error: while >>>> converting `t1`.`employee_id` = `t2`.`employee_id` >>>> [39eb6c88-fd21-4514-8903-48d99210b88d] >>>> >>>> 3. Join a mongodb table with a table with other storage engine would >>>> fail with CanNotPlanException: >>>> >>>> SELECT t1.first_name, t2.last_name FROM mongo.employee.`empinfo` t1, >>>> mongo.employee.`empinfo` t2 where t1.`employee_id` = t2.`employee_id` limit >>>> 1; >>>> Query failed: Failure while setting up Foreman. Internal error: while >>>> converting `t1`.`employee_id` = `t2`.`employee_id` >>>> [39eb6c88-fd21-4514-8903-48d99210b88d] >>>> >>>> Error: exception while executing query: Failure while trying to get >>>> next result batch. (state=,code=0) >>>> 0: jdbc:drill:zk=local> SELECT t1.first_name, t1.last_name FROM >>>> mongo.employee.`empinfo` as t1, cp.`employee.json` t2 where t1.employee_id >>>> = t2.employee_id limit 10; >>>> Query failed: Failure while parsing sql. Node >>>> [rel#2496:Subset#5.LOGICAL.ANY([]).[]] could not be implemented; planner >>>> state: >>>> >>>> Root: rel#2496:Subset#5.LOGICAL.ANY([]).[] >>>> Original rel: >>>> ...... >>>> >>>> 4. Select *, regular_column from mongodb would return the >>>> regular_column as null. >>>> >>>> 0: jdbc:drill:zk=local> SELECT first_name FROM mongo.employee.`empinfo` >>>> limit 2; >>>> +------------+ >>>> | first_name | >>>> +------------+ >>>> | Steve | >>>> | Mary | >>>> +------------+ >>>> 2 rows selected (0.084 seconds) >>>> 0: jdbc:drill:zk=local> SELECT *, first_name FROM >>>> mongo.employee.`empinfo` limit 2; >>>> +------------+------------+ >>>> | * | first_name | >>>> +------------+------------+ >>>> | { "employee_id" : 1101 , "full_name" : "Steve Eurich" , "first_name" >>>> : "Steve" , "last_name" : "Eurich" , "position_id" : 16 , "position" : >>>> "Store T" , "isFTE" : true} | null | >>>> | { "employee_id" : 1102 , "full_name" : "Mary Pierson" , "first_name" >>>> : "Mary" , "last_name" : "Pierson" , "position_id" : 16 , "position" : >>>> "Store T" , "isFTE" : true} | null | >>>> +------------+------------+ >>>> >>>> >>>> >>>> I think it would be fine to fix those issues in the next release. >>>> >>>> >>>> PS: could you please re-build a patch after rebasing on the recent >>>> master branch? >>>> >>>> - Jinfeng Ni >>>> >>>> >>>> On Sept. 24, 2014, 11:06 a.m., Anil Kumar B wrote: >>>> > >>>> > ----------------------------------------------------------- >>>> > This is an automatically generated e-mail. To reply, visit: >>>> > https://reviews.apache.org/r/25996/ >>>> > ----------------------------------------------------------- >>>> > >>>> > (Updated Sept. 24, 2014, 11:06 a.m.) >>>> > >>>> > >>>> > Review request for drill, Aditya Kishore, Jacques Nadeau, and Kamesh >>>> B. >>>> > >>>> > >>>> > Repository: drill-git >>>> > >>>> > >>>> > Description >>>> > ------- >>>> > >>>> > Mongo storage plugin support: The features which we implemented as >>>> part of this is as follows. >>>> > 1) Support for sharded(chunk wise), shared-replicated(chunk wise), >>>> replicated, stand-alone >>>> > 2) Predicate pushdown >>>> > 3) Mongo PStore >>>> > >>>> > MongoRecordReader uses JsonReaderWithState in the case of non-star >>>> queries. >>>> > >>>> > >>>> > Diffs >>>> > ----- >>>> > >>>> > contrib/pom.xml 728038a >>>> > contrib/storage-mongo/pom.xml PRE-CREATION >>>> > >>>> >>>> contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/DrillMongoConstants.java >>>> PRE-CREATION >>>> > >>>> >>>> contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/MongoCnxnManager.java >>>> PRE-CREATION >>>> > >>>> >>>> contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/MongoCompareFunctionProcessor.java >>>> PRE-CREATION >>>> > >>>> >>>> contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/MongoFilterBuilder.java >>>> PRE-CREATION >>>> > >>>> >>>> contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/MongoGroupScan.java >>>> PRE-CREATION >>>> > >>>> >>>> contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/MongoPushDownFilterForScan.java >>>> PRE-CREATION >>>> > >>>> >>>> contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/MongoRecordReader.java >>>> PRE-CREATION >>>> > >>>> >>>> contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/MongoScanBatchCreator.java >>>> PRE-CREATION >>>> > >>>> >>>> contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/MongoScanSpec.java >>>> PRE-CREATION >>>> > >>>> >>>> contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/MongoStoragePlugin.java >>>> PRE-CREATION >>>> > >>>> >>>> contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/MongoStoragePluginConfig.java >>>> PRE-CREATION >>>> > >>>> >>>> contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/MongoSubScan.java >>>> PRE-CREATION >>>> > >>>> >>>> contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/MongoUtils.java >>>> PRE-CREATION >>>> > >>>> >>>> contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/common/ChunkInfo.java >>>> PRE-CREATION >>>> > >>>> >>>> contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/common/MongoCompareOp.java >>>> PRE-CREATION >>>> > >>>> >>>> contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/config/MongoPStore.java >>>> PRE-CREATION >>>> > >>>> >>>> contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/config/MongoPStoreProvider.java >>>> PRE-CREATION >>>> > >>>> >>>> contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/schema/MongoDatabaseSchema.java >>>> PRE-CREATION >>>> > >>>> >>>> contrib/storage-mongo/src/main/java/org/apache/drill/exec/store/mongo/schema/MongoSchemaFactory.java >>>> PRE-CREATION >>>> > >>>> contrib/storage-mongo/src/main/resources/bootstrap-storage-plugins.json >>>> PRE-CREATION >>>> > contrib/storage-mongo/src/main/resources/drill-module.conf >>>> PRE-CREATION >>>> > >>>> >>>> contrib/storage-mongo/src/test/java/org/apache/drill/exec/store/mongo/TestMongoChunkAssignment.java >>>> PRE-CREATION >>>> > distribution/pom.xml cd5df0d >>>> > distribution/src/assemble/bin.xml 86e3802 >>>> > >>>> exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java >>>> 933bfbe >>>> > >>>> >>>> exec/java-exec/src/main/java/org/apache/drill/exec/server/options/SystemOptionManager.java >>>> 4fa61e1 >>>> > >>>> >>>> exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/fn/JsonReader.java >>>> 4e12b8b >>>> > >>>> >>>> exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/fn/JsonReaderWithState.java >>>> ef995f8 >>>> > >>>> > Diff: https://reviews.apache.org/r/25996/diff/ >>>> > >>>> > >>>> > Testing >>>> > ------- >>>> > >>>> > 1) Tested various set of queries on sharded, replicated and >>>> stand-alone modes. >>>> > >>>> > 2) Test Environment details: We created mongo cluster with 2 shards >>>> with a collections consists of 35 chunks(18 chunks are one shard and >>>> remaining chunks on on other shard). Below are the few queries which we >>>> tested in all the environments. >>>> > >>>> > a) SELECT * FROM mongo.employee.`empinfo` limit 10; >>>> > >>>> > b) SELECT first_name, last_name FROM mongo.employee.`empinfo` >>>> limit 10; >>>> > >>>> > c) SELECT first_name, last_name FROM mongo.employee.`empinfo` >>>> where employee_id = 1111; >>>> > >>>> > d) SELECT * FROM mongo.employee.`empinfo` where full_name = >>>> 'Phil Munoz'; >>>> > >>>> > e) SELECT first_name, last_name, position_id FROM >>>> mongo.employee.`empinfo` where employee_id = 1111 OR position_id = 16; >>>> > >>>> > g) SELECT first_name, last_name FROM mongo.employee.`empinfo` >>>> where isFTE = true; >>>> > >>>> > h) SELECT first_name, last_name, position_id FROM >>>> mongo.employee.`empinfo` where employee_id = 1107 AND position_id = 17 AND >>>> last_name = 'Yonce'; >>>> > >>>> > >>>> > 3) PStore functionality not fully tested. >>>> > >>>> > >>>> > Thanks, >>>> > >>>> > Anil Kumar B >>>> > >>>> > >>>> >>>> >>> >> >> >> -- >> Kamesh. >> > >
