Re: query for point in time
I think there's something wrong with your database then, but okay. You still haven't said what your Solr schema looks like -- that list of values doesn't say what the solr field names or types are. I think this is maybe because you don't actually have a Solr database and have no idea how Solr works, you're just asking in theory? On the other hand, you just said you have better performance with solr -- I'm not sure how you were able to tell the performance of solr in answering these queries if you don't even know how to make them! But, again, assuming your data is set up like i'm guessing it is, it's quite similar to what you'd do with an rdbms. What does 'most current' mean? Can jobs be overlapping? To find the project with the latest start date for a given person, just limit to documents with that current person in a 'q' or 'fq', and then sort by "start_date desc". Perhaps limit to 1 if you really only want one hit. Same principle as you would in an rdbms. Again, this requires setting up your solr index in such a way to answer these sorts of questions. Each "document" in Solr will represent a person-project pair. It'll have fields for person (or multiple fields, personID, personFirst, personLast, etc), project name, project start date, project end date. This will make it easy/possible to answer questions like your examples with Solr, but will make it hard to answer many other sorts of questions -- unlike an rdbms, it is difficult to set up a Solr index that can flexibly answer just about any question you through at it, particularly when you have hieararchical or otherwise multi-entity data. If you are interested, the standard Solr tutorial is pretty good: http://lucene.apache.org/solr/tutorial.html On 9/15/2011 6:39 PM, gary tam wrote: Thanks for the reply. We had the search within the database initially, but it proven to be too slow. With solr we have much better performance. One more question, how could I find the most current job for each employee My data looks like John Smith department A web site bug fix 2010-01-01 2010-01-03 unit testing 2010-01-04 2010-01-06 QA support 2010-01-07 2010-01-12 implementation 2010-01-13 2010-01-22 Jane Doe department A QA support 2010-01-01 2010-05-01 implementation 2010-05-02 2010-09-28 Joe Doe department APHP development 2011-01-01 2011-08-31 Java Development 2011-09-01 2011-09-15 I would like to return this as my search result John Smith department Aimplementation 2010-01-13 2010-01-22 Jane Doe department Aimplementation 2010-05-02 2010-09-28 Joe Doedepartment AJava Development 2011-09-01 2011-09-15 Thanks in advance Gary On Thu, Sep 15, 2011 at 3:33 PM, Jonathan Rochkind wrote: You didn't tell us what your schema looks like, what fields with what types are involved. But similar to how you'd do it in your database, you need to find 'documents' that have a start date before your date in question, and an end date after your date in question, to find the ones whose range includes your date in question. Something like this: q=start_date:[* TO '2010-01-05'] AND end_date:['2010-01-05' TO *] Of course, you need to add on your restriction to just documents about 'John Smith', through another AND clause or an 'fq'. But in general, if you've got a db with this info already, and this is all you need, why not just use the db? Multi-hieararchy data like this is going to give you trouble in Solr eventually, you've got to arrange the solr indexes/schema to answer your questions, and eventually you're going to have two questions which require mutually incompatible schema to answer. An rdbms is a great general purpose question answering tool for structured data. lucene/Solr is a great indexing tool for text matching. On 9/15/2011 2:55 PM, gary tam wrote: Hi I have a scenario that I am not sure how to write the query for. Here is the scenario - have an employee record with multi value for project, started date, end date. looks something like John Smith web site bug fix 2010-01-01 2010-01-03 unit testing 2010-01-04 2010-01-06 QA support 2010-01-07 2010-01-12 implementation 2010-01-13 2010-01-22 I want to find what project John Smith was working on 2010-01-05 Is this possible or I have to back to my database ? Thanks
Re: query for point in time
Thanks for the reply. We had the search within the database initially, but it proven to be too slow. With solr we have much better performance. One more question, how could I find the most current job for each employee My data looks like John Smith department A web site bug fix 2010-01-01 2010-01-03 unit testing 2010-01-04 2010-01-06 QA support 2010-01-07 2010-01-12 implementation 2010-01-13 2010-01-22 Jane Doe department A QA support 2010-01-01 2010-05-01 implementation 2010-05-02 2010-09-28 Joe Doe department APHP development 2011-01-01 2011-08-31 Java Development 2011-09-01 2011-09-15 I would like to return this as my search result John Smith department Aimplementation 2010-01-13 2010-01-22 Jane Doe department Aimplementation 2010-05-02 2010-09-28 Joe Doedepartment AJava Development 2011-09-01 2011-09-15 Thanks in advance Gary On Thu, Sep 15, 2011 at 3:33 PM, Jonathan Rochkind wrote: > You didn't tell us what your schema looks like, what fields with what types > are involved. > > But similar to how you'd do it in your database, you need to find > 'documents' that have a start date before your date in question, and an end > date after your date in question, to find the ones whose range includes your > date in question. > > Something like this: > > q=start_date:[* TO '2010-01-05'] AND end_date:['2010-01-05' TO *] > > Of course, you need to add on your restriction to just documents about > 'John Smith', through another AND clause or an 'fq'. > > But in general, if you've got a db with this info already, and this is all > you need, why not just use the db? Multi-hieararchy data like this is going > to give you trouble in Solr eventually, you've got to arrange the solr > indexes/schema to answer your questions, and eventually you're going to have > two questions which require mutually incompatible schema to answer. > > An rdbms is a great general purpose question answering tool for structured > data. lucene/Solr is a great indexing tool for text matching. > > > On 9/15/2011 2:55 PM, gary tam wrote: > >> Hi >> >> I have a scenario that I am not sure how to write the query for. >> >> Here is the scenario - have an employee record with multi value for >> project, >> started date, end date. >> >> looks something like >> >> >> John Smith web site bug fix 2010-01-01 2010-01-03 >> unit testing 2010-01-04 >> 2010-01-06 >> QA support 2010-01-07 >> 2010-01-12 >> implementation 2010-01-13 >> 2010-01-22 >> >> I want to find what project John Smith was working on 2010-01-05 >> >> Is this possible or I have to back to my database ? >> >> >> Thanks >> >>
Re: query for point in time
You didn't tell us what your schema looks like, what fields with what types are involved. But similar to how you'd do it in your database, you need to find 'documents' that have a start date before your date in question, and an end date after your date in question, to find the ones whose range includes your date in question. Something like this: q=start_date:[* TO '2010-01-05'] AND end_date:['2010-01-05' TO *] Of course, you need to add on your restriction to just documents about 'John Smith', through another AND clause or an 'fq'. But in general, if you've got a db with this info already, and this is all you need, why not just use the db? Multi-hieararchy data like this is going to give you trouble in Solr eventually, you've got to arrange the solr indexes/schema to answer your questions, and eventually you're going to have two questions which require mutually incompatible schema to answer. An rdbms is a great general purpose question answering tool for structured data. lucene/Solr is a great indexing tool for text matching. On 9/15/2011 2:55 PM, gary tam wrote: Hi I have a scenario that I am not sure how to write the query for. Here is the scenario - have an employee record with multi value for project, started date, end date. looks something like John Smith web site bug fix 2010-01-01 2010-01-03 unit testing 2010-01-04 2010-01-06 QA support 2010-01-07 2010-01-12 implementation 2010-01-13 2010-01-22 I want to find what project John Smith was working on 2010-01-05 Is this possible or I have to back to my database ? Thanks