RE: Sharing dih dictionaries
You're totally correct. There's actually a link on the DIH page now which wasn't there when I read it a long time ago. I'm really looking forward to 4.0; it's got a ton of great new features. Thanks for the links!!

-----Original Message-----
From: Mikhail Khludnev [mailto:mkhlud...@griddynamics.com]
Sent: Monday, December 05, 2011 10:45 PM
To: solr-user@lucene.apache.org
Subject: Re: Sharing dih dictionaries

It looks like https://issues.apache.org/jira/browse/SOLR-2382 or even https://issues.apache.org/jira/browse/SOLR-2613. I guess by using SOLR-2382 you can specify your own SortedMapBackedCache subclass which is able to share your Dictionary.

Regards
RE: Sharing dih dictionaries
Just FYI that the final piece of SOLR-2382 has not been committed, and instead has been spun off to SOLR-2943. So if you're using Trunk and you need the ability to persist a cache on disk and then read it back again later as a DIH entity, you'll need both SOLR-2943 and also a cache implementation. We're using the BDB-JE cache from SOLR-2613 in production. There is also one backed by a Lucene index (SOLR-2948).

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: Brent Mills [mailto:bmi...@uship.com]
Sent: Tuesday, December 06, 2011 2:43 PM
To: solr-user@lucene.apache.org
Subject: RE: Sharing dih dictionaries

You're totally correct. There's actually a link on the DIH page now which wasn't there when I read it a long time ago. I'm really looking forward to 4.0; it's got a ton of great new features. Thanks for the links!!
Re: Sharing dih dictionaries
AFAIK the DIH jar is separate from the Solr war. Isn't there a chance to use DIH from 4.0 in Solr 3.4?

James,
Sorry for hijacking the thread, but do you have a chance to review https://issues.apache.org/jira/browse/SOLR-2947? I want to provide a patch for fixing multi-threading in DIH, but formally speaking, this issue together with https://issues.apache.org/jira/browse/SOLR-2933 blocks me.

Regards

On Wed, Dec 7, 2011 at 1:11 AM, Dyer, James <james.d...@ingrambook.com> wrote:

> Just FYI that the final piece of SOLR-2382 has not been committed, and instead has been spun off to SOLR-2943. So if you're using Trunk and you need the ability to persist a cache on disk and then read it back again later as a DIH entity, you'll need both SOLR-2943 and also a cache implementation. We're using the BDB-JE cache from SOLR-2613 in production. There is also one backed by a Lucene index (SOLR-2948).
>
> James Dyer
> E-Commerce Systems
> Ingram Content Group
> (615) 213-4311

--
Sincerely yours
Mikhail Khludnev
Developer
Grid Dynamics
tel. 1-415-738-8644
Skype: mkhludnev
http://www.griddynamics.com
mkhlud...@griddynamics.com
Re: Sharing dih dictionaries
It looks like https://issues.apache.org/jira/browse/SOLR-2382 or even https://issues.apache.org/jira/browse/SOLR-2613. I guess by using SOLR-2382 you can specify your own SortedMapBackedCache subclass which is able to share your Dictionary.

Regards

On Tue, Dec 6, 2011 at 12:26 AM, Brent Mills <bmi...@uship.com> wrote:

> I'm not really sure how to title this but here's what I'm trying to do.
>
> I have a query that creates a rather large dictionary of codes that are shared across multiple fields of a base entity. I'm using the CachedSqlEntityProcessor but I was curious if there was a way to join this multiple times to the base entity so I can avoid having to reload it for each column join.
>
> Ex:
> <entity name="parts" query="select name, code1, code2, code3 from parts">
>   <field column="name" name="name" />
>   <entity name="shareddictionary1" query="select code, description from partcodes" where="code=parts.code1">
>     <field column="description" name="code1desc" />
>   </entity>
>   <entity name="shareddictionary2" query="select code, description from partcodes" where="code=parts.code2">
>     <field column="description" name="code1desc" />
>   </entity>
>   <entity name="shareddictionary3" query="select code, description from partcodes" where="code=parts.code3">
>     <field column="description" name="code1desc" />
>   </entity>
> </entity>
>
> Kind of a simplified example but in this case the dictionary query has to be run 3 times to join 3 different columns. It would be nice if I could load the data set once as an entity and specify how to join it in code without requiring a separate sql query.
>
> Any ideas?

--
Sincerely yours
Mikhail Khludnev
Developer
Grid Dynamics
tel. 1-415-738-8644
Skype: mkhludnev
http://www.griddynamics.com
mkhlud...@griddynamics.com
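
For readers skimming the thread, the "load the data set once and join it in code" idea from the question can be sketched outside of DIH. This is plain Python for illustration only, not DIH or Solr code and not the SortedMapBackedCache API. The table and column names come from the example in the question; the sample rows, the function names, and the choice of one description field per code column (code1desc, code2desc, code3desc) are assumptions made for the sketch.

```python
# Illustrative only: plain Python, not DIH or Solr code.
# Table and column names follow the example in the question; the rows are made up.


def load_dictionary(rows):
    """Build the code -> description lookup once from (code, description) rows."""
    return {code: description for code, description in rows}


def join_codes(part, dictionary, code_columns=("code1", "code2", "code3")):
    """Return a copy of `part` with one '<column>desc' field per code column."""
    doc = dict(part)
    for column in code_columns:
        # A code that is absent from the row or from the dictionary yields None.
        doc[column + "desc"] = dictionary.get(part.get(column))
    return doc


# Hypothetical sample data standing in for the partcodes and parts tables.
partcodes = [("A", "Alloy"), ("B", "Brass"), ("C", "Copper")]
parts = [{"name": "bolt", "code1": "A", "code2": "B", "code3": "C"}]

# The dictionary is built a single time and reused for every column of every row.
dictionary = load_dictionary(partcodes)
docs = [join_codes(part, dictionary) for part in parts]
```

The point of the sketch is only that the lookup table is built once and consulted three times per row. Whether a custom cache class can share one loaded dictionary across several DIH entities in the same way depends on the SOLR-2382 cache interface and is not shown here.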