RE: Dataimport Handler in solr 3.6.1
There were 2 major changes to DIH cache functionality in Solr 3.6, only 1 of which was carried over to Solr 4.0.

- Solr 3.6 had 2 major changes:
  1. We support pluggable caches, so you can write your own cache implementations and cache however you want. The goal is to let you cache to disk when you have to do large, complex joins and an in-memory cache could cause an OOM. Also, you can specify cacheImpl with any EntityProcessor, not just SqlEntityProcessor, so you can join child entities that come from XML, flat files, etc. CachedSqlEntityProcessor is technically deprecated, because using it is the same as using SqlEntityProcessor with cacheImpl=SortedMapBackedCache specified. That gives a simple in-memory cache very similar to Solr 3.5 and prior. (see https://issues.apache.org/jira/browse/SOLR-2382)
  2. Extensive work was done to make the threads parameter work in more situations. This involved some rather invasive changes to the DIH cache functionality. (see https://issues.apache.org/jira/browse/SOLR-3011)
- Solr 4.0 has #1 above, BUT NOT #2. Rather, the threads functionality was removed entirely.

Consequently, if the problem is due to #2 (SOLR-3011), it isn't as big a problem, because 3.x users can simply use the 3.5 DIH jar. (However, some use cases involving threads work with the 3.6(.1) jar and not at all with 3.5, so users will have to choose the best version for their instance.)

My concern is that there are issues with #1 (SOLR-2382). That's why I'm asking whether you could, if at all possible, try this with Solr 4.0. I have tested Solr 4.0 extensively here, and caching seems to work exactly as it ought to. However, DIH is flexible in how it can be configured, and there could be something broken that I have not uncovered myself. Any issues with SOLR-2382 need to be identified and fixed in the 4.x branch as soon as possible.

I apologize for the late response. I was away the past week.
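The equivalence described in #1 can be sketched as DIH config. This is a minimal, untested sketch that reuses the entity and column names from this thread (ent1, ent2, id1, rid) with the corrected lookup cacheLookup="ent1.id"; either declaration is meant to sit inside the parent ent1 entity, and only one of the two should be used.

```xml
<!-- Option A (3.5 style): CachedSqlEntityProcessor, deprecated as of 3.6.
     The cache is keyed on the child's id1 column and probed with the parent's id. -->
<entity name="ent2" query="select id1, rid from Table2"
        processor="CachedSqlEntityProcessor"
        cacheKey="id1" cacheLookup="ent1.id">
  <field column="rid" name="related_id"/>
</entity>

<!-- Option B (3.6+ / 4.0 style): plain SqlEntityProcessor plus a pluggable cache.
     Per the description above, this should behave the same as Option A. -->
<entity name="ent2" query="select id1, rid from Table2"
        processor="SqlEntityProcessor"
        cacheImpl="SortedMapBackedCache"
        cacheKey="id1" cacheLookup="ent1.id">
  <field column="rid" name="related_id"/>
</entity>
```

Going by that description, adding cacheImpl="SortedMapBackedCache" on top of processor="CachedSqlEntityProcessor" should be redundant rather than a fix, since the cached processor already implies that cache.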
James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: mechravi25 [mailto:mechrav...@yahoo.co.in]
Sent: Tuesday, August 21, 2012 7:47 AM
To: solr-user@lucene.apache.org
Subject: RE: Dataimport Handler in solr 3.6.1

Hi James,

Thanks for the suggestions. Actually, it is cacheLookup=ent1.id; I had misspelled it. Also, I will need the transformers mentioned, as there are other columns as well.

I tried using the 3.5 DIH jars in 3.6.1 and indexed the same data, and the indexing was successful. But I want this to work with the 3.6.1 DIH.

I just came across the SOLR-2382 patch. I tried specifying processor=CachedSqlEntityProcessor cacheImpl=SortedMapBackedCache in my DIH.xml file. For static fields in child entities, the indexing went fine, but for dynamic fields only one of the dynamic fields was indexed and the rest were skipped, even though the total number of rows fetched from the data source was correct.

Following are my questions:
1.) Is there a big difference between the Solr 3.5 and 3.6.1 DIH handler files? For example, was any new feature added in the 3.6 DIH that is not present in 3.5?
2.) Am I missing something when specifying cacheImpl=SortedMapBackedCache in my DIH.xml, such that dynamic fields are not indexed properly? There is no change to my DIH file from my previous post apart from this cacheImpl addition, and the dynamic fields are indexed properly if I do not specify this cacheImpl. Am I missing something here?

Thanks.
RE: Dataimport Handler in solr 3.6.1
One thing I notice in your configuration: the child entity has cacheLookup=ent1.uid, but your parent entity doesn't have a uid field. Also, you have these 3 transformers: RegexTransformer, DateFormatTransformer, TemplateTransformer, but none of your columns seem to use them. Are you sure you need them?

In any case, I suspect there may still be bugs in 3.6.1 related to CachedSqlEntityProcessor, so if you are able to create a failing unit test and post it to JIRA, that would be helpful. If you need to, you can use the 3.5 DIH jar with Solr 3.6.1.

Also, I do not think SOLR-3360 should affect you unless you're using the threads parameter. SOLR-3360 and SOLR-3430 both fixed bugs related to CachedSqlEntityProcessor that were introduced in 3.6.0 (by SOLR-3011 and SOLR-2382, respectively).

Finally, if you are at all able to test this on 4.0-beta, I would greatly appreciate it! SOLR-3011/SOLR-3360 were never applied to version 4.0 because threads support was removed entirely. However, SOLR-2382/SOLR-3430 were applied to 4.0 as well. If there are any more SOLR-2382 bugs lingering in 4.0, they really need to be fixed, so any testing help would be much appreciated.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: mechravi25 [mailto:mechrav...@yahoo.co.in]
Sent: Tuesday, August 14, 2012 8:04 AM
To: solr-user@lucene.apache.org
Subject: Dataimport Handler in solr 3.6.1

I am indexing some data using DataImportHandler files in Solr 3.6.1, with a nested entity in my handler file. I noticed a scenario where, instead of only the records that belong to a document, all the records present in the table are indexed.

Following is how the data should ideally be indexed. For a document A, I am trying to index the 2 values B and C as a multivalued field:

    <id>A</id>
    <related_id>
      <str>B</str>
      <str>C</str>
    </related_id>

This is how the output should be.
I have used the same DIH file with Solr 1.4 and 3.5, and in both versions the data was indexed correctly, as shown above. But in Solr 3.6.1 the data is indexed differently. In my table, there are 4 values (B, C, D, E) in the related_id field. This is how the data is indexed in 3.6.1:

    <id>A</id>
    <related_id>
      <str>B</str>
      <str>C</str>
      <str>D</str>
      <str>E</str>
    </related_id>

The values D and E should not be indexed under id A. The same happens for the other id records. Following is the content of the DIH file:

    <entity name="ent1" query="select sid as id Table1 a"
            transformer="RegexTransformer,DateFormatTransformer,TemplateTransformer">
      <field column="id" name="id" boost="0.5"/>
      <entity name="ent2" query="select id1,rid from Table2"
              processor="CachedSqlEntityProcessor" cacheKey="id1" cacheLookup="ent1.uid"
              transformer="RegexTransformer,DateFormatTransformer,TemplateTransformer">
        <field column="rid" name="related_id"/>
      </entity>
    </entity>

I tried changing CachedSqlEntityProcessor to SqlEntityProcessor and indexing again, but I still faced the same issue. When I googled a bit, I found this URL: https://issues.apache.org/jira/browse/SOLR-3360

I am not sure whether issue 3360 is the same as the scenario I have described above. Please guide me.

Thanks.
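Putting the thread's findings together, a corrected version of this configuration might look like the sketch below. It is untested and rests on assumptions: the parent query is assumed to be meant as `select sid as id from Table1 a` (the posted query has no FROM keyword), the lookup uses the cacheLookup="ent1.id" the poster confirmed in the follow-up, and the three transformers are kept because the poster said other columns need them. The 3.6-style SqlEntityProcessor plus cacheImpl form is shown; it may still be affected by the 3.6.x cache bugs discussed in this thread.

```xml
<!-- Sketch only: "from" in the parent query is an assumption. -->
<entity name="ent1" query="select sid as id from Table1 a"
        transformer="RegexTransformer,DateFormatTransformer,TemplateTransformer">
  <field column="id" name="id" boost="0.5"/>
  <!-- cacheLookup must name a column the parent row actually has:
       ent1 exposes "id" (from "sid as id"), not "uid". -->
  <entity name="ent2" query="select id1, rid from Table2"
          processor="SqlEntityProcessor"
          cacheImpl="SortedMapBackedCache"
          cacheKey="id1" cacheLookup="ent1.id"
          transformer="RegexTransformer,DateFormatTransformer,TemplateTransformer">
    <field column="rid" name="related_id"/>
  </entity>
</entity>
```

The key change is the lookup column: with cacheLookup pointing at a field the parent never produces, the child rows cannot be matched per parent, which is consistent with the symptom of unrelated related_id values appearing under id A.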