RE: Dataimport Handler in solr 3.6.1

2012-08-30 Thread Dyer, James
There were 2 major changes to DIH Cache functionality in Solr 3.6, only 1 of 
which was carried to Solr 4.0:

- Solr 3.6 had 2 MAJOR changes:

1. We support pluggable caches so that you can write your own cache 
implementations and cache however you want.  The goal here is to allow you to 
cache to disk when you have to do large, complex joins and an in-memory cache 
could result in an OOM.  Also, you can specify cacheImpl with any 
EntityProcessor, not just SqlEntityProcessor.  So you can join child entities 
that come from XML, flat files, etc.  CachedSqlEntityProcessor is technically 
deprecated, as using it is the same as SqlEntityProcessor with 
cacheImpl="SortedMapBackedCache" specified.  This does a simple in-memory cache 
very similar to Solr 3.5 and prior. (see 
https://issues.apache.org/jira/browse/SOLR-2382)

2. Extensive work was done to try and make the threads parameter work in more 
situations.  This involved some rather invasive changes to the DIH Cache 
functionality. (see https://issues.apache.org/jira/browse/SOLR-3011)

- Solr 4.0 has #1 above, BUT NOT #2.  Rather the threads functionality was 
entirely removed.
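
To make #1 concrete, here is a minimal sketch of the two equivalent ways to 
declare a cached child entity.  The entity, table and column names are borrowed 
from the configuration posted in this thread (with cacheLookup pointing at the 
parent's id column), so treat them as placeholders for your own schema.  The two 
entities are alternatives; only one of them would appear in a real data-config 
file.

```xml
<!-- Older style: the dedicated (now technically deprecated) cached processor -->
<entity name="ent2" query="select id1,rid from Table2"
        processor="CachedSqlEntityProcessor"
        cacheKey="id1" cacheLookup="ent1.id">
  <field column="rid" name="related_id"/>
</entity>

<!-- 3.6/4.0 style: plain SqlEntityProcessor plus an explicit cache implementation -->
<entity name="ent2" query="select id1,rid from Table2"
        processor="SqlEntityProcessor"
        cacheImpl="SortedMapBackedCache"
        cacheKey="id1" cacheLookup="ent1.id">
  <field column="rid" name="related_id"/>
</entity>
```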

Consequently, if the problem is due to #2 (SOLR-3011), this isn't as big a 
problem because 3.x users can simply use the 3.5 DIH jar (but some use cases 
involving threads work with the 3.6(.1) jar and not at all with 3.5, so users 
will have to pick and choose the best version to use for their instance).
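
As a rough sketch of the jar swap, assuming your core loads DIH through lib 
directives in solrconfig.xml: the paths below are hypothetical and need to point 
at wherever you unpacked the 3.5 distribution.  You would also need to make sure 
the 3.6.1 DIH jars are no longer picked up by another lib directive or from the 
core's lib directory.

```xml
<!-- solrconfig.xml: load the 3.5 DIH jars instead of the ones shipped with 3.6.1 -->
<!-- (hypothetical paths; adjust to your installation) -->
<lib path="/path/to/solr-3.5.0/dist/apache-solr-dataimporthandler-3.5.0.jar" />
<lib path="/path/to/solr-3.5.0/dist/apache-solr-dataimporthandler-extras-3.5.0.jar" />
```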

My concern is that there may be issues with #1 (SOLR-2382).  That's why I'm 
asking if at all possible you can try this with Solr 4.0.  I have tested Solr 4.0 
extensively here and it seems caching works exactly as it ought.  However, DIH 
is flexible in how it can be configured and there could be something that was 
broken that I have not uncovered myself.  Any issues that may exist with 
SOLR-2382 need to be identified and fixed in the 4.x branch as soon as possible.

I apologize for the late response.  I was away the past week.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: mechravi25 [mailto:mechrav...@yahoo.co.in] 
Sent: Tuesday, August 21, 2012 7:47 AM
To: solr-user@lucene.apache.org
Subject: RE: Dataimport Handler in solr 3.6.1

Hi James,

Thanks for the suggestions. 

Actually, it is cacheLookup="ent1.id"; I had misspelt it. Also, I will be
needing the transformers mentioned, as there are other columns as well.

I tried using the 3.5 DIH jars in 3.6.1 and indexed the same data, and the
indexing was successful. But I wanted this to work with the 3.6.1 DIH. I just
came across the SOLR-2382 patch, so I tried giving the following

processor="CachedSqlEntityProcessor" cacheImpl="SortedMapBackedCache"

in my DIH.xml file. With static fields in the child entities, the indexing
worked fine, but with dynamic fields, only one of the dynamic fields
was indexed and the rest were skipped, even though the total number of rows
fetched from the datasource was correct.

Following are my questions:

1.) Is there a big difference between the Solr 3.5 and 3.6.1 DIH? For example,
is there any new feature in the 3.6 DIH that is not present in 3.5?
2.) Am I missing something when giving cacheImpl="SortedMapBackedCache"
in my DIH.xml, because of which the dynamic fields are not indexed properly?
There is no change to my DIH file from my previous post apart from this
cacheImpl addition, and the dynamic fields are indexed properly if I do
not give this cacheImpl.

Thanks.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Dataimport-Handler-in-solr-3-6-1-tp4001149p4002421.html
Sent from the Solr - User mailing list archive at Nabble.com.




RE: Dataimport Handler in solr 3.6.1

2012-08-14 Thread Dyer, James
One thing I notice in your configuration...the child entity has this:

cacheLookup="ent1.uid"

but your parent entity doesn't have a uid field.  
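
For illustration, assuming the parent query keeps aliasing its key as id (as in 
"select sid as id"), a child entity whose lookup matches that column would look 
roughly like this.  Only cacheLookup differs from what you posted:

```xml
<!-- Sketch: cacheLookup now refers to a column the parent entity actually returns -->
<entity name="ent2" query="select id1,rid from Table2"
        processor="CachedSqlEntityProcessor"
        cacheKey="id1" cacheLookup="ent1.id">
  <field column="rid" name="related_id"/>
</entity>
```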

Also, you have these 3 transformers:  
RegexTransformer,DateFormatTransformer,TemplateTransformer

but none of your columns seem to make use of these.  Are you sure you need them?
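
For comparison, this is roughly what columns that do use those transformers 
look like.  The created, tags and doc_key columns and the query are hypothetical, 
made up purely to show the attributes each transformer reacts to:

```xml
<entity name="ent1"
        query="select sid as id, created, tags from Table1"
        transformer="RegexTransformer,DateFormatTransformer,TemplateTransformer">
  <field column="id" name="id"/>
  <!-- RegexTransformer: split a comma-separated column into multiple values -->
  <field column="tags" splitBy=","/>
  <!-- DateFormatTransformer: parse a string column into a date -->
  <field column="created" dateTimeFormat="yyyy-MM-dd HH:mm:ss"/>
  <!-- TemplateTransformer: build a new column from other values -->
  <field column="doc_key" template="rec-${ent1.id}"/>
</entity>
```

If none of your fields carry attributes like these, the transformer list can be 
dropped without changing what gets indexed.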

In any case, I am suspicious there may still be bugs in 3.6.1 related to 
CachedSqlEntityProcessor, so if you are able to create a failing unit test and 
post it to JIRA, that would be helpful.  If you need to, you can use the 3.5 DIH 
jar with Solr 3.6.1.  Also, I do not think SOLR-3360 should affect you 
unless you're using the threads parameter.  Both SOLR-3360 and SOLR-3430 fixed 
bugs related to CachedSqlEntityProcessor that were introduced in 3.6.0 (from 
SOLR-3011 and SOLR-2382 respectively).

Finally, if you are at all able to test this on 4.0-beta, I would greatly 
appreciate it!  SOLR-3011/SOLR-3360 were never applied to version 4.0 because 
threads support was removed entirely.  However, SOLR-2382/SOLR-3430 were 
applied to 4.0 also.  If we have any more SOLR-2382 bugs lingering in 4.0, these 
really need to be fixed, so any testing help would be much appreciated.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-----Original Message-----
From: mechravi25 [mailto:mechrav...@yahoo.co.in] 
Sent: Tuesday, August 14, 2012 8:04 AM
To: solr-user@lucene.apache.org
Subject: Dataimport Handler in solr 3.6.1

I am indexing some data using dataimport handler files in Solr 3.6.1. I am using
a nested entity in my handler file.
I noticed a scenario wherein, instead of only the records that should be fetched
for a document,
all the records present in the table are indexed.

Following is how the data should ideally be indexed.
For a document A, I am trying to index the 2 values B and C as a multivalued
field:

<id>A</id>
<related_id>
<str>B</str>
<str>C</str>
</related_id>

This is how the output should be. I have used the same DIH file with Solr
1.4 and 3.5,
and the data was indexed correctly, as shown above, in both
versions.

But in Solr 3.6.1, the data was indexed differently. In my table, there
are 4 values (B,C,D,E) in the related_id field.
This is how the data is indexed in 3.6.1:

<id>A</id>
<related_id>
<str>B</str>
<str>C</str>
<str>D</str>
<str>E</str>
</related_id>

Ideally, the values D and E should not get indexed under id A. The same
happens for the other ids.


Following is the content of the DIH file:



<entity name="ent1" query="select sid as id Table1 a"
        transformer="RegexTransformer,DateFormatTransformer,TemplateTransformer">

  <field column="id" name="id" boost="0.5"/>

  <entity name="ent2" query="select id1,rid from Table2"
          processor="CachedSqlEntityProcessor" cacheKey="id1" cacheLookup="ent1.uid"
          transformer="RegexTransformer,DateFormatTransformer,TemplateTransformer">

    <field column="rid" name="related_id"/>

  </entity>

</entity>



I tried changing CachedSqlEntityProcessor to SqlEntityProcessor and
then indexed the same data, but I still faced the same issue.

When I googled a bit, I found this URL:
https://issues.apache.org/jira/browse/SOLR-3360


I am not sure whether issue 3360 is the same as the scenario I have
described above.

Please guide me.

Thanks.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Dataimport-Handler-in-solr-3-6-1-tp4001149.html
Sent from the Solr - User mailing list archive at Nabble.com.