RE: Nested CachedSqlEntityProcessor running for each entity row with Solr 3.6?

2012-05-10 Thread Brent Mills
Hi James,

I just pulled down the newest nightly build of 4.0, and it solves an issue I had 
been having with Solr ignoring the caching of the child entities.  It was 
basically opening a new connection for each iteration even though everything 
was specified correctly.  This was present in my previous build of 4.0, so it 
looks like you fixed it with one of those patches.  Thanks for all your work on 
the DIH; the caching improvements are a big help with some of the things we 
will be rolling out in production soon.

-Brent
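Brent's symptom (a new connection for every parent row even though caching was configured) comes down to how often the child query runs. The following is a minimal Python sketch, not DIH code. The FakeDb class and the import_uncached/import_cached functions are hypothetical. Without caching, the child query runs once per parent row. With a working cache, it runs once for the whole import and each parent is served from memory.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

Row = Tuple[int, str]  # hypothetical (document_id, author_name) child row


class FakeDb:
    """Hypothetical stand-in for a SQL data source that counts executed queries."""

    def __init__(self, child_rows: List[Row]) -> None:
        self.child_rows = child_rows
        self.queries_run = 0

    def query_children(self, document_id: Optional[int] = None) -> List[Row]:
        # Every call models one round trip (connection/statement) to the database.
        self.queries_run += 1
        if document_id is None:
            return list(self.child_rows)
        return [row for row in self.child_rows if row[0] == document_id]


def import_uncached(db: FakeDb, parent_ids: Iterable[int]) -> Dict[int, List[Row]]:
    # Child query filtered by parent id and executed once per parent row.
    return {pid: db.query_children(pid) for pid in parent_ids}


def import_cached(db: FakeDb, parent_ids: Iterable[int]) -> Dict[int, List[Row]]:
    # Child query executed once; rows are grouped by key, then looked up per parent.
    cache: Dict[int, List[Row]] = defaultdict(list)
    for row in db.query_children():
        cache[row[0]].append(row)
    return {pid: cache.get(pid, []) for pid in parent_ids}


if __name__ == "__main__":
    rows = [(1, "Ann"), (1, "Bob"), (2, "Cy")]
    parents = [1, 2, 3]
    uncached_db, cached_db = FakeDb(rows), FakeDb(rows)
    same = import_uncached(uncached_db, parents) == import_cached(cached_db, parents)
    print(same, uncached_db.queries_run, cached_db.queries_run)  # True 3 1
```

Both paths yield the same rows per parent. Only the number of database round trips differs, and that number is what grows with the parent count when the cache is ignored.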

-----Original Message-----
From: Dyer, James [mailto:james.d...@ingrambook.com] 
Sent: Monday, May 07, 2012 1:47 PM
To: solr-user@lucene.apache.org
Cc: Brent Mills; dye.kel...@gmail.com; keithn...@dswinc.com
Subject: RE: Nested CachedSqlEntityProcessor running for each entity row with 
Solr 3.6?

Dear Kellen, Brent and Keith,

There are now fixes available for two cache-related bugs that unfortunately made 
their way into the 3.6.0 release.  They were addressed in these two JIRA issues, 
which have been committed to the 3.6 branch (as of today):
- https://issues.apache.org/jira/browse/SOLR-3430
- https://issues.apache.org/jira/browse/SOLR-3360
These problems were also affecting Trunk/4.x, and both fixes have been committed 
to Trunk under SOLR-3430.

If Solr 3.6.1 is released, these fixes will become generally available at 
that time.  They will also be part of the 4.0 release, which the development 
community hopes to ship later this year.

In the meantime, I am hoping each of you can test these fixes with your 
installation.  The best way to do this is to get a fresh SVN checkout of the 
3.6.1 branch 
(http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_6/), switch 
to the solr directory, then run "ant dist".  I believe you need Ant 1.8 to 
build.

If you are unable to build it yourself, I put an *unofficial* snapshot of the DIH 
jar here:

http://people.apache.org/~jdyer/unofficial/apache-solr-dataimporthandler-3.6.1-SNAPSHOT-r1335176.jar

Please let me know whether this solves your problems with DIH caching and gives 
you back the functionality you had with 3.5 and earlier.  Your feedback is 
greatly appreciated.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-----Original Message-----
From: not interesting [mailto:dye.kel...@gmail.com]
Sent: Monday, May 07, 2012 9:43 AM
To: solr-user@lucene.apache.org
Subject: Nested CachedSqlEntityProcessor running for each entity row with Solr 
3.6?

I just upgraded from Solr 3.4 to Solr 3.6; I'm using the same data-import.xml 
for both versions. The import functioned properly with 3.4.

I'm using a nested entity to fetch the authors associated with each document, 
and I'm using CachedSqlEntityProcessor to avoid hitting the DB an unreasonable 
number of times. However, Solr indexes very slowly and appears to fetch all 
authors in the DB for each document. The index should be ~500 MB; I aborted 
the indexing when it reached ~6 GB. If I comment out the nested author entity 
below, Solr indexes normally.

Am I missing something obvious or is this a bug?

<document name="documents">
  <entity name="document" dataSource="production"
          transformer="HTMLStripTransformer,TemplateTransformer,RegexTransformer"
          query="select id, ..., from document">
    <field column="id" name="id"/>
    <field column="uid" name="uid" template="DOC${document.id}"/>
    <!-- more fields .. -->
    <entity name="author" dataSource="production"
            query="select
                     cast(da.document_id as text) as document_id,
                     a.id, a.name, a.signature from document_author da
                   left outer join author a on a.id = da.author_id"
            cacheKey="document_id"
            cacheLookup="document.id"
            processor="CachedSqlEntityProcessor">
      <field name="author_id" column="id" />
      <field name="author" column="name" />
      <field name="author_signature" column="signature" />
    </entity>
  </entity>
</document>

Also posted at SO if you prefer to answer there:
http://stackoverflow.com/questions/10482484/nested-cachedsqlentityprocessor-running-for-each-entity-row-with-solr-3-6

Kellen
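For readers less familiar with the cacheKey/cacheLookup pair in the author entity above, the following minimal Python sketch models roughly what CachedSqlEntityProcessor is expected to do. It runs the child query once and groups the rows by the cacheKey column. Then, for each parent row, it returns only the rows whose key equals the cacheLookup value ("entity.column"). This is not DIH's implementation. build_cache, lookup and the sample rows are hypothetical, and the str() normalization is an assumption about why the query casts document_id to text. The reported bug behaves as if lookup returned every author for every document.

```python
from collections import defaultdict
from typing import Any, Dict, List

Row = Dict[str, Any]


def build_cache(child_rows: List[Row], cache_key: str) -> Dict[str, List[Row]]:
    """Run-once step: group all child rows by the column named in cacheKey.

    Keys are normalized to str (an assumption mirroring the SQL cast of
    document_id to text) so an integer parent id still finds its rows.
    """
    grouped: Dict[str, List[Row]] = defaultdict(list)
    for row in child_rows:
        grouped[str(row[cache_key])].append(row)
    return dict(grouped)


def lookup(cache: Dict[str, List[Row]], parents: Dict[str, Row], cache_lookup: str) -> List[Row]:
    """Per-parent step: resolve "entity.column" (e.g. "document.id") and return matches only."""
    entity, column = cache_lookup.split(".", 1)
    return cache.get(str(parents[entity][column]), [])


# Hypothetical rows shaped like the output of the author query.
authors = [
    {"document_id": "1", "id": 10, "name": "Ann", "signature": "A."},
    {"document_id": "1", "id": 11, "name": "Bob", "signature": "B."},
    {"document_id": "2", "id": 12, "name": "Cy", "signature": "C."},
]
cache = build_cache(authors, "document_id")
doc1_authors = lookup(cache, {"document": {"id": 1}}, "document.id")
print([a["name"] for a in doc1_authors])  # ['Ann', 'Bob'], not all three authors
```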


Re: Nested CachedSqlEntityProcessor running for each entity row with Solr 3.6?

2012-05-08 Thread not interesting
> In the meantime, I am hoping each of you can test these fixes with your 
> installation.  The best way to do this is to get a fresh SVN checkout of the 
> 3.6.1 branch 
> (http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_6/), 
> switch to the solr directory, then run ant dist.  I believe you need Ant 
> 1.8 to build.
>
> If you are unable to build yourself, I put an *unofficial* snapshot of the 
> DIH jar here:
> http://people.apache.org/~jdyer/unofficial/apache-solr-dataimporthandler-3.6.1-SNAPSHOT-r1335176.jar

I understood your suggestion to be that I should use the 3.6.1
dataimporthandler jars with my 3.6.0 installation.

If that was correct, then this has not solved my issue. I have tried
both the unofficial snapshot and my own built-from-source version of
the jars.

The behavior of DIH is the same; it fetches far more rows than it
should, the index grows to a very large size, and indexing is very
slow (10 minutes, 1 rows fetched, only 1500 documents
processed).

Kellen
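One rough way to tell which failure mode is occurring is to compare the import's status counters against two back-of-the-envelope totals. The sketch below assumes DIH's "rows fetched" figure counts one row per parent document plus every child row returned. That is an assumption, so check it against your own status output. The figures in the usage line are hypothetical. A count near the second value suggests the child entity is returning the whole author result set for each document.

```python
from typing import Tuple


def expected_rows_fetched(docs: int, child_rows: int) -> Tuple[int, int]:
    """Return (if_cached, if_refetched_per_parent) totals for a rows-fetched counter.

    Assumes the counter adds one row per parent document plus every row the
    child query returns each time it runs.
    """
    if_cached = docs + child_rows            # child query runs once for the whole import
    if_refetched = docs + docs * child_rows  # whole child result set returned per parent
    return if_cached, if_refetched


# Hypothetical figures: 1500 documents, 2000 rows in the author join.
print(expected_rows_fetched(1500, 2000))  # (3500, 3001500)
```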


RE: Nested CachedSqlEntityProcessor running for each entity row with Solr 3.6?

2012-05-08 Thread Dyer, James
Kellen,

I appreciate your trying this out.  Is there any way you can provide your 
data-config.xml file?  I'd really like to get to the bottom of this.  Thanks.

James Dyer


Re: Nested CachedSqlEntityProcessor running for each entity row with Solr 3.6?

2012-05-07 Thread Mikhail Khludnev
Hi,

It sounds like
https://issues.apache.org/jira/browse/SOLR-3360
The fix is committed; testing is ongoing.


-- 
Sincerely yours
Mikhail Khludnev
Tech Lead
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com


Re: Nested CachedSqlEntityProcessor running for each entity row with Solr 3.6?

2012-05-07 Thread not interesting
> it sounds like
> https://issues.apache.org/jira/browse/SOLR-3360
> fix is committed, tests are ongoing.

Hmm, I'm running Solr behind Tomcat; where can I configure Solr to use
only a single thread for testing?

Kellen


Re: Nested CachedSqlEntityProcessor running for each entity row with Solr 3.6?

2012-05-07 Thread Mikhail Khludnev
Your data-config.xml is already single-threaded. The bug is in the DIH 3.6.0 code itself.
There should be a link to the fixed jar in the comments.

On Mon, May 7, 2012 at 7:15 PM, not interesting dye.kel...@gmail.comwrote:

  it sounds like
  https://issues.apache.org/jira/browse/SOLR-3360
  fix is committed, tests are on going.

 Hmm, I'm running solr behind tomcat; where can I configure Solr to use
 only a single thread for testing?

 Kellen




-- 
Sincerely yours
Mikhail Khludnev
Tech Lead
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com

