Re: DIH - Closing ResultSet in JdbcDataSource

2011-01-12 Thread Shane Perry
I have found where a root entity has completed processing and added the
logic to clear the entity's cache at that point (didn't change any of the
logic for clearing all entity caches once the import has completed).  I have
also created an enhancement request found at
https://issues.apache.org/jira/browse/SOLR-2313.

On Tue, Jan 11, 2011 at 2:54 PM, Shane Perry thry...@gmail.com wrote:

 By placing some strategic debug messages, I have found that the JDBC
 connections are not being closed until all entity elements have been
 processed (in the entire config file).  A simplified example would be:

 dataConfig
   dataSource name=ds1 driver=org.postgresql.Driver
 url=jdbc:postgresql://localhost:5432/db1 user=... password=... /
   dataSource name=ds2 driver=org.postgresql.Driver
 url=jdbc:postgresql://localhost:5432/db2 user=... password=... /

   document
 entity name=entity1 datasource=ds1 ...
   ... field list ...
   entity name=entity1a datasource=ds1 ...
 ... field list ...
   /entity
/entity
 entity name=entity2 datasource=ds2 ...
   ... field list ...
   entity name=entity2a datasource=ds2 ...
 ... field list ...
   /entity
/entity
   /document
 /dataConfig

 The behavior is:

 JDBC connection opened for entity1 and entity1a - Applicable queries run
 and ResultSet objects processed
 All open ResultSet and Statement objects closed for entity1 and entity1a
 JDBC connection opened for entity2 and entity2a - Applicable queries run
 and ResultSet objects processed
 All open ResultSet and Statement objects closed for entity2 and entity2a
 All JDBC connections (none are closed at this point) are closed.

 In my instance, I have some 95 unique entity elements (19 parents with 5
 children each), resulting in 95 open JDBC connections.  If I understand the
 process correctly, it should be safe to close the JDBC connection for a
 root entity (immediate children of document) and all descendant
 entity elements once the parent has been successfully completed.  I have
 been digging around the code, but due to my unfamiliarity with the code, I'm
 not sure where this would occur.

 Is this a valid solution?  It's looking like I should probably open a
 defect and I'm willing to do so along with submitting a patch, but need a
 little more direction on where the fix would best reside.

 Thanks,

 Shane



 On Mon, Jan 10, 2011 at 7:14 AM, Shane Perry thry...@gmail.com wrote:

 Gora,

 Thanks for the response.  After taking another look, you are correct about
 the hasnext() closing the ResultSet object (1.4.1 as well as 1.4.0).  I
 didn't recognize the case difference in the two function calls, so missed
 it.  I'll keep looking into the original issue and reply if I find a
 cause/solution.

 Shane


 On Sat, Jan 8, 2011 at 4:04 AM, Gora Mohanty g...@mimirtech.com wrote:

 On Sat, Jan 8, 2011 at 1:10 AM, Shane Perry thry...@gmail.com wrote:
  Hi,
 
  I am in the process of migrating our system from Postgres 8.4 to Solr
  1.4.1.  Our system is fairly complex and as a result, I have had to
 define
  19 base entities in the data-config.xml definition file.  Each of these
  entities executes 5 queries.  When doing a full-import, as each entity
  completes, the server hosting Postgres shows 5 idle in transaction
 for the
  entity.
 
  In digging through the code, I found that the JdbcDataSource wraps the
  ResultSet object in a custom ResultSetIterator object, leaving the
 ResultSet
  open.  Walking through the code I can't find a close() call anywhere on
 the
  ResultSet.  I believe this results in the idle in transaction
 processes.
 [...]

 Have not examined the idle in transaction issue that you
 mention, but the ResultSet object in a ResultSetIterator is
 closed in the private hasnext() method, when there are no
 more results, or if there is an exception. hasnext() is called
 by the public hasNext() method that should be used in
 iterating over the results, so I see no issue there.

 Regards,
 Gora

 P.S. This is from Solr 1.4.0 code, but I would not think that
this part of the code would have changed.






Re: DIH - Closing ResultSet in JdbcDataSource

2011-01-11 Thread Shane Perry
By placing some strategic debug messages, I have found that the JDBC
connections are not being closed until all entity elements have been
processed (in the entire config file).  A simplified example would be:

dataConfig
  dataSource name=ds1 driver=org.postgresql.Driver
url=jdbc:postgresql://localhost:5432/db1 user=... password=... /
  dataSource name=ds2 driver=org.postgresql.Driver
url=jdbc:postgresql://localhost:5432/db2 user=... password=... /

  document
entity name=entity1 datasource=ds1 ...
  ... field list ...
  entity name=entity1a datasource=ds1 ...
... field list ...
  /entity
   /entity
entity name=entity2 datasource=ds2 ...
  ... field list ...
  entity name=entity2a datasource=ds2 ...
... field list ...
  /entity
   /entity
  /document
/dataConfig

The behavior is:

JDBC connection opened for entity1 and entity1a - Applicable queries run and
ResultSet objects processed
All open ResultSet and Statement objects closed for entity1 and entity1a
JDBC connection opened for entity2 and entity2a - Applicable queries run and
ResultSet objects processed
All open ResultSet and Statement objects closed for entity2 and entity2a
All JDBC connections (none are closed at this point) are closed.

In my instance, I have some 95 unique entity elements (19 parents with 5
children each), resulting in 95 open JDBC connections.  If I understand the
process correctly, it should be safe to close the JDBC connection for a
root entity (immediate children of document) and all descendant
entity elements once the parent has been successfully completed.  I have
been digging around the code, but due to my unfamiliarity with the code, I'm
not sure where this would occur.

Is this a valid solution?  It's looking like I should probably open a defect
and I'm willing to do so along with submitting a patch, but need a little
more direction on where the fix would best reside.

Thanks,

Shane


On Mon, Jan 10, 2011 at 7:14 AM, Shane Perry thry...@gmail.com wrote:

 Gora,

 Thanks for the response.  After taking another look, you are correct about
 the hasnext() closing the ResultSet object (1.4.1 as well as 1.4.0).  I
 didn't recognize the case difference in the two function calls, so missed
 it.  I'll keep looking into the original issue and reply if I find a
 cause/solution.

 Shane


 On Sat, Jan 8, 2011 at 4:04 AM, Gora Mohanty g...@mimirtech.com wrote:

 On Sat, Jan 8, 2011 at 1:10 AM, Shane Perry thry...@gmail.com wrote:
  Hi,
 
  I am in the process of migrating our system from Postgres 8.4 to Solr
  1.4.1.  Our system is fairly complex and as a result, I have had to
 define
  19 base entities in the data-config.xml definition file.  Each of these
  entities executes 5 queries.  When doing a full-import, as each entity
  completes, the server hosting Postgres shows 5 idle in transaction for
 the
  entity.
 
  In digging through the code, I found that the JdbcDataSource wraps the
  ResultSet object in a custom ResultSetIterator object, leaving the
 ResultSet
  open.  Walking through the code I can't find a close() call anywhere on
 the
  ResultSet.  I believe this results in the idle in transaction
 processes.
 [...]

 Have not examined the idle in transaction issue that you
 mention, but the ResultSet object in a ResultSetIterator is
 closed in the private hasnext() method, when there are no
 more results, or if there is an exception. hasnext() is called
 by the public hasNext() method that should be used in
 iterating over the results, so I see no issue there.

 Regards,
 Gora

 P.S. This is from Solr 1.4.0 code, but I would not think that
this part of the code would have changed.





Re: DIH - Closing ResultSet in JdbcDataSource

2011-01-10 Thread Shane Perry
Gora,

Thanks for the response.  After taking another look, you are correct about
the hasnext() closing the ResultSet object (1.4.1 as well as 1.4.0).  I
didn't recognize the case difference in the two function calls, so missed
it.  I'll keep looking into the original issue and reply if I find a
cause/solution.

Shane

On Sat, Jan 8, 2011 at 4:04 AM, Gora Mohanty g...@mimirtech.com wrote:

 On Sat, Jan 8, 2011 at 1:10 AM, Shane Perry thry...@gmail.com wrote:
  Hi,
 
  I am in the process of migrating our system from Postgres 8.4 to Solr
  1.4.1.  Our system is fairly complex and as a result, I have had to
 define
  19 base entities in the data-config.xml definition file.  Each of these
  entities executes 5 queries.  When doing a full-import, as each entity
  completes, the server hosting Postgres shows 5 idle in transaction for
 the
  entity.
 
  In digging through the code, I found that the JdbcDataSource wraps the
  ResultSet object in a custom ResultSetIterator object, leaving the
 ResultSet
  open.  Walking through the code I can't find a close() call anywhere on
 the
  ResultSet.  I believe this results in the idle in transaction
 processes.
 [...]

 Have not examined the idle in transaction issue that you
 mention, but the ResultSet object in a ResultSetIterator is
 closed in the private hasnext() method, when there are no
 more results, or if there is an exception. hasnext() is called
 by the public hasNext() method that should be used in
 iterating over the results, so I see no issue there.

 Regards,
 Gora

 P.S. This is from Solr 1.4.0 code, but I would not think that
this part of the code would have changed.



Re: DIH - Closing ResultSet in JdbcDataSource

2011-01-08 Thread Gora Mohanty
On Sat, Jan 8, 2011 at 1:10 AM, Shane Perry thry...@gmail.com wrote:
 Hi,

 I am in the process of migrating our system from Postgres 8.4 to Solr
 1.4.1.  Our system is fairly complex and as a result, I have had to define
 19 base entities in the data-config.xml definition file.  Each of these
 entities executes 5 queries.  When doing a full-import, as each entity
 completes, the server hosting Postgres shows 5 idle in transaction for the
 entity.

 In digging through the code, I found that the JdbcDataSource wraps the
 ResultSet object in a custom ResultSetIterator object, leaving the ResultSet
 open.  Walking through the code I can't find a close() call anywhere on the
 ResultSet.  I believe this results in the idle in transaction processes.
[...]

Have not examined the idle in transaction issue that you
mention, but the ResultSet object in a ResultSetIterator is
closed in the private hasnext() method, when there are no
more results, or if there is an exception. hasnext() is called
by the public hasNext() method that should be used in
iterating over the results, so I see no issue there.

Regards,
Gora

P.S. This is from Solr 1.4.0 code, but I would not think that
this part of the code would have changed.


DIH - Closing ResultSet in JdbcDataSource

2011-01-07 Thread Shane Perry
Hi,

I am in the process of migrating our system from Postgres 8.4 to Solr
1.4.1.  Our system is fairly complex and as a result, I have had to define
19 base entities in the data-config.xml definition file.  Each of these
entities executes 5 queries.  When doing a full-import, as each entity
completes, the server hosting Postgres shows 5 idle in transaction for the
entity.

In digging through the code, I found that the JdbcDataSource wraps the
ResultSet object in a custom ResultSetIterator object, leaving the ResultSet
open.  Walking through the code I can't find a close() call anywhere on the
ResultSet.  I believe this results in the idle in transaction processes.

Am I off base here?  I'm not sure what the overall implications are of the
idle in transaction processes, but is there a way I can get around the
issue without importing each entity manually?  Any feedback would be greatly
appreciated.

Thanks in advance,

Shane


Re: DIH - Closing ResultSet in JdbcDataSource

2011-01-07 Thread Adam Estrada
This is my configuration which seems to work just fine.

?xml version=1.0 encoding=utf-8 ?
dataConfig
  dataSource type=JdbcDataSource
   name=DBImport
   driver=net.sourceforge.jtds.jdbc.Driver

url=jdbc:jtds:sqlserver://localhost;databaseName=50_DEV;responseBuffering=adaptive;
   user=test
   password=test
   onError=skip/
  document

From there it's just a matter of running the select statement and mapping it
against the correct fields in your index.

Adam

On Fri, Jan 7, 2011 at 2:40 PM, Shane Perry thry...@gmail.com wrote:

 Hi,

 I am in the process of migrating our system from Postgres 8.4 to Solr
 1.4.1.  Our system is fairly complex and as a result, I have had to define
 19 base entities in the data-config.xml definition file.  Each of these
 entities executes 5 queries.  When doing a full-import, as each entity
 completes, the server hosting Postgres shows 5 idle in transaction for
 the
 entity.

 In digging through the code, I found that the JdbcDataSource wraps the
 ResultSet object in a custom ResultSetIterator object, leaving the
 ResultSet
 open.  Walking through the code I can't find a close() call anywhere on the
 ResultSet.  I believe this results in the idle in transaction processes.

 Am I off base here?  I'm not sure what the overall implications are of the
 idle in transaction processes, but is there a way I can get around the
 issue without importing each entity manually?  Any feedback would be
 greatly
 appreciated.

 Thanks in advance,

 Shane