Re: DataImportHandler running out of memory

2012-02-23 Thread Shawn Heisey

On 2/20/2012 6:49 AM, v_shan wrote:

DIH is still running out of memory for me, with Full Import on a database of
size 1.5 GB.

Solr version: 3.5.0

Note that I have already set batchSize=-1 but I am getting the same error.


A few questions:

- How much memory have you given to the JVM running this Solr instance?
- How much memory does your server have?
- What is the size of all your index cores, and how many documents are in them?
- How large are your Solr caches (filterCache, documentCache, queryResultCache)?
- What is your ramBufferSizeMB set to in the indexDefaults section?

Thanks,
Shawn



Re: DataImportHandler running out of memory

2012-02-20 Thread v_shan
(DocBuilder.java:636)
... 5 more

Feb 20, 2012 7:07:45 PM org.apache.solr.update.DirectUpdateHandler2 rollback
INFO: start rollback
Feb 20, 2012 7:07:45 PM org.apache.solr.update.DirectUpdateHandler2 rollback



Re: DataImportHandler running out of memory

2008-11-03 Thread sunnyfr

Hi,

I tried batchSize=-1, but when I do that it uses up all of MySQL's memory,
and that's a problem for the MySQL database.

:s





Re: DataImportHandler running out of memory

2008-10-31 Thread sunnyfr

Hi Grant,

How did you finally manage it? I have the same problem with less data (8.5M
records): if I set batchSize=-1, it slows the database down a lot, which is
not good for the website, and requests stack up.
What did you do?

Thanks,





Re: DataImportHandler running out of memory

2008-10-31 Thread Noble Paul നോബിള്‍ नोब्ळ्
I've moved the FAQ to a new page:
http://wiki.apache.org/solr/DataImportHandlerFaq
The DIH page is too big, and editing it has become harder.





-- 
--Noble Paul


Re: DataImportHandler running out of memory

2008-06-26 Thread Shalin Shekhar Mangar
I've added a FAQ section to the DataImportHandler wiki page which captures
questions on out-of-memory exceptions with both the MySQL and MS SQL Server
drivers.

http://wiki.apache.org/solr/DataImportHandler#faq





-- 
Regards,
Shalin Shekhar Mangar.


Re: DataImportHandler running out of memory

2008-06-25 Thread Grant Ingersoll
I think it's a bit different. I ran into this exact problem about two weeks
ago on a 13 million record DB. MySQL doesn't honor the fetch size for its v5
JDBC driver.

See http://www.databasesandlife.com/reading-row-by-row-into-java-from-mysql/
or do a search for MySQL fetch size.

You actually have to do setFetchSize(Integer.MIN_VALUE) (-1 doesn't work) in
order to get streaming in MySQL.
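
A minimal sketch of the streaming setup described above, assuming MySQL
Connector/J 5.x; the JDBC URL, credentials, and query are placeholders:

    import java.sql.*;

    public class MySqlStreamingExample {
        public static void main(String[] args) throws Exception {
            Class.forName("com.mysql.jdbc.Driver"); // pre-JDBC4 driver loading
            Connection conn = DriverManager.getConnection(
                    "jdbc:mysql://localhost:3306/mydb", "user", "pass");
            // Connector/J 5.x only streams when the statement is forward-only,
            // read-only, and the fetch size is exactly Integer.MIN_VALUE.
            Statement stmt = conn.createStatement(
                    ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
            stmt.setFetchSize(Integer.MIN_VALUE); // -1 is rejected; MIN_VALUE enables streaming
            ResultSet rs = stmt.executeQuery("SELECT field1 FROM table1");
            while (rs.next()) {
                System.out.println(rs.getString(1)); // one row held in memory at a time
            }
            rs.close();
            stmt.close();
            conn.close();
        }
    }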


-Grant




--
Grant Ingersoll
http://www.lucidimagination.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ









Re: DataImportHandler running out of memory

2008-06-25 Thread Grant Ingersoll
I'm assuming, of course, that the DIH doesn't automatically modify the SQL
statement according to the batch size.

-Grant

--
Grant Ingersoll
http://www.lucidimagination.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ









Re: DataImportHandler running out of memory

2008-06-25 Thread Shalin Shekhar Mangar
The OP is actually using Sql Server (not MySql) as per his mail.


-- 
Regards,
Shalin Shekhar Mangar.


Re: DataImportHandler running out of memory

2008-06-25 Thread Noble Paul നോബിള്‍ नोब्ळ्
DIH does not modify the SQL. The value is just used as a connection property.
--Noble


-- 
--Noble Paul


Re: DataImportHandler running out of memory

2008-06-25 Thread Noble Paul നോബിള്‍ नोब्ळ्
The latest patch sets fetchSize to Integer.MIN_VALUE if -1 is passed. It was
added specifically for the MySQL driver.
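
A self-contained sketch of the mapping just described; the class and method
names are illustrative, not the actual JdbcDataSource internals:

    import java.sql.SQLException;
    import java.sql.Statement;

    public class FetchSizeMapping {
        // batchSize comes from the DIH dataSource configuration; -1 is
        // translated to Integer.MIN_VALUE so the MySQL driver switches to
        // row-by-row streaming instead of buffering the whole result set.
        static void applyBatchSize(Statement stmt, int batchSize) throws SQLException {
            if (batchSize == -1) {
                stmt.setFetchSize(Integer.MIN_VALUE);
            } else if (batchSize > 0) {
                stmt.setFetchSize(batchSize); // hint: buffer roughly batchSize rows
            }
        }
    }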
--Noble



Re: DataImportHandler running out of memory

2008-06-25 Thread wojtekpia

I'm trying with batchSize=-1 now. So far it seems to be working, but very
slowly. I will update when it completes or crashes.

Even with a batchSize of 100 I was running out of memory.

I'm running on a 32-bit Windows machine. I've set -Xmx to 1.5 GB - I believe
that's the maximum for my environment.

The batchSize parameter doesn't seem to control what happens... when I
select top 5,000,000 with a batchSize of 10,000, it works. When I select top
10,000,000 with the same batchSize, it runs out of memory.

Also, I'm using the SOLR-469 patch posted on 2008-06-11 08:41 AM.





Re: DataImportHandler running out of memory

2008-06-25 Thread Shalin Shekhar Mangar
Hi,

I don't think the problem is within DataImportHandler, since it just streams
the resultset. The fetchSize is just a parameter passed to
Statement#setFetchSize(), and the Jdbc driver is supposed to honor it and
keep only that many rows in memory.

From what I could find about the Sql Server driver, there's a connection
property called responseBuffering whose default value is "full", which
causes the entire result set to be fetched. See
http://msdn.microsoft.com/en-us/library/ms378988.aspx for more details. You
can set connection properties like this directly in the jdbc url specified
in DataImportHandler's dataSource configuration.
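
A minimal sketch of a connection that sets this property in the URL, assuming
the sqljdbc driver of that era; host, port, database, credentials, and query
are placeholders:

    import java.sql.*;

    public class SqlServerAdaptiveExample {
        public static void main(String[] args) throws Exception {
            // responseBuffering=adaptive makes the driver fetch rows as they
            // are consumed instead of buffering the full result set ("full").
            String url = "jdbc:sqlserver://localhost:1433;"
                       + "databaseName=mydb;responseBuffering=adaptive";
            Connection conn = DriverManager.getConnection(url, "user", "pass");
            Statement stmt = conn.createStatement();
            ResultSet rs = stmt.executeQuery("SELECT field1 FROM table1");
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
            rs.close();
            stmt.close();
            conn.close();
        }
    }

The same url value, with the property appended, can go straight into the
dataSource element of the DIH configuration, as noted above.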





-- 
Regards,
Shalin Shekhar Mangar.


Re: DataImportHandler running out of memory

2008-06-25 Thread wojtekpia

It looks like that was the problem. With responseBuffering=adaptive, I'm able
to load all my data using the sqljdbc driver.



Re: DataImportHandler running out of memory

2008-06-25 Thread Noble Paul നോബിള്‍ नोब्ळ्
We must document this information in the wiki. We never had a chance to
play with MS SQL Server.
--Noble






-- 
--Noble Paul


DataImportHandler running out of memory

2008-06-24 Thread wojtekpia

I'm trying to load ~10 million records into Solr using the DataImportHandler.
I'm running out of memory (java.lang.OutOfMemoryError: Java heap space) as
soon as I try loading more than about 5 million records.

Here's my configuration:
I'm connecting to a SQL Server database using the sqljdbc driver. I've given
my Solr instance 1.5 GB of memory. I have set the dataSource batchSize to
10,000. My SQL query is "select top XXX field1, ... from table1". I have
about 40 fields in my Solr schema.

I thought the DataImportHandler would stream data from the DB rather than
loading it all into memory at once. Is that not the case? Any thoughts on
how to get around this (aside from getting a machine with more memory)?




Re: DataImportHandler running out of memory

2008-06-24 Thread Grant Ingersoll
This is a bug in MySQL. Try setting the fetch size on the Statement on the
connection to Integer.MIN_VALUE.

See http://forums.mysql.com/read.php?39,137457 amongst a host of other
discussions on the subject. Basically, it tries to load all the rows into
memory; the only alternative is to set the fetch size to Integer.MIN_VALUE
so that it gets them one row at a time. I've hit this one myself, and it
isn't caused by the DataImportHandler but by the MySQL JDBC handler.

-Grant






--
Grant Ingersoll
http://www.lucidimagination.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ









Re: DataImportHandler running out of memory

2008-06-24 Thread Shalin Shekhar Mangar
Setting the batchSize to 10,000 would mean that the Jdbc driver will keep
10,000 rows in memory *for each entity* which uses that data source (if
correctly implemented by the driver). Not sure how well the Sql Server
driver implements this. Also keep in mind that Solr also needs memory to
index documents. You can probably try setting the batch size to a lower
value.

The regular memory tuning stuff should apply here too -- try disabling
autoCommit and turning off autowarming and see if it helps.





-- 
Regards,
Shalin Shekhar Mangar.


Re: DataImportHandler running out of memory

2008-06-24 Thread Noble Paul നോബിള്‍ नोब्ळ्
DIH streams rows one by one.
Set the fetchSize=-1; this might help. It may make the indexing a bit
slower, but memory consumption would be low.
The memory is consumed by the jdbc driver. Try tuning the -Xmx value for the VM.
--Noble




-- 
--Noble Paul


Re: DataImportHandler running out of memory

2008-06-24 Thread Noble Paul നോബിള്‍ नोब्ळ्
It is batchSize=-1, not fetchSize. Or keep it to a very small value.
--Noble




-- 
--Noble Paul