Re: DataImport using SqlEntityProcessor running Out of Memory
Hello O,

It seems to me (but it's better to look at the heap histogram) that buffering the sub-entities in SortedMapBackedCache is what blows up the heap. I'm aware of two directions:

- Use a file-based cache instead. I don't know exactly how it works; you can start from https://issues.apache.org/jira/browse/SOLR-2382 and check how to enable the BerkeleyDB cache.
- Personally, I'm promoting merging result sets ordered by the RDBMS: https://issues.apache.org/jira/browse/SOLR-4799

On Fri, May 9, 2014 at 7:16 PM, O. Olson olson_...@yahoo.it wrote:

  I have a data schema which is hierarchical, i.e. I have an entity and a number of attributes. For a small subset of the data - about 300 MB - I can do the import with 3 GB of memory. Now, with the entire 4 GB dataset, I find I cannot do the import with 9 GB of memory. I am using the SqlEntityProcessor as below:

  <dataConfig>
    <dataSource driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
                url="jdbc:sqlserver://localhost\MSSQLSERVER;databaseName=SolrDB;user=solrusr;password=solrusr;"/>
    <document>
      <entity name="Entity" query="SELECT EntID, Image FROM ENTITY_TABLE">
        <field column="EntID" name="EntID" />
        <field column="Image" name="Image" />
        <entity name="EntityAttribute1"
                query="SELECT AttributeValue, EntID FROM ATTR_TABLE WHERE AttributeID=1"
                cacheKey="EntID" cacheLookup="Entity.EntID"
                processor="SqlEntityProcessor" cacheImpl="SortedMapBackedCache">
          <field column="AttributeValue" name="EntityAttribute1" />
        </entity>
        <entity name="EntityAttribute2"
                query="SELECT AttributeValue, EntID FROM ATTR_TABLE WHERE AttributeID=2"
                cacheKey="EntID" cacheLookup="Entity.EntID"
                processor="SqlEntityProcessor" cacheImpl="SortedMapBackedCache">
          <field column="AttributeValue" name="EntityAttribute2" />
        </entity>
      </entity>
    </document>
  </dataConfig>

  What is the best way to import this data? Doing it without a cache results in many SQL queries. With the cache, I run out of memory. I'm curious why 4 GB of data cannot fit entirely in memory. One thing I need to mention is that I have about 400 to 500 attributes. Thanks in advance for any helpful advice.

  O. O.

  --
  View this message in context: http://lucene.472066.n3.nabble.com/DataImport-using-SqlEntityProcessor-running-Out-of-Memory-tp4135080.html
  Sent from the Solr - User mailing list archive at Nabble.com.

--
Sincerely yours
Mikhail Khludnev
Principal Engineer, Grid Dynamics
http://www.griddynamics.com
mkhlud...@griddynamics.com
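[Editor's illustration] To make the second direction more concrete, below is a rough sketch of an ordered-merge configuration. The join="zipper" attribute and the where= syntax follow my reading of the SOLR-4799 patch and may differ from what was (or ends up) committed; the table and column names are taken from O. Olson's config. The key idea is that both queries are sorted by EntID, so DIH can merge the two streams row by row without caching either side in memory:

  <entity name="Entity" query="SELECT EntID, Image FROM ENTITY_TABLE ORDER BY EntID">
    <field column="EntID" name="EntID" />
    <field column="Image" name="Image" />
    <!-- child rows arrive pre-sorted by the RDBMS, so no in-memory cache is needed -->
    <entity name="EntityAttribute1" processor="SqlEntityProcessor" join="zipper"
            query="SELECT AttributeValue, EntID FROM ATTR_TABLE WHERE AttributeID=1 ORDER BY EntID"
            where="EntID=Entity.EntID">
      <field column="AttributeValue" name="EntityAttribute1" />
    </entity>
  </entity>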
Re: DataImport using SqlEntityProcessor running Out of Memory
On 5/9/2014 9:16 AM, O. Olson wrote:

  I have a data schema which is hierarchical, i.e. I have an entity and a number of attributes. For a small subset of the data - about 300 MB - I can do the import with 3 GB of memory. Now, with the entire 4 GB dataset, I find I cannot do the import with 9 GB of memory. I am using the SqlEntityProcessor as below:

  <dataConfig>
    <dataSource driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
                url="jdbc:sqlserver://localhost\MSSQLSERVER;databaseName=SolrDB;user=solrusr;password=solrusr;"/>

Upgrade your JDBC driver to 1.2 or later, or turn on response buffering. The following URL has this information. It's a very long URL, so if your mail client wraps it, you may not be able to click on it properly:

http://wiki.apache.org/solr/DataImportHandlerFaq#I.27m_using_DataImportHandler_with_MS_SQL_Server_database_with_sqljdbc_driver._DataImportHandler_is_going_out_of_memory._I_tried_adjustng_the_batchSize_values_but_they_don.27t_seem_to_make_any_difference._How_do_I_fix_this.3F

Thanks,
Shawn
DataImport using SqlEntityProcessor running Out of Memory
I have a data schema which is hierarchical, i.e. I have an entity and a number of attributes. For a small subset of the data - about 300 MB - I can do the import with 3 GB of memory. Now, with the entire 4 GB dataset, I find I cannot do the import with 9 GB of memory. I am using the SqlEntityProcessor as below:

<dataConfig>
  <dataSource driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
              url="jdbc:sqlserver://localhost\MSSQLSERVER;databaseName=SolrDB;user=solrusr;password=solrusr;"/>
  <document>
    <entity name="Entity" query="SELECT EntID, Image FROM ENTITY_TABLE">
      <field column="EntID" name="EntID" />
      <field column="Image" name="Image" />
      <entity name="EntityAttribute1"
              query="SELECT AttributeValue, EntID FROM ATTR_TABLE WHERE AttributeID=1"
              cacheKey="EntID" cacheLookup="Entity.EntID"
              processor="SqlEntityProcessor" cacheImpl="SortedMapBackedCache">
        <field column="AttributeValue" name="EntityAttribute1" />
      </entity>
      <entity name="EntityAttribute2"
              query="SELECT AttributeValue, EntID FROM ATTR_TABLE WHERE AttributeID=2"
              cacheKey="EntID" cacheLookup="Entity.EntID"
              processor="SqlEntityProcessor" cacheImpl="SortedMapBackedCache">
        <field column="AttributeValue" name="EntityAttribute2" />
      </entity>
    </entity>
  </document>
</dataConfig>

What is the best way to import this data? Doing it without a cache results in many SQL queries. With the cache, I run out of memory. I'm curious why 4 GB of data cannot fit entirely in memory. One thing I need to mention is that I have about 400 to 500 attributes. Thanks in advance for any helpful advice.

O. O.

--
View this message in context: http://lucene.472066.n3.nabble.com/DataImport-using-SqlEntityProcessor-running-Out-of-Memory-tp4135080.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Running out of memory
I am debugging an out-of-memory error myself, and I have a few suggestions:

1) Are you looking at your search logs around the time of the memory error? In my case, I found a few bad queries requesting a ton of rows (basically the whole index's worth, which I think is an error somewhere in our app that I still have to find) that happened close to the OOM error being thrown.

2) Do you have Solr hooked up to something like NewRelic/AppDynamics to see the cache usage in real time? Maybe, as was suggested, tuning down or eliminating little-used caches could help.

3) Are you ensuring that you aren't setting stored=true on fields that don't need it? This will increase the index size and possibly the cache size if lazy loading isn't enabled (to be honest, I am a bit unclear on this part since I haven't had much experience with it myself).

Thanks
Amit

On Mon, Aug 13, 2012 at 11:37 AM, Jon Drukman jdruk...@gmail.com wrote:

  On Sun, Aug 12, 2012 at 12:31 PM, Alexey Serba ase...@gmail.com wrote:

    > It would be vastly preferable if Solr could just exit when it gets a memory error, because we have it running under daemontools, and that would cause an automatic restart.

    -XX:OnOutOfMemoryError="cmd args; cmd args"
    Run user-defined commands when an OutOfMemoryError is first thrown.

    > Does Solr require the entire index to fit in memory at all times?

    No. But it's hard to say about your particular problem without additional information. How often do you commit? Do you use faceting? Do you sort by Solr fields, and if yes, what are those fields? And you should also check caches.

  I upgraded to solr-3.6.1 and an extra-large Amazon instance (15 GB RAM), so we'll see if that helps. So far no out-of-memory errors.
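[Editor's illustration] As a small sketch of point 3, a field in schema.xml that is searchable but not stored would look roughly like this (the field name is made up for the example; the type should match one defined in your schema):

  <!-- indexed for search, but not stored: keeps stored-field data (and the
       documentCache entries built from it) smaller -->
  <field name="comment_text" type="text" indexed="true" stored="false"/>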
Re: Running out of memory
On Sun, Aug 12, 2012 at 12:31 PM, Alexey Serba ase...@gmail.com wrote:

  > It would be vastly preferable if Solr could just exit when it gets a memory error, because we have it running under daemontools, and that would cause an automatic restart.

  -XX:OnOutOfMemoryError="cmd args; cmd args"
  Run user-defined commands when an OutOfMemoryError is first thrown.

  > Does Solr require the entire index to fit in memory at all times?

  No. But it's hard to say about your particular problem without additional information. How often do you commit? Do you use faceting? Do you sort by Solr fields, and if yes, what are those fields? And you should also check caches.

I upgraded to solr-3.6.1 and an extra-large Amazon instance (15 GB RAM), so we'll see if that helps. So far no out-of-memory errors.
Re: Running out of memory
> It would be vastly preferable if Solr could just exit when it gets a memory error, because we have it running under daemontools, and that would cause an automatic restart.

-XX:OnOutOfMemoryError="cmd args; cmd args"
Run user-defined commands when an OutOfMemoryError is first thrown.

> Does Solr require the entire index to fit in memory at all times?

No. But it's hard to say about your particular problem without additional information. How often do you commit? Do you use faceting? Do you sort by Solr fields, and if yes, what are those fields? And you should also check caches.
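[Editor's illustration] Applied to Jon's posted start command, the flag would look roughly like the sketch below, so the JVM kills itself on the first OOM and daemontools restarts it. This is only an adaptation of his own command line, not something from the thread; on HotSpot, %p expands to the process id:

  /usr/lib/jvm/jre/bin/java -XX:+UseConcMarkSweepGC -Xms1G -Xmx5G \
      -XX:OnOutOfMemoryError="kill -9 %p" \
      -jar start.jar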
Re: Running out of memory
You might want to look at turning down or eliminating your caches if you're running out of RAM. Possibly some of them have a low hit rate, which you can see on the Stats page. Caches with a low hit rate are only consuming RAM and CPU cycles.

Also, using this JVM arg might reduce the memory footprint: -XX:+UseCompressedOops

In the end, though, the surefire solution would be to go to an instance type with more RAM: http://www.ec2instances.info/

Michael Della Bitta
Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017
www.appinions.com
Where Influence Isn't a Game

On Mon, Aug 6, 2012 at 1:48 PM, Jon Drukman jdruk...@gmail.com wrote:

  Hi there. I am running Solr 1.4.1 on an Amazon EC2 box with 7.5 GB of RAM. It was set up about 18 months ago and has been largely trouble-free. Unfortunately, lately it has started to run out of memory pretty much every day. We are seeing:

    SEVERE: java.lang.OutOfMemoryError: Java heap space

  When that happens, a simple query like 'http://localhost:8983/solr/select?q=*:*' returns nothing. I am starting Solr with the following:

    /usr/lib/jvm/jre/bin/java -XX:+UseConcMarkSweepGC -Xms1G -Xmx5G -jar start.jar

  It would be vastly preferable if Solr could just exit when it gets a memory error, because we have it running under daemontools, and that would cause an automatic restart. After restarting, Solr works fine for another 12-18 hours. Not ideal, but at least it wouldn't require human intervention to get it going again.

  What can I do to reduce the memory pressure? Does Solr require the entire index to fit in memory at all times? The on-disk size is 15 GB. There are 27.5 million documents, but they are all tiny (mostly one-line forum comments like "this game is awesome").

  We're using Sun openJava SDK 1.6 if that matters.

  -jsd-
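[Editor's illustration] For reference, the caches Michael mentions are configured in the <query> section of solrconfig.xml. Shrinking a low-hit-rate cache and disabling autowarming might look roughly like this; the sizes are illustrative, not recommendations:

  <!-- smaller caches and no autowarming = less heap held across commits -->
  <filterCache      class="solr.FastLRUCache" size="256" initialSize="256" autowarmCount="0"/>
  <queryResultCache class="solr.LRUCache"     size="256" initialSize="256" autowarmCount="0"/>
  <documentCache    class="solr.LRUCache"     size="256" initialSize="256" autowarmCount="0"/>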
Re: DataImportHandler running out of memory
On 2/20/2012 6:49 AM, v_shan wrote:

  DIH still running out of memory for me, with Full Import on a database of size 1.5 GB. Solr version: 3_5_0. Note that I have already added batchSize=-1 but am getting the same error.

A few questions:

- How much memory have you given to the JVM running this Solr instance?
- How much memory does your server have?
- What is the size of all your index cores, and how many documents are in them?
- How large are your Solr caches (filterCache, documentCache, queryResultCache)?
- What is your ramBufferSizeMB set to in the indexDefaults section?

Thanks,
Shawn
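[Editor's illustration] The last setting Shawn asks about lives in solrconfig.xml; in a stock Solr 3.x config it looks roughly like the sketch below (32 is the shipped default, shown only for orientation):

  <indexDefaults>
    <!-- how much RAM Lucene may buffer for pending documents before flushing a segment -->
    <ramBufferSizeMB>32</ramBufferSizeMB>
    <mergeFactor>10</mergeFactor>
  </indexDefaults>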
Re: DataImportHandler running out of memory
DIH still running out of memory for me, with Full Import on a database of size 1.5 GB. Solr version: 3_5_0. Note that I have already added batchSize=-1 but am getting the same error. Sharing my DIH config below.

<dataConfig>
  <dataSource type="JdbcDataSource" name="jdbc" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/ib" user="root" password="root" batchSize="-1" />
  <document name="content">
    <entity name="issue" dataSource="jdbc"
            transformer="RegexTransformer,DateFormatTransformer,TemplateTransformer"
            pk="id"
            query="select ib_issue.`_id` as id,
                          ib_issue.`_issue_title` as issueTitle,
                          ib_issue.`_issue_descr` as issueDescr,
                          createdBy.`_name` as issueCreatedByName,
                          createdBy.`_email` as issueCreatedByEmail
                   from `ib_issue`
                   inner join `ib_user` as createdBy on createdBy.`_id` = ib_issue.`_created_by_user_id`
                   group by ib_issue.`_id`">
    </entity>
  </document>
</dataConfig>

Please find the error trace below
===
2012-02-20 19:04:40.531:INFO::Started SocketConnector@0.0.0.0:8983
Feb 20, 2012 7:04:57 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/select params={command=status&qt=/dih_ib_jdbc} status=0 QTime=0
Feb 20, 2012 7:04:58 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/select params={command=show-config&qt=/dih_ib_jdbc} status=0 QTime=0
Feb 20, 2012 7:05:30 PM org.apache.solr.handler.dataimport.DataImporter doFullImport
INFO: Starting Full Import
Feb 20, 2012 7:05:30 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/dih_ib_jdbc params={command=full-import} status=0 QTime=0
Feb 20, 2012 7:05:30 PM org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties
INFO: Read dih_ib_jdbc.properties
Feb 20, 2012 7:05:30 PM org.apache.solr.update.DirectUpdateHandler2 deleteAll
INFO: [] REMOVING ALL DOCUMENTS FROM INDEX
Feb 20, 2012 7:05:30 PM org.apache.solr.core.SolrDeletionPolicy onInit
INFO: SolrDeletionPolicy.onInit: commits:num=1
  commit{dir=E:\workspace\solr_3_5_0\example\solr\data\index,segFN=segments_1,version=1329744880204,generation=1,filenames=[segments_1]
Feb 20, 2012 7:05:30 PM org.apache.solr.core.SolrDeletionPolicy updateCommits
INFO: newest commit = 1329744880204
Feb 20, 2012 7:05:30 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Creating a connection for entity issue with URL: jdbc:mysql://localhost:3306/issueburner
Feb 20, 2012 7:05:30 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Time taken for getConnection(): 172
Feb 20, 2012 7:07:45 PM org.apache.solr.common.SolrException log
SEVERE: Full Import failed:org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.OutOfMemoryError: Java heap space
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:669)
        at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:268)
        at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:187)
        at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:359)
        at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:427)
        at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:408)
Caused by: java.lang.OutOfMemoryError: Java heap space
        at org.apache.lucene.util.UnicodeUtil.UTF16toUTF8(UnicodeUtil.java:377)
        at org.apache.lucene.store.DataOutput.writeString(DataOutput.java:103)
        at org.apache.lucene.index.FieldsWriter.writeField(FieldsWriter.java:200)
        at org.apache.lucene.index.StoredFieldsWriterPerThread.addField(StoredFieldsWriterPerThread.java:58)
        at org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:265)
        at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:766)
        at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2327)
        at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2299)
        at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:240)
        at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
        at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:115)
        at org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:73)
        at org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:293)
        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument
fixed schema problems, now running out of memory?
It's a small indexing job coming from nutch.

2011-10-26 15:07:29,039 WARN  mapred.LocalJobRunner - job_local_0011
java.io.IOException: org.apache.solr.client.solrj.SolrServerException: Error executi$
        at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat.getRec$
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:338)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
Caused by: org.apache.solr.client.solrj.SolrServerException: Error executing query
        at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.ja$
        at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:118)
        at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat.getRec$
        ... 3 more
Caused by: org.apache.solr.common.SolrException: Java heap space java.lang.OutOfMem$
Java heap space java.lang.OutOfMemoryError: Java heap space    at org.apache.lucene$
request: localhost/solr/select?q=id:[* TO *]&fl=id,boost,tstamp,$
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHt$
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHt$
        at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.ja$
        ... 5 more
Re: fixed schema problems, now running out of memory?
More on what's happening: it seems to be timing out during the commit. The new documents are small, but the existing index is large (11 million records).

INFO: Closing Searcher@4a7df6 main
        fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
...
Oct 26, 2011 4:51:17 PM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {commit=} 0 2453
Oct 26, 2011 4:51:17 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/update params={waitSearcher=true&waitFlush=true&wt=javabin&commit=true&version=2} status=0 QTime=2453
Oct 26, 2011 4:51:52 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/select params={fl=id&wt=javabin&q=id:[*+TO+*]&rows=1&version=2} hits=11576871 status=0 QTime=35298
Oct 26, 2011 4:51:53 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/select params={fl=id&wt=javabin&q=id:[*+TO+*]&rows=1&version=2} hits=11576871 status=0 QTime=1
java.lang.OutOfMemoryError: Java heap space
Dumping heap to /home/bitnami/apache-solr-3.4.0/example/heaplog ...
Heap dump file created [306866344 bytes in 32.376 secs]

On Wed, Oct 26, 2011 at 11:09 AM, Fred Zimmerman zimzaz@gmail.com wrote:

  It's a small indexing job coming from nutch.

  2011-10-26 15:07:29,039 WARN  mapred.LocalJobRunner - job_local_0011
  java.io.IOException: org.apache.solr.client.solrj.SolrServerException: Error executi$
          at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat.getRec$
          at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:338)
          at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
          at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
  Caused by: org.apache.solr.client.solrj.SolrServerException: Error executing query
          at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.ja$
          at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:118)
          at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat.getRec$
          ... 3 more
  Caused by: org.apache.solr.common.SolrException: Java heap space java.lang.OutOfMem$
  Java heap space java.lang.OutOfMemoryError: Java heap space    at org.apache.lucene$
  request: localhost/solr/select?q=id:[* TO *]&fl=id,boost,tstamp,$
          at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHt$
          at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHt$
          at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.ja$
          ... 5 more
Re: fixed schema problems, now running out of memory?
http://wiki.apache.org/solr/SolrPerformanceFactors#Schema_Design_Considerations

  The number of indexed fields greatly increases the following:
  - Memory usage during indexing
  - Segment merge time
  - Optimization times
  - Index size
  These impacts can be reduced by the use of omitNorms=true

http://lucene.472066.n3.nabble.com/What-is-omitNorms-td2987547.html

  1. Length normalization will not work on the specific field -- which means matching documents with shorter length will not be preferred/boosted over matching documents with greater length for that field at search time. (For my application, I actually prefer documents with greater length.)
  2. Index-time boosting will not be available on the field.

  If both of the above are not required by you, then you can set omitNorms=true for the specific fields. This has an added advantage: it will save you some (or a lot of) RAM, since omitNorms=false on a total of N fields in the index will require RAM of size: total docs in index * 1 byte * N.

I have a lot of fields: I count 31 without an omitNorms value, which means false by default. Gak! 11,000,000 docs * 1 byte * 31 fields is roughly 341 MB of RAM all by itself.

On Wed, Oct 26, 2011 at 1:01 PM, Fred Zimmerman zimzaz@gmail.com wrote:

  More on what's happening: it seems to be timing out during the commit. The new documents are small, but the existing index is large (11 million records).

  INFO: Closing Searcher@4a7df6 main
          fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
  ...
  Oct 26, 2011 4:51:17 PM org.apache.solr.update.processor.LogUpdateProcessor finish
  INFO: {commit=} 0 2453
  Oct 26, 2011 4:51:17 PM org.apache.solr.core.SolrCore execute
  INFO: [] webapp=/solr path=/update params={waitSearcher=true&waitFlush=true&wt=javabin&commit=true&version=2} status=0 QTime=2453
  Oct 26, 2011 4:51:52 PM org.apache.solr.core.SolrCore execute
  INFO: [] webapp=/solr path=/select params={fl=id&wt=javabin&q=id:[*+TO+*]&rows=1&version=2} hits=11576871 status=0 QTime=35298
  Oct 26, 2011 4:51:53 PM org.apache.solr.core.SolrCore execute
  INFO: [] webapp=/solr path=/select params={fl=id&wt=javabin&q=id:[*+TO+*]&rows=1&version=2} hits=11576871 status=0 QTime=1
  java.lang.OutOfMemoryError: Java heap space
  Dumping heap to /home/bitnami/apache-solr-3.4.0/example/heaplog ...
  Heap dump file created [306866344 bytes in 32.376 secs]

  On Wed, Oct 26, 2011 at 11:09 AM, Fred Zimmerman zimzaz@gmail.com wrote:

    It's a small indexing job coming from nutch.

    2011-10-26 15:07:29,039 WARN  mapred.LocalJobRunner - job_local_0011
    java.io.IOException: org.apache.solr.client.solrj.SolrServerException: Error executi$
            at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat.getRec$
            at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:338)
            at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
            at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
    Caused by: org.apache.solr.client.solrj.SolrServerException: Error executing query
            at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.ja$
            at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:118)
            at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat.getRec$
            ... 3 more
    Caused by: org.apache.solr.common.SolrException: Java heap space java.lang.OutOfMem$
    Java heap space java.lang.OutOfMemoryError: Java heap space    at org.apache.lucene$
    request: localhost/solr/select?q=id:[* TO *]&fl=id,boost,tstamp,$
            at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHt$
            at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHt$
            at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.ja$
            ... 5 more
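[Editor's illustration] Going back to the omitNorms point above: norms are controlled per field in schema.xml. A sketch, with a made-up field name:

  <!-- no norms: gives up length normalization and index-time boosts on this field,
       but saves one byte of norms data per document per field on the heap -->
  <field name="comment" type="text" indexed="true" stored="true" omitNorms="true"/>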
Re: DataImportHandler running out of memory
Hi,

I tried batchSize=-1, but when I do that MySQL uses all of its memory, which is a problem for the MySQL database. :s

Noble Paul നോബിള്‍ नोब्ळ् wrote:

  I've moved the FAQ to a new page: http://wiki.apache.org/solr/DataImportHandlerFaq
  The DIH page is too big and editing has become harder.

  On Thu, Jun 26, 2008 at 6:07 PM, Shalin Shekhar Mangar [EMAIL PROTECTED] wrote:

    I've added a FAQ section to the DataImportHandler wiki page which captures questions on out-of-memory exceptions with both the MySQL and MS SQL Server drivers. http://wiki.apache.org/solr/DataImportHandler#faq

    On Thu, Jun 26, 2008 at 9:36 AM, Noble Paul നോബിള്‍ नोब्ळ् [EMAIL PROTECTED] wrote:

      We must document this information in the wiki. We never had a chance to play w/ MS SQL Server.
      --Noble

      On Thu, Jun 26, 2008 at 12:38 AM, wojtekpia [EMAIL PROTECTED] wrote:

        It looks like that was the problem. With responseBuffering=adaptive, I'm able to load all my data using the sqljdbc driver.

        --
        View this message in context: http://www.nabble.com/DataImportHandler-running-out-of-memory-tp18102644p18119732.html
        Sent from the Solr - User mailing list archive at Nabble.com.

  --
  --Noble Paul

  --
  Regards,
  Shalin Shekhar Mangar.

--
--Noble Paul

--
View this message in context: http://www.nabble.com/DataImportHandler-running-out-of-memory-tp18102644p20305039.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: DataImportHandler running out of memory
Hi Grant,

How did you finally manage it? I have the same problem with less data (8.5M records). If I set batchSize=-1, it slows the database down a lot, which is not good for the website and the requests stacking up behind it. What did you do?

Thanks,

Grant Ingersoll-6 wrote:

  I think it's a bit different. I ran into this exact problem about two weeks ago on a 13 million record DB. MySQL doesn't honor the fetch size for its v5 JDBC driver. See http://www.databasesandlife.com/reading-row-by-row-into-java-from-mysql/ or do a search for MySQL fetch size. You actually have to do setFetchSize(Integer.MIN_VALUE) (-1 doesn't work) in order to get streaming in MySQL.

  -Grant

  On Jun 24, 2008, at 10:35 PM, Shalin Shekhar Mangar wrote:

    Setting the batchSize to 10000 would mean that the JDBC driver will keep 10,000 rows in memory *for each entity* which uses that data source (if correctly implemented by the driver). Not sure how well the SQL Server driver implements this. Also keep in mind that Solr also needs memory to index documents. You can probably try setting the batch size to a lower value.

    The regular memory tuning stuff should apply here too -- try disabling autoCommit and turning off autowarming and see if it helps.

    On Wed, Jun 25, 2008 at 5:53 AM, wojtekpia [EMAIL PROTECTED] wrote:

      I'm trying to load ~10 million records into Solr using the DataImportHandler. I'm running out of memory (java.lang.OutOfMemoryError: Java heap space) as soon as I try loading more than about 5 million records.

      Here's my configuration: I'm connecting to a SQL Server database using the sqljdbc driver. I've given my Solr instance 1.5 GB of memory. I have set the dataSource batchSize to 10000. My SQL query is "select top XXX field1, ... from table1". I have about 40 fields in my Solr schema.

      I thought the DataImportHandler would stream data from the DB rather than loading it all into memory at once. Is that not the case? Any thoughts on how to get around this (aside from getting a machine with more memory)?

      --
      View this message in context: http://www.nabble.com/DataImportHandler-running-out-of-memory-tp18102644p18102644.html
      Sent from the Solr - User mailing list archive at Nabble.com.

    --
    Regards,
    Shalin Shekhar Mangar.

  --
  Grant Ingersoll
  http://www.lucidimagination.com
  Lucene Helpful Hints:
  http://wiki.apache.org/lucene-java/BasicsOfPerformance
  http://wiki.apache.org/lucene-java/LuceneFAQ

--
View this message in context: http://www.nabble.com/DataImportHandler-running-out-of-memory-tp18102644p20263146.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: DataImportHandler running out of memory
I've moved the FAQ to a new Page http://wiki.apache.org/solr/DataImportHandlerFaq The DIH page is too big and editing has become harder On Thu, Jun 26, 2008 at 6:07 PM, Shalin Shekhar Mangar [EMAIL PROTECTED] wrote: I've added a FAQ section to DataImportHandler wiki page which captures question on out of memory exception with both MySQL and MS SQL Server drivers. http://wiki.apache.org/solr/DataImportHandler#faq On Thu, Jun 26, 2008 at 9:36 AM, Noble Paul നോബിള് नोब्ळ् [EMAIL PROTECTED] wrote: We must document this information in the wiki. We never had a chance to play w/ ms sql server --Noble On Thu, Jun 26, 2008 at 12:38 AM, wojtekpia [EMAIL PROTECTED] wrote: It looks like that was the problem. With responseBuffering=adaptive, I'm able to load all my data using the sqljdbc driver. -- View this message in context: http://www.nabble.com/DataImportHandler-running-out-of-memory-tp18102644p18119732.html Sent from the Solr - User mailing list archive at Nabble.com. -- --Noble Paul -- Regards, Shalin Shekhar Mangar. -- --Noble Paul
Re: DataImportHandler running out of memory
I've added a FAQ section to DataImportHandler wiki page which captures question on out of memory exception with both MySQL and MS SQL Server drivers. http://wiki.apache.org/solr/DataImportHandler#faq On Thu, Jun 26, 2008 at 9:36 AM, Noble Paul നോബിള് नोब्ळ् [EMAIL PROTECTED] wrote: We must document this information in the wiki. We never had a chance to play w/ ms sql server --Noble On Thu, Jun 26, 2008 at 12:38 AM, wojtekpia [EMAIL PROTECTED] wrote: It looks like that was the problem. With responseBuffering=adaptive, I'm able to load all my data using the sqljdbc driver. -- View this message in context: http://www.nabble.com/DataImportHandler-running-out-of-memory-tp18102644p18119732.html Sent from the Solr - User mailing list archive at Nabble.com. -- --Noble Paul -- Regards, Shalin Shekhar Mangar.
Re: DataImportHandler running out of memory
I think it's a bit different. I ran into this exact problem about two weeks ago on a 13 million record DB. MySQL doesn't honor the fetch size for its v5 JDBC driver. See http://www.databasesandlife.com/reading-row-by-row-into-java-from-mysql/ or do a search for MySQL fetch size. You actually have to do setFetchSize(Integer.MIN_VALUE) (-1 doesn't work) in order to get streaming in MySQL.

-Grant

On Jun 24, 2008, at 10:35 PM, Shalin Shekhar Mangar wrote:

  Setting the batchSize to 10000 would mean that the JDBC driver will keep 10,000 rows in memory *for each entity* which uses that data source (if correctly implemented by the driver). Not sure how well the SQL Server driver implements this. Also keep in mind that Solr also needs memory to index documents. You can probably try setting the batch size to a lower value.

  The regular memory tuning stuff should apply here too -- try disabling autoCommit and turning off autowarming and see if it helps.

  On Wed, Jun 25, 2008 at 5:53 AM, wojtekpia [EMAIL PROTECTED] wrote:

    I'm trying to load ~10 million records into Solr using the DataImportHandler. I'm running out of memory (java.lang.OutOfMemoryError: Java heap space) as soon as I try loading more than about 5 million records.

    Here's my configuration: I'm connecting to a SQL Server database using the sqljdbc driver. I've given my Solr instance 1.5 GB of memory. I have set the dataSource batchSize to 10000. My SQL query is "select top XXX field1, ... from table1". I have about 40 fields in my Solr schema.

    I thought the DataImportHandler would stream data from the DB rather than loading it all into memory at once. Is that not the case? Any thoughts on how to get around this (aside from getting a machine with more memory)?

    --
    View this message in context: http://www.nabble.com/DataImportHandler-running-out-of-memory-tp18102644p18102644.html
    Sent from the Solr - User mailing list archive at Nabble.com.

  --
  Regards,
  Shalin Shekhar Mangar.

--
Grant Ingersoll
http://www.lucidimagination.com
Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ
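[Editor's illustration] For anyone who wants to see what Grant describes outside of DIH, here is a minimal standalone sketch of MySQL streaming with Connector/J. The connection details and table name are placeholders; the point is the combination of a forward-only, read-only statement with a fetch size of Integer.MIN_VALUE, which is what tells the MySQL driver to stream rows instead of buffering the whole result set:

  import java.sql.Connection;
  import java.sql.DriverManager;
  import java.sql.ResultSet;
  import java.sql.SQLException;
  import java.sql.Statement;

  public class MySqlStreamingExample {
      public static void main(String[] args) throws SQLException {
          Connection conn = DriverManager.getConnection(
                  "jdbc:mysql://localhost:3306/mydb", "user", "password");
          // TYPE_FORWARD_ONLY + CONCUR_READ_ONLY + Integer.MIN_VALUE fetch size
          // enables row-by-row streaming in MySQL Connector/J.
          Statement stmt = conn.createStatement(
                  ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
          stmt.setFetchSize(Integer.MIN_VALUE);
          ResultSet rs = stmt.executeQuery("SELECT id, title FROM some_table");
          while (rs.next()) {
              // process one row at a time; nothing else is held in memory
              System.out.println(rs.getLong(1) + " " + rs.getString(2));
          }
          rs.close();
          stmt.close();
          conn.close();
      }
  }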
Re: DataImportHandler running out of memory
I'm assuming, of course, that the DIH doesn't automatically modify the SQL statement according to the batch size. -Grant On Jun 25, 2008, at 7:05 AM, Grant Ingersoll wrote: I think it's a bit different. I ran into this exact problem about two weeks ago on a 13 million record DB. MySQL doesn't honor the fetch size for it's v5 JDBC driver. See http://www.databasesandlife.com/reading-row-by-row-into-java-from-mysql/ or do a search for MySQL fetch size. You actually have to do setFetchSize(Integer.MIN_VALUE) (-1 doesn't work) in order to get streaming in MySQL. -Grant On Jun 24, 2008, at 10:35 PM, Shalin Shekhar Mangar wrote: Setting the batchSize to 1 would mean that the Jdbc driver will keep 1 rows in memory *for each entity* which uses that data source (if correctly implemented by the driver). Not sure how well the Sql Server driver implements this. Also keep in mind that Solr also needs memory to index documents. You can probably try setting the batch size to a lower value. The regular memory tuning stuff should apply here too -- try disabling autoCommit and turn-off autowarming and see if it helps. On Wed, Jun 25, 2008 at 5:53 AM, wojtekpia [EMAIL PROTECTED] wrote: I'm trying to load ~10 million records into Solr using the DataImportHandler. I'm running out of memory (java.lang.OutOfMemoryError: Java heap space) as soon as I try loading more than about 5 million records. Here's my configuration: I'm connecting to a SQL Server database using the sqljdbc driver. I've given my Solr instance 1.5 GB of memory. I have set the dataSource batchSize to 1. My SQL query is select top XXX field1, ... from table1. I have about 40 fields in my Solr schema. I thought the DataImportHandler would stream data from the DB rather than loading it all into memory at once. Is that not the case? Any thoughts on how to get around this (aside from getting a machine with more memory)? -- View this message in context: http://www.nabble.com/DataImportHandler-running-out-of-memory-tp18102644p18102644.html Sent from the Solr - User mailing list archive at Nabble.com. -- Regards, Shalin Shekhar Mangar. -- Grant Ingersoll http://www.lucidimagination.com Lucene Helpful Hints: http://wiki.apache.org/lucene-java/BasicsOfPerformance http://wiki.apache.org/lucene-java/LuceneFAQ -- Grant Ingersoll http://www.lucidimagination.com Lucene Helpful Hints: http://wiki.apache.org/lucene-java/BasicsOfPerformance http://wiki.apache.org/lucene-java/LuceneFAQ
Re: DataImportHandler running out of memory
The OP is actually using Sql Server (not MySql) as per his mail. On Wed, Jun 25, 2008 at 4:40 PM, Grant Ingersoll [EMAIL PROTECTED] wrote: I'm assuming, of course, that the DIH doesn't automatically modify the SQL statement according to the batch size. -Grant On Jun 25, 2008, at 7:05 AM, Grant Ingersoll wrote: I think it's a bit different. I ran into this exact problem about two weeks ago on a 13 million record DB. MySQL doesn't honor the fetch size for it's v5 JDBC driver. See http://www.databasesandlife.com/reading-row-by-row-into-java-from-mysql/ or do a search for MySQL fetch size. You actually have to do setFetchSize(Integer.MIN_VALUE) (-1 doesn't work) in order to get streaming in MySQL. -Grant On Jun 24, 2008, at 10:35 PM, Shalin Shekhar Mangar wrote: Setting the batchSize to 1 would mean that the Jdbc driver will keep 1 rows in memory *for each entity* which uses that data source (if correctly implemented by the driver). Not sure how well the Sql Server driver implements this. Also keep in mind that Solr also needs memory to index documents. You can probably try setting the batch size to a lower value. The regular memory tuning stuff should apply here too -- try disabling autoCommit and turn-off autowarming and see if it helps. On Wed, Jun 25, 2008 at 5:53 AM, wojtekpia [EMAIL PROTECTED] wrote: I'm trying to load ~10 million records into Solr using the DataImportHandler. I'm running out of memory (java.lang.OutOfMemoryError: Java heap space) as soon as I try loading more than about 5 million records. Here's my configuration: I'm connecting to a SQL Server database using the sqljdbc driver. I've given my Solr instance 1.5 GB of memory. I have set the dataSource batchSize to 1. My SQL query is select top XXX field1, ... from table1. I have about 40 fields in my Solr schema. I thought the DataImportHandler would stream data from the DB rather than loading it all into memory at once. Is that not the case? Any thoughts on how to get around this (aside from getting a machine with more memory)? -- View this message in context: http://www.nabble.com/DataImportHandler-running-out-of-memory-tp18102644p18102644.html Sent from the Solr - User mailing list archive at Nabble.com. -- Regards, Shalin Shekhar Mangar. -- Grant Ingersoll http://www.lucidimagination.com Lucene Helpful Hints: http://wiki.apache.org/lucene-java/BasicsOfPerformance http://wiki.apache.org/lucene-java/LuceneFAQ -- Grant Ingersoll http://www.lucidimagination.com Lucene Helpful Hints: http://wiki.apache.org/lucene-java/BasicsOfPerformance http://wiki.apache.org/lucene-java/LuceneFAQ -- Regards, Shalin Shekhar Mangar.
Re: DataImportHandler running out of memory
DIH does not modify SQL. This value is used as a connection property --Noble On Wed, Jun 25, 2008 at 4:40 PM, Grant Ingersoll [EMAIL PROTECTED] wrote: I'm assuming, of course, that the DIH doesn't automatically modify the SQL statement according to the batch size. -Grant On Jun 25, 2008, at 7:05 AM, Grant Ingersoll wrote: I think it's a bit different. I ran into this exact problem about two weeks ago on a 13 million record DB. MySQL doesn't honor the fetch size for it's v5 JDBC driver. See http://www.databasesandlife.com/reading-row-by-row-into-java-from-mysql/ or do a search for MySQL fetch size. You actually have to do setFetchSize(Integer.MIN_VALUE) (-1 doesn't work) in order to get streaming in MySQL. -Grant On Jun 24, 2008, at 10:35 PM, Shalin Shekhar Mangar wrote: Setting the batchSize to 1 would mean that the Jdbc driver will keep 1 rows in memory *for each entity* which uses that data source (if correctly implemented by the driver). Not sure how well the Sql Server driver implements this. Also keep in mind that Solr also needs memory to index documents. You can probably try setting the batch size to a lower value. The regular memory tuning stuff should apply here too -- try disabling autoCommit and turn-off autowarming and see if it helps. On Wed, Jun 25, 2008 at 5:53 AM, wojtekpia [EMAIL PROTECTED] wrote: I'm trying to load ~10 million records into Solr using the DataImportHandler. I'm running out of memory (java.lang.OutOfMemoryError: Java heap space) as soon as I try loading more than about 5 million records. Here's my configuration: I'm connecting to a SQL Server database using the sqljdbc driver. I've given my Solr instance 1.5 GB of memory. I have set the dataSource batchSize to 1. My SQL query is select top XXX field1, ... from table1. I have about 40 fields in my Solr schema. I thought the DataImportHandler would stream data from the DB rather than loading it all into memory at once. Is that not the case? Any thoughts on how to get around this (aside from getting a machine with more memory)? -- View this message in context: http://www.nabble.com/DataImportHandler-running-out-of-memory-tp18102644p18102644.html Sent from the Solr - User mailing list archive at Nabble.com. -- Regards, Shalin Shekhar Mangar. -- Grant Ingersoll http://www.lucidimagination.com Lucene Helpful Hints: http://wiki.apache.org/lucene-java/BasicsOfPerformance http://wiki.apache.org/lucene-java/LuceneFAQ -- Grant Ingersoll http://www.lucidimagination.com Lucene Helpful Hints: http://wiki.apache.org/lucene-java/BasicsOfPerformance http://wiki.apache.org/lucene-java/LuceneFAQ -- --Noble Paul
Re: DataImportHandler running out of memory
The latest patch sets fetchSize as Integer.MIN_VALUE if -1 is passed. It is added specifically for mysql driver --Noble On Wed, Jun 25, 2008 at 4:35 PM, Grant Ingersoll [EMAIL PROTECTED] wrote: I think it's a bit different. I ran into this exact problem about two weeks ago on a 13 million record DB. MySQL doesn't honor the fetch size for it's v5 JDBC driver. See http://www.databasesandlife.com/reading-row-by-row-into-java-from-mysql/ or do a search for MySQL fetch size. You actually have to do setFetchSize(Integer.MIN_VALUE) (-1 doesn't work) in order to get streaming in MySQL. -Grant On Jun 24, 2008, at 10:35 PM, Shalin Shekhar Mangar wrote: Setting the batchSize to 1 would mean that the Jdbc driver will keep 1 rows in memory *for each entity* which uses that data source (if correctly implemented by the driver). Not sure how well the Sql Server driver implements this. Also keep in mind that Solr also needs memory to index documents. You can probably try setting the batch size to a lower value. The regular memory tuning stuff should apply here too -- try disabling autoCommit and turn-off autowarming and see if it helps. On Wed, Jun 25, 2008 at 5:53 AM, wojtekpia [EMAIL PROTECTED] wrote: I'm trying to load ~10 million records into Solr using the DataImportHandler. I'm running out of memory (java.lang.OutOfMemoryError: Java heap space) as soon as I try loading more than about 5 million records. Here's my configuration: I'm connecting to a SQL Server database using the sqljdbc driver. I've given my Solr instance 1.5 GB of memory. I have set the dataSource batchSize to 1. My SQL query is select top XXX field1, ... from table1. I have about 40 fields in my Solr schema. I thought the DataImportHandler would stream data from the DB rather than loading it all into memory at once. Is that not the case? Any thoughts on how to get around this (aside from getting a machine with more memory)? -- View this message in context: http://www.nabble.com/DataImportHandler-running-out-of-memory-tp18102644p18102644.html Sent from the Solr - User mailing list archive at Nabble.com. -- Regards, Shalin Shekhar Mangar. -- Grant Ingersoll http://www.lucidimagination.com Lucene Helpful Hints: http://wiki.apache.org/lucene-java/BasicsOfPerformance http://wiki.apache.org/lucene-java/LuceneFAQ
Re: DataImportHandler running out of memory
I'm trying with batchSize=-1 now. So far it seems to be working, but very slowly. I will update when it completes or crashes. Even with a batchSize of 100 I was running out of memory. I'm running on a 32-bit Windows machine. I've set the -Xmx to 1.5 GB - I believe that's the maximum for my environment. The batchSize parameter doesn't seem to control what happens... when I select top 5,000,000 with a batchSize of 10,000, it works. When I select top 10,000,000 with the same batchSize, it runs out of memory. Also, I'm using the 469 patch posted on 2008-06-11 08:41 AM. Noble Paul നോബിള് नोब्ळ् wrote: DIH streams rows one by one. set the fetchSize=-1 this might help. It may make the indexing a bit slower but memory consumption would be low. The memory is consumed by the jdbc driver. try tuning the -Xmx value for the VM --Noble On Wed, Jun 25, 2008 at 8:05 AM, Shalin Shekhar Mangar [EMAIL PROTECTED] wrote: Setting the batchSize to 1 would mean that the Jdbc driver will keep 1 rows in memory *for each entity* which uses that data source (if correctly implemented by the driver). Not sure how well the Sql Server driver implements this. Also keep in mind that Solr also needs memory to index documents. You can probably try setting the batch size to a lower value. The regular memory tuning stuff should apply here too -- try disabling autoCommit and turn-off autowarming and see if it helps. On Wed, Jun 25, 2008 at 5:53 AM, wojtekpia [EMAIL PROTECTED] wrote: I'm trying to load ~10 million records into Solr using the DataImportHandler. I'm running out of memory (java.lang.OutOfMemoryError: Java heap space) as soon as I try loading more than about 5 million records. Here's my configuration: I'm connecting to a SQL Server database using the sqljdbc driver. I've given my Solr instance 1.5 GB of memory. I have set the dataSource batchSize to 1. My SQL query is select top XXX field1, ... from table1. I have about 40 fields in my Solr schema. I thought the DataImportHandler would stream data from the DB rather than loading it all into memory at once. Is that not the case? Any thoughts on how to get around this (aside from getting a machine with more memory)? -- View this message in context: http://www.nabble.com/DataImportHandler-running-out-of-memory-tp18102644p18102644.html Sent from the Solr - User mailing list archive at Nabble.com. -- Regards, Shalin Shekhar Mangar. -- --Noble Paul -- View this message in context: http://www.nabble.com/DataImportHandler-running-out-of-memory-tp18102644p18115900.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: DataImportHandler running out of memory
Hi, I don't think the problem is within DataImportHandler since it just streams resultset. The fetchSize is just passed as a parameter passed to Statement#setFetchSize() and the Jdbc driver is supposed to honor it and keep only that many rows in memory. From what I could find about the Sql Server driver -- there's a connection property called responseBuffering whose default value is full which causes the entire result set is fetched. See http://msdn.microsoft.com/en-us/library/ms378988.aspx for more details. You can set connection properties like this directly in the jdbc url specified in DataImportHandler's dataSource configuration. On Wed, Jun 25, 2008 at 10:17 PM, wojtekpia [EMAIL PROTECTED] wrote: I'm trying with batchSize=-1 now. So far it seems to be working, but very slowly. I will update when it completes or crashes. Even with a batchSize of 100 I was running out of memory. I'm running on a 32-bit Windows machine. I've set the -Xmx to 1.5 GB - I believe that's the maximum for my environment. The batchSize parameter doesn't seem to control what happens... when I select top 5,000,000 with a batchSize of 10,000, it works. When I select top 10,000,000 with the same batchSize, it runs out of memory. Also, I'm using the 469 patch posted on 2008-06-11 08:41 AM. Noble Paul നോബിള് नोब्ळ् wrote: DIH streams rows one by one. set the fetchSize=-1 this might help. It may make the indexing a bit slower but memory consumption would be low. The memory is consumed by the jdbc driver. try tuning the -Xmx value for the VM --Noble On Wed, Jun 25, 2008 at 8:05 AM, Shalin Shekhar Mangar [EMAIL PROTECTED] wrote: Setting the batchSize to 1 would mean that the Jdbc driver will keep 1 rows in memory *for each entity* which uses that data source (if correctly implemented by the driver). Not sure how well the Sql Server driver implements this. Also keep in mind that Solr also needs memory to index documents. You can probably try setting the batch size to a lower value. The regular memory tuning stuff should apply here too -- try disabling autoCommit and turn-off autowarming and see if it helps. On Wed, Jun 25, 2008 at 5:53 AM, wojtekpia [EMAIL PROTECTED] wrote: I'm trying to load ~10 million records into Solr using the DataImportHandler. I'm running out of memory (java.lang.OutOfMemoryError: Java heap space) as soon as I try loading more than about 5 million records. Here's my configuration: I'm connecting to a SQL Server database using the sqljdbc driver. I've given my Solr instance 1.5 GB of memory. I have set the dataSource batchSize to 1. My SQL query is select top XXX field1, ... from table1. I have about 40 fields in my Solr schema. I thought the DataImportHandler would stream data from the DB rather than loading it all into memory at once. Is that not the case? Any thoughts on how to get around this (aside from getting a machine with more memory)? -- View this message in context: http://www.nabble.com/DataImportHandler-running-out-of-memory-tp18102644p18102644.html Sent from the Solr - User mailing list archive at Nabble.com. -- Regards, Shalin Shekhar Mangar. -- --Noble Paul -- View this message in context: http://www.nabble.com/DataImportHandler-running-out-of-memory-tp18102644p18115900.html Sent from the Solr - User mailing list archive at Nabble.com. -- Regards, Shalin Shekhar Mangar.
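[Editor's illustration] To make the responseBuffering suggestion concrete, the property can be appended to the JDBC URL in the dataSource element of the DIH config. A sketch, with the server and database names as placeholders:

  <dataSource type="JdbcDataSource"
              driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
              url="jdbc:sqlserver://localhost;databaseName=MyDB;responseBuffering=adaptive;"
              user="solrusr" password="solrusr"/>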
Re: DataImportHandler running out of memory
It looks like that was the problem. With responseBuffering=adaptive, I'm able to load all my data using the sqljdbc driver. -- View this message in context: http://www.nabble.com/DataImportHandler-running-out-of-memory-tp18102644p18119732.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: DataImportHandler running out of memory
We must document this information in the wiki. We never had a chance to play w/ ms sql server --Noble On Thu, Jun 26, 2008 at 12:38 AM, wojtekpia [EMAIL PROTECTED] wrote: It looks like that was the problem. With responseBuffering=adaptive, I'm able to load all my data using the sqljdbc driver. -- View this message in context: http://www.nabble.com/DataImportHandler-running-out-of-memory-tp18102644p18119732.html Sent from the Solr - User mailing list archive at Nabble.com. -- --Noble Paul
DataImportHandler running out of memory
I'm trying to load ~10 million records into Solr using the DataImportHandler. I'm running out of memory (java.lang.OutOfMemoryError: Java heap space) as soon as I try loading more than about 5 million records.

Here's my configuration: I'm connecting to a SQL Server database using the sqljdbc driver. I've given my Solr instance 1.5 GB of memory. I have set the dataSource batchSize to 10000. My SQL query is "select top XXX field1, ... from table1". I have about 40 fields in my Solr schema.

I thought the DataImportHandler would stream data from the DB rather than loading it all into memory at once. Is that not the case? Any thoughts on how to get around this (aside from getting a machine with more memory)?

--
View this message in context: http://www.nabble.com/DataImportHandler-running-out-of-memory-tp18102644p18102644.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: DataImportHandler running out of memory
This is a bug in MySQL. Try setting the fetch size on the Statement on the connection to Integer.MIN_VALUE. See http://forums.mysql.com/read.php?39,137457 amongst a host of other discussions on the subject. Basically, it tries to load all the rows into memory; the only alternative is to set the fetch size to Integer.MIN_VALUE so that it gets them one row at a time. I've hit this one myself, and it isn't caused by the DataImportHandler but by the MySQL JDBC driver.

-Grant

On Jun 24, 2008, at 8:23 PM, wojtekpia wrote:

  I'm trying to load ~10 million records into Solr using the DataImportHandler. I'm running out of memory (java.lang.OutOfMemoryError: Java heap space) as soon as I try loading more than about 5 million records.

  Here's my configuration: I'm connecting to a SQL Server database using the sqljdbc driver. I've given my Solr instance 1.5 GB of memory. I have set the dataSource batchSize to 10000. My SQL query is "select top XXX field1, ... from table1". I have about 40 fields in my Solr schema.

  I thought the DataImportHandler would stream data from the DB rather than loading it all into memory at once. Is that not the case? Any thoughts on how to get around this (aside from getting a machine with more memory)?

  --
  View this message in context: http://www.nabble.com/DataImportHandler-running-out-of-memory-tp18102644p18102644.html
  Sent from the Solr - User mailing list archive at Nabble.com.

--
Grant Ingersoll
http://www.lucidimagination.com
Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ
Re: DataImportHandler running out of memory
Setting the batchSize to 10000 would mean that the JDBC driver will keep 10,000 rows in memory *for each entity* which uses that data source (if correctly implemented by the driver). Not sure how well the SQL Server driver implements this. Also keep in mind that Solr also needs memory to index documents. You can probably try setting the batch size to a lower value.

The regular memory tuning stuff should apply here too -- try disabling autoCommit and turning off autowarming and see if it helps.

On Wed, Jun 25, 2008 at 5:53 AM, wojtekpia [EMAIL PROTECTED] wrote:

  I'm trying to load ~10 million records into Solr using the DataImportHandler. I'm running out of memory (java.lang.OutOfMemoryError: Java heap space) as soon as I try loading more than about 5 million records.

  Here's my configuration: I'm connecting to a SQL Server database using the sqljdbc driver. I've given my Solr instance 1.5 GB of memory. I have set the dataSource batchSize to 10000. My SQL query is "select top XXX field1, ... from table1". I have about 40 fields in my Solr schema.

  I thought the DataImportHandler would stream data from the DB rather than loading it all into memory at once. Is that not the case? Any thoughts on how to get around this (aside from getting a machine with more memory)?

  --
  View this message in context: http://www.nabble.com/DataImportHandler-running-out-of-memory-tp18102644p18102644.html
  Sent from the Solr - User mailing list archive at Nabble.com.

--
Regards,
Shalin Shekhar Mangar.
Re: DataImportHandler running out of memory
DIH streams rows one by one. set the fetchSize=-1 this might help. It may make the indexing a bit slower but memory consumption would be low. The memory is consumed by the jdbc driver. try tuning the -Xmx value for the VM --Noble On Wed, Jun 25, 2008 at 8:05 AM, Shalin Shekhar Mangar [EMAIL PROTECTED] wrote: Setting the batchSize to 1 would mean that the Jdbc driver will keep 1 rows in memory *for each entity* which uses that data source (if correctly implemented by the driver). Not sure how well the Sql Server driver implements this. Also keep in mind that Solr also needs memory to index documents. You can probably try setting the batch size to a lower value. The regular memory tuning stuff should apply here too -- try disabling autoCommit and turn-off autowarming and see if it helps. On Wed, Jun 25, 2008 at 5:53 AM, wojtekpia [EMAIL PROTECTED] wrote: I'm trying to load ~10 million records into Solr using the DataImportHandler. I'm running out of memory (java.lang.OutOfMemoryError: Java heap space) as soon as I try loading more than about 5 million records. Here's my configuration: I'm connecting to a SQL Server database using the sqljdbc driver. I've given my Solr instance 1.5 GB of memory. I have set the dataSource batchSize to 1. My SQL query is select top XXX field1, ... from table1. I have about 40 fields in my Solr schema. I thought the DataImportHandler would stream data from the DB rather than loading it all into memory at once. Is that not the case? Any thoughts on how to get around this (aside from getting a machine with more memory)? -- View this message in context: http://www.nabble.com/DataImportHandler-running-out-of-memory-tp18102644p18102644.html Sent from the Solr - User mailing list archive at Nabble.com. -- Regards, Shalin Shekhar Mangar. -- --Noble Paul
Re: DataImportHandler running out of memory
it is batchSize=-1 not fetchSize. Or keep it to a very small value. --Noble On Wed, Jun 25, 2008 at 9:31 AM, Noble Paul നോബിള് नोब्ळ् [EMAIL PROTECTED] wrote: DIH streams rows one by one. set the fetchSize=-1 this might help. It may make the indexing a bit slower but memory consumption would be low. The memory is consumed by the jdbc driver. try tuning the -Xmx value for the VM --Noble On Wed, Jun 25, 2008 at 8:05 AM, Shalin Shekhar Mangar [EMAIL PROTECTED] wrote: Setting the batchSize to 1 would mean that the Jdbc driver will keep 1 rows in memory *for each entity* which uses that data source (if correctly implemented by the driver). Not sure how well the Sql Server driver implements this. Also keep in mind that Solr also needs memory to index documents. You can probably try setting the batch size to a lower value. The regular memory tuning stuff should apply here too -- try disabling autoCommit and turn-off autowarming and see if it helps. On Wed, Jun 25, 2008 at 5:53 AM, wojtekpia [EMAIL PROTECTED] wrote: I'm trying to load ~10 million records into Solr using the DataImportHandler. I'm running out of memory (java.lang.OutOfMemoryError: Java heap space) as soon as I try loading more than about 5 million records. Here's my configuration: I'm connecting to a SQL Server database using the sqljdbc driver. I've given my Solr instance 1.5 GB of memory. I have set the dataSource batchSize to 1. My SQL query is select top XXX field1, ... from table1. I have about 40 fields in my Solr schema. I thought the DataImportHandler would stream data from the DB rather than loading it all into memory at once. Is that not the case? Any thoughts on how to get around this (aside from getting a machine with more memory)? -- View this message in context: http://www.nabble.com/DataImportHandler-running-out-of-memory-tp18102644p18102644.html Sent from the Solr - User mailing list archive at Nabble.com. -- Regards, Shalin Shekhar Mangar. -- --Noble Paul -- --Noble Paul
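[Editor's illustration] Pulling the corrected advice in this sub-thread together, a MySQL dataSource configured for streaming would look roughly like the sketch below. Connection details are placeholders; batchSize="-1" is the DIH attribute that, per the patch discussed above, gets translated into the Integer.MIN_VALUE fetch size the MySQL driver needs:

  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/mydb"
              user="solrusr" password="solrusr"
              batchSize="-1"/>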