Re: DataImport using SqlEntityProcessor running Out of Memory

2014-05-13 Thread Mikhail Khludnev
Hello O,
It seems to me (though it's better to confirm with a heap histogram) that
buffering the sub-entities in SortedMapBackedCache is what blows up the heap.
I'm aware of two directions:
- use a file-based cache instead. I don't know exactly how it works; you can
start from https://issues.apache.org/jira/browse/SOLR-2382 and check how to
enable the BerkeleyDB cache;
- personally, I'm promoting merging result sets that the RDBMS returns
already ordered by the join key:
https://issues.apache.org/jira/browse/SOLR-4799 (a rough config sketch follows below)
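To make the second option concrete, here is a rough sketch of what the
SOLR-4799 approach looks like for your config: both the parent and child
queries are ordered by the join key, and the child entity asks DIH to merge
("zip") the two cursors row by row instead of caching the children. The
join="zipper" and where attributes are what that patch (and later Solr
releases) use; treat the exact attribute names as an assumption if you are on
a stock 4.x build:

<entity name="Entity" query="SELECT EntID, Image FROM ENTITY_TABLE ORDER BY EntID">
  <field column="EntID" name="EntID" />
  <!-- child rows arrive pre-sorted by EntID, so DIH can merge them against
       the parent cursor one row at a time instead of buffering them in a cache -->
  <entity name="EntityAttribute1"
          processor="SqlEntityProcessor"
          join="zipper"
          query="SELECT AttributeValue, EntID FROM ATTR_TABLE WHERE AttributeID=1 ORDER BY EntID"
          where="EntID=Entity.EntID">
    <field column="AttributeValue" name="EntityAttribute1" />
  </entity>
</entity>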




On Fri, May 9, 2014 at 7:16 PM, O. Olson olson_...@yahoo.it wrote:

 I have a Data Schema which is Hierarchical i.e. I have an Entity and a
 number
 of attributes. For a small subset of the Data - about 300 MB, I can do the
 import with 3 GB memory. Now with the entire 4 GB Dataset, I find I cannot
 do the import with 9 GB of memory.
 I am using the SqlEntityProcessor as below:

 <dataConfig>
   <dataSource driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
     url="jdbc:sqlserver://localhost\MSSQLSERVER;databaseName=SolrDB;user=solrusr;password=solrusr;"/>
   <document>
     <entity name="Entity" query="SELECT EntID, Image FROM ENTITY_TABLE">
       <field column="EntID" name="EntID" />
       <field column="Image" name="Image" />

       <entity name="EntityAttribute1"
         query="SELECT AttributeValue, EntID FROM ATTR_TABLE WHERE AttributeID=1"
         cacheKey="EntID"
         cacheLookup="Entity.EntID"
         processor="SqlEntityProcessor" cacheImpl="SortedMapBackedCache">
         <field column="AttributeValue" name="EntityAttribute1" />
       </entity>
       <entity name="EntityAttribute2"
         query="SELECT AttributeValue, EntID FROM ATTR_TABLE WHERE AttributeID=2"
         cacheKey="EntID"
         cacheLookup="Entity.EntID"
         processor="SqlEntityProcessor" cacheImpl="SortedMapBackedCache">
         <field column="AttributeValue" name="EntityAttribute2" />
       </entity>

     </entity>
   </document>
 </dataConfig>



 What is the best way to import this data? Doing it without a cache, results
 in many SQL queries. With the cache, I run out of memory.

 I’m curious why 4GB of data cannot entirely fit in memory. One thing I need
 to mention is that I have about 400 to 500 attributes.

 Thanks in advance for any helpful advice.
 O. O.




 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/DataImport-using-SqlEntityProcessor-running-Out-of-Memory-tp4135080.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com


Re: DataImport using SqlEntityProcessor running Out of Memory

2014-05-12 Thread Shawn Heisey
On 5/9/2014 9:16 AM, O. Olson wrote:
 I have a Data Schema which is Hierarchical i.e. I have an Entity and a number
 of attributes. For a small subset of the Data - about 300 MB, I can do the
 import with 3 GB memory. Now with the entire 4 GB Dataset, I find I cannot
 do the import with 9 GB of memory. 
 I am using the SqlEntityProcessor as below: 
 
 <dataConfig>
 <dataSource driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
   url="jdbc:sqlserver://localhost\MSSQLSERVER;databaseName=SolrDB;user=solrusr;password=solrusr;"/>

Upgrade your JDBC driver to 1.2 or later, or turn on response buffering.
 The following URL has this information.  It's a very long URL, so if
your mail client wraps it, you may not be able to click on it properly:

http://wiki.apache.org/solr/DataImportHandlerFaq#I.27m_using_DataImportHandler_with_MS_SQL_Server_database_with_sqljdbc_driver._DataImportHandler_is_going_out_of_memory._I_tried_adjustng_the_batchSize_values_but_they_don.27t_seem_to_make_any_difference._How_do_I_fix_this.3F
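For example, with the dataSource from your config, enabling adaptive response
buffering directly in the connection string would look roughly like this (a
sketch; the responseBuffering property is the one described on the wiki/MSDN
pages referenced in this thread):

<dataSource driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
    url="jdbc:sqlserver://localhost\MSSQLSERVER;databaseName=SolrDB;user=solrusr;password=solrusr;responseBuffering=adaptive;"/>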

Thanks,
Shawn



DataImport using SqlEntityProcessor running Out of Memory

2014-05-11 Thread O. Olson
I have a data schema which is hierarchical, i.e. I have an Entity and a number
of attributes. For a small subset of the data - about 300 MB - I can do the
import with 3 GB of memory. Now, with the entire 4 GB dataset, I find I cannot
do the import even with 9 GB of memory.
I am using the SqlEntityProcessor as below:

<dataConfig>
  <dataSource driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
    url="jdbc:sqlserver://localhost\MSSQLSERVER;databaseName=SolrDB;user=solrusr;password=solrusr;"/>
  <document>
    <entity name="Entity" query="SELECT EntID, Image FROM ENTITY_TABLE">
      <field column="EntID" name="EntID" />
      <field column="Image" name="Image" />

      <entity name="EntityAttribute1"
        query="SELECT AttributeValue, EntID FROM ATTR_TABLE WHERE AttributeID=1"
        cacheKey="EntID"
        cacheLookup="Entity.EntID"
        processor="SqlEntityProcessor" cacheImpl="SortedMapBackedCache">
        <field column="AttributeValue" name="EntityAttribute1" />
      </entity>
      <entity name="EntityAttribute2"
        query="SELECT AttributeValue, EntID FROM ATTR_TABLE WHERE AttributeID=2"
        cacheKey="EntID"
        cacheLookup="Entity.EntID"
        processor="SqlEntityProcessor" cacheImpl="SortedMapBackedCache">
        <field column="AttributeValue" name="EntityAttribute2" />
      </entity>

    </entity>
  </document>
</dataConfig>



What is the best way to import this data? Doing it without a cache results
in many SQL queries. With the cache, I run out of memory.

I’m curious why 4GB of data cannot entirely fit in memory. One thing I need
to mention is that I have about 400 to 500 attributes. 

Thanks in advance for any helpful advice. 
O. O. 




--
View this message in context: 
http://lucene.472066.n3.nabble.com/DataImport-using-SqlEntityProcessor-running-Out-of-Memory-tp4135080.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Running out of memory

2012-08-16 Thread Amit Nithian
I am debugging an out-of-memory error myself, and I have a few suggestions:
1) Are you looking at your search logs around the time of the memory
error? In my case, I found a few bad queries requesting a ton of rows
(basically the whole index's worth, which I think is an error somewhere
in our app - I just have to find it) which happened close to the OOM error
being thrown.
2) Do you have Solr hooked up to something like NewRelic/AppDynamics
to see the cache usage in real time? Maybe, as was suggested, tuning
down or eliminating little-used caches could help.
3) Are you ensuring that you aren't setting stored=true on fields
that don't need it? This will increase the index size and possibly the
cache size if lazy loading isn't enabled (to be honest, I am a bit
unclear on this part since I haven't had much experience with it
myself).

Thanks
Amit

On Mon, Aug 13, 2012 at 11:37 AM, Jon Drukman jdruk...@gmail.com wrote:
 On Sun, Aug 12, 2012 at 12:31 PM, Alexey Serba ase...@gmail.com wrote:

  It would be vastly preferable if Solr could just exit when it gets a
 memory
  error, because we have it running under daemontools, and that would cause
  an automatic restart.
 -XX:OnOutOfMemoryError=cmd args; cmd args
 Run user-defined commands when an OutOfMemoryError is first thrown.

  Does Solr require the entire index to fit in memory at all times?
 No.

 But it's hard to say about your particular problem without additional
 information. How often do you commit? Do you use faceting? Do you sort
 by Solr fields and if yes what are those fields? And you should also
 check caches.


 I upgraded to solr-3.6.1 and an extra large amazon instance (15GB RAM) so
 we'll see if that helps.  So far no out of memory errors.


Re: Running out of memory

2012-08-13 Thread Jon Drukman
On Sun, Aug 12, 2012 at 12:31 PM, Alexey Serba ase...@gmail.com wrote:

  It would be vastly preferable if Solr could just exit when it gets a
 memory
  error, because we have it running under daemontools, and that would cause
  an automatic restart.
 -XX:OnOutOfMemoryError=cmd args; cmd args
 Run user-defined commands when an OutOfMemoryError is first thrown.

  Does Solr require the entire index to fit in memory at all times?
 No.

 But it's hard to say about your particular problem without additional
 information. How often do you commit? Do you use faceting? Do you sort
 by Solr fields and if yes what are those fields? And you should also
 check caches.


I upgraded to solr-3.6.1 and an extra large amazon instance (15GB RAM) so
we'll see if that helps.  So far no out of memory errors.


Re: Running out of memory

2012-08-12 Thread Alexey Serba
 It would be vastly preferable if Solr could just exit when it gets a memory
 error, because we have it running under daemontools, and that would cause
 an automatic restart.
-XX:OnOutOfMemoryError="<cmd args>;<cmd args>"
Run user-defined commands when an OutOfMemoryError is first thrown.
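For example, applied to the start command from the original mail (a sketch;
%p expands to the JVM's pid, and daemontools would then restart the killed
process):

/usr/lib/jvm/jre/bin/java -XX:+UseConcMarkSweepGC -Xms1G -Xmx5G \
    -XX:OnOutOfMemoryError="kill -9 %p" -jar start.jar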

 Does Solr require the entire index to fit in memory at all times?
No.

But it's hard to say about your particular problem without additional
information. How often do you commit? Do you use faceting? Do you sort
by Solr fields and if yes what are those fields? And you should also
check caches.


Re: Running out of memory

2012-08-06 Thread Michael Della Bitta
You might want to look at turning down or eliminating your caches if
you're running out of RAM. Possibly some of them have a low hit rate,
which you can see on the Stats page. Caches with a low hit rate are
only consuming RAM and CPU cycles.

Also, using this JVM arg might reduce the memory footprint:
-XX:+UseCompressedOops
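For example, added to the start command from the original mail (a sketch;
-XX:+UseCompressedOops requires a 64-bit JVM and only helps for heaps under
roughly 32 GB):

/usr/lib/jvm/jre/bin/java -XX:+UseConcMarkSweepGC -XX:+UseCompressedOops -Xms1G -Xmx5G -jar start.jar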

In the end though, the surefire solution would be to go to an instance
type with more RAM: http://www.ec2instances.info/

Michael Della Bitta


Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017
www.appinions.com
Where Influence Isn’t a Game


On Mon, Aug 6, 2012 at 1:48 PM, Jon Drukman jdruk...@gmail.com wrote:
 Hi there.  I am running Solr 1.4.1 on an Amazon EC2 box with 7.5GB of RAM.
  It was set up about 18 months ago and has been largely trouble-free.
  Unfortunately, lately it has started to run out of memory pretty much
 every day.  We are seeing

 SEVERE: java.lang.OutOfMemoryError: Java heap space

 When that happens, a simple query like
 http://localhost:8983/solr/select?q=*:*'
 returns nothing.

 I am starting Solr with the following:

 /usr/lib/jvm/jre/bin/java -XX:+UseConcMarkSweepGC -Xms1G -Xmx5G -jar
 start.jar

 It would be vastly preferable if Solr could just exit when it gets a memory
 error, because we have it running under daemontools, and that would cause
 an automatic restart.  After restarting, Solr works fine for another 12-18
 hours.  Not ideal but at least it wouldn't require human intervention to
 get it going again.

 What can I do to reduce the memory pressure?  Does Solr require the entire
 index to fit in memory at all times?  The on disk size is 15GB.  There are
 27.5 million documents, but they are all tiny (mostly one-line forum
 comments like "this game is awesome").

 We're using Sun openJava SDK 1.6 if that matters.

 -jsd-


Re: DataImportHandler running out of memory

2012-02-23 Thread Shawn Heisey

On 2/20/2012 6:49 AM, v_shan wrote:

DIH still running out of memory for me, with Full Import on a database of
size 1.5 GB.

Solr version: 3_5_0

Note that I have already added batchSize=-1 but getting same error.


A few questions:

- How much memory have you given to the JVM running this Solr instance?
- How much memory does your server have?
- What is the size of all your index cores, and how many documents are 
in them?
- How large are your Solr caches (filterCache, documentCache, 
queryResultCache)?

- What is your ramBufferSize set to in the indexDefaults section?
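For reference, that setting lives in the indexDefaults section of
solrconfig.xml; a minimal sketch using the stock 3.x default of 32 MB:

<indexDefaults>
  <!-- how much RAM Lucene may buffer before flushing a segment to disk -->
  <ramBufferSizeMB>32</ramBufferSizeMB>
</indexDefaults>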

Thanks,
Shawn



Re: DataImportHandler running out of memory

2012-02-20 Thread v_shan
DIH still running out of memory for me, with Full Import on a database of
size 1.5 GB.

Solr version: 3_5_0

Note that I have already added batchSize=-1 but am still getting the same error.
Sharing my DIH config below.


<dataConfig>
  <dataSource type="JdbcDataSource"
      name="jdbc"
      driver="com.mysql.jdbc.Driver"
      url="jdbc:mysql://localhost:3306/ib"
      user="root"
      password="root"
      batchSize="-1"
      />
  <document name="content">
    <entity
        name="issue"
        dataSource="jdbc"
        transformer="RegexTransformer,DateFormatTransformer,TemplateTransformer"
        pk="id"
        query="
            select
                ib_issue.`_id` as id,
                ib_issue.`_issue_title` as issueTitle,
                ib_issue.`_issue_descr` as issueDescr,
                createdBy.`_name` as issueCreatedByName,
                createdBy.`_email` as issueCreatedByEmail
            from
                `ib_issue`
                inner join `ib_user` as createdBy
                    on createdBy.`_id` = ib_issue.`_created_by_user_id`
            group by ib_issue.`_id`
            ">
    </entity>
  </document>
</dataConfig>


Please find the error trace below
===
2012-02-20 19:04:40.531:INFO::Started SocketConnector@0.0.0.0:8983
Feb 20, 2012 7:04:57 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/select params={command=status&qt=/dih_ib_jdbc}
status=0 QTime=0
Feb 20, 2012 7:04:58 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/select
params={command=show-config&qt=/dih_ib_jdbc} status=0 QTime=0
Feb 20, 2012 7:05:30 PM org.apache.solr.handler.dataimport.DataImporter
doFullImport
INFO: Starting Full Import
Feb 20, 2012 7:05:30 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/dih_ib_jdbc params={command=full-import}
status=0 QTime=0
Feb 20, 2012 7:05:30 PM org.apache.solr.handler.dataimport.SolrWriter
readIndexerProperties
INFO: Read dih_ib_jdbc.properties
Feb 20, 2012 7:05:30 PM org.apache.solr.update.DirectUpdateHandler2
deleteAll
INFO: [] REMOVING ALL DOCUMENTS FROM INDEX
Feb 20, 2012 7:05:30 PM org.apache.solr.core.SolrDeletionPolicy onInit
INFO: SolrDeletionPolicy.onInit: commits:num=1
   
commit{dir=E:\workspace\solr_3_5_0\example\solr\data\index,segFN=segments_1,version=1329744880204,generation=1,filenames=[segments_1]
Feb 20, 2012 7:05:30 PM org.apache.solr.core.SolrDeletionPolicy
updateCommits
INFO: newest commit = 1329744880204
Feb 20, 2012 7:05:30 PM org.apache.solr.handler.dataimport.JdbcDataSource$1
call
INFO: Creating a connection for entity issue with URL:
jdbc:mysql://localhost:3306/issueburner
Feb 20, 2012 7:05:30 PM org.apache.solr.handler.dataimport.JdbcDataSource$1
call
INFO: Time taken for getConnection(): 172
Feb 20, 2012 7:07:45 PM org.apache.solr.common.SolrException log
SEVERE: Full Import
failed:org.apache.solr.handler.dataimport.DataImportHandlerException:
java.lang.OutOfMemoryError: Java heap space
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:669)
at
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:268)
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:187)
at
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:359)
at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:427)
at
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:408)
Caused by: java.lang.OutOfMemoryError: Java heap space
at
org.apache.lucene.util.UnicodeUtil.UTF16toUTF8(UnicodeUtil.java:377)
at
org.apache.lucene.store.DataOutput.writeString(DataOutput.java:103)
at
org.apache.lucene.index.FieldsWriter.writeField(FieldsWriter.java:200)
at
org.apache.lucene.index.StoredFieldsWriterPerThread.addField(StoredFieldsWriterPerThread.java:58)
at
org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:265)
at
org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:766)
at
org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2327)
at
org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2299)
at
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:240)
at
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
at
org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:115)
at
org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:73)
at
org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:293)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument

fixed schema problems, now running out of memory?

2011-10-26 Thread Fred Zimmerman
It's a small indexing job coming from nutch.

2011-10-26 15:07:29,039 WARN  mapred.LocalJobRunner - job_local_0011
java.io.IOException: org.apache.solr.client.solrj.SolrServerException: Error
executi$
at
org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat.getRec$
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:338)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
Caused by: org.apache.solr.client.solrj.SolrServerException: Error executing
query
at
org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.ja$
at
org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:118)
at
org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat.getRec$
... 3 more
Caused by: org.apache.solr.common.SolrException: Java heap space
 java.lang.OutOfMem$

Java heap space  java.lang.OutOfMemoryError: Java heap spaceat
org.apache.lucene$

request: localhost/solr/select?q=id:[* TO *]&fl=id,boost,tstamp,$
at
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHt$
at
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHt$
at
org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.ja$
... 5 more


Re: fixed schema problems, now running out of memory?

2011-10-26 Thread Fred Zimmerman
More on what's happening. It seems to be timing out during the commit.

The new documents are small, but the existing index is large (11 million
records).

INFO: Closing Searcher@4a7df6 main

 fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
 ...



 Oct 26, 2011 4:51:17 PM org.apache.solr.update.processor.LogUpdateProcessor
 finish
 *INFO: {commit=} 0 2453
 **Oct 26, 2011 4:51:17 PM org.apache.solr.core.SolrCore execute
 **INFO: [] webapp=/solr path=/update
 params={waitSearcher=true&waitFlush=true&wt=javabin&commit=true&version=2}
 status=0 QTime=2453
 *Oct 26, 2011 4:51:52 PM org.apache.solr.core.SolrCore execute
 INFO: [] webapp=/solr path=/select
 params={fl=id&wt=javabin&q=id:[*+TO+*]&rows=1&version=2} hits=11576871
 *status=0
 QTime=35298*
 Oct 26, 2011 4:51:53 PM org.apache.solr.core.SolrCore execute
 INFO: [] webapp=/solr path=/select
 params={fl=id&wt=javabin&q=id:[*+TO+*]&rows=1&version=2} hits=11576871
 status=0 QTime=1
 *java.lang.OutOfMemoryError: Java heap space*
 Dumping heap to /home/bitnami/apache-solr-3.4.0/example/heaplog ...
 Heap dump file created [306866344 bytes in 32.376 secs]



On Wed, Oct 26, 2011 at 11:09 AM, Fred Zimmerman zimzaz@gmail.comwrote:

 It's a small indexing job coming from nutch.

 2011-10-26 15:07:29,039 WARN  mapred.LocalJobRunner - job_local_0011
 java.io.IOException: org.apache.solr.client.solrj.SolrServerException:
 Error executi$
 at
 org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat.getRec$
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:338)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
 at
 org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
 Caused by: org.apache.solr.client.solrj.SolrServerException: Error
 executing query
 at
 org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.ja$
 at
 org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:118)
 at
 org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat.getRec$
 ... 3 more
 Caused by: org.apache.solr.common.SolrException: Java heap space
  java.lang.OutOfMem$

 Java heap space  java.lang.OutOfMemoryError: Java heap spaceat
 org.apache.lucene$

 request: localhost/solr/select?q=id:[* TO *]fl=id,boost,tstamp,$
 at
 org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHt$
 at
 org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHt$
 at
 org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.ja$
 ... 5 more




Re: fixed schema problems, now running out of memory?

2011-10-26 Thread Fred Zimmerman
http://wiki.apache.org/solr/SolrPerformanceFactors#Schema_Design_Considerations

The number of indexed fields greatly increases the following:

- Memory usage during indexing
- Segment merge time
- Optimization times
- Index size

These impacts can be reduced by the use of omitNorms=true.


http://lucene.472066.n3.nabble.com/What-is-omitNorms-td2987547.html

1. Length normalization will not work on the specific field --
 which means matching documents with shorter length will not be
 preferred/boosted over matching documents with greater length for that
 field at search time.
 For my application, I actually prefer documents with greater length.
 2. Index-time boosting will not be available on the field.
 If both of the above are not required by you, then you can set
 omitNorms=true for the specific fields.
 This has an added advantage: it will also save you some (or a lot of) RAM,
 since omitNorms=false on a total of N fields in the index will require
 RAM of size:
  Total docs in index * 1 byte * N
 I have a lot of fields: I count 31 without omitNorms values, which means
 false by default.


Gak!  11,000,000 * 1 byte * 31 = roughly 31 x 11M = about 340MB of RAM all by itself.
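For reference, norms are switched off per field in schema.xml; a minimal
sketch (the field name and type here are made up for illustration):

<field name="comment_text" type="text" indexed="true" stored="true" omitNorms="true"/>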

On Wed, Oct 26, 2011 at 1:01 PM, Fred Zimmerman zimzaz@gmail.comwrote:

 More on what's happening. It seems to be timing out during the commit.

 The new documents are small, but the existing index is large (11 million
 records).

 INFO: Closing Searcher@4a7df6 main

 fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
 ...



 Oct 26, 2011 4:51:17 PM
 org.apache.solr.update.processor.LogUpdateProcessor finish
 *INFO: {commit=} 0 2453
 **Oct 26, 2011 4:51:17 PM org.apache.solr.core.SolrCore execute
 **INFO: [] webapp=/solr path=/update
 params={waitSearcher=truewaitFlush=truewt=javabincommit=trueversion=2}
 status=0 QTime=2453
 *Oct 26, 2011 4:51:52 PM org.apache.solr.core.SolrCore execute
 INFO: [] webapp=/solr path=/select
 params={fl=idwt=javabinq=id:[*+TO+*]rows=1version=2} hits=11576871 
 *status=0
 QTime=35298*
 Oct 26, 2011 4:51:53 PM org.apache.solr.core.SolrCore execute
 INFO: [] webapp=/solr path=/select
 params={fl=idwt=javabinq=id:[*+TO+*]rows=1version=2} hits=11576871
 status=0 QTime=1
 *java.lang.OutOfMemoryError: Java heap space*
 Dumping heap to /home/bitnami/apache-solr-3.4.0/example/heaplog ...
 Heap dump file created [306866344 bytes in 32.376 secs]



 On Wed, Oct 26, 2011 at 11:09 AM, Fred Zimmerman zimzaz@gmail.comwrote:

 It's a small indexing job coming from nutch.

 2011-10-26 15:07:29,039 WARN  mapred.LocalJobRunner - job_local_0011
 java.io.IOException: org.apache.solr.client.solrj.SolrServerException:
 Error executi$
 at
 org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat.getRec$
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:338)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
 at
 org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
 Caused by: org.apache.solr.client.solrj.SolrServerException: Error
 executing query
 at
 org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.ja$
 at
 org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:118)
 at
 org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat.getRec$
 ... 3 more
 Caused by: org.apache.solr.common.SolrException: Java heap space
  java.lang.OutOfMem$

 Java heap space  java.lang.OutOfMemoryError: Java heap spaceat
 org.apache.lucene$

 request: localhost/solr/select?q=id:[* TO *]fl=id,boost,tstamp,$
 at
 org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHt$
 at
 org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHt$
 at
 org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.ja$
 ... 5 more





Re: DataImportHandler running out of memory

2008-11-03 Thread sunnyfr

Hi,

I tried batchSize=-1, but when I do that it uses all of MySQL's memory,
and that's a problem for the MySQL database.

:s


Noble Paul നോബിള്‍ नोब्ळ् wrote:
 
 I've moved the FAQ to a new Page
 http://wiki.apache.org/solr/DataImportHandlerFaq
 The DIH page is too big and editing has become harder
 
 On Thu, Jun 26, 2008 at 6:07 PM, Shalin Shekhar Mangar
 [EMAIL PROTECTED] wrote:
 I've added a FAQ section to DataImportHandler wiki page which captures
 question on out of memory exception with both MySQL and MS SQL Server
 drivers.

 http://wiki.apache.org/solr/DataImportHandler#faq

 On Thu, Jun 26, 2008 at 9:36 AM, Noble Paul നോബിള്‍ नोब्ळ्
 [EMAIL PROTECTED] wrote:
 We must document this information in the wiki.  We never had a chance
 to play w/ ms sql server
 --Noble

 On Thu, Jun 26, 2008 at 12:38 AM, wojtekpia [EMAIL PROTECTED]
 wrote:

 It looks like that was the problem. With responseBuffering=adaptive,
 I'm able
 to load all my data using the sqljdbc driver.
 --
 View this message in context:
 http://www.nabble.com/DataImportHandler-running-out-of-memory-tp18102644p18119732.html
 Sent from the Solr - User mailing list archive at Nabble.com.





 --
 --Noble Paul




 --
 Regards,
 Shalin Shekhar Mangar.

 
 
 
 -- 
 --Noble Paul
 
 

-- 
View this message in context: 
http://www.nabble.com/DataImportHandler-running-out-of-memory-tp18102644p20305039.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: DataImportHandler running out of memory

2008-10-31 Thread sunnyfr

Hi Grant,

How did you finally manage it?
I have the same problem with less data (8.5M records). If I set batchSize to -1,
it slows the database down a lot, which is not good for the website, and
requests stack up.
What did you do?

Thanks,


Grant Ingersoll-6 wrote:
 
 I think it's a bit different.  I ran into this exact problem about two  
 weeks ago on a 13 million record DB.  MySQL doesn't honor the fetch  
 size for it's v5 JDBC driver.
 
 See
 http://www.databasesandlife.com/reading-row-by-row-into-java-from-mysql/ 
   or do a search for MySQL fetch size.
 
 You actually have to do setFetchSize(Integer.MIN_VALUE) (-1 doesn't  
 work) in order to get streaming in MySQL.
 
 -Grant
 
 
 On Jun 24, 2008, at 10:35 PM, Shalin Shekhar Mangar wrote:
 
 Setting the batchSize to 1 would mean that the Jdbc driver will  
 keep
 1 rows in memory *for each entity* which uses that data source (if
 correctly implemented by the driver). Not sure how well the Sql Server
 driver implements this. Also keep in mind that Solr also needs  
 memory to
 index documents. You can probably try setting the batch size to a  
 lower
 value.

 The regular memory tuning stuff should apply here too -- try disabling
 autoCommit and turn-off autowarming and see if it helps.

 On Wed, Jun 25, 2008 at 5:53 AM, wojtekpia [EMAIL PROTECTED]  
 wrote:


 I'm trying to load ~10 million records into Solr using the
 DataImportHandler.
 I'm running out of memory (java.lang.OutOfMemoryError: Java heap  
 space) as
 soon as I try loading more than about 5 million records.

 Here's my configuration:
 I'm connecting to a SQL Server database using the sqljdbc driver.  
 I've
 given
 my Solr instance 1.5 GB of memory. I have set the dataSource  
 batchSize to
 1. My SQL query is select top XXX field1, ... from table1. I  
 have
 about 40 fields in my Solr schema.

 I thought the DataImportHandler would stream data from the DB  
 rather than
 loading it all into memory at once. Is that not the case? Any  
 thoughts on
 how to get around this (aside from getting a machine with more  
 memory)?

 --
 View this message in context:
 http://www.nabble.com/DataImportHandler-running-out-of-memory-tp18102644p18102644.html
 Sent from the Solr - User mailing list archive at Nabble.com.




 -- 
 Regards,
 Shalin Shekhar Mangar.
 
 --
 Grant Ingersoll
 http://www.lucidimagination.com
 
 Lucene Helpful Hints:
 http://wiki.apache.org/lucene-java/BasicsOfPerformance
 http://wiki.apache.org/lucene-java/LuceneFAQ
 
 
 
 
 
 
 
 
 

-- 
View this message in context: 
http://www.nabble.com/DataImportHandler-running-out-of-memory-tp18102644p20263146.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: DataImportHandler running out of memory

2008-10-31 Thread Noble Paul നോബിള്‍ नोब्ळ्
I've moved the FAQ to a new Page
http://wiki.apache.org/solr/DataImportHandlerFaq
The DIH page is too big and editing has become harder

On Thu, Jun 26, 2008 at 6:07 PM, Shalin Shekhar Mangar
[EMAIL PROTECTED] wrote:
 I've added a FAQ section to DataImportHandler wiki page which captures
 question on out of memory exception with both MySQL and MS SQL Server
 drivers.

 http://wiki.apache.org/solr/DataImportHandler#faq

 On Thu, Jun 26, 2008 at 9:36 AM, Noble Paul നോബിള്‍ नोब्ळ्
 [EMAIL PROTECTED] wrote:
 We must document this information in the wiki.  We never had a chance
 to play w/ ms sql server
 --Noble

 On Thu, Jun 26, 2008 at 12:38 AM, wojtekpia [EMAIL PROTECTED] wrote:

 It looks like that was the problem. With responseBuffering=adaptive, I'm 
 able
 to load all my data using the sqljdbc driver.
 --
 View this message in context: 
 http://www.nabble.com/DataImportHandler-running-out-of-memory-tp18102644p18119732.html
 Sent from the Solr - User mailing list archive at Nabble.com.





 --
 --Noble Paul




 --
 Regards,
 Shalin Shekhar Mangar.




-- 
--Noble Paul


Re: DataImportHandler running out of memory

2008-06-26 Thread Shalin Shekhar Mangar
I've added a FAQ section to DataImportHandler wiki page which captures
question on out of memory exception with both MySQL and MS SQL Server
drivers.

http://wiki.apache.org/solr/DataImportHandler#faq

On Thu, Jun 26, 2008 at 9:36 AM, Noble Paul നോബിള്‍ नोब्ळ्
[EMAIL PROTECTED] wrote:
 We must document this information in the wiki.  We never had a chance
 to play w/ ms sql server
 --Noble

 On Thu, Jun 26, 2008 at 12:38 AM, wojtekpia [EMAIL PROTECTED] wrote:

 It looks like that was the problem. With responseBuffering=adaptive, I'm able
 to load all my data using the sqljdbc driver.
 --
 View this message in context: 
 http://www.nabble.com/DataImportHandler-running-out-of-memory-tp18102644p18119732.html
 Sent from the Solr - User mailing list archive at Nabble.com.





 --
 --Noble Paul




-- 
Regards,
Shalin Shekhar Mangar.


Re: DataImportHandler running out of memory

2008-06-25 Thread Grant Ingersoll
I think it's a bit different.  I ran into this exact problem about two  
weeks ago on a 13 million record DB.  MySQL doesn't honor the fetch  
size for its v5 JDBC driver.


See http://www.databasesandlife.com/reading-row-by-row-into-java-from-mysql/ 
 or do a search for MySQL fetch size.


You actually have to do setFetchSize(Integer.MIN_VALUE) (-1 doesn't  
work) in order to get streaming in MySQL.


-Grant


On Jun 24, 2008, at 10:35 PM, Shalin Shekhar Mangar wrote:

Setting the batchSize to 1 would mean that the Jdbc driver will  
keep

1 rows in memory *for each entity* which uses that data source (if
correctly implemented by the driver). Not sure how well the Sql Server
driver implements this. Also keep in mind that Solr also needs  
memory to
index documents. You can probably try setting the batch size to a  
lower

value.

The regular memory tuning stuff should apply here too -- try disabling
autoCommit and turn-off autowarming and see if it helps.

On Wed, Jun 25, 2008 at 5:53 AM, wojtekpia [EMAIL PROTECTED]  
wrote:




I'm trying to load ~10 million records into Solr using the
DataImportHandler.
I'm running out of memory (java.lang.OutOfMemoryError: Java heap  
space) as

soon as I try loading more than about 5 million records.

Here's my configuration:
I'm connecting to a SQL Server database using the sqljdbc driver.  
I've

given
my Solr instance 1.5 GB of memory. I have set the dataSource  
batchSize to
1. My SQL query is select top XXX field1, ... from table1. I  
have

about 40 fields in my Solr schema.

I thought the DataImportHandler would stream data from the DB  
rather than
loading it all into memory at once. Is that not the case? Any  
thoughts on
how to get around this (aside from getting a machine with more  
memory)?


--
View this message in context:
http://www.nabble.com/DataImportHandler-running-out-of-memory-tp18102644p18102644.html
Sent from the Solr - User mailing list archive at Nabble.com.





--
Regards,
Shalin Shekhar Mangar.


--
Grant Ingersoll
http://www.lucidimagination.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ









Re: DataImportHandler running out of memory

2008-06-25 Thread Grant Ingersoll
I'm assuming, of course, that the DIH doesn't automatically modify the  
SQL statement according to the batch size.


-Grant

On Jun 25, 2008, at 7:05 AM, Grant Ingersoll wrote:

I think it's a bit different.  I ran into this exact problem about  
two weeks ago on a 13 million record DB.  MySQL doesn't honor the  
fetch size for it's v5 JDBC driver.


See http://www.databasesandlife.com/reading-row-by-row-into-java-from-mysql/ 
 or do a search for MySQL fetch size.


You actually have to do setFetchSize(Integer.MIN_VALUE) (-1 doesn't  
work) in order to get streaming in MySQL.


-Grant


On Jun 24, 2008, at 10:35 PM, Shalin Shekhar Mangar wrote:

Setting the batchSize to 1 would mean that the Jdbc driver will  
keep
1 rows in memory *for each entity* which uses that data source  
(if
correctly implemented by the driver). Not sure how well the Sql  
Server
driver implements this. Also keep in mind that Solr also needs  
memory to
index documents. You can probably try setting the batch size to a  
lower

value.

The regular memory tuning stuff should apply here too -- try  
disabling

autoCommit and turn-off autowarming and see if it helps.

On Wed, Jun 25, 2008 at 5:53 AM, wojtekpia [EMAIL PROTECTED]  
wrote:




I'm trying to load ~10 million records into Solr using the
DataImportHandler.
I'm running out of memory (java.lang.OutOfMemoryError: Java heap  
space) as

soon as I try loading more than about 5 million records.

Here's my configuration:
I'm connecting to a SQL Server database using the sqljdbc driver.  
I've

given
my Solr instance 1.5 GB of memory. I have set the dataSource  
batchSize to
1. My SQL query is select top XXX field1, ... from table1. I  
have

about 40 fields in my Solr schema.

I thought the DataImportHandler would stream data from the DB  
rather than
loading it all into memory at once. Is that not the case? Any  
thoughts on
how to get around this (aside from getting a machine with more  
memory)?


--
View this message in context:
http://www.nabble.com/DataImportHandler-running-out-of-memory-tp18102644p18102644.html
Sent from the Solr - User mailing list archive at Nabble.com.





--
Regards,
Shalin Shekhar Mangar.


--
Grant Ingersoll
http://www.lucidimagination.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ









--
Grant Ingersoll
http://www.lucidimagination.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ









Re: DataImportHandler running out of memory

2008-06-25 Thread Shalin Shekhar Mangar
The OP is actually using Sql Server (not MySql) as per his mail.

On Wed, Jun 25, 2008 at 4:40 PM, Grant Ingersoll [EMAIL PROTECTED]
wrote:

 I'm assuming, of course, that the DIH doesn't automatically modify the SQL
 statement according to the batch size.

 -Grant


 On Jun 25, 2008, at 7:05 AM, Grant Ingersoll wrote:

  I think it's a bit different.  I ran into this exact problem about two
 weeks ago on a 13 million record DB.  MySQL doesn't honor the fetch size for
 it's v5 JDBC driver.

 See
 http://www.databasesandlife.com/reading-row-by-row-into-java-from-mysql/ or
 do a search for MySQL fetch size.

 You actually have to do setFetchSize(Integer.MIN_VALUE) (-1 doesn't work)
 in order to get streaming in MySQL.

 -Grant


 On Jun 24, 2008, at 10:35 PM, Shalin Shekhar Mangar wrote:

  Setting the batchSize to 1 would mean that the Jdbc driver will keep
 1 rows in memory *for each entity* which uses that data source (if
 correctly implemented by the driver). Not sure how well the Sql Server
 driver implements this. Also keep in mind that Solr also needs memory to
 index documents. You can probably try setting the batch size to a lower
 value.

 The regular memory tuning stuff should apply here too -- try disabling
 autoCommit and turn-off autowarming and see if it helps.

 On Wed, Jun 25, 2008 at 5:53 AM, wojtekpia [EMAIL PROTECTED] wrote:


 I'm trying to load ~10 million records into Solr using the
 DataImportHandler.
 I'm running out of memory (java.lang.OutOfMemoryError: Java heap space)
 as
 soon as I try loading more than about 5 million records.

 Here's my configuration:
 I'm connecting to a SQL Server database using the sqljdbc driver. I've
 given
 my Solr instance 1.5 GB of memory. I have set the dataSource batchSize
 to
 1. My SQL query is select top XXX field1, ... from table1. I have
 about 40 fields in my Solr schema.

 I thought the DataImportHandler would stream data from the DB rather
 than
 loading it all into memory at once. Is that not the case? Any thoughts
 on
 how to get around this (aside from getting a machine with more memory)?

 --
 View this message in context:

 http://www.nabble.com/DataImportHandler-running-out-of-memory-tp18102644p18102644.html
 Sent from the Solr - User mailing list archive at Nabble.com.




 --
 Regards,
 Shalin Shekhar Mangar.


 --
 Grant Ingersoll
 http://www.lucidimagination.com

 Lucene Helpful Hints:
 http://wiki.apache.org/lucene-java/BasicsOfPerformance
 http://wiki.apache.org/lucene-java/LuceneFAQ








 --
 Grant Ingersoll
 http://www.lucidimagination.com

 Lucene Helpful Hints:
 http://wiki.apache.org/lucene-java/BasicsOfPerformance
 http://wiki.apache.org/lucene-java/LuceneFAQ










-- 
Regards,
Shalin Shekhar Mangar.


Re: DataImportHandler running out of memory

2008-06-25 Thread Noble Paul നോബിള്‍ नोब्ळ्
DIH does not modify SQL. This value is used as a connection property
--Noble

On Wed, Jun 25, 2008 at 4:40 PM, Grant Ingersoll [EMAIL PROTECTED] wrote:
 I'm assuming, of course, that the DIH doesn't automatically modify the SQL
 statement according to the batch size.

 -Grant

 On Jun 25, 2008, at 7:05 AM, Grant Ingersoll wrote:

 I think it's a bit different.  I ran into this exact problem about two
 weeks ago on a 13 million record DB.  MySQL doesn't honor the fetch size for
 it's v5 JDBC driver.

 See
 http://www.databasesandlife.com/reading-row-by-row-into-java-from-mysql/ or
 do a search for MySQL fetch size.

 You actually have to do setFetchSize(Integer.MIN_VALUE) (-1 doesn't work)
 in order to get streaming in MySQL.

 -Grant


 On Jun 24, 2008, at 10:35 PM, Shalin Shekhar Mangar wrote:

 Setting the batchSize to 1 would mean that the Jdbc driver will keep
 1 rows in memory *for each entity* which uses that data source (if
 correctly implemented by the driver). Not sure how well the Sql Server
 driver implements this. Also keep in mind that Solr also needs memory to
 index documents. You can probably try setting the batch size to a lower
 value.

 The regular memory tuning stuff should apply here too -- try disabling
 autoCommit and turn-off autowarming and see if it helps.

 On Wed, Jun 25, 2008 at 5:53 AM, wojtekpia [EMAIL PROTECTED] wrote:


 I'm trying to load ~10 million records into Solr using the
 DataImportHandler.
 I'm running out of memory (java.lang.OutOfMemoryError: Java heap space)
 as
 soon as I try loading more than about 5 million records.

 Here's my configuration:
 I'm connecting to a SQL Server database using the sqljdbc driver. I've
 given
 my Solr instance 1.5 GB of memory. I have set the dataSource batchSize
 to
 1. My SQL query is select top XXX field1, ... from table1. I have
 about 40 fields in my Solr schema.

 I thought the DataImportHandler would stream data from the DB rather
 than
 loading it all into memory at once. Is that not the case? Any thoughts
 on
 how to get around this (aside from getting a machine with more memory)?

 --
 View this message in context:

 http://www.nabble.com/DataImportHandler-running-out-of-memory-tp18102644p18102644.html
 Sent from the Solr - User mailing list archive at Nabble.com.




 --
 Regards,
 Shalin Shekhar Mangar.

 --
 Grant Ingersoll
 http://www.lucidimagination.com

 Lucene Helpful Hints:
 http://wiki.apache.org/lucene-java/BasicsOfPerformance
 http://wiki.apache.org/lucene-java/LuceneFAQ








 --
 Grant Ingersoll
 http://www.lucidimagination.com

 Lucene Helpful Hints:
 http://wiki.apache.org/lucene-java/BasicsOfPerformance
 http://wiki.apache.org/lucene-java/LuceneFAQ











-- 
--Noble Paul


Re: DataImportHandler running out of memory

2008-06-25 Thread Noble Paul നോബിള്‍ नोब्ळ्
The latest patch sets fetchSize as Integer.MIN_VALUE if -1 is passed.
It was added specifically for the MySQL driver.
--Noble
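In DIH config terms that means declaring the MySQL data source like this (a
sketch; the connection details are placeholders):

<dataSource type="JdbcDataSource"
    driver="com.mysql.jdbc.Driver"
    url="jdbc:mysql://localhost:3306/mydb"
    user="solr"
    password="secret"
    batchSize="-1"/>  <!-- -1 is mapped to fetchSize=Integer.MIN_VALUE so the driver streams -->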

On Wed, Jun 25, 2008 at 4:35 PM, Grant Ingersoll [EMAIL PROTECTED] wrote:
 I think it's a bit different.  I ran into this exact problem about two weeks
 ago on a 13 million record DB.  MySQL doesn't honor the fetch size for it's
 v5 JDBC driver.

 See
 http://www.databasesandlife.com/reading-row-by-row-into-java-from-mysql/ or
 do a search for MySQL fetch size.

 You actually have to do setFetchSize(Integer.MIN_VALUE) (-1 doesn't work) in
 order to get streaming in MySQL.

 -Grant


 On Jun 24, 2008, at 10:35 PM, Shalin Shekhar Mangar wrote:

 Setting the batchSize to 1 would mean that the Jdbc driver will keep
 1 rows in memory *for each entity* which uses that data source (if
 correctly implemented by the driver). Not sure how well the Sql Server
 driver implements this. Also keep in mind that Solr also needs memory to
 index documents. You can probably try setting the batch size to a lower
 value.

 The regular memory tuning stuff should apply here too -- try disabling
 autoCommit and turn-off autowarming and see if it helps.

 On Wed, Jun 25, 2008 at 5:53 AM, wojtekpia [EMAIL PROTECTED] wrote:


 I'm trying to load ~10 million records into Solr using the
 DataImportHandler.
 I'm running out of memory (java.lang.OutOfMemoryError: Java heap space)
 as
 soon as I try loading more than about 5 million records.

 Here's my configuration:
 I'm connecting to a SQL Server database using the sqljdbc driver. I've
 given
 my Solr instance 1.5 GB of memory. I have set the dataSource batchSize to
 1. My SQL query is select top XXX field1, ... from table1. I have
 about 40 fields in my Solr schema.

 I thought the DataImportHandler would stream data from the DB rather than
 loading it all into memory at once. Is that not the case? Any thoughts on
 how to get around this (aside from getting a machine with more memory)?

 --
 View this message in context:

 http://www.nabble.com/DataImportHandler-running-out-of-memory-tp18102644p18102644.html
 Sent from the Solr - User mailing list archive at Nabble.com.




 --
 Regards,
 Shalin Shekhar Mangar.

 --
 Grant Ingersoll
 http://www.lucidimagination.com

 Lucene Helpful Hints:
 http://wiki.apache.org/lucene-java/BasicsOfPerformance
 http://wiki.apache.org/lucene-java/LuceneFAQ


Re: DataImportHandler running out of memory

2008-06-25 Thread wojtekpia

I'm trying with batchSize=-1 now. So far it seems to be working, but very
slowly. I will update when it completes or crashes.

Even with a batchSize of 100 I was running out of memory.

I'm running on a 32-bit Windows machine. I've set the -Xmx to 1.5 GB - I
believe that's the maximum for my environment.

The batchSize parameter doesn't seem to control what happens... when I
select top 5,000,000 with a batchSize of 10,000, it works. When I select top
10,000,000 with the same batchSize, it runs out of memory.

Also, I'm using the 469 patch posted on 2008-06-11 08:41 AM.


Noble Paul നോബിള്‍ नोब्ळ् wrote:
 
 DIH streams rows one by one.
 set the fetchSize=-1 this might help. It may make the indexing a bit
 slower but memory consumption would be low.
 The memory is consumed by the jdbc driver. try tuning the -Xmx value for
 the VM
 --Noble
 
 On Wed, Jun 25, 2008 at 8:05 AM, Shalin Shekhar Mangar
 [EMAIL PROTECTED] wrote:
 Setting the batchSize to 1 would mean that the Jdbc driver will keep
 1 rows in memory *for each entity* which uses that data source (if
 correctly implemented by the driver). Not sure how well the Sql Server
 driver implements this. Also keep in mind that Solr also needs memory to
 index documents. You can probably try setting the batch size to a lower
 value.

 The regular memory tuning stuff should apply here too -- try disabling
 autoCommit and turn-off autowarming and see if it helps.

 On Wed, Jun 25, 2008 at 5:53 AM, wojtekpia [EMAIL PROTECTED] wrote:


 I'm trying to load ~10 million records into Solr using the
 DataImportHandler.
 I'm running out of memory (java.lang.OutOfMemoryError: Java heap space)
 as
 soon as I try loading more than about 5 million records.

 Here's my configuration:
 I'm connecting to a SQL Server database using the sqljdbc driver. I've
 given
 my Solr instance 1.5 GB of memory. I have set the dataSource batchSize
 to
 1. My SQL query is select top XXX field1, ... from table1. I have
 about 40 fields in my Solr schema.

 I thought the DataImportHandler would stream data from the DB rather
 than
 loading it all into memory at once. Is that not the case? Any thoughts
 on
 how to get around this (aside from getting a machine with more memory)?

 --
 View this message in context:
 http://www.nabble.com/DataImportHandler-running-out-of-memory-tp18102644p18102644.html
 Sent from the Solr - User mailing list archive at Nabble.com.




 --
 Regards,
 Shalin Shekhar Mangar.

 
 
 
 -- 
 --Noble Paul
 
 

-- 
View this message in context: 
http://www.nabble.com/DataImportHandler-running-out-of-memory-tp18102644p18115900.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: DataImportHandler running out of memory

2008-06-25 Thread Shalin Shekhar Mangar
Hi,

I don't think the problem is within DataImportHandler since it just streams
the result set. The fetchSize is simply passed to
Statement#setFetchSize(), and the Jdbc driver is supposed to honor it and
keep only that many rows in memory.

From what I could find about the Sql Server driver -- there's a connection
property called responseBuffering whose default value is "full", which causes
the entire result set to be fetched. See
http://msdn.microsoft.com/en-us/library/ms378988.aspx for more details. You
can set connection properties like this directly in the jdbc url specified
in DataImportHandler's dataSource configuration.
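For example (a sketch; the server, database and credentials are placeholders),
the property goes straight into the url attribute of the dataSource element:

<dataSource driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
    url="jdbc:sqlserver://localhost;databaseName=mydb;user=sa;password=secret;responseBuffering=adaptive;"/>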

On Wed, Jun 25, 2008 at 10:17 PM, wojtekpia [EMAIL PROTECTED] wrote:


 I'm trying with batchSize=-1 now. So far it seems to be working, but very
 slowly. I will update when it completes or crashes.

 Even with a batchSize of 100 I was running out of memory.

 I'm running on a 32-bit Windows machine. I've set the -Xmx to 1.5 GB - I
 believe that's the maximum for my environment.

 The batchSize parameter doesn't seem to control what happens... when I
 select top 5,000,000 with a batchSize of 10,000, it works. When I select
 top
 10,000,000 with the same batchSize, it runs out of memory.

 Also, I'm using the 469 patch posted on 2008-06-11 08:41 AM.


 Noble Paul നോബിള്‍ नोब्ळ् wrote:
 
  DIH streams rows one by one.
  set the fetchSize=-1 this might help. It may make the indexing a bit
  slower but memory consumption would be low.
  The memory is consumed by the jdbc driver. try tuning the -Xmx value for
  the VM
  --Noble
 
  On Wed, Jun 25, 2008 at 8:05 AM, Shalin Shekhar Mangar
  [EMAIL PROTECTED] wrote:
  Setting the batchSize to 1 would mean that the Jdbc driver will keep
  1 rows in memory *for each entity* which uses that data source (if
  correctly implemented by the driver). Not sure how well the Sql Server
  driver implements this. Also keep in mind that Solr also needs memory to
  index documents. You can probably try setting the batch size to a lower
  value.
 
  The regular memory tuning stuff should apply here too -- try disabling
  autoCommit and turn-off autowarming and see if it helps.
 
  On Wed, Jun 25, 2008 at 5:53 AM, wojtekpia [EMAIL PROTECTED]
 wrote:
 
 
  I'm trying to load ~10 million records into Solr using the
  DataImportHandler.
  I'm running out of memory (java.lang.OutOfMemoryError: Java heap space)
  as
  soon as I try loading more than about 5 million records.
 
  Here's my configuration:
  I'm connecting to a SQL Server database using the sqljdbc driver. I've
  given
  my Solr instance 1.5 GB of memory. I have set the dataSource batchSize
  to
  1. My SQL query is select top XXX field1, ... from table1. I have
  about 40 fields in my Solr schema.
 
  I thought the DataImportHandler would stream data from the DB rather
  than
  loading it all into memory at once. Is that not the case? Any thoughts
  on
  how to get around this (aside from getting a machine with more memory)?
 
  --
  View this message in context:
 
 http://www.nabble.com/DataImportHandler-running-out-of-memory-tp18102644p18102644.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 
 
  --
  Regards,
  Shalin Shekhar Mangar.
 
 
 
 
  --
  --Noble Paul
 
 

 --
 View this message in context:
 http://www.nabble.com/DataImportHandler-running-out-of-memory-tp18102644p18115900.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Regards,
Shalin Shekhar Mangar.


Re: DataImportHandler running out of memory

2008-06-25 Thread wojtekpia

It looks like that was the problem. With responseBuffering=adaptive, I'm able
to load all my data using the sqljdbc driver.
-- 
View this message in context: 
http://www.nabble.com/DataImportHandler-running-out-of-memory-tp18102644p18119732.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: DataImportHandler running out of memory

2008-06-25 Thread Noble Paul നോബിള്‍ नोब्ळ्
We must document this information in the wiki.  We never had a chance
to play w/ ms sql server
--Noble

On Thu, Jun 26, 2008 at 12:38 AM, wojtekpia [EMAIL PROTECTED] wrote:

 It looks like that was the problem. With responseBuffering=adaptive, I'm able
 to load all my data using the sqljdbc driver.
 --
 View this message in context: 
 http://www.nabble.com/DataImportHandler-running-out-of-memory-tp18102644p18119732.html
 Sent from the Solr - User mailing list archive at Nabble.com.





-- 
--Noble Paul


DataImportHandler running out of memory

2008-06-24 Thread wojtekpia

I'm trying to load ~10 million records into Solr using the DataImportHandler.
I'm running out of memory (java.lang.OutOfMemoryError: Java heap space) as
soon as I try loading more than about 5 million records.

Here's my configuration:
I'm connecting to a SQL Server database using the sqljdbc driver. I've given
my Solr instance 1.5 GB of memory. I have set the dataSource batchSize to
10000. My SQL query is "select top XXX field1, ... from table1". I have
about 40 fields in my Solr schema.

I thought the DataImportHandler would stream data from the DB rather than
loading it all into memory at once. Is that not the case? Any thoughts on
how to get around this (aside from getting a machine with more memory)? 

-- 
View this message in context: 
http://www.nabble.com/DataImportHandler-running-out-of-memory-tp18102644p18102644.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: DataImportHandler running out of memory

2008-06-24 Thread Grant Ingersoll
This is a bug in MySQL.  Try setting the fetch size on the Statement on
the connection to Integer.MIN_VALUE.


See http://forums.mysql.com/read.php?39,137457 amongst a host of other  
discussions on the subject.  Basically, it tries to load all the rows  
into memory, the only alternative is to set the fetch size to  
Integer.MIN_VALUE so that it gets it one row at a time.  I've hit this  
one myself and it isn't caused by the DataImportHandler, but by the  
MySQL JDBC handler.
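In plain JDBC terms (a sketch outside of DIH; the connection details and
query are placeholders), the streaming mode described above is enabled like
this:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class MySqlStreamingExample {
    public static void main(String[] args) throws SQLException {
        // placeholder connection details, for illustration only
        Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/mydb", "user", "password");
        // Connector/J only streams row by row when the statement is
        // forward-only, read-only, and fetchSize == Integer.MIN_VALUE
        Statement stmt = conn.createStatement(
                ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
        stmt.setFetchSize(Integer.MIN_VALUE);
        ResultSet rs = stmt.executeQuery("SELECT id, title FROM big_table");
        while (rs.next()) {
            // process one row at a time; the driver does not buffer the whole result
            System.out.println(rs.getString("id"));
        }
        rs.close();
        stmt.close();
        conn.close();
    }
}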


-Grant



On Jun 24, 2008, at 8:23 PM, wojtekpia wrote:



I'm trying to load ~10 million records into Solr using the  
DataImportHandler.
I'm running out of memory (java.lang.OutOfMemoryError: Java heap  
space) as

soon as I try loading more than about 5 million records.

Here's my configuration:
I'm connecting to a SQL Server database using the sqljdbc driver.  
I've given
my Solr instance 1.5 GB of memory. I have set the dataSource  
batchSize to
1. My SQL query is select top XXX field1, ... from table1. I  
have

about 40 fields in my Solr schema.

I thought the DataImportHandler would stream data from the DB rather  
than
loading it all into memory at once. Is that not the case? Any  
thoughts on
how to get around this (aside from getting a machine with more  
memory)?


--
View this message in context: 
http://www.nabble.com/DataImportHandler-running-out-of-memory-tp18102644p18102644.html
Sent from the Solr - User mailing list archive at Nabble.com.



--
Grant Ingersoll
http://www.lucidimagination.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ









Re: DataImportHandler running out of memory

2008-06-24 Thread Shalin Shekhar Mangar
Setting the batchSize to 10000 would mean that the Jdbc driver will keep
10000 rows in memory *for each entity* which uses that data source (if
correctly implemented by the driver). Not sure how well the Sql Server
driver implements this. Also keep in mind that Solr also needs memory to
index documents. You can probably try setting the batch size to a lower
value.

The regular memory tuning stuff should apply here too -- try disabling
autoCommit and turn-off autowarming and see if it helps.
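Concretely, that maps to settings like these in solrconfig.xml (a sketch
against the stock example config; the element and attribute names are the
standard ones, the sizes are illustrative):

<!-- in <updateHandler>: leave <autoCommit> commented out or absent while bulk importing -->
<!-- in <query>: set autowarmCount to 0 so commits don't trigger cache warming -->
<filterCache      class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>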

On Wed, Jun 25, 2008 at 5:53 AM, wojtekpia [EMAIL PROTECTED] wrote:


 I'm trying to load ~10 million records into Solr using the
 DataImportHandler.
 I'm running out of memory (java.lang.OutOfMemoryError: Java heap space) as
 soon as I try loading more than about 5 million records.

 Here's my configuration:
 I'm connecting to a SQL Server database using the sqljdbc driver. I've
 given
 my Solr instance 1.5 GB of memory. I have set the dataSource batchSize to
 1. My SQL query is select top XXX field1, ... from table1. I have
 about 40 fields in my Solr schema.

 I thought the DataImportHandler would stream data from the DB rather than
 loading it all into memory at once. Is that not the case? Any thoughts on
 how to get around this (aside from getting a machine with more memory)?

 --
 View this message in context:
 http://www.nabble.com/DataImportHandler-running-out-of-memory-tp18102644p18102644.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Regards,
Shalin Shekhar Mangar.


Re: DataImportHandler running out of memory

2008-06-24 Thread Noble Paul നോബിള്‍ नोब्ळ्
DIH streams rows one by one.
Set fetchSize=-1; this might help. It may make the indexing a bit
slower, but memory consumption would be low.
The memory is consumed by the jdbc driver. Try tuning the -Xmx value for the VM.
--Noble

On Wed, Jun 25, 2008 at 8:05 AM, Shalin Shekhar Mangar
[EMAIL PROTECTED] wrote:
 Setting the batchSize to 1 would mean that the Jdbc driver will keep
 1 rows in memory *for each entity* which uses that data source (if
 correctly implemented by the driver). Not sure how well the Sql Server
 driver implements this. Also keep in mind that Solr also needs memory to
 index documents. You can probably try setting the batch size to a lower
 value.

 The regular memory tuning stuff should apply here too -- try disabling
 autoCommit and turn-off autowarming and see if it helps.

 On Wed, Jun 25, 2008 at 5:53 AM, wojtekpia [EMAIL PROTECTED] wrote:


 I'm trying to load ~10 million records into Solr using the
 DataImportHandler.
 I'm running out of memory (java.lang.OutOfMemoryError: Java heap space) as
 soon as I try loading more than about 5 million records.

 Here's my configuration:
 I'm connecting to a SQL Server database using the sqljdbc driver. I've
 given
 my Solr instance 1.5 GB of memory. I have set the dataSource batchSize to
 1. My SQL query is select top XXX field1, ... from table1. I have
 about 40 fields in my Solr schema.

 I thought the DataImportHandler would stream data from the DB rather than
 loading it all into memory at once. Is that not the case? Any thoughts on
 how to get around this (aside from getting a machine with more memory)?

 --
 View this message in context:
 http://www.nabble.com/DataImportHandler-running-out-of-memory-tp18102644p18102644.html
 Sent from the Solr - User mailing list archive at Nabble.com.




 --
 Regards,
 Shalin Shekhar Mangar.




-- 
--Noble Paul


Re: DataImportHandler running out of memory

2008-06-24 Thread Noble Paul നോബിള്‍ नोब्ळ्
It is batchSize=-1, not fetchSize. Or keep it at a very small value.
--Noble

On Wed, Jun 25, 2008 at 9:31 AM, Noble Paul നോബിള്‍ नोब्ळ्
[EMAIL PROTECTED] wrote:
 DIH streams rows one by one.
 set the fetchSize=-1 this might help. It may make the indexing a bit
 slower but memory consumption would be low.
 The memory is consumed by the jdbc driver. try tuning the -Xmx value for the 
 VM
 --Noble

 On Wed, Jun 25, 2008 at 8:05 AM, Shalin Shekhar Mangar
 [EMAIL PROTECTED] wrote:
 Setting the batchSize to 1 would mean that the Jdbc driver will keep
 1 rows in memory *for each entity* which uses that data source (if
 correctly implemented by the driver). Not sure how well the Sql Server
 driver implements this. Also keep in mind that Solr also needs memory to
 index documents. You can probably try setting the batch size to a lower
 value.

 The regular memory tuning stuff should apply here too -- try disabling
 autoCommit and turn-off autowarming and see if it helps.

 On Wed, Jun 25, 2008 at 5:53 AM, wojtekpia [EMAIL PROTECTED] wrote:


 I'm trying to load ~10 million records into Solr using the
 DataImportHandler.
 I'm running out of memory (java.lang.OutOfMemoryError: Java heap space) as
 soon as I try loading more than about 5 million records.

 Here's my configuration:
 I'm connecting to a SQL Server database using the sqljdbc driver. I've
 given
 my Solr instance 1.5 GB of memory. I have set the dataSource batchSize to
 1. My SQL query is select top XXX field1, ... from table1. I have
 about 40 fields in my Solr schema.

 I thought the DataImportHandler would stream data from the DB rather than
 loading it all into memory at once. Is that not the case? Any thoughts on
 how to get around this (aside from getting a machine with more memory)?

 --
 View this message in context:
 http://www.nabble.com/DataImportHandler-running-out-of-memory-tp18102644p18102644.html
 Sent from the Solr - User mailing list archive at Nabble.com.




 --
 Regards,
 Shalin Shekhar Mangar.




 --
 --Noble Paul




-- 
--Noble Paul