Re: DataImport using SqlEntityProcessor running Out of Memory

2014-05-13 Thread Mikhail Khludnev
Hello O,
It seems to me (though it's better to look at a heap histogram) that
buffering the sub-entities in SortedMapBackedCache is what blows up the heap.
I'm aware of two directions:
- use a file-based cache instead. I don't know exactly how it works; you can
start from https://issues.apache.org/jira/browse/SOLR-2382 and check how to
enable the BerkeleyDB cache;
- personally, I'm promoting merging resultsets ordered by the RDBMS:
https://issues.apache.org/jira/browse/SOLR-4799
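The second direction amounts to a streaming merge join: if the parent and child queries are both ORDER BY EntID, each child row can be attached to its parent as the two resultsets are read side by side, with no cache at all. A minimal sketch of that merge in Python (illustrative only; `parent_rows` and `attr_rows` are hypothetical stand-ins for the two JDBC resultsets, not DIH API):

```python
def merge_children(parent_rows, attr_rows):
    """Attach child rows to each parent by streaming two resultsets
    that are both ordered by EntID -- constant memory, no cache."""
    attrs = iter(attr_rows)
    pending = next(attrs, None)
    for parent in parent_rows:
        children = []
        # consume the child rows that match this parent's key
        while pending is not None and pending["EntID"] == parent["EntID"]:
            children.append(pending["AttributeValue"])
            pending = next(attrs, None)
        yield parent["EntID"], children

# toy data standing in for the Entity and ATTR_TABLE resultsets
parents = [{"EntID": 1}, {"EntID": 2}, {"EntID": 3}]
attrs = [{"EntID": 1, "AttributeValue": "red"},
         {"EntID": 1, "AttributeValue": "large"},
         {"EntID": 3, "AttributeValue": "blue"}]
print(list(merge_children(parents, attrs)))
# -> [(1, ['red', 'large']), (2, []), (3, ['blue'])]
```

Each resultset is read exactly once, so memory stays constant regardless of dataset size; the sort cost moves into the database, which handles it far better than a JVM heap does.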




On Fri, May 9, 2014 at 7:16 PM, O. Olson  wrote:

> I have a data schema which is hierarchical, i.e. I have an Entity and a
> number of attributes. For a small subset of the data - about 300 MB - I can
> do the import with 3 GB of memory. But with the entire 4 GB dataset, I find
> I cannot do the import even with 9 GB of memory.
> I am using the SqlEntityProcessor as below:
>
> <dataConfig>
>   <dataSource type="JdbcDataSource"
>     driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
>     url="jdbc:sqlserver://localhost\MSSQLSERVER;databaseName=SolrDB;user=solrusr;password=solrusr;"/>
>   <document>
>     <entity name="Entity" query="...">
>
>       <entity name="..."
>         query="SELECT AttributeValue, EntID FROM ATTR_TABLE WHERE AttributeID=1"
>         cacheKey="EntID"
>         cacheLookup="Entity.EntID"
>         processor="SqlEntityProcessor" cacheImpl="SortedMapBackedCache">
>         <field column="AttributeValue" name="EntityAttribute1" />
>       </entity>
>
>       <entity name="..."
>         query="SELECT AttributeValue, EntID FROM ATTR_TABLE WHERE AttributeID=2"
>         cacheKey="EntID"
>         cacheLookup="Entity.EntID"
>         processor="SqlEntityProcessor" cacheImpl="SortedMapBackedCache">
>         <field column="AttributeValue" name="EntityAttribute2" />
>       </entity>
>
>     </entity>
>   </document>
> </dataConfig>
>
>
>
> What is the best way to import this data? Doing it without a cache results
> in a very large number of SQL queries. With the cache, I run out of memory.
>
> I’m curious why 4GB of data cannot entirely fit in memory. One thing I need
> to mention is that I have about 400 to 500 attributes.
>
> Thanks in advance for any helpful advice.
> O. O.
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/DataImport-using-SqlEntityProcessor-running-Out-of-Memory-tp4135080.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
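On the question of why 4 GB of source data cannot fit in 9 GB of heap: SortedMapBackedCache holds every attribute value as its own Java String inside per-row maps, and small Java objects carry large fixed overheads. A back-of-the-envelope estimate (the per-object sizes below are typical 64-bit HotSpot figures with compressed oops, assumed rather than measured):

```python
# Rough per-value cost when a short attribute string sits in an in-heap
# sorted-map cache (typical 64-bit JVM sizes -- assumed, not measured):
string_header = 16            # String object: header + fields
char_array    = 16 + 2 * 20   # backing char[]: header + 20 UTF-16 chars
map_entry     = 32            # map entry: header + key/value/node references
boxed_key     = 16            # boxed EntID used as the map key

per_value = string_header + char_array + map_entry + boxed_key
raw_bytes = 20                # the same 20 characters as raw bytes on disk

print(per_value, per_value / raw_bytes)
# -> 120 6.0
```

At roughly 6x inflation, 4 GB of raw attribute data would need on the order of 24 GB of heap before any GC headroom, so failing at 9 GB is unsurprising.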



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics


 


Re: DataImport using SqlEntityProcessor running Out of Memory

2014-05-12 Thread Shawn Heisey
On 5/9/2014 9:16 AM, O. Olson wrote:
> I have a Data Schema which is Hierarchical i.e. I have an Entity and a number
> of attributes. For a small subset of the Data - about 300 MB, I can do the
> import with 3 GB memory. Now with the entire 4 GB Dataset, I find I cannot
> do the import with 9 GB of memory. 
> I am using the SqlEntityProcessor as below: 
> 
> <dataSource type="JdbcDataSource"
>  url="jdbc:sqlserver://localhost\MSSQLSERVER;databaseName=SolrDB;user=solrusr;password=solrusr;"/>

Upgrade your JDBC driver to version 1.2 or later, or turn on response
buffering. The following URL has the details. It's a very long URL, so if
your mail client wraps it, you may not be able to click on it properly:

http://wiki.apache.org/solr/DataImportHandlerFaq#I.27m_using_DataImportHandler_with_MS_SQL_Server_database_with_sqljdbc_driver._DataImportHandler_is_going_out_of_memory._I_tried_adjustng_the_batchSize_values_but_they_don.27t_seem_to_make_any_difference._How_do_I_fix_this.3F
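In case the link goes stale: the gist of that FAQ entry is to switch the driver to adaptive response buffering so it stops materializing the whole resultset in memory. Roughly, in the dataSource element (the `responseBuffering` property comes from Microsoft's JDBC driver; verify it against your driver version):

```xml
<dataSource type="JdbcDataSource"
  driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
  url="jdbc:sqlserver://localhost\MSSQLSERVER;databaseName=SolrDB;responseBuffering=adaptive;user=solrusr;password=solrusr;"/>
```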

Thanks,
Shawn