[ 
https://issues.apache.org/jira/browse/CONNECTORS-55?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886701#action_12886701
 ] 

Karl Wright commented on CONNECTORS-55:
---------------------------------------

>>>>>>
Can you help me out and give me more ideas on what particular performance 
problems you are concerned about (e.g. query types or whatever) ?
<<<<<<

Hi Robert,
There are two major determinants of performance for LCF, under PostgreSQL at 
any rate.  The first is the performance of the queue stuffer query, and how 
well it scales when the queue is extremely large.  This is a complex query, but 
its basic form is:

SELECT <rowdata> FROM <queuetable> WHERE <some conditions> AND NOT 
EXISTS(<other row-specific conditions in the same table>) ORDER BY <priority> 
ASC LIMIT <typically some hundreds of records>

Because the queue may be very large, and this query may potentially return ALL 
records in the queue, the query plan MUST wind up reading directly out of the 
priority index, or the query simply will not work.  It simply cannot afford to 
read 20 million records into memory and then sort them!
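
As a rough illustration of the point above, here is a minimal sketch that 
uses Python's built-in sqlite3 module rather than PostgreSQL, so the planner 
output is SQLite's and only the general idea carries over.  The table, column 
and index names (queue, dochash, status, priority, queue_priority) are 
hypothetical and are not LCF's actual schema.  It runs a query of the same 
shape with and without an index on the priority column and checks whether the 
plan has to sort.

```python
import sqlite3

def plan_details(conn, sql):
    """Return the 'detail' strings of EXPLAIN QUERY PLAN for a statement."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

def needs_sort(plan):
    """True if SQLite reports a separate sorting step for ORDER BY."""
    return any("TEMP B-TREE FOR ORDER BY" in detail for detail in plan)

# Hypothetical queue table; names are illustrative, not LCF's schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE queue (id INTEGER PRIMARY KEY, dochash TEXT, "
    "status TEXT, priority REAL)"
)
conn.executemany(
    "INSERT INTO queue (dochash, status, priority) VALUES (?, ?, ?)",
    [("doc%d" % (i // 2), "P" if i % 3 else "A", (i * 7919) % 1000)
     for i in range(5000)],
)

# Same basic form as the queue stuffer query: filter, NOT EXISTS against the
# same table, ORDER BY priority, LIMIT.
STUFFER_QUERY = (
    "SELECT q.id, q.priority FROM queue q "
    "WHERE q.status = 'P' AND NOT EXISTS ("
    "  SELECT 1 FROM queue q2 "
    "  WHERE q2.dochash = q.dochash AND q2.status = 'A') "
    "ORDER BY q.priority ASC LIMIT 200"
)

plan_without_index = plan_details(conn, STUFFER_QUERY)

conn.execute("CREATE INDEX queue_priority ON queue (priority)")
conn.execute("CREATE INDEX queue_dochash ON queue (dochash)")
plan_with_index = plan_details(conn, STUFFER_QUERY)

rows = conn.execute(STUFFER_QUERY).fetchall()

print("sort needed without priority index:", needs_sort(plan_without_index))
print("sort needed with priority index:   ", needs_sort(plan_with_index))
```

Without the priority index the planner has no choice but to collect every 
matching row and sort it before applying the LIMIT.  With the index it can 
walk rows in priority order and stop once the limit is reached, which is the 
behavior the queue stuffer query depends on.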

The second place performance can be severely impacted is in how parallel writes 
can be.  In PostgreSQL 7.4, for example, everything was single-threaded on 
writes.  This caused web crawling in particular to perform poorly, because a 
typical web page has a significant number of links that must be entered in 
the queue, and single-threading that process cost some 4x to 10x compared 
with PostgreSQL 8.x, which allowed much more parallelism.

Hope this helps.


> Bundle database server with LCF packaged product
> ------------------------------------------------
>
>                 Key: CONNECTORS-55
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-55
>             Project: Lucene Connector Framework
>          Issue Type: Improvement
>          Components: Framework core
>            Reporter: Jack Krupansky
>
> The current requirement that the user install and deploy a PostgreSQL server 
> complicates the installation and deployment of LCF for the user. Installation 
> and deployment of LCF should be as simple as Solr itself. QuickStart is great 
> for the low-end and basic evaluation, but a comparable level of simplified 
> installation and deployment is still needed for full-blown, high-end 
> environments that need the full performance of a PostgreSQL-class database 
> server. So, PostgreSQL should be bundled with the packaged release of LCF so 
> that installation and deployment of LCF will automatically install and deploy 
> a subset of the full PostgreSQL distribution that is sufficient for the needs 
> of LCF. Starting LCF, with or without the LCF UI, should automatically start 
> the database server. Shutting down LCF should also shut down the database 
> server process.
> A typical use case would be for a non-developer who is comfortable with Solr 
> and simply wants to crawl documents from, for example, a SharePoint 
> repository and feed them into Solr. QuickStart should work well for the low 
> end or in the early stages of evaluation, but the user would prefer to 
> evaluate "the real thing" with something resembling a production crawl of 
> thousands of documents. Such a user might not be a hard-core developer or be 
> comfortable fiddling with a lot of software components simply to do one 
> conceptually simple operation.
> It should still be possible for the user to supply database server settings 
> to override the defaults, but the LCF package should have all of the 
> best-practice settings deemed appropriate for use with LCF.
> One downside is that installation and deployment will be platform-specific 
> since there are multiple processes and PostgreSQL itself requires a 
> platform-specific installation.
> This proposal presumes that PostgreSQL is the best option for the foreseeable 
> future, but nothing here is intended to preclude support for other database 
> servers in future releases.
> This proposal should not have any impact on QuickStart packaging or 
> deployment.
> Note: This issue is part of Phase 1 of the CONNECTORS-50 umbrella issue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
