[jira] [Commented] (SOLR-8868) SolrCloud: if zookeeper loses and then regains a quorum, Solr nodes and SolrJ Client do not recover and need to be restarted
[ https://issues.apache.org/jira/browse/SOLR-8868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16027637#comment-16027637 ] Martin Grotzke commented on SOLR-8868: -- We also experienced this issue, any update here? > SolrCloud: if zookeeper loses and then regains a quorum, Solr nodes and SolrJ > Client do not recover and need to be restarted > > > Key: SOLR-8868 > URL: https://issues.apache.org/jira/browse/SOLR-8868 > Project: Solr > Issue Type: Bug > Components: SolrCloud, SolrJ >Affects Versions: 5.3.1 >Reporter: Frank J Kelly > > Tried mailing list on 3/15 and 3/16 to no avail. Hopefully I gave enough > details. > > Just wondering if my observation of SolrCloud behavior after ZooKeeper loses > a quorum is normal or to-be-expected > Version of Solr: 5.3.1 > Version of ZooKeeper: 3.4.7 > Using SolrCloud with external ZooKeeper > Deployed on AWS > Our Solr cluster has 3 nodes (m3.large) > Our Zookeeper ensemble consists of three nodes (t2.small) with the same > config using DNS names e.g. > {noformat} > $ more ../conf/zoo.cfg > tickTime=2000 > dataDir=/var/zookeeper > dataLogDir=/var/log/zookeeper > clientPort=2181 > initLimit=10 > syncLimit=5 > standaloneEnabled=false > server.1=zookeeper1.qa.eu-west-1.mysearch.com:2888:3888 > server.2=zookeeper2.qa.eu-west-1.mysearch.com:2888:3888 > server.3=zookeeper3.qa.eu-west-1.mysearch.com:2888:3888 > {noformat} > If we terminate one of the zookeeper nodes we get a ZK election (and I think) > a quorum is maintained. > Operation continues OK and we detect the terminated instance and relaunch a > new ZK node which comes up fine > If we terminate two of the ZK nodes we lose a quorum and then we observe the > following > 1.1) Admin UI shows an error that it is unable to contact ZooKeeper “Could > not connect to ZooKeeper" > 1.2) SolrJ returns the following > {noformat} > org.apache.solr.common.SolrException: Could not load collection from > ZK:qa_eu-west-1_public_index > at > org.apache.solr.common.cloud.ZkStateReader.getCollectionLive(ZkStateReader.java:850) > at org.apache.solr.common.cloud.ZkStateReader$7.get(ZkStateReader.java:515) > at > org.apache.solr.client.solrj.impl.CloudSolrClient.getDocCollection(CloudSolrClient.java:1205) > at > org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:837) > at > org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:805) > at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:135) > at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:107) > at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:72) > at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:86) > at > com.here.scbe.search.solr.SolrFacadeImpl.addToSearchIndex(SolrFacadeImpl.java:112) > Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: > KeeperErrorCode = ConnectionLoss for > /collections/qa_eu-west-1_public_index/state.json > at org.apache.zookeeper.KeeperException.create(KeeperException.java:99) > at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) > at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155) > at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:345) > at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:342) > at > org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:61) > at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:342) > at > 
org.apache.solr.common.cloud.ZkStateReader.getCollectionLive(ZkStateReader.java:841) > ... 24 more > {noformat} > This makes sense based on our understanding. > When our AutoScale groups launch two new ZooKeeper nodes, initialize them, > fix the DNS etc. we regain a quorum but at this point > 2.1) Admin UI shows the shards as “GONE” (all greyed out) > 2.2) SolrJ returns the same error even though the ZooKeeper DNS names are now > bound to new IP addresses > So at this point I restart the Solr nodes. At this point then > 3.1) Admin UI shows the collections as OK (all shards are green) – yeah the > nodes are back! > 3.2) SolrJ Client still shows the same error – namely > {noformat} > org.apache.solr.common.SolrException: Could not load collection from > ZK:qa_eu-west-1_here_account > at > org.apache.solr.common.cloud.ZkStateReader.getCollectionLive(ZkStateReader.java:850) > at org.apache.solr.common.cloud.ZkStateReader$7.get(ZkStateReader.java:515) > at > org.apache.solr.client.solrj.impl.CloudSolrClient.getDocCollection(CloudSolrClient.java:1205) > at > org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:837) > at
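Until the underlying reconnect problem is fixed, one client-side mitigation is to discard and rebuild the CloudSolrClient once connection-loss errors keep occurring after the quorum is back, instead of restarting the whole application. The sketch below is only an illustration against the SolrJ 5.x API and is not a fix for this issue; the failure threshold, class name and ZooKeeper host string are made-up examples.
{code:java}
import java.io.IOException;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrException;

/** Hypothetical wrapper that rebuilds its CloudSolrClient after repeated ZooKeeper connection-loss errors. */
public class RecreatingSolrClient {

  private static final int MAX_CONSECUTIVE_FAILURES = 5; // illustrative threshold, not from the issue

  private final String zkHost;       // e.g. "zookeeper1:2181,zookeeper2:2181,zookeeper3:2181"
  private final String collection;
  private CloudSolrClient client;
  private int failures;

  public RecreatingSolrClient(String zkHost, String collection) {
    this.zkHost = zkHost;
    this.collection = collection;
    this.client = newClient();
  }

  private CloudSolrClient newClient() {
    CloudSolrClient c = new CloudSolrClient(zkHost); // SolrJ 5.x constructor
    c.setDefaultCollection(collection);
    return c;
  }

  public synchronized QueryResponse query(SolrQuery q) throws SolrServerException, IOException {
    try {
      QueryResponse rsp = client.query(q);
      failures = 0;
      return rsp;
    } catch (SolrException | SolrServerException | IOException e) {
      // If errors persist after ZooKeeper regained its quorum, drop the stale client
      // (and its cached ZK state) and connect from scratch instead of restarting the JVM.
      if (++failures >= MAX_CONSECUTIVE_FAILURES) {
        try { client.close(); } catch (IOException ignore) { /* best effort */ }
        client = newClient();
        failures = 0;
      }
      throw e;
    }
  }
}
{code}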
[jira] [Commented] (SOLR-6273) Cross Data Center Replication
[ https://issues.apache.org/jira/browse/SOLR-6273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14613472#comment-14613472 ] Martin Grotzke commented on SOLR-6273: -- Great, thanks for the advice, Renaud! > Cross Data Center Replication > - > > Key: SOLR-6273 > URL: https://issues.apache.org/jira/browse/SOLR-6273 > Project: Solr > Issue Type: New Feature >Reporter: Yonik Seeley >Assignee: Erick Erickson > Attachments: SOLR-6273-trunk-testfix1.patch, > SOLR-6273-trunk-testfix2.patch, SOLR-6273-trunk.patch, SOLR-6273-trunk.patch, > SOLR-6273.patch, SOLR-6273.patch, SOLR-6273.patch, SOLR-6273.patch > > > This is the master issue for Cross Data Center Replication (CDCR) > described at a high level here: > http://heliosearch.org/solr-cross-data-center-replication/ -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6461) peer cluster configuration
[ https://issues.apache.org/jira/browse/SOLR-6461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14613469#comment-14613469 ] Martin Grotzke commented on SOLR-6461: -- Great, thanks! > peer cluster configuration > -- > > Key: SOLR-6461 > URL: https://issues.apache.org/jira/browse/SOLR-6461 > Project: Solr > Issue Type: Sub-task >Reporter: Yonik Seeley > > From http://heliosearch.org/solr-cross-data-center-replication/#Overview > """Clusters will be configured to know about each other, most likely through > keeping a cluster peer list in zookeeper. One essential piece of information > will be the zookeeper quorum address for each cluster peer. Any node in one > cluster can know the configuration of another cluster via a zookeeper > client.""" -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6273) Cross Data Center Replication
[ https://issues.apache.org/jira/browse/SOLR-6273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14612476#comment-14612476 ] Martin Grotzke commented on SOLR-6273: -- Hi all, we're currently evaluating how to expand our current single DC solrcloud to multi (2) DCs. This effort here looks very promising, great work! Assuming we'd test how it works for us, could we follow the documentation mentioned above (https://docs.google.com/document/d/1DZHUFM3z9OX171DeGjcLTRI9uULM-NB1KsCSpVL3Zy0/edit?usp=sharing)? Does it match the current implementation? Do you have any other suggestions for us if we'd test this? Thanks! > Cross Data Center Replication > - > > Key: SOLR-6273 > URL: https://issues.apache.org/jira/browse/SOLR-6273 > Project: Solr > Issue Type: New Feature >Reporter: Yonik Seeley >Assignee: Erick Erickson > Attachments: SOLR-6273-trunk-testfix1.patch, > SOLR-6273-trunk-testfix2.patch, SOLR-6273-trunk.patch, SOLR-6273-trunk.patch, > SOLR-6273.patch, SOLR-6273.patch, SOLR-6273.patch, SOLR-6273.patch > > > This is the master issue for Cross Data Center Replication (CDCR) > described at a high level here: > http://heliosearch.org/solr-cross-data-center-replication/ -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6461) peer cluster configuration
[ https://issues.apache.org/jira/browse/SOLR-6461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14612451#comment-14612451 ] Martin Grotzke commented on SOLR-6461: -- This one is closed with resolution "fixed", while it hadn't been assigned. Is it really fixed? Are there some details about the current state of implementation of peer cluster configuration? Thanks! > peer cluster configuration > -- > > Key: SOLR-6461 > URL: https://issues.apache.org/jira/browse/SOLR-6461 > Project: Solr > Issue Type: Sub-task >Reporter: Yonik Seeley > > From http://heliosearch.org/solr-cross-data-center-replication/#Overview > """Clusters will be configured to know about each other, most likely through > keeping a cluster peer list in zookeeper. One essential piece of information > will be the zookeeper quorum address for each cluster peer. Any node in one > cluster can know the configuration of another cluster via a zookeeper > client.""" -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6365) specify appends, defaults, invariants outside of the component
[ https://issues.apache.org/jira/browse/SOLR-6365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14336299#comment-14336299 ] Martin Grotzke commented on SOLR-6365: -- [~noble.paul] Sounds great, I submitted SOLR-7157 > specify appends, defaults, invariants outside of the component > --- > > Key: SOLR-6365 > URL: https://issues.apache.org/jira/browse/SOLR-6365 > Project: Solr > Issue Type: Improvement >Reporter: Noble Paul >Assignee: Noble Paul > Fix For: 5.0, Trunk > > Attachments: SOLR-6365-crappy-test.patch, SOLR-6365.patch, > SOLR-6365.patch, SOLR-6365.patch > > > The components are configured in solrconfig.xml mostly for specifying these > extra parameters. If we separate these out, we can avoid specifying the > components altogether and make solrconfig much simpler. Eventually we want > users to see all functions as paths instead of components and control these > params from outside , through an API and persisted in ZK > objectives : > * define standard components implicitly and let users override some params > only > * reuse standard params across components > * define multiple param sets and mix and match these params at request time > example > {code:xml} > > > > json > _txt > > > {code} > other examples > {code:xml} > > > A > > > B > > > C > > > > > >class="DumpRequestHandler"/> > > > > A1 > > > B1 > > > C1 > > > {code} > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-7157) Specify arbitrary config params outside of the component
Martin Grotzke created SOLR-7157: Summary: Specify arbitrary config params outside of the component Key: SOLR-7157 URL: https://issues.apache.org/jira/browse/SOLR-7157 Project: Solr Issue Type: New Feature Reporter: Martin Grotzke SOLR-6365 added support for appends, defaults and invariants specified outside of the component via initParams. It would be great if it would also be possible to configure arbitrary params via initParams. Our use case is that we want to configure the "healthcheckFile" for the "/admin/ping" RequestHandler, so I'd like to configure it like this: {code} server-enabled.txt {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
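The XML inside the {code} block above was stripped by the mailing-list archive; only the file name survived. The intent is presumably something along the following lines (a reconstruction, not the exact snippet from the original comment; whether initParams may carry such a top-level arg is exactly what this issue asks for):
{code:xml}
<!-- presumed intent: pass an arbitrary init arg to the implicit /admin/ping handler -->
<initParams path="/admin/ping">
  <str name="healthcheckFile">server-enabled.txt</str>
</initParams>
{code}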
[jira] [Commented] (SOLR-6365) specify appends, defaults, invariants outside of the component
[ https://issues.apache.org/jira/browse/SOLR-6365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14332378#comment-14332378 ] Martin Grotzke commented on SOLR-6365: -- Moving to solr 5, I'm trying to configure the "healthcheckFile" for the PingRequestHandler. I added {code} server-enabled.txt {code} to solrconfig, unfortunately this did not do the trick. I had to configure the PingRequestHandler completely to get the healthcheckFile configured. My assumption is that *only* appends, defaults and invariants can be specified outside of the component, so what I'm experiencing is expected and not a bug or an issue on my side. Is that correct? > specify appends, defaults, invariants outside of the component > --- > > Key: SOLR-6365 > URL: https://issues.apache.org/jira/browse/SOLR-6365 > Project: Solr > Issue Type: Improvement >Reporter: Noble Paul >Assignee: Noble Paul > Fix For: 5.0, Trunk > > Attachments: SOLR-6365-crappy-test.patch, SOLR-6365.patch, > SOLR-6365.patch, SOLR-6365.patch > > > The components are configured in solrconfig.xml mostly for specifying these > extra parameters. If we separate these out, we can avoid specifying the > components altogether and make solrconfig much simpler. Eventually we want > users to see all functions as paths instead of components and control these > params from outside , through an API and persisted in ZK > objectives : > * define standard components implicitly and let users override some params > only > * reuse standard params across components > * define multiple param sets and mix and match these params at request time > example > {code:xml} > > > > json > _txt > > > {code} > other examples > {code:xml} > > > A > > > B > > > C > > > > > >class="DumpRequestHandler"/> > > > > A1 > > > B1 > > > C1 > > > {code} > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6719) Collection API: CREATE ignores 'property.name' when creating individual cores
[ https://issues.apache.org/jira/browse/SOLR-6719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252317#comment-14252317 ] Martin Grotzke commented on SOLR-6719: -- We experienced the same with ADDREPLICA and property.solr.common.data.dir (running Solr 4.10.2). > Collection API: CREATE ignores 'property.name' when creating individual cores > - > > Key: SOLR-6719 > URL: https://issues.apache.org/jira/browse/SOLR-6719 > Project: Solr > Issue Type: Bug >Reporter: Hoss Man > > Yashveer Rana pointed this out in the ref guide comments... > https://cwiki.apache.org/confluence/display/solr/Collections+API?focusedCommentId=47382851#comment-47382851 > * Collection CREATE is documented to support "property._name_=_value_" (where > 'name' and 'property' are italics placeholders for user supplied key=val) as > "Set core property _name_ to _value_. See core.properties file contents." > * The [docs for > core.properties|https://cwiki.apache.org/confluence/display/solr/Format+of+solr.xml#Formatofsolr.xml-core.properties_files] > include a list of supported property values, including "name" (literal) as > "The name of the SolrCore. You'll use this name to reference the SolrCore > when running commands with the CoreAdminHandler." > From these docs, it's reasonable to assume that using a URL like this... > http://localhost:8983/solr/admin/collections?action=CREATE&name=my_collection&numShards=2&configSet=data_driven_schema_configs&property.name=my_corename > ...should cause "my_collection" to be created, with the core name used for > every replica being "my_corename" ... but that doesn't happen. instead the > replicas get core names like "my_collection_shard1_replica1" > > This is either a bug, or (my suspicion) it's intentional that the user > specific core name is not being used -- if it's intentional, then the > Collection CREATE command should fail with a clear error if a user does try > to use "property.name" rather then silently ignoring it and the Collection > CREATE docs should be updated to make it clear that "name" is an exception to > the general property.foo -> foo in core.properties support. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6086) Replica active during Warming
[ https://issues.apache.org/jira/browse/SOLR-6086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14166830#comment-14166830 ] Martin Grotzke commented on SOLR-6086: -- Is this really the case? Why are nodes that are not yet able to serve queries considered active? > Replica active during Warming > - > > Key: SOLR-6086 > URL: https://issues.apache.org/jira/browse/SOLR-6086 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Affects Versions: 4.6.1, 4.8.1 >Reporter: ludovic Boutros > Labels: difficulty-medium, impact-medium > Attachments: SOLR-6086.patch, SOLR-6086.patch > > Original Estimate: 72h > Remaining Estimate: 72h > > At least with Solr 4.6.1, replicas are considered active during the warming > process. > This means that if you restart a replica or create a new one, queries will > be sent to this replica and the query will hang until the end of the warming > process (if cold searchers are not used). > You cannot add or restart a node silently anymore. > I think that the fact that the replica is active is not a bad thing. > But, the HttpShardHandler and the CloudSolrServer class should take the > warming process into account. > Currently, I have developed a new, very simple component which checks that a > searcher is registered. > I am also developing custom HttpShardHandler and CloudSolrServer classes > which will check the warming process in addition to the ACTIVE status in the > cluster state. > This seems to be more a workaround than a solution but that's all I can do in > this version. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
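To illustrate the kind of guard component mentioned in the description (a hypothetical sketch, not the attached SOLR-6086.patch; class name and error message are invented), a SearchComponent could reject requests until the core has a registered searcher, so the replica fails health checks while it is still warming:
{code:java}
import org.apache.solr.common.SolrException;
import org.apache.solr.common.SolrException.ErrorCode;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;
import org.apache.solr.search.SolrIndexSearcher;
import org.apache.solr.util.RefCounted;

/** Hypothetical component: fail fast while the core has no registered searcher yet. */
public class SearcherRegisteredCheck extends SearchComponent {

  @Override
  public void prepare(ResponseBuilder rb) {
    RefCounted<SolrIndexSearcher> searcher = rb.req.getCore().getRegisteredSearcher();
    if (searcher == null) {
      throw new SolrException(ErrorCode.SERVICE_UNAVAILABLE,
          "Core " + rb.req.getCore().getName() + " is still warming: no registered searcher");
    }
    searcher.decref(); // release the reference acquired by getRegisteredSearcher()
  }

  @Override
  public void process(ResponseBuilder rb) {
    // nothing to do at process time; the check happens in prepare()
  }

  @Override
  public String getDescription() {
    return "Rejects requests until a searcher has been registered";
  }

  public String getSource() { // required by SolrInfoMBean in the 4.x code base
    return null;
  }
}
{code}
Wired in as one of the first components of a handler (or hit via a load balancer health-check URL), a warming replica would then answer with 503 instead of letting the query hang.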
[jira] [Commented] (SOLR-3383) Async responses in SolrJ
[ https://issues.apache.org/jira/browse/SOLR-3383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13894435#comment-13894435 ] Martin Grotzke commented on SOLR-3383: -- Ok, thanks for the update. > Async responses in SolrJ > > > Key: SOLR-3383 > URL: https://issues.apache.org/jira/browse/SOLR-3383 > Project: Solr > Issue Type: Improvement > Components: clients - java >Affects Versions: 3.5 >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: async, asynchronous, concurrency, query, solrj, update > Fix For: 4.7 > > Attachments: SOLR-3383.patch > > > Today it is like this > - SolrServer.request returns NamedList > - SolrRequest.process returns SolrResponse > - Public methods on SolrServer like addX, optimize, commit, queryX etc. > returns subclasses of SolrResponse (e.g. "add" returns UpdateResponse) > - etc > This is all synchronous - that is, the calling thread of those methods will > wait for the response before being able to continue. I believe the industry > today agrees that "operations" like client-server network-requireing > operations should be done asynchronously seens from the client API. Therefore > basically we should change those methods > - SolrServer.request returns Future> > - SolrRequest.process returns Future > - SolrServer.xxx returns Future > and make the appropriate changes in the implementations below. > My main argument for this right now, is that ConcurrentUpdateSolrServer > really is not able to hand over responses to the calling client. Guess that > it is also the reason why it is only a "Update"-SolrServer and not a complete > SolrServer (being able to do queries etc.) - updates does not require sending > responses (except primitive errors) back to the client, queries etc does. Now > that we do "finegrained error propagation" (SOLR-3382) in order to send > "unique key constraint"- and "versioning"-errors (SOLR-3173 and SOLR-3178) > back to the client in responses to update-request, suddenly it is not true > anymore that updates does not require sending responses back to the client. > Making the changes suggested above (returning Futures) would > - Allow ConcurrentUpdateSolrServer to be used for updates potentially > resulting in "unique key constraint"- and "versioning"-errors > - Allow ConcurrentUpdateSolrServer to become ConcurrentSolrServer - also > being able to do queries etc > - Do cool stuff like SOLR-3384 > - Make SolrJ more modern with respect to asynchronous principles -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3383) Async responses in SolrJ
[ https://issues.apache.org/jira/browse/SOLR-3383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13894298#comment-13894298 ] Martin Grotzke commented on SOLR-3383: -- I'm also interested in async responses in solrj, but instead of using java Futures, I'd prefer a callback based interface, e.g. via onSuccess(callback) and onError(callback), much like scala Futures work. > Async responses in SolrJ > > > Key: SOLR-3383 > URL: https://issues.apache.org/jira/browse/SOLR-3383 > Project: Solr > Issue Type: Improvement > Components: clients - java >Affects Versions: 3.5 >Reporter: Per Steffensen >Assignee: Per Steffensen > Labels: async, asynchronous, concurrency, query, solrj, update > Fix For: 4.7 > > Attachments: SOLR-3383.patch > > > Today it is like this > - SolrServer.request returns NamedList > - SolrRequest.process returns SolrResponse > - Public methods on SolrServer like addX, optimize, commit, queryX etc. > returns subclasses of SolrResponse (e.g. "add" returns UpdateResponse) > - etc > This is all synchronous - that is, the calling thread of those methods will > wait for the response before being able to continue. I believe the industry > today agrees that "operations" like client-server network-requireing > operations should be done asynchronously seens from the client API. Therefore > basically we should change those methods > - SolrServer.request returns Future> > - SolrRequest.process returns Future > - SolrServer.xxx returns Future > and make the appropriate changes in the implementations below. > My main argument for this right now, is that ConcurrentUpdateSolrServer > really is not able to hand over responses to the calling client. Guess that > it is also the reason why it is only a "Update"-SolrServer and not a complete > SolrServer (being able to do queries etc.) - updates does not require sending > responses (except primitive errors) back to the client, queries etc does. Now > that we do "finegrained error propagation" (SOLR-3382) in order to send > "unique key constraint"- and "versioning"-errors (SOLR-3173 and SOLR-3178) > back to the client in responses to update-request, suddenly it is not true > anymore that updates does not require sending responses back to the client. > Making the changes suggested above (returning Futures) would > - Allow ConcurrentUpdateSolrServer to be used for updates potentially > resulting in "unique key constraint"- and "versioning"-errors > - Allow ConcurrentUpdateSolrServer to become ConcurrentSolrServer - also > being able to do queries etc > - Do cool stuff like SOLR-3384 > - Make SolrJ more modern with respect to asynchronous principles -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
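To make the callback idea concrete, a thin facade over a blocking SolrServer could look roughly like the sketch below. It is only an illustration of the shape of an onSuccess/onError API; the class and interface names are invented and this is not the proposed SolrJ change.
{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

/** Hypothetical callback-based facade over a blocking SolrServer. */
public class AsyncSolrFacade {

  /** Callback in the spirit of Scala futures' onSuccess/onFailure. */
  public interface QueryCallback {
    void onSuccess(QueryResponse response);
    void onError(Throwable error);
  }

  private final SolrServer server;
  private final ExecutorService executor = Executors.newFixedThreadPool(4);

  public AsyncSolrFacade(SolrServer server) {
    this.server = server;
  }

  /** Returns immediately; the callback is invoked on a worker thread when the response arrives. */
  public void query(final SolrQuery query, final QueryCallback callback) {
    executor.submit(new Runnable() {
      @Override
      public void run() {
        try {
          callback.onSuccess(server.query(query));
        } catch (Throwable t) {
          callback.onError(t);
        }
      }
    });
  }

  public void shutdown() {
    executor.shutdown();
  }
}
{code}
A real SolrJ implementation would presumably hook into the HTTP layer rather than burn a thread per in-flight request; the sketch only shows the shape of the client-facing API.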
[jira] [Commented] (SOLR-3318) LBHttpSolrServer should allow to specify a preferred server for a query
[ https://issues.apache.org/jira/browse/SOLR-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264620#comment-13264620 ] Martin Grotzke commented on SOLR-3318: -- Any feedback on this? > LBHttpSolrServer should allow to specify a preferred server for a query > --- > > Key: SOLR-3318 > URL: https://issues.apache.org/jira/browse/SOLR-3318 > Project: Solr > Issue Type: Improvement > Components: clients - java >Affects Versions: 4.0 >Reporter: Martin Grotzke >Priority: Minor > Attachments: SOLR-3318.git.patch > > > For a user query we make several solr queries that differ only slightly and > therefore should use/reuse objects cached from the first query (we're using a > custom request handler and custom caches). > Thus such subsequent queries should hit the same solr server. > The implemented solution looks like this: > * The client obtains a live SolrServer from LBHttpSolrServer > * The client provides this SolrServer as preferred server for a query > * If the preferred server is no longer alive the request is retried on > another live server > * Everything else follows the existing logic: > ** After live servers are exhausted, any servers previously marked as dead > will be tried before failing the request > ** If no live servers are found a SolrServerException is thrown > The implementation is also [on > github|https://github.com/magro/lucene-solr/commit/a75aef3d]. > Mailing list thread: > http://lucene.472066.n3.nabble.com/LBHttpSolrServer-to-query-a-preferred-server-tt3884140.html -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
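For reference, with the stock SolrJ API a similar effect can be approximated (though less cleanly than the attached patch) by ordering the server list of an LBHttpSolrServer.Req so that the preferred server comes first, assuming the Req API is available in the SolrJ version at hand. The URLs below are placeholders.
{code:java}
import java.util.Arrays;
import java.util.List;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.LBHttpSolrServer;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.common.util.NamedList;

public class PreferredServerExample {
  public static void main(String[] args) throws Exception {
    LBHttpSolrServer lb = new LBHttpSolrServer(
        "http://solr1:8983/solr", "http://solr2:8983/solr", "http://solr3:8983/solr");

    // Preferred server first; the remaining live servers act as fallbacks,
    // and servers marked dead are only retried after the live ones are exhausted.
    List<String> servers = Arrays.asList(
        "http://solr2:8983/solr", "http://solr1:8983/solr", "http://solr3:8983/solr");

    LBHttpSolrServer.Req req = new LBHttpSolrServer.Req(
        new QueryRequest(new SolrQuery("*:*")), servers);
    LBHttpSolrServer.Rsp rsp = lb.request(req);

    NamedList<Object> response = rsp.getResponse();
    System.out.println(response);
  }
}
{code}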
[jira] [Commented] (SOLR-2583) Make external scoring more efficient (ExternalFileField, FileFloatSource)
[ https://issues.apache.org/jira/browse/SOLR-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13055737#comment-13055737 ] Martin Grotzke commented on SOLR-2583: -- bq. Looking at your test, I think it is reasonable. But I'd like to use CompactByteArray. I saw it wins over HashMap and float[] when 5% and above in my test. Can you share your test code or s.th. similar? Perhaps you can just fork https://github.com/magro/lucene-solr/ and add an appropriate test that reflects your data? > Make external scoring more efficient (ExternalFileField, FileFloatSource) > - > > Key: SOLR-2583 > URL: https://issues.apache.org/jira/browse/SOLR-2583 > Project: Solr > Issue Type: Improvement > Components: search >Reporter: Martin Grotzke >Priority: Minor > Attachments: FileFloatSource.java.patch, patch.txt > > > External scoring eats much memory, depending on the number of documents in > the index. The ExternalFileField (used for external scoring) uses > FileFloatSource, where one FileFloatSource is created per external scoring > file. FileFloatSource creates a float array with the size of the number of > docs (this is also done if the file to load is not found). If there are much > less entries in the scoring file than there are number of docs in total the > big float array wastes much memory. > This could be optimized by using a map of doc -> score, so that the map > contains as many entries as there are scoring entries in the external file, > but not more. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2583) Make external scoring more efficient (ExternalFileField, FileFloatSource)
[ https://issues.apache.org/jira/browse/SOLR-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050435#comment-13050435 ] Martin Grotzke commented on SOLR-2583: -- bq. Are you sure real floats are actually needed? In our case score values are e.g. 15887 (one example just taken from one of the files). With this sample this test fails: {noformat} byte small = SmallFloat.floatToByte315(104626500f); assertEquals(104626500f, SmallFloat.byte315ToFloat(small), 0f); -> AssertionError: expected:<1.04626496E8> but was:<1.00663296E8> {noformat} This shows that we do have a case where this will produce wrong results, and even if we could fix this in our case, someone else might have the same issue. bq. it would also be good to measure performance... I'd not expect that the boxing makes a real difference here, especially in relation to the rest of the time spent during a search request. A time-based performance comparison with real value would take some time: it would have to be put in relation to the rest of a search request (how do you do this?), and finally it would require proper interpretation when everything is taken together. Right now I don't think it's worth the effort. {quote} bq. that uses a fixed size and an increasing number of puts I'm not certain how realistic that is, remember behind the scenes compactbytearray uses blocks, and if you touch every one (by putting every K docid or something) then you are just testing the worst case. {quote} Do you want to change the test to something that's more realistic? @Yonik: what do you say regarding the suggestion to use a HashMap up to ~5.5% and the float[] above that? > Make external scoring more efficient (ExternalFileField, FileFloatSource) > - > > Key: SOLR-2583 > URL: https://issues.apache.org/jira/browse/SOLR-2583 > Project: Solr > Issue Type: Improvement > Components: search >Reporter: Martin Grotzke >Priority: Minor > Attachments: FileFloatSource.java.patch, patch.txt > > > External scoring eats much memory, depending on the number of documents in > the index. The ExternalFileField (used for external scoring) uses > FileFloatSource, where one FileFloatSource is created per external scoring > file. FileFloatSource creates a float array with the size of the number of > docs (this is also done if the file to load is not found). If there are much > less entries in the scoring file than there are number of docs in total the > big float array wastes much memory. > This could be optimized by using a map of doc -> score, so that the map > contains as many entries as there are scoring entries in the external file, > but not more. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2583) Make external scoring more efficient (ExternalFileField, FileFloatSource)
[ https://issues.apache.org/jira/browse/SOLR-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049674#comment-13049674 ] Martin Grotzke commented on SOLR-2583: -- The test that produced this output can be found in my lucene-solr fork on github: https://github.com/magro/lucene-solr/commit/b9af87b1 The test method that was executed was testCompareMemoryUsage, for measuring memory usage I used http://code.google.com/p/memory-measurer/ and ran the test/jvm with "-Xmx1G -javaagent:solr/lib/object-explorer.jar" (just from eclipse). I just added another test, that uses a fixed size and an increasing number of puts (testCompareMemoryUsageWithFixSizeAndIncreasingNumPuts, https://github.com/magro/lucene-solr/blob/trunk/solr/src/test/org/apache/solr/search/function/FileFloatSourceMemoryTest.java#L56), with the following results: {noformat} Size: 100 NumPuts 1.000 (0,1%), CompactFloatArray 918.616, float[] 4.000.016, HashMap 72.128 NumPuts 10.000 (1,0%), CompactFloatArray 3.738.712,float[] 4.000.016, HashMap 701.696 NumPuts 50.000 (5,0%), CompactFloatArray 4.016.472,float[] 4.000.016, HashMap 3.383.104 NumPuts 55.000 (5,5%), CompactFloatArray 4.016.472,float[] 4.000.016, HashMap 3.949.120 NumPuts 60.000 (6,0%), CompactFloatArray 4.016.472,float[] 4.000.016, HashMap 4.254.848 NumPuts 100.000 (10,0%),CompactFloatArray 4.016.472,float[] 4.000.016, HashMap 6.622.272 NumPuts 500.000 (50,0%),CompactFloatArray 4.016.472,float[] 4.000.016, HashMap 27.262.976 NumPuts 1.000.000 (100,0%), CompactFloatArray 4.016.472,float[] 4.000.016, HashMap 44.649.664 {noformat} It seems that the HashMap is the most efficient solution up to ~5.5%. Starting from this threshold CompactFloatArray and float[] use less memory, while the CompactFloatArray has no advantages over float[] for puts > 5%. Therefore I'd suggest that we use an adaptive strategy that uses a HashMap up to 5,5% of number of scores compared to numdocs, and starting from this threshold the original float[] approach is used. What do you say? > Make external scoring more efficient (ExternalFileField, FileFloatSource) > - > > Key: SOLR-2583 > URL: https://issues.apache.org/jira/browse/SOLR-2583 > Project: Solr > Issue Type: Improvement > Components: search >Reporter: Martin Grotzke >Priority: Minor > Attachments: FileFloatSource.java.patch, patch.txt > > > External scoring eats much memory, depending on the number of documents in > the index. The ExternalFileField (used for external scoring) uses > FileFloatSource, where one FileFloatSource is created per external scoring > file. FileFloatSource creates a float array with the size of the number of > docs (this is also done if the file to load is not found). If there are much > less entries in the scoring file than there are number of docs in total the > big float array wastes much memory. > This could be optimized by using a map of doc -> score, so that the map > contains as many entries as there are scoring entries in the external file, > but not more. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
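A sketch of what the suggested adaptive strategy could look like (the names, the class layout and the exact 5.5% cut-off are illustrative; they are not taken from a patch):
{code:java}
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Sketch of the adaptive doc->score storage suggested above; all names are illustrative. */
abstract class ScoreStore {
  abstract void put(int doc, float score);
  abstract float get(int doc);

  /** Below roughly 5.5% fill the HashMap uses less memory, above it the plain float[] wins. */
  static ScoreStore create(int numDocs, int expectedScores, float defaultValue) {
    boolean sparse = (double) expectedScores / numDocs < 0.055;
    return sparse ? new SparseStore(expectedScores, defaultValue)
                  : new DenseStore(numDocs, defaultValue);
  }
}

/** Map-backed store for sparse external score files. */
class SparseStore extends ScoreStore {
  private final Map<Integer, Float> scores;
  private final float defaultValue;

  SparseStore(int expected, float defaultValue) {
    this.scores = new HashMap<Integer, Float>(expected);
    this.defaultValue = defaultValue;
  }
  void put(int doc, float score) { scores.put(doc, score); }
  float get(int doc) {
    Float v = scores.get(doc);
    return v == null ? defaultValue : v.floatValue();
  }
}

/** Array-backed store for dense external score files (the current behaviour). */
class DenseStore extends ScoreStore {
  private final float[] scores;

  DenseStore(int numDocs, float defaultValue) {
    this.scores = new float[numDocs];
    Arrays.fill(scores, defaultValue);
  }
  void put(int doc, float score) { scores[doc] = score; }
  float get(int doc) { return scores[doc]; }
}
{code}
FileFloatSource could then ask such a store for a value per docid instead of always allocating a maxDoc()-sized float[].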
[jira] [Issue Comment Edited] (SOLR-2583) Make external scoring more efficient (ExternalFileField, FileFloatSource)
[ https://issues.apache.org/jira/browse/SOLR-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049256#comment-13049256 ] Martin Grotzke edited comment on SOLR-2583 at 6/14/11 4:25 PM: --- I just compared memory consumption of the 3 different approaches, with different number of puts (number of scores) and sizes (number of docs), the memory is in byte: {noformat} Puts 1.000, size 1.000.000: CompactFloatArray 898.136,float[] 4.000.016, HashMap 72.192 Puts 10.000, size 1.000.000: CompactFloatArray 3.724.376, float[] 4.000.016, HashMap 702.784 Puts 100.000, size 1.000.000:CompactFloatArray 4.016.472, float[] 4.000.016, HashMap 6.607.808 Puts 1.000.000, size 1.000.000: CompactFloatArray 4.016.472, float[] 4.000.016, HashMap 44.644.032 Puts 1.000, size 5.000.000: CompactFloatArray 1.128.536, float[] 20.000.016, HashMap 72.256 Puts 10.000, size 5.000.000: CompactFloatArray 8.168.536, float[] 20.000.016, HashMap 704.832 Puts 100.000, size 5.000.000:CompactFloatArray 20.013.144, float[] 20.000.016, HashMap 7.385.152 Puts 1.000.000, size 5.000.000: CompactFloatArray 20.131.160, float[] 20.000.016, HashMap 66.395.584 Puts 1.000, size 10.000.000: CompactFloatArray 1.275.992, float[] 40.000.016, HashMap 72.256 Puts 10.000, size 10.000.000:CompactFloatArray 9.289.816, float[] 40.000.016, HashMap 705.280 Puts 100.000, size 10.000.000: CompactFloatArray 37.130.328, float[] 40.000.016, HashMap 7.418.112 Puts 1.000.000, size 10.000.000: CompactFloatArray 40.262.232, float[] 40.000.016, HashMap 69.282.496 {noformat} I want to share this intermediately, without further interpretation/conclusion for now (I just need to get the train). was (Author: martin.grotzke): I just compared memory consumption of the 3 different approaches, with different number of puts (number of scores) and sizes (number of docs): {noformat} Puts 1.000, size 1.000.000: CompactFloatArray 898.136,float[] 4.000.016, HashMap 72.192 Puts 10.000, size 1.000.000: CompactFloatArray 3.724.376, float[] 4.000.016, HashMap 702.784 Puts 100.000, size 1.000.000:CompactFloatArray 4.016.472, float[] 4.000.016, HashMap 6.607.808 Puts 1.000.000, size 1.000.000: CompactFloatArray 4.016.472, float[] 4.000.016, HashMap 44.644.032 Puts 1.000, size 5.000.000: CompactFloatArray 1.128.536, float[] 20.000.016, HashMap 72.256 Puts 10.000, size 5.000.000: CompactFloatArray 8.168.536, float[] 20.000.016, HashMap 704.832 Puts 100.000, size 5.000.000:CompactFloatArray 20.013.144, float[] 20.000.016, HashMap 7.385.152 Puts 1.000.000, size 5.000.000: CompactFloatArray 20.131.160, float[] 20.000.016, HashMap 66.395.584 Puts 1.000, size 10.000.000: CompactFloatArray 1.275.992, float[] 40.000.016, HashMap 72.256 Puts 10.000, size 10.000.000:CompactFloatArray 9.289.816, float[] 40.000.016, HashMap 705.280 Puts 100.000, size 10.000.000: CompactFloatArray 37.130.328, float[] 40.000.016, HashMap 7.418.112 Puts 1.000.000, size 10.000.000: CompactFloatArray 40.262.232, float[] 40.000.016, HashMap 69.282.496 {noformat} I want to share this intermediately, without further interpretation/conclusion for now (I just need to get the train). > Make external scoring more efficient (ExternalFileField, FileFloatSource) > - > > Key: SOLR-2583 > URL: https://issues.apache.org/jira/browse/SOLR-2583 > Project: Solr > Issue Type: Improvement > Components: search >Reporter: Martin Grotzke >Priority: Minor > Attachments: FileFloatSource.java.patch, patch.txt > > > External scoring eats much memory, depending on the number of documents in > the index. 
The ExternalFileField (used for external scoring) uses > FileFloatSource, where one FileFloatSource is created per external scoring > file. FileFloatSource creates a float array with the size of the number of > docs (this is also done if the file to load is not found). If there are much > less entries in the scoring file than there are number of docs in total the > big float array wastes much memory. > This could be optimized by using a map of doc -> score, so that the map > contains as many entries as there are scoring entries in the external file, > but not more. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2583) Make external scoring more efficient (ExternalFileField, FileFloatSource)
[ https://issues.apache.org/jira/browse/SOLR-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049256#comment-13049256 ] Martin Grotzke commented on SOLR-2583: -- I just compared memory consumption of the 3 different approaches, with different number of puts (number of scores) and sizes (number of docs): {noformat} Puts 1.000, size 1.000.000: CompactFloatArray 898.136,float[] 4.000.016, HashMap 72.192 Puts 10.000, size 1.000.000: CompactFloatArray 3.724.376, float[] 4.000.016, HashMap 702.784 Puts 100.000, size 1.000.000:CompactFloatArray 4.016.472, float[] 4.000.016, HashMap 6.607.808 Puts 1.000.000, size 1.000.000: CompactFloatArray 4.016.472, float[] 4.000.016, HashMap 44.644.032 Puts 1.000, size 5.000.000: CompactFloatArray 1.128.536, float[] 20.000.016, HashMap 72.256 Puts 10.000, size 5.000.000: CompactFloatArray 8.168.536, float[] 20.000.016, HashMap 704.832 Puts 100.000, size 5.000.000:CompactFloatArray 20.013.144, float[] 20.000.016, HashMap 7.385.152 Puts 1.000.000, size 5.000.000: CompactFloatArray 20.131.160, float[] 20.000.016, HashMap 66.395.584 Puts 1.000, size 10.000.000: CompactFloatArray 1.275.992, float[] 40.000.016, HashMap 72.256 Puts 10.000, size 10.000.000:CompactFloatArray 9.289.816, float[] 40.000.016, HashMap 705.280 Puts 100.000, size 10.000.000: CompactFloatArray 37.130.328, float[] 40.000.016, HashMap 7.418.112 Puts 1.000.000, size 10.000.000: CompactFloatArray 40.262.232, float[] 40.000.016, HashMap 69.282.496 {noformat} I want to share this intermediately, without further interpretation/conclusion for now (I just need to get the train). > Make external scoring more efficient (ExternalFileField, FileFloatSource) > - > > Key: SOLR-2583 > URL: https://issues.apache.org/jira/browse/SOLR-2583 > Project: Solr > Issue Type: Improvement > Components: search >Reporter: Martin Grotzke >Priority: Minor > Attachments: FileFloatSource.java.patch, patch.txt > > > External scoring eats much memory, depending on the number of documents in > the index. The ExternalFileField (used for external scoring) uses > FileFloatSource, where one FileFloatSource is created per external scoring > file. FileFloatSource creates a float array with the size of the number of > docs (this is also done if the file to load is not found). If there are much > less entries in the scoring file than there are number of docs in total the > big float array wastes much memory. > This could be optimized by using a map of doc -> score, so that the map > contains as many entries as there are scoring entries in the external file, > but not more. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2583) Make external scoring more efficient (ExternalFileField, FileFloatSource)
[ https://issues.apache.org/jira/browse/SOLR-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049143#comment-13049143 ] Martin Grotzke commented on SOLR-2583: -- {quote} See: http://www.strchr.com/multi-stage_tables i attached a patch, of a (not great) implementation i was sorta kinda trying to clean up for other reasons... maybe you can use it. {quote} Thanx, interesting approach! I just tried to create a CompactFloatArray based on the CompactByteArray to be able to compare memory consumptions. There's one change that wasn't just changing byte to float, and I'm not sure what's the right adaption in this case: {code} diff -w solr/src/java/org/apache/solr/util/CompactByteArray.java solr/src/java/org/apache/solr/util/CompactFloatArray.java 57c57 ... 202,203c202,203 < private void touchBlock(int i, int value) { < hashes[i] = (hashes[i] + (value << 1)) | 1; --- > private void touchBlock(int i, float value) { > hashes[i] = (hashes[i] + (Float.floatToIntBits(value) << 1)) | 1; {code} The adapted test is green, so it seems to be correct at least. I'll also attach the full patch for CompactFloatArray.java and TestCompactFloatArray.java > Make external scoring more efficient (ExternalFileField, FileFloatSource) > - > > Key: SOLR-2583 > URL: https://issues.apache.org/jira/browse/SOLR-2583 > Project: Solr > Issue Type: Improvement > Components: search >Reporter: Martin Grotzke >Priority: Minor > Attachments: FileFloatSource.java.patch, patch.txt > > > External scoring eats much memory, depending on the number of documents in > the index. The ExternalFileField (used for external scoring) uses > FileFloatSource, where one FileFloatSource is created per external scoring > file. FileFloatSource creates a float array with the size of the number of > docs (this is also done if the file to load is not found). If there are much > less entries in the scoring file than there are number of docs in total the > big float array wastes much memory. > This could be optimized by using a map of doc -> score, so that the map > contains as many entries as there are scoring entries in the external file, > but not more. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2583) Make external scoring more efficient (ExternalFileField, FileFloatSource)
[ https://issues.apache.org/jira/browse/SOLR-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046943#comment-13046943 ] Martin Grotzke commented on SOLR-2583: -- bq. If the problem is sparsity, maybe use a two-stage table, still faster than a hashmap and much better for the worst case. What do you mean by a two-stage table? Can you clarify this, please? > Make external scoring more efficient (ExternalFileField, FileFloatSource) > - > > Key: SOLR-2583 > URL: https://issues.apache.org/jira/browse/SOLR-2583 > Project: Solr > Issue Type: Improvement > Components: search >Reporter: Martin Grotzke >Priority: Minor > Attachments: FileFloatSource.java.patch > > > External scoring eats much memory, depending on the number of documents in > the index. The ExternalFileField (used for external scoring) uses > FileFloatSource, where one FileFloatSource is created per external scoring > file. FileFloatSource creates a float array with the size of the number of > docs (this is also done if the file to load is not found). If there are much > less entries in the scoring file than there are number of docs in total the > big float array wastes much memory. > This could be optimized by using a map of doc -> score, so that the map > contains as many entries as there are scoring entries in the external file, > but not more. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2583) Make external scoring more efficient (ExternalFileField, FileFloatSource)
[ https://issues.apache.org/jira/browse/SOLR-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046712#comment-13046712 ] Martin Grotzke commented on SOLR-2583: -- > Sounds good! I wonder what the memory cut-off should be for auto... 10% of > maxDoc() or so? I'd compare both strategies to see what's the break-even, this should give an absolute number. > Make external scoring more efficient (ExternalFileField, FileFloatSource) > - > > Key: SOLR-2583 > URL: https://issues.apache.org/jira/browse/SOLR-2583 > Project: Solr > Issue Type: Improvement > Components: search >Reporter: Martin Grotzke >Priority: Minor > Attachments: FileFloatSource.java.patch > > > External scoring eats much memory, depending on the number of documents in > the index. The ExternalFileField (used for external scoring) uses > FileFloatSource, where one FileFloatSource is created per external scoring > file. FileFloatSource creates a float array with the size of the number of > docs (this is also done if the file to load is not found). If there are much > less entries in the scoring file than there are number of docs in total the > big float array wastes much memory. > This could be optimized by using a map of doc -> score, so that the map > contains as many entries as there are scoring entries in the external file, > but not more. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2583) Make external scoring more efficient (ExternalFileField, FileFloatSource)
[ https://issues.apache.org/jira/browse/SOLR-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046692#comment-13046692 ] Martin Grotzke commented on SOLR-2583: -- Great, sounds like a further optimization for both sparse and non-sparse files. Though, as we had 4GB taken by FileFloatSource objects, a reduction to 1/4 would still be too much for us, so for our case I prefer the map-based approach - then with SmallFloat. > Make external scoring more efficient (ExternalFileField, FileFloatSource) > - > > Key: SOLR-2583 > URL: https://issues.apache.org/jira/browse/SOLR-2583 > Project: Solr > Issue Type: Improvement > Components: search >Reporter: Martin Grotzke >Priority: Minor > Attachments: FileFloatSource.java.patch > > > External scoring eats much memory, depending on the number of documents in > the index. The ExternalFileField (used for external scoring) uses > FileFloatSource, where one FileFloatSource is created per external scoring > file. FileFloatSource creates a float array with the size of the number of > docs (this is also done if the file to load is not found). If there are much > less entries in the scoring file than there are number of docs in total the > big float array wastes much memory. > This could be optimized by using a map of doc -> score, so that the map > contains as many entries as there are scoring entries in the external file, > but not more. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2583) Make external scoring more efficient (ExternalFileField, FileFloatSource)
[ https://issues.apache.org/jira/browse/SOLR-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046674#comment-13046674 ] Martin Grotzke commented on SOLR-2583: -- Yes, you're right regarding non-sparse fields. The question for the user will be when to use true or false for sparse. It might also be the case that files differ, in that some are big and others are small. So I'm thinking about making it adaptive: when the number of lines reaches a certain percentage of the number of docs, the float array is used; otherwise the doc->score map is used. Perhaps it would be good to allow the user to override this, something like sparse=yes/no/auto. What do you think? > Make external scoring more efficient (ExternalFileField, FileFloatSource) > - > > Key: SOLR-2583 > URL: https://issues.apache.org/jira/browse/SOLR-2583 > Project: Solr > Issue Type: Improvement > Components: search >Reporter: Martin Grotzke >Priority: Minor > Attachments: FileFloatSource.java.patch > > > External scoring eats much memory, depending on the number of documents in > the index. The ExternalFileField (used for external scoring) uses > FileFloatSource, where one FileFloatSource is created per external scoring > file. FileFloatSource creates a float array with the size of the number of > docs (this is also done if the file to load is not found). If there are much > less entries in the scoring file than there are number of docs in total the > big float array wastes much memory. > This could be optimized by using a map of doc -> score, so that the map > contains as many entries as there are scoring entries in the external file, > but not more. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2583) Make external scoring more efficient (ExternalFileField, FileFloatSource)
[ https://issues.apache.org/jira/browse/SOLR-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Martin Grotzke updated SOLR-2583: - Attachment: FileFloatSource.java.patch The attached patch changes FileFloatSource to use a map of score by doc. > Make external scoring more efficient (ExternalFileField, FileFloatSource) > - > > Key: SOLR-2583 > URL: https://issues.apache.org/jira/browse/SOLR-2583 > Project: Solr > Issue Type: Improvement > Components: search >Reporter: Martin Grotzke >Priority: Minor > Attachments: FileFloatSource.java.patch > > > External scoring eats much memory, depending on the number of documents in > the index. The ExternalFileField (used for external scoring) uses > FileFloatSource, where one FileFloatSource is created per external scoring > file. FileFloatSource creates a float array with the size of the number of > docs (this is also done if the file to load is not found). If there are much > less entries in the scoring file than there are number of docs in total the > big float array wastes much memory. > This could be optimized by using a map of doc -> score, so that the map > contains as many entries as there are scoring entries in the external file, > but not more. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-2583) Make external scoring more efficient (ExternalFileField, FileFloatSource)
Make external scoring more efficient (ExternalFileField, FileFloatSource) - Key: SOLR-2583 URL: https://issues.apache.org/jira/browse/SOLR-2583 Project: Solr Issue Type: Improvement Components: search Reporter: Martin Grotzke Priority: Minor External scoring eats much memory, depending on the number of documents in the index. The ExternalFileField (used for external scoring) uses FileFloatSource, where one FileFloatSource is created per external scoring file. FileFloatSource creates a float array with the size of the number of docs (this is also done if the file to load is not found). If there are much less entries in the scoring file than there are number of docs in total the big float array wastes much memory. This could be optimized by using a map of doc -> score, so that the map contains as many entries as there are scoring entries in the external file, but not more. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org