[jira] [Commented] (SOLR-8868) SolrCloud: if zookeeper loses and then regains a quorum, Solr nodes and SolrJ Client do not recover and need to be restarted

2017-05-27 Thread Martin Grotzke (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16027637#comment-16027637
 ] 

Martin Grotzke commented on SOLR-8868:
--

We also experienced this issue, any update here? 
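
For reference, a minimal sketch (not taken from the report; the ZooKeeper host 
names are placeholders) of how a SolrJ 5.x CloudSolrClient is typically wired to 
the ZooKeeper ensemble - it is this ZooKeeper connection that, per the report, 
does not recover once the quorum is re-established:

{code:java}
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class IndexExample {
  public static void main(String[] args) throws Exception {
    // The client talks to ZooKeeper directly; all three ensemble members are listed.
    CloudSolrClient client = new CloudSolrClient(
        "zookeeper1:2181,zookeeper2:2181,zookeeper3:2181");
    client.setDefaultCollection("qa_eu-west-1_public_index");

    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "doc-1");
    // Fails with "Could not load collection from ZK" while the quorum is lost,
    // and in the reported case keeps failing even after the quorum returns.
    client.add(doc);
    client.commit();
    client.close();
  }
}
{code}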

> SolrCloud: if zookeeper loses and then regains a quorum, Solr nodes and SolrJ 
> Client do not recover and need to be restarted
> 
>
> Key: SOLR-8868
> URL: https://issues.apache.org/jira/browse/SOLR-8868
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud, SolrJ
>Affects Versions: 5.3.1
>Reporter: Frank J Kelly
>
> Tried mailing list on 3/15 and 3/16 to no avail. Hopefully I gave enough 
> details.
> 
> Just wondering if my observation of SolrCloud behavior after ZooKeeper loses 
> a quorum is normal or to be expected.
> Version of Solr: 5.3.1
> Version of ZooKeeper: 3.4.7
> Using SolrCloud with external ZooKeeper
> Deployed on AWS
> Our Solr cluster has 3 nodes (m3.large)
> Our Zookeeper ensemble consists of three nodes (t2.small) with the same 
> config using DNS names e.g.
> {noformat}
> $ more ../conf/zoo.cfg
> tickTime=2000
> dataDir=/var/zookeeper
> dataLogDir=/var/log/zookeeper
> clientPort=2181
> initLimit=10
> syncLimit=5
> standaloneEnabled=false
> server.1=zookeeper1.qa.eu-west-1.mysearch.com:2888:3888
> server.2=zookeeper2.qa.eu-west-1.mysearch.com:2888:3888
> server.3=zookeeper3.qa.eu-west-1.mysearch.com:2888:3888
> {noformat}
> If we terminate one of the ZooKeeper nodes we get a ZK election and (I think) 
> a quorum is maintained.
> Operation continues OK, we detect the terminated instance, and we relaunch a 
> new ZK node, which comes up fine.
> If we terminate two of the ZK nodes we lose a quorum, and then we observe the 
> following:
> 1.1) Admin UI shows an error that it is unable to contact ZooKeeper: "Could 
> not connect to ZooKeeper"
> 1.2) SolrJ returns the following
> {noformat}
> org.apache.solr.common.SolrException: Could not load collection from ZK:qa_eu-west-1_public_index
> at org.apache.solr.common.cloud.ZkStateReader.getCollectionLive(ZkStateReader.java:850)
> at org.apache.solr.common.cloud.ZkStateReader$7.get(ZkStateReader.java:515)
> at org.apache.solr.client.solrj.impl.CloudSolrClient.getDocCollection(CloudSolrClient.java:1205)
> at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:837)
> at org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:805)
> at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:135)
> at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:107)
> at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:72)
> at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:86)
> at com.here.scbe.search.solr.SolrFacadeImpl.addToSearchIndex(SolrFacadeImpl.java:112)
> Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /collections/qa_eu-west-1_public_index/state.json
> at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
> at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
> at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:345)
> at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:342)
> at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:61)
> at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:342)
> at org.apache.solr.common.cloud.ZkStateReader.getCollectionLive(ZkStateReader.java:841)
> ... 24 more
> {noformat}
> This makes sense based on our understanding.
> When our AutoScale groups launch two new ZooKeeper nodes, initialize them, 
> fix the DNS, etc., we regain a quorum, but at this point:
> 2.1) Admin UI shows the shards as “GONE” (all greyed out)
> 2.2) SolrJ returns the same error even though the ZooKeeper DNS names are now 
> bound to new IP addresses
> So at this point I restart the Solr nodes. Then:
> 3.1) Admin UI shows the collections as OK (all shards are green) – yeah the 
> nodes are back!
> 3.2) SolrJ Client still shows the same error – namely
> {noformat}
> org.apache.solr.common.SolrException: Could not load collection from ZK:qa_eu-west-1_here_account
> at org.apache.solr.common.cloud.ZkStateReader.getCollectionLive(ZkStateReader.java:850)
> at org.apache.solr.common.cloud.ZkStateReader$7.get(ZkStateReader.java:515)
> at org.apache.solr.client.solrj.impl.CloudSolrClient.getDocCollection(CloudSolrClient.java:1205)
> at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:837)
> at

[jira] [Commented] (SOLR-6273) Cross Data Center Replication

2015-07-03 Thread Martin Grotzke (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14613472#comment-14613472
 ] 

Martin Grotzke commented on SOLR-6273:
--

Great, thanks for the advice, Renaud! 

> Cross Data Center Replication
> -
>
> Key: SOLR-6273
> URL: https://issues.apache.org/jira/browse/SOLR-6273
> Project: Solr
>  Issue Type: New Feature
>Reporter: Yonik Seeley
>Assignee: Erick Erickson
> Attachments: SOLR-6273-trunk-testfix1.patch, 
> SOLR-6273-trunk-testfix2.patch, SOLR-6273-trunk.patch, SOLR-6273-trunk.patch, 
> SOLR-6273.patch, SOLR-6273.patch, SOLR-6273.patch, SOLR-6273.patch
>
>
> This is the master issue for Cross Data Center Replication (CDCR)
> described at a high level here: 
> http://heliosearch.org/solr-cross-data-center-replication/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6461) peer cluster configuration

2015-07-03 Thread Martin Grotzke (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14613469#comment-14613469
 ] 

Martin Grotzke commented on SOLR-6461:
--

Great, thanks! 

> peer cluster configuration
> --
>
> Key: SOLR-6461
> URL: https://issues.apache.org/jira/browse/SOLR-6461
> Project: Solr
>  Issue Type: Sub-task
>Reporter: Yonik Seeley
>
> From http://heliosearch.org/solr-cross-data-center-replication/#Overview
> """Clusters will be configured to know about each other, most likely through 
> keeping a cluster peer list in zookeeper. One essential piece of information 
> will be the zookeeper quorum address for each cluster peer. Any node in one 
> cluster can know the configuration of another cluster via a zookeeper 
> client."""



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6273) Cross Data Center Replication

2015-07-02 Thread Martin Grotzke (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14612476#comment-14612476
 ] 

Martin Grotzke commented on SOLR-6273:
--

Hi all, we're currently evaluating how to expand our current single-DC 
SolrCloud to multiple (2) DCs. This effort looks very promising, great work!
Assuming we'd test how it works for us, could we follow the documentation 
mentioned above 
(https://docs.google.com/document/d/1DZHUFM3z9OX171DeGjcLTRI9uULM-NB1KsCSpVL3Zy0/edit?usp=sharing)?
 Does it match the current implementation? Do you have any other suggestions 
for us if we'd test this? Thanks! 

> Cross Data Center Replication
> -
>
> Key: SOLR-6273
> URL: https://issues.apache.org/jira/browse/SOLR-6273
> Project: Solr
>  Issue Type: New Feature
>Reporter: Yonik Seeley
>Assignee: Erick Erickson
> Attachments: SOLR-6273-trunk-testfix1.patch, 
> SOLR-6273-trunk-testfix2.patch, SOLR-6273-trunk.patch, SOLR-6273-trunk.patch, 
> SOLR-6273.patch, SOLR-6273.patch, SOLR-6273.patch, SOLR-6273.patch
>
>
> This is the master issue for Cross Data Center Replication (CDCR)
> described at a high level here: 
> http://heliosearch.org/solr-cross-data-center-replication/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6461) peer cluster configuration

2015-07-02 Thread Martin Grotzke (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14612451#comment-14612451
 ] 

Martin Grotzke commented on SOLR-6461:
--

This one is closed with resolution "Fixed", although it was never assigned. Is 
it really fixed? Are there any details about the current state of the 
implementation of peer cluster configuration? Thanks!

> peer cluster configuration
> --
>
> Key: SOLR-6461
> URL: https://issues.apache.org/jira/browse/SOLR-6461
> Project: Solr
>  Issue Type: Sub-task
>Reporter: Yonik Seeley
>
> From http://heliosearch.org/solr-cross-data-center-replication/#Overview
> """Clusters will be configured to know about each other, most likely through 
> keeping a cluster peer list in zookeeper. One essential piece of information 
> will be the zookeeper quorum address for each cluster peer. Any node in one 
> cluster can know the configuration of another cluster via a zookeeper 
> client."""



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6365) specify appends, defaults, invariants outside of the component

2015-02-25 Thread Martin Grotzke (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14336299#comment-14336299
 ] 

Martin Grotzke commented on SOLR-6365:
--

[~noble.paul] Sounds great, I submitted SOLR-7157

> specify  appends, defaults, invariants outside of the component
> ---
>
> Key: SOLR-6365
> URL: https://issues.apache.org/jira/browse/SOLR-6365
> Project: Solr
>  Issue Type: Improvement
>Reporter: Noble Paul
>Assignee: Noble Paul
> Fix For: 5.0, Trunk
>
> Attachments: SOLR-6365-crappy-test.patch, SOLR-6365.patch, 
> SOLR-6365.patch, SOLR-6365.patch
>
>
> The components are configured in solrconfig.xml mostly for specifying these 
> extra parameters. If we separate these out, we can avoid specifying the 
> components altogether and make solrconfig much simpler. Eventually we want 
> users to see all functions as paths instead of components and control these 
> params from outside, through an API, and persisted in ZK.
> objectives:
> * define standard components implicitly and let users override some params 
> only
> * reuse standard params across components
> * define multiple param sets and mix and match these params at request time
> example
> {code:xml}
> <requestHandler name="/dump" class="DumpRequestHandler" initParams="a"/>
> <initParams name="a">
>   <lst name="defaults">
>      <str name="wt">json</str>
>      <str name="df">_txt</str>
>   </lst>
> </initParams>
> {code}
> other examples
> {code:xml}
> <initParams name="a">
>   <lst name="defaults">
>     <str name="x">A</str>
>   </lst>
>   <lst name="invariants">
>     <str name="y">B</str>
>   </lst>
>   <lst name="appends">
>     <str name="z">C</str>
>   </lst>
> </initParams>
> <requestHandler name="/dump1" initParams="a" class="DumpRequestHandler"/>
> <requestHandler name="/dump2" initParams="a" class="DumpRequestHandler"/>
> <requestHandler name="/dump3" initParams="a" class="DumpRequestHandler"/>
> <requestHandler name="/dump4" initParams="a" class="DumpRequestHandler">
>   <lst name="defaults">
>     <str name="x">A1</str>
>   </lst>
>   <lst name="invariants">
>     <str name="y">B1</str>
>   </lst>
>   <lst name="appends">
>     <str name="z">C1</str>
>   </lst>
> </requestHandler>
> {code}
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-7157) Specify arbitrary config params outside of the component

2015-02-25 Thread Martin Grotzke (JIRA)
Martin Grotzke created SOLR-7157:


 Summary: Specify arbitrary config params outside of the component
 Key: SOLR-7157
 URL: https://issues.apache.org/jira/browse/SOLR-7157
 Project: Solr
  Issue Type: New Feature
Reporter: Martin Grotzke


SOLR-6365 added support for appends, defaults and invariants specified outside 
of the component via initParams.

It would be great if it would also be possible to configure arbitrary params 
via initParams.

Our use case is that we want to configure the "healthcheckFile" for the 
"/admin/ping" RequestHandler, so I'd like to configure it like this:

{code}
  <initParams path="/admin/ping">
    <str name="healthcheckFile">server-enabled.txt</str>
  </initParams>
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6365) specify appends, defaults, invariants outside of the component

2015-02-22 Thread Martin Grotzke (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14332378#comment-14332378
 ] 

Martin Grotzke commented on SOLR-6365:
--

Moving to Solr 5, I'm trying to configure the "healthcheckFile" for the 
PingRequestHandler.

I added
{code}
  <initParams path="/admin/ping">
    <str name="healthcheckFile">server-enabled.txt</str>
  </initParams>
{code}
to solrconfig.xml; unfortunately, this did not do the trick. I had to configure 
the PingRequestHandler completely to get the healthcheckFile configured.

My assumption is that *only* appends, defaults and invariants can be specified 
outside of the component, so what I'm experiencing is expected and not a bug or 
an issue on my side. Is that correct?

> specify  appends, defaults, invariants outside of the component
> ---
>
> Key: SOLR-6365
> URL: https://issues.apache.org/jira/browse/SOLR-6365
> Project: Solr
>  Issue Type: Improvement
>Reporter: Noble Paul
>Assignee: Noble Paul
> Fix For: 5.0, Trunk
>
> Attachments: SOLR-6365-crappy-test.patch, SOLR-6365.patch, 
> SOLR-6365.patch, SOLR-6365.patch
>
>
> The components are configured in solrconfig.xml mostly for specifying these 
> extra parameters. If we separate these out, we can avoid specifying the 
> components altogether and make solrconfig much simpler. Eventually we want 
> users to see all functions as paths instead of components and control these 
> params from outside, through an API, and persisted in ZK.
> objectives:
> * define standard components implicitly and let users override some params 
> only
> * reuse standard params across components
> * define multiple param sets and mix and match these params at request time
> example
> {code:xml}
> <requestHandler name="/dump" class="DumpRequestHandler" initParams="a"/>
> <initParams name="a">
>   <lst name="defaults">
>      <str name="wt">json</str>
>      <str name="df">_txt</str>
>   </lst>
> </initParams>
> {code}
> other examples
> {code:xml}
> <initParams name="a">
>   <lst name="defaults">
>     <str name="x">A</str>
>   </lst>
>   <lst name="invariants">
>     <str name="y">B</str>
>   </lst>
>   <lst name="appends">
>     <str name="z">C</str>
>   </lst>
> </initParams>
> <requestHandler name="/dump1" initParams="a" class="DumpRequestHandler"/>
> <requestHandler name="/dump2" initParams="a" class="DumpRequestHandler"/>
> <requestHandler name="/dump3" initParams="a" class="DumpRequestHandler"/>
> <requestHandler name="/dump4" initParams="a" class="DumpRequestHandler">
>   <lst name="defaults">
>     <str name="x">A1</str>
>   </lst>
>   <lst name="invariants">
>     <str name="y">B1</str>
>   </lst>
>   <lst name="appends">
>     <str name="z">C1</str>
>   </lst>
> </requestHandler>
> {code}
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6719) Collection API: CREATE ignores 'property.name' when creating individual cores

2014-12-18 Thread Martin Grotzke (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252317#comment-14252317
 ] 

Martin Grotzke commented on SOLR-6719:
--

We experienced the same with ADDREPLICA and property.solr.common.data.dir 
(running Solr 4.10.2).

> Collection API: CREATE ignores 'property.name' when creating individual cores
> -
>
> Key: SOLR-6719
> URL: https://issues.apache.org/jira/browse/SOLR-6719
> Project: Solr
>  Issue Type: Bug
>Reporter: Hoss Man
>
> Yashveer Rana pointed this out in the ref guide comments...
> https://cwiki.apache.org/confluence/display/solr/Collections+API?focusedCommentId=47382851#comment-47382851
> * Collection CREATE is documented to support "property._name_=_value_" (where 
> 'name' and 'property' are italics placeholders for user supplied key=val) as 
> "Set core property _name_ to _value_. See core.properties file contents."
> * The [docs for 
> core.properties|https://cwiki.apache.org/confluence/display/solr/Format+of+solr.xml#Formatofsolr.xml-core.properties_files]
>  include a list of supported property values, including "name" (literal) as 
> "The name of the SolrCore. You'll use this name to reference the SolrCore 
> when running commands with the CoreAdminHandler."
> From these docs, it's reasonable to assume that using a URL like this...
> http://localhost:8983/solr/admin/collections?action=CREATE&name=my_collection&numShards=2&configSet=data_driven_schema_configs&property.name=my_corename
> ...should cause "my_collection" to be created, with the core name used for 
> every replica being "my_corename" ... but that doesn't happen. Instead, the 
> replicas get core names like "my_collection_shard1_replica1"
> 
> This is either a bug, or (my suspicion) it's intentional that the 
> user-specified core name is not being used -- if it's intentional, then the 
> Collection CREATE command should fail with a clear error if a user does try 
> to use "property.name" rather than silently ignoring it, and the Collection 
> CREATE docs should be updated to make it clear that "name" is an exception to 
> the general property.foo -> foo in core.properties support.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6086) Replica active during Warming

2014-10-10 Thread Martin Grotzke (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14166830#comment-14166830
 ] 

Martin Grotzke commented on SOLR-6086:
--

Is this really the case? Why are nodes that are not yet able to serve queries 
considered active?

> Replica active during Warming
> -
>
> Key: SOLR-6086
> URL: https://issues.apache.org/jira/browse/SOLR-6086
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.6.1, 4.8.1
>Reporter: ludovic Boutros
>  Labels: difficulty-medium, impact-medium
> Attachments: SOLR-6086.patch, SOLR-6086.patch
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> At least with Solr 4.6.1, replicas are considered active during the warming 
> process.
> This means that if you restart a replica or create a new one, queries will 
> be sent to this replica and will hang until the end of the warming 
> process (if cold searchers are not used).
> You cannot add or restart a node silently anymore.
> I think that the fact that the replica is active is not a bad thing.
> But the HttpShardHandler and the CloudSolrServer class should take the 
> warming process into account.
> Currently, I have developed a very simple component which checks that a 
> searcher is registered.
> I am also developing custom HttpShardHandler and CloudSolrServer classes 
> which will check the warming process in addition to the ACTIVE status in the 
> cluster state.
> This seems to be more a workaround than a solution but that's all I can do in 
> this version.
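
The component mentioned above is not attached to this mail; purely as an 
illustration, a rough sketch of such a check could look like the following 
(assuming SolrCore exposes getRegisteredSearcher(), which returns null until a 
searcher has been registered; the class name is made up):

{code:java}
import java.io.IOException;

import org.apache.solr.common.SolrException;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;
import org.apache.solr.search.SolrIndexSearcher;
import org.apache.solr.util.RefCounted;

// Sketch: reject requests until a searcher has been registered, so a node that
// is still warming does not serve (or block on) queries.
public class SearcherRegisteredCheckComponent extends SearchComponent {

  @Override
  public void prepare(ResponseBuilder rb) throws IOException {
    RefCounted<SolrIndexSearcher> registered = rb.req.getCore().getRegisteredSearcher();
    if (registered == null) {
      throw new SolrException(SolrException.ErrorCode.SERVICE_UNAVAILABLE,
          "No searcher registered yet (core is still warming)");
    }
    registered.decref();
  }

  @Override
  public void process(ResponseBuilder rb) throws IOException {
    // nothing to do at process time
  }

  @Override
  public String getDescription() {
    return "Fails requests until a searcher is registered";
  }

  // Needed for older Solr versions where SolrInfoMBean.getSource() is abstract.
  public String getSource() {
    return null;
  }
}
{code}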



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3383) Async responses in SolrJ

2014-02-07 Thread Martin Grotzke (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13894435#comment-13894435
 ] 

Martin Grotzke commented on SOLR-3383:
--

Ok, thanks for the update. 

> Async responses in SolrJ
> 
>
> Key: SOLR-3383
> URL: https://issues.apache.org/jira/browse/SOLR-3383
> Project: Solr
>  Issue Type: Improvement
>  Components: clients - java
>Affects Versions: 3.5
>Reporter: Per Steffensen
>Assignee: Per Steffensen
>  Labels: async, asynchronous, concurrency, query, solrj, update
> Fix For: 4.7
>
> Attachments: SOLR-3383.patch
>
>
> Today it is like this
> - SolrServer.request returns NamedList
> - SolrRequest.process returns SolrResponse
> - Public methods on SolrServer like addX, optimize, commit, queryX etc. 
> return subclasses of SolrResponse (e.g. "add" returns UpdateResponse)
> - etc
> This is all synchronous - that is, the calling thread of those methods will 
> wait for the response before being able to continue. I believe the industry 
> today agrees that "operations" like client-server network-requiring 
> operations should be done asynchronously seen from the client API. Therefore 
> basically we should change those methods:
> - SolrServer.request returns Future<NamedList<Object>>
> - SolrRequest.process returns Future<SolrResponse>
> - SolrServer.xxx returns Future
> and make the appropriate changes in the implementations below.
> My main argument for this right now is that ConcurrentUpdateSolrServer 
> really is not able to hand over responses to the calling client. Guess that 
> it is also the reason why it is only an "Update"-SolrServer and not a complete 
> SolrServer (being able to do queries etc.) - updates do not require sending 
> responses (except primitive errors) back to the client, while queries etc. do. Now 
> that we do "fine-grained error propagation" (SOLR-3382) in order to send 
> "unique key constraint"- and "versioning"-errors (SOLR-3173 and SOLR-3178) 
> back to the client in responses to update requests, suddenly it is not true 
> anymore that updates do not require sending responses back to the client.
> Making the changes suggested above (returning Futures) would
> - Allow ConcurrentUpdateSolrServer to be used for updates potentially 
> resulting in "unique key constraint"- and "versioning"-errors
> - Allow ConcurrentUpdateSolrServer to become ConcurrentSolrServer - also 
> being able to do queries etc
> - Do cool stuff like SOLR-3384
> - Make SolrJ more modern with respect to asynchronous principles



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3383) Async responses in SolrJ

2014-02-07 Thread Martin Grotzke (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13894298#comment-13894298
 ] 

Martin Grotzke commented on SOLR-3383:
--

I'm also interested in async responses in SolrJ, but instead of using Java 
Futures, I'd prefer a callback-based interface, e.g. via onSuccess(callback) 
and onError(callback), much like Scala Futures work.
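
Just to illustrate the idea (this is not an existing SolrJ API; the interfaces 
and method names below are made up), a callback-based variant could look roughly 
like this:

{code:java}
// Hypothetical callback-style result handle, similar in spirit to Scala Futures.
interface ResponseFuture<T> {
  ResponseFuture<T> onSuccess(SuccessCallback<T> callback);
  ResponseFuture<T> onError(ErrorCallback callback);
}

interface SuccessCallback<T> {
  void handle(T response);
}

interface ErrorCallback {
  void handle(Throwable error);
}

// Imagined usage against a hypothetical asynchronous SolrServer variant:
//
//   ResponseFuture<QueryResponse> f = asyncServer.query(new SolrQuery("*:*"));
//   f.onSuccess(rsp -> System.out.println("hits: " + rsp.getResults().getNumFound()))
//    .onError(Throwable::printStackTrace);
{code}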

> Async responses in SolrJ
> 
>
> Key: SOLR-3383
> URL: https://issues.apache.org/jira/browse/SOLR-3383
> Project: Solr
>  Issue Type: Improvement
>  Components: clients - java
>Affects Versions: 3.5
>Reporter: Per Steffensen
>Assignee: Per Steffensen
>  Labels: async, asynchronous, concurrency, query, solrj, update
> Fix For: 4.7
>
> Attachments: SOLR-3383.patch
>
>
> Today it is like this
> - SolrServer.request returns NamedList
> - SolrRequest.process returns SolrResponse
> - Public methods on SolrServer like addX, optimize, commit, queryX etc. 
> return subclasses of SolrResponse (e.g. "add" returns UpdateResponse)
> - etc
> This is all synchronous - that is, the calling thread of those methods will 
> wait for the response before being able to continue. I believe the industry 
> today agrees that "operations" like client-server network-requiring 
> operations should be done asynchronously seen from the client API. Therefore 
> basically we should change those methods:
> - SolrServer.request returns Future<NamedList<Object>>
> - SolrRequest.process returns Future<SolrResponse>
> - SolrServer.xxx returns Future
> and make the appropriate changes in the implementations below.
> My main argument for this right now is that ConcurrentUpdateSolrServer 
> really is not able to hand over responses to the calling client. Guess that 
> it is also the reason why it is only an "Update"-SolrServer and not a complete 
> SolrServer (being able to do queries etc.) - updates do not require sending 
> responses (except primitive errors) back to the client, while queries etc. do. Now 
> that we do "fine-grained error propagation" (SOLR-3382) in order to send 
> "unique key constraint"- and "versioning"-errors (SOLR-3173 and SOLR-3178) 
> back to the client in responses to update requests, suddenly it is not true 
> anymore that updates do not require sending responses back to the client.
> Making the changes suggested above (returning Futures) would
> - Allow ConcurrentUpdateSolrServer to be used for updates potentially 
> resulting in "unique key constraint"- and "versioning"-errors
> - Allow ConcurrentUpdateSolrServer to become ConcurrentSolrServer - also 
> being able to do queries etc
> - Do cool stuff like SOLR-3384
> - Make SolrJ more modern with respect to asynchronous principles



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3318) LBHttpSolrServer should allow to specify a preferred server for a query

2012-04-29 Thread Martin Grotzke (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264620#comment-13264620
 ] 

Martin Grotzke commented on SOLR-3318:
--

Any feedback on this?

> LBHttpSolrServer should allow to specify a preferred server for a query
> ---
>
> Key: SOLR-3318
> URL: https://issues.apache.org/jira/browse/SOLR-3318
> Project: Solr
>  Issue Type: Improvement
>  Components: clients - java
>Affects Versions: 4.0
>Reporter: Martin Grotzke
>Priority: Minor
> Attachments: SOLR-3318.git.patch
>
>
> For a user query we make several solr queries that differ only slightly and 
> therefore should use/reuse objects cached from the first query (we're using a 
> custom request handler and custom caches).
> Thus such subsequent queries should hit the same solr server.
> The implemented solution looks like this:
> * The client obtains a live SolrServer from LBHttpSolrServer
> * The client provides this SolrServer as preferred server for a query
> * If the preferred server is no longer alive the request is retried on 
> another live server
> * Everything else follows the existing logic:
> ** After live servers are exhausted, any servers previously marked as dead 
> will be tried before failing the request
> ** If no live servers are found a SolrServerException is thrown
> The implementation is also [on 
> github|https://github.com/magro/lucene-solr/commit/a75aef3d].
> Mailing list thread: 
> http://lucene.472066.n3.nabble.com/LBHttpSolrServer-to-query-a-preferred-server-tt3884140.html

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2583) Make external scoring more efficient (ExternalFileField, FileFloatSource)

2011-06-27 Thread Martin Grotzke (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13055737#comment-13055737
 ] 

Martin Grotzke commented on SOLR-2583:
--

bq. Looking at your test, I think it is reasonable. But I'd like to use 
CompactByteArray. I saw it wins over HashMap and float[] when 5% and above in 
my test.

Can you share your test code or something similar? Perhaps you can just fork 
https://github.com/magro/lucene-solr/ and add an appropriate test that reflects 
your data?

> Make external scoring more efficient (ExternalFileField, FileFloatSource)
> -
>
> Key: SOLR-2583
> URL: https://issues.apache.org/jira/browse/SOLR-2583
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Reporter: Martin Grotzke
>Priority: Minor
> Attachments: FileFloatSource.java.patch, patch.txt
>
>
> External scoring eats much memory, depending on the number of documents in 
> the index. The ExternalFileField (used for external scoring) uses 
> FileFloatSource, where one FileFloatSource is created per external scoring 
> file. FileFloatSource creates a float array with the size of the number of 
> docs (this is also done if the file to load is not found). If there are far 
> fewer entries in the scoring file than there are docs in total, the big float 
> array wastes a lot of memory.
> This could be optimized by using a map of doc -> score, so that the map 
> contains as many entries as there are scoring entries in the external file, 
> but not more.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2583) Make external scoring more efficient (ExternalFileField, FileFloatSource)

2011-06-16 Thread Martin Grotzke (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050435#comment-13050435
 ] 

Martin Grotzke commented on SOLR-2583:
--

bq. Are you sure real floats are actually needed?
In our case score values are e.g. 15887 (one example just taken from one of 
the files). With this sample this test fails:
{noformat}
byte small = SmallFloat.floatToByte315(104626500f);
assertEquals(104626500f, SmallFloat.byte315ToFloat(small), 0f);
-> AssertionError: expected:<1.04626496E8> but was:<1.00663296E8>
{noformat}

This shows that we already have a case where this will produce wrong results, 
and even if we could fix it in our case, there might be someone else with the 
same issue.


bq. it would also good to measure performance...
I'd not expect the boxing to make a real difference here, especially in 
relation to the rest of the time spent during a search request.
A time-based performance comparison with real value would take some time: it 
would have to be put in relation to the rest of a search request (how do you do 
this?), and finally it would require proper interpretation when everything is 
taken together. Right now I don't think it's worth the effort.


{quote}
bq. that uses a fixed size and an increasing number of puts
I'm not certain how realistic that is, remember behind the scenes 
compactbytearray uses blocks,
and if you touch every one (by putting every K docid or something) then you are 
just testing
the worst case.
{quote}
Do you want to change the test to something that's more realistic?


@Yonik: what do you say regarding the suggestion to use a HashMap up to ~5.5% 
and the float[] above that?

> Make external scoring more efficient (ExternalFileField, FileFloatSource)
> -
>
> Key: SOLR-2583
> URL: https://issues.apache.org/jira/browse/SOLR-2583
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Reporter: Martin Grotzke
>Priority: Minor
> Attachments: FileFloatSource.java.patch, patch.txt
>
>
> External scoring eats much memory, depending on the number of documents in 
> the index. The ExternalFileField (used for external scoring) uses 
> FileFloatSource, where one FileFloatSource is created per external scoring 
> file. FileFloatSource creates a float array with the size of the number of 
> docs (this is also done if the file to load is not found). If there are far 
> fewer entries in the scoring file than there are docs in total, the big float 
> array wastes a lot of memory.
> This could be optimized by using a map of doc -> score, so that the map 
> contains as many entries as there are scoring entries in the external file, 
> but not more.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2583) Make external scoring more efficient (ExternalFileField, FileFloatSource)

2011-06-15 Thread Martin Grotzke (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049674#comment-13049674
 ] 

Martin Grotzke commented on SOLR-2583:
--

The test that produced this output can be found in my lucene-solr fork on 
github: https://github.com/magro/lucene-solr/commit/b9af87b1
The test method that was executed was testCompareMemoryUsage; for measuring 
memory usage I used http://code.google.com/p/memory-measurer/ and ran the 
test/JVM with "-Xmx1G -javaagent:solr/lib/object-explorer.jar" (just from 
Eclipse).
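
For reference, with that agent attached the measurement itself is a one-liner 
(assuming the memory-measurer project's MemoryMeasurer class in the 
objectexplorer package, as used by the test linked above):

{code:java}
import objectexplorer.MemoryMeasurer;

public class MeasureExample {
  public static void main(String[] args) {
    // Requires -javaagent:object-explorer.jar on the JVM command line;
    // returns the retained size of the object graph in bytes.
    long bytes = MemoryMeasurer.measureBytes(new float[1000000]);
    System.out.println("retained bytes: " + bytes);
  }
}
{code}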

I just added another test that uses a fixed size and an increasing number of 
puts (testCompareMemoryUsageWithFixSizeAndIncreasingNumPuts, 
https://github.com/magro/lucene-solr/blob/trunk/solr/src/test/org/apache/solr/search/function/FileFloatSourceMemoryTest.java#L56),
 with the following results:

{noformat}
Size: 1.000.000
NumPuts 1.000     (0,1%),   CompactFloatArray 918.616,    float[] 4.000.016,  HashMap 72.128
NumPuts 10.000    (1,0%),   CompactFloatArray 3.738.712,  float[] 4.000.016,  HashMap 701.696
NumPuts 50.000    (5,0%),   CompactFloatArray 4.016.472,  float[] 4.000.016,  HashMap 3.383.104
NumPuts 55.000    (5,5%),   CompactFloatArray 4.016.472,  float[] 4.000.016,  HashMap 3.949.120
NumPuts 60.000    (6,0%),   CompactFloatArray 4.016.472,  float[] 4.000.016,  HashMap 4.254.848
NumPuts 100.000   (10,0%),  CompactFloatArray 4.016.472,  float[] 4.000.016,  HashMap 6.622.272
NumPuts 500.000   (50,0%),  CompactFloatArray 4.016.472,  float[] 4.000.016,  HashMap 27.262.976
NumPuts 1.000.000 (100,0%), CompactFloatArray 4.016.472,  float[] 4.000.016,  HashMap 44.649.664
{noformat}

It seems that the HashMap is the most efficient solution up to ~5.5%. Starting 
from this threshold, CompactFloatArray and float[] use less memory, while 
CompactFloatArray has no advantage over float[] for puts > 5%.

Therefore I'd suggest an adaptive strategy that uses a HashMap up to 5.5% 
(number of scores compared to numDocs) and switches to the original float[] 
approach from that threshold on.

What do you say?
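
A minimal sketch of what such an adaptive choice could look like (names and 
structure are illustrative only, not the actual FileFloatSource code; the 
threshold is the ~5.5% measured above):

{code:java}
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Sketch: keep doc->score in a HashMap while the scoring file is sparse,
// otherwise fall back to the plain float[maxDoc] used today.
public class AdaptiveScores {
  private static final double SPARSE_THRESHOLD = 0.055; // ~5.5% of maxDoc

  private final float defaultValue;
  private final Map<Integer, Float> sparseScores; // used when sparse
  private final float[] denseScores;              // used when dense

  public AdaptiveScores(Map<Integer, Float> parsedScores, int maxDoc, float defaultValue) {
    this.defaultValue = defaultValue;
    if (parsedScores.size() < maxDoc * SPARSE_THRESHOLD) {
      this.sparseScores = new HashMap<Integer, Float>(parsedScores);
      this.denseScores = null;
    } else {
      this.sparseScores = null;
      this.denseScores = new float[maxDoc];
      Arrays.fill(this.denseScores, defaultValue);
      for (Map.Entry<Integer, Float> e : parsedScores.entrySet()) {
        this.denseScores[e.getKey()] = e.getValue();
      }
    }
  }

  public float score(int doc) {
    if (denseScores != null) {
      return denseScores[doc];
    }
    Float v = sparseScores.get(doc);
    return v != null ? v : defaultValue;
  }
}
{code}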

> Make external scoring more efficient (ExternalFileField, FileFloatSource)
> -
>
> Key: SOLR-2583
> URL: https://issues.apache.org/jira/browse/SOLR-2583
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Reporter: Martin Grotzke
>Priority: Minor
> Attachments: FileFloatSource.java.patch, patch.txt
>
>
> External scoring eats much memory, depending on the number of documents in 
> the index. The ExternalFileField (used for external scoring) uses 
> FileFloatSource, where one FileFloatSource is created per external scoring 
> file. FileFloatSource creates a float array with the size of the number of 
> docs (this is also done if the file to load is not found). If there are far 
> fewer entries in the scoring file than there are docs in total, the big float 
> array wastes a lot of memory.
> This could be optimized by using a map of doc -> score, so that the map 
> contains as many entries as there are scoring entries in the external file, 
> but not more.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Issue Comment Edited] (SOLR-2583) Make external scoring more efficient (ExternalFileField, FileFloatSource)

2011-06-14 Thread Martin Grotzke (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049256#comment-13049256
 ] 

Martin Grotzke edited comment on SOLR-2583 at 6/14/11 4:25 PM:
---

I just compared memory consumption of the 3 different approaches, with 
different numbers of puts (number of scores) and sizes (number of docs); the 
memory is in bytes:

{noformat}
Puts 1.000,     size 1.000.000:   CompactFloatArray 898.136,    float[] 4.000.016,  HashMap 72.192
Puts 10.000,    size 1.000.000:   CompactFloatArray 3.724.376,  float[] 4.000.016,  HashMap 702.784
Puts 100.000,   size 1.000.000:   CompactFloatArray 4.016.472,  float[] 4.000.016,  HashMap 6.607.808
Puts 1.000.000, size 1.000.000:   CompactFloatArray 4.016.472,  float[] 4.000.016,  HashMap 44.644.032
Puts 1.000,     size 5.000.000:   CompactFloatArray 1.128.536,  float[] 20.000.016, HashMap 72.256
Puts 10.000,    size 5.000.000:   CompactFloatArray 8.168.536,  float[] 20.000.016, HashMap 704.832
Puts 100.000,   size 5.000.000:   CompactFloatArray 20.013.144, float[] 20.000.016, HashMap 7.385.152
Puts 1.000.000, size 5.000.000:   CompactFloatArray 20.131.160, float[] 20.000.016, HashMap 66.395.584
Puts 1.000,     size 10.000.000:  CompactFloatArray 1.275.992,  float[] 40.000.016, HashMap 72.256
Puts 10.000,    size 10.000.000:  CompactFloatArray 9.289.816,  float[] 40.000.016, HashMap 705.280
Puts 100.000,   size 10.000.000:  CompactFloatArray 37.130.328, float[] 40.000.016, HashMap 7.418.112
Puts 1.000.000, size 10.000.000:  CompactFloatArray 40.262.232, float[] 40.000.016, HashMap 69.282.496
{noformat}

I want to share this as an intermediate result, without further 
interpretation/conclusion for now (I just need to catch the train).

  was (Author: martin.grotzke):
I just compared memory consumption of the 3 different approaches, with 
different number of puts (number of scores) and sizes (number of docs):

{noformat}
Puts 1.000,     size 1.000.000:   CompactFloatArray 898.136,    float[] 4.000.016,  HashMap 72.192
Puts 10.000,    size 1.000.000:   CompactFloatArray 3.724.376,  float[] 4.000.016,  HashMap 702.784
Puts 100.000,   size 1.000.000:   CompactFloatArray 4.016.472,  float[] 4.000.016,  HashMap 6.607.808
Puts 1.000.000, size 1.000.000:   CompactFloatArray 4.016.472,  float[] 4.000.016,  HashMap 44.644.032
Puts 1.000,     size 5.000.000:   CompactFloatArray 1.128.536,  float[] 20.000.016, HashMap 72.256
Puts 10.000,    size 5.000.000:   CompactFloatArray 8.168.536,  float[] 20.000.016, HashMap 704.832
Puts 100.000,   size 5.000.000:   CompactFloatArray 20.013.144, float[] 20.000.016, HashMap 7.385.152
Puts 1.000.000, size 5.000.000:   CompactFloatArray 20.131.160, float[] 20.000.016, HashMap 66.395.584
Puts 1.000,     size 10.000.000:  CompactFloatArray 1.275.992,  float[] 40.000.016, HashMap 72.256
Puts 10.000,    size 10.000.000:  CompactFloatArray 9.289.816,  float[] 40.000.016, HashMap 705.280
Puts 100.000,   size 10.000.000:  CompactFloatArray 37.130.328, float[] 40.000.016, HashMap 7.418.112
Puts 1.000.000, size 10.000.000:  CompactFloatArray 40.262.232, float[] 40.000.016, HashMap 69.282.496
{noformat}

I want to share this intermediately, without further interpretation/conclusion 
for now (I just need to get the train).
  
> Make external scoring more efficient (ExternalFileField, FileFloatSource)
> -
>
> Key: SOLR-2583
> URL: https://issues.apache.org/jira/browse/SOLR-2583
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Reporter: Martin Grotzke
>Priority: Minor
> Attachments: FileFloatSource.java.patch, patch.txt
>
>
> External scoring eats much memory, depending on the number of documents in 
> the index. The ExternalFileField (used for external scoring) uses 
> FileFloatSource, where one FileFloatSource is created per external scoring 
> file. FileFloatSource creates a float array with the size of the number of 
> docs (this is also done if the file to load is not found). If there are far 
> fewer entries in the scoring file than there are docs in total, the big float 
> array wastes a lot of memory.
> This could be optimized by using a map of doc -> score, so that the map 
> contains as many entries as there are scoring entries in the external file, 
> but not more.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2583) Make external scoring more efficient (ExternalFileField, FileFloatSource)

2011-06-14 Thread Martin Grotzke (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049256#comment-13049256
 ] 

Martin Grotzke commented on SOLR-2583:
--

I just compared memory consumption of the 3 different approaches, with 
different numbers of puts (number of scores) and sizes (number of docs):

{noformat}
Puts 1.000,     size 1.000.000:   CompactFloatArray 898.136,    float[] 4.000.016,  HashMap 72.192
Puts 10.000,    size 1.000.000:   CompactFloatArray 3.724.376,  float[] 4.000.016,  HashMap 702.784
Puts 100.000,   size 1.000.000:   CompactFloatArray 4.016.472,  float[] 4.000.016,  HashMap 6.607.808
Puts 1.000.000, size 1.000.000:   CompactFloatArray 4.016.472,  float[] 4.000.016,  HashMap 44.644.032
Puts 1.000,     size 5.000.000:   CompactFloatArray 1.128.536,  float[] 20.000.016, HashMap 72.256
Puts 10.000,    size 5.000.000:   CompactFloatArray 8.168.536,  float[] 20.000.016, HashMap 704.832
Puts 100.000,   size 5.000.000:   CompactFloatArray 20.013.144, float[] 20.000.016, HashMap 7.385.152
Puts 1.000.000, size 5.000.000:   CompactFloatArray 20.131.160, float[] 20.000.016, HashMap 66.395.584
Puts 1.000,     size 10.000.000:  CompactFloatArray 1.275.992,  float[] 40.000.016, HashMap 72.256
Puts 10.000,    size 10.000.000:  CompactFloatArray 9.289.816,  float[] 40.000.016, HashMap 705.280
Puts 100.000,   size 10.000.000:  CompactFloatArray 37.130.328, float[] 40.000.016, HashMap 7.418.112
Puts 1.000.000, size 10.000.000:  CompactFloatArray 40.262.232, float[] 40.000.016, HashMap 69.282.496
{noformat}

I want to share this as an intermediate result, without further 
interpretation/conclusion for now (I just need to catch the train).

> Make external scoring more efficient (ExternalFileField, FileFloatSource)
> -
>
> Key: SOLR-2583
> URL: https://issues.apache.org/jira/browse/SOLR-2583
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Reporter: Martin Grotzke
>Priority: Minor
> Attachments: FileFloatSource.java.patch, patch.txt
>
>
> External scoring eats much memory, depending on the number of documents in 
> the index. The ExternalFileField (used for external scoring) uses 
> FileFloatSource, where one FileFloatSource is created per external scoring 
> file. FileFloatSource creates a float array with the size of the number of 
> docs (this is also done if the file to load is not found). If there are far 
> fewer entries in the scoring file than there are docs in total, the big float 
> array wastes a lot of memory.
> This could be optimized by using a map of doc -> score, so that the map 
> contains as many entries as there are scoring entries in the external file, 
> but not more.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2583) Make external scoring more efficient (ExternalFileField, FileFloatSource)

2011-06-14 Thread Martin Grotzke (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049143#comment-13049143
 ] 

Martin Grotzke commented on SOLR-2583:
--

{quote}
See: http://www.strchr.com/multi-stage_tables

i attached a patch, of a (not great) implementation i was sorta kinda trying to 
clean up for other reasons... maybe you can use it.
{quote}

Thanx, interesting approach!

I just tried to create a CompactFloatArray based on the CompactByteArray to be 
able to compare memory consumption. There's one change that wasn't just 
changing byte to float, and I'm not sure what the right adaptation is in this case:

{code}
diff -w solr/src/java/org/apache/solr/util/CompactByteArray.java solr/src/java/org/apache/solr/util/CompactFloatArray.java
57c57
...
202,203c202,203
<   private void touchBlock(int i, int value) {
< hashes[i] = (hashes[i] + (value << 1)) | 1;
---
>   private void touchBlock(int i, float value) {
> hashes[i] = (hashes[i] + (Float.floatToIntBits(value) << 1)) | 1;
{code}

The adapted test is green, so it seems to be correct at least. I'll also attach 
the full patch for CompactFloatArray.java and TestCompactFloatArray.java

> Make external scoring more efficient (ExternalFileField, FileFloatSource)
> -
>
> Key: SOLR-2583
> URL: https://issues.apache.org/jira/browse/SOLR-2583
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Reporter: Martin Grotzke
>Priority: Minor
> Attachments: FileFloatSource.java.patch, patch.txt
>
>
> External scoring eats much memory, depending on the number of documents in 
> the index. The ExternalFileField (used for external scoring) uses 
> FileFloatSource, where one FileFloatSource is created per external scoring 
> file. FileFloatSource creates a float array with the size of the number of 
> docs (this is also done if the file to load is not found). If there are far 
> fewer entries in the scoring file than there are docs in total, the big float 
> array wastes a lot of memory.
> This could be optimized by using a map of doc -> score, so that the map 
> contains as many entries as there are scoring entries in the external file, 
> but not more.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2583) Make external scoring more efficient (ExternalFileField, FileFloatSource)

2011-06-09 Thread Martin Grotzke (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046943#comment-13046943
 ] 

Martin Grotzke commented on SOLR-2583:
--

bq. If the problem is sparsity, maybe use a two-stage table, still faster than 
a hashmap and much better for the worst case.

What do you mean by a two-stage table? Can you clarify, please?

> Make external scoring more efficient (ExternalFileField, FileFloatSource)
> -
>
> Key: SOLR-2583
> URL: https://issues.apache.org/jira/browse/SOLR-2583
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Reporter: Martin Grotzke
>Priority: Minor
> Attachments: FileFloatSource.java.patch
>
>
> External scoring eats much memory, depending on the number of documents in 
> the index. The ExternalFileField (used for external scoring) uses 
> FileFloatSource, where one FileFloatSource is created per external scoring 
> file. FileFloatSource creates a float array with the size of the number of 
> docs (this is also done if the file to load is not found). If there are far 
> fewer entries in the scoring file than there are docs in total, the big float 
> array wastes a lot of memory.
> This could be optimized by using a map of doc -> score, so that the map 
> contains as many entries as there are scoring entries in the external file, 
> but not more.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2583) Make external scoring more efficient (ExternalFileField, FileFloatSource)

2011-06-09 Thread Martin Grotzke (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046712#comment-13046712
 ] 

Martin Grotzke commented on SOLR-2583:
--

> Sounds good!  I wonder what the memory cut-off should be for auto... 10% of 
> maxDoc() or so?

I'd compare both strategies to see where the break-even point is; this should 
give an absolute number.

> Make external scoring more efficient (ExternalFileField, FileFloatSource)
> -
>
> Key: SOLR-2583
> URL: https://issues.apache.org/jira/browse/SOLR-2583
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Reporter: Martin Grotzke
>Priority: Minor
> Attachments: FileFloatSource.java.patch
>
>
> External scoring eats much memory, depending on the number of documents in 
> the index. The ExternalFileField (used for external scoring) uses 
> FileFloatSource, where one FileFloatSource is created per external scoring 
> file. FileFloatSource creates a float array with the size of the number of 
> docs (this is also done if the file to load is not found). If there are far 
> fewer entries in the scoring file than there are docs in total, the big float 
> array wastes a lot of memory.
> This could be optimized by using a map of doc -> score, so that the map 
> contains as many entries as there are scoring entries in the external file, 
> but not more.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2583) Make external scoring more efficient (ExternalFileField, FileFloatSource)

2011-06-09 Thread Martin Grotzke (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046692#comment-13046692
 ] 

Martin Grotzke commented on SOLR-2583:
--

Great, sounds like a further optimization for both sparse and non-sparse files. 
Though, as we had 4 GB taken by FileFloatSource objects, a reduction to 1/4 
would still be too much for us, so for our case I prefer the map-based 
approach - then with SmallFloat.

> Make external scoring more efficient (ExternalFileField, FileFloatSource)
> -
>
> Key: SOLR-2583
> URL: https://issues.apache.org/jira/browse/SOLR-2583
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Reporter: Martin Grotzke
>Priority: Minor
> Attachments: FileFloatSource.java.patch
>
>
> External scoring eats much memory, depending on the number of documents in 
> the index. The ExternalFileField (used for external scoring) uses 
> FileFloatSource, where one FileFloatSource is created per external scoring 
> file. FileFloatSource creates a float array with the size of the number of 
> docs (this is also done if the file to load is not found). If there are far 
> fewer entries in the scoring file than there are docs in total, the big float 
> array wastes a lot of memory.
> This could be optimized by using a map of doc -> score, so that the map 
> contains as many entries as there are scoring entries in the external file, 
> but not more.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2583) Make external scoring more efficient (ExternalFileField, FileFloatSource)

2011-06-09 Thread Martin Grotzke (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046674#comment-13046674
 ] 

Martin Grotzke commented on SOLR-2583:
--

Yes, you're right regarding non-sparse fields. The question for the user will 
be when to use true or false for sparse. It might also be the case that files 
differ, in that some are big and others are small. So I'm thinking about making 
it adaptive: when the number of lines reaches a certain percentage compared to 
the number of docs, the float array is used; otherwise the doc->score map is 
used. Perhaps it would be good to allow the user to override this, something 
like sparse=yes/no/auto.

> Make external scoring more efficient (ExternalFileField, FileFloatSource)
> -
>
> Key: SOLR-2583
> URL: https://issues.apache.org/jira/browse/SOLR-2583
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Reporter: Martin Grotzke
>Priority: Minor
> Attachments: FileFloatSource.java.patch
>
>
> External scoring eats much memory, depending on the number of documents in 
> the index. The ExternalFileField (used for external scoring) uses 
> FileFloatSource, where one FileFloatSource is created per external scoring 
> file. FileFloatSource creates a float array with the size of the number of 
> docs (this is also done if the file to load is not found). If there are far 
> fewer entries in the scoring file than there are docs in total, the big float 
> array wastes a lot of memory.
> This could be optimized by using a map of doc -> score, so that the map 
> contains as many entries as there are scoring entries in the external file, 
> but not more.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2583) Make external scoring more efficient (ExternalFileField, FileFloatSource)

2011-06-09 Thread Martin Grotzke (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Martin Grotzke updated SOLR-2583:
-

Attachment: FileFloatSource.java.patch

The attached patch changes FileFloatSource to use a map of score by doc.

> Make external scoring more efficient (ExternalFileField, FileFloatSource)
> -
>
> Key: SOLR-2583
> URL: https://issues.apache.org/jira/browse/SOLR-2583
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Reporter: Martin Grotzke
>Priority: Minor
> Attachments: FileFloatSource.java.patch
>
>
> External scoring eats much memory, depending on the number of documents in 
> the index. The ExternalFileField (used for external scoring) uses 
> FileFloatSource, where one FileFloatSource is created per external scoring 
> file. FileFloatSource creates a float array with the size of the number of 
> docs (this is also done if the file to load is not found). If there are far 
> fewer entries in the scoring file than there are docs in total, the big float 
> array wastes a lot of memory.
> This could be optimized by using a map of doc -> score, so that the map 
> contains as many entries as there are scoring entries in the external file, 
> but not more.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-2583) Make external scoring more efficient (ExternalFileField, FileFloatSource)

2011-06-09 Thread Martin Grotzke (JIRA)
Make external scoring more efficient (ExternalFileField, FileFloatSource)
-

 Key: SOLR-2583
 URL: https://issues.apache.org/jira/browse/SOLR-2583
 Project: Solr
  Issue Type: Improvement
  Components: search
Reporter: Martin Grotzke
Priority: Minor


External scoring eats much memory, depending on the number of documents in the 
index. The ExternalFileField (used for external scoring) uses FileFloatSource, 
where one FileFloatSource is created per external scoring file. FileFloatSource 
creates a float array with the size of the number of docs (this is also done if 
the file to load is not found). If there are far fewer entries in the scoring 
file than there are docs in total, the big float array wastes a lot of memory.

This could be optimized by using a map of doc -> score, so that the map 
contains as many entries as there are scoring entries in the external file, but 
not more.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org