Re: when to change rows param?
Hoss, for now I have managed to adjust this in the client code before the request reaches the server, so it is no longer urgent. I wanted to avoid touching the client code (which is giving me, oh great fun, MSIE concurrency miseries), hence I wanted a server-side rewrite of the maximum number of hits returned. Thus far my server customizations, apart from a custom solrconfig and schema, are a query component and a response handler. I thought that injecting the rows param in the query component (taken from the limit param my client sends) would have been enough, but it seems that is not the case.

paul

On 12 Apr 2011, at 02:07, Chris Hostetter wrote:

Paul: can you elaborate a little bit on what exactly your problem is?
- what is the full component list you are using?
- how are you changing the param value (ie: what does the code look like)
- what isn't working the way you expect?

: I've been using my own QueryComponent (that extends the search one)
: successfully to rewrite web-received parameters that are sent from the
: (ExtJS-based) javascript client. This allows an amount of
: query-rewriting, that's good. I tried to change the rows parameter there
: (which is limit in the query, as per the underpinnings of ExtJS) but
: it seems that this is not enough.
:
: Which component should I subclass to change the rows parameter?

-Hoss
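[Editor's note] For reference, a minimal sketch of rewriting an incoming limit parameter into rows inside a custom QueryComponent, the approach Paul describes. This is not Paul's actual code; the class name and the limit-to-rows mapping are assumptions. The rewrite must happen in prepare(), before the component chain executes the query:

    import java.io.IOException;
    import org.apache.solr.common.params.CommonParams;
    import org.apache.solr.common.params.ModifiableSolrParams;
    import org.apache.solr.common.params.SolrParams;
    import org.apache.solr.handler.component.QueryComponent;
    import org.apache.solr.handler.component.ResponseBuilder;

    public class LimitRewritingQueryComponent extends QueryComponent {
      @Override
      public void prepare(ResponseBuilder rb) throws IOException {
        SolrParams params = rb.req.getParams();
        String limit = params.get("limit"); // the ExtJS-style parameter
        if (limit != null) {
          ModifiableSolrParams rewritten = new ModifiableSolrParams(params);
          rewritten.set(CommonParams.ROWS, limit); // becomes the standard rows param
          rb.req.setParams(rewritten);
        }
        super.prepare(rb);
      }
    }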
Re: Can I set up a config-based distributed search
Thanks, Ludovic and Jonathan. Yes, this configuration default is exactly what I was looking for. Ran

On Mon, Apr 11, 2011 at 7:12 PM, Jonathan Rochkind rochk...@jhu.edu wrote:

I have not worked with shards/distributed, but I think you can probably specify them as defaults in your request handler in your solrconfig.xml instead. Somewhere there is (or was) a wiki page on this I can't find right now. There's a way to specify (for a particular request handler) a default parameter value, such as for 'shards', that will be used if none is given with the request. There's also a way to specify an invariant that will always be used even if something else is passed in on the request. Ah, found it: http://wiki.apache.org/solr/SearchHandler#Configuration

On 4/11/2011 8:31 AM, Ran Peled wrote:

In the Distributed Search page (http://wiki.apache.org/solr/DistributedSearch), it is documented that in order to perform a distributed search over a sharded index, I should use the shards request parameter, listing the shards to participate in the search (e.g. ?shards=localhost:8983/solr,localhost:7574/solr). I am planning a new, pretty large index (1B+ items). Say I have 100 shards: specifying the shards on the request URL becomes unrealistic due to the length of the URL. It is also redundant to do that on every request. Is there a way to specify the list of shards in a configuration file instead of on the query URL? I have seen references to relevant config in SolrCloud, but as I understand it that is planned for release only in Solr 4.0.

Thanks, Ran
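[Editor's note] A sketch of what the SearchHandler wiki page describes — a request handler with the shard list baked in as a default. The handler name and host list below are placeholders, not from the thread:

    <requestHandler name="/distributed" class="solr.SearchHandler">
      <lst name="defaults">
        <!-- used only when the request carries no shards param;
             move it into <lst name="invariants"> to make it unoverridable -->
        <str name="shards">shard1:8983/solr,shard2:8983/solr</str>
      </lst>
    </requestHandler>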
exceeded limit of maxWarmingSearchers = 4 =(
hello. my NRT search is not correctly configured =(

2 Solr instances: one searcher and one updater. The updater starts an update of around 3000 documents every minute, and the searcher starts a commit every minute to refresh the index and read the new docs. These are my cache values for a 36 million document index: after a restart my warmup time is about 1700 ms. Do you think I need to set the autowarmCount of every cache to near zero?

---
System: one server, 12 GB RAM, 2 Solr instances, 7 cores; 1 core with 31 million documents, other cores ~100,000
- Solr1 for search requests - commit every minute - 5 GB Xmx
- Solr2 for update requests - delta import every minute - 4 GB Xmx
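[Editor's note] For context, lowering autowarming in solrconfig.xml looks like the sketch below. The sizes are illustrative, not the poster's values (which did not come through in the message); with a commit every minute, a large autowarmCount is what typically makes warming searchers pile up past maxWarmingSearchers:

    <filterCache class="solr.FastLRUCache" size="3000" initialSize="512" autowarmCount="0"/>
    <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
    <documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>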
Re: Solr 3.1 performance compared to 1.4.1
Hi Lance,

Well, I didn't actually copy over the whole configuration files; instead I just added the missing configuration (into a fresh copy of the example directory). By the directory implementation, do you mean the readers used by SolrIndexSearcher? These are:

reader : SolrIndexReader{this=1cb04a0,r=ReadOnlyDirectoryReader@1cb04a0,refCnt=1,segments=1}
readerDir : org.apache.lucene.store.NIOFSDirectory@/opt/solr3/example/solr/data/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@1efc208

But it seems the performance is actually still improving; at the moment the average has dropped even lower, to 28 ms (compared to 43 ms in 1.4.1). Cheers! Marius

2011/4/12 Lance Norskog goks...@gmail.com:

Marius: "I have copied the configuration from 1.4.1 to the 3.1." Does the Directory implementation show up in the JMX beans? In admin/statistics.jsp? Or the Solr startup logs? (Sorry, don't have a Solr available.) Yonik: "What platform are you on? I believe the Lucene Directory implementation now tries to be smarter (compared to lucene 2.9) about picking the best default (but it may not be working out for you for some reason)" Lance

On Sun, Apr 10, 2011 at 12:46 PM, Yonik Seeley yo...@lucidimagination.com wrote:

On Fri, Apr 8, 2011 at 9:53 AM, Marius van Zwijndregt pionw...@gmail.com wrote:

Hello! I'm new to the list, have been using Solr for roughly 6 months and love it. Currently I'm setting up a 3.1 installation next to a 1.4.1 installation (Ubuntu server, same JVM params). I have copied the configuration from 1.4.1 to the 3.1. Both versions are running fine, but one thing I've noticed is that the QTime on 3.1 is much slower for initial searches than on the (currently production) 1.4.1 installation. For example:

Searching with 3.1: http://mysite:9983/solr/select?q=grasmaaier: QTime returns 371
Searching with 1.4.1: http://mysite:8983/solr/select?q=grasmaaier: QTime returns 59

Using debugQuery=true, I can see that the main time is spent in the query component itself (org.apache.solr.handler.component.QueryComponent). Can someone explain this, and how can I analyze this further? Does it take time to build up a decent query, and could I switch to 3.1 without having to worry?

Thanks for the report... there's no reason that anything should really be much slower, so it would be great to get to the bottom of this! Is this using the same index as the 1.4.1 server, or did you rebuild it? Are there any other query parameters (that are perhaps added by default, like faceting or anything else that could take up time) or is this truly just a term query? What platform are you on? I believe the Lucene Directory implementation now tries to be smarter (compared to lucene 2.9) about picking the best default (but it may not be working out for you for some reason).

-Yonik
http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco

-- Lance Norskog goks...@gmail.com
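[Editor's note] Regarding the Directory question: the implementation can also be pinned explicitly in solrconfig.xml rather than left to Lucene's platform-dependent default. A sketch (solr.StandardDirectoryFactory is the stock factory; treat the exact set of factories available in a given 3.1 build as something to verify):

    <directoryFactory name="DirectoryFactory" class="solr.StandardDirectoryFactory"/>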
High (io) load and org.mortbay.jetty.EofException
Hello!

Every night within my maintenance window, during high load caused by postgresql (vacuum analyze), I see a few (10-30) messages showing up in the solr 3.1 logfile:

SEVERE: org.mortbay.jetty.EofException
at org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:791)
at org.mortbay.jetty.AbstractGenerator$Output.flush(AbstractGenerator.java:569)
at org.mortbay.jetty.HttpConnection$Output.flush(HttpConnection.java:1012)
at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:278)
at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:122)
at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:212)
at org.apache.solr.common.util.FastWriter.flush(FastWriter.java:115)
at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:344)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:265)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: java.net.SocketException: Broken pipe
at java.net.SocketOutputStream.socketWrite0(Native Method)
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
at org.mortbay.io.ByteArrayBuffer.writeTo(ByteArrayBuffer.java:368)
at org.mortbay.io.bio.StreamEndPoint.flush(StreamEndPoint.java:129)
at org.mortbay.io.bio.StreamEndPoint.flush(StreamEndPoint.java:161)
at org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:714)
... 25 more

The client application will return a 408 Request Timeout, and the search is stopped. Does anyone know what might cause this, and how I can prevent it from happening? I think this might be the time Jetty is willing to wait before my client starts sending the http request, or the client stops the request prematurely. Cheers! Marius
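[Editor's note] A broken pipe on the server's write side generally means the client gave up first. If the client here is SolrJ, a sketch of raising its timeouts so it survives slow responses during the maintenance window (the URL and values are illustrative, not from the thread):

    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

    public class TimeoutSetup {
      public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://mysite:9983/solr");
        server.setConnectionTimeout(5000); // ms to establish the TCP connection
        server.setSoTimeout(120000);       // ms to wait for the response before aborting
      }
    }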
Re: exceeded limit of maxWarmingSearchers = 4 =(
I start a commit on the searcher core with: .../core/update?commit=true&waitFlush=false
Berlin Buzzwords - conference schedule released
Hey folks,

The Berlin Buzzwords team recently released the schedule for the conference on high scalability. The conference focuses on the topics of search, data analysis and NoSQL, and takes place on June 6/7th 2011 in Berlin. We are looking forward to two awesome keynote speakers who shaped the world of open source data analysis: Doug Cutting (founder of Apache Lucene and Hadoop) as well as Ted Dunning (Chief Application Architect at MapR Technologies and active developer at Apache Hadoop and Mahout).

This year the program has been extended by one additional track. The first conference day focuses on the topics Apache Lucene, NoSQL, messaging and data mining. Speakers include Jakob Homan from Yahoo!, who will give an introduction to the new Hadoop security features; Daniel Einspanjer is going to show how NoSQL and Hadoop are being used in Mozilla Socorro. In addition, Chris Male gives a presentation on how to integrate Solr with J2EE applications. The second day features presentations by Jonathan Gray on Facebook's use of HBase in their messaging architecture; Dawid Weiss, Simon Willnauer and Uwe Schindler are showing the latest Apache Lucene developments; Mark Miller provides insights into Solr performance; and Mathias Stearn is discussing MongoDB scalability questions.

"For our developers Berlin Buzzwords is a great chance to introduce our open source project Couchbase (based on Apache CouchDB and Memcached), get in touch with interested users and discuss their technical questions on site," says Jan Lehnardt, co-founder of Couchbase (the merger of CouchOne and Membase, formerly NorthScale) [1].

Registration is open; regular tickets are available for 440 Euro. There is a group discount. Prices include coffee-break and lunch catering. After the conference there will be trainings on topics related to Berlin Buzzwords, such as Enterprise Search with Apache Lucene and Solr [2]. For the very first time we will also have community-organised hackathons that give Berlin Buzzwords visitors the opportunity to work together with the projects' developers on interesting tasks.

Berlin Buzzwords is produced by newthinking communications in collaboration with Isabel Drost (Member of the Apache Software Foundation, PMC member Apache community development and co-founder of Apache Mahout), Jan Lehnardt (PMC Chair Apache CouchDB) and Simon Willnauer (PMC member Apache Lucene).

[1] http://www.heise.de/open/meldung/NoSQL-CouchOne-und-Membase-fusionieren-zu-Couchbase-1185227.html
[2] http://www.jteam.nl/training/2-day-training-Lucene-Solr.html
Re: exceeded limit of maxWarmingSearchers = 4 =(
My filterCache has a warmupTime of ~6000 ms, but my config is like this: LRUCache(maxSize=3000, initialSize=50, autowarmCount=50, ...). Should I set maxSize to 50 or a similar value?
Re: exceeded limit of maxWarmingSearchers = 4 =(
Oooh. My queryResultCache has a warmupTime of 54000 ms, i.e. ~1 minute. Any suggestions?
Re: Decrease warmupTime
I'm fighting with the same problem, but with Jetty. Is it necessary in this case to also delete the Jetty work dir?
Re: Indexing Best Practice
Hi Lance, thanks for your reply, but I have a question: is this patch committed to trunk?
AbstractSolrTestCase and Solr 3.1.0
Hi all, I am porting a series of Solr plugins previously developed for version 1.4.1 to 3.1.0. I've written some integration tests extending the AbstractSolrTestCase [1] utility class, but now it seems that it wasn't included in the solr-core 3.1.0 artifact, as it's in the solr/src/test directory. Was that a choice for the release, or am I missing something (or both)? Should I replace it with a different class with the same scope, or should I refactor my integration tests in a different way? Thanks in advance for any feedback. Regards, Tommaso

[1] http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_3_1/solr/src/test/org/apache/solr/util/AbstractSolrTestCase.java
function query apply only in the subset of the query
Hi everyone,

My situation is this: I need to add the value of a field to the score of the docs returned by the query, but not to all docs. Example: q=car returns 3 docs:

1- name=car ford, marketValue=1, score=1.3
2- name=car citroen, marketValue=2, score=1.3
3- name=car mercedes, marketValue=0.5, score=1.3

But if I want to add the marketValue to the score, the returned list becomes this: q=car+_val_:marketValue

1- name=bus, marketValue=5, score=5
2- name=car citroen, marketValue=2, score=3.3
3- name=car ford, marketValue=1, score=2.3
4- name=car mercedes, marketValue=0.5, score=1.8

Is it possible to apply the function query only to the documents returned by the first query?

Thanks in advance,

Marco Martínez Bautista
http://www.paradigmatecnologico.com
Avenida de Europa, 26. Ática 5. 3ª Planta
28224 Pozuelo de Alarcón
Tel.: 91 352 59 42
Help with Nested Query
Hi, I'm trying to do something like this in Solr 1.4.1:

fq=category_id:(24 79)

However, the values inside the parentheses will be fetched through another query. So far I've tried using _query_, but it doesn't work the way I want it to. Here is what I'm trying:

fq=category_id:(_query_:"{!lucene fl=category_id} video")

Any suggestions on this? Thanks in advance
Solrj retry handling - prevent ProtocolException: Unbuffered entity enclosing request can not be repeated
Hi,

From time to time we're seeing a "ProtocolException: Unbuffered entity enclosing request can not be repeated." in the logs when sending ~500 docs to solr (the stack trace is at the end of the email). I'm aware that this was discussed before (e.g. [1]), and our solution so far was to reduce the number of docs that are sent to solr. However, I think the issue might be solvable in solrj. The discussion on the httpclient-dev mailing list [2] points out the solution under option 3): re-instantiate the input stream and retry the request manually.

AFAICS CommonsHttpSolrServer.request, when _maxRetries is set to something > 0 (see [3]), already does some retry stuff, but not around the actual http method execution (_httpClient.executeMethod(method)). I'm not sure what the existing retries are for, but I'd say that if the user sets maxRetries to something > 0, the http method execution should also be retried.

Another thing is the actually seen ProtocolException: AFAICS this is thrown because httpclient (HttpMethodDirector.executeWithRetry) performs a retry itself (see [4]) while the actually processed HttpMethod does not support this. As HttpMethodDirector.executeWithRetry already checks for a HttpMethodRetryHandler (under param HttpMethodParams.RETRY_HANDLER, [5]), it seems as if it would be enough to add such a handler for the update/POST requests to prevent the ProtocolException.

So in summary I suggest two things:
1) Retry http method execution when maxRetries is > 0
2) Prevent HttpClient from doing retries (by adding a HttpMethodRetryHandler)

I first wanted to post it here on the list to see if there are objections or other solutions. Or if there are plans to replace commons httpclient (3.x) by something like apache httpclient 4.x or async-http-client. If there's agreement that the proposed solution is the way to go ATM, I'd submit an appropriate issue for this. Any comments?

Cheers, Martin

[1] http://lucene.472066.n3.nabble.com/Unbuffered-entity-enclosing-request-can-not-be-repeated-tt788186.html
[2] http://www.mail-archive.com/commons-httpclient-dev@jakarta.apache.org/msg06723.html
[3] http://svn.apache.org/viewvc/lucene/dev/trunk/solr/src/solrj/org/apache/solr/client/solrj/impl/CommonsHttpSolrServer.java?view=markup#l281
[4] http://svn.apache.org/viewvc/httpcomponents/oac.hc3x/trunk/src/java/org/apache/commons/httpclient/HttpMethodDirector.java?view=markup#l366
[5] http://svn.apache.org/viewvc/httpcomponents/oac.hc3x/trunk/src/java/org/apache/commons/httpclient/HttpMethodDirector.java?view=markup#l426

Stack trace:
Caused by: org.apache.commons.httpclient.ProtocolException: Unbuffered entity enclosing request can not be repeated.
at org.apache.commons.httpclient.methods.EntityEnclosingMethod.writeRequestBody(EntityEnclosingMethod.java:487)
at org.apache.commons.httpclient.HttpMethodBase.writeRequest(HttpMethodBase.java:2110)
at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1088)
at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398)
at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:427)

--
Martin Grotzke
http://twitter.com/martin_grotzke
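[Editor's note] A minimal sketch of suggestion 2), preventing commons-httpclient 3.x from auto-retrying an unbuffered POST by registering a retry handler that allows zero retries. This is illustrative code, not the actual CommonsHttpSolrServer patch; the URL is a placeholder:

    import org.apache.commons.httpclient.DefaultHttpMethodRetryHandler;
    import org.apache.commons.httpclient.methods.PostMethod;
    import org.apache.commons.httpclient.params.HttpMethodParams;

    public class NoRetryPost {
      public static void main(String[] args) {
        PostMethod post = new PostMethod("http://localhost:8983/solr/update");
        // With retryCount=0, HttpMethodDirector.executeWithRetry gives up
        // immediately instead of trying to replay the unrepeatable stream.
        post.getParams().setParameter(HttpMethodParams.RETRY_HANDLER,
            new DefaultHttpMethodRetryHandler(0, false));
      }
    }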
Updates during Optimize
Hello. When I start an optimize (which takes more than 4 hours), no updates from DIH are possible. I thought Solr copied the whole index and then started an optimize from the copy, rather than locking the index and optimizing it in place... =( Is there any way to do both at the same time?
Re: AbstractSolrTestCase and Solr 3.1.0
On Tue, Apr 12, 2011 at 6:44 AM, Tommaso Teofili tommaso.teof...@gmail.com wrote:

Hi all, I am porting a series of Solr plugins previously developed for version 1.4.1 to 3.1.0. I've written some integration tests extending the AbstractSolrTestCase [1] utility class, but now it seems that it wasn't included in the solr-core 3.1.0 artifact, as it's in the solr/src/test directory. Was that a choice for the release, or am I missing something (or both)? Should I replace it with a different class with the same scope, or should I refactor my integration tests in a different way? Thanks in advance for any feedback.

Hi Tommaso: this class (and other test code) was changed to depend upon Lucene's test code; due to this it moved to src/test. The issue to make a solr test-framework jar file didn't make 3.1 (https://issues.apache.org/jira/browse/SOLR-2061), however it's committed to the 3.x branch.

Note: the class as it is in solr 3.1 is un-extendable by an outside project, I think. I cleaned up these classes some in SOLR-2061 and tested it all with an external project, so it should be ok now in the branch.
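[Editor's note] Once a release ships the SOLR-2061 jar, depending on it should look roughly like the sketch below. The coordinates and version are assumptions based on the issue, not something published as of 3.1; check the artifact name of the release you actually use:

    <dependency>
      <groupId>org.apache.solr</groupId>
      <artifactId>solr-test-framework</artifactId>
      <version>3.2.0</version>
      <scope>test</scope>
    </dependency>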
Re: XML not coming through from nabble to Gmail
Chris:

Here's the nabble URL: http://lucene.472066.n3.nabble.com/Strip-spaces-and-new-line-characters-from-data-tp2795453p2795453.html

The message in the Solr list is from alexei on 8-April, "Strip spaces and newline characters from data". This started happening a couple (?) of weeks ago and I don't remember changing anything. Yeah, sure, they all say that...

This bit of XML that alexei included just doesn't come through to my gmail account; it'll be interesting to see if it makes it out:

<fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true">
  <tokenizer class="solr.KeywordTokenizerFactory"/>
  <filter class="solr.TrimFilterFactory"/>
</fieldType>

Thanks, Erick

On Mon, Apr 11, 2011 at 9:06 PM, Chris Hostetter hossman_luc...@fucit.org wrote:

: I see the same problem (missing markup) in Thunderbird. Seems like Nabble
: might be the culprit?

if someone can cite some specific examples (by email message-id, or subject, or date+sender, or url from nabble, or url from any public archive, or anything more specific than "posts from nabble containing xml") we can check the official apache mail archive which contains the raw message as received by ezmlm, such as:

http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201104.mbox/raw/%3cbanlktimcpthzalstrwhn3rtzpxdzkbo...@mail.gmail.com%3E

-Hoss
Re: XML not coming through from nabble to Gmail
FWIW, I see the xml I just sent in gMail, so I'm guessing the problem is over on the nabble side, but I have very little evidence...

Erick

P.S. It's not a huge deal, getting to the correct message on nabble is just a click away. But it is a bit annoying.
Re: DIH OutOfMemoryError?
"Make sure streaming is on." -- how do I check that?
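[Editor's note] For a JDBC source, "streaming" in DIH is controlled by the batchSize attribute on the dataSource in data-config.xml. A sketch assuming a MySQL source (driver, URL and credentials are placeholders); batchSize="-1" is the special value DIH maps to a streaming fetch size:

    <dataSource type="JdbcDataSource"
                driver="com.mysql.jdbc.Driver"
                url="jdbc:mysql://localhost/mydb"
                user="user" password="pass"
                batchSize="-1"/>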
SolrException: Unavailable Service
Hi,

I did not want to hijack this thread (http://www.mail-archive.com/solr-user@lucene.apache.org/msg34181.html), but I am experiencing the exact same problem mentioned there. To sum up the issue, I am getting an intermittent "Unavailable Service" exception during the indexing commit phase. I know that I am calling commit very often, but I do not see any way around this.

This is my situation: I am indexing a huge amount of documents using multiple instances of a SolrJ client running on multiple servers. There is no way for me to control when commit is called from these clients, so two different clients can call commit at the same time. I am not sure if I can/should use auto/timed commit, because I need to know if a commit failed so I can roll back the batch that failed.

What kind of options do I have? Should I try to catch the exception and keep trying to recommit until it goes through? I can see some potential problems with this approach. Do I need to write a request broker to queue up all these commits and send them to solr one by one in a timely manner? Just wanted to know if anyone has a solution for this problem before I dive off the deep end.

Thanks, Phong
RE: XML not coming through from nabble to Gmail
I've asked on Nabble if they know of a fix for the problem: http://nabble-support.1.n2.nabble.com/solr-dev-mailing-list-tp6023495p6264955.html

Steve
Re: SolrException: Unavailable Service
If your commit from the client fails, you don't really know the state of your index anyway. All the threads you have sending documents to Solr are adding them to a single internal buffer. Committing flushes that buffer. So if thread 1 gets an error on commit, it will presumably have some documents from thread 2 in the commit. But thread 2 won't necessarily see the results. So I don't think your statement about needing to know if a commit fails is really
Re: SolrException: Unavailable Service
Sorry, fat fingers. Sent that last e-mail inadvertently.

Anyway, if I have this correct, I'd recommend going to autocommit and NOT committing from the clients. That's usually the recommended procedure. This is especially true if you have a master/slave setup, because each commit from each client will (potentially) trigger a replication.

Best
Erick
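[Editor's note] A sketch of what switching to autocommit looks like in solrconfig.xml (the thresholds are illustrative, not a recommendation for this particular setup):

    <updateHandler class="solr.DirectUpdateHandler2">
      <autoCommit>
        <maxDocs>10000</maxDocs> <!-- commit after this many pending docs -->
        <maxTime>60000</maxTime> <!-- or after this many ms, whichever comes first -->
      </autoCommit>
    </updateHandler>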
Searching during postcommit
Hi,

I have been trying to perform a search using a CommonsHttpSolrServer when my postCommit event listener is called. I am not able to find the documents just committed; the "post" in postCommit led me to assume that I would. It seems that the commit only takes effect when all postCommit listeners have returned. Am I missing something, or is there another way I can do this?

Thanks, Reeza
Re: function query apply only in the subset of the query
Try using AND (or set q.op): q=car+AND+_val_:marketValue
Analysing all tokens in a stream
Hi,

I would like to build a component that, during indexing, analyses all tokens in a stream and adds metadata to a new field based on my analysis. I have different tasks that I would like to perform, like basic classification and certain more advanced phrase detections. How would I do this? A normal TokenFilter can only look at one token at a time, but I need access to a larger context. I've noticed that there is a TeeSinkTokenFilter that might be useful in some way, since it "is also useful for doing things like entity extraction or proper noun analysis", but I don't understand how. Can someone help me with a super-simple stub or similar? What I'm looking for is something like:

class MySmartFilter {
  public AnalyzeTokens(tokenList) {
    metadataTokens = DoTheAnalysis(tokenList);
    AddToField(metadata, metadataTokens);
  }
}

Any help is much appreciated! Thanks /Bjorn
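[Editor's note] One common pattern is a TokenFilter that consumes the whole upstream stream on the first incrementToken() call, runs the whole-stream analysis, and then replays the buffered tokens. A sketch against the Lucene 3.x attribute API (not a tested component):

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.List;
    import org.apache.lucene.analysis.TokenFilter;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

    public final class BufferAllFilter extends TokenFilter {
      private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
      private final List<State> buffered = new ArrayList<State>();
      private Iterator<State> replay;

      public BufferAllFilter(TokenStream input) {
        super(input);
      }

      @Override
      public boolean incrementToken() throws IOException {
        if (replay == null) {
          // First call: drain the upstream tokenizer completely.
          List<String> terms = new ArrayList<String>();
          while (input.incrementToken()) {
            buffered.add(captureState()); // snapshot all attributes of this token
            terms.add(termAtt.toString());
          }
          analyse(terms); // whole-stream work: classification, phrase detection, ...
          replay = buffered.iterator();
        }
        if (replay.hasNext()) {
          restoreState(replay.next()); // emit the buffered tokens unchanged
          return true;
        }
        return false;
      }

      @Override
      public void reset() throws IOException {
        super.reset();
        buffered.clear();
        replay = null;
      }

      private void analyse(List<String> terms) {
        // Placeholder for the analysis. Note that writing results to a
        // *different* field is not possible from inside a TokenFilter;
        // that part is usually done in an UpdateRequestProcessor, which
        // sees the whole document before analysis runs.
      }
    }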
Re: AbstractSolrTestCase and Solr 3.1.0
Thanks Robert, that was very useful :) Tommaso
Re: function query apply only in the subset of the query
Thanks, but I tried this and I saw that it works in a standard scenario; in my query I use my own query parser, and it seems that it doesn't do the AND and returns all the docs in the index:

My query: _query_:{!bm25}car AND _val_:marketValue - 67000 docs returned
Solr query parser: car AND _val_:marketValue - 300 docs returned

Thanks,

Marco Martínez Bautista
http://www.paradigmatecnologico.com
Avenida de Europa, 26. Ática 5. 3ª Planta
28224 Pozuelo de Alarcón
Tel.: 91 352 59 42
Re: XML not coming through from nabble to Gmail
: Here's the nabble URL:
:
: http://lucene.472066.n3.nabble.com/Strip-spaces-and-new-line-characters-from-data-tp2795453p2795453.html
:
: The message in the Solr list is from alexei on 8-April. Strip spaces and
: newline characters from data.

And the raw message as received by apache...

http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201104.mbox/raw/%3c1302272763508-2795453.p...@n3.nabble.com%3E

...no XML. So whatever the problem is, it's the mail client (ie: nabble)

-Hoss
Re: Updates during Optimize
On 4/12/2011 6:21 AM, stockii wrote:

Hello. When I start an optimize (which takes more than 4 hours), no updates from DIH are possible. Is there any way to do both at the same time?

You can't index and optimize at the same time, and I'm pretty sure that there isn't any way to make it possible that wouldn't involve a major rewrite of Lucene, and possibly Solr. The devs would have to say differently if my understanding is wrong.

The optimize takes place at the Lucene level. I can't give you much in-depth information, but I can give you some high-level stuff. What it's doing is equivalent to a merge, down to one segment. This is not the same as a straight file copy. It must read the entire Lucene data structure and build a new one from scratch. The process removes deleted documents and will also upgrade the version number of the index if it was written with an older version of Lucene. It's very likely that the reading side of the process is nearly as comprehensive as the CheckIndex program, but it also has to write out a new index segment. The net result: the process gives your CPU and especially your I/O subsystem a workout, simultaneously.

If you were to make your I/O subsystem faster, you would probably see a major improvement in your optimize times. On my installation, it takes about 11 minutes to optimize one of my 16GB shards, each with 9 million docs. These live in virtual machines that are stored on a six-drive RAID10 array using 7200RPM SATA disks. One of my pie-in-the-sky upgrade dreams is to replace that with a four-drive RAID10 array using SSD; the other two drives would be regular SATA, a mirrored OS partition.

Thanks, Shawn
Re: Updates during Optimize
You can index and optimize at the same time. The current limitation, or pause, is when the RAM buffer is flushing to disk; however, that's changing with the DocumentsWriterPerThread implementation, e.g. LUCENE-2324.
Re: Fwd: machine tags, copy fields and pattern tokenizers
I'm not sure it's a 100% solution, but the new path hierarchy tokenizer seems promising. I've only played with it a little bit, with a little too much booze and not enough sleep (in the sky), so apologies for the potty-mouth-ness of this blog post:

http://www.aaronland.info/weblog/2011/04/02/status/#sky

Cheers,

On 3/29/11 6:00 PM, sukhdev wrote:

Hi, Were you able to solve the machine tag problem in Solr? Actually I am also looking at whether machine tags can be indexed in Solr and searched in an efficient way. Regards
Solr 1.30 Collection Distribution Search
I have 1 master and 2 slaves set up with 1.3 collection distribution. My frontend web application queries the master. Do I need to change any code in the web application to query the slaves, or does the master request query results from the slaves automatically? Please help, thx.
Re: SolrException: Unavailable Service
Erick,

My setup is not quite the way you described. I have multiple threads indexing simultaneously, but I only have 1 thread doing the commit after all indexing threads have finished. I have multiple instances of this running, each in its own java vm. I'm OK with throwing out all the docs indexed so far if the commit fails.

I did not know that the recommended procedure is to use auto commit. I will explore this avenue. I was not aware of the master/slave setup either. The first thing that comes to mind is: how do I know which docs did not get committed if the auto commit ever fails? What is the recommended procedure for handling failure? Any failed docs will need to be indexed at some point in the future.

Thanks for the valuable inputs.

Phong
Re: Solr 1.30 Collection Distribution Search
Yes. You need to put, say, a load balancer in front of your slaves and distribute the requests to the slaves.

Best
Erick
Re: SolrException: Unavailable Service
See below:

On Tue, Apr 12, 2011 at 2:21 PM, Phong Dais phong.gd...@gmail.com wrote:

Erick, My setup is not quite the way you described. I have multiple threads indexing simultaneously, but I only have 1 thread doing the commit after all indexing threads have finished. I have multiple instances of this running, each in its own java vm. I'm OK with throwing out all the docs indexed so far if the commit fails.

But this is really the same thing. On the back end, Solr is piping them all into a common index, and that is where the autocommit happens. The fact that it's happening in separate JVMs doesn't alter the concept; you should let autocommit handle things. The problem here is knowing what hasn't indexed.

I did not know that the recommended procedure is to use auto commit. I will explore this avenue. The first thing that comes to mind is: how do I know which docs did not get committed if the auto commit ever fails? What is the recommended procedure for handling failure? Any failed docs will need to be indexed at some point in the future.

Assuming that you have a uniqueKey defined, you can look at the logs to see failures. Then you can choose to re-index all the documents that have changed around that time (backing up as far as you need to be safe). The key here is that you can re-index, and the old copy (if any) will be replaced by the re-indexed copy. There's nothing really built into Solr that does this for you; you really have to build this part yourself.

Best
Erick
Spellchecking in the Chinese Language
Hi,

I have been trying to get spellcheck to work in the Chinese language. So far I have not had any luck. Can someone shed some light here, as a general guideline, on what needs to happen? I am using the CJKAnalyzer in the text field type and searching works fine, but spelling does not work. Here are the things I have tried:

1. Put CJKAnalyzer in the textSpell field type.
2. Set the characterEncoding param to utf-8 in the spellcheck search component.
3. Using Luke, I can see the Chinese characters in the spell field in the main index.
4. After building the spelling index, I don't see Chinese characters in the spellchecker index, only terms in English.
5. Tried adding the NGramFilterFactory to the CJKAnalyzer, with no luck either.

Thanks!
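[Editor's note] For reference, a sketch of the solrconfig.xml wiring being described (the field and type names follow the message; whether the index-based spellchecker copes with CJK bigrams is exactly the open question in this thread):

    <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
      <str name="queryAnalyzerFieldType">textSpell</str>
      <lst name="spellchecker">
        <str name="name">default</str>
        <str name="field">spell</str>        <!-- the source field seen in Luke -->
        <str name="buildOnCommit">true</str>
      </lst>
    </searchComponent>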
Re: Indexing Flickr and Panaramio
Did this go to the list? I think I may need to resubscribe... Sent from my iPhone On Apr 12, 2011, at 12:55 AM, Estrada Groups estrada.adam.gro...@gmail.com wrote: Has anyone tried doing this? Got any tips for someone getting started? Thanks, Adam Sent from my iPhone
Re: Solr 1.30 Collection Distribution Search
Thanks Erick, I thought the master did this automatically when you set up collection distribution. I wish there were more documentation for 1.3 collection distribution. Do you know how to show the slave stats on the master admin page, the Distribution tab? Thanks in advance, guys.

Sent from my iPhone
Re: Indexing Flickr and Panaramio
It did: http://search-lucene.com/?q=panaramio

Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
Re: Spellchecking in the Chinese Language
Hi,

Does spellchecking in Chinese actually make sense? I once asked a native Chinese speaker about that, and the person told me it didn't really make sense. Anyhow, with n-grams, I don't think this could technically work even if it made sense for Chinese, could it?

Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
Re: Searching during postcommit
If I follow things correctly, I think you should be seeing new documents only after the commit is done and the new index searcher is open and available for search. If you are searching before the new searcher is available, you are probably still hitting the old searcher.

Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
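[Editor's note] If the goal is to act on the just-committed documents, a sketch of the usual alternative is to hook the newSearcher event rather than postCommit, since that callback hands you the searcher that can actually see the commit. The class name is hypothetical; it would be wired up with a <listener event="newSearcher" .../> entry in solrconfig.xml:

    import org.apache.solr.common.util.NamedList;
    import org.apache.solr.core.SolrEventListener;
    import org.apache.solr.search.SolrIndexSearcher;

    public class SearchOnNewSearcher implements SolrEventListener {
      public void init(NamedList args) {}

      public void postCommit() {
        // Fires after the commit but possibly before a searcher that sees
        // it is registered -- hence the behavior Reeza observed.
      }

      public void newSearcher(SolrIndexSearcher newSearcher, SolrIndexSearcher currentSearcher) {
        // newSearcher includes the committed documents; query it directly
        // here instead of going back over HTTP.
      }
    }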
Re: Indexing Flickr and Panaramio
Hi,

I indexed Flickr into Lucene about 3 years ago. There is a Flickr API which covers almost everything you need (as I remember, not every Flickr feature was implemented in the API at that time; for example, collections were not searchable). You can harvest by user ID or by searching for a topic. You can use a language library (PHP, Java etc.) to wrap the details of communication. It is possible that you would like to merge information into one entity before sending it to Solr (like merging the user, collection and set info into each picture). The last step is to transform this information into a Solr document (again, either directly or with a language library). I am not sure if this helps you, but if you ask a more specific question, I'll try to answer.

regards, Péter

2011/4/12 Estrada Groups estrada.adam.gro...@gmail.com:

Has anyone tried doing this? Got any tips for someone getting started? Thanks, Adam
Re: function query apply only in the subset of the query
On Tue, Apr 12, 2011 at 10:25 AM, Marco Martinez mmarti...@paradigmatecnologico.com wrote: Thanks, but I tried this and saw that it works in a standard scenario; in my query, however, I use my own query parser, and it seems it doesn't apply the AND and returns all the docs in the index: My query: _query_:{!bm25}car AND _val_:marketValue - 67000 docs returned This would seem to point to your generated query {!bm25}car matching all docs for some reason? -Yonik http://www.lucenerevolution.org -- Lucene/Solr User Conference, May 25-26, San Francisco
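One quick way to test that hypothesis (a sketch; it assumes the core lives at localhost:8983/solr, and bm25 is Marco's custom parser):
  # Run the custom-parser clause alone; with rows=0, numFound shows whether
  # {!bm25}car by itself matches the whole index
  curl 'http://localhost:8983/solr/select?q=_query_:%22%7B!bm25%7Dcar%22&rows=0&debugQuery=true'
  # The parsedquery section of the debug output shows what the parser actually built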
Re: Spellchecking in the Chinese Language
It doesn't make sense to spellcheck individual character-sized words, but it makes a lot of sense for phrases. Due to the pervasive use of pinyin input methods, it's very easy to write phrases that are semantically wrong but sound correct. n-grams should work if they don't mangle the characters. On Tue, Apr 12, 2011 at 12:47 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Hi, Does spellchecking in Chinese actually make sense? I once asked a native Chinese speaker about that, and the person told me it didn't really make sense. Anyhow, with n-grams, I don't think this could technically work even if it made sense for Chinese, could it? Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: alexw aw...@crossview.com To: solr-user@lucene.apache.org Sent: Tue, April 12, 2011 3:07:48 PM Subject: Spellchecking in the Chinese Language Hi, I have been trying to get spellcheck to work in the Chinese language. So far I have not had any luck. Can someone shed some light here, as a general guideline, on what needs to happen? I am using the CJKAnalyzer in the text field type and searching works fine, but spelling does not work. Here are the things I have tried:
1. Put CJKAnalyzer in the textSpell field type.
2. Set the characterEncoding param to utf-8 in the spellcheck search component.
3. Using Luke, I can see the Chinese characters in the spell field in the main index.
4. After building the spelling index, I don't see Chinese characters in the spellchecker index, only terms in English.
5. Tried adding the NGramFilterFactory to the CJKAnalyzer, with no luck either.
Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Spellchecking-in-the-Chinese-Lanugage-tp2812726p2812726.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: partial optimize does not reduce the segment number to maxNumSegments
Hi Hoss, thanks for your response... you are right, I had a typo in my question, but I did use maxSegments, and here is the exact URL I used: curl 'http://localhost:8080/solr/97/update?optimize=true&maxSegments=10&waitFlush=true' I used jconsole and du -sk to monitor each partial optimize, and I am sure the optimize was done. It always reduces the segment files from 130+ to 65+ when I start with maxSegments=10; when I run again with maxSegments=9, it reduces to somewhere in the 50s; when I use maxSegments=2, it always reduces the segments to 18; and maxSegments=1 (full optimize) always reduces the core to 10 segment files. This has been repeated about a dozen times. I think the resulting file number depends on the size of the core. I have a core that takes 10GB of disk space and has 4 million documents. It perhaps also depends on other solr/lucene configurations? Let me know if I should give you any data from our solr config. Here is the actual data from the test I ran lately, for your reference; you can see it definitely finished each partial optimize, and the time spent is also included (please note I am using a core id there which is different from yours):
/tmp # ls /xxx/solr/data/32455077/index | wc    <-- this is the start point, 150 seg files
    150     150     946
/tmp # time curl 'http://localhost:8080/solr/32455077/update?optimize=true&maxSegments=10&waitFlush=true'
real    0m36.050s
user    0m0.002s
sys     0m0.003s
/tmp # ls /xxx/solr/data/32455077/index | wc    <-- after first partial optimize (10), reduced to 82
     82      82     746
/tmp # time curl 'http://localhost:8080/solr/32455077/update?optimize=true&maxSegments=9&waitFlush=true'
real    1m54.364s
user    0m0.003s
sys     0m0.002s
/tmp # ls /xxx/solr/data/32455077/index | wc
     74      74     674
/tmp # time curl 'http://localhost:8080/solr/32455077/update?optimize=true&maxSegments=8&waitFlush=true'
real    2m0.443s
user    0m0.002s
sys     0m0.003s
/tmp # ls /xxx/solr/data/32455077/index | wc
     66      66     602
/tmp # time curl 'http://localhost:8080/solr/32455077/update?optimize=true&maxSegments=7&waitFlush=true'
<?xml version="1.0" encoding="UTF-8"?>
real    3m22.201s
user    0m0.002s
sys     0m0
/tmp # ls /xxx/solr/data/32455077/index | wc
     58      58     530
/tmp # time curl 'http://localhost:8080/solr/32455077/update?optimize=true&maxSegments=6&waitFlush=true'
real    3m29.277s
user    0m0.001s
sys     0m0.004s
/tmp # ls /xxx/solr/data/32455077/index | wc
     50      50     458
/tmp # time curl 'http://localhost:8080/solr/32455077/update?optimize=true&maxSegments=5&waitFlush=true'
real    3m41.514s
user    0m0.003s
sys     0m0.003s
/tmp # ls /xxx/solr/data/32455077/index | wc
     42      42     386
/tmp # time curl 'http://localhost:8080/solr/32455077/update?optimize=true&maxSegments=4&waitFlush=true'
real    5m35.697s
user    0m0.003s
sys     0m0.004s
/tmp # ls /xxx/solr/data/32455077/index | wc
     34      34     314
/tmp # time curl 'http://localhost:8080/solr/32455077/update?optimize=true&maxSegments=3&waitFlush=true'
real    7m8.773s
user    0m0.003s
sys     0m0.002s
/tmp # ls /xxx/solr/data/32455077/index | wc
     26      26     242
/tmp # time curl 'http://localhost:8080/solr/32455077/update?optimize=true&maxSegments=2&waitFlush=true'
real    9m18.814s
user    0m0.004s
sys     0m0.001s
/tmp # ls /xxx/solr/data/32455077/index | wc
     18      18     170
/tmp # time curl 'http://localhost:8080/solr/32455077/update?optimize=true&maxSegments=1&waitFlush=true'    (full optimize)
real    16m6.599s
user    0m0.003s
sys     0m0.004s
Disk space usage: the first 3 runs took about 20% extra; the middle couple of runs took about 50% extra; the last full optimize took 100% extra. -- View this message in context: http://lucene.472066.n3.nabble.com/partial-optimize-does-not-reduce-the-segment-number-to-maxNumSegments-tp2682195p2812415.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Spellchecking in the Chinese Language
Thanks Otis and Luke. Yes, it does make sense to spellcheck phrases in Chinese. It looks like the default Solr spellcheck component is already doing some kind of n-gramming: when examining the spellcheck index, I did see gram1, gram2, gram3, gram4... The problem is that no Chinese terms were indexed into the spellchecker index, only English terms. Regards, Alex -- View this message in context: http://lucene.472066.n3.nabble.com/Spellchecking-in-the-Chinese-Lanugage-tp2812726p2813149.html Sent from the Solr - User mailing list archive at Nabble.com.
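One way to narrow this down from the command line (a sketch; it assumes the /spell request handler from the example solrconfig and the standard spellcheck parameters; the encoded query stands for the Chinese phrase 中国):
  # Rebuild the spellcheck index, then ask for suggestions for a Chinese phrase
  curl 'http://localhost:8983/solr/spell?q=*:*&spellcheck=true&spellcheck.build=true'
  curl 'http://localhost:8983/solr/spell?q=dummy&spellcheck=true&spellcheck.q=%E4%B8%AD%E5%9B%BD'
  # If only English terms ever come back, check the analyzer on the field the
  # spellchecker is built from: if it drops CJK tokens, no Chinese terms get copied over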
Re: partial optimize does not reduce the segment number to maxNumSegments
: /tmp # ls /xxx/solr/data/32455077/index | wc    <-- this is the start point, 150 seg files
:     150     150     946
: /tmp # time curl
The number of files in the index directory is not the number of segments. The number of segments is an internal Lucene concept that impacts the number of files, but it is not an actual file count. A segment can consist of multiple files, depending on how your schema.xml is configured (and whether you are using the compound file format). You can see the current number of segments by looking at the stats page... http://localhost:8983/solr/admin/stats.jsp SolrIndexReader{this=64a7c45e,r=ReadOnlyDirectoryReader@64a7c45e,refCnt=1,segments=10} ...that's from the solr example, where the index directory at the time of that request actually contained 93 files. -Hoss
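A scriptable version of that check (a sketch; it assumes the example port and that the stats page prints the SolrIndexReader line quoted above):
  # Pull the reported segment count out of the stats page
  curl -s 'http://localhost:8983/solr/admin/stats.jsp' | grep -o 'segments=[0-9]*'
  # Compare it with the raw file count, which will usually be much higher
  ls /xxx/solr/data/32455077/index | wc -l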
Re: Indexing Flickr and Panaramio
Thanks Peter! I am thinking that I may just use Nutch to do the crawl and index off of these sites. I need to check out the APIs for each to make sure I'm not missing anything related to the geospatial data for each image. Obviously both do the extraction when the images are uploaded, so I'm guessing that it's also stored somewhere too ;-) Adam Sent from my iPhone On Apr 12, 2011, at 4:00 PM, Péter Király kirun...@gmail.com wrote: Hi, I indexed Flickr into Lucene about 3 years ago. There is a Flickr API, which covers almost everything you need (as I remember, not every Flickr feature was exposed in the API at that time; collections, for example, were not searchable). You can harvest by user ID or by searching for a topic. You can use a language library (PHP, Java etc.) to wrap the details of the communication. You may want to merge information into one entity before sending it to Solr (like merging the user, collection and set info into each picture). The last step is to transform this information into a Solr document (again, either directly or with a language library). I am not sure if this helps you, but if you ask a more specific question, I'll try to answer. regards, Péter 2011/4/12 Estrada Groups estrada.adam.gro...@gmail.com: Has anyone tried doing this? Got any tips for someone getting started? Thanks, Adam Sent from my iPhone
Vetting Our Architecture: 2 Repeaters and Slaves.
I am hoping to get some feedback on the architecture I've been planning for a medium- to high-volume site. This is my first time working with Solr, so I want to be sure what I'm planning isn't totally weird, unsupported, etc. We've got a pair of F5 load balancers and 4 hosts. 2 of those hosts will be repeaters (master+slave), and 2 of those hosts will be pure slaves. One of the F5 vips, Index-vip, will have members HOST1 and HOST2, but HOST2 will be downed and not taking traffic from that vip. The second vip, Search-vip, will have 3 members: HOST2, HOST3, and HOST4. The Index-vip is intended to be used to post and commit index changes. The Search-vip is intended to be customer facing. Here is some ASCII art. The line with the X's thru it denotes a downed member of a vip, one that isn't taking any traffic. The M: denotes the value in the solrconfig.xml that the host uses as the master.
  Index-vip          Search-vip
   /     \           /    |    \
  /       X         /     |     \
 /         \       /      |      \
HOST1     HOST2     HOST3     HOST4
REPEATER  REPEATER  SLAVE     SLAVE
M:Index-vip M:Index-vip M:Index-vip M:Index-vip
I've been working through a couple of failure scenarios. Recovering from a failure of HOST2, HOST3, or HOST4 is pretty straightforward. Losing HOST1 is my major concern. My plan for recovering from a failure of HOST1 is as follows: Enable HOST2 as a member of the Index-vip, while disabling member HOST1. HOST2 effectively becomes the master. HOST2, 3, and 4 continue fielding customer requests and pulling indexes from Index-vip. Since HOST2 is now in charge of crunching indexes and fielding customer requests, I assume load will increase on that box. When we recover HOST1, we will simply make sure it has replicated against Index-vip and then re-enable HOST1 as a member of the Index-vip and disable HOST2. Hopefully this makes sense. If all goes correctly, I've managed to keep all services up and running without losing any index data. So, I have a few questions: 1. Has anyone else tried this dual-repeater approach? 2. Am I going to have any semaphore/blocking issues if a repeater is pulling index data from itself? 3. Is there a better way to do this? Thanks, Parker
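For reference, the promotion/demotion steps above can also be driven through Solr's replication handler rather than only at the load balancer (a sketch; it assumes Solr 1.4+ replication, and the host names are illustrative):
  # On the old master, stop serving index versions (if it is still reachable)
  curl 'http://host1:8080/solr/replication?command=disablereplication'
  # On the promoted repeater, make sure it serves indexes to the slaves
  curl 'http://host2:8080/solr/replication?command=enablereplication'
  # On any slave, verify what was last replicated and from where
  curl 'http://host3:8080/solr/replication?command=details'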
Re: Vetting Our Architecture: 2 Repeaters and Slaves.
I think the repeaters are misleading you a bit here. The purpose of a repeater is usually to replicate across a slow network, say in a remote data center, so that slaves at that center can get more timely updates. I don't think they add anything to your disaster recovery scenario, so I'll ignore repeaters for a bit here. The only difference between a master and a slave is a bit of configuration, and usually you'll allocate, say, memory differently on the two machines when you start the JVM. You might disable caches on the master (since they're only used for searching). Let's say I have master M and slaves S1, S2, S3. The slaves have an up-to-date index as of the last replication (just like your repeater would have). If any slave goes down, you can simply bring up another machine as a slave, point it at your master, wait for replication on that slave, and then let your load balancer know it's there. This is the HOST2-4 failure you outlined. Should the master fail, you have two choices, depending upon how long you can wait for *new* content to be searchable. Let's say you can wait half a day in this situation. Spin up a new machine, copy the index over from one of the slaves (via a simple copy or by replicating), point your indexing process at the new master, point your slaves at it for replication, and you're done. Let's say you can't wait very long at all (and remember, this had better be quite a rare event). Then you could take a slave (let's say S1) out of the loop that serves searches, copy in the configuration files you use for your masters, point the indexer and searchers at it, and you're done. Now spin up a new slave as above and your old configuration is back. Note that in two of these cases you temporarily have 2 slaves doing the work that 3 used to, so a bit of over-capacity may be in order. But a really good question here is how to be sure all your data is in your index. After all, the slaves (and repeater, for that matter) are only current up to the last replication. The simplest thing to do is simply re-index everything from the last known commit point. Assuming you have a uniqueKey defined, if you index documents that are already in the index, they'll just be replaced, no harm done. So let's say your replication interval is 10 minutes (picking a number from thin air). When your system is back and you restart your indexer, restart indexing from, say, one hour before the time you noticed your master went down. You can be more deterministic than this by examining the log on the machine you're using to replace the master, noting the last replication time, and subtracting your hour (or whatever) from that. Anyway, hope I haven't confused you unduly! The take-away is that a slave can be made into a master as fast as a repeater can, the replication process is the same, and I just don't see what a repeater buys you in the scenario you described. Best Erick On Tue, Apr 12, 2011 at 6:33 PM, Parker Johnson parker_john...@gap.com wrote: I am hoping to get some feedback on the architecture I've been planning for a medium- to high-volume site. This is my first time working with Solr, so I want to be sure what I'm planning isn't totally weird, unsupported, etc. We've got a pair of F5 load balancers and 4 hosts. 2 of those hosts will be repeaters (master+slave), and 2 of those hosts will be pure slaves. One of the F5 vips, Index-vip, will have members HOST1 and HOST2, but HOST2 will be downed and not taking traffic from that vip. 
The second vip, Search-vip, will have 3 members: HOST2, HOST3, and HOST4. The Index-vip is intended to be used to post and commit index changes. The Search-vip is intended to be customer facing. Here is some ASCII art. The line with the X's thru it denotes a downed member of a vip, one that isn't taking any traffic. The M: denotes the value in the solrconfig.xml that the host uses as the master.
  Index-vip          Search-vip
   /     \           /    |    \
  /       X         /     |     \
 /         \       /      |      \
HOST1     HOST2     HOST3     HOST4
REPEATER  REPEATER  SLAVE     SLAVE
M:Index-vip M:Index-vip M:Index-vip M:Index-vip
I've been working through a couple of failure scenarios. Recovering from a failure of HOST2, HOST3, or HOST4 is pretty straightforward. Losing HOST1 is my major concern. My plan for recovering from a failure of HOST1 is as follows: Enable HOST2 as a member of the Index-vip, while disabling member HOST1. HOST2 effectively becomes the master. HOST2, 3, and 4 continue fielding customer requests and pulling indexes from Index-vip. Since HOST2 is now in charge of crunching
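A note on the "point your slaves at the new master" step: it doesn't necessarily require editing solrconfig.xml on every slave. A sketch using the replication handler (Solr 1.4+; host names are illustrative):
  # One-off pull from an explicitly named master; masterUrl overrides the configured value
  curl 'http://slave1:8983/solr/replication?command=fetchindex&masterUrl=http://new-master:8983/solr/replication'
  # Check the slave's view of its master and the last replication time
  curl 'http://slave1:8983/solr/replication?command=details'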
Re: Solr and Permissions
ManifoldCF sounds like it might be the right solution, so long as it's not secretly building a filter query in the back end; otherwise it will hit the same limits. In the meantime, I have made a minor improvement to my filter query: it now scans the permitted IDs and attempts to build a filter query using ranges (e.g. instead of 1 OR 2 OR 3 it will filter using [1 TO 3]), which will hopefully keep me going for now. Liam On 12 March 2011 01:46, go canal goca...@yahoo.com wrote: Thank you Jan, I will take a look at the ManifoldCF. So it seems that the solution is basically to implement something outside of Solr for permission control. thanks, canal From: Jan Høydahl jan@cominvent.com To: solr-user@lucene.apache.org Sent: Fri, March 11, 2011 4:17:22 PM Subject: Re: Solr and Permissions Hi, Talk to the ManifoldCF guys - they have successfully implemented support for document-level security for many repositories, including CMS/ECMs, and may have some hints for you to write your own Authority connector against your system, which will fetch the ACL for the document and index it with the document itself. This eliminates long query-time filters. Re-indexing content whose ACLs have changed is a very common way of doing this, and you should not worry too much about performance implications before there is a real issue. In the real world, you don't change folder permissions very often, and that will be a cost you'll have to live with. If you worry that this lag between repository state and index state may cause people to see content they are not entitled to, it is possible to do late-binding filtering of the result set as well, but I would avoid that if possible. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com On 11. mars 2011, at 06.48, go canal wrote: To be fair, I think there is a slight difference between a content management system and a search engine. Access control at the per-document level, at the per-type level, supporting dynamic role changes, etc. are more like content management use cases, whereas a search solution like Solr focuses on a different set of use cases. But in the real world, any content management system needs full-text search, so the question is how to support search with permission control. JackRabbit is integrated with Lucene/Tika; this could be one solution, but I do not know its performance and scalability. CouchDB also integrates with Lucene/Tika - another option? I have yet to see a search engine that provides the sort of content management features we are discussing here (Solr, Elastic Search?). Then the last option is probably to build an application that works with a document repository providing all the necessary content management features, plus Solr providing the search capability, and to handle the permissions outside Solr? thanks, canal From: Liam O'Boyle liam.obo...@intelligencebank.com To: solr-user@lucene.apache.org Cc: go canal goca...@yahoo.com Sent: Fri, March 11, 2011 2:28:19 PM Subject: Re: Solr and Permissions As canal points out, grouping into types is not always possible. In our case, permissions are not on a per-type level, but either per folder (of which there can be hundreds) or per item in some cases (of which there can be... any number at all). Reindexing is also too slow to really be an option; some of the items use Tika to extract content, which means that we need to re-extract the content (a variable length of time; the average is about half a second, but on some documents it will sit there until the connection times out). 
Querying it, modifying it, then resubmitting without rerunning content extraction is still faster, but involves sending even more data over the network; either way is relatively slow. Liam On 11 March 2011 16:24, go canal goca...@yahoo.com wrote: I have similar requirements. Content type is one solution, but there are also other use cases where this is not enough. Another requirement is that when the access permission is changed, we need to update the field - my understanding is that we cannot, unless we re-index the whole document again. Am I correct? thanks, canal From: Sujit Pal sujit@comcast.net To: solr-user@lucene.apache.org Sent: Fri, March 11, 2011 10:39:27 AM Subject: Re: Solr and Permissions How about assigning content types to documents in the index, and mapping users to a set of content types they are allowed to access? That way you will pass in fewer parameters in the fq. -sujit On Fri, 2011-03-11 at 11:53 +1100, Liam O'Boyle wrote: Morning, We use solr to index a range of content to which, within our application, access is restricted by a system of user groups and permissions. In order to ensure that search results don't reveal information about items which the user doesn't have access to, we need to
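A sketch of the range-compressed filter Liam describes (the field name id and the ID values are illustrative; POSTing the parameters keeps long filters clear of URL-length limits):
  # Runs of permitted IDs collapse into range clauses instead of long OR chains
  curl 'http://localhost:8983/solr/select' \
    --data-urlencode 'q=annual report' \
    --data-urlencode 'fq=id:([1 TO 3] OR 7 OR [9 TO 12])'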
Re: Vetting Our Architecture: 2 Repeaters and Slaves.
Hi Parker, Lovely ASCII art. :) Yes, I think you can simplify this by introducing shared storage (e.g., a SAN) that hosts the index to which your active/primary master writes. When your primary master dies, you start your stand-by master, which is configured to point to the same index. If there are any left-over index locks from the primary master, they can be removed when Solr starts (there is a property for that in solrconfig.xml; a sketch of the manual equivalent follows this message). Your Index-vip can then be pointed to the new master. Slaves talk to the master via the Index-vip, so they hardly notice this. And since the index is on the SAN, your slaves could actually point to that same index and avoid the whole replication process, thus removing one more moving piece, plus eliminating the OS-cache-unfriendly disk IO caused by index replication as a bonus feature. Repeaters are handy for DR (replication to a second DC) or when you have so many slaves that their (very frequent) replication requests and the actual index replication are too much for a single master, but it doesn't sound like you need them here, unless you really want to mirror your index (or even the whole cluster setup) in a second DC. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Parker Johnson parker_john...@gap.com To: solr-user@lucene.apache.org solr-user@lucene.apache.org Sent: Tue, April 12, 2011 6:33:08 PM Subject: Vetting Our Architecture: 2 Repeaters and Slaves. I am hoping to get some feedback on the architecture I've been planning for a medium- to high-volume site. This is my first time working with Solr, so I want to be sure what I'm planning isn't totally weird, unsupported, etc. We've got a pair of F5 load balancers and 4 hosts. 2 of those hosts will be repeaters (master+slave), and 2 of those hosts will be pure slaves. One of the F5 vips, Index-vip, will have members HOST1 and HOST2, but HOST2 will be downed and not taking traffic from that vip. The second vip, Search-vip, will have 3 members: HOST2, HOST3, and HOST4. The Index-vip is intended to be used to post and commit index changes. The Search-vip is intended to be customer facing. Here is some ASCII art. The line with the X's thru it denotes a downed member of a vip, one that isn't taking any traffic. The M: denotes the value in the solrconfig.xml that the host uses as the master.
  Index-vip          Search-vip
   /     \           /    |    \
  /       X         /     |     \
 /         \       /      |      \
HOST1     HOST2     HOST3     HOST4
REPEATER  REPEATER  SLAVE     SLAVE
M:Index-vip M:Index-vip M:Index-vip M:Index-vip
I've been working through a couple of failure scenarios. Recovering from a failure of HOST2, HOST3, or HOST4 is pretty straightforward. Losing HOST1 is my major concern. My plan for recovering from a failure of HOST1 is as follows: Enable HOST2 as a member of the Index-vip, while disabling member HOST1. HOST2 effectively becomes the master. HOST2, 3, and 4 continue fielding customer requests and pulling indexes from Index-vip. Since HOST2 is now in charge of crunching indexes and fielding customer requests, I assume load will increase on that box. When we recover HOST1, we will simply make sure it has replicated against Index-vip and then re-enable HOST1 as a member of the Index-vip and disable HOST2. Hopefully this makes sense. If all goes correctly, I've managed to keep all services up and running without losing any index data. So, I have a few questions: 1. Has anyone else tried this dual-repeater approach? 2. 
Am I going to have any semaphore/blocking issues if a repeater is pulling index data from itself? 3. Is there a better way to do this? Thanks, Parker
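If the stand-by master comes up before that solrconfig.xml property is in place, the stale lock can also be cleared by hand (a sketch; the SAN path is illustrative, the lock file name assumes the default lock factory, which keeps write.lock inside the index directory, and the solrconfig.xml property Otis refers to is presumably unlockOnStartup):
  # A master that died mid-write leaves a Lucene write lock behind on the shared index
  ls /san/solr/data/index/*.lock
  # Remove it only when you are certain no other process is writing to this index
  rm -f /san/solr/data/index/write.lock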