Re: how to get row no. of current record
On Wed, Aug 3, 2011 at 9:35 AM, Ranveer ranveer.s...@gmail.com wrote: Hi Anshum, Thanks for the reply. My requirement is to get results starting from the current id. For this I need to set start rows. I am looking for something like Jonty's post: http://lucene.472066.n3.nabble.com/previous-and-next-rows-of-current-record-td3187935.html [...] Jonathan's replies (the last two) in that thread pretty much tell you what to do: your web app has to keep track of where you are while paging through the list of results. This should not be overly difficult to implement. Regards, Gora
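Jonathan's advice above — keep track of the position in the app — can be sketched in a few lines. The function names below are invented for illustration; the params map uses only Solr's standard start/rows parameters:

```python
def neighbor_offsets(current_index, total):
    """Given the 0-based index of the current record in a result list,
    return the Solr `start` offsets for the previous, current and next
    records (None where no neighbor exists)."""
    prev_idx = current_index - 1 if current_index > 0 else None
    next_idx = current_index + 1 if current_index + 1 < total else None
    return prev_idx, current_index, next_idx

def solr_params(q, start):
    # One-document page: fetch exactly the record at offset `start`.
    return {"q": q, "start": start, "rows": 1}
```

The app stores `current_index` (e.g. in the session or the URL) and issues one-row queries for whichever neighbor the user asks for.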
Re: Why Slop doens't match anything?
On Wed, Aug 3, 2011 at 1:33 AM, Alexander Ramos Jardim alexander.ramos.jar...@gmail.com wrote: [...] I am not using dismax. I didn't find the solution for the problem. I just made a full-import and the problem ended. Still odd. [...] Maybe you changed the type of the field in question, or changed positionIncrementGap for the field, in between? Regards, Gora
Re: Jetty error message regarding EnvEntry in WebAppContext
On Tue, Aug 2, 2011 at 18:42, Jonathan Rochkind rochk...@jhu.edu wrote: You know that the Solr distro comes with a Jetty with a Solr in it, right, as an example application? Even if you don't want to use it for some reason, that would probably be the best model to look at for a working Jetty with Solr. Sure, I know about the pre-configured Jetty, and that one runs fine on the command line. Or is the problem that you want a different version of Jetty? What I actually wanted is a robust background service with an init script. As it happens, I just recently set up a Jetty 6.1.26 for another project, not for Solr. It was kind of a pain, not being too familiar with Java deployment or Jetty. But I did get JNDI working by following the Jetty instructions here: http://docs.codehaus.org/display/JETTY/JNDI (It was a bit confusing to figure out what they were talking about, not being familiar with Jetty, but eventually I got it, and the instructions were correct.) I can imagine. I'll probably try to hand that task over to someone who does have a clue. :) Thanks for your response! Marian
Re: performance crossover between single index and sharding
On 02.08.2011 21:00, Shawn Heisey wrote: ... I did try some early tests with a single large index. Performance was pretty decent once it got warmed up, but I was worried about how it would perform under a heavy load, and how it would cope with frequent updates. I never really got very far with testing those fears, because the full rebuild time was unacceptable - at least 8 hours. The source database can keep up with six DIH instances reindexing at once, which completes much quicker than a single machine grabbing the entire database. I may increase the number of shards after I remove virtualization, but I'll need to fix a few limitations in my build system. ...

First, thanks a lot for all the answers; here is my setup. I know that it is very difficult to give specific recommendations about this. Having changed from FAST Search to Solr, I can state that Solr performs very well, if not excellently. At the risk of comparing apples and oranges, here is my previous FAST Search setup:
- one master server (controlling, logging, search dispatcher)
- six index servers (4.25 million docs per server, 5 slices per index), searching and indexing at the same time, indexing once per week during the weekend
- each server has 4GB RAM; all servers are physical, on separate machines; RAM usage is controlled by the processes
- a total of 25.5 million docs (mainly metadata) from 1500 databases worldwide
- index size is about 67GB per indexer -- about 402GB total
- about 3 qps at peak times, with an average search time of 0.05 seconds at peak times

And here is my current Solr setup:
- one master server (indexing only)
- two slave servers (search only), but only one is online; the second is a fallback
- each server has 32GB RAM; all servers are virtual (the master on a separate physical machine, both slaves together on one physical machine)
- RAM usage is currently 20GB for the Java heap
- a total of 31 million docs (all metadata) from 2000 databases worldwide
- index size is 156GB total
- the search handler statistics report 0.6 average requests per second
- average time per request is 39.5 (is that seconds?)
- building the index from scratch takes about 20 hours

The good thing is that I have the ability to compare a commercial enterprise product with open source. I started with my simple Solr setup because of KISS (keep it simple and stupid). It is actually doing an excellent job as a single index on a single virtual server. But the average time per request should be reduced now; that's why I started this discussion. While searches with a smaller Solr index (3 million docs) showed that it can stand with FAST Search, it now shows that it's time to go with sharding. I think we are already far beyond the point of search performance crossover.

What I hope to get with sharding:
- reduce the time for building the index
- reduce the average time per request

What I fear with sharding:
- I currently have master/slave; do I then have e.g. 3 masters and 3 slaves?
- the query changes because of sharding (is there a search distributor?)
- how to distribute the content to the indexers with DIH on 3 servers?
- anything else to think about while changing to sharding?

Conclusion:
- Solr can handle much more than 30 million docs of metadata in a single index if the Java heap size is large enough. Keep an eye on Lucene's fieldCache and sorted fields, especially title (string) fields.
- The crossover in my case is somewhere between 3 million and 10 million docs per index for Solr (compared to FAST Search). FAST recommends about 3 to 6 million docs per 4GB-RAM server for their system.

Anyone able to allay my fears about sharding? Thanks again for all your answers. Regards Bernd -- * BASE - Bielefeld Academic Search Engine - www.base-search.net *
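On Bernd's question about a search distributor: in distributed Solr of this era, any core can act as the dispatcher if the request carries a shards parameter listing the other cores; Solr fans the query out and merges the results itself. A minimal sketch of building such a request (host names are placeholders):

```python
from urllib.parse import urlencode

def sharded_query_url(base, q, shards):
    """Build a distributed-search URL: the core at `base` fans the query
    out to every host:port/core listed in `shards` and merges results."""
    params = {"q": q, "shards": ",".join(shards)}
    return f"{base}/select?{urlencode(params)}"
```

Any of the shards (or a dedicated "empty" core) can serve as `base`, so no separate dispatcher process is required.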
Re: performance crossover between single index and sharding
OK, here is a brief on our sharded setup. We have 10 shards, 3 per high-end Amazon machine. The majority of searches are done on at most 2 shards, the ones that have the latest data in their indices. We use logical sharding, not hash-based. These two facts lead to a situation where, given a user query that *will for sure* hit the 2 last (or adjacent-in-time) shards, the other Solr shards would have to search in vain. Therefore, we have implemented a query router, which is essentially Solr itself with modifications in the QueryComponent. Before implementing the router it was nearly impossible to run the system. Why did we do the sharding? Simply because we started to see a lot of OOM exceptions and various other instability issues. Also, we had to rebuild the index very often due to changes in the preceding pipeline, so distributing over shards was another asset for us in the sense that reindexing could be carried out in parallel. On top of that, and certainly not least, our search became faster the slimmer we kept the shards. We don't yet have a master/slave architecture; that will come when the user base grows. We started with growing amounts of data, hence horizontal scaling came first. Regards, Dmitry Kan twitter.com/DmitryKan On Wed, Aug 3, 2011 at 12:24 PM, Bernd Fehling bernd.fehl...@uni-bielefeld.de wrote: [...]
Dispatching a query to multiple different cores
Hello there! I have a multicore Solr with 6 different simple cores with somewhat different schemas, and I defined another meta core which I would like to be a dispatcher: requests are sent to the simple cores and results are aggregated before being sent back to the user. Any ideas or hints on how I can achieve this? I am wondering whether a custom SearchComponent or a custom SearchHandler would be good entry points. Is it possible to access the other SolrCores which are in the same container as the meta core? Many thanks for your help. Boubaker
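As a rough illustration of what the meta core would do after fanning the query out, here is a sketch of score-ordered aggregation over in-memory stand-ins for the per-core results. The result shape is invented; a real SearchComponent would work with DocLists from SolrCores obtained via the shared CoreContainer:

```python
import heapq

def aggregate(results_per_core, rows=10):
    """Merge per-core result lists (each already sorted by descending
    score) into one list of the `rows` highest-scoring docs overall."""
    merged = heapq.merge(*results_per_core, key=lambda d: -d["score"])
    return list(merged)[:rows]
```

Because each core returns its hits already sorted, a k-way merge is enough; normalizing scores across cores with different schemas is a separate (and harder) problem.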
Re: Why Slop doens't match anything?
Hm... No. 2011/8/3 Gora Mohanty g...@mimirtech.com On Wed, Aug 3, 2011 at 1:33 AM, Alexander Ramos Jardim alexander.ramos.jar...@gmail.com wrote: [...] I am not using dismax. I didn't find the solution for the problem. I just made a full-import and the problem ended. Still odd. [...] Maybe you changed the type of the field in question, or changed positionIncrementGap for the field, in between? Regards, Gora -- Alexander Ramos Jardim
Re: Why Slop doens't match anything?
Hm... No. Can you paste the output of debugQuery=on for the two queries?
RE: Joining on multi valued fields
Hi Yonik Sorry for my late reply. I have been trying to get to the bottom of this but I'm getting inconsistent behaviour. Here's an example:

Query = pi:rcs100 - here I'm going to use pid_rcs as the join value:

<result name="response" numFound="1" start="0">
  <doc>
    <str name="pi">rcs100</str>
    <str name="ct">rcs</str>
    <str name="pid_rcs">G1</str>
    <str name="name_rcs">Emerging Market Countries</str>
    <str name="definition_rcs">All business events relating to companies and other issuers of securities.</str>
  </doc>
</result>

Query = code:G1 - see how many docs have G1 in their code field. Notice that code is multi-valued:

<result name="response" numFound="2" start="0">
  <doc>
    <str name="ct">cat</str>
    <date name="maindocdate">2011-04-22T05:48:57Z</date>
    <str name="pin">CIF3wGpXk+1029782</str>
    <arr name="code">
      <str>G1</str><str>G7U</str><str>GK</str><str>ME7</str><str>ME8</str><str>MN</str><str>MR</str>
    </arr>
  </doc>
  <doc>
    <str name="ct">cat</str>
    <date name="maindocdate">2011-04-22T05:48:57Z</date>
    <str name="pin">CIF7YcLP+1029782</str>
    <arr name="code">
      <str>G1</str><str>G7U</str><str>GK</str><str>ME7</str><str>ME8</str><str>MN</str><str>MR</str>
    </arr>
  </doc>
</result>

Now for the join: http://10.15.39.137:8983/solr/file/select?q={!join from=pid_rcs to=code}pi:rcs100

<result name="response" numFound="3" start="0">
  <doc>
    <str name="ct">cat</str>
    <date name="maindocdate">2011-04-22T05:48:57Z</date>
    <str name="pin">CIF3wGpXk+1029782</str>
    <arr name="code">
      <str>G1</str><str>G7U</str><str>GK</str><str>ME7</str><str>ME8</str><str>MN</str><str>MR</str>
    </arr>
  </doc>
  <doc>
    <str name="ct">cat</str>
    <date name="maindocdate">2011-04-22T05:48:57Z</date>
    <str name="pin">CIF7YcLP+1029782</str>
    <arr name="code">
      <str>G1</str><str>G7U</str><str>GK</str><str>ME7</str><str>ME8</str><str>MN</str><str>MR</str>
    </arr>
  </doc>
  <doc>
    <str name="ct">cat</str>
    <date name="maindocdate">2011-04-22T05:48:58Z</date>
    <str name="pin">CN1763203+1029782</str>
    <arr name="code">
      <str>A2</str><str>A5</str><str>A9</str><str>AN</str><str>B125</str><str>B126</str><str>B130</str><str>BL63</str><str>G41</str><str>GK</str><str>MZ</str>
    </arr>
  </doc>
</result>

So, as you can see, I get back 3 results when only 2 match the criteria, i.e. docs where G1 is present in the multi-valued code field.
Why should the last document be included in the result of the join? Thank you, Matt -Original Message- From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley Sent: 01 August 2011 18:28 To: solr-user@lucene.apache.org Subject: Re: Joining on multi valued fields On Mon, Aug 1, 2011 at 12:58 PM, matthew.fow...@thomsonreuters.com wrote: I have been using the JOIN patch https://issues.apache.org/jira/browse/SOLR-2272 with great success. However I have hit a case where it doesn't seem to be working. It doesn't seem to work when joining to a multi-valued field. That should work (and the unit tests do test with multi-valued fields). Can you come up with a simple example where you are not getting the expected results? -Yonik http://www.lucidimagination.com This email was sent to you by Thomson Reuters, the global news and information company. Any views expressed in this message are those of the individual sender, except where the sender specifically states them to be the views of Thomson Reuters.
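For reference, this is the behaviour Matt expects of {!join}: a to-side doc matches when its (possibly multi-valued) to field contains any from-field value of the from-side matches. A sketch over abbreviated versions of Matt's docs shows that only 2 docs should match, which is why numFound=3 looks like a bug:

```python
def join(from_docs, from_field, to_docs, to_field):
    """Return the to_docs whose `to_field` (possibly multi-valued)
    contains any `from_field` value of the matching from_docs."""
    keys = set()
    for d in from_docs:
        v = d.get(from_field)
        keys.update(v if isinstance(v, list) else [v])

    def hit(d):
        v = d.get(to_field)
        vals = v if isinstance(v, list) else [v]
        return any(x in keys for x in vals)

    return [d for d in to_docs if hit(d)]
```

Running this over the example: pi:rcs100 yields pid_rcs=G1, and only the two CIF* docs carry G1 in their code arrays; the CN1763203 doc does not.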
Re: changing the root directory where solrCloud stores info inside zookeeper File system
Thanks - let me try and do this here manually later today and get back to you. - Mark

On Aug 2, 2011, at 7:41 AM, Yatir Ben Shlomo wrote: Thanks a lot Mark. Since my SolrCloud code was old, I tried downloading and building the newest code from https://svn.apache.org/repos/asf/lucene/dev/trunk/ I am using Tomcat 6. I manually created the sc sub-directory in my ZooKeeper ensemble file system and used this connection string to my ZK ensemble: zook1:2181/sc,zook2:2181/sc,zook3:2181/sc but I still get the same problem. Here is the entire catalina.out log with the exception:

Using CATALINA_BASE: /opt/tomcat6
Using CATALINA_HOME: /opt/tomcat6
Using CATALINA_TMPDIR: /opt/tomcat6/temp
Using JRE_HOME: /usr/java/default/
Using CLASSPATH: /opt/tomcat6/bin/bootstrap.jar
Java HotSpot(TM) 64-Bit Server VM warning: Failed to reserve shared memory (errno = 12).
Aug 2, 2011 4:28:46 AM org.apache.catalina.core.AprLifecycleListener init
INFO: The APR based Apache Tomcat Native library which allows optimal performance in production environments was not found on the java.library.path: /usr/java/jdk1.6.0_21/jre/lib/amd64/server:/usr/java/jdk1.6.0_21/jre/lib/amd64:/usr/java/jdk1.6.0_21/jre/../lib/amd64:/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
Aug 2, 2011 4:28:46 AM org.apache.coyote.http11.Http11Protocol init
INFO: Initializing Coyote HTTP/1.1 on http-8983
Aug 2, 2011 4:28:46 AM org.apache.coyote.http11.Http11Protocol init
INFO: Initializing Coyote HTTP/1.1 on http-8080
Aug 2, 2011 4:28:46 AM org.apache.catalina.startup.Catalina load
INFO: Initialization processed in 448 ms
Aug 2, 2011 4:28:46 AM org.apache.catalina.core.StandardService start
INFO: Starting service Catalina
Aug 2, 2011 4:28:46 AM org.apache.catalina.core.StandardEngine start
INFO: Starting Servlet Engine: Apache Tomcat/6.0.29
Aug 2, 2011 4:28:46 AM org.apache.catalina.startup.HostConfig deployDescriptor
INFO: Deploying configuration descriptor solr1.xml
Aug 2, 2011 4:28:46 AM org.apache.solr.core.SolrResourceLoader locateSolrHome
INFO: Using JNDI solr.home: /home/tomcat/solrCloud1
Aug 2, 2011 4:28:46 AM org.apache.solr.core.SolrResourceLoader init
INFO: Solr home set to '/home/tomcat/solrCloud1/'
Aug 2, 2011 4:28:46 AM org.apache.solr.servlet.SolrDispatchFilter init
INFO: SolrDispatchFilter.init()
Aug 2, 2011 4:28:46 AM org.apache.solr.core.SolrResourceLoader locateSolrHome
INFO: Using JNDI solr.home: /home/tomcat/solrCloud1
Aug 2, 2011 4:28:46 AM org.apache.solr.core.CoreContainer$Initializer initialize
INFO: looking for solr.xml: /home/tomcat/solrCloud1/solr.xml
Aug 2, 2011 4:28:46 AM org.apache.solr.core.CoreContainer init
INFO: New CoreContainer 853527367
Aug 2, 2011 4:28:46 AM org.apache.solr.core.SolrResourceLoader locateSolrHome
INFO: Using JNDI solr.home: /home/tomcat/solrCloud1
Aug 2, 2011 4:28:46 AM org.apache.solr.core.SolrResourceLoader init
INFO: Solr home set to '/home/tomcat/solrCloud1/'
Aug 2, 2011 4:28:46 AM org.apache.solr.cloud.SolrZkServerProps getProperties
INFO: Reading configuration from: /home/tomcat/solrCloud1/zoo.cfg
Aug 2, 2011 4:28:46 AM org.apache.solr.core.CoreContainer initZooKeeper
INFO: Zookeeper client=zook1:2181/sc,zook2:2181/sc,zook3:2181/sc
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client environment:zookeeper.version=3.3.1-942149, built on 05/07/2010 17:14 GMT
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client environment:host.name=ob1079.nydc1.outbrain.com
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client environment:java.version=1.6.0_21
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client environment:java.vendor=Sun Microsystems Inc.
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client environment:java.home=/usr/java/jdk1.6.0_21/jre
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client environment:java.class.path=/opt/tomcat6/bin/bootstrap.jar
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client environment:java.library.path=/usr/java/jdk1.6.0_21/jre/lib/amd64/server:/usr/java/jdk1.6.0_21/jre/lib/amd64:/usr/java/jdk1.6.0_21/jre/../lib/amd64:/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client environment:java.io.tmpdir=/opt/tomcat6/temp
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client environment:java.compiler=NA
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client environment:os.name=Linux
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client environment:os.arch=amd64
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client environment:os.version=2.6.18-194.8.1.el5
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client
Re: about the Solr request filter
Can we see the queries you're running and the data you expect back? And an idea of the documents you're expecting to be matched, including the field definitions from your schema.xml for the fields in question. Are you using SolrJ? Just a URL in a browser? What do you mean by manually? It might help to review: http://wiki.apache.org/solr/UsingMailingLists Best Erick 2011/7/28 于浩 yuhao.1...@qq.com: Hello, dear friends, I have run into a problem developing with Solr. My application must send multiple queries to the Solr server after the page is loaded. I found a problem: some requests return statusCode:0 and QTime:0; Solr has accepted the request, but it does not return a result document. If I send each request one by one manually, it returns the results. But if I send the requests frequently within a very short time, it returns nothing but statusCode:0 and QTime:0. I think this may be a strategy of Solr's, but I can't find any documents or discussions about it on the internet, so I hope you can help me. -- Surely, you will always be the best!
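While the root cause is being diagnosed, one client-side workaround consistent with 于浩's observation ("one by one works, bursts don't") is to serialize the page's queries instead of firing them concurrently. A sketch, where the wrapped send function is a stand-in for whatever performs the HTTP request:

```python
import threading

class SerializedClient:
    """Wrap a query function so that concurrent callers take turns,
    sending at most one request to Solr at a time."""
    def __init__(self, send):
        self._send = send              # e.g. a function doing the HTTP GET
        self._lock = threading.Lock()

    def query(self, q):
        with self._lock:               # callers queue up here
            return self._send(q)
```

This only masks the symptom; if the server is genuinely dropping bursty requests, the container's connection/thread limits are the place to look.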
Re: Possible to use quotes in dismax qf?
Did you look at phrase fields (pf) in dismax? Best Erick On Thu, Jul 28, 2011 at 11:26 AM, O. Klein kl...@octoweb.nl wrote: I removed the post as it might confuse people. But because my analysers combine 2 words into a phrase query (using shingles and a position filter) and I use dismax, I need q to be the original query plus the original query as a phrase query. That way the combined words are also highlighted and I get the results I need. It seems qf is not the place to do this, though. Is there any way to do this in Solr? -- View this message in context: http://lucene.472066.n3.nabble.com/Re-Possible-to-use-quotes-in-dismax-qf-tp3206891p3206986.html Sent from the Solr - User mailing list archive at Nabble.com.
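The transformation O. Klein describes — send the original query plus the original query as a phrase — can be done in the app before the request reaches Solr; a naive sketch (quote and escaping handling is deliberately simplistic):

```python
def add_phrase(q):
    """Append the whole query as a quoted phrase, so the shingled form
    produced by the analysis chain also gets matched and highlighted."""
    q = q.strip()
    # Single terms and already-quoted queries are left alone.
    if " " in q and not (q.startswith('"') and q.endswith('"')):
        return f'{q} "{q}"'
    return q
```

This is essentially what dismax's pf parameter does server-side against the fields listed in pf, which is usually the cleaner option.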
Re: dealing with so many different sorting options
Well, you're kind of stuck, unfortunately. It's pretty much required that you reindex when you add new fields if you want existing documents to have that field. I don't think there's any good way to use the DB to sort Solr results that would be performant. About using Solr as your data source: I'm not sure what you mean here. Solr is many things, but it's not intended to be a data store. You can essentially store the entire DB in the Solr index, though; there's nothing wrong with that. Admittedly, re-indexing is a pain, but I suspect that as your app matures you'll find yourself adding fields less and less often. Sorry I can't offer better suggestions, but that's the nature of developing apps <G>. Best Erick On Fri, Jul 29, 2011 at 3:36 PM, Jason Toy jason...@gmail.com wrote: As I'm using Solr more and more, I'm finding that I need to do searches and then order by new criteria. So I am constantly adding new fields into Solr and then reindexing everything. I want to know if adding all this data into Solr is the normal way to deal with sorting. I'm finding that I have almost a whole copy of my database in Solr. Should I be pulling all the data out of Solr and then sorting in my database? That solution seems like it would take too long. Could/should I just move to Solr as my primary store so I can query directly against it without having to reindex all the time? Right now we store about 50 million docs, but the size is growing pretty fast and it is a pain to reindex everything every time I add a new column to sort by.
Re: Looking for a senior search engineer
Here's a page where you can find hired guns that you might be interested in... http://wiki.apache.org/solr/Support Best Erick On Fri, Jul 29, 2011 at 8:59 PM, Michael Economy mich...@goodreads.com wrote: Hi, Sorry if this isn't the right place for this message, but it's a very specific role we're looking for and I'm not sure where else to find Solr experts! I was wondering if anyone would be interested, or knew anyone who would be interested, in working on goodreads.com's search. We're using Solr, and we'd like someone with experience doing: solr-replication, faceted search, and more cool stuff. We run Ruby on Rails for the website. Potential applicants don't need to know Ruby or Rails, but they'd be expected to pick it up after starting. More info on our website: http://.goodreads.com/about/us Michael Economy Director of Engineering, Goodreads Inc.
Re: Different Access Permissions?
Sure, it's possible. It's just that you have to do the work yourself <G>... You could define a series of request handlers for various classes of user and route the request to the correct handler based on that user's attributes. You could construct the query manually based on the user's attributes. You could create a page with drop-downs for various fields based on the user's attributes... But the common thread here is that you have to do all the logic in the app, outside Solr, that routes the query to the correct place in Solr based on user attributes that are also outside Solr. If this is irrelevant, perhaps you could explain the use case in a bit more depth? Best Erick On Mon, Aug 1, 2011 at 2:36 AM, deniz denizdurmu...@gmail.com wrote: Hi all, here comes the problem... Let's say that I have a document with different fields. Is it possible to let some users query the documents only partially? Like this: the document has name, age, occupation, and country fields. UserA can search within the name and country fields, while UserB can search the whole document... Is this possible? - Zeki ama calismiyor... Calissa yapar... -- View this message in context: http://lucene.472066.n3.nabble.com/Different-Access-Permissions-tp3215190p3215190.html Sent from the Solr - User mailing list archive at Nabble.com.
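Erick's "construct the query manually based on the user's attributes" could look like the sketch below; the role names and field lists are invented for the example, and only standard dismax parameters (defType, q, qf) are used:

```python
# Hypothetical per-role lists of the fields each class of user may search.
ALLOWED = {
    "limited": ["name", "country"],
    "full":    ["name", "age", "occupation", "country"],
}

def restricted_params(role, term):
    """Build dismax params that only search the fields a role may see."""
    fields = ALLOWED[role]
    return {"defType": "dismax", "q": term, "qf": " ".join(fields)}
```

The key point, as Erick says, is that the role lookup happens in the app; Solr just receives a qf it has no reason to question.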
Re: Joining on multi valued fields
Hmmm, if these are real responses from a Solr server at rest (i.e. documents not being changed between queries) then what you show definitely looks like a bug. That's interesting, since TestJoin implements a random test that should cover cases like this pretty well. I assume you are using a version of trunk (4.0-dev) and not just the actual patch attached to the JIRA issue (which IIRC had at least one bug... SOLR-2521). Have you tried a more recent version of trunk? -Yonik http://www.lucidimagination.com On Wed, Aug 3, 2011 at 7:00 AM, matthew.fow...@thomsonreuters.com wrote: [...]
Re: Update some fields for all documents: LUCENE-1879 vs. ParallelReader .FilterIndex
How are these fields used? Because if they're not used for searching, you could put them in their own core and rebuild that index at your whim, then query that core when you need the relationship information. If you have a DB backing your system, you could perhaps store the info there and query that (but I like the second core better <G>)... But if you could use a separate index just for the relationships, you wouldn't have to deal with the slow re-indexing of all the docs... Best Erick On Mon, Aug 1, 2011 at 4:12 AM, karsten-s...@gmx.de wrote: Hi Lucene/Solr folks, Issue: Our documents are stable except for two fields which are used for linking between the docs. So we would like to update these two fields in a batch once a month (possibly once a week). We cannot reindex all docs once a month, because we are using XeLDA in some fields for stemming (morphological analysis), and XeLDA is slow. We have 14 million docs (less than 100GB for the main index and 3GB for these two changeable fields). In the next half year we will be migrating our search engine from Verity K2 to Solr, so we could wait for Solr 4.0 (btw, any news about http://lucene.472066.n3.nabble.com/Release-schedule-Lucene-4-td2256958.html ?).

Solution? Our issue is exactly the purpose of ParallelReader. But Solr does not support ParallelReader (for a good reason: http://lucene.472066.n3.nabble.com/Vertical-Partitioning-advice-td494623.html#a494624 ). So I see two possible ways to solve our issue:

1. Wait for the new parallel incremental indexing ( https://issues.apache.org/jira/browse/LUCENE-1879 ) and hope that Solr will integrate it.
Pro:
- nothing to do for us except waiting.
Contra:
- I did not find anything of the (old) patch in the current trunk.

2. Change the Lucene index below/without Solr in a batch:
a) Each month, generate a new index with only our two changed fields (e.g. with DIH)
b) Use FilterIndex and ParallelReader to mock a correct index
c) "Merge" this mock index into a new index (via IndexWriter.addIndexes(IndexReader...))
Pro:
- The patch for https://issues.apache.org/jira/browse/LUCENE-1812 should be a good example of how to do this.
Contra:
- The relation between docId and document index order is not a guaranteed feature of DIH (e.g. we will have to split the main index to ensure that no merge occurs in/after DIH).
- To run this batch, Solr has to be stopped and restarted.
- Even if we know that our two fields should change only for a subset of the docs, we nevertheless have to reindex these two fields for all the docs.

Any comments, hints or tips? Is there a third (better) way to solve our issue? Is there already a working example of the second solution? Will LUCENE-1879 (parallel incremental indexing) be part of Solr 4.0? Best regards Karsten
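Erick's second-core suggestion amounts to a client-side join keyed on the document id: fetch docs from the main (stable) core, then look up the two link fields in the cheap-to-rebuild relationship core. A sketch over in-memory stand-ins for the two cores (field names are invented):

```python
def enrich(main_docs, link_core_docs):
    """Attach the two link fields, held in a separate core keyed by doc
    id, to documents fetched from the main core."""
    by_id = {d["id"]: d for d in link_core_docs}
    out = []
    for doc in main_docs:
        links = by_id.get(doc["id"], {})
        out.append({**doc,
                    "links_out": links.get("links_out", []),
                    "links_in":  links.get("links_in", [])})
    return out
```

Rebuilding only the relationship core each month leaves the XeLDA-analyzed main index untouched, which is the whole point of the split.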
RE: Joining on multi valued fields
No I haven't. I will get the latest out of the trunk and report back. Cheers again, Matt

-Original Message- From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley Sent: 03 August 2011 14:51 To: Fowler, Matthew (Markets Eikon) Cc: solr-user@lucene.apache.org Subject: Re: Joining on multi valued fields

Hmmm, if these are real responses from a solr server at rest (i.e. documents not being changed between queries) then what you show definitely looks like a bug. That's interesting, since TestJoin implements a random test that should cover cases like this pretty well. I assume you are using a version of trunk (4.0-dev) and not just the actual patch attached to the JIRA issue (which IIRC had at least one bug... SOLR-2521). Have you tried a more recent version of trunk? -Yonik http://www.lucidimagination.com

On Wed, Aug 3, 2011 at 7:00 AM, matthew.fow...@thomsonreuters.com wrote: Hi Yonik Sorry for my late reply. I have been trying to get to the bottom of this but I'm getting inconsistent behaviour. Here's an example:

Query = pi:rcs100 - Here I'm going to use pid_rcs as the join value:

<result name="response" numFound="1" start="0">
  <doc>
    <str name="pi">rcs100</str>
    <str name="ct">rcs</str>
    <str name="pid_rcs">G1</str>
    <str name="name_rcs">Emerging Market Countries</str>
    <str name="definition_rcs">All business events relating to companies and other issuers of securities.</str>
  </doc>
</result>

Query = code:G1 - See how many docs have G1 in their code field. Notice that code is multi-valued:

<result name="response" numFound="2" start="0">
  <doc>
    <str name="ct">cat</str>
    <date name="maindocdate">2011-04-22T05:48:57Z</date>
    <str name="pin">CIF3wGpXk+1029782</str>
    <arr name="code">
      <str>G1</str><str>G7U</str><str>GK</str><str>ME7</str><str>ME8</str><str>MN</str><str>MR</str>
    </arr>
  </doc>
  <doc>
    <str name="ct">cat</str>
    <date name="maindocdate">2011-04-22T05:48:57Z</date>
    <str name="pin">CIF7YcLP+1029782</str>
    <arr name="code">
      <str>G1</str><str>G7U</str><str>GK</str><str>ME7</str><str>ME8</str><str>MN</str><str>MR</str>
    </arr>
  </doc>
</result>

Now for the join: http://10.15.39.137:8983/solr/file/select?q={!join from=pid_rcs to=code}pi:rcs100

<result name="response" numFound="3" start="0">
  <doc>
    <str name="ct">cat</str>
    <date name="maindocdate">2011-04-22T05:48:57Z</date>
    <str name="pin">CIF3wGpXk+1029782</str>
    <arr name="code">
      <str>G1</str><str>G7U</str><str>GK</str><str>ME7</str><str>ME8</str><str>MN</str><str>MR</str>
    </arr>
  </doc>
  <doc>
    <str name="ct">cat</str>
    <date name="maindocdate">2011-04-22T05:48:57Z</date>
    <str name="pin">CIF7YcLP+1029782</str>
    <arr name="code">
      <str>G1</str><str>G7U</str><str>GK</str><str>ME7</str><str>ME8</str><str>MN</str><str>MR</str>
    </arr>
  </doc>
  <doc>
    <str name="ct">cat</str>
    <date name="maindocdate">2011-04-22T05:48:58Z</date>
    <str name="pin">CN1763203+1029782</str>
    <arr name="code">
      <str>A2</str><str>A5</str><str>A9</str><str>AN</str><str>B125</str><str>B126</str><str>B130</str><str>BL63</str><str>G41</str><str>GK</str><str>MZ</str>
    </arr>
  </doc>
</result>

So as you can see I get back 3 results when only 2 match the criteria, i.e. docs where G1 is present in the multi-valued code field. Why should the last document be included in the result of the join? Thank you, Matt

-Original Message- From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley Sent: 01 August 2011 18:28 To: solr-user@lucene.apache.org Subject: Re: Joining on multi valued fields On Mon, Aug 1, 2011 at 12:58 PM, matthew.fow...@thomsonreuters.com wrote: I have been using the JOIN patch https://issues.apache.org/jira/browse/SOLR-2272 with great success. However I have hit a case where it doesn't seem to be working.
It doesn't seem to work when joining to a multi-valued field. That should work (and the unit tests do test with multi-valued fields). Can you come up with a simple example where you are not getting the expected results? -Yonik http://www.lucidimagination.com This email was sent to you by Thomson Reuters, the global news and information company. Any views expressed in this message are those of the individual sender, except where the sender specifically states them to be the views of Thomson Reuters.
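For reference, Matt's join request can be generated programmatically. This is a minimal sketch using only Python's standard library to escape the {!join} local params; the host, core, and field names come from his example above, while wt=json is an added assumption:

```python
from urllib.parse import urlencode

# Find docs whose multi-valued "code" field contains the pid_rcs value
# of docs matching pi:rcs100 (Matt's join example).
params = {"q": "{!join from=pid_rcs to=code}pi:rcs100", "wt": "json"}
url = "http://10.15.39.137:8983/solr/file/select?" + urlencode(params)
print(url)
```

urlencode takes care of the characters ({, !, :, spaces) that would otherwise break the raw URL shown in the thread.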
Re: performance crossover between single index and sharding
Replies inline. On 8/3/2011 2:24 AM, Bernd Fehling wrote: To show that I am comparing apples and oranges, here is my previous FAST Search setup:
- one master server (controlling, logging, search dispatcher)
- six index servers (4.25 million docs per server, 5 slices per index), searching and indexing at the same time, indexing once per week during the weekend
- each server has 4GB RAM; all servers are physical, on separate machines
- RAM usage controlled by the processes
- total of 25.5 million docs (mainly metadata) from 1500 databases worldwide
- index size is about 67GB per indexer -- about 402GB total
- about 3 qps at peak times
- with an average search time of 0.05 seconds at peak times

An average query time of 50 milliseconds isn't too bad. If the number from your Solr setup below (39.5) is the QTime, then Solr thinks it is performing better, but Solr's QTime does not include absolutely everything that has to happen. Do you by chance have 95th and 99th percentile query times for either system?

And here is now my current Solr setup:
- one master server (indexing only)
- two slave servers (search only), but only one is online; the second is fallback
- each server has 32GB RAM; all servers are virtual (master on a separate physical machine, both slaves together on a physical machine)
- RAM usage is currently 20GB for the Java heap
- total of 31 million docs (all metadata) from 2000 databases worldwide
- index size is 156GB total
- the search handler statistics report 0.6 average requests per second
- average time per request 39.5 (is that seconds?)
- building the index from scratch takes about 20 hours

I can't tell whether you mean that each physical host has 32GB or each VM has 32GB. You want to be sure that you are not oversubscribing your memory. If you can get more memory in your machines, you really should. Do you know whether that 0.6 seconds is most of the delay that a user sees when making a search request, or are there other things going on that contribute more delay?

In our webapp, the Solr request time is usually small compared with everything else the server and the user's browser are doing to render the results page. As much as I hate being the tall pole in the tent, I look forward to the day when the developers can change that balance.

The good thing is I have the ability to compare a commercial enterprise product with open source. I started with my simple Solr setup because of KISS (keep it simple, stupid). Actually it is doing excellently as a single index on a single virtual server. But the average time per request should be reduced now; that's why I started this discussion. While searches against a smaller Solr index (3 million docs) showed that it could stand with FAST Search, it now shows that it's time to go with sharding. I think we are already far beyond the point of search performance crossover.

What I hope to get with sharding:
- reduce the time for building the index
- reduce the average time per request

You will probably achieve both of these things by sharding, especially if you have a lot of CPU cores available. Like mine, your query volume is very low, so the CPU cores are better utilized distributing the search.

What I fear with sharding:
- I currently have master/slave; do I then have e.g. 3 masters and 3 slaves?
- the query changes because of sharding (is there a search distributor?)
- how to distribute the content to the indexers with DIH on 3 servers?
- anything else to think about while changing to sharding?

I think sharding is probably a good idea for you, as long as you don't lose redundancy. You can duplicate the FAST concept of a master server in a Solr core with no index. The solrconfig.xml for that core needs to include the shards parameter. That core combined with those shards will make up one complete index chain, and you need to have at least two complete chains, running on separate physical hardware. A load balancer will be critical.
I use two small VMs on separate hosts with heartbeat and haproxy for mine. Thanks, Shawn
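The shards parameter Shawn mentions can also be supplied per request rather than baked into solrconfig.xml: the core receiving the query fans it out to the listed shards and merges the results. A minimal sketch with hypothetical host and core names, using Python's stdlib to build such a distributed request:

```python
from urllib.parse import urlencode

# Hypothetical shard hosts -- substitute your own. The "shards" parameter
# tells the receiving core which cores to query and merge results from.
shards = ["solr1:8983/solr/core1",
          "solr2:8983/solr/core2",
          "solr3:8983/solr/core3"]
params = {"q": "metadata:database", "shards": ",".join(shards)}
url = "http://solr1:8983/solr/core1/select?" + urlencode(params)
print(url)
```

Note that the shard entries are host:port/path with no http:// prefix, and the client only ever talks to the one coordinating core.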
Strategies for sorting by array, when you can't sort by array?
Hi all- Well, this is a problem. I have a list of names as a multi-valued field and I am searching on this field and need to return the results sorted. I know from searching and reading the documentation (and getting the error) that sorting on a multi-valued field isn't possible. Okay, so, what I haven't found is any real good solution/workaround to the problem. I was wondering what strategies others have done to overcome this particular situation; collapsing the individual names into a single field with copyField doesn't work because the name searched may not be the first name in the field. Thanks for any hints/tips/tricks. Ron DISCLAIMER: This electronic message, including any attachments, files or documents, is intended only for the addressee and may contain CONFIDENTIAL, PROPRIETARY or LEGALLY PRIVILEGED information. If you are not the intended recipient, you are hereby notified that any use, disclosure, copying or distribution of this message or any of the information included in or with it is unauthorized and strictly prohibited. If you have received this message in error, please notify the sender immediately by reply e-mail and permanently delete and destroy this message and its attachments, along with any copies thereof. This message does not create any contractual obligation on behalf of the sender or Law Bulletin Publishing Company. Thank you.
Re: Strategies for sorting by array, when you can't sort by array?
Although you weren't very clear about it, it sounds as if you want the results to be sorted by a name that actually matched the query? In general that is not going to be easy, since it is not something that can be computed in advance and thus indexed. -Mike
RE: Strategies for sorting by array, when you can't sort by array?
Right, the search term is the sort field. I can manually sort an individual page, but when the user clicks on the next page, the sort is reset, visually.
Dismax mm per field
Hello, Is there a way to apply (e)dismax mm parameter per field? If I have a query field1:(blah blah) AND field2:(foo bar) is there a way to apply mm only to field2? Thanks, Dmitriy -- View this message in context: http://lucene.472066.n3.nabble.com/Dismax-mm-per-field-tp3222594p3222594.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Strategies for sorting by array, when you can't sort by array?
Hi Ron. This is an interesting problem you have. One idea would be to create an index with the entity relationship going in the other direction. So instead of one-to-many, go many-to-one. You would end up with multiple documents with varying names but repeated parent entity information -- perhaps simply using just an ID which is used as a lookup. Do a search on this name field, sorting by a non-tokenized variant of the name field. Use result grouping to consolidate multiple matches of a name to the same parent document. This whole idea might very well be academic, since duplicating all the parent entity information for searching on it too might be more than you care to bother with. And I don't think Solr 4's join feature addresses this use case. In the end, I think Solr could be modified to support this, with some work. It would make a good feature request in JIRA. ~ David Smiley
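David's inversion idea amounts to a document transform at indexing time: each parent document with a multi-valued name field becomes one child document per name, each carrying only the parent's ID for lookup, so the child docs can be searched and sorted on a single-valued name field. A hedged sketch of that transform (the field names id/names/parent_id are illustrative, not a Solr API):

```python
# Flatten parent docs with a multi-valued "names" field into one child
# doc per name; result grouping on parent_id would then consolidate
# multiple name matches back to the same parent.
def invert(parents):
    children = []
    for parent in parents:
        for name in parent["names"]:
            children.append({"parent_id": parent["id"], "name": name})
    return children

parents = [{"id": "p1", "names": ["Smith, Al", "Jones, Bo"]}]
print(invert(parents))
# → [{'parent_id': 'p1', 'name': 'Smith, Al'}, {'parent_id': 'p1', 'name': 'Jones, Bo'}]
```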
Re: Update some fields for all documents: LUCENE-1879 vs. ParallelReader .FilterIndex
Hi Erick, our two changeable fields are used for linking between documents at the application level. From the Lucene point of view they are just two searchable fields, with a stored term vector for one of them. Our queries will use one of these fields and a couple of fields from the stable fields. So the question is really about updating two fields in an existing Lucene index with more than fifty other fields. Best regards Karsten

P.S. about our linking between documents: Our two fields are called outgoingLinks and possibleIncomingLinks. Our source documents have an abstract and a couple of metadata fields. We are using regular expressions to find outgoing links in this abstract. This means a couple of words which indicate 1. that the author made a reference (like "in my previous work published as 'Very important Article' in Nature 2010, 12 page 7") 2. that this reference contains metadata pointing to another document. Each of these links is transformed to a special key (2010NaturNr12Page7). On the other side, we transform the metadata to all possible keys. This key generation grows with our knowledge of possible link patterns. For the Lucene indexer this is a black box: there is a service which produces the keys for outgoing and possibleIncoming links from our source (xml-)documents; these keys must be searchable in Lucene/Solr.

P.P.S. in context: http://lucene.472066.n3.nabble.com/Update-some-fields-for-all-documents-LUCENE-1879-vs-ParallelReader-amp-FilterIndex-td3215398.html

-------- Original message -------- Date: Wed, 3 Aug 2011 09:57:03 -0400 From: Erick Erickson erickerick...@gmail.com To: solr-user@lucene.apache.org Subject: Re: Update some fields for all documents: LUCENE-1879 vs. ParallelReader .FilterIndex How are these fields used? Because if they're not used for searching, you could put them in their own core and rebuild that index at your whim, then query that core when you need the relationship information.
If you have a DB backing your system, you could perhaps store the info there and query that (but I like the second core better G).. But if you could use a separate index just for the relationships, you wouldn't have to deal with the slow re-indexing of all the docs... Best Erick
RE: Strategies for sorting by array, when you can't sort by array?
*Sigh*...I had thought maybe reversing it would work, but that would require creating a whole new index, on a separate core, as the existing index is used for other purposes. Plus, given the volume of data, that would be a big deal, update-wise. What would be better would be to remove that particular sort option-button on the webpage. ;) I'll create a Jira issue, but in the meanwhile I'll have to come up with something else. I guess I didn't realize how much of a corner case this problem is. :) Thanks for the suggestions! Ron
Re: Dismax mm per field
There is not, and the way dismax works makes it not really that feasible in theory, sadly. One thing you could do instead is combine multiple separate dismax queries using the nested query syntax. This will affect your relevancy ranking, possibly in odd ways, but anything that accomplishes 'mm per field' will necessarily not really be using dismax's disjunction-max relevancy ranking in the way it's intended. Here's how you could combine two separate dismax queries:

defType=lucene
q=_query_:"{!dismax qf=field1 mm=100%}blah blah" AND _query_:"{!dismax qf=field2 mm=80%}foo bar"

That whole q value would need to be properly URI escaped, which I haven't done here for human readability. Dismax has always got an mm; there's no way to not have an mm with dismax, but mm=100% might be what you mean. Of course, one of those queries could also not be dismax at all, but the ordinary lucene query parser or anything else. And of course you could have the same query text for the nested queries, repeating e.g. blah blah in both.
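As noted, the combined q value has to be URI-escaped before it goes on the wire. A minimal sketch with Python's stdlib, using the field names and mm values from the example above (the localhost URL is illustrative):

```python
from urllib.parse import quote

# Two nested dismax queries, each with its own mm, escaped for the URL.
q = ('_query_:"{!dismax qf=field1 mm=100%}blah blah" AND '
     '_query_:"{!dismax qf=field2 mm=80%}foo bar"')
escaped = quote(q, safe="")  # percent-encode everything, including % and {
url = "http://localhost:8983/solr/select?defType=lucene&q=" + escaped
print(url)
```

Passing safe="" matters here: the query text contains "%" (in mm=100%) and ":" characters that must not be left unescaped inside a query-string value.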
Re: Strategies for sorting by array, when you can't sort by array?
There's no great way to do this. I understand your problem as: it's a multi-valued field, but you want to sort on whichever of those values matched the query, not on the values that didn't. (Not entirely clear what to do if the documents are in the result set because of a match in an entirely different field!) I would sometimes like to do that too, and haven't really been able to come up with any great way to do it. Something involving faceting kind of gets you closer, but ends up being a huge pain and doesn't get you (or at least me) all the way to supporting the interface I'd really want.
Re: Strategies for sorting by array, when you can't sort by array?
Not so much that it's a corner case in the sense of being unusual necessarily (I'm not sure); it's just something that fundamentally doesn't fit well into Lucene's architecture. I'm not sure that filing a JIRA will be much use: it's really unclear how one would get Lucene to do this, it would be significant work to do, and it's unlikely any Solr developer is going to decide to spend significant time on it unless they need it for their own clients.
Re: Solr request filter and indexing process
Aha, I have found the root cause: Solr has been returning results properly all along. The root cause is the SolrPHPClient, which by default uses the file_get_contents function to connect to Solr; this function is not stable and often returns an HTTP status error. Thanks to everybody who gave me help. Good luck to you! 2011/8/2 Chris Hostetter hossman_luc...@fucit.org : thanks for the reply. This is tomcat log files on my Solr Server: : I found that if the server returns status=0 and QTime=0, the SolrPhpClient : will throw an Exception. But the same query string will not always return : status=0 and QTime=0. The query string is valid, I have tested it in Solr. I know nothing about PHP, but if your client code is throwing an exception anytime status=0 and QTime=0, then it sounds like a bug in your client code -- there is no reason why those two numbers being 0 should be considered an error. It just means the request was processed in under a millisecond. -Hoss
A rant about field collapsing
I am working on an implementation of search within our application using solr. About 2 months ago we had the need to group results by a certain field. After some searching I came across the JIRA in progress for this - field collapsing: https://issues.apache.org/jira/browse/SOLR-236 It was scheduled for the next solr release and had a full set of proper JIRA subtasks and patch files of almost complete implementations attached. So as you can imagine I was happy to apply this patch, build it into our application, and wait for the next release when it would be part of the main trunk. Now imagine my surprise when we have come around to upgrade, only to see that field collapsing has suddenly been thrown away in favour of a totally different grouping implementation: https://issues.apache.org/jira/browse/SOLR-2524 How was it decided that this would be used instead? It was not made very clear that LUCENE-1421 was in progress, which would effectively make the field collapsing work irrelevant by fixing the problem in lucene rather than primarily in solr. This has cost me days of work to somehow merge our custom changes onto the new implementation. I guess it is my own fault for basing our custom changes on an unresolved enhancement, but as SOLR-236 had been 3-4 years in progress and SOLR-2524 did not exist at the time, it seemed pretty safe to assume that the same problem was not being fixed in 2 totally different ways! -- View this message in context: http://lucene.472066.n3.nabble.com/A-rant-about-field-collapsing-tp3222798p3222798.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Dismax mm per field
Thanks Jonathan. I thought it would be possible via nested queries but somehow could not get it to work. I'll give it another shot. On Wed, Aug 3, 2011 at 12:32 PM, Jonathan Rochkind [via Lucene] ml-node+3222792-952640420-221...@n3.nabble.com wrote: There is not, and the way dismax works makes it not really that feasible in theory, sadly. One thing you could do instead is combine multiple separate dismax queries using the nested query syntax. This will possibly affect your relevancy ranking in odd ways, but anything that accomplishes 'mm per field' will necessarily not really be using dismax's disjunction-max relevancy ranking in the way it's intended. Here's how you could combine two separate dismax queries: defType=lucene q=_query_:"{!dismax qf=field1 mm=100%}blah blah" AND _query_:"{!dismax qf=field2 mm=80%}foo bar" That whole q value would need to be properly URI-escaped, which I haven't done here for human readability. Dismax has always got an mm; there's no way to not have an mm with dismax, but mm=100% might be what you mean. Of course, one of those queries could also not be dismax at all, but the ordinary lucene query parser or anything else. And of course you could have the same query text for the nested queries, e.g. repeating blah blah in both. On 8/3/2011 11:24 AM, Dmitriy Shvadskiy wrote: Hello, Is there a way to apply the (e)dismax mm parameter per field? If I have a query field1:(blah blah) AND field2:(foo bar) is there a way to apply mm only to field2? Thanks, Dmitriy -- View this message in context: http://lucene.472066.n3.nabble.com/Dismax-mm-per-field-tp3222594p3222594.html Sent from the Solr - User mailing list archive at Nabble.com. 
-- View this message in context: http://lucene.472066.n3.nabble.com/Dismax-mm-per-field-tp3222594p3222851.html Sent from the Solr - User mailing list archive at Nabble.com.
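To make Jonathan's suggestion concrete, here is a small sketch of how such a combined query might be assembled and URI-escaped on the client side. The field names, mm values, and search terms are just the illustrative ones from the thread, and the class and helper method are made up for this example; it builds the raw q parameter value rather than using any particular Solr client library.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class NestedDismaxQuery {

    // Builds one nested dismax clause. Note the quotes around the
    // local-params expression and its query text.
    static String dismaxClause(String qf, String mm, String terms) {
        return "_query_:\"{!dismax qf=" + qf + " mm=" + mm + "}" + terms + "\"";
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Combine two independent dismax queries, each with its own mm.
        String q = dismaxClause("field1", "100%", "blah blah")
                 + " AND "
                 + dismaxClause("field2", "80%", "foo bar");
        // The whole q value must be URI-escaped before being sent to Solr.
        System.out.println("q=" + URLEncoder.encode(q, "UTF-8"));
    }
}
```

As Jonathan notes, defType=lucene is assumed so the outer AND is parsed by the standard query parser while each _query_ clause gets its own dismax parser.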
Records skipped when using DataImportHandler
Hi, I am a newbie to Solr and have been trying to learn using DataImportHandler. I have a query in data-config.xml that fetches about 5 records when I fire it in SQL Query Manager. However, when Solr does a full import, it is skipping 4 records and only importing 1 record. What could be the reason for that? My data-config.xml looks like this -

<dataConfig>
  <dataSource type="JdbcDataSource" name="GeoService" driver="net.sourceforge.jtds.jdbc.Driver" url="jdbc:jtds:sqlserver://10.168.50.104/ZipCodeLookup" user="sa" password="psiuser"/>
  <document>
    <entity name="city" query="select ll.cityId as id, ll.zip as zipCode, c.cityName as cityName, st.stateName as state, ct.countryName as country from latlonginfo ll, city c, state st, country ct where ll.cityId = c.cityID and c.stateID = st.stateID and st.countryID = ct.countryID order by ll.areacode" dataSource="GeoService">
      <field column="zipCode" name="zipCode"/>
      <field column="cityName" name="cityName"/>
      <field column="state" name="state"/>
      <field column="country" name="country"/>
    </entity>
  </document>
</dataConfig>

My fields definition in schema.xml looks as below -

<field name="CityName" type="text_general" indexed="true" stored="true"/>
<field name="zipCode" type="text_general" indexed="true" stored="true"/>
<field name="state" type="text_general" indexed="true" stored="true"/>
<field name="country" type="text_general" indexed="true" stored="true"/>

One observation I made was that the 1 record being indexed is the last record in the result set. I have verified that there are no duplicate records being retrieved. For example, if the result set from the database is -

zipcode  CityName    state  country
-------  ----------  -----  -------
91324    Northridge  CA     USA
91325    Northridge  CA     USA
91327    Northridge  CA     USA
91328    Northridge  CA     USA
91329    Northridge  CA     USA
91330    Northridge  CA     USA

the record being indexed is the last record all the time. Any suggestions are welcome. Thanks, Anand
Setting up Namespaces to Avoid Running Multiple Solr Instances
Hi, we run several independent websites on the same machines. Each site uses a similar codebase for search. Currently each site contacts its own solr server on a slightly different port. This means of course that we are running several solr servers (each on their own port) on the same machine. I would like to make this simpler by running just one server, listening on one port. Can we do this and at the same time have the indexes and search data separated for each web site? So, I'm asking if I can namespace or federate the solr server. But by doing so I would like to have the indexes etc. not commingled within the server. I'm new to solr, so there might be a hiccup from the fact that currently each solr server points to its own directory on a site-specific path (something like /apps/site/solr/*) which contains the solr plugin (we're using Ruby on Rails). Can this be set up as a namespace (one for each web site) within the single server instance? Mike
Re: Setting up Namespaces to Avoid Running Multiple Solr Instances
I think that Solr multi-core (nothing to do with CPU cores, just what it's called in Solr) is what you're looking for. http://wiki.apache.org/solr/CoreAdmin On 8/3/2011 2:25 PM, Mike Papper wrote: Hi, we run several independent websites on the same machines. Each site uses a similar codebase for search. Currently each site contacts its own solr server on a slightly different port. This means of course that we are running several solr servers (each on their own port) on the same machine. I would like to make this simpler by running just one server, listening on one port. Can we do this and at the same time have the indexes and search data separated for each web site? So, I'm asking if I can namespace or federate the solr server. But by doing so I would like to have the indexes etc. not commingled within the server. I'm new to solr, so there might be a hiccup from the fact that currently each solr server points to its own directory on a site-specific path (something like /apps/site/solr/*) which contains the solr plugin (we're using Ruby on Rails). Can this be set up as a namespace (one for each web site) within the single server instance? Mike
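For reference, a minimal multi-core solr.xml (the legacy format used by Solr of this era) might look like the following. This is only a sketch: the core names and instance directories are made up, and each instanceDir gets its own conf/ and data/, which is what keeps the per-site indexes separated.

```xml
<!-- solr.xml, placed in the solr home directory -->
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <!-- one core per website; each has its own schema, config, and index -->
    <core name="site1" instanceDir="site1"/>
    <core name="site2" instanceDir="site2"/>
  </cores>
</solr>
```

Requests would then go to per-core paths such as /solr/site1/select and /solr/site2/select on the single server and port.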
RE: question on solr.ASCIIFoldingFilterFactory
lboutros wrote: I used Spanish stemming, put the ASCIIFoldingFilterFactory before the stemming filter and added it in the query part too. Ludovic. My experiments with the French stemmer do not yield good results with this order. Applying the ASCIIFoldingFilterFactory before stemming confuses the language-specific stemmer. For example:

étranglée => ASCIIFoldingFilterFactory => etranglee => FrenchStemmer => etranglee
étranglé => ASCIIFoldingFilterFactory => etrangle => FrenchStemmer => etrangl
étranglée => FrenchStemmer => étrangl => ASCIIFoldingFilterFactory => etrangl
étranglé => FrenchStemmer => étrangl => ASCIIFoldingFilterFactory => etrangl

-- View this message in context: http://lucene.472066.n3.nabble.com/question-on-solr-ASCIIFoldingFilterFactory-tp2780463p3223314.html Sent from the Solr - User mailing list archive at Nabble.com.
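Following the ordering argued for above, a schema.xml analyzer that stems before folding might look like this. It is a sketch only: the fieldType name is illustrative, and SnowballPorterFilterFactory with language="French" is assumed here as the filter behind the FrenchStemmer in the examples.

```xml
<fieldType name="text_fr" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- stem first, so the stemmer sees the accented French forms intact -->
    <filter class="solr.SnowballPorterFilterFactory" language="French"/>
    <!-- fold accents afterwards: étrangl and etrangl collapse to the same term -->
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
</fieldType>
```

The same chain would go in both the index and query analyzers, so both sides normalize identically.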
Does solr support multiple index set
Hey, This might be a completely naive question. Could I create more than one index on a single instance of the Solr server? If so, how could I specify which schema to use and which index to use? I am planning to create 2 separate indexes using a single Solr server. The data that needs to be indexed comes from 2 disparate sources and has different schemas. I want to create 2 separate schemas, like <schema name="example1" version="1.4"> ... </schema> and <schema name="example2" version="1.4"> ... </schema>, and do all the regular operations (index, update, delete and query). Thanks, Sharath
Re: Does solr support multiple index set
Hello Sharath, Yes, you can create many indexes. See this article: http://wiki.apache.org/solr/CoreAdmin See you, Helton On Wed, Aug 3, 2011 at 4:55 PM, Sharath Jagannath shotsonclo...@gmail.com wrote: Hey, This might be a completely naive question. Could I create more than one index on a single instance of the Solr server? If so, how could I specify which schema to use and which index to use? I am planning to create 2 separate indexes using a single Solr server. The data that needs to be indexed comes from 2 disparate sources and has different schemas. I want to create 2 separate schemas, like <schema name="example1" version="1.4"> ... </schema> and <schema name="example2" version="1.4"> ... </schema>, and do all the regular operations (index, update, delete and query). Thanks, Sharath
Help with ShardParams
Hello, Can someone point me to a good example or two of the usage of the ShardParams shards.start and shards.rows? I have a Solr instance of 250M documents spread across 4 shards, and I need to be able to reliably and quickly access the records by page at the request of the user. I understand the searching limitation of distributed search when the start parameter gets high, and have recently found the ShardParams and was hoping that these might be of some use. Thanks, John
Is there anyway to sort differently for facet values?
Hi, guys, Is there any way to sort facet values differently? For example, sometimes I want to sort facet values by their values instead of by # of docs, and I want to be able to have a predefined order for certain facets as well. Is it possible to do that in Solr? Thanks, YH
Re: indexing taking very long time
What version of Solr are you using? If it's a recent version, then optimizing is not that essential; you can do it during off hours, perhaps nightly or weekly. As far as indexing speed, have you profiled your application to see whether it's Solr or your indexing process that's the bottleneck? A quick check would be to monitor the CPU utilization on the server and see if it's high. As far as multithreading, one option is to simply have multiple clients indexing simultaneously. But you haven't indicated how the indexing is being done. Are you using DIH? SolrJ? Streaming documents to Solr? You have to provide those kinds of details to get meaningful help. Best Erick On Aug 2, 2011 8:06 AM, Naveen Gupta nkgiit...@gmail.com wrote: Hi We have a requirement where we are indexing all the messages of a thread; a thread may have attachments too. We are adding them to Solr for indexing and searching, in order to apply a few business rules. For a user, we have many threads (almost 100k in number) and each thread may have 10-20 messages. Now what we are finding is that it is taking 30 mins to index all the threads. When we run optimize, it is faster. The question here is how frequently this optimize should be called, and when? Please note that we are following a commit strategy (that is, commit is called after every 10k threads); we are not calling commit after every doc. Secondly, how can we use multithreading, from a Solr perspective, in order to improve JVM and other utilization? Thanks Naveen
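Erick's "multiple clients indexing simultaneously" can be sketched as client-side batching over a thread pool. Everything here is illustrative: the batch size, thread count, document strings, and the sendBatch stub are all made up, and in a real SolrJ client sendBatch would send the batch to Solr (e.g. an add() call) with a single commit issued at the end rather than per batch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ParallelIndexer {
    static final int BATCH_SIZE = 1000; // docs per request; tune for your setup
    static final int THREADS = 4;       // concurrent indexing threads

    static final AtomicInteger indexed = new AtomicInteger();

    // Stand-in for one indexing request; a real client would send the
    // batch to Solr here instead of just counting documents.
    static void sendBatch(List<String> batch) {
        indexed.addAndGet(batch.size());
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(THREADS);
        List<String> batch = new ArrayList<String>();
        for (int i = 0; i < 10000; i++) { // pretend these are the thread messages
            batch.add("doc-" + i);
            if (batch.size() == BATCH_SIZE) {
                final List<String> toSend = batch;
                pool.submit(() -> sendBatch(toSend)); // hand full batches to the pool
                batch = new ArrayList<String>();
            }
        }
        if (!batch.isEmpty()) {
            final List<String> last = batch;
            pool.submit(() -> sendBatch(last)); // flush the final partial batch
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        // commit once here, not after every document
        System.out.println("indexed " + indexed.get());
    }
}
```

The design point, echoing the thread, is batching plus infrequent commits: each worker sends whole batches, and the commit happens once after the pool drains.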
Re: lucene/solr, raw indexing/searching
I predict you'll spend a lot of time on the admin/analysis page understanding what the various combinations of tokenizers and filters do. Because, you see, you already have differences, to wit: your Solr schema has LowercaseFilter and removeDuplicates. Have you determined *why* Solr indexing is slower? You might consider using SolrJ and firing multiple threads/processes at the issue to bring indexing performance up to acceptable levels and avoid this problem entirely. Best Erick On Aug 2, 2011 12:37 PM, Jonathan Rochkind rochk...@jhu.edu wrote: In your solr schema.xml, are the fields you are using defined as text fields with analyzers? It sounds like you want no analysis at all, which probably means you don't want text fields either; you just want string fields. That will make it impossible to search for individual tokens, though; searches will match only on complete matches of the value. I'm not quite sure how to do what you want; it depends on exactly what you want. What kind of searching do you expect to support? If you still do want tokenization, you'll still want some analysis... but I'm not quite sure how that corresponds to what you'd want to do on the lucene end. What you're trying to do is going to be inevitably confusing, I think. Which doesn't mean it's not possible. You might find it less confusing if you were willing to use Solr to index though, rather than straight lucene -- you could use Solr via the SolrJ java classes, rather than the HTTP interface. On 8/2/2011 11:14 AM, dhastings wrote: Hello, I am trying to get lucene and solr to agree on a completely raw indexing method. I use lucene in my indexers that write to an index on disk, and solr to search those indexes that I create, as creating the indexes without solr is much much faster than using the solr server. Are there settings for BOTH solr and lucene to use EXACTLY what's in the content, as opposed to interpreting what it thinks I'm trying to do? 
My content is extremely specific and needs no interpretation or adjustment, indexing or searching, a text field. For example: 203.1 seems to be indexed as 2031. Searching for 203.1 I can get to work correctly, but then it won't find what's indexed using 3.1's standard analyzer. If I have content that is: this is rev. 23.302, I need it indexed EXACTLY as it appears: this is rev. 23.302. I do not want any of solr's or lucene's attempts to fix my content or my queries. rev. needs to stay rev. and not turn into rev, and 23.302 needs to stay as such, and NOT turn into 23302. This is for BOTH indexing and searching. Any hints? Right now for indexing I have:

Set nostopwords = new HashSet();
nostopwords.add("buahahahahahaha");
Analyzer an = new StandardAnalyzer(Version.LUCENE_31, nostopwords);
writer = new IndexWriter(fsDir, an, MaxFieldLength.UNLIMITED);
writer.setUseCompoundFile(false);

and for searching I have in my schema:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

Thanks. Very much appreciated. -- View this message in context: http://lucene.472066.n3.nabble.com/lucene-solr-raw-indexing-searching-tp3219277p3219277.html Sent from the Solr - User mailing list archive at Nabble.com.
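One way to keep tokenized search while leaving tokens like rev. and 23.302 untouched, in line with Jonathan's point about choosing the analysis chain, is to split on whitespace only. This is a sketch under that assumption (the fieldType name is made up), not a tested answer to the poster's exact data:

```xml
<fieldType name="text_raw" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- splits only on whitespace; no lowercasing, no punctuation handling -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
```

On the Lucene indexing side, the matching counterpart would be WhitespaceAnalyzer in place of StandardAnalyzer, so both sides produce identical tokens.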
Re: MultiSearcher/ParallelSearcher - searching over multiple cores?
As far as I know, you're right. There's no built-in way to do what you want, especially since the fact that you're talking about different search fields implies that the scores from the documents aren't comparable anyway. How do you intend to combine the results for presentation to the user? Best Erick On Aug 2, 2011 5:11 PM, Ralf Musick ra...@gmx.de wrote:
Re: Update some fields for all documents: LUCENE-1879 vs. ParallelReader .FilterIndex
Hmmm, the only thing that comes to mind is the join feature being added to Solr 4.x, but I confess I'm not entirely familiar with that functionality, so I can't tell if it really solves your problem. Other than that I'm out of ideas, but then again it's late and I'm tired, so maybe I'm not being very creative <G>... Best Erick On Aug 3, 2011 11:40 AM, karsten-s...@gmx.de wrote:
Re: Records skipped when using DataImportHandler
Sorry, I'm on a restricted machine so can't get the precise URL. But there's a debug page for DIH that might allow you to see what the query actually returns. I'd guess one of two things: 1) you aren't getting the number of rows you think; 2) you aren't committing the documents you add. But that's just a guess. Best Erick On Aug 3, 2011 2:15 PM, anand sridhar anand.for...@gmail.com wrote: Hi, I am a newbie to Solr and have been trying to learn using DataImportHandler. I have a query in data-config.xml that fetches about 5 records when I fire it in SQL Query Manager. However, when Solr does a full import, it is skipping 4 records and only importing 1 record. What could be the reason for that? My data-config.xml looks like this -

<dataConfig>
  <dataSource type="JdbcDataSource" name="GeoService" driver="net.sourceforge.jtds.jdbc.Driver" url="jdbc:jtds:sqlserver://10.168.50.104/ZipCodeLookup" user="sa" password="psiuser"/>
  <document>
    <entity name="city" query="select ll.cityId as id, ll.zip as zipCode, c.cityName as cityName, st.stateName as state, ct.countryName as country from latlonginfo ll, city c, state st, country ct where ll.cityId = c.cityID and c.stateID = st.stateID and st.countryID = ct.countryID order by ll.areacode" dataSource="GeoService">
      <field column="zipCode" name="zipCode"/>
      <field column="cityName" name="cityName"/>
      <field column="state" name="state"/>
      <field column="country" name="country"/>
    </entity>
  </document>
</dataConfig>

My fields definition in schema.xml looks as below -

<field name="CityName" type="text_general" indexed="true" stored="true"/>
<field name="zipCode" type="text_general" indexed="true" stored="true"/>
<field name="state" type="text_general" indexed="true" stored="true"/>
<field name="country" type="text_general" indexed="true" stored="true"/>

One observation I made was that the 1 record being indexed is the last record in the result set. I have verified that there are no duplicate records being retrieved. 
For example, if the result set from the database is -

zipcode  CityName    state  country
-------  ----------  -----  -------
91324    Northridge  CA     USA
91325    Northridge  CA     USA
91327    Northridge  CA     USA
91328    Northridge  CA     USA
91329    Northridge  CA     USA
91330    Northridge  CA     USA

the record being indexed is the last record all the time. Any suggestions are welcome. Thanks, Anand
Re: Is there anyway to sort differently for facet values?
Have you looked at the facet.sort parameter? The index value is what I think you want. Best Erick On Aug 3, 2011 7:03 PM, Way Cool way1.wayc...@gmail.com wrote: Hi, guys, Is there any way to sort facet values differently? For example, sometimes I want to sort facet values by their values instead of by # of docs, and I want to be able to have a predefined order for certain facets as well. Is it possible to do that in Solr? Thanks, YH
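A small sketch of the request parameters involved: facet.sort takes count for by-frequency ordering and index for lexicographic by-value ordering. The field name category and the helper class here are made up for illustration.

```java
public class FacetSortParams {
    // Builds the facet-related query-string parameters for one field.
    static String facetParams(String field, String sort) {
        return "facet=true&facet.field=" + field + "&facet.sort=" + sort;
    }

    public static void main(String[] args) {
        // sort facet values lexicographically instead of by document count
        System.out.println(facetParams("category", "index"));
        // prints: facet=true&facet.field=category&facet.sort=index
    }
}
```

A predefined custom order, as asked about above, is not covered by facet.sort; that part would have to be handled client-side after the response comes back.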
Re: SEVERE: org.apache.solr.common.SolrException: Error loading class 'solr.ICUTokenizerFactory'
Guys, I am still stuck. Any help? Thanks, Satish On Tue, Aug 2, 2011 at 5:23 PM, Robert Muir rcm...@gmail.com wrote: Did you add the analysis-extras jar itself? That's what has this factory. On Tue, Aug 2, 2011 at 5:03 AM, Satish Talim satish.ta...@gmail.com wrote: I am using Solr 3.3 on a Windows box. I want to use the solr.ICUTokenizerFactory in my schema.xml and added the fieldType name="text_icu" as per the URL - http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ICUTokenizerFactory I also added the following files to my apache-solr-3.3.0\example\lib folder: lucene-icu-3.3.0.jar lucene-smartcn-3.3.0.jar icu4j-4_8.jar lucene-stempel-3.3.0.jar When I start my Solr server from the apache-solr-3.3.0\example folder: java -jar start.jar I get the following errors: SEVERE: org.apache.solr.common.SolrException: Error loading class 'solr.ICUTokenizerFactory' SEVERE: org.apache.solr.common.SolrException: analyzer without class or tokenizer filter list SEVERE: org.apache.solr.common.SolrException: Unknown fieldtype 'text_icu' specified on field subject I tried adding various other jar files to the lib folder but it does not help. What am I doing wrong? Satish
Highlighting does not works with uniqueField set
Hi, I am new to solr. I am facing an issue wherein the highlighting of the search results for matches is not working when I have set a unique field as: <uniqueKey>id</uniqueKey> If this is commented out then highlighting starts working. I need to have a unique field. Could someone please explain this erratic behaviour. I am setting this field while posting the documents to be indexed. Thanks Regards, Anand *** The Royal Bank of Scotland plc. Registered in Scotland No 90312. Registered Office: 36 St Andrew Square, Edinburgh EH2 2YB. Authorised and regulated by the Financial Services Authority. The Royal Bank of Scotland N.V. is authorised and regulated by the De Nederlandsche Bank and has its seat at Amsterdam, the Netherlands, and is registered in the Commercial Register under number 33002587. Registered Office: Gustav Mahlerlaan 350, Amsterdam, The Netherlands. The Royal Bank of Scotland N.V. and The Royal Bank of Scotland plc are authorised to act as agent for each other in certain jurisdictions. This e-mail message is confidential and for use by the addressee only. If the message is received by anyone other than the addressee, please return the message to the sender by replying to it and then delete the message from your computer. Internet e-mails are not necessarily secure. The Royal Bank of Scotland plc and The Royal Bank of Scotland N.V. including its affiliates (RBS group) does not accept responsibility for changes made to this message after it was sent. For the protection of RBS group and its clients and customers, and in compliance with regulatory requirements, the contents of both incoming and outgoing e-mail communications, which could include proprietary information and Non-Public Personal Information, may be read by authorised persons within RBS group other than the intended recipient(s). 
Whilst all reasonable care has been taken to avoid the transmission of viruses, it is the responsibility of the recipient to ensure that the onward transmission, opening or use of this message and any attachments will not adversely affect its systems or data. No responsibility is accepted by the RBS group in this regard and the recipient should carry out such virus and other checks as it considers appropriate. Visit our website at www.rbs.com ***
java.lang.IllegalStateException: Committed error in the logs
I am getting following error log on trying to search. Any idea why this error is coming. Search results are coming after a long delay. SEVERE: org.mortbay.jetty.EofException at org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:791) at org.mortbay.jetty.AbstractGenerator$Output.flush(AbstractGenerator.java:569) at org.mortbay.jetty.HttpConnection$Output.flush(HttpConnection.java:1012) at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:278) at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:122) at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:212) at org.apache.solr.common.util.FastWriter.flush(FastWriter.java:115) at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:344) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:265) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.Server.handle(Server.java:326) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) at 
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582) Caused by: java.net.SocketException: Connection reset at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:96) at java.net.SocketOutputStream.write(SocketOutputStream.java:136) at org.mortbay.io.ByteArrayBuffer.writeTo(ByteArrayBuffer.java:368) at org.mortbay.io.bio.StreamEndPoint.flush(StreamEndPoint.java:129) at org.mortbay.io.bio.StreamEndPoint.flush(StreamEndPoint.java:149) at org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:714) ... 25 more 2011-08-04 06:05:10.550:WARN::Committed before 500 null||org.mortbay.jetty.EofException|?at org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:791)|?at org.mortbay.jetty.AbstractGenerator$Output.flush(AbstractGenerator.java:569)|?at org.mortbay.jetty.HttpConnection$Output.flush(HttpConnection.java:1012)|?at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:278)|?at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:122)|?at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:212)|?at org.apache.solr.common.util.FastWriter.flush(FastWriter.java:115)|?at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:344)|?at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:265)|?at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)|?at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)|?at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)|?at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)|?at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)|?at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)|?at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)|?at 
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)|?at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)|?at org.mortbay.jetty.Server.handle(Server.java:326)|?at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)|?at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)|?at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)|?at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)|?at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)|?at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)|?at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)|Caused by:
csv responsewriter and numfound
Hi, Is there any way to get numFound from the CSV response format? Some parameter? Or shall I change the code of CSVResponseWriter for this? Thanks, Pooja