Group count in SOLR 3.3
Hi guys, we are using SOLR 3.3 with SolrJ inside our Java project. In the current version we had to add grouping support, so we add parameters to the SolrQuery object like this: query.setParam(GroupParams.GROUP, true); query.setParam(GroupParams.GROUP_MAIN, true); query.setParam(GroupParams.GROUP_FIELD, OUR_GROUP_FIELD); and we get a QueryResponse with the results we need. Awesome! But now I have one remaining problem: I don't know how to get the number of groups from the QueryResponse. I found that I must add the group.ngroups=true param to the query. So I did: query.setParam(GroupParams.GROUP_TOTAL_COUNT, true); But the QueryResponse seems the same. There's no method like "getGroupCount()" or a group count param in the header. Am I doing something wrong? Or is it a SOLR 3.3 problem? If we upgrade to a newer version, will it work? Thanks for any advice! Roman
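Not an authoritative answer, but for reference, here is a sketch (plain java.util, field name hypothetical) of the parameter set being built, i.e. what SolrJ puts on the wire. Two things worth noting: with group.main=true the response comes back as a flat document list, so there is no grouped section to carry a group count; and, if I remember right, group.ngroups support postdates 3.3, which would also explain the unchanged response.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class GroupParamsSketch {
    // Mirrors the query.setParam(...) calls above as a plain parameter map,
    // i.e. what actually gets sent to Solr on the request.
    static Map<String, String> groupingParams(String groupField) {
        Map<String, String> p = new LinkedHashMap<String, String>();
        p.put("group", "true");            // GroupParams.GROUP
        p.put("group.field", groupField);  // GroupParams.GROUP_FIELD
        p.put("group.ngroups", "true");    // GroupParams.GROUP_TOTAL_COUNT
        // GroupParams.GROUP_MAIN ("group.main") is deliberately left out:
        // with group.main=true the response is a flat doc list, so there is
        // no grouped section in which a group count could be reported.
        return p;
    }

    public static void main(String[] args) {
        System.out.println(groupingParams("category"));
    }
}
```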
Re: Weighted Search Results / Multi-Value Value's Not Aggregating Weight
Hey, Please disregard this, I worked out what the actual problem was. I am going to post another query with something else I discovered. Thanks :) David On 22/08/2012 7:24 PM, David Radunz wrote: Hey, I have been having some problems getting good search results when using weighting against many fields with multi-values. After quite a bit of testing, it seems to me that the problem (at least as far as my query is concerned) is that only one weighting is taken into account per field. For example, in a multi-value field if we have "Comedy" and "Romance" and have separate weightings for those - the one with the highest weighting is used (and not a combined weighting). Which means that searching for romantic comedy returns "Alvin and the Chipmunks" (Family, Children, Comedy). Query: facet=on&fl=id,name,matching_genres,score,url_path,url_key,price,special_price,small_image,thumbnail,sku,stock_qty,release_date&sort=score+desc,retail_rating+desc,release_date+desc&start=&q=**+-sku:"1019660"+-movie_id:"1805"+-movie_id:"1806"+(series_names_attr_opt_id:"454282"^9000+OR+cat_id:"22"^9+OR+cat_id:"248"^9+OR+cat_id:"249"^9+OR+matching_genres:"Comedy"^9+OR+matching_genres:"Romance"^7+OR+matching_genres:"Drama"^5)&fq=store_id:"1"+AND+avail_status_attr_opt_id:"available"+AND+(format_attr_opt_id:"372619")&rows=4 Now if I change matching_genres:"Romance"^7 to matching_genres:"Romance"^70 (adding a 0), suddenly the first result is "Sex and the City: The Movie / Sex and the City 2" (which ironically is "Drama", "Comedy", "Romance" - the very combination we are looking for). So is there a way to structure my query so that all of the multi-value values are treated individually, aggregating the weighting/score? Thanks in advance! David
Re: Solr - Index Concurrency - Is it possible to have multiple threads write to same index?
Thanks for the reply, Mikhail. For our needs, speed is more important than flexibility, and we have huge text files (e.g. blogs/articles ~2 MB in size) that need to be read from our filesystem and then stored in the index. Our app creates a separate core per client (dynamically), and there is one instance of EmbeddedSolrServer for each core that's used for adding documents to the index. Each document has about 10 fields, and one of the fields has ~2 MB of data stored (stored=true, analyzed=true). We also have logic built into our webapp to dynamically create the Solr config files (solrconfig & schema per core - filter/analyzer/handler values can be different for each core) for each core before creating an instance of EmbeddedSolrServer for that core. Another reason to go with EmbeddedSolrServer is to reduce the overhead of transporting large data (~2 MB) over http/xml. We use this setup for building our master index, which then gets replicated to slave servers using the replication scripts provided by Solr. We also have the Solr admin UI integrated into our webapp (using the admin JSPs & handlers from the Solr admin UI). We have been using this multi-core setup for more than a year now, and so far we haven't run into any issues with EmbeddedSolrServer integrated into our webapp. However, I am now trying to figure out the impact if we allow multiple threads to send requests to EmbeddedSolrServer (same core) for adding docs to the index simultaneously. Our understanding was that EmbeddedSolrServer would give us better performance than HTTP Solr for our needs. It's quite possible that we are wrong and HTTP Solr would have given us similar or better performance. Also, based on documentation from the Solr wiki, I am assuming that the EmbeddedSolrServer API is the same as the one used by HTTP Solr. That said, can you please tell me if there is any specific downside to using EmbeddedSolrServer that could cause issues for us down the line?
I am also interested in your comment below about indexing 1 million docs in a few minutes. Ideally we would like to get to that speed. I am assuming this depends on the size of the docs and the type of analyzer/tokenizer/filters being used. Correct? Can you please share (or point me to documentation on) how to get this speed for 1 million docs. >> - one million is a fairly small amount, in average it should be indexed >> in few mins. I doubt that you really need to distribute indexing Thanks -K -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Index-Concurrency-Is-it-possible-to-have-multiple-threads-write-to-same-index-tp4002544p4002776.html Sent from the Solr - User mailing list archive at Nabble.com.
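Not from the thread, but the multi-threaded add pattern under discussion can be sketched with plain java.util.concurrent; the IndexerStub below is a hypothetical stand-in for EmbeddedSolrServer (whose add() path is thread-safe, so one instance can be shared across worker threads):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ConcurrentAddSketch {
    // Stand-in for EmbeddedSolrServer.add(doc): just counts adds atomically.
    static class IndexerStub {
        final AtomicInteger added = new AtomicInteger();
        void add(String doc) { added.incrementAndGet(); }
    }

    // N worker threads share one indexer instance and add docs concurrently.
    static int indexConcurrently(int threads, int docsPerThread) {
        final IndexerStub server = new IndexerStub();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int d = 0; d < docsPerThread; d++) server.add("doc");
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return server.added.get();
    }

    public static void main(String[] args) {
        // 4 threads x 1000 docs each; all adds land on the shared instance
        System.out.println(indexConcurrently(4, 1000)); // prints 4000
    }
}
```

The point of the sketch is only the threading shape: one shared server object, many producers, and a final barrier (shutdown/awaitTermination) before relying on the result, which is the place a commit would go in the real setup.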
Re: Scalability of Solr Result Grouping/Field Collapsing: Millions/Billions of documents?
Yes, distributed grouping works, but grouping takes a lot of resources. If you can avoid it in distributed mode, so much the better. On Wed, Aug 22, 2012 at 3:35 PM, Tom Burton-West wrote: > Thanks Tirthankar, > > So the issue in memory use for sorting. I'm not sure I understand how > sorting of grouping fields is involved with the defaults and field > collapsing, since the default sorts by relevance not grouping field. On > the other hand I don't know much about how field collapsing is implemented. > > So far the few tests I've made haven't revealed any memory problems. We > are using very small string fields for grouping and I think that we > probably only have a couple of cases where we are grouping more than a few > thousand docs. I will try to find a query with a lot of docs per group > and take a look at the memory use using JConsole. > > Tom > > > On Wed, Aug 22, 2012 at 4:02 PM, Tirthankar Chatterjee < > tchatter...@commvault.com> wrote: > >> Hi Tom, >> >> We had an issue where we are keeping millions of docs in a single node and >> we were trying to group them on a string field which is nothing but full >> file path… that caused SOLR to go out of memory… >> >> ** ** >> >> Erick has explained nicely in the thread as to why it won’t work and I had >> to find another way of architecting it. >> >> ** ** >> >> How do you think this is different in your case. If you want to group by a >> string field with thousands of similar entries I am guessing you will face >> the same issue. >> >> ** ** >> >> Thanks, >> >> Tirthankar >> ***Legal Disclaimer*** >> "This communication may contain confidential and privileged material for >> the >> sole use of the intended recipient. Any unauthorized review, use or >> distribution >> by others is strictly prohibited. If you have received the message in >> error, >> please advise the sender by reply email and delete the message. Thank you." >> ** >> -- Lance Norskog goks...@gmail.com
Re: Solr Custom Filter Factory - How to pass parameters?
Thanks Erick. I tried to do it all at the filter, but the problem I am running into doing it at the filter is intercepting the final commit calls; in other words, I am unable to figure out when the final commit should happen such that I don't miss any data. One option I tried is to increase the in-memory batch size and commit the data from memory to the database in the "incrementToken" method, but this can lead to missing data if the size of the final batch is less than the set threshold. I'll try using SolrEventListener and see if that can help resolve the issues I am running into. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Custom-Filter-Factory-How-to-pass-parameters-handle-PostProcessing-tp4002217p4002768.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Cloud assigning incorrect port to shards
What container are you using? Sent from my iPhone On Aug 22, 2012, at 3:14 PM, "Buttler, David" wrote: > Hi, > I have set up a Solr 4 beta cloud cluster. I have uploaded a config > directory, and linked it with a configuration name. > > I have started two solr on two computers and added a couple of shards using > the Core Admin function on the admin page. > > When I go to the admin cloud view, the shards all have the computer name and > port attached to them. BUT, the port is the default port (8983), and not the > port that I assigned on the command line. I can still connect to the correct > port, and not the reported port. I anticipate that this will lead to errors > when I get to doing distributed query, as zookeeper seems to be collecting > incorrect information. > > Any thoughts as to why the incorrect port is being stored in zookeeper? > > Thanks, > Dave
Re: Solr 4.0 Beta missing example/conf files?
Yeah - we want to fix that for sure. Sent from my iPhone On Aug 22, 2012, at 6:34 PM, Markus Jelsma wrote: > Hi, > > I would think so. Perhaps something for: > https://issues.apache.org/jira/browse/SOLR-3288 > > > -Original message- >> From:Tom Burton-West >> Sent: Wed 22-Aug-2012 22:35 >> To: solr-user@lucene.apache.org >> Subject: Re: Solr 4.0 Beta missing example/conf files? >> >> Thanks Markus! >> >> Should the README.txt file in solr/example be updated to reflect this? >> Is that something I need to enter a JIRA issue for? >> >> Tom >> >> On Wed, Aug 22, 2012 at 3:12 PM, Markus Jelsma >> wrote: >> >>> Hi - The example has been moved to collection1/ >>> >>> >>> >>> -Original message- From:Tom Burton-West Sent: Wed 22-Aug-2012 20:59 To: solr-user@lucene.apache.org Subject: Solr 4.0 Beta missing example/conf files? Hello, Usually in the example/solr file in Solr distributions there is a >>> populated conf file. However in the distribution I downloaded of solr 4.0.0-BETA, there is no /conf directory. Has this been moved somewhere? Tom ls -l apache-solr-4.0.0-BETA/example/solr total 107 drwxr-sr-x 2 tburtonw dlps0 May 29 13:02 bin drwxr-sr-x 3 tburtonw dlps 22 Jun 28 09:21 collection1 -rw-r--r-- 1 tburtonw dlps 2259 May 29 13:02 README.txt -rw-r--r-- 1 tburtonw dlps 2171 Jul 31 19:35 solr.xml -rw-r--r-- 1 tburtonw dlps 501 May 29 13:02 zoo.cfg >>> >>
Re: Scalability of Solr Result Grouping/Field Collapsing: Millions/Billions of documents?
Thanks Tirthankar, So the issue is memory use for sorting. I'm not sure I understand how sorting of grouping fields is involved with the defaults and field collapsing, since the default sorts by relevance, not the grouping field. On the other hand, I don't know much about how field collapsing is implemented. So far the few tests I've made haven't revealed any memory problems. We are using very small string fields for grouping and I think that we probably only have a couple of cases where we are grouping more than a few thousand docs. I will try to find a query with a lot of docs per group and take a look at the memory use using JConsole. Tom On Wed, Aug 22, 2012 at 4:02 PM, Tirthankar Chatterjee < tchatter...@commvault.com> wrote: > Hi Tom, > > We had an issue where we are keeping millions of docs in a single node and > we were trying to group them on a string field which is nothing but full > file path… that caused SOLR to go out of memory… > > ** ** > > Erick has explained nicely in the thread as to why it won’t work and I had > to find another way of architecting it. > > ** ** > > How do you think this is different in your case. If you want to group by a > string field with thousands of similar entries I am guessing you will face > the same issue. > > ** ** > > Thanks, > > Tirthankar > ***Legal Disclaimer*** > "This communication may contain confidential and privileged material for > the > sole use of the intended recipient. Any unauthorized review, use or > distribution > by others is strictly prohibited. If you have received the message in > error, > please advise the sender by reply email and delete the message. Thank you." > ** >
RE: Solr 4.0 Beta missing example/conf files?
Hi, I would think so. Perhaps something for: https://issues.apache.org/jira/browse/SOLR-3288 -Original message- > From:Tom Burton-West > Sent: Wed 22-Aug-2012 22:35 > To: solr-user@lucene.apache.org > Subject: Re: Solr 4.0 Beta missing example/conf files? > > Thanks Markus! > > Should the README.txt file in solr/example be updated to reflect this? > Is that something I need to enter a JIRA issue for? > > Tom > > On Wed, Aug 22, 2012 at 3:12 PM, Markus Jelsma > wrote: > > > Hi - The example has been moved to collection1/ > > > > > > > > -Original message- > > > From:Tom Burton-West > > > Sent: Wed 22-Aug-2012 20:59 > > > To: solr-user@lucene.apache.org > > > Subject: Solr 4.0 Beta missing example/conf files? > > > > > > Hello, > > > > > > Usually in the example/solr file in Solr distributions there is a > > populated > > > conf file. However in the distribution I downloaded of solr 4.0.0-BETA, > > > there is no /conf directory. Has this been moved somewhere? > > > > > > Tom > > > > > > ls -l apache-solr-4.0.0-BETA/example/solr > > > total 107 > > > drwxr-sr-x 2 tburtonw dlps0 May 29 13:02 bin > > > drwxr-sr-x 3 tburtonw dlps 22 Jun 28 09:21 collection1 > > > -rw-r--r-- 1 tburtonw dlps 2259 May 29 13:02 README.txt > > > -rw-r--r-- 1 tburtonw dlps 2171 Jul 31 19:35 solr.xml > > > -rw-r--r-- 1 tburtonw dlps 501 May 29 13:02 zoo.cfg > > > > > >
Re: Solr 3.6.1: query performance is slow when asterisk is in the query
Ok, I'll take your suggestion, but I would still be really happy if the wildcard searches behaved a little more intelligently (body:* not looking for everything in the body) - more like when you do "q=*:*", which doesn't really search for everything in every field. Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-3-6-1-query-performance-is-slow-when-asterisk-is-in-the-query-tp4002496p4002743.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr 4.0 Beta missing example/conf files?
Thanks Markus! Should the README.txt file in solr/example be updated to reflect this? Is that something I need to enter a JIRA issue for? Tom On Wed, Aug 22, 2012 at 3:12 PM, Markus Jelsma wrote: > Hi - The example has been moved to collection1/ > > > > -Original message- > > From:Tom Burton-West > > Sent: Wed 22-Aug-2012 20:59 > > To: solr-user@lucene.apache.org > > Subject: Solr 4.0 Beta missing example/conf files? > > > > Hello, > > > > Usually in the example/solr file in Solr distributions there is a > populated > > conf file. However in the distribution I downloaded of solr 4.0.0-BETA, > > there is no /conf directory. Has this been moved somewhere? > > > > Tom > > > > ls -l apache-solr-4.0.0-BETA/example/solr > > total 107 > > drwxr-sr-x 2 tburtonw dlps0 May 29 13:02 bin > > drwxr-sr-x 3 tburtonw dlps 22 Jun 28 09:21 collection1 > > -rw-r--r-- 1 tburtonw dlps 2259 May 29 13:02 README.txt > > -rw-r--r-- 1 tburtonw dlps 2171 Jul 31 19:35 solr.xml > > -rw-r--r-- 1 tburtonw dlps 501 May 29 13:02 zoo.cfg > > >
RE: Full Text Indexing for DOCX files
Thanks Jack, I'll give that version of SOLR a try. Vincent Vu Nguyen Web Applications Developer Division of Science Quality and Translation Office of the Associate Director for Science Centers for Disease Control and Prevention (CDC) 404-498-0384 v...@cdc.gov Century Bldg 2400 Atlanta, GA 30329 -Original Message- From: Jack Krupansky [mailto:j...@basetechnology.com] Sent: Wednesday, August 22, 2012 4:07 PM To: solr-user@lucene.apache.org Subject: Re: Full Text Indexing for DOCX files I've indexed Office 2007 .docx using Solr 3.6. It sounds as if Solr 1.3 had an old release of Tika/POI. No big surprise there. -- Jack Krupansky -Original Message- From: Nguyen, Vincent (CDC/OD/OADS) (CTR) Sent: Wednesday, August 22, 2012 3:57 PM To: solr-user@lucene.apache.org Subject: Full Text Indexing for DOCX files Has anyone been able to index DOCX files? I get this error message when using office 2007 documents (Location of error unknown)org.apache.poi.poifs.filesystem.OfficeXmlFileException: The supplied data appears to be in the Office 2007+ XML. POI only supports OLE2 Office documents We're currently using SOLR1.3 Vincent Vu Nguyen
Re: Full Text Indexing for DOCX files
I've indexed Office 2007 .docx using Solr 3.6. It sounds as if Solr 1.3 had an old release of Tika/POI. No big surprise there. -- Jack Krupansky -Original Message- From: Nguyen, Vincent (CDC/OD/OADS) (CTR) Sent: Wednesday, August 22, 2012 3:57 PM To: solr-user@lucene.apache.org Subject: Full Text Indexing for DOCX files Has anyone been able to index DOCX files? I get this error message when using office 2007 documents (Location of error unknown)org.apache.poi.poifs.filesystem.OfficeXmlFileException: The supplied data appears to be in the Office 2007+ XML. POI only supports OLE2 Office documents We're currently using SOLR1.3 Vincent Vu Nguyen
Re: Does DIH commit during large import?
solrconfig.xml has a setting ramBufferSizeMB that can be set to limit the memory consumed during indexing. When this limit is reached, the buffers are flushed to the current segment. NOTE: the segment is NOT closed, there is no implied commit here, and the data will not be searchable until a commit happens. Best Erick On Wed, Aug 22, 2012 at 7:10 AM, Alexandre Rafalovitch wrote: > Thanks, I will look into autoCommit. > > I assume there are memory implications of not committing? Or is it > just writing in a separate file and can theoretically do it > indefinitely? > > Regards, >Alex. > Personal blog: http://blog.outerthoughts.com/ > LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch > - Time is the quality of nature that keeps events from happening all > at once. Lately, it doesn't seem to be working. (Anonymous - via GTD > book) > > > On Wed, Aug 22, 2012 at 2:42 AM, Lance Norskog wrote: >> Solr has a separate feature called 'autoCommit'. This is configured in >> solrconfig.xml. You can set Solr to commit all documents every N >> milliseconds or every N documents, whichever comes first. If you want >> intermediate commits during a long DIH session, you have to use this >> or make your own script that does commits. >> >> On Tue, Aug 21, 2012 at 8:48 AM, Shawn Heisey wrote: >>> On 8/21/2012 6:41 AM, Alexandre Rafalovitch wrote: I am doing an import of large records (with large full-text fields) and somewhere around 30 records DataImportHandler runs out of memory (Heap) on a TIKA import (triggered from custom Processor) and does roll-back. I am using store=false and trying some tricks and tracking possible memory leaks, but also have a question about DIH itself. What actually happens when I run DIH on a large (XML Source) job? Does it accumulate some sort of status in memory that it commits at the end? If so, can I do intermediate commits to drop the memory requirements? 
Or, will it help to do several passes over the same dataset and import only particular entries at a time? I am using the Solr 4 (alpha) UI, so I can see some of the options there. >>> >>> >>> I use Solr 3.5 and a MySQL database for import, so my setup may not be >>> completely relevant, but here is my experience. >>> >>> Unless you turn on autocommit in solrconfig, documents will not be >>> searchable during the import. If you have "commit=true" for DIH (which I >>> believe is the default), there will be a commit at the end of the import. >>> >>> It looks like there's an out of memory issue filed on Solr 4 DIH with Tika >>> that is suspected to be a bug in Tika rather than Solr. The issue details >>> talk about some workarounds for those who are familiar with Tika -- I'm not. >>> The issue URL: >>> >>> https://issues.apache.org/jira/browse/SOLR-2886 >>> >>> Thanks, >>> Shawn >>> >> >> >> >> -- >> Lance Norskog >> goks...@gmail.com
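For reference, a solrconfig.xml sketch of the two settings discussed in this thread; the values are illustrative, and the exact element placement varies between Solr versions (in 3.x, ramBufferSizeMB sits under indexDefaults / mainIndex):

```xml
<!-- Flush in-memory index buffers to the current segment past this size.
     NOTE: this is only a flush; no commit is implied, and flushed data is
     not searchable until a commit happens. -->
<indexDefaults>
  <ramBufferSizeMB>32</ramBufferSizeMB>
</indexDefaults>

<!-- Commit every 10,000 docs or every 60 seconds, whichever comes first,
     including in the middle of a long DIH import. -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>10000</maxDocs>
    <maxTime>60000</maxTime>
  </autoCommit>
</updateHandler>
```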
Full Text Indexing for DOCX files
Has anyone been able to index DOCX files? I get this error message when using office 2007 documents (Location of error unknown)org.apache.poi.poifs.filesystem.OfficeXmlFileException: The supplied data appears to be in the Office 2007+ XML. POI only supports OLE2 Office documents We're currently using SOLR1.3 Vincent Vu Nguyen
Cloud assigning incorrect port to shards
Hi, I have set up a Solr 4 beta cloud cluster. I have uploaded a config directory and linked it with a configuration name. I have started two Solr instances on two computers and added a couple of shards using the Core Admin function on the admin page. When I go to the admin cloud view, the shards all have the computer name and port attached to them. BUT the port is the default port (8983), not the port that I assigned on the command line. I can still connect to the correct port, and not the reported port. I anticipate that this will lead to errors when I get to doing distributed queries, as ZooKeeper seems to be collecting incorrect information. Any thoughts as to why the incorrect port is being stored in ZooKeeper? Thanks, Dave
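One thing worth checking (an assumption based on the 4.0 example configs, not a confirmed diagnosis): the port published to ZooKeeper comes from the hostPort attribute on <cores> in solr.xml, which defaults to the jetty.port system property, so a port set only at the container level never reaches ZooKeeper:

```xml
<!-- solr.xml: hostPort is what gets registered in ZooKeeper -->
<cores adminPath="/admin/cores" hostPort="${jetty.port:8983}" hostContext="solr">
</cores>
```

Starting with java -Djetty.port=7574 -jar start.jar sets both the listen port and the published port; overriding the port any other way leaves hostPort at its 8983 default.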
RE: Solr 4.0 Beta missing example/conf files?
Hi - The example has been moved to collection1/ -Original message- > From:Tom Burton-West > Sent: Wed 22-Aug-2012 20:59 > To: solr-user@lucene.apache.org > Subject: Solr 4.0 Beta missing example/conf files? > > Hello, > > Usually in the example/solr file in Solr distributions there is a populated > conf file. However in the distribution I downloaded of solr 4.0.0-BETA, > there is no /conf directory. Has this been moved somewhere? > > Tom > > ls -l apache-solr-4.0.0-BETA/example/solr > total 107 > drwxr-sr-x 2 tburtonw dlps0 May 29 13:02 bin > drwxr-sr-x 3 tburtonw dlps 22 Jun 28 09:21 collection1 > -rw-r--r-- 1 tburtonw dlps 2259 May 29 13:02 README.txt > -rw-r--r-- 1 tburtonw dlps 2171 Jul 31 19:35 solr.xml > -rw-r--r-- 1 tburtonw dlps 501 May 29 13:02 zoo.cfg >
Solr 4.0 Beta missing example/conf files?
Hello, Usually in the example/solr file in Solr distributions there is a populated conf file. However in the distribution I downloaded of solr 4.0.0-BETA, there is no /conf directory. Has this been moved somewhere? Tom ls -l apache-solr-4.0.0-BETA/example/solr total 107 drwxr-sr-x 2 tburtonw dlps0 May 29 13:02 bin drwxr-sr-x 3 tburtonw dlps 22 Jun 28 09:21 collection1 -rw-r--r-- 1 tburtonw dlps 2259 May 29 13:02 README.txt -rw-r--r-- 1 tburtonw dlps 2171 Jul 31 19:35 solr.xml -rw-r--r-- 1 tburtonw dlps 501 May 29 13:02 zoo.cfg
Re: Scalability of Solr Result Grouping/Field Collapsing: Millions/Billions of documents?
Hi Lance and Tirthankar, We are currently using Solr 3.6. I tried a search across our current 12 shards grouping by book id (record_no in our schema) and it seems to work fine (the query, with the actual urls for the shards changed, is appended below). I then searched for the record_no of the second group in the results to confirm that the number of records being folded is correct. In both cases numFound is 505, so it seems the record counts for the group are correct. Then I tried the same search but changed the shards parameter to limit the search to half of the shards and got numFound = 325. This shows that the items in the group are distributed between different shards. What am I missing here? What is it that you are saying does not work? Tom Field collapse query (IP address changed, newlines added, and shard urls simplified for readability) http://solr-myhost.edu/serve-9/select?indent=on&version=2.2 &shards=shard1,shard2,shard3,shard4,shard5,shard6,...,shard12 &q=title:nature&fq=&start=0&rows=10&fl=id,author,title,volume_enumcron,score &group=true&group.field=record_no&group.limit=2
Re: Solr Custom Filter Factory - How to pass parameters?
I'm reaching a bit here, haven't implemented one myself, but... It seems like you're just dealing with some shared memory. So say your filter recorded all the stuff you want to put into the DB. When you put stuff into the shared memory, you probably have to figure out when you should commit the batch (if you're indexing 100M docs, you probably don't want to use up that much memory, but what do I know). This is all done at the filter. It seems like you could also create a SolrEventListener on the postCommit event (see: http://wiki.apache.org/solr/SolrPlugins#SolrEventListener) to put whatever remained in your list into your DB. Of course you'd have to do some synchronization so multiple threads played nice with each other. And you'd have to be sure to fire a commit at the end of your indexing process if you wanted some certainty that everything was tidied up. If some delay isn't a problem and you have autocommit configured, then your event listener would be called when the next autocommit happened. FWIW Erick On Tue, Aug 21, 2012 at 8:19 PM, ksu wildcats wrote: > Jack > > Reading through the documentation for UpdateRequestProcessor my > understanding is that it's good for handling processing of documents before > analysis. > Is it true that processAdd (where we can have custom logic) is invoked once > per document and is invoked before any of the analyzers gets invoked? > > I couldn't figure out how I can use UpdateRequestProcessor to access the > tokens stored in memory by CustomFilterFactory/CustomFilter. > > Can you please provide more information on how I can use > UpdateRequestProcessor to handle any post processing that needs to be done > after all documents are added to the index? > > Also does CustomFilterFactory/CustomFilter have any way to do post > processing after all documents are added to the index? > > Here is the code I have for CustomFilterFactory/CustomFilter.
> This might help understand what I am trying to do, and maybe there is a better
> way to do it. The main problem I have with this approach is that I am forced
> to write the results stored in memory (customMap) to the database per
> document, and if I have 1 million documents then that's 1 million DB calls. I
> am trying to reduce the number of calls made to the database by storing
> results in memory and writing them to the database once for every X documents
> (say, every 1 docs).
>
> public class CustomFilterFactory extends BaseTokenFilterFactory {
>     public CustomFilter create(TokenStream input) {
>         String databaseName = getArgs().get("paramname");
>         return new CustomFilter(input, databaseName);
>     }
> }
>
> public class CustomFilter extends TokenFilter {
>     private TermAttribute termAtt;
>     Map<String, Integer> customMap = new HashMap<String, Integer>();
>     String databaseName = null;
>     int commitSize; // batch threshold, set elsewhere
>
>     protected CustomFilter(TokenStream input, String databaseName) {
>         super(input);
>         termAtt = (TermAttribute) addAttribute(TermAttribute.class);
>         this.databaseName = databaseName;
>     }
>
>     public final boolean incrementToken() throws IOException {
>         if (!input.incrementToken()) {
>             writeResultsToDB();
>             return false;
>         }
>         if (addWordToCustomMap()) {
>             // do some analysis on the term and then populate customMap
>             // customMap.put(term, someValue);
>         }
>         if (customMap.size() > commitSize) {
>             writeResultsToDB();
>         }
>         return true;
>     }
>
>     boolean addWordToCustomMap() {
>         // custom logic - some validation on the term to determine if it
>         // should be added to customMap
>     }
>
>     void writeResultsToDB() throws IOException {
>         // custom logic that reads data from customMap, does some analysis,
>         // and writes it to the database.
>     }
> }
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-Custom-Filter-Factory-How-to-pass-parameters-tp4002217p4002531.html
> Sent from the Solr - User mailing list archive at Nabble.com.
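Not part of the original mail, but the batch-and-final-flush logic being described can be sketched independently of the Lucene APIs; flushToDb here is a hypothetical stand-in for the JDBC write, and the key point is the extra flush at end-of-stream so the final partial batch is not lost:

```java
import java.util.ArrayList;
import java.util.List;

public class BatchWriterSketch {
    // Stand-in for the database write; the real implementation would be JDBC.
    interface Sink { void flushToDb(List<String> batch); }

    // Accumulate terms, flush every `commitSize` entries, and flush once more
    // at end-of-stream. Returns the number of flushes performed.
    static int process(List<String> terms, int commitSize, Sink sink) {
        List<String> buffer = new ArrayList<>();
        int flushes = 0;
        for (String t : terms) {
            buffer.add(t);
            if (buffer.size() >= commitSize) {
                sink.flushToDb(new ArrayList<>(buffer));
                buffer.clear();
                flushes++;
            }
        }
        if (!buffer.isEmpty()) { // end-of-stream: write the remainder
            sink.flushToDb(new ArrayList<>(buffer));
            flushes++;
        }
        return flushes;
    }

    public static void main(String[] args) {
        List<String> terms = new ArrayList<>();
        for (int i = 0; i < 25; i++) terms.add("t" + i);
        // 25 terms with a batch size of 10: flushes of 10 + 10 + 5
        System.out.println(process(terms, 10, batch -> {})); // prints 3
    }
}
```

In the filter itself, the end-of-stream flush corresponds to the !input.incrementToken() branch; the postCommit listener discussed above is the safety net for anything still buffered when indexing stops.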
Re: Which directories are required in Solr?
Hi, check out: https://github.com/geek4377/jetty-solr You can remove exampledocs from the list to get only the dirs required for running Solr. On Wed, Aug 22, 2012 at 1:02 PM, Alexander Cougarman wrote: > Hi. Which folders/files can be deleted from the default Solr package > (apache-solr-3.6.1.zip) on Windows if all we'd like to do is index/store > documents? Thanks. > > Sincerely, > Alex > >
Re: Which directories are required in Solr?
Why do you care? I suspect that the example directory can be removed, assuming you're distributing the war file. But disk space is really cheap; tidying up the directories for aesthetic reasons isn't worth the risk of removing something that you might need later... Best Erick On Wed, Aug 22, 2012 at 3:32 AM, Alexander Cougarman wrote: > Hi. Which folders/files can be deleted from the default Solr package > (apache-solr-3.6.1.zip) on Windows if all we'd like to do is index/store > documents? Thanks. > > Sincerely, > Alex >
Re: Solr 3.6.1: query performance is slow when asterisk is in the query
Jack, sorry I forgot to answer you; we tried "[* TO *]" and the response times are the same as with a plain "*" -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-3-6-1-query-performance-is-slow-when-asterisk-is-in-the-query-tp4002496p4002708.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Edismax parser weird behavior
Don't have an immediate answer for you on #1, but for #2, "mm" does not override explicit operators - "and" - it only applies to terms that are not the immediate operand of an explicit operator. Note that by default lower-case operators are enabled in edismax - "and" is treated as "AND" - you can set "lowercaseOperators=false" to avoid that. -- Jack Krupansky -Original Message- From: amitesh116 Sent: Wednesday, August 22, 2012 8:13 AM To: solr-user@lucene.apache.org Subject: Edismax parser weird behavior Hi I am experiencing 2 strange behaviors in edismax: edismax is configured to default to OR (using mm=0). In total there are 700 results. 1. Search for *auto* = *50 results* Search for *NOT auto* gives *651 results*. Mathematically, it should give only 650 results for *NOT auto*. 2. Search for *auto* = 50 results Search for *car = 100 results* Search for *auto and car = 10 results* Since we have set mm=0, it should behave like OR, and the results for auto and car should be at least more than 100. Please help me understand these two issues. Are these normal behavior? Do I need to tweak the query? Or do I need to look into the config or schema xml files? Thanks in Advance -- View this message in context: http://lucene.472066.n3.nabble.com/Edismax-parser-weird-behavior-tp4002626.html Sent from the Solr - User mailing list archive at Nabble.com.
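A sketch of the request parameters being discussed (query terms are from the thread, the rest is illustrative): with lowercaseOperators left at its edismax default, the middle word in "auto and car" is promoted to AND before mm is ever consulted, so mm=0 never gets a chance to OR the terms; disabling it keeps "and" as an ordinary term:

```text
q=auto and car&defType=edismax&mm=0&lowercaseOperators=false
```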
Re: Solr 3.6.1: query performance is slow when asterisk is in the query
You could also add a bodySize numeric (trie) field, which you can check for 0 for empty/missing bodies. And don't forget to check and see whether the "[* TO *]" range query might be faster. -- Jack Krupansky -Original Message- From: david3s Sent: Wednesday, August 22, 2012 12:37 PM To: solr-user@lucene.apache.org Subject: Re: Solr 3.6.1: query performance is slow when asterisk is in the query Hello Chris, thanks a lot for your reply. But is there an alternative solution? Because I see adding "has_body" as data duplication. Imagine in that in a Relational DB you had to create extra columns because you can't do something like "where body is not null" If there's no other alternative I'll have to go with your suggestion that I greatly appreciate. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-3-6-1-query-performance-is-slow-when-asterisk-is-in-the-query-tp4002496p4002698.html Sent from the Solr - User mailing list archive at Nabble.com.
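A schema.xml sketch of the suggestion (field names are hypothetical; tint assumes the example schema's trie-int type definition):

```xml
<!-- populated at index time so queries can filter cheaply instead of body:* -->
<field name="has_body"  type="boolean" indexed="true" stored="false"/>
<field name="body_size" type="tint"    indexed="true" stored="false"/>
```

A filter such as fq=has_body:true (or fq=body_size:[1 TO *]) then replaces the expensive wildcard on the body field.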
Index version & generation for Solr 3.5
Hi, I ran into an issue lately with index version & generation for Solr 3.5. In Solr 1.4, the index version of the slave increments upon each replication. However, I noticed that's not the case for Solr 3.5; the index version would increase by 20 or 30 after replication. Does anyone know why, and is there any reference on the web for this? The index generation does still increment after replication, though. Thanks, Xin
Re: Solr 3.6.1: query performance is slow when asterisk is in the query
The name of the game for performance and functionality in Solr is quite often *denormalization*, which might run against your RDBMS instincts, but once you embrace it, you'll find that things go a lot more smoothly. Michael Della Bitta Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017 www.appinions.com Where Influence Isn’t a Game On Wed, Aug 22, 2012 at 12:37 PM, david3s wrote: > Hello Chris, thanks a lot for your reply. But is there an alternative > solution? Because I see adding "has_body" as data duplication. > > Imagine in that in a Relational DB you had to create extra columns because > you can't do something like "where body is not null" > > If there's no other alternative I'll have to go with your suggestion that I > greatly appreciate. > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Solr-3-6-1-query-performance-is-slow-when-asterisk-is-in-the-query-tp4002496p4002698.html > Sent from the Solr - User mailing list archive at Nabble.com.
Re: Grammar for ComplexPhraseQueryParser
> Does anyone have the grammar file (.jj > file) for the complex phrase query > parser. The patch from https://issues.apache.org/jira/browse/SOLR-1604 does > not have the grammar file as part of it. It does not have a separate grammar file. It just extends QueryParser.
Re: Solr 3.6.1: query performance is slow when asterisk is in the query
Hello Chris, thanks a lot for your reply. But is there an alternative solution? Because I see adding "has_body" as data duplication. Imagine that in a relational DB you had to create extra columns because you can't do something like "where body is not null". If there's no other alternative I'll have to go with your suggestion, which I greatly appreciate. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-3-6-1-query-performance-is-slow-when-asterisk-is-in-the-query-tp4002496p4002698.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr memory: CATALINA_OPTS in setenv.sh ?
Check your cores' "status" page and see if you're running the MMapDirectory (you probably are). In that case, you probably want to devote even less RAM to Tomcat's heap, because the index files are being read out of memory-mapped pages that don't reside on the heap, so you'd be devoting more memory to caching them if you freed it up by lowering the heap. Michael Della Bitta Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017 www.appinions.com Where Influence Isn’t a Game On Wed, Aug 22, 2012 at 12:05 PM, Bruno Mannina wrote: > On 22/08/2012 16:57, Bruno Mannina wrote: > >> Dear users, >> >> I try to know if my add in the setenv.sh (which I need to create because >> it didn't exist) file has been set but when I click on the link Java >> Properties on Admin Solr web page >> I can't see the variable CATALINA_OPTS. >> >> In fact, I would like to know if my line added in the file setenv.sh is >> ok: >> |CATALINA_OPTS=||"-server -Xss7G -Xms14G -Xmx14G $CATALINA_OPTS >> -XX:+UseConcMarkSweepGC -XX:NewSize=7G -XX:+UseParNewGC"| >> >> my setenv.sh file contains only this line (inside >> /usr/share/tomcat6/bin/). >> >> How can I see if memory is well allocated ? >> >> Other question: is |*-XX:NewSize=7G* is ok?| >> >> I have 24 GB RAM (14G ~60%) >> > I changed the method, I edited the file tomcat6 in /etc/init.d and I modify > the JAVA_OPTS var to: > JAVA_OPTS="-server -Djava.awt.headless=true -Xms14G -Xmx14G" > > Do you think it's correct if I have 24 GB RAM? > Do you think something is missing ? like Xss or other ? > > I found many google pages but not really a page that explain how to choose > the right configuration. > I think there isn't a unique answer to this question. > > it seems there are several methods to adjust memory for JVM but what is the > best ?
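As a sketch only (the sizes are illustrative, not a recommendation): following Michael's advice, a setenv.sh on a 24 GB box running MMapDirectory might keep the heap modest and leave the bulk of the RAM to the OS page cache, e.g.:

```shell
# Hypothetical setenv.sh: a 6 GB heap leaves roughly 16 GB for the OS to
# cache memory-mapped index files. Tune the numbers for your own index,
# cache sizes, and query load.
CATALINA_OPTS="-server -Xms6G -Xmx6G -XX:+UseConcMarkSweepGC $CATALINA_OPTS"
export CATALINA_OPTS
```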
Query-side Join work in distributed Solr?
Just to clarify: query-side joins ( e.g. {!join from=id to=parent_signal_id_s}id:foo ) do not work in a distributed mode yet? I saw LUCENE-3759 as unresolved, but also some Twitter traffic saying there was a patch available. Cheers, Tim
Re: Solr Score threshold 'reasonably', independent of results returned
Commercial solutions often have a percentage that is meant to signify the quality of a match. Solr's score is relative, and you cannot tell just by looking at this value whether a result is relevant enough to be on the first page or not. The score depends on "what else is in the index", so it is not easy to normalize in the way you suggest. Ravish On Wed, Aug 22, 2012 at 4:03 PM, Mou wrote: > Hi, > I think that this totally depends on your requirements and thus applicable > for a user scenario. Score does not have any absolute meaning, it is always > relative to the query. If you want to watch some particular queries and > want > to show results with score above previously set threshold, you can use > this. > > If I always have that x% threshold in place , there may be many queries > which would not return anything and I certainly do not want that. > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Solr-Score-threshold-reasonably-independent-of-results-returned-tp4002312p4002673.html > Sent from the Solr - User mailing list archive at Nabble.com. >
Re: Solr memory: CATALINA_OPTS in setenv.sh ?
On 22/08/2012 16:57, Bruno Mannina wrote: Dear users, I am trying to check whether my addition to the setenv.sh file (which I needed to create because it didn't exist) has been set, but when I click on the "Java Properties" link on the Solr admin web page I can't see the variable CATALINA_OPTS. In fact, I would like to know if the line I added to setenv.sh is OK: |CATALINA_OPTS=||"-server -Xss7G -Xms14G -Xmx14G $CATALINA_OPTS -XX:+UseConcMarkSweepGC -XX:NewSize=7G -XX:+UseParNewGC"| My setenv.sh file contains only this line (inside /usr/share/tomcat6/bin/). How can I see if memory is well allocated? Other question: is |*-XX:NewSize=7G*| OK? I have 24 GB RAM (14G ~60%) I changed the method: I edited the tomcat6 file in /etc/init.d and modified the JAVA_OPTS var to: JAVA_OPTS="-server -Djava.awt.headless=true -Xms14G -Xmx14G" Do you think this is correct if I have 24 GB RAM? Do you think something is missing, like Xss or other? I found many Google pages but not really a page that explains how to choose the right configuration. I think there isn't a unique answer to this question. It seems there are several methods to adjust memory for the JVM, but which is the best?
Re: Scalability of Solr Result Grouping/Field Collapsing: Millions/Billions of documents?
Hi Tirthankar, Can you give me a quick summary of what won't work and why? I couldn't figure it out from looking at your thread. You seem to have a different issue, but maybe I'm missing something here. Tom On Tue, Aug 21, 2012 at 7:10 PM, Tirthankar Chatterjee < tchatter...@commvault.com> wrote: > This wont work, see my thread on Solr3.6 Field collapsing > Thanks, > Tirthankar > >
Re: Scalability of Solr Result Grouping/Field Collapsing: Millions/Billions of documents?
Hi Lance, I don't understand enough of how the field collapsing is implemented, but I thought it worked with distributed search. Are you saying it only works if everything that needs collapsing is on the same shard? Tom On Wed, Aug 22, 2012 at 2:41 AM, Lance Norskog wrote: > How do you separate the documents among the shards? Can you set up the > shards such that one "collapse group" is only on a single shard? That > you never have to do distributed grouping? > > On Tue, Aug 21, 2012 at 4:10 PM, Tirthankar Chatterjee > wrote: > > This wont work, see my thread on Solr3.6 Field collapsing > > Thanks, > > Tirthankar > > > > -Original Message- > > From: Tom Burton-West > > Date: Tue, 21 Aug 2012 18:39:25 > > To: solr-user@lucene.apache.org > > Reply-To: "solr-user@lucene.apache.org" > > Cc: William Dueber; Phillip Farber > > Subject: Scalability of Solr Result Grouping/Field Collapsing: > > Millions/Billions of documents? > > > > Hello all, > > > > We are thinking about using Solr Field Collapsing on a rather large scale > > and wonder if anyone has experience with performance when doing Field > > Collapsing on millions of or billions of documents (details below. ) Are > > there performance issues with grouping large result sets? > > > > Details: > > We have a collection of the full text of 10 million books/journals. This > > is spread across 12 shards with each shard holding about 800,000 > > documents. When a query matches a journal article, we would like to > group > > all the matching articles from the same journal together. (there is a > > unique id field identifying the journal). Similarly when there is a > match > > in multiple copies of the same book we would like to group all results > for > > the same book together (again we have a unique id field we can group on). > > Sometimes a short query against the OCR field will result in over one > > million hits. Are there known performance issues when field collapsing > > result sets containing a million hits? 
> > > > We currently index the entire book as one Solr document. We would like > to > > investigate the feasibility of indexing each page as a Solr document > with a > > field indicating the book id. We could then offer our users the choice > of > > a list of the most relevant pages, or a list of the books containing the > > most relevant pages. We have approximately 3 billion pages. Does > anyone > > have experience using field collapsing on this sort of scale? > > > > Tom > > > > Tom Burton-West > > Information Retrieval Programmer > > Digital Library Production Service > > Univerity of Michigan Library > > http://www.hathitrust.org/blogs/large-scale-search > > **Legal Disclaimer*** > > "This communication may contain confidential and privileged > > material for the sole use of the intended recipient. Any > > unauthorized review, use or distribution by others is strictly > > prohibited. If you have received the message in error, please > > advise the sender by reply email and delete the message. Thank > > you." > > * > > > > -- > Lance Norskog > goks...@gmail.com >
RE: Co-existing solr cloud installations
This is really nice. Thanks for pointing it out. Dave -Original Message- From: Mark Miller [mailto:markrmil...@gmail.com] Sent: Tuesday, August 21, 2012 8:23 PM To: solr-user@lucene.apache.org Subject: Re: Co-existing solr cloud installations You can use a connect string of host:port/path to 'chroot' a path. I think currently you have to manually create the path first though. See the ZkCli tool (doc'd on SolrCloud wiki) for a simple way to do that. I keep meaning to look into auto making it if it doesn't exist, but have not gotten to it. - Mark On Tue, Aug 21, 2012 at 4:46 PM, Buttler, David wrote: > Hi all, > I would like to use a single zookeeper cluster to manage multiple Solr cloud > installations. However, the current design of how Solr uses zookeeper seems > to preclude that. Have I missed a configuration option to set a zookeeper > prefix for all of a Solr cloud configuration directories? > > If I look at the zookeeper data it looks like: > > * /clusterstate.json > * /collections > * /configs > * /live_nodes > * /overseer > * /overseer_elect > * /zookeeper > Is there a reason not to put all of these nodes under some user-configurable > higher-level node, such as /solr4? > It could have a reasonable default value to make it just as easy to find as /. > > My current issue is that I have an old Solr cloud instance from back in the > Solr 1.5 days, and I don't expect that the new version and the old version > will play nice. > > Thanks, > Dave >
Re: Solr Score threshold 'reasonably', independent of results returned
Hi, I think that this totally depends on your requirements and is thus applicable per user scenario. The score does not have any absolute meaning; it is always relative to the query. If you want to watch some particular queries and want to show results with a score above a previously set threshold, you can use this. If I always have that x% threshold in place, there may be many queries which would not return anything, and I certainly do not want that. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Score-threshold-reasonably-independent-of-results-returned-tp4002312p4002673.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: display SOLR Query in web page
It's not great to leak internal implementation details of your application out like this, and it may be that someone more skilled at exploiting things like this could find one. Michael Della Bitta Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017 www.appinions.com Where Influence Isn’t a Game On Wed, Aug 22, 2012 at 10:20 AM, Bernd Fehling wrote: > I haven't spent time in trying anything, just entered a query and recognized > that it showed up in the page source view. > If they really escape everything it is not that dangerous? > > Actually I don't want to try anything with their page, > they might not have any humor ;-) > > Bernd > > > Am 22.08.2012 15:41, schrieb Michael Della Bitta: >> Actually, I'm having a little trouble coming up with a >> proof-of-concept exploit for this... it doesn't seem like Solr is >> exposed directly, and it does seem like it's escaping submitted >> content before redisplaying it on the page. >> >> I'm not crazy about leaking the raw query string into the HTML, but it >> doesn't seem to lead to more than just that. >> >> Please let me know if I am missing something, it's still morningtime >> here in the US and I haven't had enough coffee yet. :) >> >> Michael Della Bitta >> >> >> Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017 >> www.appinions.com >> Where Influence Isn’t a Game >> >> >> On Wed, Aug 22, 2012 at 9:32 AM, Michael Della Bitta >> wrote: >>> Ouch, not to mention the potential for XSS. >>> >>> I'll see if I can get in touch with someone. >>> >>> Michael Della Bitta >>> >>> >>> Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017 >>> www.appinions.com >>> Where Influence Isn’t a Game >>> >>> >>> On Wed, Aug 22, 2012 at 3:40 AM, Bernd Fehling >>> wrote: Now this is very scary, while searching for "solr direct access per docid" I got a hit from US Homeland Security Digital Library. Interested in what they have to tell me about my search I clicked on the link to the page. 
First the page had nothing unusual about it, but why I get the hit? http://www.hsdl.org/?collection/stratpol&id=4 Inspecting the page source view shows that they have the solr query displayed direct on their page as "span" with "style=display:none". -- snippet -- *** SOLR Query *** — q=Collection:0 AND (TabSection:("Congressional hearings and testimony", "Congressional reports", "Congressional resolutions", "Directives (presidential)", "Executive orders", "Major Legislation", "Public laws", "Reports (CBO)", "Reports (CHDS)", "Reports (CRS)",... ... AND (Title_nostem:("China Forces Senior Intelligence Officer")^10 AlternateTitle_nostem:("China Forces Senior Intelligence Officer")^9)&sort=score desc&rows=30&start=0&indent=off&facet=on&facet.limit=1&facet.mincount=1&fl=AlternateTitle_text,Collection,CoverageCountry,CoverageState,Creator_nostem,DateLastModified,DateOfRecordEntry,Description_text,DisplayDate,DocID,ExternalDocId,ExternalDocSource,FileDate,FileExtension,FileSize,FileTitle_text,Format,Language,PublishDate,Publisher_text,Publisher_nostem,ReportNumber,ResourceType,RetrievedFrom,Rights,Subjects,Source,TabSection,Title_text,URL_text,Alternate_URL_text,CreatedBy,ModifiedBy,Notes&wt=phps&facet.field=Creator&facet.field=Format&facet.field=Language&facet.field=Publisher&facet.field=TabSection -- snippet -- As you can see I have searched for "China Forces Senior Intelligence Officer" so this is directly showing the query string. Do they know that there is also a delete by query? And the are also escape sequences? This is what I call scary. Maybe some of the US fellows can give them a hint and a helping hand. Regards Bernd > > -- > * > Bernd FehlingUniversitätsbibliothek Bielefeld > Dipl.-Inform. (FH)LibTec - Bibliothekstechnologie > Universitätsstr. 25 und Wissensmanagement > 33615 Bielefeld > Tel. +49 521 106-4060 bernd.fehling(at)uni-bielefeld.de > > BASE - Bielefeld Academic Search Engine - www.base-search.net > *
Solr memory: CATALINA_OPTS in setenv.sh ?
Dear users, I am trying to check whether my addition to the setenv.sh file (which I needed to create because it didn't exist) has been set, but when I click on the "Java Properties" link on the Solr admin web page I can't see the variable CATALINA_OPTS. In fact, I would like to know if the line I added to setenv.sh is OK: |CATALINA_OPTS=||"-server -Xss7G -Xms14G -Xmx14G $CATALINA_OPTS -XX:+UseConcMarkSweepGC -XX:NewSize=7G -XX:+UseParNewGC"| My setenv.sh file contains only this line (inside /usr/share/tomcat6/bin/). How can I see if memory is well allocated? Other question: is |*-XX:NewSize=7G*| OK? I have 24 GB RAM (14G ~60%)
search is slow for URL fields of type String.
This is the string fieldType: These are the fields using the 'string' fieldType: And this is the sample query: /select/?q=url:http\://www.foxbusiness.com/personal-finance/2012/08/10/social-change-coming-from-gas-prices-to-rent-prices-and-beyond/ AND image_url:* Each query like this takes around 400 milliseconds. What changes can I make to the fieldType to improve query performance? thanks Srini -- View this message in context: http://lucene.472066.n3.nabble.com/search-is-slow-for-URL-fields-of-type-String-tp4002662.html Sent from the Solr - User mailing list archive at Nabble.com.
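One thing worth ruling out before changing the fieldType: every query-syntax character in the URL (colons, slashes, hyphens) must be escaped, or the parser splits the term. A client-side escaper in the spirit of SolrJ's ClientUtils.escapeQueryChars might look like this (the function name and example URL are illustrative):

```python
# Characters the Lucene query parser treats specially; backslash-escaping
# them lets a full URL match as a single term against a string field.
SPECIAL = set('\\+-!():^[]"{}~*?|&;/ ')

def escape_query_chars(value: str) -> str:
    """Backslash-escape Lucene query syntax characters, SolrJ-style."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in value)

q = "url:" + escape_query_chars("http://www.example.com/some-page/")
```

It may also help to move the exact-URL clause into an fq filter query, which is cached and skips scoring, rather than keeping it in q.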
Re: display SOLR Query in web page
I haven't spent time in trying anything, just entered a query and recognized that it showed up in the page source view. If they really escape everything it is not that dangerous? Actually I don't want to try anything with their page, they might not have any humor ;-) Bernd Am 22.08.2012 15:41, schrieb Michael Della Bitta: > Actually, I'm having a little trouble coming up with a > proof-of-concept exploit for this... it doesn't seem like Solr is > exposed directly, and it does seem like it's escaping submitted > content before redisplaying it on the page. > > I'm not crazy about leaking the raw query string into the HTML, but it > doesn't seem to lead to more than just that. > > Please let me know if I am missing something, it's still morningtime > here in the US and I haven't had enough coffee yet. :) > > Michael Della Bitta > > > Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017 > www.appinions.com > Where Influence Isn’t a Game > > > On Wed, Aug 22, 2012 at 9:32 AM, Michael Della Bitta > wrote: >> Ouch, not to mention the potential for XSS. >> >> I'll see if I can get in touch with someone. >> >> Michael Della Bitta >> >> >> Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017 >> www.appinions.com >> Where Influence Isn’t a Game >> >> >> On Wed, Aug 22, 2012 at 3:40 AM, Bernd Fehling >> wrote: >>> Now this is very scary, while searching for "solr direct access per docid" >>> I got a hit >>> from US Homeland Security Digital Library. Interested in what they have to >>> tell me >>> about my search I clicked on the link to the page. First the page had >>> nothing unusual >>> about it, but why I get the hit? >>> http://www.hsdl.org/?collection/stratpol&id=4 >>> >>> Inspecting the page source view shows that they have the solr query >>> displayed direct >>> on their page as "span" with "style=display:none". 
>>> -- snippet -- >>> >>> >>> *** SOLR Query *** — q=Collection:0 AND >>> (TabSection:("Congressional hearings and testimony", "Congressional >>> reports", "Congressional resolutions", "Directives (presidential)", >>> "Executive orders", "Major Legislation", "Public laws", "Reports (CBO)", >>> "Reports (CHDS)", "Reports (CRS)",... >>> ... >>> AND (Title_nostem:("China Forces Senior Intelligence Officer")^10 >>> AlternateTitle_nostem:("China Forces Senior Intelligence >>> Officer")^9)&sort=score >>> desc&rows=30&start=0&indent=off&facet=on&facet.limit=1&facet.mincount=1&fl=AlternateTitle_text,Collection,CoverageCountry,CoverageState,Creator_nostem,DateLastModified,DateOfRecordEntry,Description_text,DisplayDate,DocID,ExternalDocId,ExternalDocSource,FileDate,FileExtension,FileSize,FileTitle_text,Format,Language,PublishDate,Publisher_text,Publisher_nostem,ReportNumber,ResourceType,RetrievedFrom,Rights,Subjects,Source,TabSection,Title_text,URL_text,Alternate_URL_text,CreatedBy,ModifiedBy,Notes&wt=phps&facet.field=Creator&facet.field=Format&facet.field=Language&facet.field=Publisher&facet.field=TabSection >>> -- snippet -- >>> >>> As you can see I have searched for "China Forces Senior Intelligence >>> Officer" so this is directly showing the >>> query string. >>> Do they know that there is also a delete by query? >>> And the are also escape sequences? >>> >>> This is what I call scary. >>> Maybe some of the US fellows can give them a hint and a helping hand. >>> >>> Regards >>> Bernd -- * Bernd FehlingUniversitätsbibliothek Bielefeld Dipl.-Inform. (FH)LibTec - Bibliothekstechnologie Universitätsstr. 25 und Wissensmanagement 33615 Bielefeld Tel. +49 521 106-4060 bernd.fehling(at)uni-bielefeld.de BASE - Bielefeld Academic Search Engine - www.base-search.net *
Re: Solr - case-insensitive search do not work
Did you see my message about debugging parameters? Try that and see what's happening behind the scenes. I can confirm that by default the queries are NOT case sensitive. Ravish On Wed, Aug 22, 2012 at 2:45 PM, meghana wrote: > Hi Ravish , the defination for text_en_splitting in solr default schema and > of mine are same.. still its not working... any idea? > > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Solr-case-insensitive-search-do-not-work-tp4002605p4002645.html > Sent from the Solr - User mailing list archive at Nabble.com. >
Problem to start solr-4.0.0-BETA with tomcat-6.0.20
Hi, I tried to start solr-4.0.0-BETA with tomcat-6.0.20, but it does not work. I copied apache-solr-4.0.0-BETA.war to $TOMCAT_HOME/webapps. Then I copied the directory apache-solr-4.0.0-BETA\example\solr to C:\home\solr-4.0-beta and adjusted the file $TOMCAT_HOME\conf\Catalina\localhost\apache-solr-4.0.0-BETA.xml to point solr/home to C:/home/solr-4.0-beta. With this configuration, when I start up Tomcat I get: SEVERE: org.apache.solr.common.SolrException: Invalid luceneMatchVersion 'LUCENE_40', valid values are: [LUCENE_20, LUCENE_21, LUCENE_22, LUCENE_23, LUCENE_24, LUCENE_29, LUCENE_30, LUCENE_31, LUCENE_32, LUCENE_33, LUCENE_34, LUCENE_35, LUCENE_36, LUCENE_CURRENT ] or a string in format 'VV' So I changed the line in solrconfig.xml: LUCENE_40 to LUCENE_CURRENT Then I got a new error: Caused by: java.lang.ClassNotFoundException: solr.NRTCachingDirectoryFactory This class is in the file apache-solr-core-4.0.0-BETA.jar, but for some reason the classloader does not load it. I then moved all jars from $TOMCAT_HOME\webapps\apache-solr-4.0.0-BETA\WEB-INF\lib to $TOMCAT_HOME\lib. After this setup, I got a new error: SEVERE: java.lang.ClassCastException: org.apache.solr.core.NRTCachingDirectoryFactory cannot be cast to org.apache.solr.core.DirectoryFactory So I changed the line in solrconfig.xml: to Then I got a new error: Caused by: java.lang.ClassCastException: org.apache.solr.spelling.DirectSolrSpellChecker cannot be cast to org.apache.solr.spelling.SolrSpellChecker How can I resolve the classloader problem? How can I resolve the ClassCastException for NRTCachingDirectoryFactory and DirectSolrSpellChecker? I cannot start up Solr 4.0 beta with Tomcat. Thanks,
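A note on the ClassCastException above: having the Solr jars both in $TOMCAT_HOME/lib and in the webapp's WEB-INF/lib is a classic cause, because the same class loaded by two different classloaders cannot be cast to itself, so the jars should live in only one place (normally WEB-INF/lib only). For reference, a minimal context fragment of the kind described might look like the sketch below; paths are the ones from the post and may need adjusting:

```xml
<!-- Sketch of $TOMCAT_HOME/conf/Catalina/localhost/apache-solr-4.0.0-BETA.xml -->
<Context docBase="${catalina.home}/webapps/apache-solr-4.0.0-BETA.war"
         debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String"
               value="C:/home/solr-4.0-beta" override="true"/>
</Context>
```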
Re: Solr - case-insensitive search do not work
Hi Ravish, the definition for text_en_splitting in the Solr default schema and in mine are the same... still it's not working... any idea? -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-case-insensitive-search-do-not-work-tp4002605p4002645.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: display SOLR Query in web page
Actually, I'm having a little trouble coming up with a proof-of-concept exploit for this... it doesn't seem like Solr is exposed directly, and it does seem like it's escaping submitted content before redisplaying it on the page. I'm not crazy about leaking the raw query string into the HTML, but it doesn't seem to lead to more than just that. Please let me know if I am missing something, it's still morningtime here in the US and I haven't had enough coffee yet. :) Michael Della Bitta Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017 www.appinions.com Where Influence Isn’t a Game On Wed, Aug 22, 2012 at 9:32 AM, Michael Della Bitta wrote: > Ouch, not to mention the potential for XSS. > > I'll see if I can get in touch with someone. > > Michael Della Bitta > > > Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017 > www.appinions.com > Where Influence Isn’t a Game > > > On Wed, Aug 22, 2012 at 3:40 AM, Bernd Fehling > wrote: >> Now this is very scary, while searching for "solr direct access per docid" I >> got a hit >> from US Homeland Security Digital Library. Interested in what they have to >> tell me >> about my search I clicked on the link to the page. First the page had >> nothing unusual >> about it, but why I get the hit? >> http://www.hsdl.org/?collection/stratpol&id=4 >> >> Inspecting the page source view shows that they have the solr query >> displayed direct >> on their page as "span" with "style=display:none". >> -- snippet -- >> >> >> *** SOLR Query *** — q=Collection:0 AND >> (TabSection:("Congressional hearings and testimony", "Congressional >> reports", "Congressional resolutions", "Directives (presidential)", >> "Executive orders", "Major Legislation", "Public laws", "Reports (CBO)", >> "Reports (CHDS)", "Reports (CRS)",... >> ... 
>> AND (Title_nostem:("China Forces Senior Intelligence Officer")^10 >> AlternateTitle_nostem:("China Forces Senior Intelligence >> Officer")^9)&sort=score >> desc&rows=30&start=0&indent=off&facet=on&facet.limit=1&facet.mincount=1&fl=AlternateTitle_text,Collection,CoverageCountry,CoverageState,Creator_nostem,DateLastModified,DateOfRecordEntry,Description_text,DisplayDate,DocID,ExternalDocId,ExternalDocSource,FileDate,FileExtension,FileSize,FileTitle_text,Format,Language,PublishDate,Publisher_text,Publisher_nostem,ReportNumber,ResourceType,RetrievedFrom,Rights,Subjects,Source,TabSection,Title_text,URL_text,Alternate_URL_text,CreatedBy,ModifiedBy,Notes&wt=phps&facet.field=Creator&facet.field=Format&facet.field=Language&facet.field=Publisher&facet.field=TabSection >> -- snippet -- >> >> As you can see I have searched for "China Forces Senior Intelligence >> Officer" so this is directly showing the >> query string. >> Do they know that there is also a delete by query? >> And the are also escape sequences? >> >> This is what I call scary. >> Maybe some of the US fellows can give them a hint and a helping hand. >> >> Regards >> Bernd
Re: display SOLR Query in web page
Ouch, not to mention the potential for XSS. I'll see if I can get in touch with someone. Michael Della Bitta Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017 www.appinions.com Where Influence Isn’t a Game On Wed, Aug 22, 2012 at 3:40 AM, Bernd Fehling wrote: > Now this is very scary, while searching for "solr direct access per docid" I > got a hit > from US Homeland Security Digital Library. Interested in what they have to > tell me > about my search I clicked on the link to the page. First the page had nothing > unusual > about it, but why I get the hit? > http://www.hsdl.org/?collection/stratpol&id=4 > > Inspecting the page source view shows that they have the solr query displayed > direct > on their page as "span" with "style=display:none". > -- snippet -- > > > *** SOLR Query *** — q=Collection:0 AND > (TabSection:("Congressional hearings and testimony", "Congressional > reports", "Congressional resolutions", "Directives (presidential)", > "Executive orders", "Major Legislation", "Public laws", "Reports (CBO)", > "Reports (CHDS)", "Reports (CRS)",... > ... 
> AND (Title_nostem:("China Forces Senior Intelligence Officer")^10 > AlternateTitle_nostem:("China Forces Senior Intelligence > Officer")^9)&sort=score > desc&rows=30&start=0&indent=off&facet=on&facet.limit=1&facet.mincount=1&fl=AlternateTitle_text,Collection,CoverageCountry,CoverageState,Creator_nostem,DateLastModified,DateOfRecordEntry,Description_text,DisplayDate,DocID,ExternalDocId,ExternalDocSource,FileDate,FileExtension,FileSize,FileTitle_text,Format,Language,PublishDate,Publisher_text,Publisher_nostem,ReportNumber,ResourceType,RetrievedFrom,Rights,Subjects,Source,TabSection,Title_text,URL_text,Alternate_URL_text,CreatedBy,ModifiedBy,Notes&wt=phps&facet.field=Creator&facet.field=Format&facet.field=Language&facet.field=Publisher&facet.field=TabSection > -- snippet -- > > As you can see I have searched for "China Forces Senior Intelligence Officer" > so this is directly showing the > query string. > Do they know that there is also a delete by query? > And the are also escape sequences? > > This is what I call scary. > Maybe some of the US fellows can give them a hint and a helping hand. > > Regards > Bernd
Re: SpellCheck Component does not work for certain words
Hi, Just a few things to add: I found that when I search for words of three letters or fewer I don't get any suggestions, and also when I search for "finding" I don't get any suggestions related to it, even though I have search results for it. But when I search for "findingg" I do get suggestions, and one of them is "finding", and in that case the search results are zero. Can you tell me if this is the way the spellcheck is intended to work, or am I going wrong somewhere? Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/SpellCheck-Component-does-not-work-for-certain-words-tp4002573p4002636.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Scalability of Solr Result Grouping/Field Collapsing: Millions/Billions of documents?
You can collapse in each shard as a separate query. Lance Norskog wrote: How do you separate the documents among the shards? Can you set up the shards such that one "collapse group" is only on a single shard? That you never have to do distributed grouping? On Tue, Aug 21, 2012 at 4:10 PM, Tirthankar Chatterjee wrote: > This wont work, see my thread on Solr3.6 Field collapsing > Thanks, > Tirthankar > > -Original Message- > From: Tom Burton-West > Date: Tue, 21 Aug 2012 18:39:25 > To: solr-user@lucene.apache.org > Reply-To: "solr-user@lucene.apache.org" > Cc: William Dueber; Phillip Farber > Subject: Scalability of Solr Result Grouping/Field Collapsing: > Millions/Billions of documents? > > Hello all, > > We are thinking about using Solr Field Collapsing on a rather large scale > and wonder if anyone has experience with performance when doing Field > Collapsing on millions of or billions of documents (details below. ) Are > there performance issues with grouping large result sets? > > Details: > We have a collection of the full text of 10 million books/journals. This > is spread across 12 shards with each shard holding about 800,000 > documents. When a query matches a journal article, we would like to group > all the matching articles from the same journal together. (there is a > unique id field identifying the journal). Similarly when there is a match > in multiple copies of the same book we would like to group all results for > the same book together (again we have a unique id field we can group on). > Sometimes a short query against the OCR field will result in over one > million hits. Are there known performance issues when field collapsing > result sets containing a million hits? > > We currently index the entire book as one Solr document. We would like to > investigate the feasibility of indexing each page as a Solr document with a > field indicating the book id.
We could then offer our users the choice of > a list of the most relevant pages, or a list of the books containing the > most relevant pages. We have approximately 3 billion pages. Does anyone > have experience using field collapsing on this sort of scale? > > Tom > > Tom Burton-West > Information Retrieval Programmer > Digital Library Production Service > University of Michigan Library > http://www.hathitrust.org/blogs/large-scale-search -- Lance Norskog goks...@gmail.com
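For readers following this thread, the grouping requests being discussed look roughly like the sketch below (field names such as `journal_id` are illustrative, not from the original post). Note that with a sharded index, exact group counts via `group.ngroups` assume all documents of a group live on the same shard, which is why co-locating each collapse group on one shard matters:

```text
q=ocr:"short query"&group=true&group.field=journal_id&group.limit=3&group.ngroups=true
```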
Edismax parser weird behavior
Hi, I am seeing two strange behaviors in edismax, which is configured to default to OR (using mm=0). In total there are 700 results. 1. A search for *auto* gives *50 results*, but a search for *NOT auto* gives *651 results*. Mathematically, it should give only 650 results for *NOT auto*. 2. A search for *auto* gives 50 results and a search for *car* gives *100 results*, but a search for *auto and car* gives *10 results*. Since we have set mm=0, it should behave like OR, and the result count for auto and car should be at least 100. Please help me understand these two issues. Are these normal behaviors? Do I need to tweak the query, or do I need to look into the config or schema xml files? Thanks in Advance -- View this message in context: http://lucene.472066.n3.nabble.com/Edismax-parser-weird-behavior-tp4002626.html Sent from the Solr - User mailing list archive at Nabble.com.
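On the second point, one known explanation (worth verifying against your Solr version) is the edismax `lowercaseOperators` option, which in several releases defaulted to true, meaning a lowercase `and` is parsed as the AND operator regardless of mm. A hedged way to compare the two behaviors:

```text
# lowercase "and" may be treated as the AND operator by edismax:
q=auto and car&defType=edismax&mm=0
# ask edismax to treat "and" as an ordinary term instead:
q=auto and car&defType=edismax&mm=0&lowercaseOperators=false
```

If the second query returns roughly the union of the auto and car result sets, the lowercase operator handling is the cause of issue 2.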
Re: Solr - case-insensitive search do not work
Also, try comparing your field configuration to Solr's default text field and see if you can spot any differences. Ravish On Wed, Aug 22, 2012 at 1:09 PM, Ravish Bhagdev wrote: > OK. Try without quotes like myfield:cloud+university and see if it has > any effect. > > Also, try both queries with debugging turned on and post the output of the > same ( http://wiki.apache.org/solr/CommonQueryParameters#Debugging ) > > It must be some field configuration issue or that double quotes are > causing some analyzers to not work on your query. > > Hope this helps. > > Ravish > > On Wed, Aug 22, 2012 at 12:11 PM, meghana wrote: > >> @Ravish Bhagdev , Yes I am adding double quotes around my search , as >> shown >> in my post. Like, >> >> myfield:"cloud university" >> >> >> >> >> >> -- >> View this message in context: >> http://lucene.472066.n3.nabble.com/Solr-case-insensitive-search-do-not-work-tp4002605p4002610.html >> Sent from the Solr - User mailing list archive at Nabble.com. >> > >
Re: Solr - case-insensitive search do not work
OK. Try without quotes like myfield:cloud+university and see if it has any effect. Also, try both queries with debugging turned on and post the output of the same ( http://wiki.apache.org/solr/CommonQueryParameters#Debugging ) It must be some field configuration issue or that double quotes are causing some analyzers to not work on your query. Hope this helps. Ravish On Wed, Aug 22, 2012 at 12:11 PM, meghana wrote: > @Ravish Bhagdev , Yes I am adding double quotes around my search , as shown > in my post. Like, > > myfield:"cloud university" > > > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Solr-case-insensitive-search-do-not-work-tp4002605p4002610.html > Sent from the Solr - User mailing list archive at Nabble.com. >
Re: Runtime.exec() not working on Tomcat
Could it be different 'current' working directories? What happens if you hardcode the full path into the command and input/output files? ./convert.bin -> /Dev/Solr/bin/convert.bin, etc. Also, you may want to use some file system observation tools to figure out exactly what file is touched where. Look for dtrace on Unix-like systems and for SysInternals ProcMon on Windows. Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Wed, Aug 22, 2012 at 7:18 AM, 122jxgcn wrote: > I have following code on my Apache Tika Maven project. > > This code works when I test locally, but fails when it's attached as > external jar in Apache Solr (container is Tomcat). > > String cmd; contains command string that will convert file with input as > > ./convert.bin input.custom output.xml
Runtime.exec() not working on Tomcat
I have the following code in my Apache Tika Maven project. The code works when I test locally, but fails when it's attached as an external jar in Apache Solr (the container is Tomcat). The string cmd contains the command that will convert a file, invoked as

./convert.bin input.custom output.xml

I checked that convert.bin and input.custom exist.

String cmd; // As explained above
File out = new File(dir_path, "output.xml"); // dir_path is the file path
Process ps = null;
try {
    ps = Runtime.getRuntime().exec(cmd); // execute command
    int exitVal = ps.waitFor();
    logger.info("Executing Runtime successful with exit value of " + exitVal); // exitVal is 0
} catch (Exception e) {
    logger.error("Exception in executing Runtime: " + e); // not reaching here
}

// I get "Out file does not exist", although I should get the proper output
if (out.exists())
    logger.info("Out file exists");
else
    logger.info("Out file does not exist"); // reaches here

out.setWritable(true);
out.setReadable(true);
out.setExecutable(true);
out.deleteOnExit();

// I get a FileNotFoundException here
InputStream xml_stream = new FileInputStream(out);

I'm really confused because I get the right result locally (Maven test), but not when it is on Tomcat. Any help please? -- View this message in context: http://lucene.472066.n3.nabble.com/Runtime-exec-not-working-on-Tomcat-tp4002614.html Sent from the Solr - User mailing list archive at Nabble.com.
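A frequent cause of exactly this symptom is that Tomcat's working directory is not the directory you tested from, and that the child process stalls because nobody drains its stdout/stderr pipes. Below is a minimal sketch of a safer invocation (the class and method names are illustrative, not from the original post): it sets an explicit working directory and consumes the merged output before waiting for the exit value.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;

public class ExecDemo {
    // Run a command with an explicit working directory and drain its output,
    // returning the exit value. Draining stdout/stderr prevents the child
    // process from blocking on a full pipe buffer, a common cause of
    // "works locally, fails under Tomcat".
    public static int run(String[] cmd, File workDir) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(cmd);
        pb.directory(workDir);          // Tomcat's current directory is rarely what you expect
        pb.redirectErrorStream(true);   // merge stderr into stdout so one reader suffices
        Process ps = pb.start();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(ps.getInputStream()))) {
            while (r.readLine() != null) {
                // discard (or log) each line of child output
            }
        }
        return ps.waitFor();
    }
}
```

With this shape, ./convert.bin can be given as an absolute path and workDir pointed at the directory containing input.custom, which removes the dependence on Tomcat's current directory.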
Re: Does DIH commit during large import?
Thanks, I will look into autoCommit. I assume there are memory implications of not committing? Or is it just writing in a separate file and can theoretically do it indefinitely? Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Wed, Aug 22, 2012 at 2:42 AM, Lance Norskog wrote: > Solr has a separate feature called 'autoCommit'. This is configured in > solrconfig.xml. You can set Solr to commit all documents every N > milliseconds or every N documents, whichever comes first. If you want > intermediate commits during a long DIH session, you have to use this > or make your own script that does commits. > > On Tue, Aug 21, 2012 at 8:48 AM, Shawn Heisey wrote: >> On 8/21/2012 6:41 AM, Alexandre Rafalovitch wrote: >>> >>> I am doing an import of large records (with large full-text fields) >>> and somewhere around 30 records DataImportHandler runs out of >>> memory (Heap) on a TIKA import (triggered from custom Processor) and >>> does roll-back. I am using store=false and trying some tricks and >>> tracking possible memory leaks, but also have a question about DIH >>> itself. >>> >>> What actually happens when I run DIH on a large (XML Source) job? Does >>> it accumulate some sort of status in memory that it commits at the >>> end? If so, can I do intermediate commits to drop the memory >>> requirements? Or, will it help to do several passes over the same >>> dataset and import only particular entries at a time? I am using the >>> Solr 4 (alpha) UI, so I can see some of the options there. >> >> >> I use Solr 3.5 and a MySQL database for import, so my setup may not be >> completely relevant, but here is my experience. >> >> Unless you turn on autocommit in solrconfig, documents will not be >> searchable during the import. 
If you have "commit=true" for DIH (which I >> believe is the default), there will be a commit at the end of the import. >> >> It looks like there's an out of memory issue filed on Solr 4 DIH with Tika >> that is suspected to be a bug in Tika rather than Solr. The issue details >> talk about some workarounds for those who are familiar with Tika -- I'm not. >> The issue URL: >> >> https://issues.apache.org/jira/browse/SOLR-2886 >> >> Thanks, >> Shawn >> > > > > -- > Lance Norskog > goks...@gmail.com
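For reference, autoCommit is enabled in solrconfig.xml with a fragment along these lines (the threshold values here are illustrative, not a recommendation):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>10000</maxDocs>   <!-- commit after this many added documents -->
    <maxTime>60000</maxTime>   <!-- or after this many milliseconds, whichever comes first -->
  </autoCommit>
</updateHandler>
```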
Re: Solr - case-insensitive search do not work
@Ravish Bhagdev, yes I am adding double quotes around my search, as shown in my post. Like: myfield:"cloud university" -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-case-insensitive-search-do-not-work-tp4002605p4002610.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr - case-insensitive search do not work
The LowerCaseFilterFactory is already present in your field type definition (it's there twice now). Are you adding quotes around your query by any chance? Ravish On Wed, Aug 22, 2012 at 11:31 AM, meghana wrote: > I want to apply case-insensitive search for field *myfield* in solr. > > I googled a bit for that , and i found that , i need to apply > *LowerCaseFilterFactory *to Field Type and field should be of > solr.TextFeild. > > I applied that in my *schema.xml* and re-index the data, then also my > search > seems to be case-sensitive. > > Below is search that i perform. > * > http://localhost:8080/solr/select?q=myfield:"cloud > university"&hl=on&hl.snippets=99&hl.fl=myfield* > > Below is definition for field type > > positionIncrementGap="100" autoGeneratePhraseQueries="true"> > > > > > ignoreCase="true" > words="stopwords_en.txt" > enablePositionIncrements="true" > /> > generateWordParts="1" generateNumberParts="1" catenateWords="1" > catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/> > > protected="protwords.txt"/> > > > > > ignoreCase="true" expand="true"/> > ignoreCase="true" > words="stopwords_en.txt" > enablePositionIncrements="true" > /> > generateWordParts="1" generateNumberParts="1" catenateWords="0" > catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/> > > protected="protwords.txt"/> > > > > > and below is my field definition > > stored="true" > /> > > Not sure , what is wrong with this. Please help me to resolve this. > > Thanks > > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Solr-case-insensitive-search-do-not-work-tp4002605.html > Sent from the Solr - User mailing list archive at Nabble.com. >
Solr - case-insensitive search do not work
I want to apply a case-insensitive search to the field *myfield* in solr. I googled a bit and found that I need to apply *LowerCaseFilterFactory* to the field type, and the field should be of solr.TextField. I applied that in my *schema.xml* and re-indexed the data, but my search still seems to be case-sensitive. Below is the search that I perform: * http://localhost:8080/solr/select?q=myfield:"cloud university"&hl=on&hl.snippets=99&hl.fl=myfield* Below is the definition for the field type, and below that is my field definition. Not sure what is wrong with this. Please help me to resolve this. Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-case-insensitive-search-do-not-work-tp4002605.html Sent from the Solr - User mailing list archive at Nabble.com.
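Since the XML definitions did not survive in the post above, here is a minimal sketch of the kind of field type being discussed (the type and field names are illustrative). The key point is that lowercasing must appear in both the index-time and query-time analyzer chains for matching to be case-insensitive:

```xml
<fieldType name="text_lower" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="myfield" type="text_lower" indexed="true" stored="true"/>
```

Note that analyzer changes only take effect after the data is re-indexed.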
Weighted Search Results / Multi-Value Value's Not Aggregating Weight
Hey, I have been having some problems getting good search results when using weighting against many fields with multi-values. After quite a bit of testing, it seems to me that the problem (at least as far as my query is concerned) is that only one weighting is taken into account per field. For example, in a multi-value field, if we have "Comedy" and "Romance" with separate weightings, the one with the highest weighting is used (and not a combined weighting). Which means that searching for romantic comedy returns "Alvin and the Chipmunks" (Family, Children Comedy). Query: facet=on&fl=id,name,matching_genres,score,url_path,url_key,price,special_price,small_image,thumbnail,sku,stock_qty,release_date&sort=score+desc,retail_rating+desc,release_date+desc&start=&q=**+-sku:"1019660"+-movie_id:"1805"+-movie_id:"1806"+(series_names_attr_opt_id:"454282"^9000+OR+cat_id:"22"^9+OR+cat_id:"248"^9+OR+cat_id:"249"^9+OR+matching_genres:"Comedy"^9+OR+matching_genres:"Romance"^7+OR+matching_genres:"Drama"^5)&fq=store_id:"1"+AND+avail_status_attr_opt_id:"available"+AND+(format_attr_opt_id:"372619")&rows=4 Now if I change matching_genres:"Romance"^7 to matching_genres:"Romance"^70 (adding a 0), suddenly the first result is "Sex and the City: The Movie / Sex and the City 2" (which ironically is "Drama", "Comedy", "Romance", the very combination we are looking for). So is there a way to structure my query so that all of the multi-value values are treated individually, aggregating the weighting/score? Thanks in advance! David
Re: Solr search – Tika extracted text from PDF not return highlighting snippet
Thanks for your reply. I have tried many things (copyField etc.) with no success. Note that the "pdfs" are stored as BLOBs in a mysql database. I am trying to use DIH in order to fetch the binaries from the DB. Is it possible? Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-search-Tika-extracted-text-from-PDF-not-return-highlighting-snippet-tp3999647p4002587.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Use a different folder for schema.xml
You can include one xml file into another, something like:

<?xml version="1.0"?>
<!DOCTYPE config [
  <!ENTITY resourcedb SYSTEM "resourcedb.xml">
]>
<config>
  &resourcedb;
</config>

- Ravish On Wed, Aug 22, 2012 at 8:56 AM, Alexander Cougarman wrote: > Thanks, Lance. Please forgive my ignorance, but what do you mean by soft > links/XML include feature? Can you provide an example? Thanks again. > > Sincerely, > Alex > > -Original Message- > From: Lance Norskog [mailto:goks...@gmail.com] > Sent: 22 August 2012 9:55 AM > To: solr-user@lucene.apache.org > Subject: Re: Use a different folder for schema.xml > > It is possible to store the entire conf/ directory somewhere. To store > only the schema.xml file, try soft links or the XML include feature: > conf/schema.xml includes from somewhere else. > > On Tue, Aug 21, 2012 at 11:31 PM, Alexander Cougarman > wrote: > > Hi. For our Solr instance, we need to put the schema.xml file in a > different location than where it resides now. Is this possible? Thanks. > > > > Sincerely, > > Alex > > > > > > -- > Lance Norskog > goks...@gmail.com >
RE: Use a different folder for schema.xml
Thanks, Lance. Please forgive my ignorance, but what do you mean by soft links/XML include feature? Can you provide an example? Thanks again. Sincerely, Alex -Original Message- From: Lance Norskog [mailto:goks...@gmail.com] Sent: 22 August 2012 9:55 AM To: solr-user@lucene.apache.org Subject: Re: Use a different folder for schema.xml It is possible to store the entire conf/ directory somewhere. To store only the schema.xml file, try soft links or the XML include feature: conf/schema.xml includes from somewhere else. On Tue, Aug 21, 2012 at 11:31 PM, Alexander Cougarman wrote: > Hi. For our Solr instance, we need to put the schema.xml file in a different > location than where it resides now. Is this possible? Thanks. > > Sincerely, > Alex > -- Lance Norskog goks...@gmail.com
display SOLR Query in web page
Now this is very scary. While searching for "solr direct access per docid" I got a hit from the US Homeland Security Digital Library. Interested in what they had to tell me about my search, I clicked on the link to the page. At first the page had nothing unusual about it, but why did I get the hit? http://www.hsdl.org/?collection/stratpol&id=4 Inspecting the page source shows that they have the solr query displayed directly on their page as a "span" with "style=display:none". -- snippet -- *** SOLR Query *** — q=Collection:0 AND (TabSection:("Congressional hearings and testimony", "Congressional reports", "Congressional resolutions", "Directives (presidential)", "Executive orders", "Major Legislation", "Public laws", "Reports (CBO)", "Reports (CHDS)", "Reports (CRS)",... ... AND (Title_nostem:("China Forces Senior Intelligence Officer")^10 AlternateTitle_nostem:("China Forces Senior Intelligence Officer")^9)&sort=score desc&rows=30&start=0&indent=off&facet=on&facet.limit=1&facet.mincount=1&fl=AlternateTitle_text,Collection,CoverageCountry,CoverageState,Creator_nostem,DateLastModified,DateOfRecordEntry,Description_text,DisplayDate,DocID,ExternalDocId,ExternalDocSource,FileDate,FileExtension,FileSize,FileTitle_text,Format,Language,PublishDate,Publisher_text,Publisher_nostem,ReportNumber,ResourceType,RetrievedFrom,Rights,Subjects,Source,TabSection,Title_text,URL_text,Alternate_URL_text,CreatedBy,ModifiedBy,Notes&wt=phps&facet.field=Creator&facet.field=Format&facet.field=Language&facet.field=Publisher&facet.field=TabSection -- snippet -- As you can see, I searched for "China Forces Senior Intelligence Officer", so this is directly showing the query string. Do they know that there is also a delete-by-query? And that there are also escape sequences? This is what I call scary. Maybe some of the US fellows can give them a hint and a helping hand. Regards Bernd
Highlighting is case sensitive when search with double quote
When I search with "abc cde", solr will return results, but the highlighting portion is as per below, and when I search with "ABC cde" it will have the below response ... ... ABC cde . It seems the highlighting response is case-sensitive; in both cases above the other query parameters are the same. How can I get a case-insensitive response? Thanks, Vishal Parekh -- View this message in context: http://lucene.472066.n3.nabble.com/Highlighting-is-case-sensitive-when-search-with-double-quote-tp4002576.html Sent from the Solr - User mailing list archive at Nabble.com.
Which directories are required in Solr?
Hi. Which folders/files can be deleted from the default Solr package (apache-solr-3.6.1.zip) on Windows if all we'd like to do is index/store documents? Thanks. Sincerely, Alex
SpellCheck Component does not work for certain words
Hi, I'm using Solr 1.4.0 and tried to use the spellcheck component by making the following changes in solrconfig.xml solr.IndexBasedSpellChecker ./spellchekerFile1 spell true and included the below in the standard request handler to enable spell check spellcheck We have used a copyField for the spell field in schema.xml to contain all the data from the other fields. The search is working fine with certain queries. When we try to search for "dep", it does not give any suggestions, but when we search for "depo" there are results which contain "dep" also. Can you tell me why no suggestions are fetched when we query "dep"? Should we use solr.WordBreakSolrSpellChecker to get the intended results? If so, can you guide us on the required config changes? Please guide me on this. Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/SpellCheck-Component-does-not-work-for-certain-words-tp4002573.html Sent from the Solr - User mailing list archive at Nabble.com.
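Since the XML did not survive in the post above, here is a rough reconstruction of a typical Solr 1.4 IndexBasedSpellChecker setup based on the surviving fragments (`./spellchekerFile1` and the `spell` copy field); treat the element layout as a sketch, not the poster's exact config:

```xml
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <str name="spellcheckIndexDir">./spellchekerFile1</str>
    <str name="field">spell</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>
```

Note that a spellchecker of this kind suggests whole-term corrections; it is not a prefix/autocomplete facility, which may explain why a short fragment like "dep" behaves differently from a full word.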