Re: How to manage resource out of index?
hi li, i looked at doing something similar - where we only index the text but retrieve search results / highlight from files - we ended up giving up because of the amount of customisation required in solr, mainly because we wanted the distributed search functionality in solr, which meant making sure the original file ended up on the same file system (i.e. machine) too! we ended up just storing the main text field as well, even though there was quite a bit of text - in the end solr/lucene can handle the index size fine, and disk space is cheaper than the man-hours to customise solr/lucene to work this way! that was our conclusion anyway, and it works fine. we also have separate index / search server(s), so we don't care about merge time either - and as i said above, we use distributed search, so we don't tend to need to merge very large indexes anyway. when your system grows / you go into production, you'll probably split the indexes too to use solr's distributed search (for the sake of query speed). hope that helps, bec :)

On 7 July 2010 14:07, Li Li wrote:
> I used to store full text in the lucene index. But I found it's very
> slow when merging the index, because merging 2 segments copies the
> fdt files into a new one. So I want to only index the full text. But when
> searching I need the full text for applications such as highlighting and
> viewing the full text. I can store the full text as (url, full text) pairs in a
> database and load it into memory. And when I search in lucene (or solr),
> I retrieve the url of the doc first, then use the url to get the full text. But when
> they are stored separately, they are hard to manage. They may not be
> consistent with each other. Does lucene or solr provide any method to
> ease this problem? Or does anyone have experience with this problem?
How to manage resource out of index?
I used to store full text in the lucene index. But I found it's very slow when merging the index, because merging 2 segments copies the fdt files into a new one. So I want to only index the full text. But when searching I need the full text for applications such as highlighting and viewing the full text. I can store the full text as (url, full text) pairs in a database and load it into memory. And when I search in lucene (or solr), I retrieve the url of the doc first, then use the url to get the full text. But when they are stored separately, they are hard to manage. They may not be consistent with each other. Does lucene or solr provide any method to ease this problem? Or does anyone have experience with this problem?
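The index-plus-external-store pattern described here can be sketched as follows. This is a minimal illustration, not a recommendation from the thread: the store is an in-memory Map standing in for the database, and all class and method names are invented. The key idea is to write the external store before the document is indexed, so a hit can never come back without retrievable text, and to surface a missing entry loudly rather than silently returning nothing.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch of keeping full text outside the index: the index holds only
 * the (indexed, unstored) text plus a stored url field, while the full
 * text lives in an external (url -> text) store for highlighting/display.
 */
public class ExternalTextStore {
    private final Map<String, String> store = new HashMap<>();

    // Feed time: write the store first, then add the doc to Solr/Lucene,
    // so a document never becomes searchable before its text is retrievable.
    public void put(String url, String fullText) {
        store.put(url, fullText);
        // ... then send the doc (url stored, text indexed-only) to Solr ...
    }

    // Query time: Solr returns the stored url; the text comes from the store.
    public String fetch(String url) {
        String text = store.get(url);
        if (text == null) {
            // The inconsistency Li Li worries about: fail fast instead of
            // quietly highlighting nothing.
            throw new IllegalStateException("no text for " + url);
        }
        return text;
    }
}
```

With a real database, the same ordering (store first, index second) keeps the failure mode benign: worst case is an orphaned row in the store, never a hit without text.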
Re: document level security: indexing/searching techniques
You could implement a good solution with the underlying Lucene ParallelReader: http://lucene.apache.org/java/3_0_2/api/core/org/apache/lucene/index/ParallelReader.html Keep the 100 search fields - the 'static' info - in one index, and the permissions info in another index that gets updated when the permissions change. Does SOLR expose this kind of functionality? -Glen Newton http://zzzoot.blogspot.com/ http://zzzoot.blogspot.com/2009/07/project-torngat-building-large-scale.html

On 7 July 2010 00:38, RL wrote:
>
> I have a question about indexing/searching techniques in relation to document
> level security.
> I'm planning a system that has, let's say, about 1 million search documents
> with about 100 search fields each. Most of them are unstored to keep the index
> size low, because some of them contain a few kilobytes and some of them
> several hundred kilobytes. Two of these search fields are for permission
> checking, where I keep the explicitly allowed and explicitly disallowed
> users and usergroups. (usergroups can be in a hierarchical structure with
> permission inheritance)
>
> So when a user searches in the system, his user id and the ids of his usergroup
> memberships are added as a filter query in my application logic before the
> query is sent to solr. So far so good for the searching part.
>
> But the problem is that the permissions can be changed by administrators of
> the system, requiring a re-index of the two permission search fields.
>
> first idea:
> Partial updates of index entries are not possible, so I need to fetch all
> 1 million documents from the database to do a re-index just because some
> permissions changed. The fetching process is rather expensive and requires
> more than 14 hours. I am sure this can be optimized of course, but I
> would rather avoid a whole re-indexing of all content.
>
> second idea:
> Another idea would be to store just the permissions in one small and
> fast-to-update index, and all the other stuff in the other huge and not so
> often updated index. But I didn't find any way to combine these two
> indices in one query. Is that even possible?
>
> Does somebody have experience with these topics, or advice on how to solve
> this case properly?
> Thanks in advance.
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/document-level-security-indexing-searching-techniques-tp946528p946528.html
> Sent from the Solr - User mailing list archive at Nabble.com.
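The searching part RL describes (application logic turns the user id and group memberships into a filter query against the allowed/disallowed fields) can be sketched as a string builder. Field names "allowed" and "disallowed" are placeholders for whatever the schema actually calls them, and ids are invented:

```java
import java.util.List;

/**
 * Sketch: restrict results to docs that explicitly allow the user or one
 * of their groups, and exclude docs that explicitly disallow any of them.
 */
public class AclFilter {
    public static String buildFq(String userId, List<String> groups) {
        StringBuilder ids = new StringBuilder(userId);
        for (String g : groups) {
            ids.append(" OR ").append(g);
        }
        // e.g. allowed:(u42 OR sales OR emea) AND -disallowed:(u42 OR sales OR emea)
        return "allowed:(" + ids + ") AND -disallowed:(" + ids + ")";
    }
}
```

Passed as fq, this clause filters without influencing relevance scores, which is exactly what permission checks should do.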
Re: document level security: indexing/searching techniques
What Ken describes is called 'role-based' security. Users have roles, and security items talk about roles, not users. http://en.wikipedia.org/wiki/Role-based_access_control On Tue, Jul 6, 2010 at 3:15 PM, Peter Sturge wrote: > Yes, you don't want to hard code permissions into your index - it will give > you headaches. > > You might want to have a look at SOLR 1872: > https://issues.apache.org/jira/browse/SOLR-1872 . > This patch provides doc level security through an external ACL mechanism (in > this case, an XML file) controlling a filter query, > This way, you don't need to change the schema - you can even use existing > indexes, and you can change access control without affecting your stored > data. > > HTH, > Peter > > > On Tue, Jul 6, 2010 at 5:16 PM, Ken Krugler > wrote: > >> >> On Jul 6, 2010, at 8:27am, osocurious2 wrote: >> >> >>> Someone else was recently asking a similar question (or maybe it was you >>> but >>> worded differently :) ). >>> >>> Putting user level security at a document level seems like a recipe for >>> pain. Solr/Lucene don't do frequent update well...and being highly >>> optimized >>> for query, I don't blame them. Is there any way to create a series of >>> roles >>> that you can apply to your documents? If the security level of the >>> document >>> isn't changing, just the user access to them, give the docs a role in the >>> index, put your user/usergroup stuff in a DB or some other system and >>> resolve your user into valid roles, then FilterQuery on role. >>> >> >> You're right, baking in too fine-grained a level of security information is >> a bad idea. >> >> As one example that worked pretty well for code search with Krugle, we set >> access control on a per project level using LDAP groups - ie each project >> had some number of groups that were granted access rights. Each file in the >> project would inherit the same list of groups. 
>> >> Then, when a user logs in they get authenticated via LDAP, and we have the >> set of groups they belong to being returned by the LDAP server. This then >> becomes a fairly well-bounded list of "terms" for an OR query against the >> "acl-groups" field in each file/project document. Just don't forget to set >> the boost to 0 for that portion of the query :) >> >> -- Ken >> >> >> Ken Krugler >> +1 530-210-6378 >> http://bixolabs.com >> e l a s t i c w e b m i n i n g >> >> >> >> >> > -- Lance Norskog goks...@gmail.com
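Ken's zero-boosted OR clause over the "acl-groups" field translates directly into query syntax. A sketch of building that clause as a string (group names invented); note that when the restriction is sent as a Solr fq instead of riding along in the main query, the score is unaffected anyway and the boost trick becomes unnecessary:

```java
import java.util.List;

/**
 * Sketch: an OR clause over the user's LDAP groups, boosted to zero so
 * membership restricts matches without contributing to relevance.
 */
public class GroupClause {
    public static String build(List<String> ldapGroups) {
        // e.g. acl-groups:(eng OR search-team)^0
        return "acl-groups:(" + String.join(" OR ", ldapGroups) + ")^0";
    }
}
```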
Re: general debugging techniques?
Ah! I did not notice the 'too many open files' part. This means that your mergeFactor setting is too high for what your operating system allows. The default mergeFactor is 10 (which translates into thousands of open file descriptors). You should lower this number. On Tue, Jul 6, 2010 at 1:14 PM, Jim Blomo wrote: > On Sat, Jul 3, 2010 at 1:10 PM, Lance Norskog wrote: >> You don't need to optimize, only commit. > > OK, thanks for the tip, Lance. I thought the "too many open files" > problem was because I wasn't optimizing/merging frequently enough. My > understanding of your suggestion is that commit also does merging, and > since I am only building the index, not querying or updating it, I > don't need to optimize. > >> This means that the JVM spends 98% of its time doing garbage >> collection. This means there is not enough memory. > > I'll increase the memory to 4G, decrease the documentCache to 5 and try again. > >> I made a mistake - the bug in Lucene is not about PDFs - it happens >> with every field in every document you index in any way- so doing this >> in Tika outside Solr does not help. The only trick I can think of is >> to alternate between indexing large and small documents. This way the >> bug does not need memory for two giant documents in a row. > > I've checked out and built solr from branch_3x with the > tika-0.8-SNAPSHOT patch. (Earlier I was having trouble with Tika > crashing too frequently.) I've confirmed that LUCENE-2387 is fixed in > this branch so hopefully I won't run into that this time. > >> Also, do not query the indexer at all. If you must, don't do sorted or >> faceting requests. These eat up a lot of memory that is only freed >> with the next commit (index reload). > > Good to know, though I have not been querying the index and definitely > haven't ventured into faceted requests yet. > > The advice is much appreciated, > > Jim > -- Lance Norskog goks...@gmail.com
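The mergeFactor change Lance suggests lives in solrconfig.xml. A sketch with illustrative values (the same settings appear under both indexDefaults and mainIndex in the 1.4-era config):

```xml
<mainIndex>
  <!-- Fewer segments allowed before a merge means fewer simultaneously
       open files during indexing. -->
  <mergeFactor>4</mergeFactor>
  <!-- Compound file format packs each segment's files into one,
       trading a little indexing speed for far fewer descriptors. -->
  <useCompoundFile>true</useCompoundFile>
</mainIndex>
```

Raising the operating system's per-process descriptor limit (e.g. `ulimit -n`) is the other common fix for "too many open files".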
index format error because disk full
The index is ill-formatted because the disk filled up while feeding. Can I roll back to the last version? Is there any method to avoid unexpected errors while indexing? My segments_N files are attached.
Re: Deleting Terms:
That's because deleting a document simply marks it as deleted, it doesn't really do much else with it, all that work is deferred to the optimize step as you've found. But deleted documents will NOT be found even though the admin page shows their terms still in the index. Best Erick On Tue, Jul 6, 2010 at 1:20 PM, Kumaravel Kandasami < kumaravel.kandas...@gmail.com> wrote: > FYI - optimise() operations solved the issue. > > > Kumar_/|\_ > www.saisk.com > ku...@saisk.com > "making a profound difference with knowledge and creativity..." > > > On Tue, Jul 6, 2010 at 11:47 AM, Kumaravel Kandasami < > kumaravel.kandas...@gmail.com> wrote: > > > BTW, Using SOLRJ - javabin api. > > > > > > > > Kumar_/|\_ > > www.saisk.com > > ku...@saisk.com > > "making a profound difference with knowledge and creativity..." > > > > > > On Tue, Jul 6, 2010 at 11:43 AM, Kumaravel Kandasami < > > kumaravel.kandas...@gmail.com> wrote: > > > >> Hi, > >> > >>How to delete the terms associated with the document ? > >> > >> Current scenario: We are deleting documents based on a query > >> ('field:value'). > >> The documents are getting deleted, however, the old terms associated to > >> the field are displayed in the admin. > >> > >> How do we make SOLR to re-evaluate and update the terms associated to a > >> specific fields or latest updated document ? > >> > >> (I am assuming we are missing some api calls .) > >> > >> Thank you. > >> > >> > >> Kumar_/|\_ > >> www.saisk.com > >> ku...@saisk.com > >> "making a profound difference with knowledge and creativity..." > >> > > > > >
Re: Relevancy and non-matching words
Underneath SOLR is Lucene. Here's a description of Lucene's scoring algorithm (follow the "Similarity" link): http://lucene.apache.org/java/2_4_0/scoring.html#Understanding%20the%20Scoring%20Formula The number of letters in non-matching words isn't relevant; what is relevant is the relationship between the number of search terms found and the number of tokens (think of them as words) in the field. I'm also assuming you've either set the default operator to AND or that your default field is "title". Using &debugQuery=on will show you a lot. You can also access that information from the admin pages (Full Interface link, or something like that). HTH Erick

On Tue, Jul 6, 2010 at 12:17 PM, dbashford wrote:
>
> Is there some sort of threshold that I can tweak which sets how many letters
> in non-matching words makes a result more or less relevant?
>
> Searching on title, q=fantasy football, and I get this:
>
> {"title":"The Fantasy Football Guys",
> "score":2.8387074},
> {"title":"Fantasy Football Bums",
> "score":2.8387074},
> {"title":"Fantasy Football Xtreme",
> "score":2.7019854},
> {"title":"Fantasy Football Fools",
> "score":2.7019634},
> {"title":"Fantasy Football Brothers",
> "score":2.5917912}
>
> (I have some other scoring things in there that account for the difference
> between Xtreme and Fools.)
>
> The behavior I'm noticing is that there is some threshold for the length of
> non-matching words that, when tripped, kicks the score down a notch. 4 to 5
> seems to trip one, 6 to 7 another.
>
> I would really like something like "Bums" to score the same as "Xtreme" and
> "Brothers" and let my other criteria determine which document should come
> out on top. Is there something that can be tweaked to get this to happen?
>
> Or is my assumption a bit off base?
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Relevancy-and-non-matching-words-tp946799p946799.html
> Sent from the Solr - User mailing list archive at Nabble.com.
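One plausible source of the "step" behavior is how Lucene stores the length norm: 1/sqrt(numTokens), times any field boost, is compressed into a single byte, so distinct field lengths can collapse onto the same stored value and then drop a step. Below is a from-memory reimplementation of the 8-bit float scheme (3-bit mantissa, 5-bit exponent) used by Lucene 2.x/3.x for norms; treat it as illustrative rather than the exact shipped code:

```java
/**
 * Sketch of Lucene's norm byte encoding: a float is quantized into one
 * byte, which is why nearby field lengths can score identically.
 */
public class NormByte {
    public static byte floatToByte315(float f) {
        int bits = Float.floatToRawIntBits(f);
        int smallfloat = bits >> (24 - 3);
        if (smallfloat <= ((63 - 15) << 3)) {
            return (bits <= 0) ? (byte) 0 : (byte) 1;  // underflow: smallest nonzero
        }
        if (smallfloat >= ((63 - 15) << 3) + 0x100) {
            return -1;                                  // overflow: largest value
        }
        return (byte) (smallfloat - ((63 - 15) << 3));
    }

    public static float byte315ToFloat(byte b) {
        if (b == 0) return 0.0f;
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }
}
```

Under this encoding, a three-token title (norm 1/sqrt(3)) and a four-token title (norm 1/sqrt(4) = 0.5) quantize to the same byte, while a two-token title lands one step higher: small length differences sometimes matter and sometimes don't, exactly the stepwise effect observed.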
Re: Adding new elements to index
first do you have a unique key defined in your schema.xml? If you do, some of those 300 rows could be replacing earlier rows. You say: " if I have 200 rows indexed from postgres and 100 rows from Oracle, the full-import process only indexes 200 documents from oracle, although it shows clearly that the query retruned 300 rows." Which really looks like a typo, if you have 100 rows from Oracle how did you get 200 rows from Oracle? Are you perhaps doing this in two different jobs and deleting the first import before running the second? And if this is irrelevant, could you provide more details like how you're indexing things (I'm assuming DIH, but you don't state that anywhere). If it *is* DIH, providing that configuration would help. Best Erick On Tue, Jul 6, 2010 at 11:19 AM, Xavier Rodriguez wrote: > Hi, > > I have a SOLR installed on a Tomcat application server. This solr instance > has some data indexed from a postgres database. Now I need to add some > entities from an Oracle database. When I run the full-import command, the > documents indexed are only documents from postgres. In fact, if I have 200 > rows indexed from postgres and 100 rows from Oracle, the full-import > process > only indexes 200 documents from oracle, although it shows clearly that the > query retruned 300 rows. > > I'm not doing a delta-import, simply a full import. I've tried to clean the > index, reload the configuration, and manually remove dataimport.properties > because it's the only metadata i found. Is there any other file to check > or > modify just to get all 300 rows indexed? > > Of course, I tried to find one of that oracle fields, with no results. > > Thanks a lot, > > Xavier Rodriguez. >
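Assuming DIH, a data-config sketch for pulling from two databases in one full-import. Driver class names, URLs, credentials, and queries are placeholders; the key points are that each dataSource gets a name, that each entity references one explicitly, and that overlapping uniqueKey values cause silent replacement:

```xml
<dataConfig>
  <!-- Two named dataSources; each entity picks one via dataSource="..." -->
  <dataSource name="pg"  driver="org.postgresql.Driver"
              url="jdbc:postgresql://host/db" user="u" password="p"/>
  <dataSource name="ora" driver="oracle.jdbc.driver.OracleDriver"
              url="jdbc:oracle:thin:@host:1521:sid" user="u" password="p"/>
  <document>
    <!-- If both queries can emit the same uniqueKey value, rows indexed
         later silently replace earlier ones - Erick's first suspicion. -->
    <entity name="pgDocs"  dataSource="pg"  query="SELECT id, title FROM docs"/>
    <entity name="oraDocs" dataSource="ora" query="SELECT id, title FROM docs"/>
  </document>
</dataConfig>
```

A common fix when ids collide across sources is to prefix them (e.g. with a TemplateTransformer producing "pg-${pgDocs.id}") so each source's documents stay distinct.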
Re: Wildcards queries
Still not enough info. Please show: 1> the field type (not field, but field type showing the analyzers for the field you're interested in). 2> example data you've indexed 3> the query you submit 4> the response from the query (especially with &debugQuery=on appended to the query). Otherwise, it's really hard to guess what's going on. HTH Erick On Tue, Jul 6, 2010 at 9:58 AM, Robert Naczinski < robert.naczin...@googlemail.com> wrote: > Hi, > > thanks for the reply. I am an absolute beginner with Solr. > > I have taken, for the beginning, the configuration from > {solr.home}example/solr . > > In solrconfig.xml are all queryparser commented out ;-( Where can a > find the QeryParser? Javadoc, Wiki? > > Regards, > > Robert > > 2010/7/6 Mark Miller : > > On 7/6/10 8:53 AM, Robert Naczinski wrote: > >> Hi, > >> > >> we use in our application EmbeddedSolrServer. > > > > Great! > > > >> Everything went fine. > > > > Excellent! > > > >> Now I want use wildcards queries. > > > > Cool! > > > >> > >> It does not work. > > > > Bummer! > > > >> Must be adapted for the schema.xml? > > > > Not necessarily... > > > >> > >> Can someone help me? > > > > We can try! > > > >>In wiki, I find nothing? > > > > No, you will find lots! > > > >> Why do I need simple > >> example or link. > > > > Because it would be helpful! > > > > > >> > >> Regards, > >> > >> Robert > > > > > > What query parser are you using? Dismax? That query parser does not > > support wildcards. Try the lucene queryparser if that's the case. > > > > Otherwise respond with more information about your setup. > > > > -- > > - Mark > > > > http://www.lucidimagination.com > > >
Re: Problem building Nightly Solr
On Jul 6, 2010, at 3:44pm, Chris Hostetter wrote:

: Can you try "ant compile example"?
: After the Lucene/Solr merge, the solr ant build needs to compile before the
: example target.

the "compile" target is already in the dependency tree for the "example" target, so that won't change anything. At the moment, the "nightly" snapshots produced by hudson only include the "solr" section of the "dev" tree -- not the modules or the lucene-java sections. The compiled versions of that code are included, so you can *run* solr from the hudson artifacts, but apparently you can't compile it. (this is particularly odd since the nightlies include all the compiled lucene code as jars in a "lucene-libs/" directory, but the build system doesn't seem to use that directory ... at least not when compiling solrj). This is all a side effect of trunk still being somewhat in transition -- there are kinks in dealing with the artifacts of the nightly build process that still need to be worked out -- but if your goal is to compile things yourself, then you might as well just check out the entire trunk and compile from that anyway.

Note that you'll need to "ant compile" from the top of the lucene directory first, before trying any of the solr-specific builds from inside the /solr sub-dir. Or at least that's what I ran into when trying to build a solr dist recently.

-- Ken

Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g
Re: Problem building Nightly Solr
: (this is particularly odd since the nightlies include all the compiled : lucene code as jars in a "lucene-libs/" directory, but the build system : doesn't seem to use that directory ... at least not when compiling solrj). https://issues.apache.org/jira/browse/SOLR-1989 -Hoss
Re: Problem building Nightly Solr
: Can you try "ant compile example"?
: After the Lucene/Solr merge, the solr ant build needs to compile before the
: example target.

the "compile" target is already in the dependency tree for the "example" target, so that won't change anything. At the moment, the "nightly" snapshots produced by hudson only include the "solr" section of the "dev" tree -- not the modules or the lucene-java sections. The compiled versions of that code are included, so you can *run* solr from the hudson artifacts, but apparently you can't compile it. (this is particularly odd since the nightlies include all the compiled lucene code as jars in a "lucene-libs/" directory, but the build system doesn't seem to use that directory ... at least not when compiling solrj). This is all a side effect of trunk still being somewhat in transition -- there are kinks in dealing with the artifacts of the nightly build process that still need to be worked out -- but if your goal is to compile things yourself, then you might as well just check out the entire trunk and compile from that anyway.

-Hoss
Re: Unicode processing - Issue with CharStreamAwareWhitespaceTokenizerFactory
The char filters MUST come before the Tokenizer, due to their nature of processing the character stream and not the tokens. If you need to apply the accent normalization later in the analysis chain, either use ISOLatin1AccentFilterFactory or help with the implementation of SOLR-1978. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Training in Europe - www.solrtraining.com

On 5. juli 2010, at 17.32, Saïd Radhouani wrote:
> Thanks Koji for the reply and for updating the wiki. As it's written now in
> the wiki, it sounds (at least to me) like MappingCharFilterFactory works only
> with WhitespaceTokenizerFactory.
>
> Did you really mean that? Because this filter also works with other
> tokenizers. For instance, in my text type, I'm using StandardTokenizerFactory
> for document processing, and WhitespaceTokenizerFactory for query processing.
>
> I also noticed that, in whatever order you put this filter in the definition
> of a field type, it's always applied (during text processing) before the
> tokenizer and all the other filters. Is there a reason for that? Is there a
> possibility to force the filter to be applied at a certain position among the
> other filters?
>
> Thanks,
> -S
>
> On Jul 5, 2010, at 4:28 PM, Koji Sekiguchi wrote:
>
>>
>>> In the same wiki, they say that CharStreamAwareWhitespaceTokenizerFactory
>>> must be used with MappingCharFilterFactory. But when I use this tokenizer
>>> and filter together, I get a severe error saying that the field type
>>> containing them is unknown. However, it works when I use this filter with
>>> StandardTokenizerFactory or WhitespaceTokenizerFactory!
>>>
>>>
>> The wiki is not correct today. Before Lucene 2.9 (and Solr 1.4),
>> Tokenizers could take a Reader argument in the constructor. But after that,
>> because they can take a CharStream argument in the constructor,
>> *CharStreamAware* Tokenizers are no longer needed (all Tokenizers
>> are aware of CharStream). I'll update the wiki. 
>> >> Koji >> >> -- >> http://www.rondhuit.com/en/ >> >
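To make the ordering concrete, a schema.xml sketch (the field type name is invented; mapping-ISOLatin1Accent.txt is the mapping file shipped with the Solr example). The charFilter element is declared, and applied, before the tokenizer, which is why it always runs first regardless of where a filter element might seem to fit:

```xml
<fieldType name="text_mapped" class="solr.TextField">
  <analyzer>
    <!-- Runs on the raw character stream, before tokenization -->
    <charFilter class="solr.MappingCharFilterFactory"
                mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Token filters run afterwards, in declaration order -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```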
Re: document level security: indexing/searching techniques
Yes, you don't want to hard code permissions into your index - it will give you headaches.

You might want to have a look at SOLR-1872: https://issues.apache.org/jira/browse/SOLR-1872 .
This patch provides doc-level security through an external ACL mechanism (in this case, an XML file) controlling a filter query.
This way, you don't need to change the schema - you can even use existing indexes, and you can change access control without affecting your stored data.

HTH,
Peter

On Tue, Jul 6, 2010 at 5:16 PM, Ken Krugler wrote:
>
> On Jul 6, 2010, at 8:27am, osocurious2 wrote:
>
>> Someone else was recently asking a similar question (or maybe it was you
>> but worded differently :) ).
>>
>> Putting user level security at a document level seems like a recipe for
>> pain. Solr/Lucene don't do frequent updates well... and being highly
>> optimized for query, I don't blame them. Is there any way to create a
>> series of roles that you can apply to your documents? If the security
>> level of the document isn't changing, just the user access to them, give
>> the docs a role in the index, put your user/usergroup stuff in a DB or
>> some other system and resolve your user into valid roles, then
>> FilterQuery on role.
>
> You're right, baking in too fine-grained a level of security information is
> a bad idea.
>
> As one example that worked pretty well for code search with Krugle, we set
> access control on a per-project level using LDAP groups - ie each project
> had some number of groups that were granted access rights. Each file in the
> project would inherit the same list of groups. 
Just don't forget to set > the boost to 0 for that portion of the query :) > > -- Ken > > > Ken Krugler > +1 530-210-6378 > http://bixolabs.com > e l a s t i c w e b m i n i n g > > > > >
Problem building Nightly Solr
I'd like to try the new edismax feature in Solr, so I downloaded the latest nightly (apache-solr-4.0-2010-07-05_08-06-42) and tried running "ant example". It fails with a missing package error. I've pasted the output below. I tried a nightly from a couple weeks ago, and it did the same thing, as did the current svn version. Just to make sure it wasn't a problem with my environment, I tried building Solr 1.4.1 and it worked fine. I'm running java 1.6.0_20 and ant 1.7.1. Is there anything I should be doing differently, or is this something that needs to get fixed in the builds? Thanks, Nick

---

nick:/tmp/apache-solr-4.0-2010-07-05_08-06-42$ ant example
Buildfile: build.xml

init-forrest-entities:

dist-contrib:

init:

init-forrest-entities:

compile-lucene:

compile-solrj:
    [javac] Compiling 89 source files to /tmp/apache-solr-4.0-2010-07-05_08-06-42/build/solrj
    [javac] /tmp/apache-solr-4.0-2010-07-05_08-06-42/src/common/org/apache/solr/common/util/ConcurrentLRUCache.java:19: package org.apache.lucene.util does not exist
    [javac] import org.apache.lucene.util.PriorityQueue;
    [javac]        ^
    [javac] /tmp/apache-solr-4.0-2010-07-05_08-06-42/src/common/org/apache/solr/common/util/ConcurrentLRUCache.java:352: cannot find symbol
    [javac] symbol  : class PriorityQueue
    [javac] location: class org.apache.solr.common.util.ConcurrentLRUCache
    [javac]   private static class PQueue extends PriorityQueue {
    [javac]                                       ^
    [javac] /tmp/apache-solr-4.0-2010-07-05_08-06-42/src/common/org/apache/solr/common/util/ConcurrentLRUCache.java:319: cannot find symbol
    [javac] symbol  : method size()
    [javac] location: class org.apache.solr.common.util.ConcurrentLRUCache.PQueue
    [javac]       while (queue.size() > queue.myMaxSize && queue.size() > 0) {
    [javac]                   ^
    [javac] /tmp/apache-solr-4.0-2010-07-05_08-06-42/src/common/org/apache/solr/common/util/ConcurrentLRUCache.java:319: cannot find symbol
    [javac] symbol  : method size()
    [javac] location: class org.apache.solr.common.util.ConcurrentLRUCache.PQueue
    [javac]       while (queue.size() > queue.myMaxSize && queue.size() > 0) {
    [javac]                                                     ^
    [javac] /tmp/apache-solr-4.0-2010-07-05_08-06-42/src/common/org/apache/solr/common/util/ConcurrentLRUCache.java:320: cannot find symbol
    [javac] symbol  : method pop()
    [javac] location: class org.apache.solr.common.util.ConcurrentLRUCache.PQueue
    [javac]         CacheEntry otherEntry = (CacheEntry) queue.pop();
    [javac]                                                   ^
    [javac] /tmp/apache-solr-4.0-2010-07-05_08-06-42/src/common/org/apache/solr/common/util/ConcurrentLRUCache.java:355: non-static variable super cannot be referenced from a static context
    [javac]       super.initialize(maxSz);
    [javac]       ^
    [javac] /tmp/apache-solr-4.0-2010-07-05_08-06-42/src/common/org/apache/solr/common/util/ConcurrentLRUCache.java:355: cannot find symbol
    [javac] symbol  : method initialize(int)
    [javac] location: class java.lang.Object
    [javac]       super.initialize(maxSz);
    [javac]            ^
    [javac] /tmp/apache-solr-4.0-2010-07-05_08-06-42/src/common/org/apache/solr/common/util/ConcurrentLRUCache.java:359: cannot find symbol
    [javac] symbol  : variable heap
    [javac] location: class org.apache.solr.common.util.ConcurrentLRUCache.PQueue
    [javac]     Object[] getValues() { return heap; }
    [javac]                                   ^
    [javac] /tmp/apache-solr-4.0-2010-07-05_08-06-42/src/common/org/apache/solr/common/util/ConcurrentLRUCache.java:368: non-static method size() cannot be referenced from a static context
    [javac]       if (size() < myMaxSize) {
    [javac]           ^
    [javac] /tmp/apache-solr-4.0-2010-07-05_08-06-42/src/common/org/apache/solr/common/util/ConcurrentLRUCache.java:369: cannot find symbol
    [javac] symbol  : method add(java.lang.Object)
    [javac] location: class org.apache.solr.common.util.ConcurrentLRUCache.PQueue
    [javac]         add(element);
    [javac]         ^
    [javac] /tmp/apache-solr-4.0-2010-07-05_08-06-42/src/common/org/apache/solr/common/util/ConcurrentLRUCache.java:371: non-static method size() cannot be referenced from a static context
    [javac]       } else if (size() > 0 && !lessThan(element, heap[1])) {
    [javac]                  ^
    [javac] /tmp/apache-solr-4.0-2010-07-05_08-06-42/src/common/org/apache/solr/common/util/ConcurrentLRUCache.java:371: cannot find symbol
    [javac] symbol  : variable heap
    [javac] location: class org.apache.solr.common.util.ConcurrentLRUCache.PQueue
    [javac]       } else if (size() > 0 && !lessThan(element, heap[1])) {
    [javac]                                                   ^
    [javac] /tmp/apache-solr-4.0-2010-07-05_08-06-42/src/common/org/apache/solr/common/util/ConcurrentLRUCache.java:372: cannot find symbol
    [javac] symbol  : variable heap
    [javac] location: class org.apache.solr.common.util.ConcurrentLRUCache.PQueue
    [javac]         Object ret = heap[1];
    [javac]                      ^
    [javac] /tmp/apache-solr-4.0-2010-07-05_08-06-42/src/common/org/apache/solr/common/util/ConcurrentLRUCache.java:373: cannot find symbol
    [javac] symbol  : variable heap
    [javac] location: class org.apache.solr.common.util.ConcurrentLRUCache.PQueue
    [javac]         heap[1] = element;
    [javac]         ^
    [javac] /tmp/apache-solr-4.0-2010-07-05_08-06-
Re: Solr results not updating
That's exactly what it was. I forgot to commit. Thanks, Moazzam On Tue, Jul 6, 2010 at 3:29 PM, Markus Jelsma wrote: > Hi, > > > > If q=*:* doesn't show your insert, then you forgot the commit: > > http://wiki.apache.org/solr/UpdateXmlMessages#A.22commit.22_and_.22optimize.22 > > > > Cheers, > > > > -Original message- > From: Moazzam Khan > Sent: Tue 06-07-2010 22:09 > To: solr-user@lucene.apache.org; > Subject: Solr results not updating > > Hi, > > I just successfully inserted a document into SOlr but when I search > for it, it doesn't show up. Is it a cache issue or something? Is there > a way to make sure it was inserted properly? And, it's there? > > Thanks, > Moazzam >
RE: Solr results not updating
Hi, If q=*:* doesn't show your insert, then you forgot the commit: http://wiki.apache.org/solr/UpdateXmlMessages#A.22commit.22_and_.22optimize.22 Cheers, -Original message- From: Moazzam Khan Sent: Tue 06-07-2010 22:09 To: solr-user@lucene.apache.org; Subject: Solr results not updating Hi, I just successfully inserted a document into SOlr but when I search for it, it doesn't show up. Is it a cache issue or something? Is there a way to make sure it was inserted properly? And, it's there? Thanks, Moazzam
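For completeness, the commit can be issued as part of the same XML update conversation (a sketch against the stock example server; the document fields below are invented, not from Moazzam's setup):

```xml
<!-- POST to http://localhost:8983/solr/update -->
<add>
  <doc>
    <field name="id">doc-1</field>
    <field name="title">my new document</field>
  </doc>
</add>

<!-- without this, the insert stays invisible to searches -->
<commit/>
```

Until the commit opens a new searcher, even q=*:* will not show the added document.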
Re: general debugging techniques?
On Sat, Jul 3, 2010 at 1:10 PM, Lance Norskog wrote: > You don't need to optimize, only commit. OK, thanks for the tip, Lance. I thought the "too many open files" problem was because I wasn't optimizing/merging frequently enough. My understanding of your suggestion is that commit also does merging, and since I am only building the index, not querying or updating it, I don't need to optimize. > This means that the JVM spends 98% of its time doing garbage > collection. This means there is not enough memory. I'll increase the memory to 4G, decrease the documentCache to 5 and try again. > I made a mistake - the bug in Lucene is not about PDFs - it happens > with every field in every document you index in any way- so doing this > in Tika outside Solr does not help. The only trick I can think of is > to alternate between indexing large and small documents. This way the > bug does not need memory for two giant documents in a row. I've checked out and built solr from branch_3x with the tika-0.8-SNAPSHOT patch. (Earlier I was having trouble with Tika crashing too frequently.) I've confirmed that LUCENE-2387 is fixed in this branch so hopefully I won't run into that this time. > Also, do not query the indexer at all. If you must, don't do sorted or > faceting requests. These eat up a lot of memory that is only freed > with the next commit (index reload). Good to know, though I have not been querying the index and definitely haven't ventured into faceted requests yet. The advice is much appreciated, Jim
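For reference, the two changes described above are independent knobs: the heap is a JVM flag (e.g. `java -Xmx4g -jar start.jar`), while the documentCache is a solrconfig.xml setting. A sketch of the shrunken cache, following the stock solrconfig.xml attribute names:

```xml
<!-- solrconfig.xml: a much smaller documentCache, as planned above -->
<documentCache class="solr.LRUCache"
               size="5"
               initialSize="5"
               autowarmCount="0"/>
```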
Solr results not updating
Hi, I just successfully inserted a document into Solr, but when I search for it, it doesn't show up. Is it a cache issue or something? Is there a way to make sure it was inserted properly, and that it's there? Thanks, Moazzam
Re: using DataImport Dev Console: no errors, but no documents
: It fetches 5322 rows but doesn't process any documents and doesn't
: populate the index. Any suggestions would be appreciated.

I don't know much about DIH, but it seems weird that both of your entities say 'rootEntity="false"'. Looking at the docs, that definitely doesn't seem like what you want... http://wiki.apache.org/solr/DataImportHandler

>> rootEntity : By default the entities falling under the document are
>> root entities. If it is set to false, the entity directly falling
>> under that entity will be treated as the root entity (and so on and so
>> forth). For every row returned by the root entity a document is
>> created in Solr

-Hoss
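To illustrate Hoss's point, a minimal DIH sketch (table and column names are invented): with the default rootEntity="true" on the entity directly under <document>, every row that entity returns becomes one Solr document; rootEntity="false" instead delegates document creation to the entity nested beneath it, which is usually only wanted on wrapper entities:

```xml
<document>
  <!-- default rootEntity="true": each row of this query -> one document -->
  <entity name="item" query="select id, name from item">
    <field column="id" name="id"/>
    <field column="name" name="name"/>
  </entity>
</document>
```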
Re: DatImportHandler and cron issue
: What we are seeing is the request is dispatched to solr server, but its not
: being processed.

You'll have to explain what you mean by "not being processed". According to your logs, DIH is in fact working and logging its progress...

: 2010-06-14 12:51:01,328 INFO [org.apache.solr.core.SolrCore]
: (http-0.0.0.0-8080-1) [npmetrosearch_statesman] webapp=/solr
: path=/dataimport
: params={site=statesman&forDate=03/24/10&articleTypes=story,slideshow,video,poll,specialArticle,list&clean=false&commit=true&entity=initialLoad&command=full-import&numArticles=-1&server=app5}
: status=0 QTime=0
: 2010-06-14 12:51:01,329 INFO
: [org.apache.solr.handler.dataimport.DataImporter] (Thread-378) Starting Full
: Import
: 2010-06-14 12:51:01,332 INFO
: [org.apache.solr.handler.dataimport.SolrWriter] (Thread-378) Read
: dataimport.properties
: 2010-06-14 12:51:01,425 INFO
: [org.apache.solr.handler.dataimport.DocBuilder] (Thread-378) Time taken =
: 0:0:0.93

-Hoss
Re: proximity question
> Will quotes do an exact match within
> a proximity test?

No.

> If not, does anybody know how to accomplish this?

It is not supported out of the box. You need to plug in Lucene's XmlQueryParser or SurroundQueryParser. Similar discussion: http://search-lucene.com/m/PO3iXKRuAv1/
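For the original example, the XmlQueryParser route would look roughly like this (element names follow Lucene's xml-query-parser contrib; treat the exact attributes as approximate for your version). An exact phrase is a zero-slop, in-order SpanNear, which can then be nested inside a wider unordered SpanNear:

```xml
<SpanNear slop="10" inOrder="false">
  <!-- slop 0 + inOrder makes this an exact phrase match -->
  <SpanNear slop="0" inOrder="true">
    <SpanTerm fieldName="body">mountain</SpanTerm>
    <SpanTerm fieldName="body">goat</SpanTerm>
  </SpanNear>
  <SpanTerm fieldName="body">grass</SpanTerm>
</SpanNear>
```

This matches "mountain goat" as a unit within 10 positions of "grass", so "the mountain where the goat lives" would not qualify.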
proximity question
Will quotes do an exact match within a proximity test? For instance body:""mountain goat" grass"~10 should match: "the mountain goat went up the hill to eat grass" but should NOT match "the mountain where the goat lives is covered in grass" If not, does anybody know how to accomplish this? Thanks, Mike Anderson
Re: Deleting Terms:
FYI - optimise() operations solved the issue. Kumar_/|\_ www.saisk.com ku...@saisk.com "making a profound difference with knowledge and creativity..." On Tue, Jul 6, 2010 at 11:47 AM, Kumaravel Kandasami < kumaravel.kandas...@gmail.com> wrote: > BTW, Using SOLRJ - javabin api. > > > > Kumar_/|\_ > www.saisk.com > ku...@saisk.com > "making a profound difference with knowledge and creativity..." > > > On Tue, Jul 6, 2010 at 11:43 AM, Kumaravel Kandasami < > kumaravel.kandas...@gmail.com> wrote: > >> Hi, >> >>How to delete the terms associated with the document ? >> >> Current scenario: We are deleting documents based on a query >> ('field:value'). >> The documents are getting deleted, however, the old terms associated to >> the field are displayed in the admin. >> >> How do we make SOLR to re-evaluate and update the terms associated to a >> specific fields or latest updated document ? >> >> (I am assuming we are missing some api calls .) >> >> Thank you. >> >> >> Kumar_/|\_ >> www.saisk.com >> ku...@saisk.com >> "making a profound difference with knowledge and creativity..." >> > >
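For anyone hitting the same symptom: deleted documents, and therefore their terms, physically remain in the index segments until those segments are merged, which is why the admin pages kept showing the old terms; the optimize forces that merge. As plain XML update messages, the sequence is roughly (field:value is the placeholder query from this thread):

```xml
<delete><query>field:value</query></delete>
<commit/>
<!-- merges all segments, expunging deleted documents and their terms -->
<optimize/>
```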
Re: Deleting Terms:
BTW, Using SOLRJ - javabin api. Kumar_/|\_ www.saisk.com ku...@saisk.com "making a profound difference with knowledge and creativity..." On Tue, Jul 6, 2010 at 11:43 AM, Kumaravel Kandasami < kumaravel.kandas...@gmail.com> wrote: > Hi, > >How to delete the terms associated with the document ? > > Current scenario: We are deleting documents based on a query > ('field:value'). > The documents are getting deleted, however, the old terms associated to the > field are displayed in the admin. > > How do we make SOLR to re-evaluate and update the terms associated to a > specific fields or latest updated document ? > > (I am assuming we are missing some api calls .) > > Thank you. > > > Kumar_/|\_ > www.saisk.com > ku...@saisk.com > "making a profound difference with knowledge and creativity..." >
Deleting Terms:
Hi, how do we delete the terms associated with a document? Current scenario: we are deleting documents based on a query ('field:value'). The documents are getting deleted; however, the old terms associated with the field are still displayed in the admin. How do we make Solr re-evaluate and update the terms associated with a specific field or the latest updated document? (I am assuming we are missing some API calls.) Thank you. Kumar_/|\_ www.saisk.com ku...@saisk.com "making a profound difference with knowledge and creativity..."
Re: document level security: indexing/searching techniques
On Jul 6, 2010, at 8:27am, osocurious2 wrote: Someone else was recently asking a similar question (or maybe it was you but worded differently :) ). Putting user level security at a document level seems like a recipe for pain. Solr/Lucene don't do frequent update well...and being highly optimized for query, I don't blame them. Is there any way to create a series of roles that you can apply to your documents? If the security level of the document isn't changing, just the user access to them, give the docs a role in the index, put your user/usergroup stuff in a DB or some other system and resolve your user into valid roles, then FilterQuery on role. You're right, baking in too fine-grained a level of security information is a bad idea. As one example that worked pretty well for code search with Krugle, we set access control on a per project level using LDAP groups - ie each project had some number of groups that were granted access rights. Each file in the project would inherit the same list of groups. Then, when a user logs in they get authenticated via LDAP, and we have the set of groups they belong to being returned by the LDAP server. This then becomes a fairly well-bounded list of "terms" for an OR query against the "acl-groups" field in each file/project document. Just don't forget to set the boost to 0 for that portion of the query :) -- Ken Ken Krugler +1 530-210-6378 http://bixolabs.com e l a s t i c w e b m i n i n g
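A sketch of what the resulting query can look like (field and group names here are invented, not from Krugle): the access-control clause is a required OR over the user's groups, boosted to zero so it restricts matches without affecting ranking:

```
q=+(body:parser) +(acl-groups:(eng OR search-team OR public))^0
```

It could equally go into an fq filter query, which is cached across requests and never contributes to the score anyway.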
Relevancy and non-matching words
Is there some sort of threshold I can tweak which sets how many letters in non-matching words make a result more or less relevant? Searching on title, q=fantasy football, and I get this: {"title":"The Fantasy Football Guys", "score":2.8387074}, {"title":"Fantasy Football Bums", "score":2.8387074}, {"title":"Fantasy Football Xtreme", "score":2.7019854}, {"title":"Fantasy Football Fools", "score":2.7019634}, {"title":"Fantasy Football Brothers", "score":2.5917912} (I have some other scoring things in there that account for the difference between Xtreme and Fools.) The behavior I'm noticing is that there is some threshold for the length of non-matching words that, when tripped, kicks the score down a notch. Going from 4 to 5 seems to trip one, as does 6 to 7. I would really like something like "Bums" to score the same as "Xtreme" and "Brothers" and let my other criteria determine which document should come out on top. Is there something that can be tweaked to get this to happen? Or is my assumption a bit off base? -- View this message in context: http://lucene.472066.n3.nabble.com/Relevancy-and-non-matching-words-tp946799p946799.html Sent from the Solr - User mailing list archive at Nabble.com.
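One likely culprit (an assumption on my part, not something confirmed in this thread) is Lucene's length normalization: the default norm of 1/sqrt(#terms) is squeezed into a single byte, so titles of nearby lengths often share the same fieldNorm while one extra term drops to the next bucket, and omitNorms="true" on the field disables length normalization entirely. A self-contained sketch mirroring Lucene's SmallFloat.floatToByte315 encoding (details may differ across Lucene versions):

```java
public class NormQuantization {
    // Encode a float into one byte: 3-bit mantissa, zero-exponent 15
    // (mirrors Lucene's SmallFloat.floatToByte315, used for fieldNorms).
    static byte floatToByte315(float f) {
        int bits = Float.floatToRawIntBits(f);
        int smallfloat = bits >> (24 - 3);
        if (smallfloat <= ((63 - 15) << 3)) {
            return (bits <= 0) ? (byte) 0 : (byte) 1; // underflow
        }
        if (smallfloat >= ((63 - 15) << 3) + 0x100) {
            return -1; // overflow: clamp to the largest representable value
        }
        return (byte) (smallfloat - ((63 - 15) << 3));
    }

    // Decode the byte back to the (lossy) float that scoring actually uses.
    static float byte315ToFloat(byte b) {
        if (b == 0) return 0.0f;
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        // Show how 1/sqrt(#terms) collapses into coarse one-byte buckets.
        for (int terms = 3; terms <= 8; terms++) {
            float norm = (float) (1.0 / Math.sqrt(terms));
            byte enc = floatToByte315(norm);
            System.out.println(terms + " terms: norm=" + norm
                    + " -> byte " + enc + " -> " + byte315ToFloat(enc));
        }
    }
}
```

Run it and you can see, for example, that 3-term and 4-term fields decode to the same norm, while a fifth term drops to the next bucket: the same kind of "notch" behavior described above.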
Re: document level security: indexing/searching techniques
Someone else was recently asking a similar question (or maybe it was you but worded differently :) ). Putting user level security at a document level seems like a recipe for pain. Solr/Lucene don't do frequent update well...and being highly optimized for query, I don't blame them. Is there any way to create a series of roles that you can apply to your documents? If the security level of the document isn't changing, just the user access to them, give the docs a role in the index, put your user/usergroup stuff in a DB or some other system and resolve your user into valid roles, then FilterQuery on role. -- View this message in context: http://lucene.472066.n3.nabble.com/document-level-security-indexing-searching-techniques-tp946528p946649.html Sent from the Solr - User mailing list archive at Nabble.com.
Adding new elements to index
Hi, I have Solr installed on a Tomcat application server. This Solr instance has some data indexed from a Postgres database. Now I need to add some entities from an Oracle database. When I run the full-import command, the documents indexed are only documents from Postgres. In fact, if I have 200 rows in Postgres and 100 rows in Oracle, the full-import process indexes only the 200 Postgres documents, although it clearly shows that the query returned 300 rows. I'm not doing a delta-import, simply a full-import. I've tried cleaning the index, reloading the configuration, and manually removing dataimport.properties, because it's the only metadata I found. Is there any other file to check or modify just to get all 300 rows indexed? Of course, I tried searching for one of the Oracle fields, with no results. Thanks a lot, Xavier Rodriguez.
Re: Wildcards queries
Hi, a bit more information would help to identify what the problem is in your case, but in general these facts come to mind: - leading wildcard queries are not available in Solr (without extending the QueryParser). - no text analysis is performed on the search term when using wildcards, so you need to make sure that the search field isn't configured to be stemmed, and that you are not searching for an upper-case term where your text was lowercased during analysis. Hope that helps a little bit -- View this message in context: http://lucene.472066.n3.nabble.com/Wildcards-queries-tp946334p946589.html Sent from the Solr - User mailing list archive at Nabble.com.
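To make the second point concrete, a sketch (field type and filter choices invented for illustration): with an index-time analyzer like the one below, name:Rob* matches nothing while name:rob* works, because the wildcard term bypasses analysis and the indexed tokens are all lowercase:

```xml
<!-- schema.xml: wildcard queries skip this analysis chain, so the
     caller must lowercase the wildcard term itself -->
<fieldType name="text_simple" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```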
document level security: indexing/searching techniques
I have a question about indexing/searching techniques in relation to document-level security.

I'm planning a system that has, let's say, about 1 million search documents with about 100 search fields each, most of them unstored to keep the index size low, because some of them can contain a few kilobytes and some of them several hundred kilobytes. Two of these search fields are for permission checking, where I keep the explicitly allowed and explicitly disallowed users and usergroups (usergroups can be in a hierarchical structure with permission inheritance). So when a user searches the system, his user id and the ids of his usergroup memberships are added as a filter query by my application logic before the query is sent to Solr. So far so good for the searching part. But the problem is that the permissions can be changed by administrators of the system, requiring a re-index of the two permission search fields.

First idea: partial updates of index entries are not possible, so I would need to fetch all 1 million documents from a database to re-index just because some permissions changed. The fetching process is rather expensive and requires more than 14 hours. I am sure this can be optimized, of course, but I would rather avoid re-indexing all content.

Second idea: store just the permissions in one small and fast-to-update index, and all the other stuff in the other huge and rarely updated index. But I didn't find any way to combine these two indices in one query. Is that even possible?

Does somebody have experience with these topics, or advice on how to solve this case properly? Thanks in advance. -- View this message in context: http://lucene.472066.n3.nabble.com/document-level-security-indexing-searching-techniques-tp946528p946528.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Wildcards queries
Hi, thanks for the reply. I am an absolute beginner with Solr. To start with, I have taken the configuration from {solr.home}/example/solr. In solrconfig.xml, all the query parsers are commented out ;-( Where can I find the QueryParser? Javadoc, wiki?

Regards, Robert

2010/7/6 Mark Miller :
> On 7/6/10 8:53 AM, Robert Naczinski wrote:
>> Hi,
>>
>> we use in our application EmbeddedSolrServer.
>
> Great!
>
>> Everything went fine.
>
> Excellent!
>
>> Now I want use wildcards queries.
>
> Cool!
>
>> It does not work.
>
> Bummer!
>
>> Must be adapted for the schema.xml?
>
> Not necessarily...
>
>> Can someone help me?
>
> We can try!
>
>> In wiki, I find nothing?
>
> No, you will find lots!
>
>> Why do I need simple
>> example or link.
>
> Because it would be helpful!
>
>> Regards,
>>
>> Robert
>
> What query parser are you using? Dismax? That query parser does not
> support wildcards. Try the lucene queryparser if that's the case.
>
> Otherwise respond with more information about your setup.
>
> --
> - Mark
>
> http://www.lucidimagination.com
Re: solr with hadoop
> If you do distributed indexing correctly, what about updating the documents > and what about replicating them correctly? Yes, you can, and it'll work great. On Mon, Jul 5, 2010 at 7:42 AM, MitchK wrote: > > I need to revive this discussion... > > If you do distributed indexing correctly, what about updating the documents > and what about replicating them correctly? > > Does this work? Or wasn't this an issue? > > Kind regards > - Mitch > -- > View this message in context: > http://lucene.472066.n3.nabble.com/solr-with-hadoop-tp482688p944413.html > Sent from the Solr - User mailing list archive at Nabble.com. >
Re: Wildcards queries
On 7/6/10 8:53 AM, Robert Naczinski wrote: > Hi, > > we use in our application EmbeddedSolrServer. Great! > Everything went fine. Excellent! > Now I want use wildcards queries. Cool! > > It does not work. Bummer! > Must be adapted for the schema.xml? Not necessarily... > > Can someone help me? We can try! >In wiki, I find nothing? No, you will find lots! > Why do I need simple > example or link. Because it would be helpful! > > Regards, > > Robert What query parser are you using? Dismax? That query parser does not support wildcards. Try the lucene queryparser if that's the case. Otherwise respond with more information about your setup. -- - Mark http://www.lucidimagination.com
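If it is the dismax case, the fix can be a per-request parser switch (standard Solr parameters; the field name below is invented):

```
http://localhost:8983/solr/select?defType=lucene&q=name:rob*
```

With defType=dismax the * is treated as a literal character rather than a wildcard, which would explain the "does not work" symptom.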
Wildcards queries
Hi, we use EmbeddedSolrServer in our application. Everything went fine. Now I want to use wildcard queries. It does not work. Does schema.xml need to be adapted? Can someone help me? In the wiki I find nothing. That's why I need a simple example or link. Regards, Robert
Re: Data Import Handler Rich Format Documents
On 6/28/2010 8:28 AM, Alexey Serba wrote:
>> Ok, I'm trying to integrate the TikaEntityProcessor as suggested. I'm
>> using Solr Version: 1.4.0 and getting the following error:
>> java.lang.ClassNotFoundException: Unable to load BinURLDataSource or
>> org.apache.solr.handler.dataimport.BinURLDataSource
>
> It seems that DIH-Tika integration is not a part of the Solr 1.4.0/1.4.1
> release. You should use trunk / nightly builds.
> https://issues.apache.org/jira/browse/SOLR-1583

Thanks, that would explain things - I'm using a stock 1.4.0 download.

>> My data-config.xml looks like this:
>> [...] url="http://www.mysite.com/${my_database.content_url}" [...]
>>
>> I added the entity name="my_database_url" section to an existing
>> (working) database entity to be able to have Tika index the content
>> pointed to by the content_url. Is there anything obviously wrong with
>> what I've tried so far?
>
> I think you should move the Tika entity into the my_database entity and
> simplify the whole configuration ...
> http://www.mysite.com/${my_database.content_url}

This, I guess, would be after I checked out and built from trunk? Thanks - Tod
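For reference, a sketch of the nested arrangement Alexey suggests, as it would look on trunk (the entity, field, and driver details here are guesses, since the original XML was stripped by the list):

```xml
<dataConfig>
  <dataSource type="JdbcDataSource" name="db" driver="..." url="..."/>
  <!-- BinURLDataSource fetches the raw bytes for Tika to parse -->
  <dataSource type="BinURLDataSource" name="bin"/>
  <document>
    <entity name="my_database" dataSource="db"
            query="select id, content_url from my_table">
      <field column="id" name="id"/>
      <!-- nested Tika entity: one fetch + extract per parent row -->
      <entity name="tika" processor="TikaEntityProcessor"
              dataSource="bin" format="text"
              url="http://www.mysite.com/${my_database.content_url}">
        <field column="text" name="content"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```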
Re: Duplicate items in distributed search
On Jul 4, 2010, at 5:10 PM, Andrew Clegg wrote: Mark Miller-3 wrote: On 7/4/10 12:49 PM, Andrew Clegg wrote: I thought so but thanks for clarifying. Maybe a wording change on the wiki Sounds like a good idea - go ahead and make the change if you'd like. That page seems to be marked immutable... You have to create an account and log in in order to edit wiki pages. Erik
Re: problem with formulating a negative query
Hi,

Chris Hostetter wrote:
> AND, OR, and NOT are just syntactic sugar for modifying the MUST,
> MUST_NOT, and SHOULD. The default op of "OR" only affects the first
> clause of your query (R) because it doesn't have any modifiers --

Thanks for pointing that out!

-Sascha

> the second clause has that NOT modifier so your query is effectively...
> topic:R -topic:[* TO *]
> ...which by definition can't match anything.
>
> -Hoss
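In other words, a pure negative clause has nothing to subtract from, so the match-all set has to be spelled out explicitly. Assuming the goal was "topic is R, or no topic field at all", a sketch:

```
topic:R OR (*:* -topic:[* TO *])
```

The *:* supplies the universe of documents from which -topic:[* TO *] subtracts everything that has any topic value.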