Re: Filtering query results
Thank you very much for your responses, guys. I do not have ACLs. I need to make a web service call to find out whether a user has access to a document. I was hoping to get the search results, call the web service with the IDs from those results to learn which IDs the user has access to, and then filter out the others before returning the results to the user. ACL and role-based fq is definitely some food for thought. I will need to figure out the synchronization issues.

Thanks
Aseem

On Fri, Nov 20, 2009 at 8:04 AM, Glock, Thomas <thomas.gl...@pfizer.com> wrote:
> Hi Aseem,
>
> I had a similar challenge. The solution that works for my case was to add role as a repeating (multi-valued) string field in the Solr schema. Each piece of content carries one or more roles, and these values are supplied to Solr for indexing. Users also have one or more roles (which correspond exactly to the metadata placed on the content and supplied to Solr). So when performing the search query, we add an fq parameter to filter the search results. For example:
>
> q=Search Phrase&fq=role:(role1 || role2 || role3)
>
> Note that the ultimate restriction of access to content is handled elsewhere; this is only a filtering mechanism for search results. Additionally, we do not have an unlimited set of roles, which helps keep the query string on the HTTP GET to a minimum. Finally, the roles in my system are additive: if there is a match on any one role, the user has access, so an OR clause works. Your system may have more complex role rules.
>
> -----Original Message-----
> From: aseem cheema [mailto:aseemche...@gmail.com]
> Sent: Thursday, November 19, 2009 5:00 PM
> To: solr-user@lucene.apache.org
> Subject: Filtering query results
>
> Hey Guys, I need to filter out some results based on who is performing the search. In other words, if a document is not accessible to the user performing the search, I don't want it in the result set. What is the best/easiest way to do this reliably and securely in Solr?
>
> Thanks
> --
> Aseem

--
Aseem
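The thread describes two approaches: Thomas's role-based fq and Aseem's idea of asking a web service which result IDs are allowed. Below is a minimal sketch of both in plain Java. The class and method names are hypothetical and are not part of Solr or SolrJ. The sketch assumes role names are simple tokens that need no query-syntax escaping, and that the access service returns a set of allowed IDs.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper; none of these names come from Solr or SolrJ.
final class SearchAccessFilter {

    // Role-based approach: builds an fq value in the form used in the thread,
    // e.g. role:(role1 || role2 || role3). Assumes role names need no escaping.
    static String roleFilter(List<String> userRoles) {
        if (userRoles.isEmpty()) {
            // A user with no roles should see nothing, so fail loudly
            // instead of sending an unfiltered query.
            throw new IllegalArgumentException("user has no roles");
        }
        return "role:(" + String.join(" || ", userRoles) + ")";
    }

    // URL-encodes q and fq so they can be appended to an HTTP GET on /select.
    static String selectParams(String q, String fq) {
        return "q=" + URLEncoder.encode(q, StandardCharsets.UTF_8)
                + "&fq=" + URLEncoder.encode(fq, StandardCharsets.UTF_8);
    }

    // Post-filtering approach: keep only the result IDs that an external
    // access service (assumed) reported as allowed, preserving ranking order.
    static List<String> keepAccessible(List<String> resultIds, Set<String> allowedIds) {
        List<String> kept = new ArrayList<>();
        for (String id : resultIds) {
            if (allowedIds.contains(id)) {
                kept.add(id);
            }
        }
        return kept;
    }
}
```

One trade-off of post-filtering a page of results: the page can come back short, and Solr's numFound no longer matches what the user may see. The fq approach avoids both problems, at the cost of keeping roles in sync with the index.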
Filtering query results
Hey Guys, I need to filter out some results based on who is performing the search. In other words, if a document is not accessible to the user performing the search, I don't want it in the result set. What is the best/easiest way to do this reliably and securely in Solr?

Thanks
--
Aseem
XmlUpdateRequestHandler with HTMLStripCharFilterFactory
I am trying to post a document with the following content using SolrJ:

<center>content</center>

I need the XML/HTML tags to be ignored. Even though this works fine in analysis.jsp, it does not work with SolrJ, because the client escapes < and > as &lt; and &gt;, and HTMLStripCharFilterFactory does not strip those escaped tags. How can I achieve this? Any ideas will be highly appreciated.

There is an escapedTags argument in the HTMLStripCharFilterFactory constructor. Is there a way to get that to work?

Thanks
--
Aseem
add XML/HTML documents using SolrJ, without bypassing HTML char filter
Hey Guys, How do I add HTML/XML documents using SolrJ so that they do not bypass the HTML char filter? SolrJ escapes the HTML/XML value of a field, and that makes it bypass the HTML char filter. For example, <center>content</center>, if added using SolrJ to a field that has HTMLStripCharFilter, is not stripped of the center tags. But if I check it in analysis.jsp, it does get stripped. When I look at the XML that SolrJ sends, it looks like this:

<add><doc boost="1.0"><field name="id">http://haha.com</field><field name="text">&lt;center&gt;content&lt;/center&gt;</field></doc></add>

Any help is highly appreciated. Thanks.
--
Aseem
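The &lt;/&gt; in that feed is ordinary XML escaping. It does not change the value. The sketch below uses the JDK's DOM parser, not SolrJ or Solr's own update loader, to parse the exact message above. It gets back the original markup, which suggests the server-side analyzer still receives <center>content</center>. In other words, the escaping alone is probably not what keeps the tags around. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical demo class: decodes an <add> update message with a standard XML parser.
final class EscapeRoundTrip {

    // Returns the text of the <field> with the given name attribute,
    // i.e. the value a receiver sees after standard XML parsing.
    static String fieldValue(String updateXml, String fieldName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(updateXml.getBytes(StandardCharsets.UTF_8)));
            NodeList fields = doc.getElementsByTagName("field");
            for (int i = 0; i < fields.getLength(); i++) {
                Element field = (Element) fields.item(i);
                if (fieldName.equals(field.getAttribute("name"))) {
                    // getTextContent() returns character data with &lt; and &gt; already decoded.
                    return field.getTextContent();
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse update XML", e);
        }
    }
}
```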
Re: add XML/HTML documents using SolrJ, without bypassing HTML char filter
Ohhh... you are a life saver... thank you so much. It makes sense.

Aseem

On Wed, Nov 11, 2009 at 7:40 PM, Ryan McKinley <ryan...@gmail.com> wrote:
> The HTMLStripCharFilter will strip the HTML for the *indexed* terms; it does not affect the *stored* field.
>
> If you don't want HTML in the stored field, can you just strip it out before passing it to Solr?
>
> On Nov 11, 2009, at 8:07 PM, aseem cheema wrote:
>> Hey Guys, How do I add HTML/XML documents using SolrJ so that they do not bypass the HTML char filter? SolrJ escapes the HTML/XML value of a field, and that makes it bypass the HTML char filter. For example, <center>content</center>, if added using SolrJ to a field that has HTMLStripCharFilter, is not stripped of the center tags. But if I check it in analysis.jsp, it does get stripped. When I look at the XML that SolrJ sends, it looks like this:
>>
>> <add><doc boost="1.0"><field name="id">http://haha.com</field><field name="text">&lt;center&gt;content&lt;/center&gt;</field></doc></add>
>>
>> Any help is highly appreciated. Thanks.
>> --
>> Aseem

--
Aseem
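Ryan's suggestion is to strip the markup on the client before the value reaches SolrJ, so the stored field holds plain text. Below is a minimal sketch of that step, assuming a naive regex is good enough for the content at hand. It is not equivalent to HTMLStripCharFilter, and the class name is hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical client-side helper: strip tags before calling doc.addField(...).
final class ClientSideStrip {

    // Naive: removes anything shaped like <...>. Unlike HTMLStripCharFilter it
    // does not handle <script>/<style> bodies, comments, CDATA, or entities.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    private static final Pattern SPACES = Pattern.compile("\\s+");

    static String stripTags(String html) {
        // Replace each tag with a space so words on either side don't merge,
        // then collapse whitespace runs and trim the ends.
        String noTags = TAG.matcher(html).replaceAll(" ");
        return SPACES.matcher(noTags).replaceAll(" ").trim();
    }
}
```

With SolrJ this would be used as `doc.addField("body", ClientSideStrip.stripTags(rawHtml))`, where `rawHtml` is a hypothetical variable holding the original markup. One consequence of inserting a space per tag is that `wor<b>ld</b>` becomes `wor ld`. For real-world HTML, a proper parser such as jsoup (`Jsoup.parse(html).text()`) is the safer choice.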
Re: XmlUpdateRequestHandler with HTMLStripCharFilterFactory
Alright. It turns out that escapedTags is not what I thought it was for. The problem I am having with HTMLStripCharFilterFactory is that it strips the HTML when indexing the field, but not when storing it. That is why what I see in analysis.jsp, which shows index analysis, does not match what gets stored... because, well, HTML is stripped only for indexing. Makes so much sense. Thanks to Ryan McKinley for clarifying this.

Aseem

On Wed, Nov 11, 2009 at 9:50 AM, aseem cheema <aseemche...@gmail.com> wrote:
> I am trying to post a document with the following content using SolrJ:
>
> <center>content</center>
>
> I need the XML/HTML tags to be ignored. Even though this works fine in analysis.jsp, it does not work with SolrJ, because the client escapes < and > as &lt; and &gt;, and HTMLStripCharFilterFactory does not strip those escaped tags. How can I achieve this? Any ideas will be highly appreciated.
>
> There is an escapedTags argument in the HTMLStripCharFilterFactory constructor. Is there a way to get that to work?
>
> Thanks
> --
> Aseem

--
Aseem
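The indexed-versus-stored split can be illustrated with a toy model. This is not Solr's code, and every name in it is hypothetical. The stored value is kept verbatim. Only the analysis chain, which is what analysis.jsp displays, sees the char-filtered text. The regex and whitespace split are rough stand-ins for HTMLStripCharFilter and WhitespaceTokenizer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy model, not Solr code: only the analysis chain sees the char-filtered text.
final class StoredVsIndexed {

    static final class ToyField {
        final String stored;             // what a query returns for the field
        final List<String> indexedTerms; // what searches match against

        ToyField(String stored, List<String> indexedTerms) {
            this.stored = stored;
            this.indexedTerms = indexedTerms;
        }
    }

    static ToyField addField(String raw) {
        // Stand-in for HTMLStripCharFilter: applied to the analysis input only.
        String charFiltered = raw.replaceAll("<[^>]*>", " ");
        List<String> terms = new ArrayList<>();
        // Stand-ins for WhitespaceTokenizer and LowerCaseFilter.
        for (String token : charFiltered.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                terms.add(token.toLowerCase(Locale.ROOT));
            }
        }
        // The stored value is the raw input, untouched by the char filter.
        return new ToyField(raw, terms);
    }
}
```

For `<center>Content</center>`, the toy keeps `<center>Content</center>` as the stored value but indexes only the term `content`. This mirrors the mismatch described above.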
HTMLStripCharFilterFactory not working when using SolrJ java client
Hey Guys, I have the HTMLStripCharFilterFactory char filter declared in my schema.xml for fieldType text (code below). I am using this field type for the body field of my schema. I am seeing different behavior when I use SolrJ to post a document (code below) and when I use analysis.jsp. The text I am putting in the field is <center>content</center>. When SolrJ is used, the field gets the whole value <center>content</center>, but when analysis.jsp is used, it shows only "content" being used for the field. What could I be doing wrong here? How do I get HTMLStripCharFilterFactory to work even when I am pushing data using SolrJ? Your help is highly appreciated.

Thanks
--
Aseem

## schema.xml ##
<analyzer type="index">
  <charFilter class="solr.HTMLStripCharFilterFactory"/>
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
  <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
  <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>

## SolrJ Code ##
CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://aseem.desktop.amazon.com:8983/solr/sharepoint");
SolrInputDocument doc = new SolrInputDocument();
UpdateRequest req = new UpdateRequest();
doc.addField("url", "http://haha.com");
/* doc.addField("body", sbr.toString()); */
doc.addField("body", "<center>content</center>");
req.add(doc);
// commit as part of this request (waitFlush=false, waitSearcher=false)
req.setAction(ACTION.COMMIT, false, false);
UpdateResponse resp = req.process(server);
System.out.println(resp);
Re: HTMLStripCharFilterFactory not working when using SolrJ java client
I printed the UpdateRequest object (getXML) and the XML is:

<add><doc boost="1.0"><field name="url">http://haha.com</field><field name="body">&lt;center&gt;content&lt;/center&gt;</field></doc></add>

I can see that the issue is that the HTML/XML tags are escaped as &lt; and &gt;. I understand that this is required to keep them from interfering with the Solr XML document, but how do I accomplish what I want? I need the HTML in the body field stripped out. Any help is highly appreciated.

Thanks
Aseem

On Tue, Nov 10, 2009 at 10:56 AM, aseem cheema <aseemche...@gmail.com> wrote:
> Hey Guys, I have the HTMLStripCharFilterFactory char filter declared in my schema.xml for fieldType text (code below). I am using this field type for the body field of my schema. I am seeing different behavior when I use SolrJ to post a document (code below) and when I use analysis.jsp. The text I am putting in the field is <center>content</center>. When SolrJ is used, the field gets the whole value <center>content</center>, but when analysis.jsp is used, it shows only "content" being used for the field. What could I be doing wrong here? How do I get HTMLStripCharFilterFactory to work even when I am pushing data using SolrJ? Your help is highly appreciated.
>
> Thanks
> --
> Aseem
>
> ## schema.xml ##
> <analyzer type="index">
>   <charFilter class="solr.HTMLStripCharFilterFactory"/>
>   <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>   <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
>   <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>   <filter class="solr.LowerCaseFilterFactory"/>
>   <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>   <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
>   <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> </analyzer>
>
> ## SolrJ Code ##
> CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://aseem.desktop.amazon.com:8983/solr/sharepoint");
> SolrInputDocument doc = new SolrInputDocument();
> UpdateRequest req = new UpdateRequest();
> doc.addField("url", "http://haha.com");
> /* doc.addField("body", sbr.toString()); */
> doc.addField("body", "<center>content</center>");
> req.add(doc);
> req.setAction(ACTION.COMMIT, false, false);
> UpdateResponse resp = req.process(server);
> System.out.println(resp);

--
Aseem
Re: HTMLStripCharFilterFactory not working when using SolrJ java client
The HTMLStripCharFilterFactory class has a constructor that accepts escapedTags. I believe this will solve my problem, but I am not sure how to pass it from the schema.xml file. I have tried

<charFilter class="solr.HTMLStripCharFilterFactory" escapedTags="&lt;,&gt;"/>

but that didn't work. Anybody?

Thanks

On Tue, Nov 10, 2009 at 10:56 AM, aseem cheema <aseemche...@gmail.com> wrote:
> Hey Guys, I have the HTMLStripCharFilterFactory char filter declared in my schema.xml for fieldType text (code below). I am using this field type for the body field of my schema. I am seeing different behavior when I use SolrJ to post a document (code below) and when I use analysis.jsp. The text I am putting in the field is <center>content</center>. When SolrJ is used, the field gets the whole value <center>content</center>, but when analysis.jsp is used, it shows only "content" being used for the field. What could I be doing wrong here? How do I get HTMLStripCharFilterFactory to work even when I am pushing data using SolrJ? Your help is highly appreciated.
>
> Thanks
> --
> Aseem
>
> ## schema.xml ##
> <analyzer type="index">
>   <charFilter class="solr.HTMLStripCharFilterFactory"/>
>   <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>   <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
>   <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>   <filter class="solr.LowerCaseFilterFactory"/>
>   <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>   <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
>   <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> </analyzer>
>
> ## SolrJ Code ##
> CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://aseem.desktop.amazon.com:8983/solr/sharepoint");
> SolrInputDocument doc = new SolrInputDocument();
> UpdateRequest req = new UpdateRequest();
> doc.addField("url", "http://haha.com");
> /* doc.addField("body", sbr.toString()); */
> doc.addField("body", "<center>content</center>");
> req.add(doc);
> req.setAction(ACTION.COMMIT, false, false);
> UpdateResponse resp = req.process(server);
> System.out.println(resp);

--
Aseem
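In Lucene's HTMLStripCharFilter, escapedTags is a set of tag names whose start and end tags are left in the text rather than stripped. It does not change how escaped entities such as &lt; are treated, which fits the later finding in this thread that it was not what was needed. Below is a toy, regex-based sketch of that behavior. It is not Lucene's implementation, and the names are hypothetical.

```java
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy model of the escapedTags idea: strip every tag except the listed ones.
final class EscapedTagsDemo {

    // Matches a start or end tag and captures its name, e.g. <b>, </b>, <br/>.
    private static final Pattern TAG = Pattern.compile("</?([A-Za-z][A-Za-z0-9]*)[^>]*>");

    static String strip(String html, Set<String> escapedTags) {
        Matcher m = TAG.matcher(html);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String name = m.group(1).toLowerCase(Locale.ROOT);
            // Keep the tag verbatim if its name is "escaped", otherwise drop it.
            String keep = escapedTags.contains(name) ? m.group() : "";
            m.appendReplacement(out, Matcher.quoteReplacement(keep));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

For example, with escapedTags = {"b"}, `<center><b>content</b></center>` becomes `<b>content</b>`: the center tags are stripped and the b tags pass through.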