Re: Filtering query results

2009-11-20 Thread aseem cheema
Thank you much for your responses guys. I do not have ACL. I need to
make a web service call to find out if a user has access to a
document. I was hoping to get search results, call the web service
with the IDs from the search results telling me what IDs the user has
access to, and then filter others before returning back to the user.
ACL and role based fq is definitely some food for thought. I will need
to figure out the synchronization issues.
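
The post-filtering idea described above (search first, then ask the web service which IDs are visible) could be sketched roughly as follows. This is only an illustration: the class name, the method name, and the shape of the access check (a function from candidate IDs to the allowed subset) are assumptions standing in for the real web service, which is not described in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

public class PostFilter {
    // Keeps only the hits the access check reports as allowed, preserving
    // the original ranking order. accessCheck is a hypothetical stand-in for
    // the web service call: it takes the candidate IDs and returns the
    // subset the user may see.
    static List<String> filterByAccess(List<String> hitIds,
                                       Function<List<String>, Set<String>> accessCheck) {
        Set<String> allowed = accessCheck.apply(hitIds); // one batched call per page
        List<String> visible = new ArrayList<>();
        for (String id : hitIds) {
            if (allowed.contains(id)) {
                visible.add(id);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        List<String> hits = Arrays.asList("doc1", "doc2", "doc3", "doc4");
        // Fake access service for the demo: this user is granted doc2, doc4, doc9.
        Set<String> grants = new HashSet<>(Arrays.asList("doc2", "doc4", "doc9"));
        Function<List<String>, Set<String>> fakeService = ids -> {
            Set<String> ok = new HashSet<>(ids);
            ok.retainAll(grants);
            return ok;
        };
        System.out.println(filterByAccess(hits, fakeService)); // prints [doc2, doc4]
    }
}
```

One consequence of filtering after the search: a page can come back shorter than requested, and any hit count reported by Solr would still include the hidden documents.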

Thanks
Aseem


On Fri, Nov 20, 2009 at 8:04 AM, Glock, Thomas thomas.gl...@pfizer.com wrote:
 Hi Aseem -

 I had a similar challenge.  The solution that works for my case was to
 add role as a repeating string value in the solr schema.

 Each piece of content contains 1 or more roles and these values are
 supplied to solr for indexing.

 Users also have one or more roles (which correspond exactly to the
 metadata placed on content and supplied to Solr.)

 So when performing the search query, we add an fq parameter to filter
 search results.  For example: q=Search Phrase&fq=role:(role1 || role2 ||
 role3)

 Note that ultimate restriction to content is handled elsewhere, this is
 only done as a filtering mechanism for search results.  Additionally, we
 do not have unlimited sets of roles and that helps to keep the query
 string on the HTTP GET to a minimum.  Finally, the roles for my system
 are additive such that if there is a match on any one role - the user
 has access - so an OR clause works.  Your system may have more complex
 role rules.
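
A minimal sketch of building such a role filter from a user's roles, assuming a multi-valued field named role as in the example above. The class and method names are made up for illustration. OR is used in place of ||, which the Lucene query parser treats the same way, and each role is quoted so that spaces or query syntax inside a role name are not parsed as operators.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Arrays;
import java.util.List;

public class RoleFilterQuery {
    // Builds a filter query such as role:("role1" OR "role2").
    static String buildRoleFq(String field, List<String> roles) {
        if (roles.isEmpty()) {
            // Assumption: a user without roles should see nothing, so fail loudly
            // instead of emitting an empty (and possibly unrestricted) filter.
            throw new IllegalArgumentException("user has no roles");
        }
        StringBuilder sb = new StringBuilder(field).append(":(");
        for (int i = 0; i < roles.size(); i++) {
            if (i > 0) {
                sb.append(" OR ");
            }
            // Escape backslashes and double quotes inside the quoted phrase.
            String escaped = roles.get(i).replace("\\", "\\\\").replace("\"", "\\\"");
            sb.append('"').append(escaped).append('"');
        }
        return sb.append(')').toString();
    }

    // Assembles the query-string part of a select request: q=...&fq=...
    static String buildQueryString(String userQuery, String fq) {
        try {
            return "q=" + URLEncoder.encode(userQuery, "UTF-8")
                 + "&fq=" + URLEncoder.encode(fq, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        String fq = buildRoleFq("role", Arrays.asList("role1", "role2", "role3"));
        System.out.println(fq); // prints role:("role1" OR "role2" OR "role3")
        System.out.println(buildQueryString("Search Phrase", fq));
    }
}
```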

 -----Original Message-----
 From: aseem cheema [mailto:aseemche...@gmail.com]
 Sent: Thursday, November 19, 2009 5:00 PM
 To: solr-user@lucene.apache.org
 Subject: Filtering query results

 Hey Guys,
 I need to filter out some results based on who is performing the search.
 In other words, if a document is not accessible to a user performing
 search, I don't want it to be in the result set. What is the
 best/easiest way to do this reliable/securely in Solr?
 Thanks
 --
 Aseem




-- 
Aseem


Filtering query results

2009-11-19 Thread aseem cheema
Hey Guys,
I need to filter out some results based on who is performing the
search. In other words, if a document is not accessible to the user
performing the search, I don't want it to be in the result set. What is
the best/easiest way to do this reliably and securely in Solr?
Thanks
-- 
Aseem


XmlUpdateRequestHandler with HTMLStripCharFilterFactory

2009-11-11 Thread aseem cheema
I am trying to post a document with the following content using SolrJ:
<center>content</center>
I need the xml/html tags to be ignored. Even though this works fine in
analysis.jsp, this does not work with SolrJ, as the client escapes the
< and > with &lt; and &gt; and HTMLStripCharFilterFactory does not
strip those escaped tags. How can I achieve this? Any ideas will be
highly appreciated.

There is escapedTags in HTMLStripCharFilterFactory constructor. Is
there a way to get that to work?
Thanks
-- 
Aseem


add XML/HTML documents using SolrJ, without bypassing HTML char filter

2009-11-11 Thread aseem cheema
Hey Guys,
How do I add HTML/XML documents using SolrJ such that it does not
bypass the HTML char filter?

SolrJ escapes the HTML/XML value of a field, and that makes it bypass
the HTML char filter. For example, <center>content</center>, if added to
a field with HTMLStripCharFilter on the field using SolrJ, is not
stripped of center tags. But if I check in analysis.jsp, it does get
stripped. When I look at the SolrJ XML feed, it looks like this:
<add><doc boost="1.0"><field name="id">http://haha.com</field><field
name="text">&lt;center&gt;content&lt;/center&gt;</field></doc></add>

Any help is highly appreciated. Thanks.

-- 
Aseem


Re: add XML/HTML documents using SolrJ, without bypassing HTML char filter

2009-11-11 Thread aseem cheema
Ohhh... you are a life saver... thank you so much.. it makes sense.

Aseem

On Wed, Nov 11, 2009 at 7:40 PM, Ryan McKinley ryan...@gmail.com wrote:
 The HTMLStripCharFilter will strip the html for the *indexed* terms, it does
 not affect the *stored* field.

 If you don't want html in the stored field, can you just strip it out before
 passing to solr?
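
A rough sketch of stripping the markup on the client before the value is handed to Solr. The class and method names are hypothetical, and the regex is deliberately naive: it does not handle comments, scripts, or a > inside an attribute value, so a real HTML parser would be a safer choice for untrusted input.

```java
import java.util.regex.Pattern;

public class PreStrip {
    private static final Pattern TAG = Pattern.compile("<[^>]+>");
    private static final Pattern SPACES = Pattern.compile("\\s+");

    // Replaces anything that looks like <...> with a space, then collapses
    // runs of whitespace and trims the ends.
    static String stripTags(String html) {
        String noTags = TAG.matcher(html).replaceAll(" ");
        return SPACES.matcher(noTags).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        // The cleaned string is what would then be passed to
        // doc.addField("body", ...), so the stored value has no markup either.
        System.out.println(stripTags("<center>content</center>")); // prints content
    }
}
```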


 On Nov 11, 2009, at 8:07 PM, aseem cheema wrote:

 Hey Guys,
 How do I add HTML/XML documents using SolrJ such that it does not
 bypass the HTML char filter?

 SolrJ escapes the HTML/XML value of a field, and that makes it bypass
 the HTML char filter. For example, <center>content</center>, if added to
 a field with HTMLStripCharFilter on the field using SolrJ, is not
 stripped of center tags. But if I check in analysis.jsp, it does get
 stripped. When I look at the SolrJ XML feed, it looks like this:
 <add><doc boost="1.0"><field name="id">http://haha.com</field><field
 name="text">&lt;center&gt;content&lt;/center&gt;</field></doc></add>

 Any help is highly appreciated. Thanks.

 --
 Aseem





-- 
Aseem


Re: XmlUpdateRequestHandler with HTMLStripCharFilterFactory

2009-11-11 Thread aseem cheema
Alright. It turns out that escapedTags is not for what I thought it was for.
The problem that I am having with HTMLStripCharFilterFactory is that
it strips the HTML while indexing the field, but not while storing the
field. That is why what I see in analysis.jsp, which shows index
analysis, does not match what gets stored: the HTML is stripped only
for indexing. Makes so much sense.
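
To make the indexed-versus-stored distinction concrete, here is a toy model. It is not how Solr is implemented; the class, its fields, and the three-step "analysis" (strip tags, lowercase, split on whitespace) are simplifications invented for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class IndexedVsStored {
    final String stored;        // raw value, handed back verbatim in results
    final List<String> indexed; // terms produced by analysis, used for matching

    IndexedVsStored(String raw) {
        this.stored = raw;
        // Stand-in for the analysis chain: strip tags, lowercase, split on whitespace.
        String text = raw.replaceAll("<[^>]+>", " ").trim().toLowerCase();
        this.indexed = text.isEmpty()
                ? Collections.<String>emptyList()
                : Arrays.asList(text.split("\\s+"));
    }

    boolean matches(String term) {
        return indexed.contains(term.toLowerCase());
    }

    public static void main(String[] args) {
        IndexedVsStored body = new IndexedVsStored("<center>content</center>");
        System.out.println(body.indexed);           // prints [content]
        System.out.println(body.matches("center")); // prints false
        System.out.println(body.stored);            // prints <center>content</center>
    }
}
```

A search for "center" finds nothing because the tag never became a term, yet the stored value still carries the markup.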

Thanks to Ryan McKinley for clarifying this.
Aseem

On Wed, Nov 11, 2009 at 9:50 AM, aseem cheema aseemche...@gmail.com wrote:
 I am trying to post a document with the following content using SolrJ:
 <center>content</center>
 I need the xml/html tags to be ignored. Even though this works fine in
 analysis.jsp, this does not work with SolrJ, as the client escapes the
 < and > with &lt; and &gt; and HTMLStripCharFilterFactory does not
 strip those escaped tags. How can I achieve this? Any ideas will be
 highly appreciated.

 There is escapedTags in HTMLStripCharFilterFactory constructor. Is
 there a way to get that to work?
 Thanks
 --
 Aseem




-- 
Aseem


HTMLStripCharFilterFactory not working when using SolrJ java client

2009-11-10 Thread aseem cheema
Hey Guys,
I have HTMLStripCharFilterFactory char filter declared in my
schema.xml for fieldType text (code below). I am using this field type
for body field of my schema. I am seeing different behavior when I use
SolrJ to post a document (code below) and when I use the analysis.jsp.
The text I am putting in the field is <center>content</center>.

When SolrJ is used, the field gets the whole value
<center>content</center>, but when analysis.jsp is used, it shows only
content being used for the field.

What am I possibly doing wrong here? How do I get
HTMLStripCharFilterFactory to work, even if I am pushing data using
SolrJ? Thanks.

Your help is highly appreciated.
Thanks
-- 
Aseem

## schema.xml ##
<analyzer type="index">
  <charFilter class="solr.HTMLStripCharFilterFactory"/>
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.StopFilterFactory"
          ignoreCase="true"
          words="stopwords.txt"
          enablePositionIncrements="true"
          />
  <filter class="solr.WordDelimiterFilterFactory"
          generateWordParts="1" generateNumberParts="1" catenateWords="1"
          catenateNumbers="1" catenateAll="0"
          splitOnCaseChange="1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.SynonymFilterFactory"
          synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
  <filter class="solr.EnglishPorterFilterFactory"
          protected="protwords.txt"/>
  <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>

## SolrJ Code ##
  CommonsHttpSolrServer server = new
      CommonsHttpSolrServer("http://aseem.desktop.amazon.com:8983/solr/sharepoint");
  SolrInputDocument doc = new SolrInputDocument();
  UpdateRequest req = new UpdateRequest();
  doc.addField("url", "http://haha.com");
  /*doc.addField("body", sbr.toString());*/
  doc.addField("body", "<center>content</center>");
  req.add(doc);
  req.setAction(ACTION.COMMIT, false, false);
  UpdateResponse resp = req.process(server);
  System.out.println(resp);


Re: HTMLStripCharFilterFactory not working when using SolrJ java client

2009-11-10 Thread aseem cheema
I printed the UpdateRequest object (getXML) and the XML is:
<add><doc boost="1.0"><field name="url">http://haha.com</field><field
name="body">&lt;center&gt;content&lt;/center&gt;</field></doc></add>

I can see that the issue is because the HTML/XML < and > are replaced by
&lt; and &gt;. I understand that it is required to do so to keep them from
interfering with the Solr XML document, but how do I accomplish what I
want to? I need to get the HTML in the body field stripped out.
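
For what it is worth, the escaping visible in the update XML is only the wire format: any conforming XML parser turns the entities back into < and > when it reads the field text, so the escaping by itself should not hide the tags from a char filter on the receiving side. A small check using only the JDK's XML parser (the class and method names are made up for this sketch, and it says nothing about Solr's own parsing code):

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class EscapeRoundTrip {
    // Parses an <add> message and returns the text of the named field as a
    // receiver would see it after XML parsing, or null if the field is absent.
    static String fieldValue(String addXml, String fieldName) {
        try {
            Document dom = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(addXml.getBytes(StandardCharsets.UTF_8)));
            NodeList fields = dom.getElementsByTagName("field");
            for (int i = 0; i < fields.getLength(); i++) {
                Element f = (Element) fields.item(i);
                if (fieldName.equals(f.getAttribute("name"))) {
                    return f.getTextContent(); // entity references are already decoded
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<add><doc boost=\"1.0\">"
                + "<field name=\"url\">http://haha.com</field>"
                + "<field name=\"body\">&lt;center&gt;content&lt;/center&gt;</field>"
                + "</doc></add>";
        System.out.println(fieldValue(xml, "body")); // prints <center>content</center>
    }
}
```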

Any help is highly appreciated.
Thanks
Aseem

On Tue, Nov 10, 2009 at 10:56 AM, aseem cheema aseemche...@gmail.com wrote:
 [...]




-- 
Aseem


Re: HTMLStripCharFilterFactory not working when using SolrJ java client

2009-11-10 Thread aseem cheema
The HTMLStripCharFilterFactory class has a constructor that accepts
escapedTags. I believe this will solve my problem, but I am not sure
how to pass this from the schema.xml file. I have tried <charFilter
class="solr.HTMLStripCharFilterFactory" escapedTags="&lt;,&gt;"/> but
that didn't work.
Anybody?
Thanks

On Tue, Nov 10, 2009 at 10:56 AM, aseem cheema aseemche...@gmail.com wrote:
 [...]




-- 
Aseem