DynamicField and FacetFields..

2007-12-01 Thread Jeryl Cook
Question:
I need to dynamically add data to Solr, so I do not have a predefined list of
field names...

So I use the dynamicField option in the schema and match the appropriate datatype.
In my schema.xml:

<field name="id" type="string" indexed="true" stored="true" required="true" />

<dynamicField name="*_s" type="string" indexed="true" stored="true"/>

Then, programmatically, my code:
...
document.addField( dynamicFieldName + "_s", dynamicFieldValue, 10 );
facetFieldNames.put( dynamicFieldName + "_s", null ); // TODO: use copyField..
server.add( document, true );
server.commit();

When I attempt to graph the results, I want to display facets:

SolrQuery query = new SolrQuery();
query.setQuery( "*:*" );
query.setFacetLimit(10); // TODO:
Iterator<Entry<String,String>> facetsIt = facetFieldNames.entrySet().iterator();
while (facetsIt.hasNext()) {
    Entry<String,String> entry = facetsIt.next();
    String facetName = entry.getKey();
    query.addFacetField(facetName);
}

QueryResponse rsp = server.query( query );
List<FacetField> facetFieldList = rsp.getFacetFields();
assertNotNull(facetFieldList);

   


My facetFieldList is null. Of course, if I addFacetField with "id" it
works... because I define it in the schema.xml.

Is this just something that is not implemented, or am I missing something?
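As an aside, the bookkeeping above just suffixes each dynamic name so it matches the *_s pattern in the schema. A minimal standalone sketch (the class name and base field names here are made up for illustration):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FacetFields {
    // Suffix a base name so it matches the "*_s" dynamicField pattern.
    static String dynamicName(String base) {
        return base + "_s";
    }

    public static void main(String[] args) {
        // Collect the generated field names so they can later be passed
        // to query.addFacetField(...), as in the code above.
        Map<String, String> facetFieldNames = new LinkedHashMap<>();
        for (String base : List.of("author", "genre")) {
            facetFieldNames.put(dynamicName(base), null);
        }
        System.out.println(facetFieldNames.keySet());  // prints [author_s, genre_s]
    }
}
```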

Thanks.



Jeryl Cook 



/^\ Pharaoh /^\ 

http://pharaohofkush.blogspot.com/ 



..Act your age, and not your shoe size..

-Prince(1986)

 Date: Fri, 30 Nov 2007 21:23:59 -0500
 From: [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Subject: Re: Solr Highlighting, word index
 
 It's good you already have the data, because if you somehow got it from
 some sort of calculations I'd have to tell my product manager that
 the feature he wanted (the one I told him couldn't be done with our data)
 was possible after all <G>...
 
 About page breaks:
 
 Another approach to paging is to index a special page token with a position
 increment of 0 from the last word of the page. Say you have the following:
 "last <ctrl-l> first". Then index "last", then "$$$" at an increment of 0, then "first".
 
 You can then quite quickly calculate the pages by using
 termdocs/termenum on your special token and count.
 
 Which approach you use depends upon whether you want span and/or
 phrase queries to match across page boundaries. If you use an increment as
 Mike suggests, matching "last first"~3 won't work. It just depends upon
 how you want to match across the page break.
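 For what it's worth, the page lookup itself is trivial under Mike's scheme. A hedged sketch, assuming positions are bumped to the next multiple of 1000 at each page break (the stride constant and names are illustrative, not from either posting):

```java
public class PageFromPosition {
    // The analyzer bumps token positions to the next multiple of 1000
    // at every page break, so positions 0..999 fall on page 1,
    // 1000..1999 on page 2, and so on.
    static final int PAGE_STRIDE = 1000;

    static int pageOf(int position) {
        return position / PAGE_STRIDE + 1;  // 1-based page number
    }

    public static void main(String[] args) {
        System.out.println(pageOf(42));    // page 1
        System.out.println(pageOf(1005));  // page 2
    }
}
```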
 
 Best
 Erick
 
 On Nov 30, 2007 4:37 PM, Mike Klaas [EMAIL PROTECTED] wrote:
 
  On 30-Nov-07, at 1:02 PM, Owens, Martin wrote:
 
  
   Hello everyone,
  
    We're working to replace the old Linux version of dtSearch with
    Lucene/Solr, using HTTP requests for our Perl side and Java for
    the indexing.
   
    The functionality that is causing the most problems is the
    highlighting, since we're not storing the text in Solr (only
    indexing it) and we need to highlight an image file (OCR). What we
    really need is to request from Solr the word indexes of the
    matches; we then tie these up to the OCR image and create HTML
    boxes to do the highlighting.
 
   This isn't possible with Solr out of the box.  Also, the usual
   methods for highlighting won't work, because Solr typically
   re-analyzes the raw text to find the appropriate highlighting points.
   However, it shouldn't be too hard to come up with a custom solution.
   You can tell Lucene to store token offsets using TermVectors
   (configurable via schema.xml).  Then you can customize the request
   handler to return the token offsets (and/or positions) by retrieving
   the TVs.
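   Concretely, that suggestion maps to per-field attributes in schema.xml. A sketch (the field name and type are illustrative; the termVectors/termPositions/termOffsets attributes are the standard Solr schema options):

```xml
<!-- Store term vectors with positions and offsets so a custom
     request handler can read back where each match occurred. -->
<field name="ocr_text" type="text" indexed="true" stored="false"
       termVectors="true" termPositions="true" termOffsets="true"/>
```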
 
    The text is also multi-page; each page is separated by Ctrl-L page
    breaks. Should we handle the paging ourselves, or can Solr tell us
    which page the match happened on too?
 
   Again, not automatically.  However, if you wrote an analyzer that
   bumped up the position increment of tokens every time a new page was
   found (to, say, the next multiple of 1000), then you could infer the
   matching page from the token position.
 
  cheers,
  -Mike
 


RE: DynamicField and FacetFields..

2007-12-01 Thread Jeryl Cook
Fixed, I had a typo... you may want to delete my post (I want to :P).

Jeryl Cook  
 From: [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Subject: DynamicField and  FacetFields..
 Date: Sat, 1 Dec 2007 14:21:12 -0500
 