Re: Highlighting/getBestFragment

2008-04-13 Thread khirb7



Mike Klaas wrote:
 
 
 On 10-Apr-08, at 7:41 AM, khirb7 wrote:

 I have done a deep search and I found that Lucene provides this
 method:
 getBestFragments
 highlighter.getBestFragments(tokenStream, text, maxNumFragments,
 ...);

 So with this method we can tell Lucene to return maxNumFragments
 fragments (with the highlighted word) of fragSize characters each, but
 there is no maxFragSize parameter in Solr. This would be useful in my
 case if I want to highlight not only the first occurrence of a searched
 word but several occurrences of the same word in the highlighted text.
 
 I'm not sure I understand exactly what you want the parameter to do.
 
 see http://wiki.apache.org/solr/HighlightingParameters
 
 use:
 hl.fragsize=size to set the desired fragment size, and
 hl.snippets=number to set the number of returned snippets/fragments.
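 
 For example, a request combining both (the host, port, field name, and
 query value here are illustrative):
 
 http://localhost:8983/solr/select?q=word&hl=true&hl.fl=myText&hl.fragsize=100&hl.snippets=3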
 
 -Mike
 
 
thank you for your response,

I think I wasn't clear enough in my last post (I had already read
http://wiki.apache.org/solr/HighlightingParameters before asking my question
last time). This is what I want to do:
Right now Solr returns one fragment in the response, and I know that
hl.fragsize=size sets the desired fragment size, and
hl.snippets=number sets the number of returned snippets/fragments. But
hl.snippets is only useful with a multi-valued field (for instance the
features field in the Solr example schema). In my case each document has a
single field myText whose type is text, so hl.snippets=number has no
effect here: whether it is used or not, the highlighted result is the
same.

Here is what I want to do.
Lucene provides overloaded getBestFragment()/getBestFragments() methods to
return fragments. I think the Solr classes use the method

highlighter.getBestFragment(tokenStream, text)

which returns one fragment containing the first occurrence of the searched
word highlighted, but I don't want only the first occurrence; I also want
the Nth (2nd or 3rd...) ones.
I want to replace the previous method with:

String result =
highlighter.getBestFragments(tokenStream, text, 5, ...);

Here maxNumFragments=5 gives the five best fragments.
So I want to know where I must modify Solr to do that: which class and how,
or in solrconfig.xml, but I found this difficult; maybe I have to create my
own handler.
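
For reference, here is a minimal sketch of the Lucene getBestFragments call
I mean (the analyzer choice, field name, and query values are my own
assumptions, using the Lucene 2.x Highlighter API):

import java.io.StringReader;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.QueryScorer;
import org.apache.lucene.search.highlight.SimpleFragmenter;

public class MultiFragmentExample {
    public static void main(String[] args) throws Exception {
        String text = "... stored content of the myText field ...";
        TermQuery query = new TermQuery(new Term("myText", "word"));

        Highlighter highlighter = new Highlighter(new QueryScorer(query));
        // fragment size in characters, like hl.fragsize
        highlighter.setTextFragmenter(new SimpleFragmenter(100));

        TokenStream tokenStream = new StandardAnalyzer()
                .tokenStream("myText", new StringReader(text));

        // maxNumFragments = 5: return up to the five best fragments,
        // joined with "..." as the separator
        String result = highlighter.getBestFragments(tokenStream, text, 5, "...");
        System.out.println(result);
    }
}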

I am awaiting your suggestions on how to deal with this.


 

 




[jira] Updated: (SOLR-469) Data Import RequestHandler

2008-04-13 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar updated SOLR-469:
---

Attachment: SOLR-469.patch

A new patch consisting of a few bug fixes and some major new features. The 
changes include:

 * No need to write fields in data-config if the field name from the DB/XML and 
the field name in schema.xml are the same. This removes a lot of needless 
verbosity from data-config.xml
 * A cool new interactive development page, in which you write/change 
data-config.xml and see results immediately, making iterations extremely fast! 
Use http://host:port/solr/admin/dataimport.jsp or, if using multi-core, 
http://host:port/solr/core-name/admin/dataimport.jsp
 * You can start using the interactive mode without specifying a data-config 
file in solrconfig.xml; however, specifying the data sources in solrconfig.xml 
is necessary
 * Interactive development uses a new debug mode in DataImportHandler: add 
debug=on to the full-import command to see the actual documents created by 
DataImportHandler. This shows the first 10 documents created by 
DataImportHandler using the existing config, without committing them to Solr. 
It supports the start and rows parameters (just like query params), which you 
can use to see any document. This is very useful when, say, the 1000th 
document failed during indexing and you want to see the reason; an example 
request appears after this list. If there are exceptions, the stacktrace is 
shown with the response.
 * Verbose mode with verbose=on as a request parameter (used in conjunction 
with debug=on), which shows exactly how DataImportHandler created each document:
 ** What query was executed?
 ** How much time did it take?
 ** What rows did it give back?
 ** What transformers were applied and what was the result?
 ** Another advantage is that you can see the fields which are indexed but not 
stored
 * A show-config command has been added which gives the data-config.xml as a 
raw response (uses RawResponseWriter)
 * A new interface called Evaluator has been added which makes it possible to 
plug in new expression evaluators (for resolving variable names)
 * Using the same Evaluator interface, a few new evaluators have been added
 ** formatDate - use as ${dataimporter.functions.formatDate('NOW', 'yyyy-MM-dd 
HH:mm')}; this will format NOW as per the given format and return a string which 
can be used in queries or URLs. It supports the full DateMathParser syntax. You 
can also format fields, e.g. 
${dataimporter.functions.formatDate(A.purchase_date, 'dd-MM-yyyy')}
 ** encodeUrl - useful for URL-encoding parameters when making an HTTP call. Use 
as ${dataimporter.functions.encodeUrl(emp.name)}
 ** escapeSql - useful for escaping parameters supplied in SQL statements. This 
can replace quotes with two quotes to avoid SQL syntax errors. Use as 
${dataimporter.functions.escapeSql(emp.name)}
 * Custom Evaluators can be specified in data-config.xml (more details and 
example will be added to the wiki)
 * HttpDataSource now reads the content encoding from the response by default. 
Previously it assumed the default encoding to be UTF-8. This behavior can be 
overridden by explicitly specifying an encoding in solrconfig.xml
 * A FileDataSource has been added which can read content from local files 
(e.g. XML feed files on local disk).
 * Transformers can signal skipping a document by adding a key $skipDoc with 
the value true in the returned map; a minimal sketch appears after this list.
 * NumberFormatTransformer is a new transformer which can be used to 
extract/convert numbers from strings. It uses Java's java.text.NumberFormat 
class to provide its features.
 * The Context interface has been enhanced with new methods for getting/setting 
session variables which Transformers can use to share data. Also, a new method 
called getParentContext enables a Transformer/EntityProcessor to get the parent 
entity's context in full imports.
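
Two illustrative examples (the handler path, field names, and the exact 
transformRow signature are assumptions based on the DataImportHandler wiki, 
not taken verbatim from this patch):

A debug-mode request that shows the documents which would be created, without 
committing them:

http://host:port/solr/dataimport?command=full-import&debug=on&verbose=on&start=0&rows=10

A custom Transformer that skips documents via $skipDoc:

import java.util.Map;

public class SkipEmptyNameTransformer {
    // DataImportHandler can discover this method by reflection on the
    // class named in data-config.xml.
    public Object transformRow(Map<String, Object> row) {
        Object name = row.get("name"); // hypothetical source column
        if (name == null || name.toString().trim().length() == 0) {
            row.put("$skipDoc", "true"); // tell DIH to skip this document
        }
        return row;
    }
}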

Please let us know your comments and feedback. More details and examples will 
soon be added to the wiki page at http://wiki.apache.org/solr/DataImportHandler

 Data Import RequestHandler
 --

 Key: SOLR-469
 URL: https://issues.apache.org/jira/browse/SOLR-469
 Project: Solr
  Issue Type: New Feature
  Components: update
Affects Versions: 1.3
Reporter: Noble Paul
Assignee: Grant Ingersoll
 Fix For: 1.3

 Attachments: SOLR-469.patch, SOLR-469.patch, SOLR-469.patch, 
 SOLR-469.patch, SOLR-469.patch, SOLR-469.patch, SOLR-469.patch


 We need a RequestHandler which can import data from a DB or other data sources 
 into the Solr index. Think of it as an advanced form of the SqlUpload Plugin 
 (SOLR-103).
 The way it works is as follows.
 * Provide a configuration file (xml) to the Handler which takes in the 
 necessary SQL queries and mappings to a solr schema
   - It also takes in a 

[jira] Assigned: (SOLR-486) Support binary formats for QueryresponseWriter

2008-04-13 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley reassigned SOLR-486:
-

Assignee: Yonik Seeley

 Support binary formats for QueryresponseWriter
 --

 Key: SOLR-486
 URL: https://issues.apache.org/jira/browse/SOLR-486
 Project: Solr
  Issue Type: Improvement
  Components: clients - java, search
Reporter: Noble Paul
Assignee: Yonik Seeley
Priority: Minor
 Fix For: 1.3

 Attachments: SOLR-486.patch, SOLR-486.patch, SOLR-486.patch, 
 SOLR-486.patch, SOLR-486.patch, SOLR-486.patch


 QueryResponse writer only allows text data to be written, so it is not 
 possible to implement a binary protocol. Create another interface which has a 
 method: 
 write(OutputStream os, SolrQueryRequest request, SolrQueryResponse response)
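
 A minimal sketch of the proposed interface (the interface name and the 
 throws clause are assumptions; the issue only specifies the write method):

 import java.io.IOException;
 import java.io.OutputStream;
 import org.apache.solr.request.SolrQueryRequest;
 import org.apache.solr.request.SolrQueryResponse;

 public interface BinaryQueryResponseWriter {
     // Stream the response directly to the raw OutputStream instead of a
     // character Writer, which makes binary formats possible.
     void write(OutputStream os, SolrQueryRequest request,
                SolrQueryResponse response) throws IOException;
 }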

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-486) Support binary formats for QueryresponseWriter

2008-04-13 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12588402#action_12588402
 ] 

Yonik Seeley commented on SOLR-486:
---

I'm hacking on this now... I like how you separated the dependency on 
Lucene-specific stuff via the resolver. The problem is that we lose streaming 
ability for doc lists... if someone requests 1000 documents or whatever, 
everything is blown up in memory, which could cause an OOM. I'm adding the 
ability for the resolver to call back to the codec... not as nicely separated, 
but better results.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Fwd: New binary distribution of Oracle-Lucene integration

2008-04-13 Thread J. Delgado
Here is the latest on the Oracle-Lucene Integration.

J.D.

-- Forwarded message --
From: Marcelo Ochoa [EMAIL PROTECTED]
Date: Mon, Apr 7, 2008 at 10:01 AM
Subject: New binary distribution of Oracle-Lucene integration
To: [EMAIL PROTECTED]


Hi all:
 I have just released a new version of the Oracle-Lucene integration,
 implemented as a Domain Index.
 The binary distribution has a very straightforward installation and
 testing procedure; downloads are at the SF.net web site:

http://sourceforge.net/project/showfiles.php?group_id=56183&package_id=255524&release_id=589900
 Updated documentation is available as a Google Document at:
 http://docs.google.com/Doc?id=ddgw7sjp_54fgj9kg
 Source is available with public CVS access at:
 http://dbprism.cvs.sourceforge.net/dbprism/ojvm/
 As a consequence of reading many mails with feedback and development
 tips from this list, this new version has many performance improvements:
 it uses a rowid-to-Lucene-doc-id cache, and it uses the
 LoadFirstFieldSelector class to prevent Lucene from loading a complete
 document when only the rowid is needed; a sketch of the idea follows.
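
 A minimal sketch of that FieldSelector idea against the Lucene 2.x API
 (the "rowid" field name and the index path are illustrative; Lucene's
 LoadFirstFieldSelector achieves the same when rowid is the first stored
 field):

 import org.apache.lucene.document.Document;
 import org.apache.lucene.document.FieldSelector;
 import org.apache.lucene.document.FieldSelectorResult;
 import org.apache.lucene.index.IndexReader;

 public class RowidOnlyExample {
     public static void main(String[] args) throws Exception {
         IndexReader reader = IndexReader.open("/path/to/index");
         FieldSelector rowidOnly = new FieldSelector() {
             public FieldSelectorResult accept(String fieldName) {
                 // load only "rowid", skip every other stored field
                 return "rowid".equals(fieldName)
                         ? FieldSelectorResult.LOAD_AND_BREAK
                         : FieldSelectorResult.NO_LOAD;
             }
         };
         Document doc = reader.document(0, rowidOnly);
         System.out.println(doc.get("rowid"));
     }
 }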
 Many thanks to all for sharing the experience.
 A complete list of changes is at:

http://dbprism.cvs.sourceforge.net/dbprism/ojvm/ChangeLog.txt?revision=1.3&view=markup
 Best regards, Marcelo.

 PS: I plan to make a new version of the Oracle-Lucene integration
 synchronized with Lucene 2.3.1 ASAP.
 --
 Marcelo F. Ochoa
 http://marceloochoa.blogspot.com/
 http://marcelo.ochoa.googlepages.com/home
 __
 Do you Know DBPrism? Look @ DB Prism's Web Site
 http://www.dbprism.com.ar/index.html
 More info?
  Chapter 17 of the book Programming the Oracle Database using Java & 
  Web Services
  http://www.amazon.com/gp/product/183296/
  Chapter 21 of the book Professional XML Databases - Wrox Press
  http://www.amazon.com/gp/product/1861003587/
  Chapter 8 of the book Oracle & Open Source - O'Reilly
  http://www.oreilly.com/catalog/oracleopen/


