Re: Highlighting/getBestFragment
Mike Klaas wrote:

> On 10-Apr-08, at 7:41 AM, khirb7 wrote:
>
> > I have done a deep search and found that Lucene provides this method:
> > highlighter.getBestFragments(tokenStream, text, maxNumFragment, ...). With this
> > method we can tell Lucene to return up to maxNumFragment fragments (with the
> > highlighted word), each of fragSize characters, but there is no such parameter
> > in Solr. This would be useful in my case: I want to highlight not only the
> > first occurrence of a searched word but further occurrences of the same word
> > in the highlighted text as well.
>
> I'm not sure I understand exactly what you want the parameter to do. See
> http://wiki.apache.org/solr/HighlightingParameters -- use hl.fragsize=size to
> set the desired fragment size, and hl.snippets=number to set the number of
> returned snippets/fragments.
>
> -Mike

Thank you for your response. I think I wasn't clear enough in my last post (I had already read http://wiki.apache.org/solr/HighlightingParameters before asking my question last time). This is what I want to do: right now Solr returns one fragment in the response, and I know hl.fragsize=size sets the desired fragment size and hl.snippets=number sets the number of returned snippets/fragments. But hl.snippets is only useful with a multi-valued field (for instance the features field in the example Solr schema). In my case each document has a single field, myText, whose type is text, so hl.snippets=number makes no difference: whether it is set or not, the highlighted result is the same.

Here is what I want to do. Lucene provides overloaded getBestFragment()/getBestFragments() methods to return fragments. I think the Solr classes use highlighter.getBestFragment(tokenStream, text), which returns one fragment containing the first occurrence of the searched word highlighted. But I don't want only the first occurrence; I want the Nth (2nd, 3rd, ...) ones too, so I want to replace the previous call with String result = highlighter.getBestFragments(tokenStream, text, 5, ...); here maxNumFragments=5 gives the five best fragments. So I want to know where I must modify Solr to do that: which class and how. Or can it be done in solrconfig.xml? I found that difficult; maybe I have to create my own handler. I am awaiting your suggestions on how to deal with this.

--
View this message in context: http://www.nabble.com/Highlighting-getBestFragment-tp16608862p16656982.html
Sent from the Solr - Dev mailing list archive at Nabble.com.
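For reference, this is roughly how the Lucene contrib highlighter the poster describes is used outside of Solr. This is a sketch against the Lucene 2.x API; the fragment size, analyzer, and separator chosen here are illustrative, not what Solr itself does internally:

```java
import java.io.StringReader;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.QueryScorer;
import org.apache.lucene.search.highlight.SimpleFragmenter;

public class MultiFragmentHighlight {

    // Returns up to maxNumFragments highlighted snippets from a single
    // text value, joined by "...". Assumes 'query' is the parsed user query.
    static String highlight(Query query, String fieldName, String text,
                            int maxNumFragments) throws Exception {
        Highlighter highlighter = new Highlighter(new QueryScorer(query));
        // Each fragment is roughly 100 characters (the hl.fragsize analogue).
        highlighter.setTextFragmenter(new SimpleFragmenter(100));
        TokenStream tokens = new StandardAnalyzer()
                .tokenStream(fieldName, new StringReader(text));
        // This overload returns the best fragments concatenated with the
        // given separator; another overload returns a String[] instead.
        return highlighter.getBestFragments(tokens, text, maxNumFragments, "...");
    }
}
```

Requires the lucene-core and lucene-highlighter (contrib) jars on the classpath.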
[jira] Updated: (SOLR-469) Data Import RequestHandler
[ https://issues.apache.org/jira/browse/SOLR-469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shalin Shekhar Mangar updated SOLR-469:
---------------------------------------

Attachment: SOLR-469.patch

A new patch consisting of a few bug fixes and some major new features. The changes include:

* No need to write fields in data-config if the field name from DB/XML and the field name in schema.xml are the same. This removes a lot of useless verbosity from data-config.xml.
* A cool new interactive development page, in which you write/change data-config.xml and see results immediately, making iterations extremely fast! Use http://host:port/solr/admin/dataimport.jsp or, if using multi-core, http://host:port/solr/core-name/admin/dataimport.jsp
* You can start using the interactive mode without specifying a data-config file in solrconfig.xml; however, specifying the data sources in solrconfig.xml is still necessary.
* Interactive development uses a new debug mode in DataImportHandler. Add debug=on to the full-import command to see the actual documents created by DataImportHandler. This shows the first 10 documents created using the existing config, without committing them to Solr. It supports the start and rows parameters (just like query params), which you can use to see any document. This comes in very useful when, say, the 1000th document failed during indexing and you want to see the reason. If there are exceptions, the stacktrace is shown with the response.
* Verbose mode with verbose=on as a request parameter (used in conjunction with debug=on), which shows exactly how DataImportHandler created each document:
** What query was executed?
** How much time did it take?
** What rows did it give back?
** What transformers were applied, and what was the result?
** Another advantage is that you can see the fields which are indexed but not stored.
* A show-config command has been added which gives the data-config.xml as a raw response (uses RawResponseWriter).
* A new interface called Evaluator has been added which makes it possible to plug in new expression evaluators (for resolving variable names).
* Using the same Evaluator interface, a few new evaluators have been added:
** formatDate - use as ${dataimporter.functions.formatDate('NOW', yyyy-MM-dd HH:mm)}; this will format NOW as per the given pattern and return a string which can be used in queries or URLs. It supports the full DateMathParser syntax. You can also format fields, e.g. ${dataimporter.functions.formatDate(A.purchase_date, dd-MM-yyyy)}
** encodeUrl - useful for URL-encoding parameters when making an HTTP call. Use as ${dataimport.functions.encodeUrl(emp.name)}
** escapeSql - useful for escaping parameters supplied in SQL statements. This can replace quotes with two quotes to avoid SQL syntax errors. Use as ${dataimporter.functions.escapeSql(emp.name)}
* Custom Evaluators can be specified in data-config.xml (more details and an example will be added to the wiki).
* HttpDataSource now reads the content encoding from the response by default. Previously it assumed the default encoding to be UTF-8. This behavior can be overridden by explicitly specifying an encoding in solrconfig.xml.
* A FileDataSource has been added which can read content from local files (e.g. XML feed files on local disk).
* Transformers can signal skipping a document by adding a key $skipDoc with value true in the returned map.
* NumberFormatTransformer is a new transformer which can be used to extract/convert numbers from strings. It uses the java.text.NumberFormat class to provide its features.
* The Context interface has been enhanced with new methods for getting/setting session variables, which Transformers can use to share data. Also, a new method called getParentContext enables a Transformer/EntityProcessor to get the parent entity's context in full imports.

Please let us know your comments and feedback. More details and examples will soon be added to the wiki page at http://wiki.apache.org/solr/DataImportHandler

Data Import RequestHandler
--------------------------

                Key: SOLR-469
                URL: https://issues.apache.org/jira/browse/SOLR-469
            Project: Solr
         Issue Type: New Feature
         Components: update
   Affects Versions: 1.3
           Reporter: Noble Paul
           Assignee: Grant Ingersoll
            Fix For: 1.3
        Attachments: SOLR-469.patch, SOLR-469.patch, SOLR-469.patch, SOLR-469.patch, SOLR-469.patch, SOLR-469.patch, SOLR-469.patch

We need a RequestHandler which can import data from a DB or other data sources into the Solr index. Think of it as an advanced form of the SqlUpload plugin (SOLR-103). The way it works is as follows.
* Provide a configuration file (xml) to the Handler which takes in the necessary SQL queries and mappings to a solr schema - It also takes in a
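As an illustration of the formatDate evaluator described above, a data-config.xml entity might use it inside a SQL query. This is a sketch only: the table, column, and entity names are hypothetical, and the element layout follows the DataImportHandler wiki examples of the time:

```xml
<dataConfig>
  <document>
    <!-- Hypothetical entity: index rows purchased in the last day.
         formatDate renders the DateMathParser expression 'NOW-1DAY'
         with the given pattern for use inside the SQL string. -->
    <entity name="purchase"
            query="select id, amount, purchase_date from purchases
                   where purchase_date &gt;= '${dataimporter.functions.formatDate('NOW-1DAY', yyyy-MM-dd HH:mm)}'">
    </entity>
  </document>
</dataConfig>
```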
[jira] Assigned: (SOLR-486) Support binary formats for QueryresponseWriter
[ https://issues.apache.org/jira/browse/SOLR-486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley reassigned SOLR-486:
---------------------------------

Assignee: Yonik Seeley

Support binary formats for QueryresponseWriter
----------------------------------------------

                Key: SOLR-486
                URL: https://issues.apache.org/jira/browse/SOLR-486
            Project: Solr
         Issue Type: Improvement
         Components: clients - java, search
           Reporter: Noble Paul
           Assignee: Yonik Seeley
           Priority: Minor
            Fix For: 1.3
        Attachments: SOLR-486.patch, SOLR-486.patch, SOLR-486.patch, SOLR-486.patch, SOLR-486.patch, SOLR-486.patch

QueryResponseWriter only allows text data to be written, so it is not possible to implement a binary protocol. Create another interface which has a method write(OutputStream os, SolrQueryRequest request, SolrQueryResponse response)

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
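The interface proposed in the issue description might look like the following. This is only a sketch: the interface name BinaryQueryResponseWriter is assumed here, and the Solr request/response types are referenced exactly as in the issue text:

```java
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical companion to QueryResponseWriter: instead of a character
// Writer (text only), implementations receive the raw OutputStream, so a
// binary wire format can be produced.
public interface BinaryQueryResponseWriter {
    void write(OutputStream os, SolrQueryRequest request, SolrQueryResponse response)
            throws IOException;
}
```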
[jira] Commented: (SOLR-486) Support binary formats for QueryresponseWriter
[ https://issues.apache.org/jira/browse/SOLR-486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12588402#action_12588402 ]

Yonik Seeley commented on SOLR-486:
-----------------------------------

I'm hacking on this now... I like how you separated the dependency on Lucene-specific stuff via the resolver. The problem is that we lose streaming ability for doc lists: if someone requests 1000 documents or whatever, everything is blown up in memory, which could cause an OOM. I'm adding the ability for the resolver to call back to the codec... not as nicely separated, but better results.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
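The streaming concern above can be illustrated with plain java.io: writing each document to the stream as it is produced keeps memory usage flat, whereas materializing the whole doc list first grows with the result size. A toy sketch, using plain strings rather than Solr documents:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Iterator;

public class StreamingSketch {

    // Streaming write: each "document" is encoded and written as soon as it
    // is produced, so only one document is ever held in memory at a time.
    static void writeStreaming(Iterator<String> docs, DataOutputStream out)
            throws IOException {
        while (docs.hasNext()) {
            out.writeUTF(docs.next());
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        writeStreaming(java.util.Arrays.asList("a", "b", "c").iterator(),
                       new DataOutputStream(buf));
        // writeUTF emits a 2-byte length prefix plus the UTF-8 bytes,
        // so three 1-character ASCII docs produce 9 bytes.
        System.out.println(buf.size());
    }
}
```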
Fwd: New binary distribution of Oracle-Lucene integration
Here is the latest on the Oracle-Lucene integration.

J.D.

-- Forwarded message --
From: Marcelo Ochoa [EMAIL PROTECTED]
Date: Mon, Apr 7, 2008 at 10:01 AM
Subject: New binary distribution of Oracle-Lucene integration
To: [EMAIL PROTECTED]

Hi all:
I just released a new version of the Oracle-Lucene integration, implemented as a Domain Index. The binary distribution has a very straightforward installation and testing procedure; downloads are at the SF.net web site:
http://sourceforge.net/project/showfiles.php?group_id=56183&package_id=255524&release_id=589900
Updated documentation is available as a Google Document at:
http://docs.google.com/Doc?id=ddgw7sjp_54fgj9kg
Source is available with public CVS access at:
http://dbprism.cvs.sourceforge.net/dbprism/ojvm/
As a consequence of reading many mails with feedback and development tips from this list, this new version has a lot of performance improvements: a rowid-to-Lucene-doc-id cache, and use of the LoadFirstFieldSelector class to prevent Lucene from loading a complete document when only the rowid is needed. Many thanks to all for sharing the experience.
A complete list of changes is at:
http://dbprism.cvs.sourceforge.net/dbprism/ojvm/ChangeLog.txt?revision=1.3&view=markup
Best regards, Marcelo.
PD: I plan to make a new version of the Oracle-Lucene integration synchronized with Lucene 2.3.1 ASAP.
--
Marcelo F. Ochoa
http://marceloochoa.blogspot.com/
http://marcelo.ochoa.googlepages.com/home
______________
Do you Know DBPrism? Look @ DB Prism's Web Site
http://www.dbprism.com.ar/index.html
More info?
Chapter 17 of the book "Programming the Oracle Database using Java & Web Services" http://www.amazon.com/gp/product/183296/
Chapter 21 of the book "Professional XML Databases" - Wrox Press http://www.amazon.com/gp/product/1861003587/
Chapter 8 of the book "Oracle & Open Source" - O'Reilly http://www.oreilly.com/catalog/oracleopen/
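For context, the FieldSelector optimization mentioned above looks roughly like this against the Lucene 2.x API. A sketch only: the method name and the assumption that the rowid is the first stored field are illustrative:

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Fieldable;
import org.apache.lucene.document.LoadFirstFieldSelector;
import org.apache.lucene.index.IndexReader;

public class RowidLookup {

    // Fetch only the first stored field of a hit (e.g. the rowid) instead of
    // materializing the whole document: LoadFirstFieldSelector answers
    // LOAD_AND_BREAK, so field loading stops after the first stored field.
    static String firstStoredField(IndexReader reader, int docId) throws Exception {
        Document doc = reader.document(docId, new LoadFirstFieldSelector());
        return doc.getFields().isEmpty()
                ? null
                : ((Fieldable) doc.getFields().get(0)).stringValue();
    }
}
```

Requires the lucene-core jar (2.1+, where FieldSelector-aware document loading was introduced) on the classpath.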