[jira] Updated: (SOLR-211) regex split() Tokenizer
[ https://issues.apache.org/jira/browse/SOLR-211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryan McKinley updated SOLR-211:
---
Attachment: SOLR-211-RegexSplitTokenizer.patch

Thanks for the quick feedback! Here is an updated version that:

1. uses a compiled Pattern
2. uses matcher.find() to set the proper start and end offsets
3. is called PatternSplitTokenizerFactory
4. has tests that make sure the output is the same as you would get with string.split( pattern )

> regex split() Tokenizer
> ---
>
> Key: SOLR-211
> URL: https://issues.apache.org/jira/browse/SOLR-211
> Project: Solr
> Issue Type: New Feature
> Components: search
> Reporter: Ryan McKinley
> Attachments: SOLR-211-RegexSplitTokenizer.patch, SOLR-211-RegexSplitTokenizer.patch
>
> A TokenizerFactory that makes tokens from:
> string.split( regex );

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
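
For readers following along, the compiled-Pattern plus matcher.find() approach described in this update might look roughly like the sketch below. This is not the code from the attached patch: the class name PatternSplitSketch and the Tok holder are hypothetical, and unlike String.split() this sketch simply drops empty tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not the SOLR-211 patch: split on a compiled Pattern
// and record where each token starts and ends in the original input.
class PatternSplitSketch {

    /** Hypothetical token holder: text plus [start, end) offsets into the input. */
    static final class Tok {
        final String text;
        final int start;
        final int end;

        Tok(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }
    }

    /** Walks the delimiter matches and emits the text between them. */
    static List<Tok> split(Pattern pattern, String input) {
        List<Tok> tokens = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        int pos = 0; // end of the previous delimiter match
        while (m.find()) {
            if (m.start() > pos) { // assumption: empty tokens are skipped
                tokens.add(new Tok(input.substring(pos, m.start()), pos, m.start()));
            }
            pos = m.end();
        }
        if (pos < input.length()) { // trailing text after the last delimiter
            tokens.add(new Tok(input.substring(pos), pos, input.length()));
        }
        return tokens;
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("--"); // compiled once, reusable
        for (Tok t : split(p, "Architecture--United States--19th century")) {
            System.out.println(t.text + " [" + t.start + "," + t.end + ")");
        }
    }
}
```

The offsets come from the Matcher itself (m.start() and m.end()), so they stay correct even when the delimiter matches a different number of characters than the regex string contains.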
[jira] Commented: (SOLR-211) regex split() Tokenizer
[ https://issues.apache.org/jira/browse/SOLR-211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12491125 ]

Hoss Man commented on SOLR-211:
---
> but I don't see a way to use a regex directly on a Reader.

...I think it's pretty much impossible to have a robust regex system that can operate on character streams; regex engines need to be able to back up a lot.
[jira] Commented: (SOLR-211) regex split() Tokenizer
[ https://issues.apache.org/jira/browse/SOLR-211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12491109 ]

Yonik Seeley commented on SOLR-211:
---
> should probably compile the regex [...]

Yep... beat me to it. I was off trying to look up whether there was a way to avoid reading everything into a String too... but I don't see a way to use a regex directly on a Reader.
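
To illustrate the point about Readers: java.util.regex works on a CharSequence, so a tokenizer that is handed a Reader generally has to buffer the whole input before matching. A minimal sketch of that buffering step follows. The class and method names are hypothetical, not taken from the patch, and IOException is wrapped unchecked only to keep the example short.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.regex.Pattern;

// Hypothetical sketch: buffer a Reader into a String so a Pattern can be applied.
class ReaderSplitSketch {

    /** Reads the entire Reader into memory. */
    static String readFully(Reader reader) {
        try {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Splits the buffered content with a precompiled Pattern. */
    static String[] split(Pattern pattern, Reader reader) {
        return pattern.split(readFully(reader));
    }

    public static void main(String[] args) {
        String[] parts = split(Pattern.compile("--"), new StringReader("a--b--c"));
        System.out.println(String.join("|", parts));
    }
}
```

The cost is that memory use grows with the field size, which is the trade-off being discussed in this comment.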
[jira] Commented: (SOLR-211) regex split() Tokenizer
[ https://issues.apache.org/jira/browse/SOLR-211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12491106 ]

Hoss Man commented on SOLR-211:
---
some quick comments based on a cursory reading of the patch...

1) RegexSplitTokenizerFactory.init should probably compile the regex into a Pattern that can be reused more than once ... i think String.split recompiles it on each call.
2) i don't think the offset stuff will work properly ... the length of the regex string is not the same as the length of the string it matches when splitting (e.g. \p{javaWhitespace}) ... we would probably need to use the Matcher API and iterate over the individual matches.
3) in the vein of like things having like names, we may want to call this the PatternSplitTokenizer and name its init param "pattern" (to match PatternReplaceFilter)
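
Points 1 and 2 can be seen in a small standalone sketch. The class name is hypothetical and nothing here is taken from the patch: the Pattern is compiled once into a static field, and the length of an actual delimiter match is read from the Matcher rather than from the regex string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: the regex source text and the text it matches
// generally have different lengths, so offsets must come from the Matcher.
class DelimiterLengthSketch {

    // Compiled once and reused for every input.
    static final Pattern WS = Pattern.compile("\\p{javaWhitespace}+");

    /** Length of the first delimiter match in the input, or -1 if there is none. */
    static int firstDelimiterLength(String input) {
        Matcher m = WS.matcher(input);
        return m.find() ? m.end() - m.start() : -1;
    }

    public static void main(String[] args) {
        System.out.println("regex string length: " + WS.pattern().length());
        System.out.println("matched delimiter length: " + firstDelimiterLength("a   b"));
    }
}
```

Advancing the offset by the regex string's length would misplace every token after the first delimiter, which is why iterating over the individual matches is suggested.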
[jira] Updated: (SOLR-212) Embeddable class to call solr directly
[ https://issues.apache.org/jira/browse/SOLR-212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryan McKinley updated SOLR-212:
---
Attachment: SOLR-212-DirectSolrConnection.patch

This class now sits in o.a.s.servlet because it uses package-private request parsing functions. It has a *really* simple test that should be extended.

Example usage:

  DirectSolrConnection solr = new DirectSolrConnection();
  String json = solr.request( "/select?qt=dismax&wt=json&q=...", null );
  String xml = solr.request( "/update", "
[jira] Created: (SOLR-212) Embeddable class to call solr directly
Embeddable class to call solr directly
--

Key: SOLR-212
URL: https://issues.apache.org/jira/browse/SOLR-212
Project: Solr
Issue Type: Improvement
Reporter: Ryan McKinley
Priority: Minor

For some embedded applications, it is useful to call solr without running an HTTP server. This class mimics the behavior you would get if you sent the request through an HTTP connection. It is designed to work nicely (i.e. simply) with JNI.

The main function is:

  public class DirectSolrConnection
  {
    String request( String pathAndParams, String body ) throws Exception
    {
      ...
    }
  }
[jira] Updated: (SOLR-211) regex split() Tokenizer
[ https://issues.apache.org/jira/browse/SOLR-211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryan McKinley updated SOLR-211:
---
Attachment: SOLR-211-RegexSplitTokenizer.patch

A simple regex tokenizer and a test. Given a field:

  "Architecture--United States--19th century"

it will create tokens for:

  "Architecture"
  "United States"
  "19th century"
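
The behaviour in that example matches what plain String.split() gives for the same field value, assuming "--" is the configured regex. A tiny sketch with a hypothetical class name, independent of the patch:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: the string.split( regex ) behaviour the tokenizer mirrors.
class SubjectSplitSketch {

    /** Splits a subject-heading style field on the "--" separator. */
    static List<String> tokens(String field) {
        return Arrays.asList(field.split("--"));
    }

    public static void main(String[] args) {
        for (String t : tokens("Architecture--United States--19th century")) {
            System.out.println(t);
        }
    }
}
```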
[jira] Created: (SOLR-211) regex split() Tokenizer
regex split() Tokenizer
---

Key: SOLR-211
URL: https://issues.apache.org/jira/browse/SOLR-211
Project: Solr
Issue Type: New Feature
Components: search
Reporter: Ryan McKinley

A TokenizerFactory that makes tokens from:

  string.split( regex );
Re: Call for Papers Opens for ApacheCon US 2007
Sorry to be late to this game, but I already submitted two talks: one a longer tutorial on Flare, and one a regular session on Flare. I'd be happy to pair up with Eric Pugh. As long as I get airfare and hotel covered, I'm happy to go.

Erik

On Apr 23, 2007, at 2:41 PM, Yonik Seeley wrote:
> On 4/23/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:
>> I think there is definitely room for more than one presentation, and
>> since Flare seems really cool with a lot of meat in it and my knowledge
>> of it is fairly lacking anyway it would be great if you could spend a
>> full session on a Flare Case Study.
>
> Yes, and if more people propose Solr presentations, there will be a
> better chance of more Solr presentations at ApacheCon.
>
> -Yonik
[jira] Commented: (SOLR-199) N-gram
[ https://issues.apache.org/jira/browse/SOLR-199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12491023 ]

Yonik Seeley commented on SOLR-199:
---
Since there is no impact or even memory overhead if unused, and just a teeny bit of disk overhead, this patch looks fine to me.

> N-gram
> --
>
> Key: SOLR-199
> URL: https://issues.apache.org/jira/browse/SOLR-199
> Project: Solr
> Issue Type: New Feature
> Components: search
> Reporter: Adam Hiatt
> Priority: Trivial
> Attachments: SOLR-81-ngram.patch
>
> This tracks the creation of a patch that adds the n-gram/edge n-gram
> tokenizing functionality that was initially part of SOLR-81 (spell checking).
> This was taken out b/c the lucene SpellChecker class removed this dependency.
> Nonetheless, I think this is useful functionality and the addition is
> trivial. How does everyone feel about such an addition?
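
For anyone unfamiliar with the terms in the issue description, here is a rough sketch of what n-gram and edge n-gram tokenizing produce for a single term. It is written from the general definition of the technique, not from the attached patch; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of n-gram and edge n-gram generation for one term.
class NGramSketch {

    /** All contiguous substrings of length n, left to right. */
    static List<String> ngrams(String term, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        List<String> out = new ArrayList<>();
        for (int i = 0; i + n <= term.length(); i++) {
            out.add(term.substring(i, i + n));
        }
        return out;
    }

    /** Prefixes of the term with lengths from min to max (front edge only). */
    static List<String> edgeNgrams(String term, int min, int max) {
        if (min < 1) {
            throw new IllegalArgumentException("min must be >= 1");
        }
        List<String> out = new ArrayList<>();
        for (int len = min; len <= max && len <= term.length(); len++) {
            out.add(term.substring(0, len));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(ngrams("solr", 2));
        System.out.println(edgeNgrams("solr", 1, 3));
    }
}
```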
Re: Call for Papers Opens for ApacheCon US 2007
On 4/23/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:
> I think there is definitely room for more than one presentation, and
> since Flare seems really cool with a lot of meat in it and my knowledge
> of it is fairly lacking anyway it would be great if you could spend a
> full session on a Flare Case Study.

Yes, and if more people propose Solr presentations, there will be a better chance of more Solr presentations at ApacheCon.

-Yonik
Re: Call for Papers Opens for ApacheCon US 2007
: I was thinking about doing something on this as well. Is there
: enough room for multiple presentations? Can two people do a
: presentation? Chris, would you be interested in co-presenting?

I think there is definitely room for more than one presentation, and since Flare seems really cool with a lot of meat in it, and my knowledge of it is fairly lacking anyway, it would be great if you could spend a full session on a Flare Case Study.

-Hoss
Re: Call for Papers Opens for ApacheCon US 2007
Hi all,

Erik Hatcher has shown me some of the abilities of Flare, I've been digging into it for a jobby job project, and I've done my first small Solr project, which was adding PDF, Word, Excel, and PowerPoint parsing in the vein of the CSVRequestHandler code. (Patches to be forthcoming!)

I was thinking about doing something on this as well. Is there enough room for multiple presentations? Can two people do a presentation? Chris, would you be interested in co-presenting?

I've mostly been on the outside of the Lucene community, being much more active in some of the Jakarta projects, and then seduced away by Ruby for the past 18 months, but the possibilities of Solr and Flare have got me interested in getting involved in Apache again.

Eric Pugh

On Apr 23, 2007, at 1:21 PM, Chris Hostetter wrote:
> : Is anyone willing to submit an introductory talk on Solr?
>
> I was thinking about submitting two talks...
>
> Novice: "Solr Out of the Box"
> Advanced: "Solr Beyond the Box"
>
> The first being an attempt at showcasing all of the features of Solr
> available without writing any code (just configuration and maybe some
> XSLT) ... loading data from CSV, dismax query parsing, facets,
> highlighting, date math, json output, etc., and any other cool features
> that get committed between now and then. I'll probably also talk about
> Flare (but that would mean needing to learn about Flare before November)
>
> The second would look at examples of how Solr can be customized without
> building the whole thing from scratch ... writing custom plugins, and
> embedding Solr in other applications. (the custom plugins part i think i
> can cover pretty well, but i'll need to pick the brains of people *doing*
> Solr embedding for the second half if the proposal is accepted)
>
> What do you guys think?
>
> -Hoss

---
Principal
OpenSource Connections
Site: http://www.opensourceconnections.com
Blog: http://blog.opensourceconnections.com
Cell: 1-434-466-1467
RE: The ability to offering offset of keyword in search result
: It come up with another question: In response XML, can I suppress output
: of section? Because I only need highlight section.

well ... not really, you can say rows=0 but then you won't get highlighting either. you can use "fl" to make it smaller though (just pick only one field, score or your uniqueKey perhaps)

-Hoss
Re: Call for Papers Opens for ApacheCon US 2007
: Is anyone willing to submit an introductory talk on Solr?

I was thinking about submitting two talks...

Novice: "Solr Out of the Box"
Advanced: "Solr Beyond the Box"

The first being an attempt at showcasing all of the features of Solr available without writing any code (just configuration and maybe some XSLT) ... loading data from CSV, dismax query parsing, facets, highlighting, date math, json output, etc., and any other cool features that get committed between now and then. I'll probably also talk about Flare (but that would mean needing to learn about Flare before November)

The second would look at examples of how Solr can be customized without building the whole thing from scratch ... writing custom plugins, and embedding Solr in other applications. (the custom plugins part i think i can cover pretty well, but i'll need to pick the brains of people *doing* Solr embedding for the second half if the proposal is accepted)

What do you guys think?

-Hoss
RE: The ability to offering offset of keyword in search result
I appreciate that, it works perfectly!

It comes up with another question: in the response XML, can I suppress output of section? Because I only need the highlight section.

Thanks in advance again!

-----Original Message-----
From: Ryan McKinley [mailto:[EMAIL PROTECTED]
Sent: Monday, April 23, 2007 1:44 AM
To: solr-dev@lucene.apache.org
Subject: Re: The ability to offering offset of keyword in search result

David Xiao wrote:
> Thanks. It looks like what I need, but I do not quite understand how to
> actually modify SolrConfig.xml to enable highlighting.
>
> Is there an example xml snippet for it?

You don't need to modify solrconfig.xml. Just add parameters to the query string. For example with the sample data, try:

http://localhost:8983/solr/select?q=power&hl=true&hl.fl=name

you'll get a highlighting section at the bottom

> -----Original Message-----
> From: Ryan McKinley [mailto:[EMAIL PROTECTED]
> Sent: Sunday, April 22, 2007 2:39 PM
> To: solr-dev@lucene.apache.org
> Subject: Re: The ability to offering offset of keyword in search result
>
> are you looking for highlighting?
> http://wiki.apache.org/solr/HighlightingParameters
>
> This would give you:
> In God We Trust – Each dollar had this
>
> David Xiao wrote:
>> Is there a feature that can identify the offset of a keyword in a search result?
>>
>> For example, if there is the following text to be indexed:
>>
>> In God We Trust – Each dollar had this
>>
>> And the user searches for: Each, it should return the value: 19, meaning
>> the first occurrence is at character 19.
>>
>> I am not sure if lucene or solr has, or will provide, this functionality.
>> Do you think it could be useful to help give some search result context?
>> Like google search does.
>>
>> Regards,
>>
>> David
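
As a plain-Java illustration of the offset asked about in the original question at the bottom of this thread (this is not a Solr or Lucene API, and the class name is hypothetical): String.indexOf() returns a zero-based position, so "character 19" in the question corresponds to index 18.

```java
// Hypothetical sketch: locating a keyword's character offset in stored text.
class KeywordOffsetSketch {

    /** Zero-based offset of the first occurrence, or -1 if the keyword is absent. */
    static int offsetOf(String text, String keyword) {
        return text.indexOf(keyword);
    }

    public static void main(String[] args) {
        // \u2013 is the en dash used in the example text from the thread.
        String text = "In God We Trust \u2013 Each dollar had this";
        int zeroBased = offsetOf(text, "Each");
        System.out.println("zero-based offset: " + zeroBased);
        System.out.println("one-based position: " + (zeroBased + 1));
    }
}
```

This only works on the raw stored string; it ignores analysis (stemming, case folding, and so on), which is one reason highlighting was suggested instead.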
Re: Call for Papers Opens for ApacheCon US 2007
On 4/17/07, Erik Hatcher <[EMAIL PROTECTED]> wrote:
> ...The Call for Papers is now open for ApacheCon US, to be held
> November 12-16 at the Peachtree Westin, Atlanta...

Is anyone willing to submit an introductory talk on Solr? I could do it if needed, but I'd prefer someone else doing it, as it looks like my involvement with Solr will be low in the next few months.

Note that the deadline for talk submissions is next Monday, April 30th (and not the 23rd as currently mentioned on http://www.us.apachecon.com/; I just confirmed this with the conference planners).

-Bertrand