[jira] Updated: (SOLR-211) regex split() Tokenizer

2007-04-23 Thread Ryan McKinley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan McKinley updated SOLR-211:
---

Attachment: SOLR-211-RegexSplitTokenizer.patch

Thanks for the quick feedback!

Here is an updated version that:

1. uses a compiled Pattern
2. uses matcher.find() to set the proper start and end offsets
3. is called PatternSplitTokenizerFactory
4. has tests that make sure the output is the same as you would get with 
string.split( pattern )
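
The approach in the list above can be sketched roughly as follows. This is not the code from the attached patch: the class name PatternSplitSketch, the Token holder, and the split method are hypothetical names used only for illustration, and the sketch drops empty tokens, which String.split does not always do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration only; not the code in the attached patch. */
final class PatternSplitSketch {

    /** A token plus its character offsets in the original input. */
    static final class Token {
        final String text;
        final int start; // inclusive
        final int end;   // exclusive

        Token(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }
    }

    /** Splits input on every match of an already compiled pattern. */
    static List<Token> split(Pattern pattern, String input) {
        List<Token> tokens = new ArrayList<>();
        Matcher matcher = pattern.matcher(input);
        int last = 0; // end of the previous delimiter match
        while (matcher.find()) {
            // Text between two delimiter matches becomes a token.
            // Assumption: empty tokens are skipped.
            if (matcher.start() > last) {
                tokens.add(new Token(input.substring(last, matcher.start()),
                                     last, matcher.start()));
            }
            last = matcher.end();
        }
        // Trailing text after the final delimiter.
        if (last < input.length()) {
            tokens.add(new Token(input.substring(last), last, input.length()));
        }
        return tokens;
    }
}
```

Because the offsets come from matcher.start() and matcher.end(), they stay correct even when the delimiter matches a different number of characters each time. For the input "a--b" and the pattern "--", the sketch yields "a" at [0,1) and "b" at [3,4).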



> regex split() Tokenizer
> ---
>
> Key: SOLR-211
> URL: https://issues.apache.org/jira/browse/SOLR-211
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Reporter: Ryan McKinley
> Attachments: SOLR-211-RegexSplitTokenizer.patch, 
> SOLR-211-RegexSplitTokenizer.patch
>
>
> A TokenizerFactory that makes tokens from:
>   string.split( regex );

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-211) regex split() Tokenizer

2007-04-23 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12491125
 ] 

Hoss Man commented on SOLR-211:
---

> but I don't see a way to use a regex directly on a Reader.

...I think it's pretty much impossible to have a robust regex system that can 
operate on character streams; regex engines need to be able to back up a 
lot.

> regex split() Tokenizer
> ---
>
> Key: SOLR-211
> URL: https://issues.apache.org/jira/browse/SOLR-211
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Reporter: Ryan McKinley
> Attachments: SOLR-211-RegexSplitTokenizer.patch
>
>
> A TokenizerFactory that makes tokens from:
>   string.split( regex );




[jira] Commented: (SOLR-211) regex split() Tokenizer

2007-04-23 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12491109
 ] 

Yonik Seeley commented on SOLR-211:
---

> should probably compile the regex [...]

Yep... beat me to it.
I was off trying to look up if there was a way to avoid reading everything into 
a String too... but I don't see a way to use a regex directly on a Reader.
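
Since java.util.regex works on a CharSequence rather than a Reader, the fallback discussed here is to buffer the whole stream first. A minimal sketch of that step, assuming the field value fits comfortably in memory (the class and method names are hypothetical, not taken from the patch):

```java
import java.io.IOException;
import java.io.Reader;

/** Hypothetical helper; not part of the attached patch. */
final class ReaderBufferSketch {

    /** Reads every character from the reader into one String. */
    static String readFully(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buffer = new char[4096];
        int n;
        // read() returns -1 once the stream is exhausted.
        while ((n = reader.read(buffer)) != -1) {
            sb.append(buffer, 0, n);
        }
        return sb.toString();
    }
}
```

The resulting String can then be handed to Pattern.matcher(), at the cost of holding the entire field value in memory.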

> regex split() Tokenizer
> ---
>
> Key: SOLR-211
> URL: https://issues.apache.org/jira/browse/SOLR-211
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Reporter: Ryan McKinley
> Attachments: SOLR-211-RegexSplitTokenizer.patch
>
>
> A TokenizerFactory that makes tokens from:
>   string.split( regex );




[jira] Commented: (SOLR-211) regex split() Tokenizer

2007-04-23 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12491106
 ] 

Hoss Man commented on SOLR-211:
---

some quick comments based on a cursory reading of the patch...

1) RegexSplitTokenizerFactory.init should probably compile the regex into a 
Pattern that can be reused more than once ... I think String.split 
recompiles it on each call.
2) I don't think the offset stuff will work properly ... the length of the 
regex string is not the same as the length of the string it matches on when 
splitting (e.g. \p{javaWhitespace}) ... we would probably need to use the 
Matcher API and iterate over the individual matches.
3) in the vein of like things having like names, we may want to call this the 
PatternSplitTokenizer and name its init param "pattern" (to match 
PatternReplaceFilter)
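
Point 2 can be illustrated with a small sketch. The class and method names below are hypothetical; the example only shows that the length of the regex string says nothing about how many characters each delimiter match consumes, which is why offsets have to come from the Matcher.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration of regex length versus match length. */
final class DelimiterLengthSketch {

    static final String REGEX = "\\p{javaWhitespace}+";

    // Compiled once and reused, as suggested in point 1.
    static final Pattern DELIMITER = Pattern.compile(REGEX);

    /** Returns the number of characters consumed by each delimiter match. */
    static List<Integer> delimiterLengths(String input) {
        List<Integer> lengths = new ArrayList<>();
        Matcher matcher = DELIMITER.matcher(input);
        while (matcher.find()) {
            lengths.add(matcher.end() - matcher.start());
        }
        return lengths;
    }
}
```

The regex string here is 19 characters long, yet for the input "a  b\tc" the two delimiter matches are 2 and 1 characters long, so computing offsets from REGEX.length() would drift.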

> regex split() Tokenizer
> ---
>
> Key: SOLR-211
> URL: https://issues.apache.org/jira/browse/SOLR-211
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Reporter: Ryan McKinley
> Attachments: SOLR-211-RegexSplitTokenizer.patch
>
>
> A TokenizerFactory that makes tokens from:
>   string.split( regex );




[jira] Updated: (SOLR-212) Embeddable class to call solr directly

2007-04-23 Thread Ryan McKinley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan McKinley updated SOLR-212:
---

Attachment: SOLR-212-DirectSolrConnection.patch

This class now sits in: o.a.s.servlet because it uses package protected request 
parsing functions.

It has a *really* simple test that should be extended.

Example usage:

 DirectSolrConnection solr = new DirectSolrConnection();
 String json = solr.request( "/select?qt=dismax&wt=json&q=...", null );
 String xml = solr.request( "/update", "[XML body lost in the mail archive]" );

> Embeddable class to call solr directly
> --
>
> Key: SOLR-212
> URL: https://issues.apache.org/jira/browse/SOLR-212
> Project: Solr
>  Issue Type: Improvement
>Reporter: Ryan McKinley
>Priority: Minor
> Attachments: SOLR-212-DirectSolrConnection.patch
>
>
> For some embedded applications, it is useful to call solr without running an 
> HTTP server.  This class mimics the behavior you would get if you sent the 
> request through an HTTP connection.  It is designed to work nicely (ie 
> simple) with JNI
> the main function is:
> public class DirectSolrConnection 
> {
>   String request( String pathAndParams, String body ) throws Exception
>   {
> ...
>   }
> }




[jira] Created: (SOLR-212) Embeddable class to call solr directly

2007-04-23 Thread Ryan McKinley (JIRA)
Embeddable class to call solr directly
--

 Key: SOLR-212
 URL: https://issues.apache.org/jira/browse/SOLR-212
 Project: Solr
  Issue Type: Improvement
Reporter: Ryan McKinley
Priority: Minor


For some embedded applications, it is useful to call Solr without running an 
HTTP server.  This class mimics the behavior you would get if you sent the 
request through an HTTP connection.  It is designed to work nicely (i.e. simply) 
with JNI.

The main function is:

public class DirectSolrConnection 
{
  String request( String pathAndParams, String body ) throws Exception
  {
...
  }
}
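
To mimic an HTTP request, the single pathAndParams string has to be separated into a handler path and its query parameters at some point. The sketch below shows one way that step might look. It is an assumption for illustration only: PathAndParamsSketch is a hypothetical class, not part of the patch, and it keeps only the last value when a parameter is repeated.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical parser for a "/path?key=value&..." string. */
final class PathAndParamsSketch {

    final String path;
    final Map<String, String> params = new LinkedHashMap<>();

    PathAndParamsSketch(String pathAndParams) {
        int q = pathAndParams.indexOf('?');
        path = (q < 0) ? pathAndParams : pathAndParams.substring(0, q);
        if (q < 0) {
            return; // no query string at all
        }
        for (String pair : pathAndParams.substring(q + 1).split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String key = (eq < 0) ? pair : pair.substring(0, eq);
            String value = (eq < 0) ? "" : pair.substring(eq + 1);
            // Simplification: a repeated key overwrites the earlier value.
            params.put(decode(key), decode(value));
        }
    }

    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }
}
```

For "/select?qt=dismax&wt=json" this gives the path "/select" and the parameters qt=dismax and wt=json, in that order.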







[jira] Updated: (SOLR-211) regex split() Tokenizer

2007-04-23 Thread Ryan McKinley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan McKinley updated SOLR-211:
---

Attachment: SOLR-211-RegexSplitTokenizer.patch

simple regex tokenizer and a test.



 
   
   
 
 


Given a field:
  "Architecture--United States--19th century"

will create tokens for:
  "Architecture"
  "United States"
 "19th century"
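
The same result can be reproduced with plain java.util.regex. The sketch assumes the delimiter pattern is "--", which is inferred from the example field rather than stated in the message; the class name is hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical illustration; the "--" delimiter is an assumption. */
final class SubjectSplitSketch {

    private static final Pattern DELIMITER = Pattern.compile("--");

    /** Splits a subject-heading style field into its parts. */
    static List<String> tokens(String field) {
        return Arrays.asList(DELIMITER.split(field));
    }
}
```

Calling tokens("Architecture--United States--19th century") returns the three strings listed above.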



> regex split() Tokenizer
> ---
>
> Key: SOLR-211
> URL: https://issues.apache.org/jira/browse/SOLR-211
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Reporter: Ryan McKinley
> Attachments: SOLR-211-RegexSplitTokenizer.patch
>
>
> A TokenizerFactory that makes tokens from:
>   string.split( regex );




[jira] Created: (SOLR-211) regex split() Tokenizer

2007-04-23 Thread Ryan McKinley (JIRA)
regex split() Tokenizer
---

 Key: SOLR-211
 URL: https://issues.apache.org/jira/browse/SOLR-211
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Ryan McKinley


A TokenizerFactory that makes tokens from:

  string.split( regex );







Re: Call for Papers Opens for ApacheCon US 2007

2007-04-23 Thread Erik Hatcher
Sorry to be late to this game, but I already submitted two talks: a  
longer tutorial on Flare and a regular session on Flare.  I'd be happy  
to pair up with Eric Pugh.  As long as I get airfare and hotel covered,  
I'm happy to go.


Erik

On Apr 23, 2007, at 2:41 PM, Yonik Seeley wrote:


> On 4/23/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:
>> I think there is definitely room for more than one presentation, and since
>> Flare seems really cool with a lot of meat in it and my knowledge of it is
>> fairly lacking anyway it would be great if you could spend a full session
>> on a Flare Case Study.
>
> Yes, and if more people propose Solr presentations, there will be a
> better chance of more Solr presentations at ApacheCon.
>
> -Yonik




[jira] Commented: (SOLR-199) N-gram

2007-04-23 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12491023
 ] 

Yonik Seeley commented on SOLR-199:
---

Since there is no impact or even memory overhead if unused, and just a teeny 
bit of disk overhead, this patch looks fine to me.

> N-gram
> --
>
> Key: SOLR-199
> URL: https://issues.apache.org/jira/browse/SOLR-199
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Reporter: Adam Hiatt
>Priority: Trivial
> Attachments: SOLR-81-ngram.patch
>
>
> This tracks the creation of a patch that adds the n-gram/edge n-gram 
> tokenizing functionality that was initially part of SOLR-81 (spell checking). 
> This was taken out b/c the lucene SpellChecker class removed this dependency. 
> None-the-less, I think this is useful functionality and the addition is 
> trivial. How does everyone feel about such an addition?




Re: Call for Papers Opens for ApacheCon US 2007

2007-04-23 Thread Yonik Seeley

On 4/23/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:

I think there is definitely room for more than one presentation, and since
Flare seems really cool with a lot of meat in it and my knowledge of it is
fairly lacking anyway it would be great if you could spend a full session
on a Flare Case Study.


Yes, and if more people propose Solr presentations, there will be a
better chance of more Solr presentations at ApacheCon.

-Yonik


Re: Call for Papers Opens for ApacheCon US 2007

2007-04-23 Thread Chris Hostetter

: I was thinking about doing something on this as well.  Is there
: enough room for multiple presentations?   Can two people do a
: presentation?  Chris, would you be interested in co-presenting?

I think there is definitely room for more than one presentation, and since
Flare seems really cool with a lot of meat in it and my knowledge of it is
fairly lacking anyway it would be great if you could spend a full session
on a Flare Case Study.




-Hoss



Re: Call for Papers Opens for ApacheCon US 2007

2007-04-23 Thread Eric Pugh

Hi all,

Erik Hatcher has shown me some of the abilities of Flare, and I've been  
digging into it for a jobby job project.  I've also done my first small  
Solr project, which was adding PDF, Word, Excel, and PowerPoint  
parsing in the vein of the CSVRequestHandler code.  (Patches to be  
forthcoming!)


I was thinking about doing something on this as well.  Is there  
enough room for multiple presentations?   Can two people do a  
presentation?  Chris, would you be interested in co-presenting?


I've mostly been on the outside of the Lucene community, having been much more  
active in some of the Jakarta projects and then seduced away by Ruby  
for the past 18 months, but the possibilities of Solr and Flare have  
gotten me interested in getting involved in Apache again.


Eric Pugh


On Apr 23, 2007, at 1:21 PM, Chris Hostetter wrote:



> : Is anyone willing to submit an introductory talk on Solr?
>
> I was thinking about submitting two talks...
>
>   Novice:   "Solr Out of the Box"
>   Advanced: "Solr Beyond the Box"
>
> The first being an attempt at showcasing all of the features of Solr
> available without writing any code (just configuration and maybe some
> XSLT) ... loading data from CSV, dismax query parsing, facets,
> highlighting, date math, json output, etc., and any other cool features
> that get committed between now and then.  I'll probably also talk about
> Flare (but that would mean needing to learn about Flare before November)
>
> The second would look at examples of how Solr can be customized without
> building the whole thing from scratch ... writing custom plugins, and
> embedding Solr in other applications.  (the custom plugins part i think i
> can cover pretty well, but i'll need to pick the brains of people *doing*
> Solr embedding for the second half if the proposal is accepted)
>
> What do you guys think?
>
> -Hoss



---
Principal
OpenSource Connections
Site: http://www.opensourceconnections.com
Blog: http://blog.opensourceconnections.com
Cell: 1-434-466-1467






RE: The ability to offering offset of keyword in search result

2007-04-23 Thread Chris Hostetter

: It come up with another question: In response XML, can I suppress output
: of  section? Because I only need highlight section.

well ... not really, you can say rows=0 but then you won't get
highlighting either.

you can use "fl" to make it smaller though (just pick only one field,
score or your uniqueKey perhaps)


-Hoss



Re: Call for Papers Opens for ApacheCon US 2007

2007-04-23 Thread Chris Hostetter

: Is anyone willing to submit an introductory talk on Solr?

I was thinking about submitting two talks...

  Novice:   "Solr Out of the Box"
  Advanced: "Solr Beyond the Box"

The first being an attempt at showcasing all of the features of Solr
available without writing any code (just configuration and maybe some
XSLT) ... loading data from CSV, dismax query parsing, facets,
highlighting, date math, json output, etc., and any other cool features
that get committed between now and then.  I'll probably also talk about
Flare (but that would mean needing to learn about Flare before November)

The second would look at examples of how Solr can be customized without
building the whole thing from scratch ... writing custom plugins, and
embedding Solr in other applications.  (the custom plugins part i think i
can cover pretty well, but i'll need to pick the brains of people *doing*
Solr embedding for the second half if the proposal is accepted)


What do you guys think?



-Hoss



RE: The ability to offering offset of keyword in search result

2007-04-23 Thread David Xiao
Appreciate that, it works perfectly! 
That brings up another question: in the response XML, can I suppress output of 
 section? I only need the highlighting section.

Thanks in advance again!


-Original Message-
From: Ryan McKinley [mailto:[EMAIL PROTECTED] 
Sent: Monday, April 23, 2007 1:44 AM
To: solr-dev@lucene.apache.org
Subject: Re: The ability to offering offset of keyword in search result

David Xiao wrote:
> Thanks. It looks like what I need, but I don't quite understand how to 
> modify solrconfig.xml to enable highlighting.
> 
> Is there an example XML snippet for this? 
> 
>

You don't need to modify solrconfig.xml.  Just add parameters to the 
query string.  For example with the sample data, try:

http://localhost:8983/solr/select?q=power&hl=true&hl.fl=name

you'll get a highlighting section at the bottom
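
For client code, the same query string can be assembled from a parameter map. This is a minimal sketch: HighlightUrlSketch and buildUrl are hypothetical names, and the base URL is simply the one from the example above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that appends URL-encoded parameters to a base URL. */
final class HighlightUrlSketch {

    static String buildUrl(String base, Map<String, String> params) {
        StringBuilder url = new StringBuilder(base);
        char separator = '?';
        for (Map.Entry<String, String> entry : params.entrySet()) {
            url.append(separator)
               .append(encode(entry.getKey()))
               .append('=')
               .append(encode(entry.getValue()));
            separator = '&';
        }
        return url.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "power");     // the search term
        params.put("hl", "true");     // turn highlighting on
        params.put("hl.fl", "name");  // field to highlight
        System.out.println(buildUrl("http://localhost:8983/solr/select", params));
    }
}
```

Using a LinkedHashMap keeps the parameters in insertion order, so the generated URL matches the hand-written one.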


> 
> 
> -Original Message-
> From: Ryan McKinley [mailto:[EMAIL PROTECTED] 
> Sent: Sunday, April 22, 2007 2:39 PM
> To: solr-dev@lucene.apache.org
> Subject: Re: The ability to offering offset of keyword in search result
> 
> 
> are you looking for highlighting?
> http://wiki.apache.org/solr/HighlightingParameters
> 
> This would give you:
>   In God We Trust – Each dollar had this
> 
> 
> David Xiao wrote:
>> Is there a feature that can identify the offset of a keyword in a search result?
>>
>>  
>>
>> For example, if the following text is indexed:
>>
>> In God We Trust – Each dollar had this
>>
>> and the user searches for "Each", it should return the value 19, meaning the 
>> first occurrence is at character 19.
>>
>>  
>>
>> I am not sure whether Lucene or Solr provides this functionality. Do 
>> you think it could be useful to help give search results some context, 
>> like Google search does?
>>
>>  
>>
>> Regards,
>>
>> David
>>
>>
> 
> 



Re: Call for Papers Opens for ApacheCon US 2007

2007-04-23 Thread Bertrand Delacretaz

On 4/17/07, Erik Hatcher <[EMAIL PROTECTED]> wrote:


> ...The Call for Papers is now open for ApacheCon US, to be held
> November 12-16 at the Peachtree Westin, Atlanta...


Is anyone willing to submit an introductory talk on Solr?

I could do it if needed, but I'd prefer someone else doing it, as it
looks like my involvement with Solr will be low in the next few
months.

Note that the deadline for talk submissions is next Monday, April 30th
(and not 23rd as currently mentioned on http://www.us.apachecon.com/,
just confirmed this with the conference planners).

-Bertrand