Re: new facet parameter: facet.exists=true

2010-03-30 Thread Erik Hatcher
Faceting on a "facet_fields" field will (most likely) involve only a  
handful of values or fewer, so you'd be able to have that particular  
faceting cached to use quickly.  I'm not sure how much memory it'd  
take up, but certainly not as much as actually faceting on the fields  
themselves.


However, another approach you could take is to use facet.query.   
facet.query=some_facet_field:[* TO *] will return a non-zero count if  
any documents in the results have a value in some_facet_field.  You'd  
of course need to add a separate  
facet.query parameter for each field you cared about.
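As a rough sketch of the facet.query approach, the following builds a Solr request query string with one `facet.query=<field>:[* TO *]` parameter per field of interest. The class name, the example query `ipod`, and the field names `category` and `manufacturer` are hypothetical; only the `q`, `facet`, and `facet.query` parameter names are standard Solr request parameters. It uses `URLEncoder.encode(String, Charset)`, which assumes Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.StringJoiner;

class FacetExistsQuery {
    // Builds a query string that asks, for each field, how many matching
    // documents have any value in it, via facet.query=field:[* TO *].
    static String build(String userQuery, List<String> fields) {
        StringJoiner params = new StringJoiner("&");
        params.add("q=" + encode(userQuery));
        params.add("facet=true");
        for (String field : fields) {
            // One facet.query per field; a non-zero count means "at least one value exists".
            params.add("facet.query=" + encode(field + ":[* TO *]"));
        }
        return params.toString();
    }

    private static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical field names, for illustration only.
        System.out.println(build("ipod", List.of("category", "manufacturer")));
    }
}
```

Each facet.query comes back with its own count in the facet_counts section of the response; a zero means no document in the result set has a value in that field.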


Erik

On Mar 30, 2010, at 10:45 AM, Gregor Kaczor wrote:

I am not sure if I got your approach right. If I did not, please  
explain where the advantages are in time and memory footprint.


In my opinion faceting on facet field names does not avoid counting  
facets. If my result set is huge, so will be the facet counts on the  
field of facet names. It does not seem to me like it saves memory  
or time.


My idea is to stop counting facets after finding one. That would show  
that for a certain query there are some categories available. My aim  
is to keep the memory footprint low while still being able to facet  
more than 10^7 documents, a problem I am dealing with right now.
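Gregor's stop-after-the-first-hit idea can be illustrated with a small in-memory sketch. It assumes a hypothetical structure mapping each facet term to the ids of the documents containing it, and a `BitSet` of documents matching the query; neither is Lucene or Solr API. Counting must visit every posting of every term, while the existence check breaks out of a term's postings at the first matching document.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

class FacetExistsSketch {
    // Counting: visits every posting of every term, so cost grows with result size.
    static Map<String, Integer> countFacets(Map<String, int[]> postings, BitSet results) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, int[]> e : postings.entrySet()) {
            int count = 0;
            for (int doc : e.getValue()) {
                if (results.get(doc)) count++;
            }
            counts.put(e.getKey(), count);
        }
        return counts;
    }

    // Existence: stops scanning a term's postings at the first matching document.
    static Map<String, Boolean> facetExists(Map<String, int[]> postings, BitSet results) {
        Map<String, Boolean> exists = new LinkedHashMap<>();
        for (Map.Entry<String, int[]> e : postings.entrySet()) {
            boolean found = false;
            for (int doc : e.getValue()) {
                if (results.get(doc)) {
                    found = true;
                    break;
                }
            }
            exists.put(e.getKey(), found);
        }
        return exists;
    }

    public static void main(String[] args) {
        Map<String, int[]> postings = new LinkedHashMap<>();
        postings.put("book", new int[] {0, 2, 4, 6});
        postings.put("dvd", new int[] {1, 3});
        BitSet results = new BitSet();
        results.set(2);
        results.set(4);
        results.set(6);
        System.out.println(countFacets(postings, results)); // {book=3, dvd=0}
        System.out.println(facetExists(postings, results)); // {book=true, dvd=false}
    }
}
```

In this sketch the saving is CPU work per term; it does not by itself reduce the memory used by whatever structure maps terms to documents.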


 Original Message 

Date: Tue, 30 Mar 2010 08:46:23 -0400
From: Erik Hatcher 
To: java-dev@lucene.apache.org
Subject: Re: new facet parameter: facet.exists=true


One trick to doing this is to index a field that lists the facet field
names that each document possesses.  Then you can facet on the field
of field names (sounds confusing, sorry) and you'll know if there are
any documents in a result set that have values in, say, a "category"
field.

There's actually a basic patch out there that'll do this
automatically: https://issues.apache.org/jira/browse/SOLR-1280  - it
needs a bit of polish, but that's the general idea.

Erik

On Mar 30, 2010, at 7:46 AM, Gregor Kaczor wrote:


Faceting in indexes with document volumes exceeding twenty million
documents is a time- and particularly memory-consuming search.

In such huge indexes I am not interested in whether there are 4 or 5 million
documents of a special type; I just want to know that there are some and
that if I choose that facet I will get a list of results.

Such an option would just count the first occurrence of a facet term
and return it without doing much computation.

I could not figure out how to get that behaviour with existing
faceting parameters.

What do you think?

Gregor Kaczor

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org




Re: new facet parameter: facet.exists=true

2010-03-30 Thread Erik Hatcher
One trick to doing this is to index a field that lists the facet field  
names that each document possesses.  Then you can facet on the field  
of field names (sounds confusing, sorry) and you'll know if there are  
any documents in a result set that have values in, say, a "category"  
field.


There's actually a basic patch out there that'll do this  
automatically: https://issues.apache.org/jira/browse/SOLR-1280  - it  
needs a bit of polish, but that's the general idea.
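The field-of-field-names trick described above can be sketched in memory as follows. Documents are modeled as plain `Map<String, String>` objects, and the class and method names are hypothetical; in a real index the derived values would go into a multi-valued field such as the "facet_fields" field mentioned in this thread, and the counting would be done by the search engine's faceting.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class FieldNamesFacetSketch {
    // At index time: record which fields each document actually has a value for.
    static List<String> facetFieldsValue(Map<String, String> doc) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, String> e : doc.entrySet()) {
            if (e.getValue() != null && !e.getValue().isEmpty()) {
                names.add(e.getKey());
            }
        }
        return names;
    }

    // At query time: facet on the derived values. The number of distinct values
    // is bounded by the number of field names, not by the number of category values.
    static Map<String, Integer> facetOnFieldNames(List<Map<String, String>> results) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map<String, String> doc : results) {
            for (String name : facetFieldsValue(doc)) {
                counts.merge(name, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Map<String, String>> results = List.of(
                Map.of("title", "Lucene in Action", "category", "books"),
                Map.of("title", "Some DVD", "format", "dvd"),
                Map.of("title", "Untitled", "category", ""));
        System.out.println(facetOnFieldNames(results)); // {category=1, format=1, title=3}
    }
}
```

A non-zero count for "category" tells you that at least one document in the result set has a category value.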


Erik

On Mar 30, 2010, at 7:46 AM, Gregor Kaczor wrote:

Faceting in indexes with document volumes exceeding twenty million  
documents is a time- and particularly memory-consuming search.


In such huge indexes I am not interested in whether there are 4 or 5 million  
documents of a special type; I just want to know that there are some and  
that if I choose that facet I will get a list of results.


Such an option would just count the first occurrence of a facet term  
and return it without doing much computation.


I could not figure out how to get that behaviour with existing  
faceting parameters.


What do you think?

Gregor Kaczor




[jira] Commented: (LUCENE-1941) MinPayloadFunction returns 0 when only one payload is present

2010-02-14 Thread Erik Hatcher (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833552#action_12833552
 ] 

Erik Hatcher commented on LUCENE-1941:
--

Uwe - patch looks good.  Go for it!

> MinPayloadFunction returns 0 when only one payload is present
> -
>
> Key: LUCENE-1941
> URL: https://issues.apache.org/jira/browse/LUCENE-1941
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Query/Scoring
>Affects Versions: 2.9, 3.0
>Reporter: Erik Hatcher
>Assignee: Uwe Schindler
> Fix For: 2.9.2, 3.0.1, 3.1
>
> Attachments: LUCENE-1941.patch, LUCENE-1941.patch
>
>
> In some experiments with payload scoring through PayloadTermQuery, I'm seeing 
> 0 returned when using MinPayloadFunction.  I believe there is a bug there.  
> No time at the moment to flesh out a unit test, but wanted to report it for 
> tracking.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.





[jira] Commented: (LUCENE-1941) MinPayloadFunction returns 0 when only one payload is present

2010-02-12 Thread Erik Hatcher (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832932#action_12832932
 ] 

Erik Hatcher commented on LUCENE-1941:
--

Feel free to adjust this issue to whichever Lucene version makes sense.  I 
don't have bandwidth at the moment to address this myself.

> MinPayloadFunction returns 0 when only one payload is present
> -
>
> Key: LUCENE-1941
> URL: https://issues.apache.org/jira/browse/LUCENE-1941
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Query/Scoring
>Affects Versions: 2.9, 3.0
>Reporter: Erik Hatcher
> Fix For: 2.9.2, 3.0.1, 3.1
>
>
> In some experiments with payload scoring through PayloadTermQuery, I'm seeing 
> 0 returned when using MinPayloadFunction.  I believe there is a bug there.  
> No time at the moment to flesh out a unit test, but wanted to report it for 
> tracking.




[jira] Commented: (LUCENE-2238) deprecate ChineseAnalyzer

2010-01-28 Thread Erik Hatcher (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12806239#action_12806239
 ] 

Erik Hatcher commented on LUCENE-2238:
--

+1

> deprecate ChineseAnalyzer
> -
>
> Key: LUCENE-2238
> URL: https://issues.apache.org/jira/browse/LUCENE-2238
> Project: Lucene - Java
>  Issue Type: Task
>  Components: contrib/analyzers
>Reporter: Robert Muir
>Priority: Minor
> Fix For: 3.1
>
> Attachments: LUCENE-2238.patch
>
>
> The ChineseAnalyzer, ChineseTokenizer, and ChineseFilter (not the smart one, 
> or CJK) index Chinese text as individual characters and remove English 
> stopwords, etc.
> In my opinion we should simply deprecate all of this in favor of 
> StandardAnalyzer, StandardTokenizer, and StopFilter, which do the same 
> thing.
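To make the described behavior concrete, here is a minimal sketch of character-level (unigram) tokenization: every Han character becomes its own token, other letter/digit runs become lowercased words, and words in a small stopword set are dropped. This is an illustration under stated assumptions, not the code of ChineseTokenizer, ChineseFilter, or StandardTokenizer; the class name and the stopword list are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class CjkUnigramSketch {
    // Tiny illustrative stopword list (hypothetical; not Lucene's list).
    static final Set<String> STOPWORDS = Set.of("a", "an", "and", "the", "of");

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            if (Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN) {
                flush(word, tokens);
                tokens.add(new String(Character.toChars(cp))); // one token per Han character
            } else if (Character.isLetterOrDigit(cp)) {
                word.appendCodePoint(cp); // accumulate a non-Han word
            } else {
                flush(word, tokens); // whitespace/punctuation ends a word
            }
            i += Character.charCount(cp);
        }
        flush(word, tokens);
        return tokens;
    }

    // Emits the pending non-Han word, lowercased, unless it is a stopword.
    private static void flush(StringBuilder word, List<String> tokens) {
        if (word.length() > 0) {
            String w = word.toString().toLowerCase(Locale.ROOT);
            if (!STOPWORDS.contains(w)) tokens.add(w);
            word.setLength(0);
        }
    }

    public static void main(String[] args) {
        // Prints the two Han characters as separate tokens followed by "analyzer";
        // "The" is dropped as a stopword.
        System.out.println(tokenize("The \u4e2d\u6587 analyzer"));
    }
}
```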




[jira] Resolved: (LUCENE-2231) my lucene project is able to search single time how can make it as long as i can

2010-01-22 Thread Erik Hatcher (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Hatcher resolved LUCENE-2231.
--

Resolution: Not A Problem

Please ask support questions on the java-user list.  Also (bias noted here), 
the book "Lucene in Action" will help you out immensely with getting-started 
questions like these.

> my lucene project is able to search single time how can make it as long as i 
> can
> 
>
> Key: LUCENE-2231
> URL: https://issues.apache.org/jira/browse/LUCENE-2231
> Project: Lucene - Java
>  Issue Type: Wish
>Affects Versions: 2.9.1
>Reporter: sameeuddin Mohammed
>Priority: Critical
>   Original Estimate: 5h
>  Remaining Estimate: 5h
>
> I am using Lucene with NetBeans 6.5. When I execute my project, the search 
> returns results only the first time; the next time there are no results. I 
> also want to know how to match lowercase and uppercase as the same, and, when 
> I have a word such as "simpletext", how to find it by searching for just 
> "simple". Please reply as soon as possible.
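The two concrete questions in this report (case-insensitive matching, and finding "simpletext" by searching for "simple") come down to normalizing case the same way at index and query time, and running a prefix lookup over sorted terms. The sketch below models this with a hypothetical in-memory inverted index built on `TreeMap`; it is not Lucene code. In Lucene itself the analogous tools are a LowerCaseFilter in the analyzer and a PrefixQuery (or a trailing `*` in QueryParser syntax).

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class CaseInsensitivePrefixSketch {
    // term -> ids of documents containing it; terms are stored lowercased.
    private final TreeMap<String, Set<Integer>> index = new TreeMap<>();

    void add(int docId, String text) {
        // \W+ splits on anything other than ASCII letters, digits, and underscore.
        for (String token : text.split("\\W+")) {
            if (token.isEmpty()) continue;
            // Normalize case at index time...
            index.computeIfAbsent(token.toLowerCase(Locale.ROOT), k -> new TreeSet<>()).add(docId);
        }
    }

    // ...and at query time, so "SIMPLE" and "simple" match the same terms.
    Set<Integer> prefixSearch(String prefix) {
        String p = prefix.toLowerCase(Locale.ROOT);
        Set<Integer> hits = new TreeSet<>();
        // Terms starting with p form a contiguous range in sorted order, beginning at p.
        for (Map.Entry<String, Set<Integer>> e : index.tailMap(p, true).entrySet()) {
            if (!e.getKey().startsWith(p)) break;
            hits.addAll(e.getValue());
        }
        return hits;
    }

    public static void main(String[] args) {
        CaseInsensitivePrefixSketch idx = new CaseInsensitivePrefixSketch();
        idx.add(1, "SimpleText demo");
        idx.add(2, "simple things");
        idx.add(3, "complex");
        System.out.println(idx.prefixSearch("SIMPLE")); // [1, 2]
    }
}
```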




Re: [Lucene-java Wiki] Update of "PoweredBy" by ChristianShirts

2010-01-20 Thread Erik Hatcher
Are we being spammed here?  goldenpunter.com doesn't have a search box  
on the main page, nor do I even see a "search" link.


Erik

On Jan 19, 2010, at 10:40 PM, Apache Wiki wrote:


Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-java  
Wiki" for change notification.


The "PoweredBy" page has been changed by ChristianShirts.
http://wiki.apache.org/jakarta-lucene/PoweredBy?action=diff&rev1=418&rev2=419

--

  * [[http://bestratescall.com|Best Rates Call]] - Prepaid phone  
card price comparison site compares pre-paid phone cards and  
international calling cards online between phone card service  
providers, helping consumers find cheap phone card that offers best  
calling rates to selected destinations. Best Rates search uses lucene.
  * [[http://www.iphonecard.com.au]] - Australia Online Prepaid  
Phone Card Store, lucene indexes all phone card data
  * [[http://www.iterend.com/]] - Blog search & discovering engine  
powered partly by lucene.
+  * [[http://www.goldenpunter.com|Online Casino bonuses]] - online  
community powered partly by lucene
  * [[http://www.open-search-server.com/|Jaeksoft Open Search  
Server]] - An OpenSource Search Engine based on Lucene with a web  
crawler, XML APIs, web interface, faceting, collapsing, etc.
  * [[http://jamwiki.org/|JAMWiki]] - A Java-based Wiki whose goal  
is Mediawiki feature-parity.
  * [[http://www.jobsintime.de/|jobs in time]] - company homepage  
including a job search engine






[jira] Commented: (LUCENE-2198) support protected words in Stemming TokenFilters

2010-01-13 Thread Erik Hatcher (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799839#action_12799839
 ] 

Erik Hatcher commented on LUCENE-2198:
--

+1 on the StemAttribute approach.  I've just encountered this exact need in 
some custom code I've been reviewing, where the decision to stem or not is 
dynamic per term (with the approach I'm looking at using a custom term type 
string and a custom stem filter).

> support protected words in Stemming TokenFilters
> 
>
> Key: LUCENE-2198
> URL: https://issues.apache.org/jira/browse/LUCENE-2198
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Analysis
>Affects Versions: 3.0
>Reporter: Robert Muir
>Priority: Minor
>
> This is from LUCENE-1515
> I propose that all stemming TokenFilters have an 'exclusion set' that 
> bypasses any stemming for words in this set.
> Some stemming tokenfilters have this, some do not.
> This would be one way for Karl to implement his new Swedish stemmer (as a 
> text file of ignore words).
> Additionally, it would remove duplication between Lucene and Solr, as they 
> reimplement SnowballFilter since it does not have this functionality.
> Finally, I think this is a pretty common use case, where people want to 
> ignore things like proper nouns in the stemming.
> As an alternative design I considered a case where we generalized this to 
> CharArrayMap (and ignoring words would mean mapping them to themselves), 
> which would also provide a mechanism to override the stemming algorithm. But 
> I think this is too expert, could be its own filter, and the only example of 
> this I can find is in the Dutch stemmer.
> So I think we should just provide ignore with CharArraySet, but if you feel 
> otherwise please comment.
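The proposed exclusion set can be sketched in a few lines: any token found in the protected set is passed through unchanged, and everything else goes to the stemmer. In this sketch a `java.util.Set` stands in for the CharArraySet the issue mentions, and `toyStem` is a hypothetical suffix-stripper, not a real stemming algorithm.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class ProtectedStemSketch {
    // Toy stemmer (hypothetical): strips a trailing "s" from words longer than 3 chars.
    static String toyStem(String word) {
        return word.length() > 3 && word.endsWith("s")
                ? word.substring(0, word.length() - 1)
                : word;
    }

    // Words in the exclusion set bypass stemming entirely.
    static List<String> stemAll(List<String> tokens, Set<String> protectedWords) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            out.add(protectedWords.contains(t) ? t : toyStem(t));
        }
        return out;
    }

    public static void main(String[] args) {
        // "jones" is a proper noun we want to keep intact.
        List<String> tokens = List.of("cats", "jones", "dogs");
        System.out.println(stemAll(tokens, Set.of("jones"))); // [cat, jones, dog]
    }
}
```

The per-term decision Erik describes (stem or not, decided dynamically) fits the same shape: the `contains` check would be replaced by whatever flag travels with the token.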




Lucene 2.4.1 src .zip issue

2009-12-09 Thread Erik Hatcher
I was doing some research on past releases of Lucene and downloaded  
the archived 2.4.1 src .zip and got this:


~/Downloads: unzip lucene-2.4.1-src.zip
Archive:  lucene-2.4.1-src.zip
  End-of-central-directory signature not found.  Either this file is not
  a zipfile, or it constitutes one disk of a multi-part archive.  In the
  latter case the central directory and zipfile comment will be found on
  the last disk(s) of this archive.
unzip:  cannot find zipfile directory in one of lucene-2.4.1-src.zip or
        lucene-2.4.1-src.zip.zip, and cannot find lucene-2.4.1-src.zip.ZIP, period.


Yikes!

Is anyone else having issues with it, or is this anomalous to my download?
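The unzip error above means it could not find the end-of-central-directory (EOCD) record, which a zip file keeps at its very end (possibly followed by a comment of up to 65535 bytes). One common cause is a truncated file. The sketch below checks a byte array for the EOCD signature `0x06054b50` (stored little-endian as `P K 05 06`) in the region where the format allows it. The class and method names are hypothetical, and the check is a quick diagnostic, not a full zip validator.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

class EocdCheck {
    // End-of-central-directory signature 0x06054b50, stored little-endian: 'P' 'K' 5 6.
    private static final byte[] EOCD_SIG = {0x50, 0x4b, 0x05, 0x06};
    private static final int EOCD_MIN_SIZE = 22;   // fixed-size part of the EOCD record
    private static final int MAX_COMMENT = 0xFFFF; // a zip comment may follow the record

    // Scans backwards from the last possible EOCD position for the signature.
    static boolean hasEndOfCentralDirectory(byte[] zip) {
        int earliest = Math.max(0, zip.length - EOCD_MIN_SIZE - MAX_COMMENT);
        for (int i = zip.length - EOCD_MIN_SIZE; i >= earliest; i--) {
            if (zip[i] == EOCD_SIG[0] && zip[i + 1] == EOCD_SIG[1]
                    && zip[i + 2] == EOCD_SIG[2] && zip[i + 3] == EOCD_SIG[3]) {
                return true;
            }
        }
        return false;
    }

    // Builds a small valid zip in memory for demonstration.
    static byte[] sampleZip() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            zos.putNextEntry(new ZipEntry("README.txt"));
            zos.write("hello".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] zip = sampleZip();
        System.out.println(hasEndOfCentralDirectory(zip)); // true
        // Simulate a truncated download by keeping only the first half of the bytes.
        System.out.println(hasEndOfCentralDirectory(Arrays.copyOf(zip, zip.length / 2))); // false
    }
}
```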

Erik





Re: [VOTE] Release Apache Lucene 2.9.1, take 4

2009-11-03 Thread Erik Hatcher

+1


On Nov 3, 2009, at 2:05 AM, Michael McCandless wrote:


OK, again!

I've built new release artifacts from svn rev 832363 (on the 2.9
branch), here:

http://people.apache.org/~mikemccand/staging-area/rc4_lucene2.9.1/

Changes are here:

http://people.apache.org/~mikemccand/staging-area/rc4_lucene2.9.1changes/

Please vote to officially release these artifacts as Apache Lucene
Java 2.9.1.

Mike




Re: [VOTE] Release Apache Lucene Java 2.9.1, take 3

2009-10-29 Thread Erik Hatcher

+1

On Oct 29, 2009, at 7:27 PM, Michael McCandless wrote:



OK, let's try this again!

I've built new release artifacts from svn rev 831145 (on the 2.9
branch), here:

 http://people.apache.org/~mikemccand/staging-area/rc3_lucene2.9.1/

Changes are here:

 http://people.apache.org/~mikemccand/staging-area/rc3_lucene2.9.1changes/

Please vote to officially release these artifacts as Apache Lucene
Java 2.9.1.

Mike




Re: [VOTE] Release Apache Lucene Java 2.9.1, take 2

2009-10-26 Thread Erik Hatcher

+1


On Oct 26, 2009, at 3:48 PM, Grant Ingersoll wrote:


+1

On Oct 26, 2009, at 2:43 PM, Michael McCandless wrote:


OK, I've built new release artifacts (incorporating Uwe's feedback)
from svn rev 829889 (on the 2.9 branch), here:

http://people.apache.org/~mikemccand/staging-area/rc2_lucene2.9.1/

Changes are here:

http://people.apache.org/~mikemccand/staging-area/rc2_lucene2.9.1changes/

Please vote to officially release these artifacts as Apache Lucene
Java 2.9.1.

Mike







[jira] Created: (LUCENE-1941) MinPayloadFunction returns 0 when only one payload is present

2009-10-02 Thread Erik Hatcher (JIRA)
MinPayloadFunction returns 0 when only one payload is present
-

 Key: LUCENE-1941
 URL: https://issues.apache.org/jira/browse/LUCENE-1941
 Project: Lucene - Java
  Issue Type: Bug
  Components: Query/Scoring
Affects Versions: 2.9
Reporter: Erik Hatcher


In some experiments with payload scoring through PayloadTermQuery, I'm seeing 0 
returned when using MinPayloadFunction.  I believe there is a bug there.  No 
time at the moment to flesh out a unit test, but wanted to report it for 
tracking.
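One plausible way a "min" payload function can end up returning 0 is sketched below: if the running score for a document starts at 0 and each payload score is folded in with `Math.min`, non-negative payloads can never lift it above 0, so a single payload of 2.5 yields 0. This is an assumption for illustration only; the class and method names are hypothetical, the code does not use Lucene's PayloadFunction or MinPayloadFunction, and the actual cause and fix are in the patches attached to the issue.

```java
class MinPayloadSketch {
    // Buggy accumulator: the running score starts at 0, so min() never rises above 0
    // for non-negative payload scores.
    static float buggyMin(float[] payloadScores) {
        float current = 0f;
        for (float p : payloadScores) {
            current = Math.min(current, p);
        }
        return current;
    }

    // Fixed accumulator: the first payload seen seeds the running minimum.
    static float fixedMin(float[] payloadScores) {
        float current = 0f;
        int seen = 0;
        for (float p : payloadScores) {
            current = (seen == 0) ? p : Math.min(current, p);
            seen++;
        }
        return current;
    }

    public static void main(String[] args) {
        float[] onePayload = {2.5f};
        System.out.println(buggyMin(onePayload)); // 0.0
        System.out.println(fixedMin(onePayload)); // 2.5
    }
}
```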




[jira] Commented: (LUCENE-1938) Precedence query parser using the contrib/queryparser framework

2009-10-01 Thread Erik Hatcher (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12761405#action_12761405
 ] 

Erik Hatcher commented on LUCENE-1938:
--

Yes, let's just remove the old PrecedenceQueryParser (which was just an 
experiment by me - is anyone actually using it?)

> Precedence query parser using the contrib/queryparser framework
> ---
>
> Key: LUCENE-1938
> URL: https://issues.apache.org/jira/browse/LUCENE-1938
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: contrib/*
>Affects Versions: 2.9
>Reporter: Adriano Crestani
>Assignee: Adriano Crestani
>Priority: Minor
> Fix For: 3.1
>
> Attachments: LUCENE-1938.patch
>
>
> Extend the current StandardQueryParser in contrib so it supports boolean 
> precedence
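To illustrate what "boolean precedence" means here, the following is a minimal recursive-descent sketch in which AND binds tighter than OR, so `a OR b AND c` groups as `(a OR (b AND c))`. It is a hypothetical stand-alone parser over whitespace-separated tokens, not the contrib/queryparser framework or PrecedenceQueryParser; it has no support for parentheses, field names, or error handling for malformed input.

```java
import java.util.List;

class PrecedenceSketch {
    private final List<String> tokens;
    private int pos = 0;

    PrecedenceSketch(String input) {
        tokens = List.of(input.trim().split("\\s+"));
    }

    // orExpr := andExpr ("OR" andExpr)*   (lowest precedence)
    String parseOr() {
        String left = parseAnd();
        while (pos < tokens.size() && tokens.get(pos).equals("OR")) {
            pos++;
            left = "(" + left + " OR " + parseAnd() + ")";
        }
        return left;
    }

    // andExpr := term ("AND" term)*       (binds tighter than OR)
    String parseAnd() {
        String left = parseTerm();
        while (pos < tokens.size() && tokens.get(pos).equals("AND")) {
            pos++;
            left = "(" + left + " AND " + parseTerm() + ")";
        }
        return left;
    }

    String parseTerm() {
        return tokens.get(pos++);
    }

    static String parse(String input) {
        return new PrecedenceSketch(input).parseOr();
    }

    public static void main(String[] args) {
        System.out.println(parse("a OR b AND c")); // (a OR (b AND c))
        System.out.println(parse("a AND b OR c")); // ((a AND b) OR c)
    }
}
```

Splitting the grammar into one method per precedence level is what makes the grouping fall out naturally: each level only consumes operators at its own level and delegates tighter-binding ones downward.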




Re: [jira] Updated: (LUCENE-1855) Change AttributeSource API to use generics

2009-09-30 Thread Erik Hatcher


On Sep 30, 2009, at 3:10 PM, Robert Muir wrote:

Uwe, somewhat related to the attributes API... any way I can trick you or  
Luis or someone more familiar with query parsing into looking at  
PrecedenceQueryParser under contrib/misc?


PQP was my contribution, I think.  But it was merely a proof-of-concept.  
Anyone using it?  I'm not.  I'm fine with it going away  
rather than spending time on it.


Erik





Re: [VOTE] Release Lucene 2.9.0

2009-09-24 Thread Erik Hatcher

+1

ship it!

Erik


On Sep 21, 2009, at 1:06 PM, Mark Miller wrote:


Okay, let's give this a shot:

The (proposed) release artifacts have been built and are up at:

http://people.apache.org/~markrmiller/staging-area/lucene2.9/

The changes are here:

http://people.apache.org/~markrmiller/staging-area/lucene2.9changes/


Please vote to officially release these artifacts as 2.9.0, or point out
any errors and I'll fix and repackage.

We need at least 3 binding (PMC) votes.

Thanks everyone for all their hard work on this! This has been some  
release.


--
- Mark

http://www.lucidimagination.com







Re: ApacheCon US - Lucene Meetup?!

2009-08-25 Thread Erik Hatcher
You can rest assured that there will be plenty of Lucene folks and  
gatherings around ACUS09.


Our company, Lucid Imagination, will certainly have some kind of  
sponsored event during the conference.  I haven't heard any details  
yet, though; it seems so far away, yet I guess it's not really.  Many  
folks from Lucid will be in town for the conference as well, including  
Grant and myself for sure.


Erik

On Aug 25, 2009, at 4:03 PM, Simon Willnauer wrote:


I'm curious if there is a meetup this year @ ApacheCon US similar to
the one at ApacheCon Europe earlier this year?

Simon




[jira] Created: (LUCENE-1850) Update overview example code

2009-08-24 Thread Erik Hatcher (JIRA)
Update overview example code


 Key: LUCENE-1850
 URL: https://issues.apache.org/jira/browse/LUCENE-1850
 Project: Lucene - Java
  Issue Type: Task
  Components: Examples, Javadocs
Reporter: Erik Hatcher
 Fix For: 2.9


See http://lucene.apache.org/java/2_4_1/api/core/overview-summary.html - need 
to update for non-deprecated best-practices/recommended API usage.

Also, double-check that the demo app works as documented.




[jira] Resolved: (LUCENE-1806) Add args to test-macro

2009-08-14 Thread Erik Hatcher (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Hatcher resolved LUCENE-1806.
--

Resolution: Fixed

Done, thanks Jason.

> Add args to test-macro
> --
>
> Key: LUCENE-1806
> URL: https://issues.apache.org/jira/browse/LUCENE-1806
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 2.4.1
>Reporter: Jason Rutherglen
>Priority: Trivial
> Fix For: 2.9
>
> Attachments: LUCENE-1806.patch
>
>   Original Estimate: 0.03h
>  Remaining Estimate: 0.03h
>
> Add support for passing args to JUnit (like Solr does; mainly for debugging).




[jira] Commented: (LUCENE-1800) QueryParser should use reusable token streams

2009-08-13 Thread Erik Hatcher (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742781#action_12742781
 ] 

Erik Hatcher commented on LUCENE-1800:
--

Does anyone use PrecedenceQueryParser?   It was an experiment tossed out there, 
but I've not heard of anyone using it for real.  

> QueryParser should use reusable token streams
> -
>
> Key: LUCENE-1800
> URL: https://issues.apache.org/jira/browse/LUCENE-1800
> Project: Lucene - Java
>  Issue Type: Improvement
>Affects Versions: 2.9
>Reporter: Yonik Seeley
>Assignee: Yonik Seeley
> Fix For: 2.9
>
> Attachments: LUCENE-1800.patch, LUCENE-1800_analyzingQP.patch
>
>
> Just like indexing, the query parser should use reusable token streams
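The reuse idea can be sketched as a per-thread cache: instead of allocating a new tokenizer for every parse, each thread keeps one instance and resets it with new input. Lucene's Analyzer exposes reusableTokenStream for this purpose; the classes below are hypothetical stand-ins written only with the JDK, not Lucene's API.

```java
import java.util.ArrayList;
import java.util.List;

class ReusableStreamSketch {
    // Hypothetical whitespace tokenizer whose state can be reset for new input.
    static final class ReusableTokenizer {
        private String[] parts = new String[0];
        private int next = 0;

        void reset(String text) {
            String trimmed = text.trim();
            parts = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
            next = 0;
        }

        // Returns the next token, or null when the input is exhausted.
        String nextToken() {
            return next < parts.length ? parts[next++] : null;
        }
    }

    // One cached tokenizer per thread, created lazily on first use.
    private static final ThreadLocal<ReusableTokenizer> CACHED =
            ThreadLocal.withInitial(ReusableTokenizer::new);

    static ReusableTokenizer reusableTokenStream(String text) {
        ReusableTokenizer t = CACHED.get();
        t.reset(text); // reuse the same instance instead of allocating a new one
        return t;
    }

    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        ReusableTokenizer t = reusableTokenStream(text);
        for (String tok = t.nextToken(); tok != null; tok = t.nextToken()) {
            out.add(tok);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("title:lucene in action")); // [title:lucene, in, action]
        // Same thread, same instance:
        System.out.println(reusableTokenStream("a") == reusableTokenStream("b")); // true
    }
}
```

The trade-off in this sketch is that a stream must be fully consumed before the same thread asks for another one, because the second call resets the shared instance.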




Re: The new Contrib QueryParser should not be slated to replace the old one yet

2009-08-11 Thread Erik Hatcher

Agreed, don't deprecate our beloved QueryParser.

Erik

On Aug 11, 2009, at 1:54 PM, Mark Miller wrote:

I don't think we should stick with the current path of replacing the  
current QueryParser with the new contrib QueryParser in Lucene 3.0.


The new QueryParser has not been used much at all yet. Its  
interfaces (which will need to abide by back compat in core) have  
not been vetted enough.


The new parser appears to add complication to some of the things that  
were very simple with the old parser.


The main benefits of the new parser are claimed to be the ability to  
plug and play many syntaxes and QueryBuilders. This is not an end  
user benefit though and I'm not even sure how much of a benefit it  
is to us. There is currently only one impl. It seems to me, once you  
start another impl, it's a long shot that the exact same query tree  
representation is going to work with a completely different syntax.  
Sure, if you are just doing postfix rather than prefix, it will be  
fine – but the stuff that would likely be done – actual new syntaxes  
– are not likely to be very pluggable. If a syntax can map to the  
same query tree, I think we would likely stick to a single syntax –  
else suffer the confusion and maintenance headaches for syntactic  
sugar. More than a well factored QueryParser that can more easily  
allow different syntaxes to map to the same query tree  
representation, I think we just want a single solid syntax for core  
Lucene that supports Spans to some degree. We basically have that  
now, sans the spans support. Other, more exotic QueryParsers should  
live in contrib, as they do now.


Which isn't to say this QueryParser should not one day rule the  
roost – but I don't think it's earned the right yet. And I don't  
think there is a hurry to toss the old parser.


Personally, I think that the old parser should not be deprecated.  
Let's let the new parser breathe in contrib for a bit. Let's see if  
anyone actually adds any other syntaxes. Let's see if the  
pluggability results in any improvements. Let's see if some of the  
harder things to do (overriding query build methods?) become easier  
or keep people from using the new parser.


Let's just see if the new parser draws users without us forcing them  
to it. And let's also wait and see what other committers say – not  
many have gotten much time to deal with the new parser, or deal with  
user list questions on it.


I just think it's premature to start moving people to this new  
parser. It didn't even really get in until right before release –  
the paint on the thing still reeks. There is no rush. I say we  
undeprecate the current QueryParser and remove the wording in the  
new QueryParser about it replacing the old one in 3.0. Later, if we  
think it should replace it (after having some experience to judge  
from), we can reinstate the current plan. Anyone agree?


--
- Mark

http://www.lucidimagination.com







Re: The new Contrib QueryParser should not be slated to replace the old one yet

2009-08-11 Thread Erik Hatcher

+1

In other words: undeprecate our good friend QueryParser.

Erik




[jira] Resolved: (LUCENE-1405) Support for new Resources model in ant 1.7 in Lucene ant task.

2009-06-19 Thread Erik Hatcher (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Hatcher resolved LUCENE-1405.
--

Resolution: Fixed

Przemyslaw - apologies for the delay in addressing this valuable patch.  It's 
now been tested and committed.  I also added a comment to example.xml showing 
how to run the  task from a source checkout.

> Support for new Resources model in ant 1.7 in Lucene ant task.
> --
>
> Key: LUCENE-1405
> URL: https://issues.apache.org/jira/browse/LUCENE-1405
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/*
>Affects Versions: 2.3.2
>Reporter: Przemyslaw Sztoch
>Assignee: Erik Hatcher
> Fix For: 2.9
>
> Attachments: lucene-ant1.7-newresources.patch
>
>
> Ant Task for Lucene should use modern Resource model (not only FileSet child 
> element).
> There is a patch with required changes.
> Supported by old (ant 1.6) and new (ant 1.7) resources model:
>  
>   
>  
> Supported only by new (ant 1.7) resources model:
>  
>   
>  
>  
>   
>  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.





[jira] Assigned: (LUCENE-1405) Support for new Resources model in ant 1.7 in Lucene ant task.

2009-06-12 Thread Erik Hatcher (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Hatcher reassigned LUCENE-1405:


Assignee: Erik Hatcher




[jira] Resolved: (LUCENE-1635) Handle Escape character

2009-05-14 Thread Erik Hatcher (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Hatcher resolved LUCENE-1635.
--

Resolution: Invalid

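This isn't a bug.  SimpleAnalyzer splits text at every non-letter character 
(and lowercases it), so it makes sense that it is splitting AWT-T into "awt" 
and "t".  Escaping only keeps the query parser from treating the character as 
query syntax; the character still passes through as-is to the analyzer.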
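To make the point concrete, here is a minimal standalone sketch of what SimpleAnalyzer does to the query text: keep maximal runs of letters, lowercase them, and treat every other character (including '-') as a separator. It is an approximation written for illustration, not Lucene's LetterTokenizer/LowerCaseFilter code, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class SimpleAnalyzerSketch {
    // Approximates SimpleAnalyzer: tokens are maximal runs of letters,
    // lowercased; any non-letter character (including '-') ends a token.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Escaping AWT\-T only stops the query parser from reading '-' as syntax;
        // the analyzer still receives "AWT-T" and splits it.
        System.out.println(tokenize("AWT-T")); // [awt, t]
    }
}
```

Keeping "AWT-T" as a single token would instead require an analyzer that does not split on '-' (WhitespaceAnalyzer, for example), used consistently at both index and query time.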

> Handle Escape character
> ---
>
> Key: LUCENE-1635
> URL: https://issues.apache.org/jira/browse/LUCENE-1635
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: QueryParser
>Affects Versions: 2.0.0
> Environment: Os-Windows,J2EE
>Reporter: rimi
>Priority: Critical
>
> I have tried to search using the query AWT-T. The query parser is returning 
> "awt t"; it's removing the - special character. If I try the query AWT\-T, 
> the query parser still returns the same query "awt t". I have used 
> SimpleAnalyzer. I want to search using the - character, and that's why I put 
> AWT\-T, because \ should escape the special character. But it's not working 
> that way.




[jira] Commented: (LUCENE-1629) contrib intelligent Analyzer for Chinese

2009-05-13 Thread Erik Hatcher (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12708912#action_12708912
 ] 

Erik Hatcher commented on LUCENE-1629:
--

My initial thought is to move the  excluding **/*.java and **/*.html to 
the "compile" macro.   In the ancient past, Ant actually used to do this 
automatically with .



> contrib intelligent Analyzer for Chinese
> 
>
> Key: LUCENE-1629
> URL: https://issues.apache.org/jira/browse/LUCENE-1629
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/analyzers
>Affects Versions: 2.4.1
> Environment: for java 1.5 or higher, lucene 2.4.1
>Reporter: Xiaoping Gao
>Assignee: Michael McCandless
> Fix For: 2.9
>
> Attachments: analysis-data.zip, bigramdict.mem, 
> build-resources.patch, coredict.mem, LUCENE-1629-java1.4.patch
>
>
> I wrote an Analyzer for Apache Lucene for analyzing sentences in the Chinese 
> language. It's called "imdict-chinese-analyzer"; the project on Google Code 
> is here: http://code.google.com/p/imdict-chinese-analyzer/
> In Chinese, "我是中国人" (I am Chinese) should be tokenized as "我" (I) "是" (am) 
> "中国人" (Chinese), not "我" "是中" "国人". So the analyzer must handle each sentence 
> properly, or there will be misunderstandings everywhere in the index 
> constructed by Lucene, and the accuracy of the search engine will be seriously 
> affected!
> Although there are two analyzer packages in the Apache repository which can 
> handle Chinese, ChineseAnalyzer and CJKAnalyzer, they take each character or 
> every two adjoining characters as a single word. This is obviously not true 
> in reality, and this strategy also increases the index size and hurts 
> performance badly.
> The algorithm of imdict-chinese-analyzer is based on a Hidden Markov Model 
> (HMM), so it can tokenize Chinese sentences in a really intelligent way. 
> Tokenization accuracy of this model is above 90% according to the paper 
> "HHMM-based Chinese Lexical Analyzer ICTCLAS", while other analyzers' is about 
> 60%.
> As imdict-chinese-analyzer is really fast and intelligent, I want to 
> contribute it to the Apache Lucene repository.
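For contrast, the sketch below puts CJKAnalyzer-style overlapping bigrams next to forward maximum matching against a small dictionary. Forward maximum matching is a simpler, swapped-in dictionary technique, not the HHMM model imdict-chinese-analyzer uses; the toy dictionary and class name are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ChineseSegmentationSketch {
    // CJKAnalyzer-style overlapping bigrams: every two adjoining characters.
    static List<String> bigrams(String text) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < text.length(); i++) {
            out.add(text.substring(i, i + 2));
        }
        return out;
    }

    // Forward maximum matching: at each position take the longest dictionary
    // word of at most maxLen characters, falling back to a single character.
    static List<String> forwardMaxMatch(String text, Set<String> dict, int maxLen) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int len = Math.min(maxLen, text.length() - i);
            while (len > 1 && !dict.contains(text.substring(i, i + len))) {
                len--;
            }
            out.add(text.substring(i, i + len));
            i += len;
        }
        return out;
    }

    public static void main(String[] args) {
        String sentence = "我是中国人";
        // Hypothetical toy dictionary, for illustration only.
        Set<String> dict = new HashSet<>(Arrays.asList("我", "是", "中国", "中国人", "国人"));
        System.out.println(bigrams(sentence));                  // [我是, 是中, 中国, 国人]
        System.out.println(forwardMaxMatch(sentence, dict, 3)); // [我, 是, 中国人]
    }
}
```

The bigram output shows where the extra index terms and spurious words such as "是中" come from; a statistical model like the one proposed aims to pick the correct segmentation even where a greedy dictionary match would go wrong.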




[jira] Issue Comment Edited: (LUCENE-1629) contrib intelligent Analyzer for Chinese

2009-05-13 Thread Erik Hatcher (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12708912#action_12708912
 ] 

Erik Hatcher edited comment on LUCENE-1629 at 5/13/09 5:58 AM:
---

My initial thought is to move the  excluding {noformat} **/*.java and 
**/*.html{noformat}  to the "compile" macro.   In the ancient past, Ant 
actually used to do this automatically with .



  was (Author: ehatcher):
My initial thought is to move the  excluding **/*.java and **/*.html 
to the "compile" macro.   In the ancient past, Ant actually used to do this 
automatically with .


  



Re: Question related to improving search results

2009-05-02 Thread Erik Hatcher
I suppose you're talking about content that is indexed from web  
crawling.  It's a messy problem.  Extraneous junk needs to be filtered  
out and not indexed, so some form of header/footer/sidebar detection  
and exclusion definitely makes searching crawled pages much better.


When possible, index clean content.  In the case of Wikipedia, you can  
get full dumps of the content without the templates, just the content.
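When clean dumps aren't available, one rough way to approximate template detection across many crawled pages is to drop text blocks that recur on most pages. The sketch below treats each line as a block and removes any line that appears on at least a given fraction of pages; the line granularity, the threshold, and all names are assumptions for illustration, and real boilerplate detection usually works on the HTML structure instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

public class TemplateFilterSketch {
    // Drops every line that occurs on at least `threshold` (0..1] of all pages,
    // assuming such lines come from a shared site template.
    static List<List<String>> stripTemplateLines(List<List<String>> pages, double threshold) {
        // Count on how many pages each distinct line appears.
        Map<String, Integer> pageCounts = new HashMap<>();
        for (List<String> page : pages) {
            for (String line : new HashSet<>(page)) {
                pageCounts.merge(line, 1, Integer::sum);
            }
        }
        int minPages = (int) Math.ceil(threshold * pages.size());
        List<List<String>> cleaned = new ArrayList<>();
        for (List<String> page : pages) {
            List<String> kept = new ArrayList<>();
            for (String line : page) {
                if (pageCounts.get(line) < minPages) {
                    kept.add(line); // page-specific text: keep it for indexing
                }
            }
            cleaned.add(kept);
        }
        return cleaned;
    }

    public static void main(String[] args) {
        List<List<String>> pages = Arrays.asList(
            Arrays.asList("Galego | Deutsch | Français", "Lucene is a search library."),
            Arrays.asList("Galego | Deutsch | Français", "Solr is a search server."),
            Arrays.asList("Galego | Deutsch | Français", "Galego is a Romance language."));
        // The shared language bar is dropped; the page that really is about
        // Galego still mentions it in its own text.
        System.out.println(stripTemplateLines(pages, 0.8));
    }
}
```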


Erik

On May 2, 2009, at 6:48 AM, Aditya wrote:


Hi,

New to this group.

Question:

Generally sites like Wikipedia have a template and every page  
follows it. These templates contain words that occur on every  
page.


For example, the Wikipedia template has the list of languages in the left  
panel. These words get indexed every time, since they are not  
(cannot be) stop words.
If a user searches for "Galego", for example, every Wikipedia page will  
be in the search results, which is wrong, as not every Wikipedia page  
talks about "Galego".


Any takes on how to solve this problem?


Best Regards,
Aditya







Re: I am unable to create index of an object having composite key

2009-04-24 Thread Erik Hatcher
You'll do best to direct this question to the Hibernate group.  java-dev  
is for Lucene development, so it's not an appropriate place to ask.  
java-user would be better, but your question is more Hibernate-specific.


Erik


On Apr 24, 2009, at 3:49 AM, gopalbisht wrote:



Hi all,

I am using Hibernate Search with Lucene. I need to create an index of a
DomainTag object which has only one composite key. I am unaware how to
define the annotations for the composite key in the DomainTag (POJO) class.
If anyone can help, please help me. Thanks in advance.

My DomainTag.hbm.xml file is as follows:
-- 



http://hibernate.sourceforge.net/hibernate-mapping-3.0.dtd"; >








   



 















-- 


--
View this message in context: 
http://www.nabble.com/I-am-unable-to-create-index-of-an-object-having-composite-key-tp23211575p23211575.html
Sent from the Lucene - Java Developer mailing list archive at  
Nabble.com.






Re: [VOTE] Release Lucene 2.4.1

2009-03-04 Thread Erik Hatcher


On Mar 4, 2009, at 2:20 PM, Michael Busch wrote:

On 3/4/09 5:28 AM, Michael McCandless wrote:


Grant Ingersoll wrote:


On Mar 4, 2009, at 8:05 AM, Michael McCandless wrote:


lucene-2.4.1-src.tar.gz
--> ant test


I'm not sure how this could ever pass.  The lib directory is not  
present, so neither is JUnit, so the tests do not compile.  I'm  
guessing you have JUnit in your Ant lib dir, right?


Hmm... indeed I do.  And when I remove it, the tests do fail to  
compile.



Yeah, same on my machine.


Same here also.  It's a standard operating procedure for me to put  
junit.jar into ANT_HOME/lib.  I suppose that's a historical thing as  
Ant's  used to require that.


Erik





Re: Getting tokens from search results. Simple concept

2009-02-27 Thread Erik Hatcher
Have you looked at the contrib Highlighter?  Or using an Analyzer  
directly to give you the offsets?
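What offsets from an analysis pass give you can be sketched without Lucene: the standalone code below tokenizes text into letter runs, records each token's start and end character offsets, and reports the tokens that match a set of query terms. It only illustrates the idea; in Lucene the offsets come from the analyzer's token stream (which the Highlighter builds on), and that API differs between versions, so the class and method names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class TokenOffsetsSketch {
    // A token with its lowercased text and [start, end) character offsets.
    static final class Tok {
        final String term;
        final int start;
        final int end;

        Tok(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return term + "[" + start + "," + end + ")";
        }
    }

    // Letter-run tokenizer that keeps offsets into the original text.
    static List<Tok> tokenize(String text) {
        List<Tok> toks = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            if (!Character.isLetter(text.charAt(i))) {
                i++;
                continue;
            }
            int start = i;
            while (i < text.length() && Character.isLetter(text.charAt(i))) {
                i++;
            }
            toks.add(new Tok(text.substring(start, i).toLowerCase(), start, i));
        }
        return toks;
    }

    // Returns the tokens whose text matches one of the lowercased query terms.
    static List<Tok> matches(String text, Set<String> queryTerms) {
        List<Tok> hits = new ArrayList<>();
        for (Tok t : tokenize(text)) {
            if (queryTerms.contains(t.term)) {
                hits.add(t);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Set<String> terms = new HashSet<>();
        terms.add("lucene");
        // prints [lucene[7,13), lucene[19,25)]
        System.out.println(matches("I like Lucene, and Lucene likes me.", terms));
    }
}
```

Because the offsets point into the stored original text, text.substring(start, end) recovers exactly the characters that matched, which is essentially what a highlighter needs.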


Erik

On Feb 26, 2009, at 9:32 AM, HPDrifter wrote:



When I get a search result based on my index, I need the exact tokens which
were identified in the index as part of the result.  Why?  I need the
character offsets.

I have a solution right now... almost, but it bugs the hell out of me that I
can't say something like...
documentHit[0].getIdentifiedTokens();

Do I need to make a contribution in order to make this happen?


--
View this message in context: 
http://www.nabble.com/Getting-tokens-from-search-results.--Simple-concept-tp5364p5364.html
Sent from the Lucene - Java Developer mailing list archive at  
Nabble.com.






Re: LIA2 on l.a.o/java OK?

2009-02-20 Thread Erik Hatcher

On Feb 20, 2009, at 6:56 AM, Grant Ingersoll wrote:
Isn't that what http://wiki.apache.org/lucene-java/Resources is  
for?  I like LIA as much as the next person, but if we do it for  
LIA2 then it opens the door for others 
(http://www.amazon.com/s/ref=nb_ss_gw?url=search-alias%3Daps&field-keywords=Lucene&x=0&y=0), 
which will likely clutter the page quite a bit.


There is precedent.  Other books do make it to Apache sites.  iBatis  
has the Manning cover in the lower-left sidebar: .  Wicket has three big book covers: .   
Struts more subtly: .  ActiveMQ has a news  
blurb with big book cover: 


As for other books making it there... that'd be fine by me to have a  
few book covers shown on the home page.  I imagine we won't hear other  
authors even asking.


 I just don't think we can imply that LIA2 is the "official book on  
Lucene".


It's the only book dedicated exclusively to Lucene that I'm aware of,  
and all of the co-authors are committers/PMC members and active  
members of the community.  It's about as "official" as it gets.


Books on open source projects lend a great deal of credibility and  
I've seen first hand that they are used as deciding factors when  
choosing a technology.  A book means it is mature and has a good  
following.


Personal bias noted - I support putting it on the home page, and also  
news blurbs when there is activity, like when it goes to print and is  
available in hardcopy.


Erik





Registration for ApacheCon Europe 2009 is now open!

2009-01-29 Thread Erik Hatcher
Cross-posting this announcement.  There are several relevant Lucene/Solr  
talks, including:


Trainings
  - Lucene Boot Camp (Grant Ingersoll)
  - Solr Boot Camp (Erik Hatcher)

Sessions
  - Introducing Apache Mahout (Grant)
  - Lucene Case Studies (Erik)
  - Advanced Indexing Techniques with Apache Lucene (Michael Busch)

And a whole slew of Hadoop/cloud coverage.

Erik




--

ApacheCon EU 2009 registration is now open!
23-27 March -- Mövenpick Hotel, Amsterdam, Netherlands
http://www.eu.apachecon.com/


Registration for ApacheCon Europe 2009 is now open - act before early
bird prices expire 6 February.  Remember to book a room at the Mövenpick
and use the Registration Code: Special package attendees for the
conference registration, and get 150 Euros off your full conference
registration.

Lower Costs - Thanks to new VAT tax laws, our prices this year are 19%
lower than last year in Europe!  We've also negotiated a Mövenpick rate
of a maximum of 155 Euros per night for attendees in our room block.

Quick Links:

  http://xrl.us/aceu09sp  See the schedule
  http://xrl.us/aceu09hp  Get your hotel room
  http://xrl.us/aceu09rp  Register for the conference

Other important notes:

- Geeks for Geeks is a new mini-track where we can feature advanced
technical content from project committers.  And our Hackathon on Monday
and Tuesday is open to all attendees - be sure to check it off in your
registration.

- The Call for Papers for ApacheCon US 2009, held 2-6 November
2009 in Oakland, CA, is open through 28 February, so get your
submissions in now.  This ApacheCon will feature special events with
some of the ASF's original founders in celebration of the 10th
anniversary of The Apache Software Foundation.

  http://www.us.apachecon.com/c/acus2009/

- Interested in sponsoring the ApacheCon conferences?  There are plenty
of sponsor packages available - please contact Delia Frees at
de...@apachecon.com for further information.

==
ApacheCon EU 2008: A week of Open Source at its best!

Hackathon - open to all! | Geeks for Geeks | Lunchtime Sessions
In-Depth Trainings | Multi-Track Sessions | BOFs | Business Panel
Lightning Talks | Receptions | Fast Feather Track | Expo... and more!

- Shane Curcuru, on behalf of
 Noirin Shirley, Conference Lead,
 and the whole ApacheCon Europe 2009 Team
 http://www.eu.apachecon.com/  23-27 March -- Amsterdam, Netherlands






Fwd: [Travel Assistance] Applications for ApacheCon EU 2009 - Now Open

2009-01-23 Thread Erik Hatcher



Begin forwarded message:


From: Tony Stevenson 
Date: January 23, 2009 8:28:19 AM EST
To: travel-assista...@apache.org
Subject: [Travel Assistance] Applications for ApacheCon EU 2009 -  
Now Open




The Travel Assistance Committee is now accepting applications for those
wanting to attend ApacheCon EU 2009 between the 23rd and 27th March 2009
in Amsterdam.

The Travel Assistance Committee is looking for people who would like to
be able to attend ApacheCon EU 2009 and who need some financial support in
order to get there. There are very few places available and the criteria
are high; that aside, applications are open to all open source developers
who feel that their attendance would benefit themselves, their
project(s), the ASF or open source in general.

Financial assistance is available for travel, accommodation and entrance
fees, either in full or in part, depending on circumstances. It is
intended that all our ApacheCon events are covered, so it may be prudent
for those in the United States or Asia to wait until an event closer to
them comes up. You are all welcome to apply for ApacheCon EU, of course,
but there must be compelling reasons for you to attend an event further
away than your home location for your application to be considered above
those closer to the event location.

More information can be found on the main Apache website at
http://www.apache.org/travel/index.html, where you will also find a
link to the online application form.

Time is very tight for this event, so applications are open now and will
end on the 4th February 2009, to give enough time for travel
arrangements to be made.

Good luck to all those that apply.


Regards,
The Travel Assistance Committee
--




--
Tony Stevenson
t...@pc-tony.com  //  pct...@apache.org  // pct...@freenode.net
http://blog.pc-tony.com/

1024D/51047D66 ECAF DC55 C608 5E82 0B5E  3359 C9C7 924E 5104 7D66
--






[jira] Commented: (LUCENE-1314) IndexReader.clone

2009-01-09 Thread Erik Hatcher (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662475#action_12662475
 ] 

Erik Hatcher commented on LUCENE-1314:
--

{quote}
Is there a way with ant to only test one test case?
Tried:
"ant -Dtestcase=org.apache.lucene.index.TestIndexReaderReopen test-core" which 
according to the Wiki http://wiki.apache.org/lucene-java/HowToContribute should 
work. 
{quote}

The value of the testcase parameter is substituted into the pattern **/${testcase}.java in 
common-build.xml, so it must be the bare class name; in your case it'd be -Dtestcase=TestIndexReaderReopen
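The substitution can be checked with java.nio's glob matcher, used here as a stand-in for Ant's include-pattern matching (the two syntaxes are close for this simple case but not identical in general). The source path below is a hypothetical layout, and the class name is illustrative.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

public class TestcasePatternSketch {
    // Builds the pattern **/${testcase}.java and checks it against a relative
    // source path, approximating how the build selects the test to run.
    static boolean selects(String testcase, String relativePath) {
        PathMatcher matcher =
            FileSystems.getDefault().getPathMatcher("glob:**/" + testcase + ".java");
        return matcher.matches(Paths.get(relativePath));
    }

    public static void main(String[] args) {
        // Hypothetical location of the test source within a checkout.
        String file = "src/test/org/apache/lucene/index/TestIndexReaderReopen.java";
        System.out.println(selects("TestIndexReaderReopen", file));                         // true
        System.out.println(selects("org.apache.lucene.index.TestIndexReaderReopen", file)); // false
    }
}
```

The fully qualified name fails because its dots are matched literally rather than as directory separators, so the command becomes ant -Dtestcase=TestIndexReaderReopen test-core.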


> IndexReader.clone
> -
>
> Key: LUCENE-1314
> URL: https://issues.apache.org/jira/browse/LUCENE-1314
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: Index
>Affects Versions: 2.3.1
>Reporter: Jason Rutherglen
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 2.9
>
> Attachments: LUCENE-1314.patch, LUCENE-1314.patch, LUCENE-1314.patch, 
> LUCENE-1314.patch, LUCENE-1314.patch, LUCENE-1314.patch, LUCENE-1314.patch, 
> LUCENE-1314.patch, LUCENE-1314.patch, LUCENE-1314.patch, lucene-1314.patch, 
> lucene-1314.patch, lucene-1314.patch, lucene-1314.patch, lucene-1314.patch, 
> lucene-1314.patch, lucene-1314.patch, lucene-1314.patch, lucene-1314.patch, 
> lucene-1314.patch, lucene-1314.patch, lucene-1314.patch
>
>
> Based on discussion 
> http://www.nabble.com/IndexReader.reopen-issue-td18070256.html.  The problem 
> is reopen returns the same reader if there are no changes, so if docs are 
> deleted from the new reader, they are also reflected in the previous reader 
> which is not always desired behavior.




[jira] Commented: (LUCENE-1387) Add LocalLucene

2008-12-19 Thread Erik Hatcher (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12658062#action_12658062
 ] 

Erik Hatcher commented on LUCENE-1387:
--

I've taken some quick peeks into the code, run the unit tests, nicely packaged 
and presented!

A couple of thoughts:

* Maybe the Filters should be using the DocIdSet API rather than the deprecated 
BitSet stuff?  We can refactor that after it's committed, I suppose, but it's 
not something we want to leave like that.

* DistanceQuery is awkwardly named.  It's not an (extends) Query; it's a 
POJO with helpers.  Maybe DistanceQueryFactory?   (but it creates a Filter also)

* CartesianPolyFilter is not a Filter (but CartesianShapeFilter is)

I think this looks good enough to commit as well, just noting the above for 
cosmetic refactoring consideration after the code is in.




> Add LocalLucene
> ---
>
> Key: LUCENE-1387
> URL: https://issues.apache.org/jira/browse/LUCENE-1387
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: contrib/*
>Reporter: Grant Ingersoll
>Priority: Minor
> Attachments: spatial-lucene.zip, spatial.tar.gz, spatial.zip
>
>
> Local Lucene (Geo-search) has been donated to the Lucene project, per 
> https://issues.apache.org/jira/browse/INCUBATOR-77.  This issue is to handle 
> the Lucene portion of integration.
> See http://lucene.markmail.org/message/orzro22sqdj3wows?q=LocalLucene




Re: solr NumberUtils to lucene?

2008-12-16 Thread Erik Hatcher
My thoughts... bring over any simple functions like these that are  
generally useful.  At a quick glance, the functions in Solr's  
NumberUtils are generally useful and fit well in Lucene's  
NumberTools.  What's the harm?
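For context, the core trick behind NumberUtils-style helpers is encoding numbers as strings whose lexicographic order matches numeric order, which is what indexed string terms need for sorting and range queries. The sketch below shows the idea for int values by flipping the sign bit and writing fixed-width hex; it illustrates the technique and is not Solr's actual encoding, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SortableIntSketch {
    // Flipping the sign bit maps Integer.MIN_VALUE..MAX_VALUE onto
    // 0x00000000..0xFFFFFFFF in unsigned order; fixed-width hex then sorts
    // lexicographically in the same order as the original ints.
    static String encode(int value) {
        return String.format("%08x", value ^ 0x80000000);
    }

    static int decode(String encoded) {
        return Integer.parseUnsignedInt(encoded, 16) ^ 0x80000000;
    }

    public static void main(String[] args) {
        List<String> encoded = new ArrayList<>();
        for (int v : new int[] {42, -7, 0, Integer.MIN_VALUE, Integer.MAX_VALUE}) {
            encoded.add(encode(v));
        }
        // Sorting the strings sorts the numbers.
        Collections.sort(encoded);
        for (String s : encoded) {
            System.out.println(s + " -> " + decode(s));
        }
    }
}
```

Fixed width matters as much as the sign flip: without padding, "9" would sort after "10".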


Erik

On Dec 16, 2008, at 9:14 PM, Ryan McKinley wrote:


I posted this same question for the same reasons a while back...
http://markmail.org/message/mji7jnpa5xjfflmw

I'm looking at local lucene and trying to figure out how it could go  
into lucene.  As is, locallucene depends on solr since it needs  
NumberUtils.


Any change of heart for moving it into lucene?




Re: [jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions

2008-12-08 Thread Erik Hatcher
Well, there's the pretty sophisticated and extensible XML query parser  
in contrib.  I've still only scratched the surface of it, but it meets  
the specs you mentioned.


Erik


On Dec 8, 2008, at 4:51 PM, robert engels wrote:

I think an important piece to make this work is the query parser/syntax.


We already have a system similar to what is outlined below.  We made  
changes to the query syntax to support our various query extensions.


The nice thing is that a persisted query is a simple string.  It  
also makes it very easy for external systems to submit queries.


We also have XML definitions for a "result set".

I think the only way to make this work though, is probably a more  
detailed query syntax (similar to SQL), so that it can be easily  
extended with new clauses/functions without breaking existing code.


I would also suggest that any core queries classes have a  
representation here.


I would also like to see a way for "proprietary" clauses to be  
supported (like calls in SQL).


On Dec 8, 2008, at 3:37 PM, eks dev wrote:

That sounds much better. Trying to distribute Lucene itself (my reason why  
all this would be interesting) is just not going to work for  
far too many applications and will put a burden on API extensions.


My point is, I do not want to distribute a Lucene index; I need to  
distribute my application that is using Lucene. Think of it like  
having a distributed Luke: useful by itself, but not really useful  
for slightly more complex use cases.
My Hit class is a specialized Lucene Hit object, my Query has totally  
different features and aggregates a Lucene Query... this is what I can  
control, what I need to send over the wire, and that is the place  
where I define what my Version/API is. If Lucene API classes change  
and all existing features remain, I have no problems keeping my  
serialized objects compatible.  So the versioning comes under my  
control; Lucene provides only features, a library.


Having a light layer, easily extensible, on top of the core API  
would be just great. As far as I am concerned, Java serialization is  
not my world; having something light and extensible in the  
Etch/Thrift/Hadoop IPC/Protocol Buffers direction is much more thrilling. That  
is exactly the road Hadoop, Nutch, Katta and probably many others  
are taking. Having a common base that supports such cases is maybe a  
good idea; why not make RemoteSearchable using Hadoop IPC, or  
Etch/Thrift...


Maybe there are other reasons to support Java serialization, I do  
not know. Just painting one view on this idea.





- Original Message 

From: Doug Cutting (JIRA) <[EMAIL PROTECTED]>
To: java-dev@lucene.apache.org
Sent: Monday, 8 December, 2008 19:52:46
Subject: [jira] Commented: (LUCENE-1473) Implement standard  
Serialization across Lucene versions



   [ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654513#action_12654513 ]

Doug Cutting commented on LUCENE-1473:
--

Would it take any more lines of code to remove Serializable from the core
classes and re-implement RemoteSearchable in a separate layer on top of the core
APIs?  That layer could be a contrib module and could get all the
Externalizable love it needs.  It could support a specific popular subset of
query and filter classes, rather than arbitrary Query implementations.  It would
be extensible, so that if folks wanted to support new kinds of queries, they
easily could.  This other approach seems like a slippery slope, complicating
already complex code with new concerns.  It would be better to encapsulate these
concerns in a layer atop APIs whose back-compatibility we already make promises
about, no?


Implement standard Serialization across Lucene versions
---

   Key: LUCENE-1473
   URL: https://issues.apache.org/jira/browse/LUCENE-1473
   Project: Lucene - Java
Issue Type: Bug
Components: Search
  Affects Versions: 2.4
  Reporter: Jason Rutherglen
  Priority: Minor
   Attachments: custom-externalizable-reader.patch, LUCENE-1473.patch,
LUCENE-1473.patch, LUCENE-1473.patch, LUCENE-1473.patch


 Original Estimate: 8h
Remaining Estimate: 8h

To maintain serialization compatibility between Lucene versions,
serialVersionUID needs to be added to classes that implement
java.io.Serializable.  java.io.Externalizable may be implemented in classes for
faster performance.
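A minimal sketch of the two mechanisms the issue mentions, using only the standard java.io API: a fixed serialVersionUID, plus an Externalizable class that writes its own compact format with an explicit format-version byte. The class (a hypothetical description of a term query) and its fields are assumptions for illustration, not Lucene code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

public class TermQuerySpec implements Externalizable {
    // Declared explicitly so recompiling the class does not change it.
    private static final long serialVersionUID = 1L;
    // Version of the hand-written wire format, checked on read.
    private static final byte FORMAT = 1;

    private String field;
    private String text;

    // Externalizable requires a public no-arg constructor.
    public TermQuerySpec() {}

    public TermQuerySpec(String field, String text) {
        this.field = field;
        this.text = text;
    }

    public String getField() { return field; }
    public String getText() { return text; }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeByte(FORMAT);
        out.writeUTF(field);
        out.writeUTF(text);
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        byte format = in.readByte();
        if (format != FORMAT) {
            throw new IOException("Unsupported TermQuerySpec format: " + format);
        }
        field = in.readUTF();
        text = in.readUTF();
    }

    // Round-trips an instance through standard Java object streams.
    static TermQuerySpec roundTrip(TermQuerySpec spec) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(spec);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (TermQuerySpec) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        TermQuerySpec copy = roundTrip(new TermQuerySpec("contents", "lucene"));
        System.out.println(copy.getField() + ":" + copy.getText()); // contents:lucene
    }
}
```

The format byte is what lets a later release keep reading older streams (or reject them clearly), which serialVersionUID alone does not provide.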


Re: Adding dependency to servlet-api

2008-11-07 Thread Erik Hatcher

Mark - I've done a quick implementation here:

  <https://issues.apache.org/jira/browse/SOLR-839>

I'm not familiar, yet, with what it takes (haven't read your  
contribution to LIA2 yet, bad, Erik, bad) to configure it - so any  
feedback you have on what might be needed beyond this is welcome:


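  public Query parse() throws ParseException {
    // Build the XML query parser from the schema's query analyzer and
    // Solr's standard query parser.
    CorePlusExtensionsParser parser = new CorePlusExtensionsParser(
        getReq().getSchema().getQueryAnalyzer(),
        getReq().getSchema().getSolrQueryParser(null));

    try {
      // The XML parser reads a stream; encode explicitly as UTF-8 rather
      // than relying on the platform default charset.
      return parser.parse(new ByteArrayInputStream(getString().getBytes("UTF-8")));
    } catch (UnsupportedEncodingException e) {
      // UTF-8 is always supported, so this should never happen
      throw new ParseException(e.getMessage());
    } catch (ParserException e) {
      // Translate the XML parser's exception into the QParser's ParseException
      throw new ParseException(e.getMessage());
    }
  }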

Erik



On Nov 5, 2008, at 5:31 AM, mark harwood wrote:

How about simply adding a query parser plugin to Solr using the  
XML query parser?


My initial concern is to make public in Lucene/contrib the demo web  
app I have just written up for Lucene In Action 2. I wanted to put  
this in Lucene/contrib rather than limit it to being code  
distributed with the book.
This aside, I think it's generally important to maintain  
documentation/demos/code and other useful resources under the core  
Lucene project for those people where Solr might not necessarily be  
the answer.


I'd be happy to help or to even go the full distance and implement  
it myself.


Adding XML query support to Solr certainly sounds like it would be a  
sensible idea. I think applications with advanced query criteria  
struggle with the constraints of standard Lucene QueryParser syntax  
or passing "flat" parameters in Solr urls.
Not sure I can commit any time to extending Solr myself but happy to  
support you with any guidance you may need on this.


Cheers,
Mark








- Original Message 
From: Erik Hatcher <[EMAIL PROTECTED]>
To: java-dev@lucene.apache.org
Sent: Wednesday, 5 November, 2008 10:48:48
Subject: Re: Adding dependency to servlet-api

Mark,

How about simply adding a query parser plugin to Solr using the XML  
query parser?   It'd be pretty short, sweet, easy, and a real value-add  
to Solr too!   I'd be happy to help or to even go the full  
distance and implement it myself.  I've considered it often, as it  
would be great to provide the breadth of query types that your  
parser can create.


   Erik




On Nov 5, 2008, at 4:16 AM, mark harwood wrote:

Just checked Solr (forgot about that obvious precedent!) and they  
have it in trunk/lib and an entry in trunk/notice.txt which reads:


  "Includes software from other Apache Software Foundation
  projects, including, but not limited to:
    - Apache Tomcat (lib/servlet-api-2.4.jar)"

I thought the servlet api was Sun's to be honest so not sure why it  
is credited to Tomcat.


I could just follow this precedent. Anyone from the Solr camp care  
to comment?





- Original Message 
From: Uwe Schindler <[EMAIL PROTECTED]>
To: java-dev@lucene.apache.org; [EMAIL PROTECTED]
Sent: Wednesday, 5 November, 2008 9:53:51
Subject: RE: Adding dependency to servlet-api

E.g. the Jetty webserver (Apache 2.0 License) ships the servlet 2.5 API in
source (SVN) and binary form along with its web container server.

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: [EMAIL PROTECTED]


From: Konstantin Priblouda [mailto:[EMAIL PROTECTED]
Sent: Wednesday, November 05, 2008 10:42 AM
To: java-dev@lucene.apache.org
Subject: Re: Adding dependency to servlet-api



[ Konstantin Pribluda http://www.pribluda.de ]
JTec quality components: http://www.pribluda.de/projects/


--- On Tue, 11/4/08, markharw00d <[EMAIL PROTECTED]> wrote:


From: markharw00d <[EMAIL PROTECTED]>
Subject: Adding dependency to servlet-api
To: java-dev@lucene.apache.org
Date: Tuesday, November 4, 2008, 11:09 PM
I'd like to add a web-based demo for the XML QueryParser
but unlike the existing web demo I'd prefer to use some
Java code that gets compiled rather than doing it all in JSP
files that aren't part of the build. Doing it this way
will add a dependency on servlet-api.jar which will need to
be added to the build somehow.
Has anyone done this on an Apache project before, and does anyone
know what the license implications are? Tomcat/Struts must
do this already, but I'm not sure what is involved.


The Geronimo project provides servlet API declarations in the m2
repository on apache.
Usually this is a good choice.




-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]








--

Re: Adding dependency to servlet-api

2008-11-05 Thread Erik Hatcher

Mark,

How about simply adding a query parser plugin to Solr using the XML  
query parser?   It'd be pretty short, sweet, easy, and a real value-add
to Solr too!   I'd be happy to help or to even go the full
distance and implement it myself.  I've considered it often, as it  
would be great to provide the breadth of query types that your parser  
can create.


Erik




On Nov 5, 2008, at 4:16 AM, mark harwood wrote:

Just checked Solr (forgot about that obvious precedent!) and they  
have it in trunk/lib and an entry in trunk/notice.txt which reads:


"  Includes software from other Apache Software Foundation projects,  
including, but not limited to:


 - Apache Tomcat (lib/servlet-api-2.4.jar)
 
"
To be honest, I thought the servlet API was Sun's, so I'm not sure why it
is credited to Tomcat.


I could just follow this precedent. Anyone from the Solr camp care  
to comment?





- Original Message 
From: Uwe Schindler <[EMAIL PROTECTED]>
To: java-dev@lucene.apache.org; [EMAIL PROTECTED]
Sent: Wednesday, 5 November, 2008 9:53:51
Subject: RE: Adding dependency to servlet-api

E.g. Jetty webserver (Apache 2.0 License) ships the servlet 2.5 API in
source (SVN) and binary form along with its web container server.

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: [EMAIL PROTECTED]


From: Konstantin Priblouda [mailto:[EMAIL PROTECTED]
Sent: Wednesday, November 05, 2008 10:42 AM
To: java-dev@lucene.apache.org
Subject: Re: Adding dependency to servlet-api



[ Konstantin Pribluda http://www.pribluda.de ]
JTec quality components: http://www.pribluda.de/projects/


--- On Tue, 11/4/08, markharw00d <[EMAIL PROTECTED]> wrote:


From: markharw00d <[EMAIL PROTECTED]>
Subject: Adding dependency to servlet-api
To: java-dev@lucene.apache.org
Date: Tuesday, November 4, 2008, 11:09 PM
I'd like to add a web-based demo for the XML QueryParser
but unlike the existing web demo I'd prefer to use some
Java code that gets compiled rather than doing it all in JSP
files that aren't part of the build. Doing it this way
will add a dependency on servlet-api.jar which will need to
be added to the build somehow.
Has anyone done this on an Apache project before, and does anyone
know what the license implications are? Tomcat/Struts must
do this already, but I'm not sure what is involved.


The Geronimo project provides servlet API declarations in the m2
repository on apache.
Usually this is a good choice.







Re: [VOTE] Release Lucene 2.4.0

2008-10-08 Thread Erik Hatcher
All tests pass (via "ant test") for me with the 2.4.0 download.  I  
have junit-4.4.jar in my ANT_HOME/lib directory.


Specifically:

[junit] Testsuite: org.apache.lucene.store.TestHugeRamFile
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 3.585 sec


Maybe an intermittent anomaly, Grant?

Erik



On Oct 7, 2008, at 9:49 PM, Grant Ingersoll wrote:

I really hate to do this, but the source tests don't compile, since  
we now rely on JUnit to be shipped w/ Lucene.


Steps:
download the source tarball
untar
ant test

Lots of compile errors.  I think we just need to package lib/junit  
with the src.  Of course, maybe not a big deal, as we didn't package  
JUnit before, but we also didn't have a lib directory before,  
either.  Not sure if it is a show stopper.


However, when I copy junit into a lib directory there, I get:

[junit] Testcase: testHugeFile(org.apache.lucene.store.TestHugeRamFile): Caused an ERROR
[junit] Java heap space
[junit] java.lang.OutOfMemoryError: Java heap space
[junit]     at java.util.Arrays.copyOf(Arrays.java:2760)
[junit]     at java.util.Arrays.copyOf(Arrays.java:2734)
[junit]     at java.util.ArrayList.ensureCapacity(ArrayList.java:167)
[junit]     at java.util.ArrayList.add(ArrayList.java:351)
[junit]     at org.apache.lucene.store.RAMFile.addBuffer(RAMFile.java:69)
[junit]     at org.apache.lucene.store.RAMOutputStream.switchCurrentBuffer(RAMOutputStream.java:129)
[junit]     at org.apache.lucene.store.RAMOutputStream.writeBytes(RAMOutputStream.java:115)
[junit]     at org.apache.lucene.store.TestHugeRamFile.testHugeFile(TestHugeRamFile.java:68)
[junit]


This passes when I run trunk on the same machine, but fails on both  
the branch and the downloaded src file.  I know I could just  
increase the memory, but it seems odd that trunk passes.


Otherwise, things look good.

So -0, I guess.


On Oct 7, 2008, at 9:55 AM, Michael McCandless wrote:



Reminder: this is a new vote (started 2 days ago) to release 2.4.0.

We still need 2 more binding (PMC) votes to release.

Mike

Michael McCandless wrote:



OK maybe 4th time's a charm ;)

Let's start a new VOTE to release these artifacts (derived from  
svn rev 701827) as Lucene 2.4.0:


http://people.apache.org/~mikemccand/staging-area/lucene2.4take4

Here's my +1.

Mike

Grant Ingersoll wrote:


+1.

On Oct 3, 2008, at 1:22 PM, Michael McCandless wrote:



OK let's try again!

Let's start a new VOTE to release these artifacts (derived from  
svn rev 701445) as Lucene 2.4.0:


http://people.apache.org/~mikemccand/staging-area/lucene2.4take3

Here's my vote: +1.

Mike

mark harwood wrote:


Hi Mike,
Given the repackaging, any chance you can sneak in 2 contrib
fixes I added recently?


Null pointer introduced to clients dropping in 2.4 upgrade  - 
http://svn.apache.org/viewvc?view=rev&revision=700815
Bug in fuzzy matching - 
http://svn.apache.org/viewvc?view=rev&revision=699512

No big deal if it's too late.




- Original Message 
From: Michael McCandless <[EMAIL PROTECTED]>
To: java-dev@lucene.apache.org
Sent: Friday, 3 October, 2008 16:48:41
Subject: Re: [VOTE] Release Lucene 2.4.0


Ugh.  I'll fix & re-spin.

Mike

Grant Ingersoll wrote:


The docs in the downloaded tarball still refer to 2.4-dev.

The doap.rdf file is (badly) out of date.



On Sep 30, 2008, at 8:30 AM, Michael McCandless wrote:



I've built the release artifacts, from revision 700430 on the 2.4
branch.  These are the changes:

http://people.apache.org/~mikemccand/staging-area/lucene2.4changes/Changes.html

Please vote to officially release these artifacts as 2.4.0:

http://people.apache.org/~mikemccand/staging-area/lucene2.4

We need at least 3 binding (PMC) votes.

Mike









--
Grant Ingersoll

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ












Re: [VOTE] Release Lucene 2.4.0

2008-10-07 Thread Erik Hatcher

+1

Erik


On Oct 7, 2008, at 9:55 AM, Michael McCandless wrote:



Reminder: this is a new vote (started 2 days ago) to release 2.4.0.

We still need 2 more binding (PMC) votes to release.

Mike

Michael McCandless wrote:



OK maybe 4th time's a charm ;)

Let's start a new VOTE to release these artifacts (derived from svn  
rev 701827) as Lucene 2.4.0:


http://people.apache.org/~mikemccand/staging-area/lucene2.4take4

Here's my +1.

Mike

Grant Ingersoll wrote:


+1.

On Oct 3, 2008, at 1:22 PM, Michael McCandless wrote:



OK let's try again!

Let's start a new VOTE to release these artifacts (derived from  
svn rev 701445) as Lucene 2.4.0:


http://people.apache.org/~mikemccand/staging-area/lucene2.4take3

Here's my vote: +1.

Mike

mark harwood wrote:


Hi Mike,
Given the repackaging, any chance you can sneak in 2 contrib
fixes I added recently?


Null pointer introduced to clients dropping in 2.4 upgrade  - 
http://svn.apache.org/viewvc?view=rev&revision=700815
Bug in fuzzy matching - 
http://svn.apache.org/viewvc?view=rev&revision=699512

No big deal if it's too late.




- Original Message 
From: Michael McCandless <[EMAIL PROTECTED]>
To: java-dev@lucene.apache.org
Sent: Friday, 3 October, 2008 16:48:41
Subject: Re: [VOTE] Release Lucene 2.4.0


Ugh.  I'll fix & re-spin.

Mike

Grant Ingersoll wrote:


The docs in the downloaded tarball still refer to 2.4-dev.

The doap.rdf file is (badly) out of date.



On Sep 30, 2008, at 8:30 AM, Michael McCandless wrote:



I've built the release artifacts, from revision 700430 on the 2.4
branch.  These are the changes:

http://people.apache.org/~mikemccand/staging-area/lucene2.4changes/Changes.html

Please vote to officially release these artifacts as 2.4.0:

http://people.apache.org/~mikemccand/staging-area/lucene2.4

We need at least 3 binding (PMC) votes.

Mike









Re: Can I filter the results returned by IndexReader.terms(field) using a field?

2008-10-06 Thread Erik Hatcher
Code in an end to the enumeration when the first term from a different
field arrives.  Different fields will not be interleaved.
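A minimal, self-contained sketch of that advice. Against a real Lucene 2.x index the loop would start from `reader.terms(new Term(field, ""))`, which positions the `TermEnum` at the first term greater than or equal to `(field, "")`, and stop as soon as the current term's field differs. So that it runs without a Lucene JAR, the sketch models the term dictionary as a `TreeSet` of a hypothetical `FieldTerm` class ordered by field, then text; `FieldTerm` and `TermsForField` are illustrative names, not Lucene classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical stand-in for a Lucene Term: ordered by field name first, then by text,
// mirroring the order in which the term dictionary is enumerated.
final class FieldTerm implements Comparable<FieldTerm> {
    final String field;
    final String text;

    FieldTerm(String field, String text) {
        this.field = field;
        this.text = text;
    }

    @Override
    public int compareTo(FieldTerm other) {
        int byField = field.compareTo(other.field);
        return byField != 0 ? byField : text.compareTo(other.text);
    }
}

final class TermsForField {
    // Seek to the first term >= (field, "") and stop at the first term of another field.
    static List<String> termsFor(NavigableSet<FieldTerm> dictionary, String field) {
        List<String> texts = new ArrayList<>();
        for (FieldTerm term : dictionary.tailSet(new FieldTerm(field, ""), true)) {
            if (!term.field.equals(field)) {
                break; // fields are never interleaved, so no later term can match
            }
            texts.add(term.text);
        }
        return texts;
    }

    public static void main(String[] args) {
        NavigableSet<FieldTerm> dictionary = new TreeSet<>();
        dictionary.add(new FieldTerm("author", "erik"));
        dictionary.add(new FieldTerm("category", "music"));
        dictionary.add(new FieldTerm("category", "books"));
        dictionary.add(new FieldTerm("title", "lucene"));
        System.out.println(termsFor(dictionary, "category")); // [books, music]
    }
}
```

Only the terms of the requested field are visited, plus the single term that ends the run, instead of the whole dictionary.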


Erik

On Oct 6, 2008, at 9:36 PM, Luis Fco. Ramirez Daza Gonzalez wrote:


Hi

I use IndexReader.terms() to get all the terms in the index and then
I iterate through the list and get only those terms for a specific  
field.
Is there a way to get the terms for a particular field? Otherwise I  
have to read all the terms in the index just to get the terms of a  
field. Something like:


TermEnum termEnum = reader.terms("");

I need this because I want to show a list of all the terms available  
in some fields, so the user can select from a list or from an  
autocomplete textbox.


Thanks in advance for your help.

Luis






[jira] Commented: (LUCENE-1061) Adding a factory to QueryParser to instantiate query instances

2008-08-28 Thread Erik Hatcher (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12626499#action_12626499
 ] 

Erik Hatcher commented on LUCENE-1061:
--

Michael - you are a machine!

+1 to the subclassing approach and your general patch.

What might be even more interesting is to make the newXXX methods return Query 
instead of a specific type.  I'm not sure if that would work in all cases 
(surely not for BooleanQuery), but might for most of 'em.

For example, what if newTermQuery(Term term) returned a Query instead of a 
TermQuery?   That'd add a fair bit more flexibility, as long as none of the 
calling code needed a specific type of Query.

The hoops we jump through because we're in Java sheesh.  :)
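A small sketch of the factory-method idea and of why the declared return type matters. `BaseQuery`, `TermLookup`, `PrefixLookup`, `ParserSketch` and `StemPrefixParser` are hypothetical stand-ins, not Lucene's `Query`, `TermQuery` or `QueryParser`: the parser routes every clause through an overridable `newTermQuery` hook, and because the hook is declared to return the supertype, a subclass can hand back an entirely different kind of query.

```java
// Hypothetical, simplified stand-ins for a Query hierarchy (not the real Lucene classes).
abstract class BaseQuery {
    abstract String describe();
}

final class TermLookup extends BaseQuery {
    final String field, text;

    TermLookup(String field, String text) {
        this.field = field;
        this.text = text;
    }

    @Override
    String describe() {
        return field + ":" + text;
    }
}

final class PrefixLookup extends BaseQuery {
    final String field, prefix;

    PrefixLookup(String field, String prefix) {
        this.field = field;
        this.prefix = prefix;
    }

    @Override
    String describe() {
        return field + ":" + prefix + "*";
    }
}

// Hypothetical parser with a factory hook. The hook returns the supertype BaseQuery;
// had it been declared to return TermLookup, an override could only return TermLookup.
class ParserSketch {
    protected BaseQuery newTermQuery(String field, String text) {
        return new TermLookup(field, text);
    }

    // Stand-in for the grammar: assumes a "field:text" clause and routes it through the hook.
    final BaseQuery parseClause(String clause) {
        int colon = clause.indexOf(':');
        return newTermQuery(clause.substring(0, colon), clause.substring(colon + 1));
    }
}

// Subclassing and overriding the hook swaps in a different query type
// without touching the parsing code.
class StemPrefixParser extends ParserSketch {
    @Override
    protected BaseQuery newTermQuery(String field, String text) {
        return text.length() > 4
            ? new PrefixLookup(field, text.substring(0, 4))
            : super.newTermQuery(field, text);
    }
}
```

For example, `new StemPrefixParser().parseClause("title:searching").describe()` returns `"title:sear*"`, while the base `ParserSketch` returns `"title:searching"`.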

> Adding a factory to QueryParser to instantiate query instances
> --
>
> Key: LUCENE-1061
> URL: https://issues.apache.org/jira/browse/LUCENE-1061
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: QueryParser
>Affects Versions: 2.3
>Reporter: John Wang
>Assignee: Michael McCandless
> Fix For: 2.4
>
> Attachments: LUCENE-1061.patch, lucene_patch.txt
>
>
> With the new efforts with Payload and scoring functions, it would be nice to 
> plug in custom query implementations while using the same QueryParser.
> Included is a patch with some refactoring of the QueryParser to take a factory 
> that produces query instances.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.





[jira] Commented: (LUCENE-1061) Adding a factory to QueryParser to instantiate query instances

2008-08-27 Thread Erik Hatcher (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12626055#action_12626055
 ] 

Erik Hatcher commented on LUCENE-1061:
--

What's wrong with just subclassing QueryParser and overriding the desired 
methods?   Either way someone wanting to provide custom Query implementations 
will be writing effectively the same code, just with more indirection with this 
method.

> Adding a factory to QueryParser to instantiate query instances
> --
>
> Key: LUCENE-1061
> URL: https://issues.apache.org/jira/browse/LUCENE-1061
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: QueryParser
>Affects Versions: 2.3
>Reporter: John Wang
> Fix For: 2.4
>
> Attachments: lucene_patch.txt
>
>
> With the new efforts with Payload and scoring functions, it would be nice to 
> plug in custom query implementations while using the same QueryParser.
> Included is a patch with some refactoring of the QueryParser to take a factory 
> that produces query instances.




[jira] Commented: (LUCENE-1343) A replacement for ISOLatin1AccentFilter that does a more thorough job of removing diacritical marks or non-spacing modifiers.

2008-08-14 Thread Erik Hatcher (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12622476#action_12622476
 ] 

Erik Hatcher commented on LUCENE-1343:
--

{quote}
Unit tests are the best way to document the many ways this thing can work.
{quote}

gets a judges' score of 11 from me.  Gold for Lance for Quote of the Day.

> A replacement for ISOLatin1AccentFilter that does a more thorough job of 
> removing diacritical marks or non-spacing modifiers.
> -
>
> Key: LUCENE-1343
> URL: https://issues.apache.org/jira/browse/LUCENE-1343
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Analysis
>Reporter: Robert Haschart
>Priority: Minor
> Attachments: normalizer.jar, UnicodeCharUtil.java, 
> UnicodeNormalizationFilter.java, UnicodeNormalizationFilterFactory.java
>
>
> The ISOLatin1AccentFilter takes Unicode characters that have diacritical 
> marks and replaces them with a version of that character with the diacritical 
> mark removed.  For example é becomes e.  However another equally valid way of 
> representing an accented character in Unicode is to have the unaccented 
> character followed by a non-spacing modifier character (like this:  é  )
> The ISOLatin1AccentFilter doesn't handle the accents in decomposed Unicode 
> characters at all. Additionally, there are some instances where a word will 
> contain what looks like an accented character that is actually considered to 
> be a separate unaccented character, such as Ł, but which, to make searching 
> easier, you want to fold onto the Latin-1 lookalike version, L.
> The UnicodeNormalizationFilter can filter out accents and diacritical marks 
> whether they occur as composed characters or decomposed characters. It can 
> also handle cases, as described above, where characters that look like they 
> have diacritics (but don't) are to be folded onto the letter that they look 
> like (Ł -> L).
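A minimal sketch of the two cases the description distinguishes, using the JDK's `java.text.Normalizer` rather than the attached patch. NFD decomposition turns both the composed é and the sequence e + combining acute into a base letter plus a combining mark, which a `\p{Mn}` (non-spacing mark) regex then strips. Letters such as Ł have no canonical decomposition, so they need an explicit lookalike table; the two-entry table and the `AccentFolder` name are illustrative assumptions, not the patch's mapping.

```java
import java.text.Normalizer;
import java.util.HashMap;
import java.util.Map;

final class AccentFolder {
    // Illustrative lookalike table for letters NFD leaves untouched.
    // A real filter would need a much larger table; these entries are assumptions.
    private static final Map<Character, Character> LOOKALIKES = new HashMap<>();
    static {
        LOOKALIKES.put('\u0141', 'L'); // capital L with stroke
        LOOKALIKES.put('\u0142', 'l'); // small l with stroke
    }

    static String fold(String input) {
        // NFD maps a composed letter and its decomposed spelling to the same sequence.
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        // Remove non-spacing combining marks (Unicode general category Mn).
        String stripped = decomposed.replaceAll("\\p{Mn}", "");
        // Fold the remaining lookalike letters one character at a time.
        StringBuilder out = new StringBuilder(stripped.length());
        for (char c : stripped.toCharArray()) {
            out.append(LOOKALIKES.getOrDefault(c, c));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(fold("caf\u00e9"));           // cafe (composed e-acute)
        System.out.println(fold("cafe\u0301"));          // cafe (e + combining acute)
        System.out.println(fold("\u0141\u00f3d\u017a")); // Lodz
    }
}
```

The first two calls show why decomposed input needs normalization before stripping; the third shows that Ł only folds because of the explicit table.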




Re: [VOTE] Break Back Compatibility "Contract" on Fieldable

2008-07-30 Thread Erik Hatcher
+1 to all three from me.  Darn you, Java, for making object-orientation
kludgey.


Erik

On Jul 30, 2008, at 8:44 AM, Grant Ingersoll wrote:


As they say, rules are meant to be broken...

For a variety of reasons, some outlined below, I (and others) would  
like us to break our back compatibility requirements and allow for  
modifying the Fieldable interface in 2.x releases with the 3.x plan  
to be to separate out write side interfaces from read side  
interfaces per Hoss' suggestion in http://lucene.markmail.org/message/77qs2pjy3inzfddj?q=Fieldable%2C+AbstractField 
.


Our reasons are based on LUCENE-1340, LUCENE-1219 and 
http://lucene.markmail.org/message/77qs2pjy3inzfddj?q=Fieldable%2C+AbstractField

Simply put, my gut says there are almost no implementations of
Fieldable "in the wild", and those that exist won't mind a few lines
of code change here and there to accommodate Fieldable changing
(since Fields really are just simple data structures and don't do
much algorithmically, except maybe LazyField).


Thus, here's the vote part:

1. We mark Fieldable as being subject to change.  We heavily  
advertise (on java-dev and java-user and maybe general) that in the  
next minor release of Lucene (2.4), Fieldable will be changing.  It  
is also marked at the top of CHANGES.txt very clearly for all the  
world to see.  Since 2.4 is probably at least a month away, I think  
this gives anyone with a pulse enough time to react.


2. We thus allow 1340 and 1219 to go forward, and maybe some others.

3. [OPTIONAL] We commit to rethinking input Documents and output  
Documents for 3.x per Hoss' design suggestions in the email thread  
above.  At a minimum, it becomes an abstract base class.
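A minimal sketch of why an abstract base class eases this kind of evolution. On the Java versions of this era, adding a method to an interface breaks every existing implementation, whereas an abstract class can gain a method with a default body and older subclasses keep compiling. `FieldBase`, `LegacyField` and `tokenCountHint` are hypothetical names, not Lucene's `Fieldable` API.

```java
// Hypothetical base class standing in for a Fieldable-like abstraction;
// the names are illustrative and are not Lucene's API.
abstract class FieldBase {
    abstract String name();

    abstract String stringValue();

    // Imagine this method was added in a later release. Because it ships with a
    // default body, subclasses written against the older release still compile and run.
    // Adding it to a Java interface (before Java 8's default methods) would instead
    // have forced every existing implementation to add the method.
    int tokenCountHint() {
        String trimmed = stringValue().trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }
}

// Written before tokenCountHint() existed: it implements only the original methods.
final class LegacyField extends FieldBase {
    private final String name;
    private final String value;

    LegacyField(String name, String value) {
        this.name = name;
        this.value = value;
    }

    @Override
    String name() {
        return name;
    }

    @Override
    String stringValue() {
        return value;
    }
}
```

`new LegacyField("title", "lucene in action").tokenCountHint()` returns 3 even though `LegacyField` never mentions the new method.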



+1 to all 3 items from me.

-Grant




Re: test

2008-07-10 Thread Erik Hatcher

I've sent remove requests for both of those addresses.

Erik


On Jul 10, 2008, at 11:18 AM, Yonik Seeley wrote:


Thanks guys, I guess I checked nabble too quickly.

It's still happening too.
Should a moderator perhaps try removing  [EMAIL PROTECTED]
or [EMAIL PROTECTED] from the list and email them to
re-subscribe after they've fixed their problems?

-Yonik

On Thu, Jul 10, 2008 at 5:21 AM, Michael McCandless
<[EMAIL PROTECTED]> wrote:


I get the same thing, and I definitely saw your post, Yonik, twice.

Mike

Karl Wettin wrote:

I think that is a bounce from a subscriber, sending the bounce to you
instead of the list. I got the same yesterday when posting to
java-users.

But looking at nabble my posts are there.

  karl

On 10 Jul 2008, at 03:23, Yonik Seeley wrote:


sorry for the noise... this is just a test to java-dev.
I'm unable to post to java-user, and trying to re-subscribe  
didn't help.
Full text of the bounce email below, in case anyone else is  
seeing this.


-Yonik

Delivered-To: [EMAIL PROTECTED]
Received: by 10.114.75.13 with SMTP id x13cs2248waa;
Wed, 9 Jul 2008 08:52:57 -0700 (PDT)
Received: by 10.143.168.4 with SMTP id v4mr2332544wfo.39.1215618777288;
Wed, 09 Jul 2008 08:52:57 -0700 (PDT)
Return-Path: <>
Received: from spwiki.spsoftware.com ([61.17.14.87])
by mx.google.com with ESMTP id
28si10149607wfd.4.2008.07.09.08.52.51;
Wed, 09 Jul 2008 08:52:57 -0700 (PDT)
Received-SPF: neutral (google.com: 61.17.14.87 is neither permitted
nor denied by best guess record for domain of spwiki.spsoftware.com)
client-ip=61.17.14.87;
Authentication-Results: mx.google.com; spf=neutral (google.com:
61.17.14.87 is neither permitted nor denied by best guess record for
domain of spwiki.spsoftware.com) smtp.mail=
Received: from localhost (localhost)
  by spwiki.spsoftware.com (8.14.2/8.14.2) id m69Fni50017713;
  Wed, 9 Jul 2008 21:19:44 +0530
Date: Wed, 9 Jul 2008 21:19:44 +0530
From: Mail Delivery Subsystem <[EMAIL PROTECTED]>
Message-Id: <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
MIME-Version: 1.0
Content-Type: multipart/report; report-type=delivery-status;
  boundary="m69Fni50017713.1215618584/spwiki.spsoftware.com"
Subject: Returned mail: see transcript for details
Auto-Submitted: auto-generated (failure)

This is a MIME-encapsulated message

--m69Fni50017713.1215618584/spwiki.spsoftware.com

The original message was received at Wed, 9 Jul 2008 21:19:38 +0530
from localhost.localdomain [127.0.0.1]

- The following addresses had permanent fatal errors -
spsoft
(reason: 550 5.1.1 User unknown)
(expanded from: root)

- Transcript of session follows -
550 5.1.1 spsoft... User unknown

--m69Fni50017713.1215618584/spwiki.spsoftware.com
Content-Type: message/delivery-status

Reporting-MTA: dns; spwiki.spsoftware.com
Received-From-MTA: DNS; localhost.localdomain
Arrival-Date: Wed, 9 Jul 2008 21:19:38 +0530

Final-Recipient: RFC822; [EMAIL PROTECTED]
X-Actual-Recipient: RFC822; [EMAIL PROTECTED]
Action: failed
Status: 5.1.1
Diagnostic-Code: X-Unix; 550 5.1.1 User unknown
Last-Attempt-Date: Wed, 9 Jul 2008 21:19:44 +0530

--m69Fni50017713.1215618584/spwiki.spsoftware.com
Content-Type: message/rfc822

Return-Path: <[EMAIL PROTECTED]>
Received: from spwiki.spsoftware.com (localhost.localdomain [127.0.0.1])
  by spwiki.spsoftware.com (8.14.2/8.14.2) with ESMTP id m69FnP51017710
  for <[EMAIL PROTECTED]>; Wed, 9 Jul 2008 21:19:38 +0530
Received: from pop.spsoftindia.com
  by spwiki.spsoftware.com with POP3 (fetchmail-6.3.8)
  for <[EMAIL PROTECTED]> (multi-drop); Wed, 09 Jul 2008 21:19:38 +0530 (IST)
Received: from mx05.mfg.onr.siteprotect.com (unknown [192.168.33.227])
  by mf20.mfg.onr.chicago.hostway (Postfix) with ESMTP id 14DF224302C9
  for <[EMAIL PROTECTED]>; Wed,  9 Jul 2008 10:51:24 -0500 (CDT)
Received: from mail.apache.org (hermes.apache.org [140.211.11.2])
  by mx05.mfg.onr.siteprotect.com (Postfix) with SMTP id A72CA55C08F
  for <[EMAIL PROTECTED]>; Wed,  9 Jul 2008 10:51:23 -0500 (CDT)
Received: (qmail 12837 invoked by uid 500); 9 Jul 2008 15:51:16 -
Mailing-List: contact [EMAIL PROTECTED]; run by ezmlm
Precedence: bulk
List-Help: 
List-Unsubscribe: 
List-Post: 
List-Id: 
Reply-To: [EMAIL PROTECTED]
Delivered-To: mailing list [EMAIL PROTECTED]
Received: (qmail 12826 invoked by uid 99); 9 Jul 2008 15:51:16 -
Received: from athena.apache.org (HELO athena.apache.org)
(140.211.11.136)
by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 09 Jul 2008 08:51:16 -0700
X-ASF-Spam-Status: No, hits=-0.0 required=10.0
  tests=SPF_PASS
X-Spam-Check-By: apache.org
Received-SPF: pass (athena.apache.org: domain of [EMAIL PROTECTED]
designates 209.85.146.176 as permitted sender)
Received: from [209.85.146.176] (HELO wa-out-1112.google.com)
(209.85.146.176)
by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 09 Jul 200

Re: Contributing towards the development of Lucene

2008-07-08 Thread Erik Hatcher


On Jul 8, 2008, at 4:20 AM, Ajay Lakhani wrote:

I am interested in contributing towards the development of Lucene.
Could anyone suggest a good way to start, or tell me how I can assign a bug
to myself to work on?


AJ - welcome!

We have a wiki page set up to answer this question: 


Lots of ways to help out, with documentation, answering user  
questions, fixing bugs in JIRA, writing more test cases, etc.   Take  
your pick :)


Erik





Re: Is there a reason MemoryIndex does not implement Serializable?

2008-06-25 Thread Erik Hatcher

No reason. Done!

Erik

On Jun 25, 2008, at 11:05 AM, Jason Rutherglen wrote:


It seems like it could; it even has serialVersionUID defined.
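A minimal sketch of what "implementing Serializable" involves, using a hypothetical `TinyIndex` class rather than MemoryIndex itself: declare the interface, keep an explicit `serialVersionUID`, and make sure every non-transient field is itself serializable. The round trip uses only `java.io` object streams; the field layout is an illustrative assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;

// Hypothetical in-memory index used only to illustrate serialization;
// these are not MemoryIndex's real fields.
final class TinyIndex implements Serializable {
    // An explicit serialVersionUID avoids relying on the compiler-sensitive
    // default value that Java would otherwise compute for the class.
    private static final long serialVersionUID = 1L;

    // Every non-transient field must itself be serializable (HashMap, String, Integer are).
    private final HashMap<String, Integer> termFreqs = new HashMap<>();

    void add(String term) {
        termFreqs.merge(term, 1, Integer::sum);
    }

    int freq(String term) {
        return termFreqs.getOrDefault(term, 0);
    }

    // Writes the index to a byte array and reads it back as a new object.
    static TinyIndex roundTrip(TinyIndex original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (TinyIndex) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("serialization round trip failed", e);
        }
    }
}
```

The copy returned by `roundTrip` is a distinct object with the same term frequencies.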






Re: formatable changes log

2008-01-27 Thread Erik Hatcher
I switched to maintaining the CHANGES file in YAML format for the  
solr-ruby library:





There is even a unit test to make sure it at least parses properly:




This makes it easily machine processable and also is nicely readable,  
thanks to YAML's much-more-pleasant-than-XML nature.


Erik



On Jan 25, 2008, at 2:05 PM, Chris Hostetter wrote:



: As it is becoming hard to browse/navigate CHANGES.txt, how about maintaining
: it in a simple HTML file?

personally, i'm a fan of simple, plain text files for the CHANGES.txt ...
easy to edit, easy to read.

that said: if people want to start using a more structured changelog file
(xml/html/whatever) i've got no problem with that ... as long as we have a
stylesheet that can render it as plaintext.

(even better in my mind would be if we could keep editing in plain text,
and had some handy scripts to reformat into HTML .. but that's obviously a
little harder to get perfect and probably not worth the effort.)
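The "keep editing in plain text, generate HTML" idea could look roughly like this minimal Java sketch. The input format it expects (numbered entries such as ` 1. LUCENE-123: ...`, indented continuation lines, and any other non-blank line treated as a section heading) is an assumption about CHANGES.txt, not a documented format, and `ChangesToHtml` is a hypothetical name.

```java
final class ChangesToHtml {
    // Escapes the characters that are significant in HTML text.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Assumed input format (not a spec):
    //   - a line like " 3. LUCENE-123: text" starts a numbered entry,
    //   - indented lines that follow continue the current entry,
    //   - any other non-blank line is a section heading.
    static String convert(String plainText) {
        StringBuilder html = new StringBuilder();
        StringBuilder entry = null; // text of the entry being collected, if any
        boolean inList = false;
        for (String line : plainText.split("\n")) {
            String trimmed = line.trim();
            boolean startsEntry = trimmed.matches("\\d+\\.\\s+.*");
            boolean continues = entry != null && !trimmed.isEmpty()
                && !startsEntry && Character.isWhitespace(line.charAt(0));
            if (continues) {
                entry.append(' ').append(trimmed);
                continue;
            }
            if (entry != null) { // any other line closes the previous entry
                html.append("<li>").append(escape(entry.toString())).append("</li>\n");
                entry = null;
            }
            if (startsEntry) {
                if (!inList) {
                    html.append("<ol>\n");
                    inList = true;
                }
                entry = new StringBuilder(trimmed.replaceFirst("^\\d+\\.\\s+", ""));
            } else if (!trimmed.isEmpty()) {
                if (inList) {
                    html.append("</ol>\n");
                    inList = false;
                }
                html.append("<h2>").append(escape(trimmed)).append("</h2>\n");
            }
        }
        if (entry != null) {
            html.append("<li>").append(escape(entry.toString())).append("</li>\n");
        }
        if (inList) {
            html.append("</ol>\n");
        }
        return html.toString();
    }
}
```

Because the plain-text file stays the source of truth, a converter like this only has to be good enough for browsing; the format assumptions would need checking against the real file.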

The other thing to keep in mind if we're going to start discussing new
ways to manage change logs is that Jira has automated changelog / release
notes generation built into it, using the issue summaries...

http://issues.apache.org/jira/browse/LUCENE?report=com.atlassian.jira.plugin.system.project:changelog-panel
http://issues.apache.org/jira/secure/ConfigureReleaseNote.jspa?projectId=12310110


...although it's not quite as verbose as our current release notes.


-Hoss





[jira] Commented: (LUCENE-1095) StopFilter should have option to incr positionIncrement after stop word

2007-12-18 Thread Erik Hatcher (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12552905
 ] 

Erik Hatcher commented on LUCENE-1095:
--

I believe that, since the first change I made (the one Steven mentioned), QueryParser
has been fixed to account for positions returned from an Analyzer. So maybe all 
is well with fixing StopFilter now.  Unit tests needed :)

> StopFilter should have option to incr positionIncrement after stop word
> ---
>
> Key: LUCENE-1095
> URL: https://issues.apache.org/jira/browse/LUCENE-1095
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Hoss Man
>
> I've seen this come up on the mailing list a few times in the last month, so 
> i'm filing a known bug/improvement around it...
> StopFilter should have an option that if set, records how many stop words are 
> "skipped" in a row, and then sets that value as the positionIncrement on the 
> "next" token that StopFilter does return.
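A minimal sketch of the requested behavior, independent of Lucene's TokenStream API. `PosToken`, `StopFilterSketch` and the `preservePositions` flag are hypothetical names: each run of skipped stop words adds its increments to the next token that is kept. Stop words at the very end of the stream are simply dropped here, which is an assumption rather than something the issue specifies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical token: a term plus its position increment relative to the previous token.
final class PosToken {
    final String term;
    final int positionIncrement;

    PosToken(String term, int positionIncrement) {
        this.term = term;
        this.positionIncrement = positionIncrement;
    }

    @Override
    public String toString() {
        return term + "+" + positionIncrement;
    }
}

final class StopFilterSketch {
    // Removes stop words. When preservePositions is true, the increments of the
    // skipped tokens are added to the increment of the next token that is returned.
    static List<PosToken> filter(List<PosToken> input, Set<String> stopWords,
                                 boolean preservePositions) {
        List<PosToken> output = new ArrayList<>();
        int skipped = 0;
        for (PosToken token : input) {
            if (stopWords.contains(token.term)) {
                skipped += token.positionIncrement; // remember the gap
                continue;
            }
            int increment = preservePositions
                ? token.positionIncrement + skipped
                : token.positionIncrement;
            output.add(new PosToken(token.term, increment));
            skipped = 0;
        }
        return output;
    }
}
```

With stop words {the, a}, the tokens `the quick the brown fox` (each with increment 1) come out as `[quick+2, brown+2, fox+1]` with the option on and `[quick+1, brown+1, fox+1]` with it off, so phrase queries can still see where the stop words were.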




Re: svn commit: r603856 - in /lucene/java/trunk/contrib/benchmark: ./ src/java/org/apache/lucene/benchmark/byTask/feeds/

2007-12-14 Thread Erik Hatcher


On Dec 13, 2007, at 12:58 AM, [EMAIL PROTECTED] wrote:

+12/13/07
+  LUCENE-1086: DocMakers setup for the "docs.dir" property
+  fixed to properly handle absolute paths. (Shai Erera via Doron  
Cohen)

+


I haven't looked at the details of this beyond the commit messages  
that went by, but if this is because of an Ant property that is  
sending in a relative versus absolute path, you can ensure the path  
is always absolute by using :




instead of using the  variant.   If  
the file="..." value is not already absolute, it'll make it absolute  
based on the base directory.
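On the Java side, the same rule (an absolute path stays absolute, a relative one is resolved against a base directory) can be applied when a path property such as `docs.dir` is read. A minimal sketch using `java.nio.file.Path.resolve`, which returns its argument unchanged when that argument is already absolute; `DocsDirResolver` and the choice of base directory are assumptions, not the benchmark's actual DocMaker code.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class DocsDirResolver {
    // Resolves a configured directory (for example the value of a "docs.dir" property)
    // against a base directory. Path.resolve returns its argument unchanged when that
    // argument is already absolute; otherwise it appends it to the base.
    static Path resolve(String baseDir, String configured) {
        return Paths.get(baseDir).resolve(configured).normalize();
    }

    public static void main(String[] args) {
        System.out.println(resolve("/work", "docs"));
        System.out.println(resolve("/work", "/data/docs"));
    }
}
```

On a Unix-like system the two calls in `main` print `/work/docs` and `/data/docs`.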


(couldn't help myself) After a quick peek at the contrib/benchmark/build.xml,
this may be the culprit:


   

use file instead of value, and you can always assume the path is  
absolute from that default, and mandate anyone overriding that  
property specify an absolute path.  And there are some other  
properties (the ones pointing to JAR files, for example) that should  
be using the file variant as well.


My rule is that all Ant properties that point to any file or  
directory use the "file" variant of .


And another candidate for refactoring in the benchmark build.xml is  
this:


   


The relative path can be converted into an absolute path using two  
 elements instead of one .   My  
rule here is to avoid using "line", rather:


   
   

No need to use ${basedir} for that second argument as that is  
implicit in using  if the value is not already an  
absolute path.


Erik





[jira] Commented: (LUCENE-167) [PATCH] QueryParser not handling queries containing AND and OR

2007-12-06 Thread Erik Hatcher (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12549046
 ] 

Erik Hatcher commented on LUCENE-167:
-

The PrecedenceQueryParser is in the contrib/miscellaneous codebase (in Lucene's 
repo) and in the released "miscellaneous" JAR.  But it has some issues that are 
documented in the test case, so it is definitely not ready for prime time.  

> [PATCH] QueryParser not handling queries containing AND and OR
> --
>
> Key: LUCENE-167
> URL: https://issues.apache.org/jira/browse/LUCENE-167
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: QueryParser
>Affects Versions: unspecified
> Environment: Operating System: Linux
> Platform: PC
>    Reporter: Morus Walter
>Assignee: Erik Hatcher
> Attachments: LuceneTest.java, QueryParser.jj.patch, QueryParser.patch
>
>
> The QueryParser does not seem to handle boolean queries containing AND and OR
> operators correctly:
> e.g.
> a AND b OR c AND d gets parsed as +a +b +c +d.
> The attached patch fixes this by changing the vector of boolean clauses into a
> vector of vectors of boolean clauses in the addClause method of the query
> parser. A new sub-vector is created whenever an explicit OR operator is used.
> Queries using explicit AND/OR are grouped by precedence of AND over OR. That
> is, a OR b AND c gets a OR (b AND c).
> Queries using implicit AND/OR (depending on the default operator) are handled
> as before (so one can still use a +b -c to create one boolean query, where b is
> required, c forbidden and a optional).
> It's less clear how a query using both explicit AND/OR and implicit operators
> should be handled.
> Since the patch groups on explicit OR operators a query 
> a OR b c is read as a (b c)
> whereas
> a AND b c as +a +b c
> (given that default operator or is used).
> There's one issue left:
> The old query parser reads a query
> `a OR NOT b' as `a -b' which is the same as `a AND NOT b'.
> The modified query parser reads this as `a (-b)'.
> While this looks better (at least to me), it does not produce the result of
> a OR NOT b. Instead the (-b) part seems to be silently dropped.
> While I understand that this query is illegal (just searching for one negative
> term) I don't think that silently dropping this part is an appropriate way to
> deal with that. But I don't think that's a query parser issue.
> The only question is whether the query parser should take care of that.
> I attached the patch (made against 1.3rc3 but working for 1.3final as well)
> and a test program.
> The test program parses a number of queries with the default-or and default-and
> operators and reparses the result of the toString method of the created query.
> It outputs the initial query, the parsed query with default or, the reparsed
> query, the parsed query with default and, and its reparsed query.
> If called with a -q option, it also runs the queries against an index
> consisting of all documents containing one or none of a, b, c or d.
> Using an unpatched and a patched version of Lucene in the classpath, one can
> look at the effect of the patch in detail.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
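A minimal, stdlib-only sketch of the grouping the patch describes: an explicit OR starts a new sub-list, so AND binds tighter than OR. This is not the patch and not QueryParser's API; `PrecedenceGrouping`, `group` and `render` are hypothetical names, and the sketch deliberately ignores NOT, +/- prefixes and implicit operators.

```java
import java.util.ArrayList;
import java.util.List;

public class PrecedenceGrouping {

    // Splits a query made only of terms and explicit AND/OR into OR-separated
    // groups; terms joined by AND stay in the same group ("vector of vectors").
    static List<List<String>> group(String query) {
        List<List<String>> groups = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String token : query.trim().split("\\s+")) {
            if (token.equals("OR")) {
                groups.add(current);          // explicit OR: start a new sub-list
                current = new ArrayList<>();
            } else if (!token.equals("AND")) {
                current.add(token);           // AND keeps adding to the current group
            }
        }
        groups.add(current);
        return groups;
    }

    // Renders groups in a Lucene-like toString() style: each AND group becomes a
    // set of required (+) clauses, and the groups themselves are OR'ed together.
    static String render(List<List<String>> groups) {
        StringBuilder sb = new StringBuilder();
        boolean wrap = groups.size() > 1;     // a lone AND group needs no parentheses
        for (List<String> g : groups) {
            if (sb.length() > 0) sb.append(' ');
            if (g.size() == 1) {
                sb.append(g.get(0));
                continue;
            }
            if (wrap) sb.append('(');
            for (int i = 0; i < g.size(); i++) {
                if (i > 0) sb.append(' ');
                sb.append('+').append(g.get(i));
            }
            if (wrap) sb.append(')');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(render(group("a AND b OR c AND d"))); // (+a +b) (+c +d)
        System.out.println(render(group("a OR b AND c")));       // a (+b +c)
    }
}
```

Compare the first line with the unpatched parse of `a AND b OR c AND d` quoted in the issue, `+a +b +c +d`.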





Re: Web-based Luke

2007-11-12 Thread Erik Hatcher


On Nov 12, 2007, at 1:21 PM, mark harwood wrote:

I'm putting together a Google Web Toolkit-based version of Luke:
   http://www.inperspective.com/lucene/Luke.war
( Just add your version of lucene core jar to WEB-INF/lib  
subdirectory and you should have the basis of a web-enabled Luke.)


Mark: +1   Wow!  Very nice.

The intention behind this is to port Luke to a wholly Apache- 
licensed codebase so it can be managed in Lucene's subversion  
repository  (and for me to learn GWT!).


RDD (Resume Driven Development) at its finest!

Early results are encouraging so I would like to consider how to  
handle this moving forward.


The considerations are:
1) Are folks interested in bringing this into the Lucene project?


Absolutely.


2) Where to manage it (in contrib?)


Seems like a fine place to put it for now.  But it really deserves a  
better home than that.  What about a new "client/luke" directory?   
(following on Solr's structure).


3) What needs to change in the build process to take GWT source  
(Java code) and feed it through the GWT compiler to produce  
Javascript/html etc?


Can't be much.


4) How to package it in the distribution (bundle Jetty?)


Yeah, that'd be nice.  Exactly how Solr does it.

In MVC terms, having separated the Model code from the (thinlet- 
based) View code I now also have the basis for building a Swing- 
based UI too on the same backend.


This is very nice, Mark.  This would surely plug into Solr's admin UI  
very well also.


Erik





[jira] Commented: (LUCENE-1049) Simple toString() for BooleanFilter

2007-11-09 Thread Erik Hatcher (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12541497
 ] 

Erik Hatcher commented on LUCENE-1049:
--

Jason - the patch looks like it is generated backwards (minus signs, not 
plusses).  

> Simple toString() for BooleanFilter
> ---
>
> Key: LUCENE-1049
> URL: https://issues.apache.org/jira/browse/LUCENE-1049
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/*
>Reporter: Jason Calabrese
>Priority: Trivial
> Attachments: patch.txt
>
>
> While working with BooleanFilter I wanted a basic toString() for debugging.
> This is what I came up.  It works ok for me.




Fwd: lucene indexing and merge process

2007-10-18 Thread Erik Hatcher
Forwarding this to java-dev per request.  Seems like the best place  
to discuss this topic.


Erik


Begin forwarded message:


From: "John Wang" <[EMAIL PROTECTED]>
Date: October 17, 2007 5:43:29 PM EDT
To: [EMAIL PROTECTED]
Subject: lucene indexing and merge process

Hi Erik:

We are revamping our search system here at LinkedIn, and we are  
using Lucene.


One issue we ran across is that we store a UID in Lucene which  
we map to the DB storage. So given a docid, to look up its UID, we  
have the following solutions:


1) Index it as a stored field and get it from reader.document() (very  
slow if recall is large)
2) Load/warm up the FieldCache (for a large corpus, loading up the  
IndexReader can be slow)
3) Construct it using the FieldCache and persist it on disk  
every time the index changes (not suitable for real-time indexing,  
i.e. this process will degrade as the number of documents grows)


None of the above solutions turn out to be adequate for our  
requirements.


 What we ended up doing is modifying Lucene code, changing the  
SegmentReader, DocumentWriter, and FieldWriter classes to take  
advantage of the Lucene segment/merge process, e.g.:


 For each segment, we store a .udt file, which is an int[]  
array, (by changing the FieldWriter class)


 And SegmentReader will load the .udt file into an array.

 And merge happens seamlessly.

 Because of the tight encapsulation around these classes (e.g.  
private and final methods), it is very difficult to extend Lucene  
without branching into our own version. Is there a way we can  
open up and make these classes extensible? We'd be happy to  
contribute what we have done.


 I guess to tackle the problem from a different angle: is there  
a way to incorporate FieldCache into the segments (it is strictly  
in memory now), and build disk versions while indexing?



 Hope I am making sense.

I did not send this out to the mailing list because I wasn't  
sure if this is a dev question or a user question; feel free to  
either forward it to the right mailing list or let me know and I  
can forward it.



Thanks

-John
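A hypothetical, in-memory sketch of the per-segment UID idea described above: each segment keeps an int[] indexed by its segment-local docid, a top-level docid is resolved by finding the segment that contains it, and a merge concatenates the arrays while dropping deleted documents, the same way a segment merge renumbers survivors. `SegmentUidMap` and its methods are illustrative names, not Lucene classes, and nothing here touches the .udt file format.

```java
/** Hypothetical sketch: one int[] of UIDs per segment, indexed by local docid. */
public class SegmentUidMap {
    private final int[][] segmentUids; // segmentUids[s][localDoc] == UID
    private final int[] docBase;       // first top-level docid of segment s

    SegmentUidMap(int[][] segmentUids) {
        this.segmentUids = segmentUids;
        this.docBase = new int[segmentUids.length];
        int base = 0;
        for (int s = 0; s < segmentUids.length; s++) {
            docBase[s] = base;              // segments are laid out back to back
            base += segmentUids[s].length;
        }
    }

    /** Resolves a top-level docid to its UID without loading stored fields. */
    int uid(int doc) {
        for (int s = 0; s < segmentUids.length; s++) {
            int local = doc - docBase[s];
            if (local >= 0 && local < segmentUids[s].length) {
                return segmentUids[s][local];
            }
        }
        throw new IllegalArgumentException("docid out of range: " + doc);
    }

    /** Merging appends the arrays in segment order and drops deleted docs. */
    static int[] merge(int[][] segments, boolean[][] deleted) {
        int live = 0;
        for (int s = 0; s < segments.length; s++)
            for (int d = 0; d < segments[s].length; d++)
                if (!deleted[s][d]) live++;
        int[] merged = new int[live];
        int i = 0;
        for (int s = 0; s < segments.length; s++)
            for (int d = 0; d < segments[s].length; d++)
                if (!deleted[s][d]) merged[i++] = segments[s][d];
        return merged;
    }
}
```

Because each array is written and merged alongside its segment, the lookup cost does not depend on re-reading the whole index after every change.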







Re: contrib build.xml and java 1.5 in tests

2007-10-15 Thread Erik Hatcher
Set the properties *before* importing contrib-build.xml.  Ant  
properties are first come, first served, so to "override" them you  
have to be sure to set the value you want before the import, interestingly.


Erik
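The first-come-first-served rule can be modelled in a few lines. This is a hypothetical illustration, not Ant's API: a map whose setter ignores names that are already defined, which is why only a value set before the import wins. The property name `javac.source` is an assumption about what the imported build file defines.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of Ant's "first definition wins" property rule. */
public class FirstWinsProperties {
    private final Map<String, String> props = new HashMap<>();

    /** Like a property definition: a no-op if the name is already set. */
    void define(String name, String value) {
        props.putIfAbsent(name, value);
    }

    String get(String name) {
        return props.get(name);
    }

    public static void main(String[] args) {
        // Imported file first: its default sticks and the later "override" is ignored.
        FirstWinsProperties tooLate = new FirstWinsProperties();
        tooLate.define("javac.source", "1.4"); // default from the imported file
        tooLate.define("javac.source", "1.5"); // set after the import
        System.out.println(tooLate.get("javac.source")); // 1.4

        // Own value first: the imported default is the one that gets ignored.
        FirstWinsProperties early = new FirstWinsProperties();
        early.define("javac.source", "1.5");
        early.define("javac.source", "1.4");
        System.out.println(early.get("javac.source")); // 1.5
    }
}
```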

On Oct 15, 2007, at 11:31 AM, Karl Wettin wrote:

I have problems getting this build.xml of mine to accept java 1.5  
in tests. It compiles the source with generics and all that just  
fine, but the tests fail with an error saying javac -source 1.4 is set.  
Well..




  
Extended spell checker with phrase support and adaptive user  
session analysis.

  

  

  
  




What more do I need to do? I see that gdata use 1.5, but I can't  
figure out what it does that I do not.




--
karl




Re: Exceptions in TestConcurrentMergeScheduler

2007-10-04 Thread Erik Hatcher

Mike,

Would it work to have a common LuceneTestCase base class that could  
do that check and fail() in tearDown?


Erik


On Oct 4, 2007, at 5:31 AM, Michael McCandless wrote:



OK, I think I found one possibility here.  With ant's junit task, you
can define a custom formatter implementing this interface:

  org.apache.tools.ant.taskdefs.optional.junit.JUnitResultFormatter

That interface has a method endTestSuite that is invoked once at the
end of all the test cases.  So I can define a custom formatter, and
in this method, I can check with ConcurrentMergeScheduler and if any
unhandled exceptions have occurred, I can throw an exception and the
suite/testcase is marked as failed.  It seems to work.

This is a nice solution in that we don't have to modify every unit
test to do its own checking.  However, it's not really a "normal" use
case because formatters are supposed to just "format" the test result
output.  It also adds a dependence from Lucene's unit test sources to
ant.  But at least it does work ("progress not perfection").

Any objections to this approach?  Is there a better approach?

Mike

"Michael McCandless" <[EMAIL PROTECTED]> wrote:


"Chris Hostetter" <[EMAIL PROTECTED]> wrote:


: But it'd be nice to do this across the board, ie, for any junit test
: if one of CMS's threads (or, threads launched elsewhere) hits an
: unhandled exception, fail the testcase that's currently running.
: I'll dig and see if there's some central way to do this with junit...


FYI: I did some casual investigation of this and the only thing that
jumped out at me is the static
Thread.setDefaultUncaughtExceptionHandler(...) added in 1.5.  For 1.4
there doesn't seem to be a generic way to notice an uncaught exception
from any thread.


Thanks Hoss.

Catching the exception is actually not the hard part because I "own"
all the threads spawned by ConcurrentMergeScheduler.  What's tricky is
finding a way to force the currently running JUnit testcase or suite
to fail.  I'm digging through JUnit and ant's JUnitTestRunner sources
to see if there's some hook somewhere where we could insert a check,
just before the suite finishes, to assert that no exceptions were hit.

Or, if I can somehow "look up" the current Test that's running, I
could add an error to it.

If there are any JUnit and ant experts out there (I'm not!) please
chime in!

Mike
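A minimal sketch combining Hoss's pointer (`Thread.setDefaultUncaughtExceptionHandler`, Java 1.5+) with Erik's suggestion of a check in a shared base class's tearDown. `UncaughtExceptionTracker` and its methods are hypothetical names; the JUnit wiring is left out so the block stays stdlib-only. The default handler only sees threads that do not install a handler of their own.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper: records exceptions that escape any thread. */
public class UncaughtExceptionTracker {
    private static final List<Throwable> failures =
            Collections.synchronizedList(new ArrayList<>());

    /** Installs the JVM-wide default handler, e.g. once from a base class's setUp(). */
    static void install() {
        Thread.setDefaultUncaughtExceptionHandler((thread, error) -> failures.add(error));
    }

    /** Meant for tearDown(): throws AssertionError if any thread died, then resets. */
    static void assertNoUncaughtExceptions() {
        synchronized (failures) {
            if (failures.isEmpty()) {
                return;
            }
            Throwable first = failures.get(0);
            int count = failures.size();
            failures.clear();
            AssertionError err = new AssertionError(
                    count + " uncaught exception(s) in background threads; first: " + first);
            err.initCause(first);   // keep the original stack trace attached
            throw err;
        }
    }
}
```

Throwing from tearDown is enough to mark the running test as failed, which sidesteps the need to "look up" the current Test from another thread.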




Re: whoops ... i may have screwed up the site.

2007-10-03 Thread Erik Hatcher
Hoss - that could be my fault.  I had updated both the Lucene and  
Solr sites in svn yesterday, and while logged in to svn up the Solr  
site I also did that for Lucene (even though I knew Grant's nightly  
script was to update it also).


Looks like all is well now.

Is Hudson publishing the site?   I was under the impression (per the  
wiki documentation) that Grant's nightly script updated it.


Erik


On Oct 3, 2007, at 3:21 AM, Chris Hostetter wrote:



I forgot that Hudson now generates the site as part of the nightly  
build.  I did a manual "svn update" on  
people.apache.org:/www/lucene.apache.org/java/docs and when I got some  
conflicts and warnings I still didn't remember, and did an "svn revert -R ."  
to eliminate any local changes and get a pristine checkout from  
subversion.


Grant (et al):  I'm not sure if I've screwed things up in a way  
that the Hudson script won't be able to handle.  At this point I  
think I'm just going to leave well enough alone.


One thing we may want to consider: if Hudson pushes the site  
(instead of doing an svn checkout directly on people.a.o), we may  
want to remove all the .svn directories, so if other people make  
stupid mistakes like I did it won't do anything except complain  
that the working directory isn't a subversion working copy.




-Hoss





Re: Questions Lucene

2007-09-10 Thread Erik Hatcher


On Sep 10, 2007, at 7:56 PM, [EMAIL PROTECTED] wrote:
   1) What languages are supported by Lucene?  It looks like it's  
able to handle only English.  We are trying to see if it works with  
Japanese / Chinese and other characters.

Can someone answer?


Lucene internally uses UTF-8 (the Java modified version) so you won't  
have any encoding issues.  And everything is just text inside the  
index, so no problem with Chinese, Japanese, or any other language  
I've encountered - but certainly there are language-specific  
considerations such as stemming, stop word removal, and whether to do  
anything special to tokenize on "words" in non-whitespace-separated  
languages such as Chinese or use n-gramming, or just simple character  
tokenization.
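One of the options mentioned, n-gramming, can be sketched as overlapping character bigrams: text without whitespace word boundaries is indexed as every pair of adjacent characters, and queries are bigrammed the same way so they match. This is a hypothetical illustration, not a Lucene Analyzer; it works on Java chars, so supplementary characters are not handled.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of overlapping character bigrams for CJK-style text. */
public class BigramSketch {
    static List<String> bigrams(String text) {
        List<String> out = new ArrayList<>();
        if (text.length() == 1) {
            out.add(text);                        // a single character is its own token
            return out;
        }
        for (int i = 0; i + 1 < text.length(); i++) {
            out.add(text.substring(i, i + 2));    // each adjacent pair becomes a term
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(bigrams("北京大学")); // [北京, 京大, 大学]
    }
}
```

Simple character tokenization, the other option mentioned, is the same loop with `substring(i, i + 1)`; bigrams usually give more precise matches at the cost of more terms.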


   2) After Lucene indexes a given data set, how does Lucene  
handle incremental / dynamic changes in the data?  In other words,  
our data keeps changing; how does Lucene handle this changing data?  
Does it re-index every new file entering this data set, or does it  
index the data in increments?


There is really no such thing as an "update" operation, so the  
application is responsible for effecting that with a delete and re- 
add on a per-document basis.
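The delete-then-re-add pattern can be illustrated with a toy model. This is hypothetical and not Lucene's API: documents are append-only and deletions only mark them, so the application "updates" a document by deleting the old version through a unique id field it maintains itself, then adding the new version. `DeleteThenAdd`, `deleteByTerm`, `update` and `lookup` are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical toy model of an application-level "update" = delete + re-add. */
public class DeleteThenAdd {
    private final List<Map<String, String>> docs = new ArrayList<>();
    private final List<Boolean> deleted = new ArrayList<>();

    void add(Map<String, String> doc) {
        docs.add(doc);                    // documents are only ever appended
        deleted.add(false);
    }

    /** Marks every live document whose field equals value as deleted. */
    int deleteByTerm(String field, String value) {
        int n = 0;
        for (int i = 0; i < docs.size(); i++) {
            if (!deleted.get(i) && value.equals(docs.get(i).get(field))) {
                deleted.set(i, true);
                n++;
            }
        }
        return n;
    }

    /** The application's update: delete by unique id, then add the new version. */
    void update(String idField, Map<String, String> newDoc) {
        deleteByTerm(idField, newDoc.get(idField));
        add(newDoc);
    }

    /** Returns a field of the live document with the given id, or null. */
    String lookup(String idField, String id, String field) {
        for (int i = 0; i < docs.size(); i++) {
            if (!deleted.get(i) && id.equals(docs.get(i).get(idField))) {
                return docs.get(i).get(field);
            }
        }
        return null;
    }

    int liveCount() {
        int n = 0;
        for (boolean d : deleted) if (!d) n++;
        return n;
    }
}
```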


  3) How does Lucene handle deleted files from a particular  
data set?  What we are concerned about is whether Lucene  
automatically figures out that a particular file has been deleted  
from the data set, and immediately removes that file from the index.  
   4) Please consider the following scenario, where Lucene  
is given the following files to index:


 a) Files under /xyz/abc (say x.txt, y.txt, a.txt, b.txt,  
c.txt, etc.)
 b) Files under /def/ghi (say none.txt, dude.txt,  
hello.txt, etc.)
   After Lucene has finished indexing the files under these  
two directories, a search is made for a keyword in  
hello.txt.  What does Lucene return?  Does it return the fully  
qualified location of this file, i.e. /def/ghi/hello.txt?


Lucene is about text, not files per se.  It is your application that  
will map that kind of logic on top of Lucene.  Lucene itself knows  
nothing of the files you want to index, delete, search - you will  
build that mapping in yourself.  Your application will be responsible  
for keeping data and the index in sync.
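A hypothetical sketch of that application-side bookkeeping: the application remembers, per indexed document, the file path and the modification time it saw (for example in stored fields), compares that with what is on disk, and derives which documents to delete and which files to (re)index. `IndexSyncPlan` and its inputs are illustrative; how the two maps are obtained is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: decide what to delete and (re)add to keep an index in sync. */
public class IndexSyncPlan {
    final List<String> toDelete = new ArrayList<>();
    final List<String> toAdd = new ArrayList<>();

    /**
     * @param onDisk  path -> last-modified time currently on disk
     * @param indexed path -> last-modified time recorded when the file was indexed
     */
    static IndexSyncPlan plan(Map<String, Long> onDisk, Map<String, Long> indexed) {
        IndexSyncPlan p = new IndexSyncPlan();
        for (Map.Entry<String, Long> e : indexed.entrySet()) {
            Long current = onDisk.get(e.getKey());
            if (current == null || !current.equals(e.getValue())) {
                p.toDelete.add(e.getKey());   // file removed or changed: drop the stale doc
            }
        }
        for (Map.Entry<String, Long> e : onDisk.entrySet()) {
            Long seen = indexed.get(e.getKey());
            if (seen == null || !seen.equals(e.getValue())) {
                p.toAdd.add(e.getKey());      // new or changed file: (re)index it
            }
        }
        return p;
    }
}
```

Storing the path as a field is also what lets a search hit for hello.txt be mapped back to /def/ghi/hello.txt.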


   5) How does Lucene index a particular set of files, i.e.  
*based* on keywords?  Based on sentences?  Based on what criterion?


Again, it doesn't deal with "files"... your application deals with  
that, Lucene is handed text.  As for how it makes words in text  
searchable - read up on Lucene Analyzers.  They break the text into  
searchable terms.
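A rough illustration of what an analyzer does, under stated assumptions: split text into tokens, normalize them (here, lowercase) and drop stop words. Real Lucene Analyzers are TokenStream/TokenFilter chains; `TinyAnalyzer` is a hypothetical stand-in and its stop list is made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch of an analysis chain: tokenize, lowercase, drop stop words. */
public class TinyAnalyzer {
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "the", "of", "to"));

    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {  // split on non letters/digits
            String term = raw.toLowerCase(Locale.ROOT);
            if (!term.isEmpty() && !STOP_WORDS.contains(term)) {
                terms.add(term);                            // this becomes a searchable term
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(analyze("The Quick brown fox, and the lazy dog.")); // [quick, brown, fox, lazy, dog]
    }
}
```

The same analysis has to be applied to query text, otherwise "Quick" in a query would not match the indexed term "quick".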


  6) Is Lucene multi-threaded?  For example, if Lucene is  
indexing a set of files in a given data set and there is a huge  
file (a 2 GB file), does Lucene index this file in  
parts, i.e. in parallel in a multi-threaded fashion, or  
does it index this file sequentially?


Lucene isn't multi-threaded, but most operations are thread-safe  
so you can parallelize your application to index multiple documents  
simultaneously, for example.  You may be able to parallelize the  
parsing of those huge files but you'd need to bring that together  
into a single Document instance to hand to Lucene's IndexWriter.
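A minimal sketch of application-side parallelism, assuming (as described above) that the shared writer can be called from several threads: inputs are parsed on a thread pool and each finished "document" is handed to one shared sink. The `Consumer` sink is a stand-in for a single shared IndexWriter and `parse` for whatever text extraction the application does; both, and the `ParallelIndexing` name, are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;
import java.util.function.Function;

/** Hypothetical sketch: parse in parallel, hand every result to one shared sink. */
public class ParallelIndexing {
    static <I, D> void indexAll(List<I> inputs, Function<I, D> parse,
                                Consumer<D> sink, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (I input : inputs) {
                // Each task parses one input and passes the complete document on.
                pending.add(pool.submit(() -> sink.accept(parse.apply(input))));
            }
            for (Future<?> f : pending) {
                f.get();                  // rethrows any parse/indexing failure
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

Splitting a single huge file across threads would need an extra step to join the parts back into one document before calling the sink.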


 7) Also, if a data set has multiple files, does Lucene process  
each file separately in a different thread, or does it do it  
sequentially?


Again, this is up to your application entirely.

 8) Does Lucene index only text files?  We have a few databases;  
is it possible for us to index the data in these databases?


See above :)   All Lucene cares about is text.  How you get text to  
it matters not to Lucene.



 9) Are there any performance benchmarks for Lucene?


There is a benchmarker framework built into the trunk codebase  
suitable for making your own.  There's some stuff here:  
http://lucene.apache.org/java/docs/benchmarks.html  and some good stuff  
linked from http://wiki.apache.org/lucene-java/BasicsOfPerformance  
that should get you started.


Erik





Re: [Lucene-java Wiki] Update of "HowToContribute" by GrantIngersoll

2007-08-08 Thread Erik Hatcher

Uh, never mind!   I see that is in the "Please do not:" section :)

Erik


On Aug 8, 2007, at 8:07 AM, Erik Hatcher wrote:

Don't we discourage @author tags? I can't recall where the  
Lucene project sits on this issue, but it certainly has been  
debated and acted upon in many other ASF projects.


Erik



On Aug 7, 2007, at 6:27 PM, Apache Wiki wrote:


Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-java  
Wiki" for change notification.


The following page has been changed by GrantIngersoll:
http://wiki.apache.org/lucene-java/HowToContribute

- 
-

   * comment out code that is now obsolete: just remove it.
   * insert comments around each change, marking the change: folks  
can use subversion to figure out what's changed and by whom.

   * make things public which are not required by end users.
+  * Add @author tags to your code.  Give yourself credit in the  
CHANGES.txt file.


  Please do:
   * try to adhere to the coding style of files you edit;






Re: [Lucene-java Wiki] Update of "HowToContribute" by GrantIngersoll

2007-08-08 Thread Erik Hatcher
Don't we discourage @author tags? I can't recall where the Lucene  
project sits on this issue, but it certainly has been debated and  
acted upon in many other ASF projects.


Erik



On Aug 7, 2007, at 6:27 PM, Apache Wiki wrote:


Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-java  
Wiki" for change notification.


The following page has been changed by GrantIngersoll:
http://wiki.apache.org/lucene-java/HowToContribute

-- 


   * comment out code that is now obsolete: just remove it.
   * insert comments around each change, marking the change: folks  
can use subversion to figure out what's changed and by whom.

   * make things public which are not required by end users.
+  * Add @author tags to your code.  Give yourself credit in the  
CHANGES.txt file.


  Please do:
   * try to adhere to the coding style of files you edit;






[jira] Assigned: (LUCENE-961) RegexCapabilities is not Serializable

2007-07-18 Thread Erik Hatcher (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Hatcher reassigned LUCENE-961:
---

Assignee: Erik Hatcher

> RegexCapabilities is not Serializable
> -
>
> Key: LUCENE-961
> URL: https://issues.apache.org/jira/browse/LUCENE-961
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: QueryParser
>Affects Versions: 2.2
>Reporter: Konrad Rokicki
>    Assignee: Erik Hatcher
>Priority: Minor
>
> The class RegexQuery is marked Serializable by its super class, but it 
> contains a RegexCapabilities which is not Serializable. Thus attempting to 
> serialize the query results in an exception. 
> Making RegexCapabilities serializable should be no problem since its 
> subclasses contain only serializable classes (java.util.regex.Pattern and 
> org.apache.regexp.RE).
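A stdlib-only reproduction of the problem, using hypothetical stand-ins rather than the Lucene classes: a Serializable query-like object holding a helper whose class is not Serializable fails with `NotSerializableException`, and marking the helper Serializable fixes it because its only state, a `java.util.regex.Pattern`, is itself Serializable.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.regex.Pattern;

/** Hypothetical stand-ins reproducing a non-Serializable field in a Serializable class. */
public class SerializableFieldDemo {
    interface Capabilities {
        boolean match(String text);
    }

    static class PlainCapabilities implements Capabilities {           // not Serializable
        private final Pattern pattern;
        PlainCapabilities(String regex) { pattern = Pattern.compile(regex); }
        public boolean match(String text) { return pattern.matcher(text).matches(); }
    }

    static class SerializableCapabilities implements Capabilities, Serializable {
        private static final long serialVersionUID = 1L;
        private final Pattern pattern;                                  // Pattern is Serializable
        SerializableCapabilities(String regex) { pattern = Pattern.compile(regex); }
        public boolean match(String text) { return pattern.matcher(text).matches(); }
    }

    /** Stands in for RegexQuery, which is Serializable through its superclass. */
    static class RegexQueryLike implements Serializable {
        private static final long serialVersionUID = 1L;
        final Capabilities capabilities;
        RegexQueryLike(Capabilities capabilities) { this.capabilities = capabilities; }
    }

    /** Returns true if the object graph serializes, false on NotSerializableException. */
    static boolean serializes(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```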




Fwd: Call for Papers Opens for OS Summit Asia 2007

2007-06-15 Thread Erik Hatcher



Begin forwarded message:

From: J Aaron Farr <[EMAIL PROTECTED]>


Call for Papers Opens for OS Summit Asia 2007

The call for papers is now open for OS Summit Asia, to be held
November 26-30 at the Cyberport in Hong Kong.  This joint conference
between the Apache Software Foundation and the Eclipse Foundation will
consist of two days of tutorials (Nov 26-27) and three days of
regular conference sessions (Nov 28-30).

The paper submission deadline is Friday, 13 July, 2007, Midnight PDT.

You may log in to the ApacheCon submission site to submit your
proposals.  Further details about the conference, submissions, and
fees can be found at:

  http://www.ossummit.com/cfp.html

Topics appropriate for submission include, but are not restricted to,
the following:

 * ASF-wide projects such as Apache HTTP server, Tomcat, Struts,
   Geronimo, mod_perl and XML Web Services

 * Eclipse-wide projects such as BI and Reporting Tools (BIRT), Web
   Tools Platform (WTP), Eclipse Modeling Framework (EMF), Data Tools
   Platform (DTP), Equinox and the Rich Client Platform (RCP)

 * Programming languages such as Java, Perl, Python, Ruby and PHP

 * Web development technologies and techniques including security,
   performance tuning, e-commerce and J2EE

 * New technologies and trends such as Web Services and Web 2.0

 * Open source community and business models, legal and marketing
   issues

 * Open source projects and activities in Asia, local efforts and case
   studies

Thanks and we hope to hear from you, and see you in Hong Kong!

--
The OSSummit Planners
[EMAIL PROTECTED]






[jira] Commented: (LUCENE-898) contrib/javascript is not packaged into releases

2007-06-01 Thread Erik Hatcher (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12500715
 ] 

Erik Hatcher commented on LUCENE-898:
-

It may still work ok, but my hunch is that changes to the QueryParser have made 
this javascript code more deprecated than anything.  

Even if we removed it from svn, it would still be there in the repository history 
in case anyone really needed it.   

Again, I am +1 for removing it entirely after running it by the java-user list 
to see if anyone desires it.

> contrib/javascript is not packaged into releases
> 
>
> Key: LUCENE-898
> URL: https://issues.apache.org/jira/browse/LUCENE-898
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Build
>Reporter: Hoss Man
>Priority: Trivial
>
> the contrib/javascript directory is (apparently) a collection of javascript 
> utilities for lucene .. but it has no build files or any mechanism to 
> package it, so it is excluded from releases.




[jira] Commented: (LUCENE-898) contrib/javascript is not packaged into releases

2007-05-31 Thread Erik Hatcher (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12500453
 ] 

Erik Hatcher commented on LUCENE-898:
-

My vote is to remove the javascript contrib area entirely.  It doesn't really 
do all that much useful.  I'd be surprised if anyone really uses it.

> contrib/javascript is not packaged into releases
> 
>
> Key: LUCENE-898
> URL: https://issues.apache.org/jira/browse/LUCENE-898
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Build
>Reporter: Hoss Man
>Priority: Trivial
>
> the contrib/javascript directory is (apparently) a collection of javascript 
> utilities for lucene .. but it has no build files or any mechanism to 
> package it, so it is excluded from releases.




Re: svn commit: r543076 - in /lucene/java/trunk: ./ src/java/org/apache/lucene/index/ src/site/src/documentation/content/xdocs/ src/test/org/apache/lucene/index/

2007-05-31 Thread Erik Hatcher


On May 31, 2007, at 3:48 AM, [EMAIL PROTECTED] wrote:
+ 7. LUCENE-866: Adds multi-level skip lists to the posting lists. This speeds
+    up most queries that use skipTo(), especially on big indexes with large
+    posting lists. For average AND queries the speedup is about 20%, for
+    queries that contain very frequence and very unique terms the speedup can
+    be over 80%.
+    (Michael Busch)


Minor typo frequence => frequent.

Erik
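To illustrate the technique the CHANGES entry describes, here is a hypothetical in-memory sketch of multi-level skipping over a sorted posting list of doc ids. Level L has an entry every interval^(L+1) postings; `skipTo` jumps along the sparsest level first, drops to finer levels when it would overshoot, and finishes with a short scan. Lucene's on-disk skip format is different; `MultiLevelSkipper` and its fields are illustrative only.

```java
/** Hypothetical in-memory sketch of multi-level skipping over sorted doc ids. */
public class MultiLevelSkipper {
    static final int NO_MORE_DOCS = -1;

    private final int[] postings;  // sorted, ascending doc ids
    private final int[] steps;     // steps[L] = interval^(L+1): level L has an entry every steps[L] postings
    private int pos = -1;          // index of the current posting; -1 means "before the first"
    int examined;                  // postings compared so far, to show how few are touched

    MultiLevelSkipper(int[] postings, int interval, int maxLevels) { // interval must be >= 2
        this.postings = postings;
        int levels = 0;
        for (long step = interval; levels < maxLevels && step < postings.length; step *= interval) {
            levels++;
        }
        steps = new int[levels];
        long step = interval;
        for (int level = 0; level < levels; level++, step *= interval) {
            steps[level] = (int) step;
        }
    }

    /** Advances past the current posting to the first one whose doc id is >= target. */
    int skipTo(int target) {
        // Top-down: on each level, hop from entry to entry while the entry's doc is < target.
        for (int level = steps.length - 1; level >= 0; level--) {
            int s = steps[level];
            for (int next = (pos / s + 1) * s; next < postings.length; next += s) {
                examined++;
                if (postings[next] >= target) {
                    break;                  // would overshoot: drop to the next finer level
                }
                pos = next;                 // every posting up to here is < target
            }
        }
        // Finish with a linear scan from the last entry we landed on.
        for (pos++; pos < postings.length; pos++) {
            examined++;
            if (postings[pos] >= target) {
                return postings[pos];
            }
        }
        return NO_MORE_DOCS;
    }
}
```

With 1000 postings (doc ids 0, 3, 6, ...) and an interval of 8, `skipTo(1500)` compares 20 postings instead of the roughly 500 a linear scan would touch.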






Re: Documentation Brainstorming

2007-05-30 Thread Erik Hatcher


On May 30, 2007, at 9:33 PM, Grant Ingersoll wrote:

I'd rather see each jar get its own javadoc,
or at the very least, indicate which jar each
class is defined in for the ones that aren't
part of the core.



Yeah, I don't like that all the contribs are built in together.   
What do others think?  I would vote for separating them out.


I concur with having the contrib docs separated.  I may have been the  
one (or at least assisted with it) who got the documentation build to  
fold it all together, as that was the goal at the time.  It'd be much  
easier, build-wise, if all artifacts were kept entirely separate for  
all the various contrib libraries and the core, as well as the demo.


Erik





[jira] Commented: (LUCENE-885) clean up build files so contrib tests are run more easily

2007-05-29 Thread Erik Hatcher (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12499740
 ] 

Erik Hatcher commented on LUCENE-885:
-

PQP (PrecedenceQueryParser) was a hack I made long ago, mainly to show how QP 
(QueryParser) could possibly be improved. I'm fine with that class being removed 
altogether, or the failing tests commented out.  I don't use that class personally.

> clean up build files so contrib tests are run more easily
> -
>
> Key: LUCENE-885
> URL: https://issues.apache.org/jira/browse/LUCENE-885
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Build
>Reporter: Hoss Man
>Assignee: Hoss Man
> Attachments: LUCENE-885.patch, LUCENE-885.patch
>
>
> Per mailing list discussion...
> http://www.nabble.com/Tests%2C-Contribs%2C-and-Releases-tf3768924.html#a10655448
> Tests for contribs should be run when "ant test" is used, and the existing "test" 
> target renamed to "test-core"




Re: DO NOT REPLY [Bug 4568] - new IndexReader.terms(myterm) skips over first term

2007-05-25 Thread Erik Hatcher

Please use JIRA, not Bugzilla.

I thought our Bugzilla was disabled?  If not, shouldn't it be?

Erik


On May 25, 2007, at 7:28 AM, [EMAIL PROTECTED] wrote:


DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND
INSERTED IN THE BUG DATABASE.

http://issues.apache.org/bugzilla/show_bug.cgi?id=4568


[EMAIL PROTECTED] changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |




--- Additional Comments From [EMAIL PROTECTED]  2007-05-25 04:28 ---
Please see my code below: I create an index with two documents. When using
terms() I get all terms; when using terms(new Term(...)) I only get one. I'm
using lucene-2.1.0. The code produces the following output on my machine:

INFO: term(): contents:London
INFO: term(): unused:foobar
INFO: term(new Term()): unused:foobar

Code:

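import java.util.logging.Level;
import java.util.logging.Logger;

import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermEnum;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

// FIELD_NAME ("contents", per the output above) and LOGGER (a
// java.util.logging.Logger) are declared elsewhere in the reporter's class.
Directory store = new RAMDirectory();
IndexWriter writer = new IndexWriter(store, new WhitespaceAnalyzer(), true);

Document doc1 = new Document();
doc1.add(new Field(FIELD_NAME, "London", Field.Store.YES, Field.Index.TOKENIZED));
writer.addDocument(doc1);

Document doc2 = new Document();
doc2.add(new Field("unused", "foobar", Field.Store.YES, Field.Index.TOKENIZED));
writer.addDocument(doc2);

writer.optimize();
writer.close();

IndexReader indexReader = null;
TermEnum termEnum = null;

try
{
    indexReader = IndexReader.open(store);

    // terms() starts *before* the first term, so next() must be called first.
    termEnum = indexReader.terms();
    while (termEnum.next()) { LOGGER.log(Level.INFO, "term(): " + termEnum.term()); }

    // terms(Term) is already positioned on the first term >= the given term,
    // so calling next() before reading term() skips that term ("contents:London").
    termEnum = indexReader.terms(new Term(FIELD_NAME, ""));
    while (termEnum.next()) { LOGGER.log(Level.INFO, "term(new Term()): " + termEnum.term()); }
}
finally
{
    if (indexReader != null) { indexReader.close(); }
}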


--
Configure bugmail: http://issues.apache.org/bugzilla/userprefs.cgi?tab=email

--- You are receiving this mail because: ---
You are the assignee for the bug, or are watching the assignee.




[jira] Commented: (LUCENE-889) Standard tokenizer with punctuation output

2007-05-25 Thread Erik Hatcher (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12499085
 ] 

Erik Hatcher commented on LUCENE-889:
-

This patch concerns me.  This changes default behavior in a very basic and 
commonly used piece of Lucene.  At the very least this should be made entirely 
optional and off by default.  

Thoughts?

> Standard tokenizer with punctuation output
> --
>
> Key: LUCENE-889
> URL: https://issues.apache.org/jira/browse/LUCENE-889
> Project: Lucene - Java
>  Issue Type: Improvement
>Affects Versions: 2.1
>Reporter: Karl Wettin
>Priority: Trivial
> Attachments: standard.patch, test.patch
>
>
> This patch adds punctuation (comma, period, question mark and exclamation 
> point)  tokens as output from the StandardTokenizer, and filters them out in 
> the StandardFilter.
> (I needed them for text classification reasons.)
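A hypothetical sketch of the "optional and off by default" suggestion: punctuation tokens are emitted only when a caller opts in, so existing callers see unchanged output. This is a deliberately tiny splitter, not StandardTokenizer; the class name and constructor flag are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical tokenizer where punctuation output is opt-in. */
public class OptionalPunctuationTokenizer {
    private final boolean emitPunctuation;

    OptionalPunctuationTokenizer() {
        this(false);                        // default: behave exactly as before
    }

    OptionalPunctuationTokenizer(boolean emitPunctuation) {
        this.emitPunctuation = emitPunctuation;
    }

    List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetterOrDigit(c)) {
                word.append(c);
                continue;
            }
            if (word.length() > 0) {        // any other char ends the current word
                tokens.add(word.toString());
                word.setLength(0);
            }
            if (emitPunctuation && ",.?!".indexOf(c) >= 0) {
                tokens.add(String.valueOf(c)); // only the four marks the patch mentions
            }
        }
        if (word.length() > 0) {
            tokens.add(word.toString());
        }
        return tokens;
    }
}
```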




[jira] Commented: (LUCENE-874) Automatic reopen of IndexSearcher/IndexReader

2007-05-03 Thread Erik Hatcher (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12493570
 ] 

Erik Hatcher commented on LUCENE-874:
-

Do note that Solr can be embedded: http://wiki.apache.org/solr/EmbeddedSolr
And there are improvements to this in the works too.

> Automatic reopen of IndexSearcher/IndexReader
> -
>
> Key: LUCENE-874
> URL: https://issues.apache.org/jira/browse/LUCENE-874
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: João Fonseca
>Priority: Minor
>
> To improve performance, a single instance of IndexSearcher should be used. 
> However, if the index is updated, it's hard to close/reopen it, because 
> multiple threads may be accessing it at the same time.
> Lucene should include an out-of-the-box solution to this problem. Either a 
> new class should be implemented to manage this behaviour (singleton 
> IndexSearcher, plus detection of a modified index, plus safely closing and 
> reopening the IndexSearcher) or this could be handled behind the scenes by the 
> IndexSearcher class.
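A minimal sketch of the behaviour the issue asks for, under stated assumptions: one shared instance (such as a searcher) is reference counted, so after a reopen the old instance is closed only when the last thread using it releases it. `SharedReopener`, `Lease` and the `Supplier` factory are hypothetical placeholders, not Lucene API, and detecting that the index changed is left to the caller.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical reference-counted holder for a shared, reopenable resource. */
public class SharedReopener<T extends AutoCloseable> {
    /** A borrowed instance; close() it exactly once to give it back. */
    public final class Lease implements AutoCloseable {
        private final Ref ref;
        private Lease(Ref ref) { this.ref = ref; }
        public T get() { return ref.value; }
        @Override public void close() throws Exception { decRef(ref); }
    }

    private final class Ref {
        final T value;
        final AtomicInteger count = new AtomicInteger(1); // the manager's own reference
        Ref(T value) { this.value = value; }
    }

    private final Supplier<T> opener;
    private Ref current;

    public SharedReopener(Supplier<T> opener) {
        this.opener = opener;
        this.current = new Ref(opener.get());
    }

    /** Borrows the current instance; every acquire must be paired with Lease.close(). */
    public synchronized Lease acquire() {
        current.count.incrementAndGet();
        return new Lease(current);
    }

    /** Swaps in a freshly opened instance, e.g. after the caller detects an index change. */
    public void reopen() throws Exception {
        Ref old;
        synchronized (this) {
            old = current;
            current = new Ref(opener.get());
        }
        decRef(old); // drop the manager's reference; closes now only if nobody holds a lease
    }

    private void decRef(Ref ref) throws Exception {
        if (ref.count.decrementAndGet() == 0) {
            ref.value.close();
        }
    }
}
```

Threads use it as `try (SharedReopener<X>.Lease l = manager.acquire()) { ... l.get() ... }`, so in-flight searches keep working on the old instance while new ones see the reopened one.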




Fwd: Call for Papers Opens for ApacheCon US 2007

2007-04-16 Thread Erik Hatcher

The one valid use of cross-posting...

Begin forwarded message:


From: Rich Bowen <[EMAIL PROTECTED]>
Date: April 16, 2007 10:50:54 AM EDT
To: [EMAIL PROTECTED]
Subject: Call for Papers Opens for ApacheCon US 2007
Reply-To: [EMAIL PROTECTED]
Reply-To: [EMAIL PROTECTED]

PMCs, please send this announcement to your various users@ and  
devs@ mailing lists, as appropriate for your particular community.  
Remember, your project can only be represented at ApacheCon if your  
community submits talk proposals:






Call for Papers Opens for ApacheCon US 2007

The Call for Papers is now open for ApacheCon US, to be held  
November 12-16 at the Peachtree Westin, Atlanta. The conference  
will consist of two days of tutorials (November 12-13) and three  
days of regular conference sessions (November 14-16).


Please log in to the website at http://apachecon.com/html/login.html
to submit your proposal. Further details, including fees, are
available on the CFP form.


Topics appropriate for submission to this conference are manifold,  
and may include but are not restricted to:


* ASF projects
* ASF-Incubated projects
* Scripting languages and dynamic content such as Java, Perl, Python, Ruby, XSL, and PHP
* New technologies and broader initiatives such as Web Services and Web 2.0
* Security and e-commerce, performance tuning, load balancing, and high availability
* Business and community issues surrounding the ASF and Open Source

The paper submission deadline is Monday, 28 April 2007, Midnight GMT.

Thanks, and we hope to hear from you, and to see you in Atlanta.
--
The ApacheCon Planners
[EMAIL PROTECTED]






Re: Caching in QueryFilter - why?

2007-04-04 Thread Erik Hatcher
CachingWrapperFilter came along after QueryFilter.  I think I added  
CachingWrapperFilter when I realized that every Filter should have  
the capability to be cached without having to implement it.  So, the  
only reason is "legacy".  I'm perfectly fine with removing the  
caching from QueryFilter in a future major release.
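The general pattern looks roughly like the plain-Java sketch below. `DocFilter` and `CachingWrapper` are hypothetical stand-ins for Lucene's Filter and CachingWrapperFilter. Keying the cache per reader in a `WeakHashMap` is an assumption about how such a wrapper would typically work, not a copy of the Lucene source:

```java
import java.util.BitSet;
import java.util.Map;
import java.util.WeakHashMap;

/** Hypothetical stand-in for a Filter: computes matching doc ids for a reader. */
interface DocFilter {
    BitSet bits(Object reader);
}

/** Wraps any filter and caches its result per reader, so filters needn't cache themselves. */
final class CachingWrapper implements DocFilter {
    private final DocFilter inner;
    // Weak keys: once a reader is discarded, its cached BitSet can be garbage collected too.
    private final Map<Object, BitSet> cache = new WeakHashMap<>();

    CachingWrapper(DocFilter inner) { this.inner = inner; }

    @Override
    public synchronized BitSet bits(Object reader) {
        BitSet cached = cache.get(reader);
        if (cached == null) {          // first request for this reader: compute and remember
            cached = inner.bits(reader);
            cache.put(reader, cached);
        }
        return cached;
    }
}
```

Because the caching lives in the wrapper, a filter like QueryFilter would not need its own copy of this code.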


Erik

On Apr 4, 2007, at 5:57 PM, Otis Gospodnetic wrote:


Hi,

I'm looking at LUCENE-853, so I also looked at CachingWrapperFilter  
and then at QueryFilter.  I noticed QueryFilter does its own BitSet  
caching, and the caching part of its code is nearly identical to  
the code in CachingWrapperFilter.


Why is that?  Is there a good reason for that?

Thanks,
Otis
 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Simpy -- http://www.simpy.com/  -  Tag  -  Search  -  Share









Re: publish to maven-repository

2007-04-04 Thread Erik Hatcher


On Apr 4, 2007, at 4:33 PM, Otis Gospodnetic wrote:
Eh, missing Jars in the Maven repo again.  Why does this always get  
dropped?


Because none of us Lucene committers care much about Maven?  :)

Perhaps it's time to keep a lucene-core.pom in our repo, rename it  
at release time (e.g. cp lucene-core.pom lucene-core-2.1.0.pom) and  
push the core jar + core POM out?


I don't know the Maven specifics, but I'm all for us maintaining the  
Maven POM file and bundling it with releases that get pushed to the  
repos.


Erik





Re: [jira] Updated: (LUCENE-843) improve how IndexWriter uses RAM to buffer added documents

2007-03-23 Thread Erik Hatcher


On Mar 22, 2007, at 8:13 PM, Marvin Humphrey wrote:

On Mar 22, 2007, at 3:18 PM, Michael McCandless wrote:


Actually is #2 a hard requirement?


A lot of Lucene users depend on having document number correspond  
to age, I think.  ISTR Hatcher at least recommending techniques  
that require it.


I may have recommended it only as "if you can guarantee you index in  
age order then you can ... ", but given FunctionQuery's ability to  
rank based on age given a date field per document, it's not needed.   
Guaranteeing any kind of order of document insertion (given updates  
that delete and re-add) is not really possible in most cases anyway.
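The sketch below ranks by a per-document date instead of by document number. It applies a reciprocal decay, a / (m * x + b), to each document's age. This is the shape of Solr's recip() function. The constants are illustrative assumptions: m is roughly one over the number of milliseconds in a year, and a = b = 1. `RecencyBoost` is a hypothetical helper, not FunctionQuery's API:

```java
/** Hypothetical recency score computed from a per-document date, independent of insertion order. */
final class RecencyBoost {
    /** Reciprocal decay a / (m * x + b). */
    static double recip(double x, double m, double a, double b) {
        return a / (m * x + b);
    }

    /** Age is now minus the document's date; newer documents score closer to 1.0. */
    static double score(long nowMillis, long docMillis) {
        double ageMillis = Math.max(0, nowMillis - docMillis);
        // m = 3.16e-11 is about 1 / (milliseconds per year), so a one-year-old doc scores about 0.5
        return recip(ageMillis, 3.16e-11, 1.0, 1.0);
    }
}
```

A document dated today scores 1.0 and a one-year-old document scores about 0.5, regardless of the order in which they were added to the index.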


Erik





[jira] Assigned: (LUCENE-707) Lucene Java Site docs

2007-03-20 Thread Erik Hatcher (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Hatcher reassigned LUCENE-707:
---

Assignee: Erik Hatcher  (was: Grant Ingersoll)

> Lucene Java Site docs
> -
>
> Key: LUCENE-707
> URL: https://issues.apache.org/jira/browse/LUCENE-707
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Website
> Environment: N/A
>Reporter: Grant Ingersoll
> Assigned To: Erik Hatcher
>Priority: Minor
> Attachments: lucene.apache.org.patch
>
>
> It would be really nice if the Java site docs were consistent with the rest 
> of the Lucene family (namely, with navigation tabs, etc.) so that one can 
> easily go between Nutch, Hadoop, etc.






[jira] Closed: (LUCENE-707) Lucene Java Site docs

2007-03-20 Thread Erik Hatcher (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Hatcher closed LUCENE-707.
---


Applied, thanks George!

> Lucene Java Site docs
> -
>
> Key: LUCENE-707
> URL: https://issues.apache.org/jira/browse/LUCENE-707
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Website
> Environment: N/A
>Reporter: Grant Ingersoll
> Assigned To: Grant Ingersoll
>Priority: Minor
> Attachments: lucene.apache.org.patch
>
>
> It would be really nice if the Java site docs were consistent with the rest 
> of the Lucene family (namely, with navigation tabs, etc.) so that one can 
> easily go between Nutch, Hadoop, etc.






[jira] Commented: (LUCENE-446) FunctionQuery - score based on field value

2007-03-18 Thread Erik Hatcher (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12481944
 ] 

Erik Hatcher commented on LUCENE-446:
-

+1 to FunctionQuery being brought into Lucene proper.

> FunctionQuery - score based on field value
> --
>
> Key: LUCENE-446
> URL: https://issues.apache.org/jira/browse/LUCENE-446
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: Search
>Affects Versions: 1.9
>Reporter: Yonik Seeley
> Attachments: function.zip, function.zip
>
>
> FunctionQuery can return a score based on a field's value or on its ordinal 
> value.
> FunctionFactory subclasses define the details of the function.  There is 
> currently a LinearFloatFunction (a line specified by slope and intercept).
> Field values are typically obtained from FieldValueSourceFactory.  
> Implementations include FloatFieldSource, IntFieldSource, and OrdFieldSource.
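The plain-Java sketch below simplifies the composition described above. A value source supplies a per-document float, and a linear function maps it to a score via slope and intercept. The class names echo the issue, but these are hypothetical stand-ins, not the classes in the attached function.zip:

```java
/** Hypothetical source of per-document values. */
interface ValueSource {
    float floatVal(int doc);
}

/** Per-document float values held in an array, indexed by document id. */
final class FloatFieldSource implements ValueSource {
    private final float[] values;
    FloatFieldSource(float[] values) { this.values = values; }
    @Override public float floatVal(int doc) { return values[doc]; }
}

/** score(doc) = slope * value(doc) + intercept, i.e. a line over the field's values. */
final class LinearFloatFunction implements ValueSource {
    private final ValueSource source;
    private final float slope;
    private final float intercept;

    LinearFloatFunction(ValueSource source, float slope, float intercept) {
        this.source = source;
        this.slope = slope;
        this.intercept = intercept;
    }

    @Override public float floatVal(int doc) {
        return slope * source.floatVal(doc) + intercept;
    }
}
```

Because the function itself is a ValueSource, functions can be nested, and a query only needs to ask the outermost one for each document's score.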






Re: Using FindBugs, JLint, or PMD?

2007-03-16 Thread Erik Hatcher
However, Fortify runs automated analysis of Lucene and many other  
codebases:




Search Nabble or Google for more details from Brian Chess on this forum  
if you're curious.


Erik


On Mar 16, 2007, at 11:09 AM, Otis Gospodnetic wrote:


I don't think we use any of those tools.

Otis

- Original Message 
From: Sung Kim <[EMAIL PROTECTED]>
To: java-dev@lucene.apache.org
Sent: Thursday, March 15, 2007 10:54:52 PM
Subject: Using FindBugs, JLint, or PMD?

Dear developers,

I'm a software researcher at MIT. We are developing an algorithm to
reprioritize warnings from FindBugs, JLint, and PMD using the software
change history. I was wondering whether you (or your project) use any
bug-finding tools, such as FindBugs, JLint, or PMD, in the Lucene
development cycle.

Thanks in advance.
Sung Kim <[EMAIL PROTECTED]>












Re: [jira] Commented: (LUCENE-794) SpanScorer and SimpleSpanFragmenter for Contrib Highlighter

2007-03-14 Thread Erik Hatcher


On Mar 14, 2007, at 2:17 PM, Mark Miller (JIRA) wrote:
Just for thought, what about a SpanOr query with two sub Span  
queries that target different fields? Too obscure to care about?


You later mentioned that this works.  Really?!  Are you sure?

  public SpanOrQuery(SpanQuery[] clauses) {
    // copy clauses array into an ArrayList
    this.clauses = new ArrayList(clauses.length);
    for (int i = 0; i < clauses.length; i++) {
      SpanQuery clause = clauses[i];
      if (i == 0) {                               // check field
        field = clause.getField();
      } else if (!clause.getField().equals(field)) {
        throw new IllegalArgumentException("Clauses must have same field.");
      }
      this.clauses.add(clause);
    }
  }








Re: Visibility of RegexCapabilities interface?

2007-03-06 Thread Erik Hatcher


On Mar 6, 2007, at 2:27 PM, Chris Hostetter wrote:

:PFRegexCapabilities.java:3:
: org.apache.lucene.search.regex.RegexCapabilities is not public in
: org.apache.lucene.search.regex; cannot be accessed from outside package


It looks like this *may* have been a mistake, the commit message says
"Many javadoc additions, and adding ASL to each file" ...

http://svn.apache.org/viewvc/lucene/java/trunk/contrib/regex/src/java/org/apache/lucene/search/regex/RegexCapabilities.java?r1=359526&r2=381108


...but one other interface changed from public to package-protected
at the same time, RegexQueryCapable ... so it's not clear to me
whether this was a conscious choice to "protect" the API or not.

Erik?


For some reason I had been thinking that interfaces were always  
public, but I've remedied my ignorance (all methods are implicitly  
public, but the interface itself is not).  I've just committed a  
change to make these two interfaces public - no reason not to.   
Interestingly, one of them was already public locally, which confused  
me a bit as well.
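The small, self-contained example below illustrates the rule. The interface name is hypothetical. A top-level interface declared without `public` is package-private, while its methods are implicitly public. Reflection confirms both:

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// No 'public' modifier: a top-level interface declared like this is package-private,
// so code in other packages cannot refer to it (the kind of compile error quoted above).
interface PackagePrivateCapable {
    // Interface methods are implicitly public (and abstract) even without the keyword.
    void compile(String pattern);
}

final class VisibilityCheck {
    static boolean interfaceIsPublic() {
        return Modifier.isPublic(PackagePrivateCapable.class.getModifiers());
    }

    static boolean methodIsPublic() {
        for (Method m : PackagePrivateCapable.class.getDeclaredMethods()) {
            if (m.getName().equals("compile")) return Modifier.isPublic(m.getModifiers());
        }
        return false;
    }
}
```

Adding `public` to the interface declaration is what makes it usable from outside its package.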


Anyway, all public now!

Sorry for the confusion, Mike.

Erik







Re: simple query

2007-03-02 Thread Erik Hatcher


On Mar 1, 2007, at 11:58 PM, Gaurav Srivastava wrote:

When I parse a string containing numbers in Lucene, the numbers are  
removed: they are missing from the output of query.toString().


Please suggest a hint for this problem.


analyze your analyzer:  
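The question doesn't name the analyzer, so the following is an assumption. One common cause is an analyzer that tokenizes on letters only. Lucene's SimpleAnalyzer works this way (LetterTokenizer-style splitting), so digits are treated as separators and disappear. StandardAnalyzer keeps numbers as tokens. The plain-Java sketch below mimics just those two splitting rules. `AnalyzerCheck` is hypothetical and does not call Lucene:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

final class AnalyzerCheck {
    /** Emits maximal runs of characters accepted by 'keep', lowercased. */
    static List<String> tokens(String text, IntPredicate keep) {
        List<String> out = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (keep.test(c)) {
                cur.append(Character.toLowerCase(c));
            } else if (cur.length() > 0) {   // a rejected char ends the current token
                out.add(cur.toString());
                cur.setLength(0);
            }
        }
        if (cur.length() > 0) out.add(cur.toString());
        return out;
    }

    /** Letter-only splitting, similar in spirit to SimpleAnalyzer: digits vanish. */
    static List<String> letterOnly(String text) {
        return tokens(text, Character::isLetter);
    }

    /** Keeps digits as well, so "2007" survives as a token. */
    static List<String> lettersAndDigits(String text) {
        return tokens(text, Character::isLetterOrDigit);
    }
}
```

Running both on the same input quickly shows whether the analyzer is the step that drops the numbers.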







Re: commited docs vs wiki -- was: Re: [jira] Commented: (LUCENE-805) New Lucene Demo

2007-02-24 Thread Erik Hatcher


On Feb 24, 2007, at 1:24 AM, Chris Hostetter wrote:



: think we could move more to the Wiki.  One solution would be to have
: a simple script that calls wget (or some crawler) and downloads all
: of the wiki.  It would, however, be better if the wiki supported

yeah .. that's a fairly crude approach that would result in a lot of the
useless navigation links being left in place -- and we'd need something
somewhat custom to deal with changing the site links to point to the local
files


wget can change links to point locally.  For my upcoming Lucene/Solr  
workshop next week, I include the entire Solr wiki in the .zip I make  
available to the attendees.  From my Rakefile:


task :fetch_wiki do
  system("wget -P#{STAGE_DIR}/preconf -p --convert-links -r http://wiki.apache.org/solr")
end

Erik







Re: [jira] Commented: (LUCENE-805) New Lucene Demo

2007-02-15 Thread Erik Hatcher


On Feb 15, 2007, at 2:33 PM, Grant Ingersoll wrote:
I think that code is great and is often self-documenting, but it  
only represents the result of someone writing the code, it doesn't  
explain the why part of it.  So, it would need English (and  
translations???) to explain why a particular approach was taken,  
along with possible alternatives.


So how about following the Solr lead with incredible wiki  
documentation and tutorial stuff.  The code could be contributed and  
then documented by the community.


I'd have no problem copy/pasting sections of LIA that were relevant  
and useful.  Or writing some stuff from scratch on the examples.


Erik





On Feb 15, 2007, at 2:25 PM, Erik Hatcher (JIRA) wrote:



[ https://issues.apache.org/jira/browse/LUCENE-805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12473480 ]


Erik Hatcher commented on LUCENE-805:
-

That was my concern as well, Grant.  At least the LIA code is  
fairly well self documenting (we used JUnit for a reason :) and  
the build file itself is a nice example of how to launch  
applications and examples from a common starting point.


What other documentation would be needed to make this palatable?


New Lucene Demo
---

Key: LUCENE-805
URL: https://issues.apache.org/jira/browse/LUCENE-805

Project: Lucene - Java
 Issue Type: Improvement
 Components: Examples
   Reporter: Grant Ingersoll
Assigned To: Grant Ingersoll
   Priority: Minor

The much maligned demo, while useful, could use a breath of fresh  
air.  This issue is to start collecting requirements about what  
people would like to see in a demo and what they don't like in  
the current one.

Ideas (not necessarily in order of importance):
1. More in-depth tutorial explaining indexing/searching
2. Multilingual support/demonstration
3. Better demonstration of querying capabilities: Spans, Phrases, Wildcards, Filters, sorting, etc.
4. Dealing with different content types and pointers to resources
5. Wiki use cases links -- I think it would be cool to solicit people to contribute use cases to the docs.
6. Demonstration of contrib packages, esp. Highlighter
7. Performance issues/factors/tradeoffs.  Lucene lessons learned and best practices

Advanced tutorials:
1. Hadoop + Lucene
2. Writing custom analyzers/filters/tokenizers
3. Changing Scoring
4. Payloads (when they are committed)
Please contribute what else you would like to see.  I may be able  
to address some of these issues for my ApacheCon talk, but not  
all of them.







--
Grant Ingersoll
http://www.grantingersoll.com/
http://www.paperoftheweek.com/









Re: [VOTE] release Lucene 2.1

2007-02-15 Thread Erik Hatcher


On Feb 15, 2007, at 12:10 PM, karl wettin wrote:
I would not mind introducing Maven builds in Lucene. It would  
solve /at least/ this problem. And it would merge so well with my  
other projects. :) I'd be happy to help out, but there is some  
wicked anting going on in a lot of the build.xml files, so I would probably  
need a lot of help from the contributors understanding what's going on.


Most of the build scripts could be halfway housed using the  
maven-antrun-plugin.


I'm open to Maven builds, for the record.  I'll do what I can to help  
with understanding any of the wicked anting in there, but I don't  
know Maven so the best I'll be able to do is explain what is going  
on.   The main complexity we have is the contrib area, oh and  
JavaCC... the rest is straightforward stuff.


Erik





[jira] Commented: (LUCENE-805) New Lucene Demo

2007-02-15 Thread Erik Hatcher (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12473480
 ] 

Erik Hatcher commented on LUCENE-805:
-

That was my concern as well, Grant.  At least the LIA code is fairly well self 
documenting (we used JUnit for a reason :) and the build file itself is a nice 
example of how to launch applications and examples from a common starting 
point.  

What other documentation would be needed to make this palatable?

> New Lucene Demo
> ---
>
> Key: LUCENE-805
> URL: https://issues.apache.org/jira/browse/LUCENE-805
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Examples
>Reporter: Grant Ingersoll
> Assigned To: Grant Ingersoll
>Priority: Minor
>
> The much maligned demo, while useful, could use a breath of fresh air.  This 
> issue is to start collecting requirements about what people would like to see 
> in a demo and what they don't like in the current one.
> Ideas (not necessarily in order of importance):
> 1. More in-depth tutorial explaining indexing/searching
> 2. Multilingual support/demonstration
> 3. Better demonstration of querying capabilities: Spans, Phrases, Wildcards, 
> Filters, sorting, etc.
> 4. Dealing with different content types and pointers to resources
> 5. Wiki use cases links -- I think it would be cool to solicit people to 
> contribute use cases to the docs. 
> 6. Demonstration of contrib packages, esp. Highlighter
> 7. Performance issues/factors/tradeoffs.  Lucene lessons learned and best 
> practices
> Advanced tutorials:
> 1. Hadoop + Lucene
> 2. Writing custom analyzers/filters/tokenizers
> 3. Changing Scoring
> 4. Payloads (when they are committed)
> Please contribute what else you would like to see.  I may be able to address 
> some of these issues for my ApacheCon talk, but not all of them.






[jira] Commented: (LUCENE-805) New Lucene Demo

2007-02-15 Thread Erik Hatcher (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12473419
 ] 

Erik Hatcher commented on LUCENE-805:
-

The examples from Lucene in Action are freely available, and Otis and I are fine
with assigning the ASL to them (it's currently unspecified but implicitly ASL'd).
If these would be useful (at least Indexer.java and Searcher.java, which are
better demos than the current demo application), we're free to use them as a
starter.  All the code could be contributed if folks are OK with that.

In fact, maybe Otis and I should do the 2nd edition codebase within the Lucene 
svn somewhere so that it serves as a built-in example.

> New Lucene Demo
> ---
>
> Key: LUCENE-805
> URL: https://issues.apache.org/jira/browse/LUCENE-805
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Examples
>Reporter: Grant Ingersoll
> Assigned To: Grant Ingersoll
>Priority: Minor
>
> The much maligned demo, while useful, could use a breath of fresh air.  This 
> issue is to start collecting requirements about what people would like to see 
> in a demo and what they don't like in the current one.
> Ideas (not necessarily in order of importance):
> 1. More in-depth tutorial explaining indexing/searching
> 2. Multilingual support/demonstration
> 3. Better demonstration of querying capabilities: Spans, Phrases, Wildcards, 
> Filters, sorting, etc.
> 4. Dealing with different content types and pointers to resources
> 5. Wiki use cases links -- I think it would be cool to solicit people to 
> contribute use cases to the docs. 
> 6. Demonstration of contrib packages, esp. Highlighter
> 7. Performance issues/factors/tradeoffs.  Lucene lessons learned and best 
> practices
> Advanced tutorials:
> 1. Hadoop + Lucene
> 2. Writing custom analyzers/filters/tokenizers
> 3. Changing Scoring
> 4. Payloads (when they are committed)
> Please contribute what else you would like to see.  I may be able to address 
> some of these issues for my ApacheCon talk, but not all of them.






Re: [VOTE] release Lucene 2.1

2007-02-15 Thread Erik Hatcher
I vote for release of 2.1 as-is... no one really uses that demo stuff  
anyway.  I'll tackle the binary custom demo build.xml file as soon as  
I can and commit that.  When folks complain, we can point them to the  
new build.xml file and they'll just plop that into a 2.1 binary  
release and it'll work.


Erik



On Feb 15, 2007, at 11:21 AM, Yonik Seeley wrote:


On 2/15/07, Grant Ingersoll <[EMAIL PROTECTED]> wrote:

What's weird, is I don't think much has changed build wise from the
last release, yet we all of a sudden noticed all these things.


Yes, I just verified that things are pretty much in the same shape in
the 2.0.0 release.
contrib/(ant, lucli, regex) fail to build from the src dist.

-Yonik







Re: [VOTE] release Lucene 2.1

2007-02-15 Thread Erik Hatcher


On Feb 15, 2007, at 8:10 AM, DM Smith wrote:
I don't think that one should ever expect to build from a binary  
package. Even if one could.


Here's the hitch... the demo code that comes with Lucene, as sad as  
it is :(, gets shipped as source code in the binary distribution.   
This makes sense.   What I've just seen is that we distribute  
build.xml but not common-build.xml and thus the build doesn't work,  
but even copying common-build.xml to that directory the build still  
fails as it's trying to build Lucene itself without source code.  We  
need a custom build.xml for the demo code, I think.  I can whip  
something like that up, but it'll take me a week or so to squeeze it  
in.  I don't think we should hold up a release for this issue - I  
suspect we shipped Lucene 2.0 like this as well.


On the positive side, the source distribution works fine (from trunk):

~/dev/lucene/dist/src/lucene-2.2-dev erik$ ant
Buildfile: build.xml

javacc-uptodate-check:

javacc-notice:

init:

clover.setup:

clover.info:
 [echo]
 [echo]   Clover not found. Code coverage reports disabled.
 [echo]

clover:

common.compile-core:
[mkdir] Created dir: /Users/erik/dev/lucene/dist/src/lucene-2.2-dev/build/classes/java
[javac] Compiling 204 source files to /Users/erik/dev/lucene/dist/src/lucene-2.2-dev/build/classes/java
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.

compile-core:
 [rmic] RMI Compiling 1 class to /Users/erik/dev/lucene/dist/src/lucene-2.2-dev/build/classes/java

jar-core:
  [jar] Building jar: /Users/erik/dev/lucene/dist/src/lucene-2.2-dev/build/lucene-core-2.2-dev.jar


default:

BUILD SUCCESSFUL
Total time: 5 seconds






Re: [VOTE] release Lucene 2.1

2007-02-15 Thread Erik Hatcher


On Feb 15, 2007, at 6:50 AM, Grant Ingersoll wrote:



On Feb 15, 2007, at 2:55 AM, Chris Hostetter wrote:



: > I'm not exactly sure if this is show stopper, but when I get the
: > binary, the build.xml that is included is not usable b/c it is
: > missing common-build.xml.

: Oops... I think we should fix this for the release if at all
: possible.  It is handy for folks to be able to pull down a buildable
: archive and rest assured that they are getting something built at the
: same time the binary was made.

I'm confused ... the binary builds don't even include src/java/ so it's
not a buildable archive by any stretch of the imagination -- how would
having the common-build.xml help assure people of anything?

I'm not even sure why we include the build.xml in the binary releases.




Yeah, I'm not sure we need the build included for binary releases  
either, I just think it should work if it is included.


What's weird, is I don't think much has changed build wise from the  
last release, yet we all of a sudden noticed all these things.


Sorry, I was thinking by "binary" you meant the -src.(zip|tar.gz)  
"binary" and that is where the build was failing.  The -src  
distributions should be buildable; the purely binary releases should,  
of course, be only the .jar files and LICENSE files and such, but no  
source (except perhaps the demo code?).


I should shut up and go try the darn build and see what happens, since  
I had my hands in there once upon a time.  OK, off to see what's  
up firsthand.


Erik





Re: [VOTE] release Lucene 2.1

2007-02-14 Thread Erik Hatcher


On Feb 14, 2007, at 2:18 PM, Grant Ingersoll wrote:
I'm not exactly sure if this is show stopper, but when I get the  
binary, the build.xml that is included is not usable b/c it is  
missing common-build.xml.


It may not be a big deal, b/c you don't necessarily need to build  
anything since it is a binary release, yet, the first thing I tried  
was ant -projecthelp


So, +1, but we may want to do something about it for the next release.


Oops... I think we should fix this for the release if at all  
possible.  It is handy for folks to be able to pull down a buildable  
archive and rest assured that they are getting something built at the  
same time the binary was made.


Otherwise, I really like how Yonik approached this release by  
putting it up in a staging area for us to try out before making it  
official.


yonik++

Erik





Re: [VOTE] release Lucene 2.1

2007-02-14 Thread Erik Hatcher

+1


On Feb 14, 2007, at 12:20 PM, Yonik Seeley wrote:


Release artifacts for review are at
http://people.apache.org/~yonik/staging_area/lucene/
Please vote to officially release these packages as Lucene 2.1.

-Yonik







[jira] Resolved: (LUCENE-797) Query for searching document whose title starts with ...

2007-02-07 Thread Erik Hatcher (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Hatcher resolved LUCENE-797.
-

Resolution: Invalid

The java-user e-mail list is the appropriate forum to ask questions.  The issue 
tracker is used for tracking bugs and feature enhancements.

If you did not tokenize the title, you could use a prefix query (title*) with 
QueryParser (though you will likely want to lowercase, and index a tokenized 
title into another field for full-text search capability).  

QueryParser does not currently support SpanQuery types, but with a SpanQuery 
you could find terms at the beginning of a field.
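The plain-Java sketch below illustrates the two approaches without calling Lucene. `TitleStartsWith` is a hypothetical helper, and whitespace splitting stands in for a real analyzer. With an untokenized, lowercased title, the whole title is one term, so "starts with" is a prefix test on that term. With a tokenized title, the span idea is to accept the term only at position 0. That is what Lucene's SpanFirstQuery restricts a match to when its end is 1.

```java
import java.util.Locale;

final class TitleStartsWith {
    /**
     * Untokenized field: the whole title is one lowercased term, so "starts with"
     * is a prefix test on that single term (what a title* prefix query would do).
     */
    static boolean prefixMatch(String title, String prefix) {
        String term = title.toLowerCase(Locale.ROOT);
        return term.startsWith(prefix.toLowerCase(Locale.ROOT));
    }

    /**
     * Tokenized field: match only when the term sits at position 0, i.e. a span
     * that must end within the first position of the field.
     */
    static boolean firstTokenMatch(String title, String term) {
        String[] tokens = title.trim().toLowerCase(Locale.ROOT).split("\\s+");
        return tokens.length > 0 && tokens[0].equals(term.toLowerCase(Locale.ROOT));
    }
}
```

The prefix approach can match partial words and multi-word prefixes. The position-based approach matches whole terms only, but it keeps the field tokenized for full-text search.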

> Query for searching document whose title starts with ...
> 
>
> Key: LUCENE-797
> URL: https://issues.apache.org/jira/browse/LUCENE-797
> Project: Lucene - Java
>  Issue Type: Task
>  Components: QueryParser
>Reporter: diasp
>
> Do you know the correct syntax for QueryParser to search all documents whose 
> field 'title' starts with a selected text?
> Thank you for your help.






Re: Welcome Michael Busch

2007-02-01 Thread Erik Hatcher


On Feb 1, 2007, at 12:24 PM, Yonik Seeley wrote:

Welcome aboard, Michael!
So how about keeping the new-committer-introduction tradition  
alive :-)


Here's a new tradition we can start...

"Hi, my name is Erik, I write shitty code."





Re: Welcome Michael Busch

2007-02-01 Thread Erik Hatcher
Welcome Michael!   You're a tremendous asset to the Lucene community,  
and you've just made the lives of the other hardcore committers (myself  
most definitely excluded) a lot easier.  You knocked my socks off in  
person.  You and your team are amazing.  I'm so thankful for the  
amazing community we have here.


Erik




On Feb 1, 2007, at 12:17 PM, Doug Cutting wrote:


The Lucene PMC has voted to add Michael Busch as a Lucene committer.

Welcome, Michael!

Doug

P.S. The traditional initiation ritual is to add yourself to the  
"Who We Are" page's source, then re-generate and re-publish the site.








Re: Lius into apache incubator

2007-01-31 Thread Erik Hatcher

I'll echo what both Otis and Mark have said.

Lius does look useful, but there are many non-ASL'd dependencies (on  
a quick glance in your lib directory) that would be very difficult to  
resolve with the codebase here at the ASF.


Erik


On Jan 31, 2007, at 5:19 AM, markharw00d wrote:

I would prefer to see a good open-source framework pulling together  
a collection of document parsers but which isn't tied directly to  
Lucene (that binding would be via *another* project).
If the parser framework extracted document text in a standard  
document-and-application-neutral form (XML/Java object?) this could  
underpin *any* IR/IE project wanting to make use of the parser  
functionality e.g. the GATE framework for example. That would  
ultimately make a much more valuable piece of functionality and is  
the approach taken by Stellent (used by many search engines,  
recently purchased by Oracle).
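The sketch below shows a minimal version of that separation. Every name in it is hypothetical. Format-specific parsers produce a neutral `ExtractedDoc`. Any consumer, whether a Lucene indexer, GATE, or something else, depends only on that type and never on the parsers' own libraries:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Format-neutral extraction result: body text plus simple string metadata. */
final class ExtractedDoc {
    final String text;
    final Map<String, String> metadata;
    ExtractedDoc(String text, Map<String, String> metadata) {
        this.text = text;
        this.metadata = metadata;
    }
}

/** One implementation per format (PDF, Word, HTML, ...); knows nothing about Lucene. */
interface DocParser {
    boolean accepts(String mimeType);
    ExtractedDoc parse(String raw);
}

/** Trivial example parser for plain text. */
final class PlainTextParser implements DocParser {
    @Override public boolean accepts(String mimeType) { return "text/plain".equals(mimeType); }

    @Override public ExtractedDoc parse(String raw) {
        Map<String, String> meta = new LinkedHashMap<>();
        meta.put("length", Integer.toString(raw.length()));
        return new ExtractedDoc(raw, meta);
    }
}

/** Consumers ask the registry for an ExtractedDoc and never see parser internals. */
final class ParserRegistry {
    private final List<DocParser> parsers = new ArrayList<>();

    void register(DocParser p) { parsers.add(p); }

    ExtractedDoc extract(String mimeType, String raw) {
        for (DocParser p : parsers) {
            if (p.accepts(mimeType)) return p.parse(raw);
        }
        throw new IllegalArgumentException("no parser for " + mimeType);
    }
}
```

A Lucene binding would then be a thin, separate layer that turns an `ExtractedDoc` into index fields.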



Cheers
Mark















"xml" query parser, except with JSON

2007-01-29 Thread Erik Hatcher
I'm curious, Mark, what you think about the XMLQueryParser being  
morphed into a JSON query parser.  Is it possible to introduce a new  
serialization format without rewriting the whole parser?  Or is it  
intimately tied to XML throughout?


Solr could really benefit from allowing queries to be pushed down to  
the client.  XML works, for sure, but I have a feeling Yonik's JSON  
parser would be faster than an XML parser.  Thoughts?
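The sketch below shows one way to keep the serialization swappable. It is an assumption about design, not a description of how XMLQueryParser is actually structured. Front ends turn XML (or JSON) into a neutral node tree, and the query builders only ever see that tree. The element and attribute names (`Term`, `field`, `Or`) are illustrative rather than XMLQueryParser's schema, and the "query" produced is just a string. The JDK has no built-in JSON parser, so the JSON side is represented in the usage note by building the same tree directly.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Serialization-neutral query node: a type, attributes, text content, and children. */
final class QNode {
    final String type;
    final Map<String, String> attrs = new LinkedHashMap<>();
    final List<QNode> children = new ArrayList<>();
    String text = "";
    QNode(String type) { this.type = type; }
}

/** Builders only see QNode, so they don't care whether it came from XML or JSON. */
final class QueryBuilder {
    static String build(QNode n) {
        switch (n.type) {
            case "Term":
                return n.attrs.get("field") + ":" + n.text;
            case "Or": {
                List<String> parts = new ArrayList<>();
                for (QNode c : n.children) parts.add(build(c));
                return "(" + String.join(" OR ", parts) + ")";
            }
            default:
                throw new IllegalArgumentException("unknown node " + n.type);
        }
    }
}

/** XML front end; a JSON front end would only have to produce the same QNode tree. */
final class XmlFrontEnd {
    static QNode parse(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            return toNode(root);
        } catch (Exception e) {
            throw new IllegalArgumentException("bad query XML", e);
        }
    }

    private static QNode toNode(Element e) {
        QNode n = new QNode(e.getTagName());
        for (int i = 0; i < e.getAttributes().getLength(); i++) {
            Node a = e.getAttributes().item(i);
            n.attrs.put(a.getNodeName(), a.getNodeValue());
        }
        StringBuilder text = new StringBuilder();
        NodeList kids = e.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            Node k = kids.item(i);
            if (k instanceof Element) n.children.add(toNode((Element) k));
            else if (k.getNodeType() == Node.TEXT_NODE) text.append(k.getNodeValue());
        }
        n.text = text.toString().trim();
        return n;
    }
}
```

For example, `XmlFrontEnd.parse("<Or><Term field=\"title\">lucene</Term><Term field=\"body\">solr</Term></Or>")` builds to `(title:lucene OR body:solr)`. A JSON front end producing the same `QNode` tree would reuse `QueryBuilder` unchanged.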


Thanks,
Erik





Re: jruby anyone?

2007-01-21 Thread Erik Hatcher


On Jan 21, 2007, at 9:08 PM, Steven Parkes wrote:

You know, I hate naming things. Anybody have any violent problems with
rubric?


already taken: http://rubric.rubyforge.org/

sorry, i don't have any suggestions at the moment.  dreaming up solr  
"flare" tapped me out :)




