[jira] Commented: (LUCENE-530) Extend NumberTools to support int/long/float/double to string

2007-07-15 Thread Mohammad Norouzi (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12512866
 ] 

Mohammad Norouzi commented on LUCENE-530:
-

Hi,
I am using this nice class, but because of my requirements I had to add the
following method, which makes the class easier to use:

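// Needs: import java.lang.reflect.InvocationTargetException;
//        import java.lang.reflect.Method;
public static String encode(String stringToEncode, Class type) {
    try {
        // Parse the text with the wrapper's static valueOf(String),
        // e.g. Integer.valueOf("42").
        Method valueOf = type.getMethod("valueOf", new Class[] {String.class});
        Object value = valueOf.invoke(null, new Object[] {stringToEncode});
        // NumericEncoder.encode() is overloaded on the primitive types
        // (int, long, float, double), so it must be looked up with the
        // primitive class (e.g. Integer.TYPE), not the wrapper class.
        Class primitiveType = (Class) type.getField("TYPE").get(null);
        Method encode = NumericEncoder.class.getMethod("encode", new Class[] {primitiveType});
        // Reflection unboxes the wrapper argument automatically.
        return (String) encode.invoke(null, new Object[] {value});
    } catch (InvocationTargetException e) {
        // valueOf() itself failed, e.g. NumberFormatException on bad input.
        throw new IllegalArgumentException("Cannot encode '" + stringToEncode
                + "' as " + type.getName() + ": " + e.getTargetException());
    } catch (Exception e) {
        // NoSuchMethodException, NoSuchFieldException, IllegalAccessException:
        // type is not Integer, Long, Float or Double.
        throw new IllegalArgumentException("Unsupported type "
                + type.getName() + ": " + e);
    }
}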


With this method you no longer need if/else statements to pick the right
encode() overload. This class also needs a decode() method.
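The issue's code only goes one way. Below is a minimal sketch of what a decode() could look like for the int and float cases. It assumes encodeToHex() (not shown in the issue) writes the 32 bits as exactly 8 zero-padded lowercase hex digits, so string order equals unsigned numeric order; the class and helper names are hypothetical, and the masks are the full 32-bit values.

```java
// Hypothetical sketch: int/float encode + decode in the style of NumericEncoder.
final class NumericDecodeSketch {
    static final int INTEGER_SIGN_MASK = 0x80000000;
    static final int FLOAT_SIGN_MASK = 0x80000000;
    static final int FLOAT_EXPONENT_MASK = 0x7F800000;
    static final int FLOAT_MANTISSA_MASK = 0x007FFFFF;

    private NumericDecodeSketch() {}

    // Assumption: fixed-width, zero-padded, lowercase hex.
    static String encodeToHex(int bits) {
        String hex = Integer.toHexString(bits);
        StringBuilder sb = new StringBuilder();
        for (int i = hex.length(); i < 8; i++) sb.append('0');
        return sb.append(hex).toString();
    }

    // Long.parseLong avoids overflow for values >= 0x80000000; the cast keeps the low 32 bits.
    static int decodeFromHex(String hex) {
        return (int) Long.parseLong(hex, 16);
    }

    public static String encode(int value) {
        return encodeToHex(value ^ INTEGER_SIGN_MASK);
    }

    public static int decodeInt(String encoded) {
        // XOR is its own inverse: flipping the sign bit again restores the value.
        return decodeFromHex(encoded) ^ INTEGER_SIGN_MASK;
    }

    public static String encode(float value) {
        int bits = Float.floatToIntBits(value);
        if ((bits & FLOAT_SIGN_MASK) != 0) {
            // Negative: invert exponent and mantissa so larger magnitudes sort lower.
            bits ^= FLOAT_EXPONENT_MASK | FLOAT_MANTISSA_MASK;
        }
        return encodeToHex(bits ^ FLOAT_SIGN_MASK);
    }

    public static float decodeFloat(String encoded) {
        int bits = decodeFromHex(encoded) ^ FLOAT_SIGN_MASK; // restore original sign bit
        if ((bits & FLOAT_SIGN_MASK) != 0) {
            // Undo the exponent/mantissa inversion applied to negative values.
            bits ^= FLOAT_EXPONENT_MASK | FLOAT_MANTISSA_MASK;
        }
        return Float.intBitsToFloat(bits);
    }
}
```

Under these assumptions Integer.MIN_VALUE encodes as "00000000" and Integer.MAX_VALUE as "ffffffff"; the long and double versions would follow the same pattern with 16 hex digits.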

> Extend NumberTools to support int/long/float/double to string
> -
>
> Key: LUCENE-530
> URL: https://issues.apache.org/jira/browse/LUCENE-530
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Analysis
>Affects Versions: 1.9
>Reporter: Andy Hind
>Priority: Minor
>
> Extend NumberTools to support int/long/float/double to string,
> so you can search using range queries on int/long/float/double, if you want.
> Here is the basis for how NumberTools could be extended to support
> int/long/double/float.
> As I only write these values to the index and fix tokenisation in searches, I
> was not so fussed about the reverse transformations back to Strings.
> public class NumericEncoder
> {
> /*
>  * Constants for integer encoding
>  */
> static int INTEGER_SIGN_MASK = 0x80000000;
> /*
>  * Constants for long encoding
>  */
> static long LONG_SIGN_MASK = 0x8000000000000000L;
> /*
>  * Constants for float encoding
>  */
> static int FLOAT_SIGN_MASK = 0x80000000;
> static int FLOAT_EXPONENT_MASK = 0x7F800000;
> static int FLOAT_MANTISSA_MASK = 0x007FFFFF;
> /*
>  * Constants for double encoding
>  */
> static long DOUBLE_SIGN_MASK = 0x8000000000000000L;
> static long DOUBLE_EXPONENT_MASK = 0x7FF0000000000000L;
> static long DOUBLE_MANTISSA_MASK = 0x000FFFFFFFFFFFFFL;
> private NumericEncoder()
> {
> super();
> }
> /**
>  * Encode an integer into a string that orders correctly using string
>  * comparison. Integer.MIN_VALUE encodes as the smallest string and
>  * Integer.MAX_VALUE as the largest.
>  * 
>  * @param intToEncode
>  * @return
>  */
> public static String encode(int intToEncode)
> {
> int replacement = intToEncode ^ INTEGER_SIGN_MASK;
> return encodeToHex(replacement);
> }
> /**
>  * Encode a long into a string that orders correctly using string
>  * comparison. Long.MIN_VALUE encodes as the smallest string and
>  * Long.MAX_VALUE as the largest.
>  * 
>  * @param longToEncode
>  * @return
>  */
> public static String encode(long longToEncode)
> {
> long replacement = longToEncode ^ LONG_SIGN_MASK;
> return encodeToHex(replacement);
> }
> /**
>  * Encode a float into a string that orders correctly according to string
>  * comparison. Note that there is no negative NaN, but there are codings that
>  * imply this, so NaN and -Infinity may not compare as expected.
>  * 
>  * @param floatToEncode
>  * @return
>  */
> public static String encode(float floatToEncode)
> {
> int bits = Float.floatToIntBits(floatToEncode);
> int sign = bits & FLOAT_SIGN_MASK;
> int exponent = bits & FLOAT_EXPONENT_MASK;
> int mantissa = bits & FLOAT_MANTISSA_MASK;
> if (sign != 0)
> {
> exponent ^= FLOAT_EXPONENT_MASK;
> mantissa ^= FLOAT_MANTISSA_MASK;
> }
> sign ^= FLOAT_SIGN_MASK;
> int replacement = sign | exponent | mantissa;
> return encodeToHex(replacement);
> }
> /**
>  * Encode a double into a string that orders correctly according to string
>  * comparison

Re: [jira] Created: (LUCENE-945) contrib/benchmark tests fail find data dirs

2007-07-15 Thread Doron Cohen
Slowly catching up...

Grant Ingersoll wrote:

> I think legally we are fine, since we aren't actually shipping it.  I
> just mean that people may not want to wait however long it takes to
> download it.  Of course, I don't know a work around other than to
> have some smaller set.

I measured the times: the contrib/benchmark test now takes 4 minutes
on the first run (with Reuters downloading) and 2 minutes on
following runs. I think this is not too bad (?)

By the way, the 2 minutes include the parallel test that indexes the
entire Reuters collection. Once TestQualityRun is committed, it will
also index that entire collection, but I changed the parallel
test to index only a few documents (so it would not add 2 more
minutes).

> On Jun 30, 2007, at 10:10 AM, Doron Cohen wrote:
>
> > Grant Ingersoll <[EMAIL PROTECTED]> wrote on 30/06/2007 05:20:34:
> >
> >> Does this imply it is going to download the test collection for
> >> people when they don't have it when running tests?  I don't know if
> >> that is something people are going to want to happen.
> >
> > Yes, it does, and so would auto-build-bots. I am also using
> > the Reuters collection for TestQualityRun in LUCENE-836. Do
> > you mean legally? I should probably add a "DOWNLOAD"
> > warning here. Is there another issue with this?


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Best Practices for getting Strings from a position range

2007-07-15 Thread Chris Hostetter

: Do we have a best practice for going from, say a SpanQuery doc/
: position information and retrieving the actual range of positions of
: content from the Document?  Is it just to reanalyze the Document
: using the appropriate Analyzer and start recording once you hit the
: positions you are interested in?  Seems like Term Vectors _could_
: help, but even my new Mapper approach patch (LUCENE-868) doesn't
: really help, because they are stored in a term-centric manner.  I
: guess what I am after is a position centric approach.  That is, give

this is kind of what i was suggesting in the last message i sent
to the java-user thread about payloads and SpanQueries (which i'm
guessing is what prompted this thread as well)...

http://www.nabble.com/Payloads-and-PhraseQuery-tf3988826.html#a11551628

my point was that currently, to retrieve a payload you need a
TermPositions instance, which is designed for iterating in the order of...
    seek(term)
      skipTo(doc)
        nextPosition()
          getPayload()
...which is great for getting the payload of every instance
(ie:position) of a specific term in a given document (or in every
document) but without serious changes to the Spans API, the ideal payload
API would let you say...
    skipTo(doc)
      advance(startPosition)
      getPayload()
      while (nextPosition() < endPosition)
        getPosition()

but this seems like a nearly impossible API to implement given the nature
of the inverted index and the fact that terms aren't ever stored in
position order.

there's a lot i really don't know/understand about the lucene term
position internals ... but as i recall, the datastructure written to disk
isn't actually a tree-structured inverted index, it's a long sequence of
tuples, correct?  so in theory you could scan along the tuples until you
find the doc you are interested in, ignoring all of the term info along
the way, then whatever term you happen to be on at the moment, you could scan
along all of the positions until you find one in the range you are
interested in -- assuming you do, then you record the current Term (and
read your payload data if interested)

if i remember correctly, the first part of this is easy, and relatively fast
-- i think skipTo(doc) on a TermDocs or TermPositions will happily scan for
the first pair with the correct docId, regardless of the term
... the only thing i'm not sure about is how efficient it is to loop over
nextPosition() for every term you find to see if any of them are in your
range ... the best case scenario is that the first position returned is
above the high end of your range, in which case you can stop immediately
and seek to the next term -- but the worst case is that you call
nextPosition() over and over a lot of times before you get a position in
(or above) your range ... an advancePosition(pos) that worked like seek
or skipTo might be helpful here.

: I feel like I am missing something obvious.  I would suspect the
: highlighter needs to do this, but it seems to take the reanalyze
: approach as well (I admit, though, that I have little experience with
: the highlighter.)

as i understand it the default case is to reanalyze, but if you have
TermFreqVector info stored with positions (ie: a TermPositionVector) then
it can use that to construct a TokenStream by iterating over all terms and
writing them into a big array in position order (see the TokenSources class
in the highlighter)

this makes sense when highlighting because it doesn't know what kind of
fragmenter is going to be used so it needs the whole TokenStream, but it
seems less than ideal when you are only interested in a small number of
position ranges that you know in advance.
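The "big array in position order" step can be sketched as below. This is not the highlighter's code; it assumes a simplified term vector given as a map from each term to its positions, and keeps at most one term per position (if two terms share a position, whichever is written last wins; unused positions stay null).

```java
import java.util.Map;

// Hypothetical sketch: invert term -> positions into a position-ordered array.
final class PositionOrderedTokens {
    static String[] toPositionArray(Map<String, int[]> termPositions) {
        // First pass: find the highest position to size the array.
        int maxPos = -1;
        for (int[] positions : termPositions.values())
            for (int p : positions) maxPos = Math.max(maxPos, p);
        // Second pass: drop each term into every slot where it occurs.
        String[] tokens = new String[maxPos + 1];
        for (Map.Entry<String, int[]> e : termPositions.entrySet())
            for (int p : e.getValue())
                tokens[p] = e.getKey();
        return tokens;
    }
}
```

Both passes touch the whole vector, which is the cost the paragraph above is pointing at when only a few known ranges are needed.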

: I am wondering if it would be useful to have an alternative Term
: Vector storage mechanism that was position centric.  Because we
: couldn't take advantage of the lexicographic compression, it would
: take up more disk space, but it would be a lot faster for these kinds

i'm not sure if it's really necessary to store the data in a position
centric manner, assuming we have a way to "seek" by position like i
described above -- but then again i don't really know that what i
described above is all that possible/practical/performant.




-Hoss





[jira] Updated: (LUCENE-960) SpanQueryFilter addition

2007-07-15 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated LUCENE-960:
---

 Priority: Minor  (was: Trivial)
Lucene Fields: [New, Patch Available]  (was: [Patch Available, New])

> SpanQueryFilter addition
> 
>
> Key: LUCENE-960
> URL: https://issues.apache.org/jira/browse/LUCENE-960
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Grant Ingersoll
>Priority: Minor
> Attachments: SpanQueryFilter.patch
>
>
> Similar to the QueryFilter (or whatever it is called now), the SpanQueryFilter
> is a regular Lucene Filter, but it can also return Spans-like information.
> This is useful if you not only want to filter based on a Query, but then also
> want to see how a given match from a new query compares to the
> positions of the filtered SpanQuery.  The patch (to come shortly) also contains a
> caching mechanism for the SpanQueryFilter.
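One way to picture what such a filter keeps around: alongside the usual per-document bits, it records the positions of each span match so that a match from a later query can be compared against them. The sketch below is self-contained and uses hypothetical names; it is not the attached patch. It takes span matches as plain (doc, start, end) triples instead of iterating a real Spans, and treats end as exclusive.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical result holder: filter bits plus per-document span positions.
final class SpanFilterResultSketch {
    final BitSet bits = new BitSet();                               // docs that matched
    final Map<Integer, List<int[]>> spansByDoc = new HashMap<Integer, List<int[]>>();

    // Record one span match, as a Spans iteration would report doc(), start(), end().
    void addMatch(int doc, int start, int end) {
        bits.set(doc);
        List<int[]> spans = spansByDoc.get(doc);
        if (spans == null) {
            spans = new ArrayList<int[]>();
            spansByDoc.put(doc, spans);
        }
        spans.add(new int[] {start, end});
    }

    // Does a match [start, end) from another query overlap any recorded span in doc?
    boolean overlaps(int doc, int start, int end) {
        List<int[]> spans = spansByDoc.get(doc);
        if (spans == null) return false;
        for (int[] s : spans) {
            if (start < s[1] && s[0] < end) return true;            // half-open interval overlap
        }
        return false;
    }
}
```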

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.





[jira] Updated: (LUCENE-960) SpanQueryFilter addition

2007-07-15 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated LUCENE-960:
---

Attachment: SpanQueryFilter.patch

Try again w/ an actual patch

> SpanQueryFilter addition
> 
>
> Key: LUCENE-960
> URL: https://issues.apache.org/jira/browse/LUCENE-960
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Grant Ingersoll
>Priority: Trivial
> Attachments: SpanQueryFilter.patch




[jira] Updated: (LUCENE-960) SpanQueryFilter addition

2007-07-15 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated LUCENE-960:
---

Attachment: (was: SpanQueryFilter.java)

> SpanQueryFilter addition
> 
>
> Key: LUCENE-960
> URL: https://issues.apache.org/jira/browse/LUCENE-960
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Grant Ingersoll
>Priority: Trivial
> Attachments: SpanQueryFilter.patch




[jira] Updated: (LUCENE-960) SpanQueryFilter addition

2007-07-15 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated LUCENE-960:
---

Attachment: SpanQueryFilter.java

Patch and tests for SpanQueryFilter

> SpanQueryFilter addition
> 
>
> Key: LUCENE-960
> URL: https://issues.apache.org/jira/browse/LUCENE-960
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Reporter: Grant Ingersoll
>Priority: Trivial
> Attachments: SpanQueryFilter.java




[jira] Created: (LUCENE-960) SpanQueryFilter addition

2007-07-15 Thread Grant Ingersoll (JIRA)
SpanQueryFilter addition


 Key: LUCENE-960
 URL: https://issues.apache.org/jira/browse/LUCENE-960
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Reporter: Grant Ingersoll
Priority: Trivial


Similar to the QueryFilter (or whatever it is called now), the SpanQueryFilter
is a regular Lucene Filter, but it can also return Spans-like information.
This is useful if you not only want to filter based on a Query, but then also
want to see how a given match from a new query compares to the
positions of the filtered SpanQuery.  The patch (to come shortly) also contains a
caching mechanism for the SpanQueryFilter.




Best Practices for getting Strings from a position range

2007-07-15 Thread Grant Ingersoll
Do we have a best practice for going from, say, SpanQuery doc/position
information to retrieving the actual range of positions of
content from the Document?  Is it just to reanalyze the Document
using the appropriate Analyzer and start recording once you hit the
positions you are interested in?  Seems like Term Vectors _could_
help, but even my new Mapper approach patch (LUCENE-868) doesn't
really help, because they are stored in a term-centric manner.  I
guess what I am after is a position-centric approach.  That is, given
a Document, get a term vector (note, not a TermFreqVector) back that
is ordered by position (thus, there may be duplicate entries for a
given term that occurs in multiple positions).


I feel like I am missing something obvious.  I would suspect the  
highlighter needs to do this, but it seems to take the reanalyze  
approach as well (I admit, though, that I have little experience with  
the highlighter.)


I am wondering if it would be useful to have an alternative Term  
Vector storage mechanism that was position centric.  Because we  
couldn't take advantage of the lexicographic compression, it would  
take up more disk space, but it would be a lot faster for these kinds  
of things.  With this kind of approach, you could easily index into  
an array based on the result of Spans.start(), etc.  Of course,
you would have to have a data structure that handled the multiple  
terms per position option, but I don't think that would be too hard,  
correct?
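A rough sketch of the position-centric structure described above, assuming positions are dense, non-negative ints. Each slot holds the list of terms at that position, so a span's start and end can index straight into it, and multiple terms per position (e.g. injected synonyms) are just a longer list. The class and method names are hypothetical, and the end bound is treated as exclusive on the assumption that a span's end() points one past its last position.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical position-centric term vector: slot p holds all terms at position p.
final class PositionCentricVector {
    private final List<List<String>> termsAtPosition = new ArrayList<List<String>>();

    // Record that term occurs at position; several terms may share a position.
    void add(String term, int position) {
        while (termsAtPosition.size() <= position) termsAtPosition.add(new ArrayList<String>());
        termsAtPosition.get(position).add(term);
    }

    // Terms in [start, end), in position order (end exclusive).
    List<String> range(int start, int end) {
        List<String> out = new ArrayList<String>();
        int stop = Math.min(end, termsAtPosition.size());
        for (int p = Math.max(start, 0); p < stop; p++) out.addAll(termsAtPosition.get(p));
        return out;
    }
}
```

Lookup is proportional to the width of the range rather than to the number of distinct terms in the document, which is the trade-off against the extra disk space mentioned above.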


Just thinking out loud...

Cheers,
Grant




Build failed in Hudson: Lucene-Nightly #152

2007-07-15 Thread hudson
See http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/152/changes

--
[...truncated 822 lines...]
A contrib/gdata-server/webroot/WEB-INF/classes/gdata-account.xsd
A contrib/gdata-server/CHANGES.txt
A contrib/gdata-server/lib
AU contrib/gdata-server/lib/commons-collections-3.2.jar
AU contrib/gdata-server/lib/gdata-client-1.0.jar
AU contrib/gdata-server/lib/servlet-api.jar
AU contrib/gdata-server/lib/xercesImpl.jar
AU contrib/gdata-server/lib/commons-logging-1.1.jar
AU contrib/gdata-server/lib/commons-beanutils.jar
AU contrib/gdata-server/lib/log4j-1.2.13.jar
AU contrib/gdata-server/lib/nekohtml.jar
AU contrib/gdata-server/lib/commons-digester-1.7.jar
A contrib/gdata-server/src
A contrib/gdata-server/src/gom
A contrib/gdata-server/src/gom/src
A contrib/gdata-server/src/gom/src/test
A contrib/gdata-server/src/gom/src/test/org
A contrib/gdata-server/src/gom/src/test/org/apache
A contrib/gdata-server/src/gom/src/test/org/apache/lucene
A contrib/gdata-server/src/gom/src/test/org/apache/lucene/gdata
A contrib/gdata-server/src/gom/src/test/org/apache/lucene/gdata/gom
A contrib/gdata-server/src/gom/src/test/org/apache/lucene/gdata/gom/GOMNamespaceTest.java
A contrib/gdata-server/src/gom/src/test/org/apache/lucene/gdata/gom/core
A contrib/gdata-server/src/gom/src/test/org/apache/lucene/gdata/gom/core/GOMContentImplTest.java
A contrib/gdata-server/src/gom/src/test/org/apache/lucene/gdata/gom/core/GOMDocumentImplTest.java
A contrib/gdata-server/src/gom/src/test/org/apache/lucene/gdata/gom/core/GOMDateConstructImplTest.java
A contrib/gdata-server/src/gom/src/test/org/apache/lucene/gdata/gom/core/GOMTextConstructImplTest.java
A contrib/gdata-server/src/gom/src/test/org/apache/lucene/gdata/gom/core/GOMGenereatorImplTest.java
A contrib/gdata-server/src/gom/src/test/org/apache/lucene/gdata/gom/core/GOMCategoryTest.java
A contrib/gdata-server/src/gom/src/test/org/apache/lucene/gdata/gom/core/GOMIdImplTest.java
A contrib/gdata-server/src/gom/src/test/org/apache/lucene/gdata/gom/core/GOMLinkImplTest.java
A contrib/gdata-server/src/gom/src/test/org/apache/lucene/gdata/gom/core/ArbitraryGOMXmlTest.java
A contrib/gdata-server/src/gom/src/test/org/apache/lucene/gdata/gom/core/GOMSourceImplTest.java
A contrib/gdata-server/src/gom/src/test/org/apache/lucene/gdata/gom/core/GOMEntryImplTest.java
A contrib/gdata-server/src/gom/src/test/org/apache/lucene/gdata/gom/core/GOMAuthorImplTest.java
A contrib/gdata-server/src/gom/src/test/org/apache/lucene/gdata/gom/core/GOMFeedImplTest.java
A contrib/gdata-server/src/gom/src/test/org/apache/lucene/gdata/gom/core/GOMAttributeImplTest.java
A contrib/gdata-server/src/gom/src/test/org/apache/lucene/gdata/gom/core/SimpleGOMElementImplTest.java
A contrib/gdata-server/src/gom/src/test/org/apache/lucene/gdata/gom/core/AtomUriElementTest.java
A contrib/gdata-server/src/gom/src/test/org/apache/lucene/gdata/gom/core/GOMPersonImplTest.java
A contrib/gdata-server/src/gom/src/java
A contrib/gdata-server/src/gom/src/java/org
A contrib/gdata-server/src/gom/src/java/org/apache
A contrib/gdata-server/src/gom/src/java/org/apache/lucene
A contrib/gdata-server/src/gom/src/java/org/apache/lucene/gdata
A contrib/gdata-server/src/gom/src/java/org/apache/lucene/gdata/gom
A contrib/gdata-server/src/gom/src/java/org/apache/lucene/gdata/gom/GOMXmlEntity.java
A contrib/gdata-server/src/gom/src/java/org/apache/lucene/gdata/gom/GOMSummary.java
A contrib/gdata-server/src/gom/src/java/org/apache/lucene/gdata/gom/GOMLink.java
A contrib/gdata-server/src/gom/src/java/org/apache/lucene/gdata/gom/GOMTime.java
A contrib/gdata-server/src/gom/src/java/org/apache/lucene/gdata/gom/GOMLogo.java
A contrib/gdata-server/src/gom/src/java/org/apache/lucene/gdata/gom/GOMAuthor.java
A contrib/gdata-server/src/gom/src/java/org/apache/lucene/gdata/gom/GOMAttribute.java
A contrib/gdata-server/src/gom/src/java/org/apache/lucene/gdata/gom/GOMFeed.java
A contrib/gdata-server/src/gom/src/java/org/apache/lucene/gdata/gom/core
A contrib/gdata-server/src/gom/src/java/org/apache/lucene/gdata/gom/core/GOMContributorImpl.java
A contrib/gdata-server/src/gom/src/java/org/apache/lucene/gdata/gom/core/core-aid.uml
A contrib/gdata-server/src/gom/src/java/org/apache/lucene/gdata/gom/core/GOMDocumentImpl.java
A contrib/gdata-server/src/gom/src/java/org/apache/lucene/gdata/gom/core/GOMPublishedImpl.java
A contrib/gdata-server/src/gom/src/java/org/apache/lucene/gdata/gom/core/GOMDateConstructImpl.java
A contrib/

[jira] Commented: (LUCENE-868) Making Term Vectors more accessible

2007-07-15 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12512842
 ] 

Grant Ingersoll commented on LUCENE-868:


I also switched TermVectorMapper to be an abstract class per Yonik's suggestion.

> Making Term Vectors more accessible
> ---
>
> Key: LUCENE-868
> URL: https://issues.apache.org/jira/browse/LUCENE-868
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: Store
>Reporter: Grant Ingersoll
>Assignee: Grant Ingersoll
>Priority: Minor
> Attachments: LUCENE-868-v2.patch
>
>
> One of the big issues with term vector usage is that the information is
> loaded into parallel arrays, which are then often
> manipulated again for use in the application (for instance, they are sorted by
> frequency).
> Adding a callback mechanism that allows the vector loading to be handled by 
> the application would make this a lot more efficient.
> I propose to add to IndexReader:
> abstract public void getTermFreqVector(int docNumber, String field, 
> TermVectorMapper mapper) throws IOException;
> and a similar one for the all fields version
> Where TermVectorMapper is an interface with a single method:
> void map(String term, int frequency, int offset, int position);
> The TermVectorReader will be modified to just call the TermVectorMapper.  The 
> existing getTermFreqVectors will be reimplemented to use an implementation of 
> TermVectorMapper that creates the parallel arrays.  Additionally, some simple 
> implementations that automatically sort vectors will also be created.
> This is my first draft of this API and is subject to change.  I hope to have 
> a patch soon.
> See 
> http://www.gossamer-threads.com/lists/lucene/java-user/48003?search_string=get%20the%20total%20term%20frequency;#48003
>  for related information.
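To illustrate the callback idea in the proposal, here is a minimal, self-contained sketch: the reader side pushes each term to a mapper instead of filling parallel arrays, and one mapper collects entries sorted by descending frequency. The class names are hypothetical, and the map() signature simply follows the draft in the description (as an abstract class, per the comment above); the API that was eventually committed may differ.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical callback following the draft signature in the issue description.
abstract class SketchTermVectorMapper {
    abstract void map(String term, int frequency, int offset, int position);
}

// Collects (term, frequency) pairs and returns them sorted by descending frequency.
final class FrequencySortedMapper extends SketchTermVectorMapper {
    static final class Entry {
        final String term;
        final int frequency;
        Entry(String term, int frequency) { this.term = term; this.frequency = frequency; }
    }

    private final List<Entry> entries = new ArrayList<Entry>();

    void map(String term, int frequency, int offset, int position) {
        entries.add(new Entry(term, frequency));   // offset/position unused by this mapper
    }

    List<Entry> sortedByFrequency() {
        List<Entry> sorted = new ArrayList<Entry>(entries);
        Collections.sort(sorted, new Comparator<Entry>() {
            public int compare(Entry a, Entry b) {
                if (a.frequency != b.frequency) return b.frequency - a.frequency; // high first
                return a.term.compareTo(b.term);                                  // tie-break by term
            }
        });
        return sorted;
    }
}

// Stand-in for the reader: walks stored vector data and calls the mapper once per term.
final class SketchVectorReader {
    static void read(String[] terms, int[] freqs, SketchTermVectorMapper mapper) {
        for (int i = 0; i < terms.length; i++) {
            mapper.map(terms[i], freqs[i], -1, -1);  // -1: offset/position not stored here
        }
    }
}
```

The point of the design is that the application decides the data structure: a different mapper could keep only the top N terms, or build the original parallel arrays, without the reader changing.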




[jira] Updated: (LUCENE-868) Making Term Vectors more accessible

2007-07-15 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated LUCENE-868:
---

Attachment: (was: LUCENE-868-v1.patch)

> Making Term Vectors more accessible
> ---
>
> Key: LUCENE-868
> URL: https://issues.apache.org/jira/browse/LUCENE-868
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: Store
>Reporter: Grant Ingersoll
>Assignee: Grant Ingersoll
>Priority: Minor
> Attachments: LUCENE-868-v2.patch




[jira] Updated: (LUCENE-868) Making Term Vectors more accessible

2007-07-15 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated LUCENE-868:
---

Attachment: LUCENE-868-v2.patch

New patch that passes all tests (and compiles against the memory contrib)

> Making Term Vectors more accessible
> ---
>
> Key: LUCENE-868
> URL: https://issues.apache.org/jira/browse/LUCENE-868
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: Store
>Reporter: Grant Ingersoll
>Assignee: Grant Ingersoll
>Priority: Minor
> Attachments: LUCENE-868-v1.patch, LUCENE-868-v2.patch




[jira] Updated: (LUCENE-957) Lucene RAM Directory doesn't work for Index Size > 8 GB

2007-07-15 Thread Doron Cohen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-957:
---

Attachment: lucene-957.patch

The previous patch apparently did not fix the bug: a casting problem in
RAMOutputStream also had to be fixed.
The updated patch adds a test imitating a RAMFile larger than Integer.MAX_VALUE bytes.
To do this, I had to make the allocation of a new byte array in RAMFile overridable.
The new test fails before the RAMOutputStream fix (the bug affects the RAMDirectory
constructor that copies an index from the file system). The issues in RAMInputStream
do not in fact cause failures, but they should still be fixed.

With a test in place I now feel confident in this fix, and will commit it in a day
or two if there are no reservations.

> Lucene RAM Directory doesn't work for Index Size > 8 GB
> ---
>
> Key: LUCENE-957
> URL: https://issues.apache.org/jira/browse/LUCENE-957
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Store
>Reporter: Doron Cohen
>Assignee: Doron Cohen
> Attachments: lucene-957.patch, lucene-957.patch
>
>
> from user list - http://www.gossamer-threads.com/lists/lucene/java-user/50982
> Problem seems to be casting issues in RAMInputStream.
> Line 90:
>   bufferStart = BUFFER_SIZE * currentBufferIndex;
> Both operands on the right-hand side are ints, while the left-hand side is a long,
> so a very large product first overflows Integer.MAX_VALUE, becomes negative, and
> only then is (auto-)cast to long, which is too late.
> Line 91:
>  bufferLength = (int) (length - bufferStart);
> Both operands on the right-hand side are longs, while the left-hand side is an int,
> so the result of the (int) cast may turn negative and the logic that follows would
> be wrong.
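The overflow on line 90 is easy to reproduce in isolation. The sketch below assumes BUFFER_SIZE = 1024 for illustration and uses hypothetical helper names; it is not the committed patch. It shows the int product wrapping negative, the usual fix of widening one operand to long before multiplying, and clamping the remaining length while it is still a long so the narrowing cast to int is always safe.

```java
// Hypothetical helpers illustrating the RAMInputStream arithmetic.
final class RamBufferMath {
    static final int BUFFER_SIZE = 1024;   // assumed buffer size for the illustration

    // Buggy: int * int overflows before the implicit widening to long.
    static long bufferStartBuggy(int currentBufferIndex) {
        return BUFFER_SIZE * currentBufferIndex;
    }

    // Fixed: widen first so the multiplication happens in 64-bit arithmetic.
    static long bufferStart(int currentBufferIndex) {
        return (long) BUFFER_SIZE * currentBufferIndex;
    }

    // Fixed: clamp to BUFFER_SIZE while still a long, then narrow; the result fits in an int.
    static int bufferLength(long length, long bufferStart) {
        return (int) Math.min((long) BUFFER_SIZE, length - bufferStart);
    }
}
```

For example, with 3,000,000 buffers the buggy start is 1024 * 3000000 wrapped to -1222967296, while the widened version gives 3072000000.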




[EMAIL PROTECTED]: Project lucene-java (in module lucene-java) failed

2007-07-15 Thread Jason van Zyl
To whom it may engage...

This is an automated request, but not an unsolicited one. For 
more information please visit http://gump.apache.org/nagged.html, 
and/or contact the folk at [EMAIL PROTECTED]

Project lucene-java has an issue affecting its community integration.
This issue affects 3 projects,
 and has been outstanding for 5 runs.
The current state of this project is 'Failed', with reason 'Build Failed'.
For reference only, the following projects are affected by this:
- eyebrowse :  Web-based mail archive browsing
- jakarta-lucene :  Java Based Search Engine
- lucene-java :  Java Based Search Engine


Full details are available at:
http://vmgump.apache.org/gump/public/lucene-java/lucene-java/index.html

That said, some information snippets are provided here.

The following annotations (debug/informational/warning/error messages) were 
provided:
 -DEBUG- Sole output [lucene-core-15072007.jar] identifier set to project name
 -DEBUG- Dependency on javacc exists, no need to add for property javacc.home.
 -INFO- Failed with reason build failed
 -INFO- Failed to extract fallback artifacts from Gump Repository



The following work was performed:
http://vmgump.apache.org/gump/public/lucene-java/lucene-java/gump_work/build_lucene-java_lucene-java.html
Work Name: build_lucene-java_lucene-java (Type: Build)
Work ended in a state of : Failed
Elapsed: 29 secs
Command Line: /usr/lib/jvm/java-1.5.0-sun/bin/java -Djava.awt.headless=true 
-Xbootclasspath/p:/srv/gump/public/workspace/xml-commons/java/external/build/xml-apis.jar:/srv/gump/public/workspace/xml-xerces2/build/xercesImpl.jar
 org.apache.tools.ant.Main -Dgump.merge=/srv/gump/public/gump/work/merge.xml 
-Dbuild.sysclasspath=only -Dversion=15072007 
-Djavacc.home=/srv/gump/packages/javacc-3.1 package 
[Working Directory: /srv/gump/public/workspace/lucene-java]
CLASSPATH: 
/usr/lib/jvm/java-1.5.0-sun/lib/tools.jar:/srv/gump/public/workspace/lucene-java/build/classes/java:/srv/gump/public/workspace/lucene-java/build/classes/demo:/srv/gump/public/workspace/lucene-java/build/classes/test:/srv/gump/public/workspace/lucene-java/contrib/db/bdb/lib/db-4.3.29.jar:/srv/gump/public/workspace/lucene-java/contrib/gdata-server/lib/gdata-client-1.0.jar:/srv/gump/public/workspace/lucene-java/build/contrib/analyzers/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/ant/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/benchmark/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/db/bdb/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/db/bdb-je/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/gdata-server/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/highlighter/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/javascript/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/lucli/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/memory/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/queries/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/regex/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/similarity/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/snowball/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/spellchecker/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/surround/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/swing/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/wordnet/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/xml-query-parser/classes/java:/srv/gump/public/workspace/ant/dist/lib/ant-jmf.jar:/srv/gump/public/workspace/ant/dist/lib/ant-swing.jar:/srv/gump/public/workspace/ant/dist/lib/ant-apache-resolver.jar:
/srv/gump/public/workspace/ant/dist/lib/ant-trax.jar:/srv/gump/public/workspace/ant/dist/lib/ant-junit.jar:/srv/gump/public/workspace/ant/dist/lib/ant-launcher.jar:/srv/gump/public/workspace/ant/dist/lib/ant-nodeps.jar:/srv/gump/public/workspace/ant/dist/lib/ant.jar:/srv/gump/packages/junit3.8.1/junit.jar:/srv/gump/public/workspace/xml-commons/java/build/resolver.jar:/srv/gump/packages/je-1.7.1/lib/je.jar:/srv/gump/public/workspace/jakarta-commons/digester/dist/commons-digester.jar:/srv/gump/public/workspace/jakarta-regexp/build/jakarta-regexp-15072007.jar:/srv/gump/packages/javacc-3.1/bin/lib/javacc.jar:/srv/gump/public/workspace/jline/target/jline-0.9.92-SNAPSHOT.jar:/srv/gump/packages/jtidy-04aug2000r7-dev/build/Tidy.jar:/srv/gump/public/workspace/junit/dist/junit-15072007.jar:/srv/gump/public/workspace/xml-commons/java/external/build/xml-apis-ext.jar:/srv/gump/public/workspace/jakarta-commons/logging/target/commons-logging-15072007.jar:/srv/gump/public/workspace/jakarta-commons/logging/target/commons-logging-api-15072007.jar:/srv/gump/public/workspace/jakarta-servletapi-5/jsr154/dist/lib/servlet-api.jar:/srv/gump/packages/nek