[jira] Commented: (LUCENE-530) Extend NumberTools to support int/long/float/double to string
[ https://issues.apache.org/jira/browse/LUCENE-530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12512866 ] Mohammad Norouzi commented on LUCENE-530:

Hi, I am using this nice class, but because of my requirements I had to add the following method, which makes the class easier to use:

    public static String encode(String stringToEncode, Class type) {
        try {
            Method valueOf = type.getMethod("valueOf", new Class[] { String.class });
            Object value = valueOf.invoke(null, new Object[] { stringToEncode });
            Method encode = NumericEncoder.class.getMethod("encode", new Class[] { type });
            String result = (String) encode.invoke(null, new Object[] { value });
            return result;
        } catch (SecurityException e) {
            e.printStackTrace();
        } catch (NoSuchMethodException e) {
            e.printStackTrace();
        } catch (IllegalArgumentException e) {
            e.printStackTrace();
        } catch (IllegalAccessException e) {
            e.printStackTrace();
        } catch (InvocationTargetException e) {
            logger.error("Exception in target method.");
            e.printStackTrace();
        }
        return null;
    }

With this method you no longer need if/else statements. Also, this class needs a decode() method.

> Extend NumberTools to support int/long/float/double to string
> -------------------------------------------------------------
>
>                 Key: LUCENE-530
>                 URL: https://issues.apache.org/jira/browse/LUCENE-530
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Analysis
>    Affects Versions: 1.9
>            Reporter: Andy Hind
>            Priority: Minor
>
> Extend NumberTools to support int/long/float/double to string
> so you can search using range queries on int/long/float/double, if you want.
> Here is the basis for how NumberTools could be extended to support
> int/long/double/float.
> As I only write these values to the index and fix tokenisation in searches I
> was not so fussed about the reverse transformations back to Strings.
> public class NumericEncoder
> {
>     /*
>      * Constants for integer encoding
>      */
>     static int INTEGER_SIGN_MASK = 0x80000000;
>
>     /*
>      * Constants for long encoding
>      */
>     static long LONG_SIGN_MASK = 0x8000000000000000L;
>
>     /*
>      * Constants for float encoding
>      */
>     static int FLOAT_SIGN_MASK = 0x80000000;
>     static int FLOAT_EXPONENT_MASK = 0x7F800000;
>     static int FLOAT_MANTISSA_MASK = 0x007FFFFF;
>
>     /*
>      * Constants for double encoding
>      */
>     static long DOUBLE_SIGN_MASK = 0x8000000000000000L;
>     static long DOUBLE_EXPONENT_MASK = 0x7FF0000000000000L;
>     static long DOUBLE_MANTISSA_MASK = 0x000FFFFFFFFFFFFFL;
>
>     private NumericEncoder()
>     {
>         super();
>     }
>
>     /**
>      * Encode an integer into a string that orders correctly using string
>      * comparison. Integer.MIN_VALUE encodes as and MAX_VALUE as
>      * .
>      *
>      * @param intToEncode
>      * @return
>      */
>     public static String encode(int intToEncode)
>     {
>         int replacement = intToEncode ^ INTEGER_SIGN_MASK;
>         return encodeToHex(replacement);
>     }
>
>     /**
>      * Encode a long into a string that orders correctly using string
>      * comparison. Long.MIN_VALUE encodes as and MAX_VALUE as
>      * .
>      *
>      * @param longToEncode
>      * @return
>      */
>     public static String encode(long longToEncode)
>     {
>         long replacement = longToEncode ^ LONG_SIGN_MASK;
>         return encodeToHex(replacement);
>     }
>
>     /**
>      * Encode a float into a string that orders correctly according to string
>      * comparison. Note that there is no negative NaN but there are codings that
>      * imply this. So NaN and -Infinity may not compare as expected.
>      *
>      * @param floatToEncode
>      * @return
>      */
>     public static String encode(float floatToEncode)
>     {
>         int bits = Float.floatToIntBits(floatToEncode);
>         int sign = bits & FLOAT_SIGN_MASK;
>         int exponent = bits & FLOAT_EXPONENT_MASK;
>         int mantissa = bits & FLOAT_MANTISSA_MASK;
>         if (sign != 0)
>         {
>             exponent ^= FLOAT_EXPONENT_MASK;
>             mantissa ^= FLOAT_MANTISSA_MASK;
>         }
>         sign ^= FLOAT_SIGN_MASK;
>         int replacement = sign | exponent | mantissa;
>         return encodeToHex(replacement);
>     }
>
>     /**
>      * Encode a double into a string that orders correctly according to string
>      * comparison
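The sign-bit trick above can be exercised end to end. Below is a hedged, self-contained sketch modeled on the NumericEncoder idea: it is not the attached patch itself (the patch's encodeToHex is not shown in the mail, so the fixed-width hex formatting here is my assumption), and the decode() method is my own illustration of the reverse transform the comment asks for.

```java
// Minimal sketch of sortable int encoding, modeled on NumericEncoder above.
// Assumption: encodeToHex renders 8 fixed-width lowercase hex digits.
// decode() is illustrative only; it is not part of the original patch.
public class SortableInt {
    static final int INTEGER_SIGN_MASK = 0x80000000;

    // Flip the sign bit so negative values sort below positive ones,
    // then render as fixed-width hex so string order matches numeric order.
    public static String encode(int intToEncode) {
        int replacement = intToEncode ^ INTEGER_SIGN_MASK;
        return String.format("%08x", replacement);
    }

    // Reverse transform: parse the hex, then flip the sign bit back.
    public static int decode(String encoded) {
        return (int) Long.parseLong(encoded, 16) ^ INTEGER_SIGN_MASK;
    }
}
```

With this scheme Integer.MIN_VALUE encodes as "00000000" and Integer.MAX_VALUE as "ffffffff", so a string RangeQuery over these values behaves like a numeric range.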
Re: [jira] Created: (LUCENE-945) contrib/benchmark tests fail find data dirs
Slowly catching up... Grant Ingersoll wrote:

> I think legally we are fine, since we aren't actually shipping it. I
> just mean that people may not want to wait however long it takes to
> download it. Of course, I don't know a work around other than to
> have some smaller set.

Measured the times - contrib/benchmark test now takes 4 minutes on first run (with Reuters downloading) and 2 minutes in following runs. I think this is not too bad (?) Btw, the 2 mins includes the parallel test that indexes the entire Reuters collection. Once TestQualityRun is committed, it would also index that entire collection, but I changed the parallel test to index only a few documents (so it would not add 2 more minutes).

> On Jun 30, 2007, at 10:10 AM, Doron Cohen wrote:
> >
> > Grant Ingersoll <[EMAIL PROTECTED]> wrote on 30/06/2007 05:20:34:
> >
> >> Does this imply it is going to download the test collection for
> >> people when they don't have it when running tests? I don't know if
> >> that is something people are going to want to happen.
> >
> > Yes it does.. and so would auto-build-bots - I am also using
> > the Reuters collection for TestQualityRun in LUCENE-836. Do
> > you mean legal wise? I should probably add a "DOWNLOAD"
> > warning here. Is there another issue with this?

- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Best Practices for getting Strings from a position range
: Do we have a best practice for going from, say a SpanQuery doc/
: position information and retrieving the actual range of positions of
: content from the Document? Is it just to reanalyze the Document
: using the appropriate Analyzer and start recording once you hit the
: positions you are interested in? Seems like Term Vectors _could_
: help, but even my new Mapper approach patch (LUCENE-868) doesn't
: really help, because they are stored in a term-centric manner. I
: guess what I am after is a position centric approach. That is, give

this is kind of what i was suggesting in the last message i sent to the java-user thread about payloads and SpanQueries (which i'm guessing is what prompted this thread as well)...

http://www.nabble.com/Payloads-and-PhraseQuery-tf3988826.html#a11551628

my point was that currently, to retrieve a payload you need a TermPositions instance, which is designed for iterating in the order of...

  seek(term)
  skipTo(doc)
  nextPosition()
  getPayload()

...which is great for getting the payload of every instance (ie: position) of a specific term in a given document (or in every document) but without serious changes to the Spans API, the ideal payload API would let you say...

  skipTo(doc)
  advance(startPosition)
  getPayload()
  while (nextPosition() < endPosition) getPayload()

but this seems like a nearly impossible API to implement given the nature of the inverted index and the fact that terms aren't ever stored in position order. there's a lot i really don't know/understand about the lucene term position internals ... but as i recall, the datastructure written to disk isn't actually a tree structure inverted index, it's a long sequence of tuples, correct?
so in theory you could scan along the tuples until you find the doc you are interested in, ignoring all of the term info along the way, then whatever term you happen to be on at the moment, you could scan along all of the positions until you find one in the range you are interested in -- assuming you do, then you record the current Term (and read your payload data if interested)

if i remember correctly, the first part of this is easy, and relatively fast -- i think skipTo(doc) on a TermDocs or TermPositions will happily scan for the first pair with the correct docId, regardless of the term ... the only thing i'm not sure about is how efficient it is to loop over nextPosition() for every term you find to see if any of them are in your range ... the best case scenario is that the first position returned is above the high end of your range, in which case you can stop immediately and seek to the next term -- but the worst case is that you call nextPosition() over and over a lot of times before you get a position in (or above) your range. an advancePosition(pos) that worked like seek or skipTo might be helpful here.

: I feel like I am missing something obvious. I would suspect the
: highlighter needs to do this, but it seems to take the reanalyze
: approach as well (I admit, though, that I have little experience with
: the highlighter.)

as i understand it the default case is to reanalyze, but if you have TermFreqVector info stored with positions (ie: a TermPositionVector) then it can use that to construct a TokenStream by iterating over all terms and writing them into a big array in position order (see the TokenSources class in the highlighter)

this makes sense when highlighting because it doesn't know what kind of fragmenter is going to be used so it needs the whole TokenStream, but it seems less than ideal when you are only interested in a small number of position ranges that you know in advance.
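the advancePosition(pos) idea can be sketched in isolation. this is a hedged illustration, not a proposed Lucene API: it assumes the positions for one term in one doc are available as a sorted int[] (which is how they come back from nextPosition()), and shows how a "seek" by position avoids the nextPosition() loop.

```java
import java.util.Arrays;

// Sketch of the advancePosition(pos) idea: instead of calling
// nextPosition() over and over, jump straight to the first position >= target.
// Positions for one term in one doc are sorted, so binary search applies.
public class AdvancePosition {
    // Returns the index of the first position >= target,
    // or positions.length if every position is below target.
    public static int advance(int[] positions, int target) {
        int i = Arrays.binarySearch(positions, target);
        return i >= 0 ? i : -i - 1;
    }
}
```

with this, the "worst case" above collapses to one O(log n) probe per term: if advance() lands past the high end of the range you seek to the next term immediately.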
: I am wondering if it would be useful to have an alternative Term
: Vector storage mechanism that was position centric. Because we
: couldn't take advantage of the lexicographic compression, it would
: take up more disk space, but it would be a lot faster for these kinds

i'm not sure if it's really necessary to store the data in a position centric manner, assuming we have a way to "seek" by position like i described above -- but then again i don't really know that what i described above is all that possible/practical/performant.

-Hoss

- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Updated: (LUCENE-960) SpanQueryFilter addition
[ https://issues.apache.org/jira/browse/LUCENE-960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated LUCENE-960: --- Priority: Minor (was: Trivial) Lucene Fields: [New, Patch Available] (was: [Patch Available, New]) > SpanQueryFilter addition > > > Key: LUCENE-960 > URL: https://issues.apache.org/jira/browse/LUCENE-960 > Project: Lucene - Java > Issue Type: Improvement > Components: Search >Reporter: Grant Ingersoll >Priority: Minor > Attachments: SpanQueryFilter.patch > > > Similar to the QueryFilter (or whatever it is called now) the SpanQueryFilter > is a regular Lucene Filter, but it also can return Spans-like information. > This is useful if you not only want to filter based on a Query, but you then > want to be able to compare how a given match from a new query compared to the > positions of the filtered SpanQuery. Patch to come shortly also contains a > caching mechanism for the SpanQueryFilter -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Updated: (LUCENE-960) SpanQueryFilter addition
[ https://issues.apache.org/jira/browse/LUCENE-960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated LUCENE-960: --- Attachment: SpanQueryFilter.patch Try again w/ an actual patch > SpanQueryFilter addition > > > Key: LUCENE-960 > URL: https://issues.apache.org/jira/browse/LUCENE-960 > Project: Lucene - Java > Issue Type: Improvement > Components: Search >Reporter: Grant Ingersoll >Priority: Trivial > Attachments: SpanQueryFilter.patch > > > Similar to the QueryFilter (or whatever it is called now) the SpanQueryFilter > is a regular Lucene Filter, but it also can return Spans-like information. > This is useful if you not only want to filter based on a Query, but you then > want to be able to compare how a given match from a new query compared to the > positions of the filtered SpanQuery. Patch to come shortly also contains a > caching mechanism for the SpanQueryFilter -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Updated: (LUCENE-960) SpanQueryFilter addition
[ https://issues.apache.org/jira/browse/LUCENE-960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated LUCENE-960: --- Attachment: (was: SpanQueryFilter.java) > SpanQueryFilter addition > > > Key: LUCENE-960 > URL: https://issues.apache.org/jira/browse/LUCENE-960 > Project: Lucene - Java > Issue Type: Improvement > Components: Search >Reporter: Grant Ingersoll >Priority: Trivial > Attachments: SpanQueryFilter.patch > > > Similar to the QueryFilter (or whatever it is called now) the SpanQueryFilter > is a regular Lucene Filter, but it also can return Spans-like information. > This is useful if you not only want to filter based on a Query, but you then > want to be able to compare how a given match from a new query compared to the > positions of the filtered SpanQuery. Patch to come shortly also contains a > caching mechanism for the SpanQueryFilter -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Updated: (LUCENE-960) SpanQueryFilter addition
[ https://issues.apache.org/jira/browse/LUCENE-960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated LUCENE-960: --- Attachment: SpanQueryFilter.java Patch and tests for SpanQueryFilter > SpanQueryFilter addition > > > Key: LUCENE-960 > URL: https://issues.apache.org/jira/browse/LUCENE-960 > Project: Lucene - Java > Issue Type: Improvement > Components: Search >Reporter: Grant Ingersoll >Priority: Trivial > Attachments: SpanQueryFilter.java > > > Similar to the QueryFilter (or whatever it is called now) the SpanQueryFilter > is a regular Lucene Filter, but it also can return Spans-like information. > This is useful if you not only want to filter based on a Query, but you then > want to be able to compare how a given match from a new query compared to the > positions of the filtered SpanQuery. Patch to come shortly also contains a > caching mechanism for the SpanQueryFilter -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Created: (LUCENE-960) SpanQueryFilter addition
SpanQueryFilter addition
------------------------

                 Key: LUCENE-960
                 URL: https://issues.apache.org/jira/browse/LUCENE-960
             Project: Lucene - Java
          Issue Type: Improvement
          Components: Search
            Reporter: Grant Ingersoll
            Priority: Trivial

Similar to the QueryFilter (or whatever it is called now), the SpanQueryFilter is a regular Lucene Filter, but it can also return Spans-like information. This is useful if you not only want to filter based on a Query, but then want to be able to compare how a given match from a new query compares to the positions of the filtered SpanQuery. The patch, to come shortly, also contains a caching mechanism for the SpanQueryFilter.

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Best Practices for getting Strings from a position range
Do we have a best practice for going from, say, SpanQuery doc/position information and retrieving the actual range of positions of content from the Document? Is it just to reanalyze the Document using the appropriate Analyzer and start recording once you hit the positions you are interested in? Seems like Term Vectors _could_ help, but even my new Mapper approach patch (LUCENE-868) doesn't really help, because they are stored in a term-centric manner.

I guess what I am after is a position centric approach. That is, given a Document, get a term vector (note, not a TermFreqVector) back that is ordered by position (thus, there may be duplicate entries for a given term that occurs in multiple positions).

I feel like I am missing something obvious. I would suspect the highlighter needs to do this, but it seems to take the reanalyze approach as well (I admit, though, that I have little experience with the highlighter.)

I am wondering if it would be useful to have an alternative Term Vector storage mechanism that was position centric. Because we couldn't take advantage of the lexicographic compression, it would take up more disk space, but it would be a lot faster for these kinds of things. With this kind of approach, you could easily index into an array based on the result of a SpanQuery.start(), etc. Of course, you would have to have a data structure that handled the multiple terms per position option, but I don't think that would be too hard, correct?

Just thinking out loud...

Cheers,
Grant

- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
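The position-centric view described above can be approximated today by inverting a term-centric vector in memory. A hedged sketch, assuming the term-to-positions data has already been read out of a term vector into a plain map (the class and method names here are illustrative, not a proposed Lucene API; handling multiple terms per position would need a List value instead of a single String):

```java
import java.util.Map;
import java.util.TreeMap;

// Invert a term-centric vector (term -> positions) into a position-centric
// view (position -> term), so a SpanQuery start/end can index straight into
// the range of positions it matched.
public class PositionCentricVector {
    public static TreeMap<Integer, String> invert(Map<String, int[]> termPositions) {
        TreeMap<Integer, String> byPosition = new TreeMap<>();
        for (Map.Entry<String, int[]> e : termPositions.entrySet()) {
            for (int pos : e.getValue()) {
                byPosition.put(pos, e.getKey());
            }
        }
        return byPosition;
    }
}
```

A TreeMap keeps the positions ordered, so reading the terms for a span is a subMap(start, end) call; a plain array indexed by position would work too if the field is dense.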
Build failed in Hudson: Lucene-Nightly #152
See http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/152/changes -- [...truncated 822 lines...] A contrib/gdata-server/webroot/WEB-INF/classes/gdata-account.xsd A contrib/gdata-server/CHANGES.txt A contrib/gdata-server/lib AUcontrib/gdata-server/lib/commons-collections-3.2.jar AUcontrib/gdata-server/lib/gdata-client-1.0.jar AUcontrib/gdata-server/lib/servlet-api.jar AUcontrib/gdata-server/lib/xercesImpl.jar AUcontrib/gdata-server/lib/commons-logging-1.1.jar AUcontrib/gdata-server/lib/commons-beanutils.jar AUcontrib/gdata-server/lib/log4j-1.2.13.jar AUcontrib/gdata-server/lib/nekohtml.jar AUcontrib/gdata-server/lib/commons-digester-1.7.jar A contrib/gdata-server/src A contrib/gdata-server/src/gom A contrib/gdata-server/src/gom/src A contrib/gdata-server/src/gom/src/test A contrib/gdata-server/src/gom/src/test/org A contrib/gdata-server/src/gom/src/test/org/apache A contrib/gdata-server/src/gom/src/test/org/apache/lucene A contrib/gdata-server/src/gom/src/test/org/apache/lucene/gdata A contrib/gdata-server/src/gom/src/test/org/apache/lucene/gdata/gom A contrib/gdata-server/src/gom/src/test/org/apache/lucene/gdata/gom/GOMNamespaceTest.java A contrib/gdata-server/src/gom/src/test/org/apache/lucene/gdata/gom/core A contrib/gdata-server/src/gom/src/test/org/apache/lucene/gdata/gom/core/GOMContentImplTest.java A contrib/gdata-server/src/gom/src/test/org/apache/lucene/gdata/gom/core/GOMDocumentImplTest.java A contrib/gdata-server/src/gom/src/test/org/apache/lucene/gdata/gom/core/GOMDateConstructImplTest.java A contrib/gdata-server/src/gom/src/test/org/apache/lucene/gdata/gom/core/GOMTextConstructImplTest.java A contrib/gdata-server/src/gom/src/test/org/apache/lucene/gdata/gom/core/GOMGenereatorImplTest.java A contrib/gdata-server/src/gom/src/test/org/apache/lucene/gdata/gom/core/GOMCategoryTest.java A contrib/gdata-server/src/gom/src/test/org/apache/lucene/gdata/gom/core/GOMIdImplTest.java A 
contrib/gdata-server/src/gom/src/test/org/apache/lucene/gdata/gom/core/GOMLinkImplTest.java A contrib/gdata-server/src/gom/src/test/org/apache/lucene/gdata/gom/core/ArbitraryGOMXmlTest.java A contrib/gdata-server/src/gom/src/test/org/apache/lucene/gdata/gom/core/GOMSourceImplTest.java A contrib/gdata-server/src/gom/src/test/org/apache/lucene/gdata/gom/core/GOMEntryImplTest.java A contrib/gdata-server/src/gom/src/test/org/apache/lucene/gdata/gom/core/GOMAuthorImplTest.java A contrib/gdata-server/src/gom/src/test/org/apache/lucene/gdata/gom/core/GOMFeedImplTest.java A contrib/gdata-server/src/gom/src/test/org/apache/lucene/gdata/gom/core/GOMAttributeImplTest.java A contrib/gdata-server/src/gom/src/test/org/apache/lucene/gdata/gom/core/SimpleGOMElementImplTest.java A contrib/gdata-server/src/gom/src/test/org/apache/lucene/gdata/gom/core/AtomUriElementTest.java A contrib/gdata-server/src/gom/src/test/org/apache/lucene/gdata/gom/core/GOMPersonImplTest.java A contrib/gdata-server/src/gom/src/java A contrib/gdata-server/src/gom/src/java/org A contrib/gdata-server/src/gom/src/java/org/apache A contrib/gdata-server/src/gom/src/java/org/apache/lucene A contrib/gdata-server/src/gom/src/java/org/apache/lucene/gdata A contrib/gdata-server/src/gom/src/java/org/apache/lucene/gdata/gom A contrib/gdata-server/src/gom/src/java/org/apache/lucene/gdata/gom/GOMXmlEntity.java A contrib/gdata-server/src/gom/src/java/org/apache/lucene/gdata/gom/GOMSummary.java A contrib/gdata-server/src/gom/src/java/org/apache/lucene/gdata/gom/GOMLink.java A contrib/gdata-server/src/gom/src/java/org/apache/lucene/gdata/gom/GOMTime.java A contrib/gdata-server/src/gom/src/java/org/apache/lucene/gdata/gom/GOMLogo.java A contrib/gdata-server/src/gom/src/java/org/apache/lucene/gdata/gom/GOMAuthor.java A contrib/gdata-server/src/gom/src/java/org/apache/lucene/gdata/gom/GOMAttribute.java A contrib/gdata-server/src/gom/src/java/org/apache/lucene/gdata/gom/GOMFeed.java A 
contrib/gdata-server/src/gom/src/java/org/apache/lucene/gdata/gom/core A contrib/gdata-server/src/gom/src/java/org/apache/lucene/gdata/gom/core/GOMContributorImpl.java A contrib/gdata-server/src/gom/src/java/org/apache/lucene/gdata/gom/core/core-aid.uml A contrib/gdata-server/src/gom/src/java/org/apache/lucene/gdata/gom/core/GOMDocumentImpl.java A contrib/gdata-server/src/gom/src/java/org/apache/lucene/gdata/gom/core/GOMPublishedImpl.java A contrib/gdata-server/src/gom/src/java/org/apache/lucene/gdata/gom/core/GOMDateConstructImpl.java A contrib/
[jira] Commented: (LUCENE-868) Making Term Vectors more accessible
[ https://issues.apache.org/jira/browse/LUCENE-868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12512842 ] Grant Ingersoll commented on LUCENE-868: I also switched TermVectorMapper to be an abstract class per Yonik's suggestion. > Making Term Vectors more accessible > --- > > Key: LUCENE-868 > URL: https://issues.apache.org/jira/browse/LUCENE-868 > Project: Lucene - Java > Issue Type: New Feature > Components: Store >Reporter: Grant Ingersoll >Assignee: Grant Ingersoll >Priority: Minor > Attachments: LUCENE-868-v2.patch > > > One of the big issues with term vector usage is that the information is > loaded into parallel arrays as it is loaded, which are then often times > manipulated again to use in the application (for instance, they are sorted by > frequency). > Adding a callback mechanism that allows the vector loading to be handled by > the application would make this a lot more efficient. > I propose to add to IndexReader: > abstract public void getTermFreqVector(int docNumber, String field, > TermVectorMapper mapper) throws IOException; > and a similar one for the all fields version > Where TermVectorMapper is an interface with a single method: > void map(String term, int frequency, int offset, int position); > The TermVectorReader will be modified to just call the TermVectorMapper. The > existing getTermFreqVectors will be reimplemented to use an implementation of > TermVectorMapper that creates the parallel arrays. Additionally, some simple > implementations that automatically sort vectors will also be created. > This is my first draft of this API and is subject to change. I hope to have > a patch soon. > See > http://www.gossamer-threads.com/lists/lucene/java-user/48003?search_string=get%20the%20total%20term%20frequency;#48003 > for related information. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. 
- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
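The callback proposed in the issue text can be sketched concretely. This is a hedged illustration of the API described above, not the committed patch; the frequency-sorting mapper is my own example of the "simple implementations that automatically sort vectors" the description mentions.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the callback from the issue description: the term vector reader
// pushes each term at the mapper instead of building parallel arrays up front.
abstract class TermVectorMapper {
    abstract void map(String term, int frequency, int offset, int position);
}

// Example implementation: collect terms ordered by descending frequency,
// the post-processing step the issue says applications often repeat.
class FrequencySortedMapper extends TermVectorMapper {
    final List<String> terms = new ArrayList<>();
    final List<Integer> freqs = new ArrayList<>();

    @Override
    void map(String term, int frequency, int offset, int position) {
        // Insertion point that keeps freqs sorted in descending order.
        int i = 0;
        while (i < freqs.size() && freqs.get(i) >= frequency) i++;
        terms.add(i, term);
        freqs.add(i, frequency);
    }
}
```

The point of the design is that sorting happens once, during loading, rather than after the parallel arrays have already been materialized and must be walked again.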
[jira] Updated: (LUCENE-868) Making Term Vectors more accessible
[ https://issues.apache.org/jira/browse/LUCENE-868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated LUCENE-868: --- Attachment: (was: LUCENE-868-v1.patch) > Making Term Vectors more accessible > --- > > Key: LUCENE-868 > URL: https://issues.apache.org/jira/browse/LUCENE-868 > Project: Lucene - Java > Issue Type: New Feature > Components: Store >Reporter: Grant Ingersoll >Assignee: Grant Ingersoll >Priority: Minor > Attachments: LUCENE-868-v2.patch > > > One of the big issues with term vector usage is that the information is > loaded into parallel arrays as it is loaded, which are then often times > manipulated again to use in the application (for instance, they are sorted by > frequency). > Adding a callback mechanism that allows the vector loading to be handled by > the application would make this a lot more efficient. > I propose to add to IndexReader: > abstract public void getTermFreqVector(int docNumber, String field, > TermVectorMapper mapper) throws IOException; > and a similar one for the all fields version > Where TermVectorMapper is an interface with a single method: > void map(String term, int frequency, int offset, int position); > The TermVectorReader will be modified to just call the TermVectorMapper. The > existing getTermFreqVectors will be reimplemented to use an implementation of > TermVectorMapper that creates the parallel arrays. Additionally, some simple > implementations that automatically sort vectors will also be created. > This is my first draft of this API and is subject to change. I hope to have > a patch soon. > See > http://www.gossamer-threads.com/lists/lucene/java-user/48003?search_string=get%20the%20total%20term%20frequency;#48003 > for related information. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Updated: (LUCENE-868) Making Term Vectors more accessible
[ https://issues.apache.org/jira/browse/LUCENE-868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated LUCENE-868: --- Attachment: LUCENE-868-v2.patch New patch that passes all tests (and compiles against the memory contrib) > Making Term Vectors more accessible > --- > > Key: LUCENE-868 > URL: https://issues.apache.org/jira/browse/LUCENE-868 > Project: Lucene - Java > Issue Type: New Feature > Components: Store >Reporter: Grant Ingersoll >Assignee: Grant Ingersoll >Priority: Minor > Attachments: LUCENE-868-v1.patch, LUCENE-868-v2.patch > > > One of the big issues with term vector usage is that the information is > loaded into parallel arrays as it is loaded, which are then often times > manipulated again to use in the application (for instance, they are sorted by > frequency). > Adding a callback mechanism that allows the vector loading to be handled by > the application would make this a lot more efficient. > I propose to add to IndexReader: > abstract public void getTermFreqVector(int docNumber, String field, > TermVectorMapper mapper) throws IOException; > and a similar one for the all fields version > Where TermVectorMapper is an interface with a single method: > void map(String term, int frequency, int offset, int position); > The TermVectorReader will be modified to just call the TermVectorMapper. The > existing getTermFreqVectors will be reimplemented to use an implementation of > TermVectorMapper that creates the parallel arrays. Additionally, some simple > implementations that automatically sort vectors will also be created. > This is my first draft of this API and is subject to change. I hope to have > a patch soon. > See > http://www.gossamer-threads.com/lists/lucene/java-user/48003?search_string=get%20the%20total%20term%20frequency;#48003 > for related information. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. 
- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Updated: (LUCENE-957) Lucene RAM Directory doesn't work for Index Size > 8 GB
[ https://issues.apache.org/jira/browse/LUCENE-957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen updated LUCENE-957:
---
    Attachment: lucene-957.patch

The previous patch apparently did not fix the bug - a casting problem in RAMOutputStream had to be fixed as well. The updated patch adds a test imitating a RAMFile larger than maxint. For this I had to make the allocation of a new byte array in RAMFile overridable. The new test fails before fixing RAMOutputStream (affecting the RAMDirectory constructor from FS). The issues in RAMInputStream do not in fact cause failures, yet they should be fixed. With a test in place I now feel confident in this fix - will commit it in a day or two if there are no reservations.

> Lucene RAM Directory doesn't work for Index Size > 8 GB
> -------------------------------------------------------
>
>                 Key: LUCENE-957
>                 URL: https://issues.apache.org/jira/browse/LUCENE-957
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Store
>            Reporter: Doron Cohen
>            Assignee: Doron Cohen
>         Attachments: lucene-957.patch, lucene-957.patch
>
> from user list - http://www.gossamer-threads.com/lists/lucene/java-user/50982
> The problem seems to be casting issues in RAMInputStream.
> Line 90:
>     bufferStart = BUFFER_SIZE * currentBufferIndex;
> Both rhs operands are ints while the lhs is long,
> so a very large product would first overflow MAX_INT, become negative, and
> only then be (auto) cast to long, but this is too late.
> Line 91:
>     bufferLength = (int) (length - bufferStart);
> Both rhs operands are longs while the lhs is int,
> so the (int) cast result may turn negative and the logic that follows would
> be wrong.

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
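The casting problem on line 90 is easy to reproduce in isolation. A minimal sketch of the bug and the fix (the BUFFER_SIZE value here is an assumption for illustration, not necessarily Lucene's actual constant):

```java
// Demonstrates the RAMInputStream overflow: an int*int product wraps
// before it is widened to long, so large buffer offsets go negative.
public class OverflowDemo {
    static final int BUFFER_SIZE = 1024;  // assumed value for illustration

    // Buggy form: both operands are int, so the multiply happens in int
    // arithmetic, overflows, and only the wrong result is widened to long.
    static long bufferStartBuggy(int currentBufferIndex) {
        return BUFFER_SIZE * currentBufferIndex;
    }

    // Fixed form: widen one operand first so the multiply happens in long.
    static long bufferStartFixed(int currentBufferIndex) {
        return (long) BUFFER_SIZE * currentBufferIndex;
    }
}
```

With 1 KB buffers the product exceeds Integer.MAX_VALUE once the file passes 2 GB, which matches the symptom of large RAM indexes breaking.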
[EMAIL PROTECTED]: Project lucene-java (in module lucene-java) failed
To whom it may engage... This is an automated request, but not an unsolicited one. For more information please visit http://gump.apache.org/nagged.html, and/or contact the folk at [EMAIL PROTECTED] Project lucene-java has an issue affecting its community integration. This issue affects 3 projects, and has been outstanding for 5 runs. The current state of this project is 'Failed', with reason 'Build Failed'. For reference only, the following projects are affected by this: - eyebrowse : Web-based mail archive browsing - jakarta-lucene : Java Based Search Engine - lucene-java : Java Based Search Engine Full details are available at: http://vmgump.apache.org/gump/public/lucene-java/lucene-java/index.html That said, some information snippets are provided here. The following annotations (debug/informational/warning/error messages) were provided: -DEBUG- Sole output [lucene-core-15072007.jar] identifier set to project name -DEBUG- Dependency on javacc exists, no need to add for property javacc.home. 
[EMAIL PROTECTED]: Project lucene-java (in module lucene-java) failed
To whom it may engage...

This is an automated request, but not an unsolicited one. For more information please visit http://gump.apache.org/nagged.html, and/or contact the folk at [EMAIL PROTECTED].

Project lucene-java has an issue affecting its community integration. This issue affects 3 projects, and has been outstanding for 5 runs. The current state of this project is 'Failed', with reason 'Build Failed'.

For reference only, the following projects are affected by this:
- eyebrowse : Web-based mail archive browsing
- jakarta-lucene : Java Based Search Engine
- lucene-java : Java Based Search Engine

Full details are available at:
http://vmgump.apache.org/gump/public/lucene-java/lucene-java/index.html

That said, some information snippets are provided here. The following annotations (debug/informational/warning/error messages) were provided:
-DEBUG- Sole output [lucene-core-15072007.jar] identifier set to project name
-DEBUG- Dependency on javacc exists, no need to add for property javacc.home.
-INFO- Failed with reason build failed
-INFO- Failed to extract fallback artifacts from Gump Repository

The following work was performed:
http://vmgump.apache.org/gump/public/lucene-java/lucene-java/gump_work/build_lucene-java_lucene-java.html
Work Name: build_lucene-java_lucene-java (Type: Build)
Work ended in a state of: Failed
Elapsed: 29 secs

Command Line:
/usr/lib/jvm/java-1.5.0-sun/bin/java -Djava.awt.headless=true -Xbootclasspath/p:/srv/gump/public/workspace/xml-commons/java/external/build/xml-apis.jar:/srv/gump/public/workspace/xml-xerces2/build/xercesImpl.jar org.apache.tools.ant.Main -Dgump.merge=/srv/gump/public/gump/work/merge.xml -Dbuild.sysclasspath=only -Dversion=15072007 -Djavacc.home=/srv/gump/packages/javacc-3.1 package
[Working Directory: /srv/gump/public/workspace/lucene-java]

CLASSPATH: /usr/lib/jvm/java-1.5.0-sun/lib/tools.jar:/srv/gump/public/workspace/lucene-java/build/classes/java:/srv/gump/public/workspace/lucene-java/build/classes/demo:/srv/gump/public/workspace/lucene-java/build/classes/test:/srv/gump/public/workspace/lucene-java/contrib/db/bdb/lib/db-4.3.29.jar:/srv/gump/public/workspace/lucene-java/contrib/gdata-server/lib/gdata-client-1.0.jar:/srv/gump/public/workspace/lucene-java/build/contrib/analyzers/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/ant/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/benchmark/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/db/bdb/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/db/bdb-je/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/gdata-server/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/highlighter/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/javascript/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/lucli/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/memory/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/quer
ies/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/regex/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/similarity/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/snowball/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/spellchecker/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/surround/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/swing/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/wordnet/classes/java:/srv/gump/public/workspace/lucene-java/build/contrib/xml-query-parser/classes/java:/srv/gump/public/workspace/ant/dist/lib/ant-jmf.jar:/srv/gump/public/workspace/ant/dist/lib/ant-swing.jar:/srv/gump/public/workspace/ant/dist/lib/ant-apache-resolver.jar:/srv/gump/public/workspace/ant/dist/lib/ant-trax.jar:/srv/gump/public/workspace/ant/dist/lib/ant-junit.jar:/srv/gump/public/workspace/ant/dist/lib/ant-launcher.jar:/srv/gump/public/workspace/ant/dist/lib/ant-nodeps.jar:/srv/gump/public/workspace/ant/dist/lib/ant.jar:/srv/gump/packages/junit3.8.1/junit.jar:/srv/gump/public/workspace/xml-commons/java/build/resolver.jar:/srv/gump/packages/je-1.7.1/lib/je.jar:/srv/gump/public/workspace/jakarta-commons/digester/dist/commons-digester.jar:/srv/gump/public/workspace/jakarta-regexp/build/jakarta-regexp-15072007.jar:/srv/gump/packages/javacc-3.1/bin/lib/javacc.jar:/srv/gump/public/workspace/jline/target/jline-0.9.92-SNAPSHOT.jar:/srv/gump/packages/jtidy-04aug2000r7-dev/build/Tidy.jar:/srv/gump/public/workspace/junit/dist/junit-15072007.jar:/srv/gump/public/workspace/xml-commons/java/external/build/xml-apis-ext.jar:/srv/gump/public/workspace/jakarta-commons/logging/target/commons-logging-15072007.jar:/srv/gump/public/workspace/jakarta-commons/logging/target/commons-logging-api-15072007.jar:/srv/gump/public/workspace/jakarta-servletapi-5/jsr154/dist/lib/servlet-api.jar:/srv/gump/packages/nek