[jira] [Commented] (SOLR-2358) Distributing Indexing

2011-12-05 Thread Mark Miller (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13163346#comment-13163346
 ] 

Mark Miller commented on SOLR-2358:
---

I just made it so that a version can be specified on deletes in Solr XML, and 
I did the work necessary for distributed deletes to work with versioning. You 
can do delete-by-id now.

> Distributing Indexing
> -
>
> Key: SOLR-2358
> URL: https://issues.apache.org/jira/browse/SOLR-2358
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrCloud, update
>Reporter: William Mayor
>Priority: Minor
> Fix For: 4.0
>
> Attachments: SOLR-2358.patch
>
>
> The first steps towards creating distributed indexing functionality in Solr




[jira] [Commented] (LUCENE-3370) Support for a "SpanNotNearQuery"

2011-12-05 Thread Trejkaz (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13163307#comment-13163307
 ] 

Trejkaz commented on LUCENE-3370:
-

Well, I ran with a modified version of SpanNotQuery for some time and nobody 
noticed any issues with it, but I just found the one thing SpanNotQuery does 
differently from SpanNearQuery that makes it unsuitable for this task.

With a SpanNearQuery, if you have "cat" in the document only once, and you 
search for span-near("cat","cat"), you will get no hits.  It doesn't regard 
terms as being "near" themselves.

However, with a SpanNotQuery, if you have "cat" in the document only once, and 
you search for span-not("cat","cat"), you *also* get no hits, because you have 
subtracted all the spans you got in the first round.

Since SpanNotNearQuery works like an expanded SpanNotQuery, it inherits this 
behaviour.  Thus SpanNearQuery and SpanNotNearQuery end up in a situation that 
is quite confusing to someone who doesn't know how they work: the results of 
the two queries added together do not give back the full set of spans you 
would have had before applying the additional query.
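
For concreteness, a minimal sketch of the two cases (Lucene span API; the 
field name "f" and the single-occurrence document are assumptions):

{code}
SpanQuery cat = new SpanTermQuery(new Term("f", "cat"));

// span-near("cat","cat"): a term is not regarded as "near" itself, so no hits.
SpanQuery near = new SpanNearQuery(new SpanQuery[] {cat, cat}, 0, true);

// span-not("cat","cat"): every "cat" span is subtracted by the exclude clause,
// so also no hits -- for the opposite reason.
SpanQuery notQuery = new SpanNotQuery(cat, cat);
{code}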


> Support for a "SpanNotNearQuery"
> 
>
> Key: LUCENE-3370
> URL: https://issues.apache.org/jira/browse/LUCENE-3370
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: core/search
>Reporter: Trejkaz
>
> Sometimes you want to find an instance of a span which does not hit near some 
> other span query.  SpanNotQuery only excludes exact hits on the term, but 
> sometimes you want to exclude hits 1 away from the first, and other times you 
> might want the range to be wider.
> So a SpanNotNearQuery could be useful.  
> SpanNotQuery is actually very close, and adding slop+inOrder support to it is 
> probably sufficient to make a SpanNotNearQuery. :)
> There appears to be one project which has done it in this fashion, although 
> this particular code looks like it's out of date:
> http://www.koders.com/java/fid933A84488EBE1F3492B19DE01B2A4FC1D68DA258.aspx?s=ArrayQuery
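
A hypothetical sketch of the proposed query's shape (SpanNotNearQuery is not an 
actual Lucene class; the signature just mirrors SpanNotQuery plus the suggested 
slop/inOrder support):

{code}
// hypothetical: match "cat" spans that are NOT within 3 positions of a "dog" span
SpanQuery include = new SpanTermQuery(new Term("f", "cat"));
SpanQuery exclude = new SpanTermQuery(new Term("f", "dog"));
SpanQuery q = new SpanNotNearQuery(include, exclude, 3 /* slop */, false /* inOrder */);
{code}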



[jira] [Commented] (SOLR-2487) Do not include slf4j-jdk14 jar in WAR

2011-12-05 Thread Hoss Man (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13163224#comment-13163224
 ] 

Hoss Man commented on SOLR-2487:


Jan: +1

> Do not include slf4j-jdk14 jar in WAR
> -
>
> Key: SOLR-2487
> URL: https://issues.apache.org/jira/browse/SOLR-2487
> Project: Solr
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.2, 4.0
>Reporter: Jan Høydahl
>  Labels: logging, slf4j
> Attachments: SOLR-2487.patch, SOLR-2487.patch
>
>
> I know we've intentionally bundled slf4j-jdk14-1.5.5.jar in the war to help 
> newbies get up and running. But I find myself re-packaging the war for every 
> customer when adapting to their choice of logging framework, which is 
> counter-productive.
> It would be sufficient to have the jdk-logging binding in example/lib to let 
> the example and tutorial still work OOTB; then, as soon as you deploy 
> solr.war to production, you're forced to explicitly decide which logging 
> framework to use.




[jira] [Resolved] (SOLR-2935) Better docs for numeric FieldTypes

2011-12-05 Thread Hoss Man (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man resolved SOLR-2935.


   Resolution: Fixed
Fix Version/s: 4.0
   3.6

Committed revision 1210714. - trunk
Committed revision 1210718. - 3x


> Better docs for numeric FieldTypes
> --
>
> Key: SOLR-2935
> URL: https://issues.apache.org/jira/browse/SOLR-2935
> Project: Solr
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Hoss Man
>Assignee: Hoss Man
> Fix For: 3.6, 4.0
>
> Attachments: SOLR-2935.patch
>
>
> It was recently pointed out to me that if you don't come from a Java 
> background, understanding the range of legal values for "TrieIntField" vs 
> "TrieLongField" may not be obvious to you (particularly if you are used to 
> dealing with databases that have INT, SMALLINT, TINYINT, etc... with UNSIGNED 
> vs SIGNED modifiers).  That subsequently made me realize that to this day the 
> javadocs for the various FieldTypes don't explain the difference between the 
> TrieFoo, SortableFoo, and Foo field types.
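
For reference, a hedged aside (not from the issue itself): the ranges in 
question are simply those of Java's signed 32- and 64-bit primitives, which 
back TrieIntField and TrieLongField respectively:

{code}
// signed 32-bit: -2147483648 .. 2147483647
System.out.println(Integer.MIN_VALUE + " .. " + Integer.MAX_VALUE);
// signed 64-bit: -9223372036854775808 .. 9223372036854775807
System.out.println(Long.MIN_VALUE + " .. " + Long.MAX_VALUE);
{code}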




[jira] [Commented] (SOLR-2358) Distributing Indexing

2011-12-05 Thread Mark Miller (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13163178#comment-13163178
 ] 

Mark Miller commented on SOLR-2358:
---

We are starting to get some stable, usable stuff here (even though there is 
much to do!). We are also starting to get some users that are interested in 
using this stuff (critical feedback there). So I'd like to propose we try to 
merge the branch into trunk sooner rather than later, and then iterate from 
there. Anything too experimental in the future could move back onto a branch 
again. This will also make the merge a bit more digestible, rather than 
building up a crazy number of differences on the branch. There are also a 
variety of improvements and fixes in the testing framework and elsewhere that 
would be nice to get back into trunk. Perhaps within a couple/few weeks, after 
we stabilize and finish up some hanging work?

> Distributing Indexing
> -
>
> Key: SOLR-2358
> URL: https://issues.apache.org/jira/browse/SOLR-2358
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrCloud, update
>Reporter: William Mayor
>Priority: Minor
> Fix For: 4.0
>
> Attachments: SOLR-2358.patch
>
>
> The first steps towards creating distributed indexing functionality in Solr




[jira] [Commented] (SOLR-2509) spellcheck: StringIndexOutOfBoundsException: String index out of range: -1

2011-12-05 Thread Yonik Seeley (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13163114#comment-13163114
 ] 

Yonik Seeley commented on SOLR-2509:


bq. Indeed, this test scenario was added during a refactoring (r1022768) with 
no JIRA # or bug mentioned at all in the comments. 

My commit :-)

The commit comment said "tests: fix resource leaks and simplify", and hopefully 
that's all I did!

Looking back wrt pixma, it looks like I replaced this:
{code}
-  @Test
-  public void testCollate2() throws Exception {
-    SolrCore core = h.getCore();
-    SearchComponent speller = core.getSearchComponent("spellcheck");
-    assertTrue("speller is null and it shouldn't be", speller != null);
-
-    ModifiableSolrParams params = new ModifiableSolrParams();
-    params.add(CommonParams.QT, "spellCheckCompRH");
-    params.add(SpellCheckComponent.SPELLCHECK_BUILD, "true");
-    params.add(CommonParams.Q, "pixma-a-b-c-d-e-f-g");
-    params.add(SpellCheckComponent.COMPONENT_NAME, "true");
-    params.add(SpellCheckComponent.SPELLCHECK_COLLATE, "true");
-
-    SolrRequestHandler handler = core.getRequestHandler("spellCheckCompRH");
-    SolrQueryResponse rsp = new SolrQueryResponse();
-    rsp.add("responseHeader", new SimpleOrderedMap());
-    handler.handleRequest(new LocalSolrQueryRequest(core, params), rsp);
-    NamedList values = rsp.getValues();
-    NamedList spellCheck = (NamedList) values.get("spellcheck");
-    NamedList suggestions = (NamedList) spellCheck.get("suggestions");
-    String collation = (String) suggestions.get("collation");
-    assertEquals("pixmaa", collation);
-  }
{code}

With this:

{code}
+    assertJQ(req("json.nl","map", "qt",rh, SpellCheckComponent.COMPONENT_NAME, "true",
+            "q","pixma-a-b-c-d-e-f-g", SpellCheckComponent.SPELLCHECK_COLLATE, "true")
+        ,"/spellcheck/suggestions/collation=='pixmaa'"
+    );
{code}


> spellcheck: StringIndexOutOfBoundsException: String index out of range: -1
> --
>
> Key: SOLR-2509
> URL: https://issues.apache.org/jira/browse/SOLR-2509
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 3.1
> Environment: Debian Lenny
> JAVA Version "1.6.0_20"
>Reporter: Thomas Gambier
>Assignee: Erick Erickson
>Priority: Blocker
> Attachments: SOLR-2509.patch, SOLR-2509.patch, document.xml, 
> schema.xml, solrconfig.xml
>
>
> Hi,
> I'm a French user of Solr and I've encountered a problem since I installed 
> Solr 3.1.
> I've got an error with this query: 
> cle_frbr:"LYSROUGE1149-73190"
> *SEE COMMENTS BELOW*
> I've tested escaping the minus char and the query worked:
> cle_frbr:"LYSROUGE1149(BACKSLASH)-73190"
> But, strangely, if I change one letter in my query, it works:
> cle_frbr:"LASROUGE1149-73190"
> I've tested the same query on Solr 1.4 and it works!
> Can someone test the query on the next line on a Solr 3.1 version and tell me 
> if they have the same problem? 
> yourfield:"LYSROUGE1149-73190"
> Where does the problem come from?
> Thank you in advance for your help.
> Tom




[jira] [Commented] (SOLR-2509) spellcheck: StringIndexOutOfBoundsException: String index out of range: -1

2011-12-05 Thread James Dyer (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13163103#comment-13163103
 ] 

James Dyer commented on SOLR-2509:
--

Steffen's changes are most certainly correct.  The index contains "pixmaa" and 
we are querying on "pixma-a-b-c-d-e-f-g".  The spelling index is using analyzer 
"lowerpunctfilt" (solrconfig-spellcheckcomponent.xml, line 44) which includes 
WordDelimiterFilter and "generateWordParts=1".  So we would expect this query 
to tokenize down to "pixma" "a" "b" "c" "d" "e" "f" "g".  As the Collate 
feature is only supposed to replace the misspelled token with the new one, I 
wonder why this test scenario would expect all 8 tokens to be replaced by 1 
token (!).

Indeed, this test scenario was added during a refactoring (r1022768) with no 
JIRA # or bug mentioned at all in the comments.  So we can't know for sure why 
it was added.  I'm thinking this is invalid.  I would expect the correct 
collation to be "pixma-a-b-c-d-e-f-g".  

Just for grins, I put a "println" in SpellingQueryConverter to show the start & 
end offsets for each token before and after the patch.  In both cases, we get 
the same token texts, but prior to the patch the offset values are clearly 
wrong.
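
A hedged sketch of roughly what that diagnostic looks like (Solr 3.x 
SpellingQueryConverter.convert(String) returns a Collection<Token>; the 
"converter" instance and the exact placement of the println are assumptions):

{code}
// "converter" is assumed to be the SpellingQueryConverter configured with the
// "lowerpunctfilt" analyzer mentioned above.
for (Token token : converter.convert("pixma-a-b-c-d-e-f-g")) {
  System.out.println("TOKEN: " + token.term()
      + " so=" + token.startOffset() + " eo=" + token.endOffset());
}
{code}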

--before:
TOKEN: pixma so=0 eo=19
TOKEN: a so=0 eo=19
TOKEN: b so=0 eo=19
TOKEN: c so=0 eo=19
TOKEN: d so=0 eo=19
TOKEN: e so=0 eo=19
TOKEN: f so=0 eo=19
TOKEN: g so=0 eo=19
TOKEN: pixmaabcdefg so=0 eo=19

--after:
TOKEN: pixma so=0 eo=5
TOKEN: a so=6 eo=7
TOKEN: b so=8 eo=9
TOKEN: c so=10 eo=11
TOKEN: d so=12 eo=13
TOKEN: e so=14 eo=15
TOKEN: f so=16 eo=17
TOKEN: g so=18 eo=19
TOKEN: pixmaabcdefg so=0 eo=19

 

> spellcheck: StringIndexOutOfBoundsException: String index out of range: -1
> --
>
> Key: SOLR-2509
> URL: https://issues.apache.org/jira/browse/SOLR-2509
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 3.1
> Environment: Debian Lenny
> JAVA Version "1.6.0_20"
>Reporter: Thomas Gambier
>Assignee: Erick Erickson
>Priority: Blocker
> Attachments: SOLR-2509.patch, SOLR-2509.patch, document.xml, 
> schema.xml, solrconfig.xml
>
>
> Hi,
> I'm a French user of Solr and I've encountered a problem since I installed 
> Solr 3.1.
> I've got an error with this query: 
> cle_frbr:"LYSROUGE1149-73190"
> *SEE COMMENTS BELOW*
> I've tested escaping the minus char and the query worked:
> cle_frbr:"LYSROUGE1149(BACKSLASH)-73190"
> But, strangely, if I change one letter in my query, it works:
> cle_frbr:"LASROUGE1149-73190"
> I've tested the same query on Solr 1.4 and it works!
> Can someone test the query on the next line on a Solr 3.1 version and tell me 
> if they have the same problem? 
> yourfield:"LYSROUGE1149-73190"
> Where does the problem come from?
> Thank you in advance for your help.
> Tom




[JENKINS] Lucene-Solr-tests-only-3.x - Build # 11692 - Failure

2011-12-05 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/11692/

1 tests failed.
REGRESSION:  
org.apache.lucene.index.TestIndexWriterExceptions.testRandomExceptionsThreads

Error Message:
thread Indexer 3: hit unexpected failure

Stack Trace:
junit.framework.AssertionFailedError: thread Indexer 3: hit unexpected failure
at 
org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:147)
at 
org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:50)
at 
org.apache.lucene.index.TestIndexWriterExceptions.testRandomExceptionsThreads(TestIndexWriterExceptions.java:237)
at 
org.apache.lucene.util.LuceneTestCase$2$1.evaluate(LuceneTestCase.java:432)




Build Log (for compile errors):
[...truncated 7954 lines...]






[jira] [Commented] (LUCENE-3615) Make it easier to run Test2BTerms

2011-12-05 Thread Hoss Man (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13162986#comment-13162986
 ] 

Hoss Man commented on LUCENE-3615:
--

bq. Because of this, all of the tests behave in totally different ways that you 
cant really assign a weight to, e.g. take a look at the history of test times 
for this test:

"Weight" may be the wrong term... i wasn't suggesting that it would be any sort 
of quantitative, comparable metric of how long the test would take -- my point 
was just that having a numeric annotation where bigger means "this test does 
more stuff" would allow people to run more or less tests as they see fit with 
simple configuration, regardless of whether their idea of a test to be run 
"Nightly" jives directly with the @Nightly annotation (maybe i want to only run 
@Nightly tests on weekends?)

As things stand, we have regular tests, and then we have @Nightly tests, and 
then we have @Slow tests ... hypothetically: if we add a new test later that's 
not nearly as bad as Test2BTerms, so we still want it to run as part of a "full 
test run" but is bad enough that we don't want to jenkins to do it was part of 
our  "@Nightly" run, we  have to consider some intermediate "@SortOfSlow" 
attribute ... hence my suggestion that instead of adding more special case 
annotations (and more build params for deciding when to exececute what), we 
just use an arbitrary  range of numbers and two simple "min" and "max" build 
params to pick the tests to run.

...anyway ... it was just an idea.
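
A hypothetical sketch of such an annotation (the name @TestWeight and the 
build params are illustrative, not an actual Lucene test-framework API):

{code}
import java.lang.annotation.*;

// Hypothetical: bigger value means "this test does more stuff".
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.TYPE, ElementType.METHOD})
public @interface TestWeight {
  int value() default 1;
}

// A runner could then pick tests with two simple build params, e.g.
//   -Dtests.weight.min=0 -Dtests.weight.max=5
{code}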





> Make it easier to run Test2BTerms
> -
>
> Key: LUCENE-3615
> URL: https://issues.apache.org/jira/browse/LUCENE-3615
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Grant Ingersoll
>Assignee: Grant Ingersoll
>Priority: Minor
> Fix For: 4.0
>
> Attachments: LUCENE-3615.patch, LUCENE-3615.patch, LUCENE-3615.patch
>
>
> Currently, Test2BTerms has an @Ignore annotation which means that the only 
> way to run it as a test is to edit the file.
> There are a couple of options to fix this:
> # Add a main() so it can be invoked via the command line outside of the test 
> framework
> # Add some new annotations that mark it as slow or weekly or something like 
> that and have the test target ignore @slow (or whatever) by default, but can 
> also turn it on.




[jira] [Updated] (SOLR-2943) DIHCacheWriter & DIHCacheProcessor (entity processor)

2011-12-05 Thread James Dyer (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Dyer updated SOLR-2943:
-

Description: 
This is a spin-off of SOLR-2382.

Currently DIH requires users to retrieve, join and index all data for a full or 
delta update in one big step.  This issue is to allow us to break this into 
individual steps.  The idea is to have multiple "data-config.xml" files, some 
of which retrieve and cache data while others join and index data.  

This is useful when Solr Records are a conglomeration of several data sources.  
With this feature, each data source can be retrieved and cached separately.  
Once all data sources have been retrieved, they can be joined and indexed in a 
final step.  When doing a delta update, only the data sources that change need 
to have their caches updated (or frequently-changing data can remain un-cached 
while caching the more static data).  This is particularly useful in light of 
the fact that Lucene/Solr cannot do a true "update" operation.  DIH Caches also 
provide a handy way to archive source data for which there is no stable 
system-of-record.

Implementation Details:

- The DIHCacheWriter allows us to write the final (root entity) DIH output to a 
DIHCache rather than to Solr.  Caches can be created from scratch 
("full-update") or existing caches can be modified ("delta-update").

- The DIHCacheProcessor is an Entity Processor that reads a DIHCache.  This 
Entity Processor can be used for both Root Entities and Child Entities.  Cached 
data can be read back, joined to other Entities and indexed.

- Both DIHCacheWriter and DIHCacheProcessor support partitioning.  
DIHCacheWriter can write to a partitioned cache while DIHCacheProcessor can 
read back a particular partition.  This can be handy when indexing to multiple 
shards.

- This patch is 100% stand-alone from the rest of DIH, so while users can patch 
and rebuild the DIH .jar file to include these classes, it is unnecessary.  To 
use this functionality, simply include the code here in the classpath. (ex: in 
SOLR_HOME/lib)

- In addition to this patch, a persistent cache implementation is required. 
  - See SOLR-2948 for a DIH Cache Implementation built on Lucene (no additional 
dependencies). 
  - See SOLR-2613 for a DIH Cache Implementation backed with BDB-JE (we use 
this in Production).
  - Other Cache Implementations (hopefully) will be developed in the future and 
become available for general use.

- This patch includes extensive unit tests.  A MockDIHCache that supports 
persistence and delta updates facilitates the tests.  Do not attempt to use 
MockDIHCache for anything other than testing or as a reference for developing 
your own DIHCache implementations.


  was:
This is a spin-off of SOLR-2382.

Currently DIH requires users to retrieve, join and index all data for a full or 
delta update in one big step.  This issue is to allow us to break this into 
individual steps.  The idea is to have multiple "data-config.xml" files, some 
of which retrieve and cache data while others join and index data.  

This is useful when Solr Records are a conglomeration of several data sources.  
With this feature, each data source can be retrieved and cached separately.  
Once all data sources have been retrieved, they can be joined and indexed in a 
final step.  When doing a delta update, only the data sources that change need 
to have their caches updated (or frequently-changing data can remain un-cached 
while caching the more static data).  This is particularly useful in light of 
the fact that Lucene/Solr cannot do a true "update" operation.  DIH Caches also 
provide a handy way to archive source data for which there is no stable 
system-of-record.

Implementation Details:

- The DIHCacheWriter allows us to write the final (root entity) DIH output to a 
DIHCache rather than to Solr.  Caches can be created from scratch 
("full-update") or existing caches can be modified ("delta-update").

- The DIHCacheProcessor is an Entity Processor that reads a DIHCache.  This 
Entity Processor can be used for both Root Entities and Child Entities.  Cached 
data can be read back, joined to other Entities and indexed.

- Both DIHCacheWriter and DIHCacheProcessor support partitioning.  
DIHCacheWriter can write to a partitioned cache while DIHCacheProcessor can 
read back a particular partition.  This can be handy when indexing to multiple 
shards.

- This patch is 100% stand-alone from the rest of DIH, so while users can patch 
and rebuild the DIH .jar file to include these classes, it is unnecessary.  To 
use this functionality, simply include the code here in the classpath. (ex: in 
SOLR_HOME/lib)

- In addition to this patch, a persistent cache implementation is required.  
See SOLR-2613 for a DIH Cache Implementation backed with BDB-JE.  Other Cache 
Implementations (hopefully) will be developed in the future and become 
available

[JENKINS] Lucene-Solr-tests-only-3.x-java7 - Build # 1175 - Failure

2011-12-05 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x-java7/1175/

2 tests failed.
FAILED:  
junit.framework.TestSuite.org.apache.solr.update.processor.SignatureUpdateProcessorFactoryTest

Error Message:
Cannot delete 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x-java7/checkout/solr/build/solr-core/test/3/solrtest-SignatureUpdateProcessorFactoryTest-1323111362955/index/_e.frq

Stack Trace:
java.io.IOException: Cannot delete 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x-java7/checkout/solr/build/solr-core/test/3/solrtest-SignatureUpdateProcessorFactoryTest-1323111362955/index/_e.frq
at org.apache.lucene.store.FSDirectory.deleteFile(FSDirectory.java:296)
at 
org.apache.lucene.store.MockDirectoryWrapper.deleteFile(MockDirectoryWrapper.java:370)
at 
org.apache.lucene.store.MockDirectoryWrapper.crash(MockDirectoryWrapper.java:243)
at 
org.apache.lucene.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:535)
at 
org.apache.solr.SolrTestCaseJ4.closeDirectories(SolrTestCaseJ4.java:82)
at org.apache.solr.SolrTestCaseJ4.deleteCore(SolrTestCaseJ4.java:290)
at 
org.apache.solr.SolrTestCaseJ4.afterClassSolrTestCase(SolrTestCaseJ4.java:72)


FAILED:  
junit.framework.TestSuite.org.apache.solr.update.processor.SignatureUpdateProcessorFactoryTest

Error Message:
java.lang.AssertionError: directory of test was not closed, opened from: 
org.apache.solr.core.MockDirectoryFactory.open(MockDirectoryFactory.java:34)

Stack Trace:
java.lang.RuntimeException: java.lang.AssertionError: directory of test was not 
closed, opened from: 
org.apache.solr.core.MockDirectoryFactory.open(MockDirectoryFactory.java:34)
at 
org.apache.lucene.util.LuceneTestCase.afterClassLuceneTestCaseJ4(LuceneTestCase.java:310)
at 
org.apache.lucene.util.LuceneTestCase.checkResourcesAfterClass(LuceneTestCase.java:349)
at 
org.apache.lucene.util.LuceneTestCase.afterClassLuceneTestCaseJ4(LuceneTestCase.java:278)




Build Log (for compile errors):
[...truncated 14636 lines...]






[jira] [Updated] (SOLR-2948) DIH Cache backed w/Lucene

2011-12-05 Thread James Dyer (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Dyer updated SOLR-2948:
-

Attachment: SOLR-2948.patch

This initial version has never been used in a production environment, but I 
have used (an earlier version of) it in a development context.  No doubt it 
would be adequate in many situations, but it likely could stand some 
improvement.  Unit tests are included and all pass.

> DIH Cache backed w/Lucene
> -
>
> Key: SOLR-2948
> URL: https://issues.apache.org/jira/browse/SOLR-2948
> Project: Solr
>  Issue Type: Improvement
>  Components: contrib - DataImportHandler
>Affects Versions: 4.0
>Reporter: James Dyer
>Priority: Minor
> Fix For: 4.0
>
> Attachments: SOLR-2948.patch
>
>
> This is a DIH Cache Implementation that supports persistence and delta 
> updates on the cache.  The cache is backed by a stand-alone Lucene index.  By 
> requiring no additional dependencies, this allows users to easily use the DIH 
> Cache persistence functionality (see SOLR-2943).




[jira] [Created] (SOLR-2948) DIH Cache backed w/Lucene

2011-12-05 Thread James Dyer (Created) (JIRA)
DIH Cache backed w/Lucene
-

 Key: SOLR-2948
 URL: https://issues.apache.org/jira/browse/SOLR-2948
 Project: Solr
  Issue Type: Improvement
  Components: contrib - DataImportHandler
Affects Versions: 4.0
Reporter: James Dyer
Priority: Minor
 Fix For: 4.0


This is a DIH Cache Implementation that supports persistence and delta updates 
on the cache.  The cache is backed by a stand-alone Lucene index.  By requiring 
no additional dependencies, this allows users to easily use the DIH Cache 
persistence functionality (see SOLR-2943).




[jira] [Updated] (SOLR-2613) DIH Cache backed w/bdb-je

2011-12-05 Thread James Dyer (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Dyer updated SOLR-2613:
-

Attachment: SOLR-2613.patch

Updated to fix a parameter-naming bug.

> DIH Cache backed w/bdb-je
> -
>
> Key: SOLR-2613
> URL: https://issues.apache.org/jira/browse/SOLR-2613
> Project: Solr
>  Issue Type: Improvement
>  Components: contrib - DataImportHandler
>Affects Versions: 4.0
>Reporter: James Dyer
>Priority: Minor
> Attachments: SOLR-2613.patch, SOLR-2613.patch, SOLR-2613.patch, 
> SOLR-2613.patch, SOLR-2613.patch, SOLR-2613.patch, SOLR-2613.patch
>
>
> This is spun out of SOLR-2382, which provides a framework for multiple 
> cacheing implementations with DIH.  This cache implementation is fast & 
> flexible, supporting persistence and delta updates.  However, it depends on 
> Berkeley Database Java Edition, so in order to evaluate this and use it you 
> must download bdb-je from Oracle and accept the license requirements.




[jira] [Updated] (SOLR-2943) DIHCacheWriter & DIHCacheProcessor (entity processor)

2011-12-05 Thread James Dyer (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Dyer updated SOLR-2943:
-

Attachment: SOLR-2943.patch

Updated patch; fixes a parameter-naming bug.

> DIHCacheWriter & DIHCacheProcessor (entity processor)
> -
>
> Key: SOLR-2943
> URL: https://issues.apache.org/jira/browse/SOLR-2943
> Project: Solr
>  Issue Type: New Feature
>  Components: contrib - DataImportHandler
>Affects Versions: 4.0
>Reporter: James Dyer
>Priority: Minor
> Fix For: 4.0
>
> Attachments: SOLR-2943.patch, SOLR-2943.patch
>
>
> This is a spin-off of SOLR-2382.
> Currently DIH requires users to retrieve, join and index all data for a full 
> or delta update in one big step.  This issue is to allow us to break this 
> into individual steps.  The idea is to have multiple "data-config.xml" files, 
> some of which retrieve and cache data while others join and index data.  
> This is useful when Solr Records are a conglomeration of several data 
> sources.  With this feature, each data source can be retrieved and cached 
> separately.  Once all data sources have been retrieved, they can be joined 
> and indexed in a final step.  When doing a delta update, only the data 
> sources that change need to have their caches updated (or frequently-changing 
> data can remain un-cached while caching the more static data).  This is 
> particularly useful in light of the fact that Lucene/Solr cannot do a true 
> "update" operation.  DIH Caches also provide a handy way to archive source 
> data for which there is no stable system-of-record.
> Implementation Details:
> - The DIHCacheWriter allows us to write the final (root entity) DIH output to 
> a DIHCache rather than to Solr.  Caches can be created from scratch 
> ("full-update") or existing caches can be modified ("delta-update").
> - The DIHCacheProcessor is an Entity Processor that reads a DIHCache.  This 
> Entity Processor can be used for both Root Entities and Child Entities.  
> Cached data can be read back, joined to other Entities and indexed.
> - Both DIHCacheWriter and DIHCacheProcessor support partitioning.  
> DIHCacheWriter can write to a partitioned cache while DIHCacheProcessor can 
> read back a particular partition.  This can be handy when indexing to 
> multiple shards.
> - This patch is 100% stand-alone from the rest of DIH, so while users can 
> patch and rebuild the DIH .jar file to include these classes, it is 
> unnecessary.  To use this functionality, simply include the code here in the 
> classpath. (ex: in SOLR_HOME/lib)
> - In addition to this patch, a persistent cache implementation is required.  
> See SOLR-2613 for a DIH Cache Implementation backed with BDB-JE.  Other Cache 
> Implementations (hopefully) will be developed in the future and become 
> available for general use.
> - This patch includes extensive unit tests.  A MockDIHCache that supports 
> persistence and delta updates facilitates the tests.  Do not attempt to use 
> MockDIHCache for anything other than testing or as a reference for developing 
> your own DIHCache implementations.




[jira] [Commented] (LUCENE-2208) Token div exceeds length of provided text sized 4114

2011-12-05 Thread Matan Zinger (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13162898#comment-13162898
 ] 

Matan Zinger commented on LUCENE-2208:
--

Hello guys,

I am blocked by this bug as well.

Is there any update/progress on this subject?

Thank you in advance...

> Token div exceeds length of provided text sized 4114
> 
>
> Key: LUCENE-2208
> URL: https://issues.apache.org/jira/browse/LUCENE-2208
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: modules/highlighter
>Affects Versions: 3.0
> Environment:  diagnostics = {os.version=5.1, os=Windows XP, 
> lucene.version=3.0.0 883080 - 2009-11-22 15:43:58, source=flush, os.arch=x86, 
> java.version=1.6.0_12, java.vendor=Sun Microsystems Inc.}
>
>Reporter: Ramazan VARLIKLI
> Attachments: LUCENE-2208.patch, LUCENE-2208_test.patch
>
>
> I have a doc which contains HTML code. I want to strip the HTML tags to clean 
> the text and then apply the highlighter on the cleaned text. But the 
> highlighter throws an exception if I strip out the HTML characters; if I 
> don't strip them out, it works fine. It just confuses me at the moment. 
> I copy-paste 3 things here from the console, as they may contain special 
> characters which might cause the problem.
> 1 -) Here is the html text 
>   Starter
>   
> 
> 
>  Learning path: History
>   Key question
>   Did transport fuel the industrial revolution?
>   Learning Objective
> 
>   To categorise points as for or against an argument
>   
> 
>   What to do?
>   
> Watch the clip: Transport fuelled the industrial 
> revolution.
>   
>   The clips claims that transport fuelled the industrial 
> revolution. Some historians argue that the industrial revolution only 
> happened because of developments in transport.
> 
>   Read the statements below and decide which 
> points are for and which points are against the argument 
> that industry expanded in the 18th and 19th centuries because of developments 
> in transport.
>   
>   
>   
>   Industry expanded because of inventions and 
> the discovery of steam power.
>   Improvements in transport allowed goods to 
> be sold all over the country and all over the world so there were more 
> customers to develop industry for.
>   Developments in transport allowed 
> resources, such as coal from mines and cotton from America to come together 
> to manufacture products.
>   Transport only developed because industry 
> needed it. It was slow to develop as money was spent on improving roads, then 
> building canals and the replacing them with railways in order to keep up with 
> industry.
>   
>   
>   Now try to think of 2 more statements of your 
> own.
>   
> 
> 
>   
>   Main activity
>   
> 
> Learning path: 
> History
>   Learning Objective
>   
> To select evidence to support points
>   
>   What to do?
>   
>   Choose the 4 points that you think are most important - 
> try to be balanced by having two for and two 
> against.
> Write one in each of the point boxes of the 
> paragraphs on the sheet  class="link-internal">Constructing a balanced argument. You 
> might like to re write the points in your own words and use connectives to 
> link the paragraphs.
>   
> In history and in any argument, you need evidence 
> to support your points.
> Find evidence from these sources and from 
> your own knowledge to support each of your points:
> 
>  href="../servlet/link?template=vid¯o=setResource&resourceID=2044" 
> class="link-internal">At a toll gate
>  href="../servlet/link?macro=setResource&template=vid&resourceID=2046" 
> class="link-internal">Canals
>  href="../servlet/link?macro=setResource&template=vid&resourceID=2043" 
> class="link-internal">Growing cities: traffic
>href="../servlet/link?macro=setResource&template=vid&resourceID=2047" 
> class="link-internal">Impact of the railway 
>href="../servlet/link?macro=setResource&template=vid&resourceID=

[jira] [Commented] (SOLR-2880) Investigate adding an overseer that can assign shards, later do re-balancing, etc

2011-12-05 Thread Sami Siren (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13162811#comment-13162811
 ] 

Sami Siren commented on SOLR-2880:
--

bq. Why does the overseer class have its own cloud state and watches on live 
nodes and stuff?

The watch for live nodes is also used for adding watches for node states: when 
a new node pops up, a watch is generated for /node_states/

bq. The ZkControllers ZkStateReader is already tracking all this stuff and 
should be the owner of the cloud state, shouldn't it?

Yeah, makes sense. I'll see how that would work.
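
A hedged sketch of the live-nodes watch pattern described above (raw ZooKeeper 
client API; the paths and the nodeStateWatcher are illustrative, not the 
actual Overseer code):

{code}
// "zk" is an org.apache.zookeeper.ZooKeeper client.
Watcher liveNodesWatcher = new Watcher() {
  public void process(WatchedEvent event) {
    try {
      // re-read live nodes (re-registering this watch); for each node,
      // set a watch on its state znode under /node_states/
      for (String node : zk.getChildren("/live_nodes", this)) {
        zk.exists("/node_states/" + node, nodeStateWatcher);
      }
    } catch (Exception e) {
      // KeeperException / InterruptedException handling elided
    }
  }
};
zk.getChildren("/live_nodes", liveNodesWatcher);
{code}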

> Investigate adding an overseer that can assign shards, later do re-balancing, 
> etc
> -
>
> Key: SOLR-2880
> URL: https://issues.apache.org/jira/browse/SOLR-2880
> Project: Solr
>  Issue Type: Sub-task
>  Components: SolrCloud
>Reporter: Mark Miller
>Assignee: Mark Miller
> Fix For: 4.0
>
> Attachments: SOLR-2880-merge-elections.patch, SOLR-2880.patch
>
>





[Lucene.Net] [jira] [Created] (LUCENENET-459) Italian stemmer (from SnowballAnalyzer) does not work

2011-12-05 Thread Santiago M. Mola (Created) (JIRA)
Italian stemmer (from SnowballAnalyzer) does not work
-

 Key: LUCENENET-459
 URL: https://issues.apache.org/jira/browse/LUCENENET-459
 Project: Lucene.Net
  Issue Type: Bug
  Components: Lucene.Net Contrib
Affects Versions: Lucene.Net 2.9.2, Lucene.Net 2.9.4
Reporter: Santiago M. Mola


Italian stemmer does not work.

Consider this code:

var englishAnalyzer = new SnowballAnalyzer("English");
var tk = englishAnalyzer.TokenStream("text", new StringReader("horses"));
var ta = (TermAttribute)tk.GetAttribute(typeof(TermAttribute));
tk.IncrementToken();
Console.WriteLine("English stemmer: horses -> " + ta.Term());

var italianAnalyzer = new SnowballAnalyzer("Italian");
tk = italianAnalyzer.TokenStream("text", new StringReader("abbandonata"));
ta = (TermAttribute)tk.GetAttribute(typeof(TermAttribute));
tk.IncrementToken();
Console.WriteLine("Italian stemmer: abbandonata -> " + ta.Term());

It outputs:

English stemmer: horses -> hors
Italian stemmer: abbandonata -> abbandonata

While Java Lucene 2.9.4 outputs:

English stemmer: horses -> hors
Italian stemmer: abbandonata -> abbandon






[jira] [Resolved] (LUCENE-3619) in trunk if you switch up omitNorms while indexing, you get a corrupt norms file

2011-12-05 Thread Robert Muir (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-3619.
-

   Resolution: Fixed
Fix Version/s: 4.0

> in trunk if you switch up omitNorms while indexing, you get a corrupt norms 
> file
> -
>
> Key: LUCENE-3619
> URL: https://issues.apache.org/jira/browse/LUCENE-3619
> Project: Lucene - Java
>  Issue Type: Bug
>Affects Versions: 4.0
>Reporter: Robert Muir
> Fix For: 4.0
>
> Attachments: LUCENE-3619.patch
>
>
> document 1 has 
>   body: norms=true
>   title: norms=true
> document 2 has 
>   body: norms=false
>   title: norms=true
> when seeing 'body' for the first time, NormsWriterPerField gets the 'initial 
> FieldInfo' and saves it away, which says norms=true.
> however, at flush time we don't check, so we write the norms happily anyway.
> then SegmentReader reads the norms later: it skips "body" since it omits 
> norms, and if you ask for the norms of 'title' it instead returns the bogus 
> "body" norms.
> asserting that SegmentReader "plans to" read the whole .nrm file exposes the 
> bug.
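
A hedged repro sketch of the field layout above (3.x-style Field flags for 
brevity; "writer" is assumed, and the bug itself is in trunk's norms writing 
at flush):

{code}
Document d1 = new Document();
d1.add(new Field("body",  "x", Field.Store.NO, Field.Index.ANALYZED));
d1.add(new Field("title", "x", Field.Store.NO, Field.Index.ANALYZED));
writer.addDocument(d1);

Document d2 = new Document();
d2.add(new Field("body",  "x", Field.Store.NO, Field.Index.ANALYZED_NO_NORMS));
d2.add(new Field("title", "x", Field.Store.NO, Field.Index.ANALYZED));
writer.addDocument(d2);

// at flush the .nrm file still includes "body", so SegmentReader later skips
// "body" (omitNorms) and hands back the bogus "body" norms for "title"
writer.close();
{code}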




Re: [Lucene.Net] Lucene.net twitter account and chat room

2011-12-05 Thread Simone Chiaretta
Oh... I thought the CMS was a bit more "dynamic"... not just a bunch of
static files :)
Maybe an unofficial site could be done... the official stuff, releases, and
other institutional news would stay on the ASF site, while announcements,
articles, demos, and more dynamic, frequently released news might stay on
this external site.

But yes, the twitter gadget might help... also, adding a "Follow me on
twitter" button on the page will help raise awareness about the twitter
account.

Simo

On Mon, Dec 5, 2011 at 2:08 PM, Michael Herndon wrote:

> oh. I totally misread that.
>
> it's *possible*, though the data would need a place to live outside of the
> Apache CMS, and I don't know Apache's view on that kind of thing at the
> moment or what the other available options are.
>
> To my knowledge, we're currently limited to putting things into svn. I
> don't like putting docs into the source when it's a ridiculous amount of
> static html files. I'd rather store them as a zip and have the site unzip
> them into a directory. I think others feel the same way, but don't quote me
> on that. On the plus side, it does have a staging mechanism built into the
> process before you publish to production.
>
> Though it would be nice to have a .NET or Mono website, for the simple fact
> that we could put up live demos of Lucene.Net and index the mailing list,
> website, wiki, twitter feed, chat logs, articles, etc. in one place. Also,
> the docs would then be able to use the binary format, which is more compact
> than the purely html format.
>
> But a twitter widget on the site would help visibility =). Just my 2.5
> cents worth.
>
> @Prescott, let me know when you want to work on that release checklist.
>
> - Michael.
>
>
>
>
>
> On Sun, Dec 4, 2011 at 5:12 PM, Simone Chiaretta <
> simone.chiare...@gmail.com> wrote:
>
>> That's not exactly the same thing...
>> I meant a way to have a kind of blog for release announcements and
>> similar things (just the same things that are now on the homepage), not a
>> list of 140-char messages :)
>>
>> ---
>> Simone Chiaretta
>> @simonech
>> Sent from a tablet
>>
>> On 04/dic/2011, at 22:02, Michael Herndon  wrote:
>>
>> > The JavaScript twitter widget should work.
>> >
>> > Sent from my Windows Phone
>> > From: Simone Chiaretta
>> > Sent: 12/4/2011 2:34 PM
>> > To: lucene-net-...@lucene.apache.org
>> > Subject: Re: [Lucene.Net] Lucene.net twitter account and chat room
>> > One good idea, but not sure if possible with the ASF CMS, is to have a
>> > feed with the news that are now on the home page, and the possibility to
>> > link to a single news item. Well... a blog :)
>> > Would be much better than adding all news one after the other on the
>> > home page
>> >
>> > Simo
>> >
>> > ---
>> > Simone Chiaretta
>> > @simonech
>> > Sent from a tablet
>> >
>> > On 04/dic/2011, at 07:27, michael herndon 
>> wrote:
>> >
>> >> Maybe we should build a mail/chat/social media search with lucene.net
>> >> at some point in the future?  I'm sure there is a way to log the chat.
>> >>
>> >> I posted up a tweet tonight on the release. Better late than never.
>> >>
>> >> I have two more tweets scheduled using hootsuite, one to thank Simone
>> for
>> >> the packages, another to ask for article and application submissions on
>> >> monday.  If anyone else has ideas for the branding aspect, do share.
>> I'll
>> >> try to check the feed daily.
>> >>
>> >> hashtag: #lucenenet
>> >>
>> >> - Michael.
>> >>
>> >> On Fri, Dec 2, 2011 at 2:14 PM, Troy Howard 
>> wrote:
>> >>
>> >>> Re: Twitter
>> >>>
>> >>> Sadly not a single tweet has been sent out on our twitter account.
>> >>> Really need to remedy that.
>> >>>
>> >>> Re: IRC/realtime chat
>> >>>
>> >>> There have been some good reasons expressed by various folks at Apache
>> >>> (and in our team)  that realtime chat in channels which are not
>> >>> publicly logged should generally be discouraged. This is because it's
>> >>> all too easy to have a discussion in which only a few members of the
>> >>> community are present, and make decisions without any opportunity for
>> >>> the rest of the community to have input and without the ability to
>> >>> review the reasoning or discourse later. The same holds true for user
>> >>> support, as it's much better to have that public and logged in a
>> >>> mailing list message so that others might find that through searches
>> >>> and use as a reference.
>> >>>
>> >>> That said, people do use IRC/IM from time to time, but we prefer to
>> >>> keep most if not all of the communications public and on the Apache
>> >>> mailing lists. So feel free to set up a chat room and chat with
>> >>> whomever wants to join about whatever topic, but for most things at
>> >>> Apache the philosophy is "mailing list, or it didn't happen". :)
>> >>>
>> >>> Thanks,
>> >>> Troy
>> >>>
>> >>>
>> >>> On Fri, Dec 2, 2011 at 10:03 AM, Prescott Nasser <
>> geobmx...@hotmail.com>
>> >>> wrote:
>> 
>> >
>> > I just saw that there is a twitter account for Lucene.n

[JENKINS] Lucene-Solr-tests-only-3.x - Build # 11685 - Failure

2011-12-05 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/11685/

2 tests failed.
FAILED:  
junit.framework.TestSuite.org.apache.solr.update.processor.SignatureUpdateProcessorFactoryTest

Error Message:
Cannot delete 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/solr/build/solr-core/test/4/solrtest-SignatureUpdateProcessorFactoryTest-1323090765194/index/_d.tii

Stack Trace:
java.io.IOException: Cannot delete 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/solr/build/solr-core/test/4/solrtest-SignatureUpdateProcessorFactoryTest-1323090765194/index/_d.tii
at org.apache.lucene.store.FSDirectory.deleteFile(FSDirectory.java:296)
at 
org.apache.lucene.store.MockDirectoryWrapper.deleteFile(MockDirectoryWrapper.java:370)
at 
org.apache.lucene.store.MockDirectoryWrapper.crash(MockDirectoryWrapper.java:243)
at 
org.apache.lucene.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:535)
at 
org.apache.solr.SolrTestCaseJ4.closeDirectories(SolrTestCaseJ4.java:82)
at org.apache.solr.SolrTestCaseJ4.deleteCore(SolrTestCaseJ4.java:290)
at 
org.apache.solr.SolrTestCaseJ4.afterClassSolrTestCase(SolrTestCaseJ4.java:72)


FAILED:  
junit.framework.TestSuite.org.apache.solr.update.processor.SignatureUpdateProcessorFactoryTest

Error Message:
java.lang.AssertionError: directory of test was not closed, opened from: 
org.apache.solr.core.MockDirectoryFactory.open(MockDirectoryFactory.java:34)

Stack Trace:
java.lang.RuntimeException: java.lang.AssertionError: directory of test was not 
closed, opened from: 
org.apache.solr.core.MockDirectoryFactory.open(MockDirectoryFactory.java:34)
at 
org.apache.lucene.util.LuceneTestCase.afterClassLuceneTestCaseJ4(LuceneTestCase.java:310)
at 
org.apache.lucene.util.LuceneTestCase.checkResourcesAfterClass(LuceneTestCase.java:349)
at 
org.apache.lucene.util.LuceneTestCase.afterClassLuceneTestCaseJ4(LuceneTestCase.java:278)




Build Log (for compile errors):
[...truncated 14640 lines...]






[jira] [Commented] (LUCENE-3298) FST has hard limit max size of 2.1 GB

2011-12-05 Thread Carlos González-Cadenas (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13162752#comment-13162752
 ] 

Carlos González-Cadenas commented on LUCENE-3298:
-

Yeap, at the beginning of this project we tried to implement this autocomplete 
system using regular inverted indexes, but the response time required for 
autocomplete to work from a user perspective is very low (<50ms), and it would 
be quite hard to achieve such performance with inverted indexes. 

I still think this is the way to go, but as you say we have to be careful with 
the data-generation part. Most of the work should be put into making sure that 
the data is well distributed and organized in order to avoid combinatorial 
explosion.

Let me go in detail with the sources of data permutations and the reasoning 
behind them:

1) With regards to infix matches, if a user types "barcelona" we want to match 
"hotels in barcelona". In order to achieve this, we generate:

hotels in barcelona => hotels in barcelona
in barcelona => hotels in barcelona
barcelona => hotels in barcelona

The FST should be able to conflate these prefixes nicely into just one path, 
right? Therefore this part shouldn't be a problem; see also the sketch after 
this list.

2) In addition, another feature we want is to be able to match inputs without 
prepositions. That means that if the user types "hotels barcelona jacuzzi", we 
should be able to match "hotels in barcelona with jacuzzi". Now, the only way 
we envision of doing this properly is to generate this permutation within the 
data:

hotels barcelona jacuzzi => hotels in barcelona with jacuzzi

I can see how this can explode the FST by creating different branches. 
Theoretically this could be done at runtime without the need to generate the 
data, but we don't see a clean way to do it. To make things more complicated 
:) we've implemented fuzzy matching at query time (we use a Levenshtein 
automaton generated from the user input plus an edit distance, and then we 
intersect it with the FST), and this makes it very complicated to handle 
prepositions at query time.

3) PP permutations (i.e. "hotels in barcelona with jacuzzi" and "hotels with 
jacuzzi in barcelona"). I don't really see a way to work around this. Probably 
we need to be careful and only generate these permutations for the top-K 
cities, in order to limit the potential size.

Summarizing, I believe that we can reduce the set of "bad permutations" a lot 
if we can figure out how to implement the prepositions at runtime. If you have 
any ideas, let me know. Thanks! :)
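
To make the expansion in (1) concrete, a minimal sketch of the data-generation 
step (plain Java; the names are illustrative):

{code}
// Every word-boundary suffix of the suggestion becomes a key that maps back
// to the full suggestion; the FST can then share the common tails of the keys.
String suggestion = "hotels in barcelona";
String[] words = suggestion.split(" ");
Map<String,String> entries = new TreeMap<String,String>();
for (int i = 0; i < words.length; i++) {
  StringBuilder key = new StringBuilder();
  for (int j = i; j < words.length; j++) {
    if (j > i) key.append(' ');
    key.append(words[j]);
  }
  entries.put(key.toString(), suggestion);
}
// entries: "barcelona", "in barcelona", "hotels in barcelona" -> suggestion
{code}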




> FST has hard limit max size of 2.1 GB
> -
>
> Key: LUCENE-3298
> URL: https://issues.apache.org/jira/browse/LUCENE-3298
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/FSTs
>Reporter: Michael McCandless
>Priority: Minor
> Attachments: LUCENE-3298.patch
>
>
> The FST uses a single contiguous byte[] under the hood, which in Java is 
> indexed by int, so we cannot grow this over Integer.MAX_VALUE.  It also 
> internally encodes references to this array as vInt.
> We could switch this to a paged byte[] and make the FST far larger.
> But I think this is low priority... I'm not going to work on it any time soon.




[jira] [Commented] (LUCENE-3298) FST has hard limit max size of 2.1 GB

2011-12-05 Thread Dawid Weiss (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13162719#comment-13162719
 ] 

Dawid Weiss commented on LUCENE-3298:
-

If you have so many permutations then they become different paths in the FST, 
and it will grow exponentially with the number of input words/combinations. To 
be honest, this looks more suitable for a regular inverted index search.

> FST has hard limit max size of 2.1 GB
> -
>
> Key: LUCENE-3298
> URL: https://issues.apache.org/jira/browse/LUCENE-3298
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/FSTs
>Reporter: Michael McCandless
>Priority: Minor
> Attachments: LUCENE-3298.patch
>
>
> The FST uses a single contiguous byte[] under the hood, which in Java is 
> indexed by int, so we cannot grow this over Integer.MAX_VALUE.  It also 
> internally encodes references to this array as vInt.
> We could switch this to a paged byte[] and make the FST far larger.
> But I think this is low priority... I'm not going to work on it any time soon.




[jira] [Commented] (LUCENE-3298) FST has hard limit max size of 2.1 GB

2011-12-05 Thread Carlos González-Cadenas (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13162718#comment-13162718
 ] 

Carlos González-Cadenas commented on LUCENE-3298:
-

Hello Dawid,

The sentences have variants at different levels. The first is the one you 
mention: different prefixes for different accommodation types. The second is 
different positions of the prepositional phrases in the query (i.e. "hotels in 
barcelona with jacuzzi" and "hotels with jacuzzi in barcelona"). The third is 
sentences with and without prepositions ("hotels barcelona jacuzzi").

W.r.t. the patch, sorry, I got confused. James, do you have a version of this 
patch that works with trunk?

Thanks a lot.

> FST has hard limit max size of 2.1 GB
> -
>
> Key: LUCENE-3298
> URL: https://issues.apache.org/jira/browse/LUCENE-3298
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/FSTs
>Reporter: Michael McCandless
>Priority: Minor
> Attachments: LUCENE-3298.patch
>
>
> The FST uses a single contiguous byte[] under the hood, which in Java is 
> indexed by int, so we cannot grow this over Integer.MAX_VALUE.  It also 
> internally encodes references to this array as vInt.
> We could switch this to a paged byte[] and make the FST far larger.
> But I think this is low priority... I'm not going to work on it any time soon.




[JENKINS] Lucene-Solr-tests-only-trunk-java7 - Build # 1162 - Failure

2011-12-05 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk-java7/1162/

2 tests failed.
REGRESSION:  org.apache.solr.cloud.BasicDistributedZkTest.testDistribSearch

Error Message:
Could not connect to ZooKeeper 127.0.0.1:46439 within 3 ms

Stack Trace:
java.util.concurrent.TimeoutException: Could not connect to ZooKeeper 
127.0.0.1:46439 within 3 ms
at 
org.apache.solr.common.cloud.ConnectionManager.waitForConnected(ConnectionManager.java:124)
at 
org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:121)
at 
org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:84)
at 
org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:65)
at 
org.apache.solr.cloud.AbstractZkTestCase.buildZooKeeper(AbstractZkTestCase.java:71)
at 
org.apache.solr.cloud.AbstractDistributedZkTestCase.setUp(AbstractDistributedZkTestCase.java:47)
at 
org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:165)
at 
org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:57)


FAILED:  junit.framework.TestSuite.org.apache.solr.cloud.BasicDistributedZkTest

Error Message:
java.lang.AssertionError: ensure your setUp() calls super.setUp() and your 
tearDown() calls super.tearDown()!!!

Stack Trace:
java.lang.RuntimeException: java.lang.AssertionError: ensure your setUp() calls 
super.setUp() and your tearDown() calls super.tearDown()!!!
at 
org.apache.lucene.util.LuceneTestCase.afterClassLuceneTestCaseJ4(LuceneTestCase.java:402)
at 
org.apache.lucene.util.LuceneTestCase.afterClassLuceneTestCaseJ4(LuceneTestCase.java:344)




Build Log (for compile errors):
[...truncated 11483 lines...]






[JENKINS] Lucene-Solr-tests-only-3.x - Build # 11682 - Failure

2011-12-05 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/11682/

2 tests failed.
FAILED:  
junit.framework.TestSuite.org.apache.solr.update.processor.SignatureUpdateProcessorFactoryTest

Error Message:
Cannot delete 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/solr/build/solr-core/test/4/solrtest-SignatureUpdateProcessorFactoryTest-1323078096363/index/_9.prx

Stack Trace:
java.io.IOException: Cannot delete 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/solr/build/solr-core/test/4/solrtest-SignatureUpdateProcessorFactoryTest-1323078096363/index/_9.prx
at org.apache.lucene.store.FSDirectory.deleteFile(FSDirectory.java:296)
at 
org.apache.lucene.store.MockDirectoryWrapper.deleteFile(MockDirectoryWrapper.java:370)
at 
org.apache.lucene.store.MockDirectoryWrapper.crash(MockDirectoryWrapper.java:243)
at 
org.apache.lucene.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:535)
at 
org.apache.solr.SolrTestCaseJ4.closeDirectories(SolrTestCaseJ4.java:82)
at org.apache.solr.SolrTestCaseJ4.deleteCore(SolrTestCaseJ4.java:290)
at 
org.apache.solr.SolrTestCaseJ4.afterClassSolrTestCase(SolrTestCaseJ4.java:72)


FAILED:  
junit.framework.TestSuite.org.apache.solr.update.processor.SignatureUpdateProcessorFactoryTest

Error Message:
java.lang.AssertionError: directory of test was not closed, opened from: 
org.apache.solr.core.MockDirectoryFactory.open(MockDirectoryFactory.java:34)

Stack Trace:
java.lang.RuntimeException: java.lang.AssertionError: directory of test was not 
closed, opened from: 
org.apache.solr.core.MockDirectoryFactory.open(MockDirectoryFactory.java:34)
at 
org.apache.lucene.util.LuceneTestCase.afterClassLuceneTestCaseJ4(LuceneTestCase.java:310)
at 
org.apache.lucene.util.LuceneTestCase.checkResourcesAfterClass(LuceneTestCase.java:349)
at 
org.apache.lucene.util.LuceneTestCase.afterClassLuceneTestCaseJ4(LuceneTestCase.java:278)




Build Log (for compile errors):
[...truncated 14639 lines...]


