date:20110517


 [ 
https://issues.apache.org/jira/browse/LUCENE-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated LUCENE-3104:


Attachment: LUCENE-3104.patch

Totally non-working patch, likely located in the wrong directories, but putting 
it up here so people can start to get a feel for how this works.  The 
test-patch script can be run by hand and also via Jenkins.

> Hook up Automated Patch Checking for Lucene/Solr
> 
>
> Key: LUCENE-3104
> URL: https://issues.apache.org/jira/browse/LUCENE-3104
> Project: Lucene - Java
>  Issue Type: Task
>Reporter: Grant Ingersoll
> Attachments: LUCENE-3104.patch
>
>
> It would be really great if we could get feedback to contributors sooner on 
> many things that are basic (tests exist, patch applies cleanly, etc.)
> From Nigel Daley on builds@a.o
> {quote}
> I revamped the precommit testing in the fall so that it doesn't use Jira 
> email anymore to trigger a build.  The process is controlled by
> https://builds.apache.org/hudson/job/PreCommit-Admin/
> which has some documentation up at the top of the job.  You can look at the 
> config of the job (do you have access?) to see what it's doing.  Any project 
> could use this same admin job -- you just need to ask me to add the project 
> to the Jira filter used by the admin job 
> (https://issues.apache.org/jira/sr/jira.issueviews:searchrequest-xml/12313474/SearchRequest-12313474.xml?tempMax=100
>  ) once you have the downstream job(s) setup for your specific project.  For 
> Hadoop we have 3 downstream builds configured which also have some 
> documentation:
> https://builds.apache.org/hudson/job/PreCommit-HADOOP-Build/
> https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/
> https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/
> {quote}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: Building from Eclipse

2011-05-17 Thread Tom Hill

try just doing

ant eclipse

To generate the project for eclipse.

Tom


On Tue, May 17, 2011 at 3:40 PM, Daniel Serodio (lists)
 wrote:
> I'm a newbie Solr user, and I'm trying to build Solr 3.1.0 from Eclipse. On
> the Solr wiki I found Paolo Castagna's .classpath and .project files for
> setting up Eclipse, but I couldn't build Lucene because it's missing
> src/demo/
>
> I removed this source folder from .classpath but now it's missing
> org.apache.lucene.util.LuceneTestCase, org.apache.lucene.store._TestHelper,
> org.apache.lucene.util._TestUtil, etc.
>
> Where should these classes be?
>
> Thanks in advance,
> Daniel Serodio
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Building from Eclipse

2011-05-17 Thread Daniel Serodio (lists)

I'm a newbie Solr user, and I'm trying to build Solr 3.1.0 from Eclipse. 
On the Solr wiki I found Paolo Castagna's .classpath and .project files 
for setting up Eclipse, but I couldn't build Lucene because it's missing 
src/demo/


I removed this source folder from .classpath but now it's missing 
org.apache.lucene.util.LuceneTestCase, 
org.apache.lucene.store._TestHelper, org.apache.lucene.util._TestUtil, etc.


Where should these classes be?

Thanks in advance,
Daniel Serodio

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-2193) Re-architect Update Handler

2011-05-17 Thread Yonik Seeley (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035086#comment-13035086
 ] 

Yonik Seeley commented on SOLR-2193:


bq. Crazy idea: drop the notion of commits all together (or make it an expert 
thing for the hard core). Default it to 1 second.

That should just be a matter of configuration after this patch... set a default 
of commitWithin=1000 in the (an) update request handler.

I think that commitWithin should mean soft commit.  Users of commitWithin care 
about when the changes become visible, not when they are guaranteed to be 
fsync'd.


> Re-architect Update Handler
> ---
>
> Key: SOLR-2193
> URL: https://issues.apache.org/jira/browse/SOLR-2193
> Project: Solr
>  Issue Type: Improvement
>Reporter: Mark Miller
>Assignee: Mark Miller
> Fix For: 4.0
>
> Attachments: SOLR-2193.patch, SOLR-2193.patch, SOLR-2193.patch, 
> SOLR-2193.patch
>
>
> The update handler needs an overhaul.
> A few goals I think we might want to look at:
> 1. Cleanup - drop DirectUpdateHandler(2) line - move to something like 
> UpdateHandler, DefaultUpdateHandler
> 2. Expose the SolrIndexWriter in the api or add the proper abstractions to 
> get done what we now do with special casing:
> if (directupdatehandler2)
>   success
>  else
>   failish
> 3. Stop closing the IndexWriter and start using commit (still lazy IW init 
> though).
> 4. Drop iwAccess, iwCommit locks and sync mostly at the Lucene level.
> 5. Keep NRT support in mind.
> 6. Keep microsharding in mind (maintain logical index as multiple physical 
> indexes)
> 7. Address the current issues we face because multiple original/'reloaded' 
> cores can have a different IndexWriter on the same index.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-2193) Re-architect Update Handler


[ 
https://issues.apache.org/jira/browse/SOLR-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035075#comment-13035075
 ] 

Grant Ingersoll commented on SOLR-2193:
---

Crazy idea: drop the notion of commits all together (or make it an expert thing 
for the hard core).  Default it to 1 second.  I wonder how all of this plays 
with warming/caching, etc.  Do you even need those things in this type of setup?

> Re-architect Update Handler
> ---
>
> Key: SOLR-2193
> URL: https://issues.apache.org/jira/browse/SOLR-2193
> Project: Solr
>  Issue Type: Improvement
>Reporter: Mark Miller
>Assignee: Mark Miller
> Fix For: 4.0
>
> Attachments: SOLR-2193.patch, SOLR-2193.patch, SOLR-2193.patch, 
> SOLR-2193.patch
>
>
> The update handler needs an overhaul.
> A few goals I think we might want to look at:
> 1. Cleanup - drop DirectUpdateHandler(2) line - move to something like 
> UpdateHandler, DefaultUpdateHandler
> 2. Expose the SolrIndexWriter in the api or add the proper abstractions to 
> get done what we now do with special casing:
> if (directupdatehandler2)
>   success
>  else
>   failish
> 3. Stop closing the IndexWriter and start using commit (still lazy IW init 
> though).
> 4. Drop iwAccess, iwCommit locks and sync mostly at the Lucene level.
> 5. Keep NRT support in mind.
> 6. Keep microsharding in mind (maintain logical index as multiple physical 
> indexes)
> 7. Address the current issues we face because multiple original/'reloaded' 
> cores can have a different IndexWriter on the same index.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-2500) TestSolrCoreProperties sometimes fails with "no such core: core0"


[ 
https://issues.apache.org/jira/browse/SOLR-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035070#comment-13035070
 ] 

Uwe Schindler commented on SOLR-2500:
-

I get crazy with that test...

It passes isolated, but fails on my machine every run together with other tests.

> TestSolrCoreProperties sometimes fails with "no such core: core0"
> -
>
> Key: SOLR-2500
> URL: https://issues.apache.org/jira/browse/SOLR-2500
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.0
>Reporter: Robert Muir
>
> [junit] Testsuite: 
> org.apache.solr.client.solrj.embedded.TestSolrProperties
> [junit] Testcase: 
> testProperties(org.apache.solr.client.solrj.embedded.TestSolrProperties): 
> Caused an ERROR
> [junit] No such core: core0
> [junit] org.apache.solr.common.SolrException: No such core: core0
> [junit] at 
> org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:118)
> [junit] at 
> org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
> [junit] at 
> org.apache.solr.client.solrj.embedded.TestSolrProperties.testProperties(TestSolrProperties.java:128)
> [junit] at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1260)
> [junit] at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1189)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Issue Comment Edited] (SOLR-1395) Integrate Katta

2011-05-17 Thread Jamie Johnson (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034887#comment-13034887
 ] 

Jamie Johnson edited comment on SOLR-1395 at 5/17/11 9:23 PM:
--

I think I have most of this running, but I still have a disconnect.  I've done 
the following:
1. Patched
2. Compiled
3. Run web application with additional request handler added to solrconfig.mxl
4. Started katta
5. Started DeployableSolrKattaServer

Now if I execute a query 
(http://localhost:8983/solr/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on&distrib=true)
 I get net.sf.katta.util.KattaException: No shards for indices: [*], which 
makes perfect sense since I have no indices deployed.  As a simple test I 
deployed an index that comes stock with katta (bin/katta addIndex testIndex 
src/test/testIndexA 2), and execute my query again and I get no results (which 
also makes sense since that index does not match my solr config).

All of that being said what is the process for publishing a core to katta?  Is 
there a way to use the standard http methods to add to the index (using 
something like java -jar post.jar *.xml)?  If not how is it done?  Any insight 
into this would be greatly appreciated.

  was (Author: jej2003):
Is there any updated documentation for how to do this?  I've attempted to 
run through the patching process but the exact steps are not clear since the 
versions have changed significantly.  
  
> Integrate Katta
> ---
>
> Key: SOLR-1395
> URL: https://issues.apache.org/jira/browse/SOLR-1395
> Project: Solr
>  Issue Type: New Feature
>Affects Versions: 1.4
>Reporter: Jason Rutherglen
>Priority: Minor
> Fix For: 3.2
>
> Attachments: SOLR-1395.patch, SOLR-1395.patch, SOLR-1395.patch, 
> back-end.log, front-end.log, hadoop-core-0.19.0.jar, katta-core-0.6-dev.jar, 
> katta-solrcores.jpg, katta.node.properties, katta.zk.properties, 
> log4j-1.2.13.jar, solr-1395-1431-3.patch, solr-1395-1431-4.patch, 
> solr-1395-1431-katta0.6.patch, solr-1395-1431-katta0.6.patch, 
> solr-1395-1431.patch, solr-1395-katta-0.6.2-1.patch, 
> solr-1395-katta-0.6.2-2.patch, solr-1395-katta-0.6.2-3.patch, 
> solr-1395-katta-0.6.2.patch, test-katta-core-0.6-dev.jar, 
> zkclient-0.1-dev.jar, zookeeper-3.2.1.jar
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> We'll integrate Katta into Solr so that:
> * Distributed search uses Hadoop RPC
> * Shard/SolrCore distribution and management
> * Zookeeper based failover
> * Indexes may be built using Hadoop

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Fuzzy search always returning docs sorted by the highest match

2011-05-17 Thread Guilherme Aiolfi

Hi,

I want to do a fuzzy search and always return documents no matter what the
score. So, to do this, I'm tried sorting by strdist() in solr 3.1. It worked
great and does ALMOST exactly what I wanted. The problem is that the
algorithms supported  jw, ngram and edit are not the best fit for my
scenario.

The best results come from StrikeAMatch (
http://www.devarticles.com/c/a/Development-Cycles/How-to-Strike-a-Match/).
So, I've found this link https://issues.apache.org/jira/browse/LUCENE-2230 that
implemented what I wanted. But I was told that I should use trunk because
there were some really great news about fuzzy search there.

I read this article explaining some changes
http://blog.mikemccandless.com/2011/03/lucenes-fuzzyquery-is-100-times-faster.html.
But I still don't think it replaces the StrikeAMatch algo, because that one
can have best results in searches like "abc" comparing to strings like "abc
company inc" (distance > 2).

But still, Fuad Efendi told me that StrikeAMatch is toys for kids compare to
the state of lucene trunk. So here I'm, I want to know how 4.0 will help
achieve what I want.

Thanks.

[jira] [Commented] (SOLR-2193) Re-architect Update Handler


[ 
https://issues.apache.org/jira/browse/SOLR-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035051#comment-13035051
 ] 

Mark Miller commented on SOLR-2193:
---

If we go with this separate softAutoCommit as an option, still need to think 
about overlapping hard/soft commits.

Eg you might want to do a soft commit every 4 seconds and a hardcommit every 16 
seconds, but on the 16th second you don't necessarily want to do both types of 
commit (though not likely that big a deal). I've got logic to avoid this in the 
commit by doc case, but nothing for the time based auto commit.

CommitWithin support is also still an interesting additional option - as well 
as Yonik's adea about specifying a staleness hint at query time.

> Re-architect Update Handler
> ---
>
> Key: SOLR-2193
> URL: https://issues.apache.org/jira/browse/SOLR-2193
> Project: Solr
>  Issue Type: Improvement
>Reporter: Mark Miller
>Assignee: Mark Miller
> Fix For: 4.0
>
> Attachments: SOLR-2193.patch, SOLR-2193.patch, SOLR-2193.patch, 
> SOLR-2193.patch
>
>
> The update handler needs an overhaul.
> A few goals I think we might want to look at:
> 1. Cleanup - drop DirectUpdateHandler(2) line - move to something like 
> UpdateHandler, DefaultUpdateHandler
> 2. Expose the SolrIndexWriter in the api or add the proper abstractions to 
> get done what we now do with special casing:
> if (directupdatehandler2)
>   success
>  else
>   failish
> 3. Stop closing the IndexWriter and start using commit (still lazy IW init 
> though).
> 4. Drop iwAccess, iwCommit locks and sync mostly at the Lucene level.
> 5. Keep NRT support in mind.
> 6. Keep microsharding in mind (maintain logical index as multiple physical 
> indexes)
> 7. Address the current issues we face because multiple original/'reloaded' 
> cores can have a different IndexWriter on the same index.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Fuzzy search always returning docs sorted by the highest match

2011-05-17 Thread Guilherme Aiolfi

I'm re-sending my first message because I've just received the mailing-list
confirmation. If it's a duplicated, forget about this one.

Hi,

I want to do a fuzzy search and always return documents no matter what the
score. So, to do this, I'm tried sorting by strdist() in solr 3.1. It worked
great and does ALMOST exactly what I wanted. The problem is that the
algorithms supported  jw, ngram and edit are not the best fit for my
scenario.

The best results come from StrikeAMatch (
http://www.devarticles.com/c/a/Development-Cycles/How-to-Strike-a-Match/).
So, I've found this link https://issues.apache.org/jira/browse/LUCENE-2230 that
implemented what I wanted. But I was told that I should use trunk because
there were some really great news about fuzzy search there.

I read this article explaining some changes
http://blog.mikemccandless.com/2011/03/lucenes-fuzzyquery-is-100-times-faster.html.
But I still don't think it replaces the StrikeAMatch algo, because that one
can have best results in searches like "abc" comparing to strings like "abc
company inc" (distance > 2).

But still, Fuad Efendi told me that StrikeAMatch is toys for kids compare to
the state of lucene trunk. So here I'm, I want to know how 4.0 will help
achieve what I want.

Thanks.

[jira] [Commented] (LUCENE-3104) Hook up Automated Patch Checking for Lucene/Solr


[ 
https://issues.apache.org/jira/browse/LUCENE-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035028#comment-13035028
 ] 

Grant Ingersoll commented on LUCENE-3104:
-

Allen, please ask your questions on solr-u...@lucene.apache.org and please 
delete the comment, as this issue is not the appropriate place for this 
question.
Thanks,
Grant

> Hook up Automated Patch Checking for Lucene/Solr
> 
>
> Key: LUCENE-3104
> URL: https://issues.apache.org/jira/browse/LUCENE-3104
> Project: Lucene - Java
>  Issue Type: Task
>Reporter: Grant Ingersoll
>
> It would be really great if we could get feedback to contributors sooner on 
> many things that are basic (tests exist, patch applies cleanly, etc.)
> From Nigel Daley on builds@a.o
> {quote}
> I revamped the precommit testing in the fall so that it doesn't use Jira 
> email anymore to trigger a build.  The process is controlled by
> https://builds.apache.org/hudson/job/PreCommit-Admin/
> which has some documentation up at the top of the job.  You can look at the 
> config of the job (do you have access?) to see what it's doing.  Any project 
> could use this same admin job -- you just need to ask me to add the project 
> to the Jira filter used by the admin job 
> (https://issues.apache.org/jira/sr/jira.issueviews:searchrequest-xml/12313474/SearchRequest-12313474.xml?tempMax=100
>  ) once you have the downstream job(s) setup for your specific project.  For 
> Hadoop we have 3 downstream builds configured which also have some 
> documentation:
> https://builds.apache.org/hudson/job/PreCommit-HADOOP-Build/
> https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/
> https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/
> {quote}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-3104) Hook up Automated Patch Checking for Lucene/Solr

2011-05-17 Thread allen fu (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035021#comment-13035021
 ] 

allen fu commented on LUCENE-3104:
--

hi,Grant Ingersoll,I got a question about 
'https://issues.apache.org/jira/browse/SOLR-926' ,I want to know how u fix the 
bug in solr 1.4.
now,i got a question in replication of SolrCore.I change the dataDir of the old 
core,and reload(corename) to create a new SolrCore. But there are still some 
request handle by old SolrCore and throw a Exception

EVERE: java.util.concurrent.RejectedExecutionException
at 
java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:1477)
at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:384)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:856)
at 
java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:45)
at 
java.util.concurrent.Executors$DelegatedExecutorService.submit(Executors.java:606)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1175)
at 
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:350)




So , I want to figure out how u fixed the bug in solr 1.4.
thank you 

> Hook up Automated Patch Checking for Lucene/Solr
> 
>
> Key: LUCENE-3104
> URL: https://issues.apache.org/jira/browse/LUCENE-3104
> Project: Lucene - Java
>  Issue Type: Task
>Reporter: Grant Ingersoll
>
> It would be really great if we could get feedback to contributors sooner on 
> many things that are basic (tests exist, patch applies cleanly, etc.)
> From Nigel Daley on builds@a.o
> {quote}
> I revamped the precommit testing in the fall so that it doesn't use Jira 
> email anymore to trigger a build.  The process is controlled by
> https://builds.apache.org/hudson/job/PreCommit-Admin/
> which has some documentation up at the top of the job.  You can look at the 
> config of the job (do you have access?) to see what it's doing.  Any project 
> could use this same admin job -- you just need to ask me to add the project 
> to the Jira filter used by the admin job 
> (https://issues.apache.org/jira/sr/jira.issueviews:searchrequest-xml/12313474/SearchRequest-12313474.xml?tempMax=100
>  ) once you have the downstream job(s) setup for your specific project.  For 
> Hadoop we have 3 downstream builds configured which also have some 
> documentation:
> https://builds.apache.org/hudson/job/PreCommit-HADOOP-Build/
> https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/
> https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/
> {quote}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-3108) Land DocValues on trunk


[ 
https://issues.apache.org/jira/browse/LUCENE-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035017#comment-13035017
 ] 

Michael McCandless commented on LUCENE-3108:


This is an awesome change!

Phew been a long time since I looked at this branch!

Some questions on a quick pass -- still need to iterate/dig deeper:

  * We have some stale jdocs that reference .setIntValue methods (they
are now .setInt)

  * Hmm do we have byte ordering problems?  Ie, if I write index on
machine with little-endian but then try to load values on
big-endian...?  I think we're OK (we seem to always use
IndexOutput.writeInt, and we convert float-to-raw-int-bits using
java's APIs)?

  * Since we dynamically reserve a value to mean "unset", does that
mean there are some datasets we cannot index?  Or... do we tap
into the unused bit of a long, ie the sentinel value can be
negative?  But if the data set spans Long.MIN_VALUE to
Long.MAX_VALUE, what do we do...?

  * How come codecID changed from String to int on the branch?

  * What are oal.util.Pair and ParallelArray for?

  * FloatsRef should state in the jdocs that it's really slicing a
double[]?

  * Can SortField somehow detect whether the needed field was stored
in FC vs DV and pick the right comparator accordingly...?  Kind of
like how NumericField can detect whether the ints are encoded as
"plain text" or as NF?  We can open a new issue for this,
post-landing...

  * It looks like we can sort by int/long/float/double pulled from DV,
but not by terms?  This is fine for landing... but I think we
should open a post-landing issue to also make FieldComparators for
the Terms cases?

  * Should we rename oal.index.values.Type -> .ValueType?  Just
because... it looks so generic when its imported & used as "Type"
somewhere?


> Land DocValues on trunk
> ---
>
> Key: LUCENE-3108
> URL: https://issues.apache.org/jira/browse/LUCENE-3108
> Project: Lucene - Java
>  Issue Type: Task
>  Components: core/index, core/search, core/store
>Affects Versions: CSF branch, 4.0
>Reporter: Simon Willnauer
>Assignee: Simon Willnauer
> Fix For: 4.0
>
>
> Its time to move another feature from branch to trunk. I want to start this 
> process now while still a couple of issues remain on the branch. Currently I 
> am down to a single nocommit (javadocs on DocValues.java) and a couple of 
> testing TODOs (explicit multithreaded tests and unoptimized with deletions) 
> but I think those are not worth separate issues so we can resolve them as we 
> go. 
> The already created issues (LUCENE-3075 and LUCENE-3074) should not block 
> this process here IMO, we can fix them once we are on trunk. 
> Here is a quick feature overview of what has been implemented:
>  * DocValues implementations for Ints (based on PackedInts), Float 32 / 64, 
> Bytes (fixed / variable size each in sorted, straight and deref variations)
>  * Integration into Flex-API, Codec provides a 
> PerDocConsumer->DocValuesConsumer (write) / PerDocValues->DocValues (read) 
>  * By-Default enabled in all codecs except of PreFlex
>  * Follows other flex-API patterns like non-segment reader throw UOE forcing 
> MultiPerDocValues if on DirReader etc.
>  * Integration into IndexWriter, FieldInfos etc.
>  * Random-testing enabled via RandomIW - injecting random DocValues into 
> documents
>  * Basic checks in CheckIndex (which runs after each test)
>  * FieldComparator for int and float variants (Sorting, currently directly 
> integrated into SortField, this might go into a separate DocValuesSortField 
> eventually)
>  * Extended TestSort for DocValues
>  * RAM-Resident random access API plus on-disk DocValuesEnum (currently only 
> sequential access) -> Source.java / DocValuesEnum.java
>  * Extensible Cache implementation for RAM-Resident DocValues (by-default 
> loaded into RAM only once and freed once IR is closed) -> SourceCache.java
>  
> PS: Currently the RAM resident API is named Source (Source.java) which seems 
> too generic. I think we should rename it into RamDocValues or something like 
> that, suggestion welcome!   
> Any comments, questions (rants :)) are very much appreciated.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Issue Comment Edited] (SOLR-2168) Velocity facet output for facet missing

2011-05-17 Thread Erik Hatcher (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035015#comment-13035015
 ] 

Erik Hatcher edited comment on SOLR-2168 at 5/17/11 8:05 PM:
-

Alas not yet, Peter.  Sorry.

  was (Author: ehatcher):
Alas not, Peter.  Sorry.
  
> Velocity facet output for facet missing
> ---
>
> Key: SOLR-2168
> URL: https://issues.apache.org/jira/browse/SOLR-2168
> Project: Solr
>  Issue Type: Bug
>  Components: Response Writers
>Affects Versions: 3.1
>Reporter: Peter Wolanin
>Priority: Minor
> Attachments: SOLR-2168.patch
>
>
> If I add fact.missing to the facet params for a field, the Veolcity output 
> has in the facet list:
> $facet.name (9220)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-2168) Velocity facet output for facet missing

2011-05-17 Thread Erik Hatcher (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035015#comment-13035015
 ] 

Erik Hatcher commented on SOLR-2168:


Alas not, Peter.  Sorry.

> Velocity facet output for facet missing
> ---
>
> Key: SOLR-2168
> URL: https://issues.apache.org/jira/browse/SOLR-2168
> Project: Solr
>  Issue Type: Bug
>  Components: Response Writers
>Affects Versions: 3.1
>Reporter: Peter Wolanin
>Priority: Minor
> Attachments: SOLR-2168.patch
>
>
> If I add fact.missing to the facet params for a field, the Veolcity output 
> has in the facet list:
> $facet.name (9220)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: Apache Jenkins emails

2011-05-17 Thread Robert Muir

On Tue, May 17, 2011 at 3:38 PM, Marvin Humphrey  wrote:
> On Tue, May 17, 2011 at 03:09:31PM -0400, Michael McCandless wrote:
>> Yeah I agree... build failures should be as annoying as possible ;)
>
> Congratulations -- mission accomplished!  They are certainly annoying to me,
> and probably to anyone else subscribed to dev who isn't a committer.
>

Marvin, I'm not sure you can really assume that. If a test fails
anyone who wants to contribute can look at the failure and try to
create a jira issue/patch, I don't think they need to be a committer.

Additionally due to the nature of our tests, anyone who wants to
contribute to the project can simply download the tests and try to
find failures, opening jira issues for ones that they find (for
example selckin does this, and has found a lot of good ones lately).

If you don't care about tests at all, you can easily filter this stuff
with your email client by looking for [JENKINS].

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-3113) fix analyzer bugs found by MockTokenizer


[ 
https://issues.apache.org/jira/browse/LUCENE-3113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035007#comment-13035007
 ] 

Robert Muir commented on LUCENE-3113:
-

thanks for reviewing Steven, I agree! I've made this change and will commit 
shortly.

> fix analyzer bugs found by MockTokenizer
> 
>
> Key: LUCENE-3113
> URL: https://issues.apache.org/jira/browse/LUCENE-3113
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: modules/analysis
>Reporter: Robert Muir
> Fix For: 3.2, 4.0
>
> Attachments: LUCENE-3113.patch, LUCENE-3113.patch
>
>
> In LUCENE-3064, we beefed up MockTokenizer with assertions, and I've switched 
> over the analysis tests to use MockTokenizer for better coverage.
> However, this found a few bugs (one of which is LUCENE-3106):
> * incrementToken() after it returns false in CommonGramsQueryFilter, 
> HyphenatedWordsFilter, ShingleFilter, SynonymFilter
> * missing end() implementation for PrefixAwareTokenFilter
> * double reset() in QueryAutoStopWordAnalyzer and ReusableAnalyzerBase
> * missing correctOffset()s in MockTokenizer itself.
> I think it would be nice to just fix all the bugs on one issue... I've fixed 
> everything except Shingle and Synonym

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Updated] (LUCENE-3084) MergePolicy.OneMerge.segments should be List not SegmentInfos


 [ 
https://issues.apache.org/jira/browse/LUCENE-3084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-3084:
--

Attachment: LUCENE-3084-trunk-only.patch

Now I improved SegmentInfos more:

- It now uses a Map/Set to enforce that the SI only contains each segment one 
time.
- Faster contains() because Set-backed

As said before: asList() and asSet() are unmodifiable, so consistency between 
List and Set/Map is enforced.

The Set is itsself a Map. The values contain the index of segment 
in the infos. This speeds up indexOf() calls, needed for asserts and 
remove(SI). As on remove or reorder operations the indexes are no longer 
correct, a separate boolean is used to mark the Map as inconsistent. It is then 
regenerated on the next indexOf() call. IndexOf is seldom, butthe keySet() is 
still consistent, so delaying this update is fine.

All tests pass. I think the cleanup of SegmentInfos is ready to commit.

> MergePolicy.OneMerge.segments should be List not SegmentInfos
> --
>
> Key: LUCENE-3084
> URL: https://issues.apache.org/jira/browse/LUCENE-3084
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 3.2, 4.0
>
> Attachments: LUCENE-3084-trunk-only.patch, 
> LUCENE-3084-trunk-only.patch, LUCENE-3084-trunk-only.patch, 
> LUCENE-3084-trunk-only.patch, LUCENE-3084-trunk-only.patch, 
> LUCENE-3084-trunk-only.patch, LUCENE-3084-trunk-only.patch, LUCENE-3084.patch
>
>
> SegmentInfos carries a bunch of fields beyond the list of SI, but for merging 
> purposes these fields are unused.
> We should cutover to List instead.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: Apache Jenkins emails

2011-05-17 Thread Marvin Humphrey

On Tue, May 17, 2011 at 03:09:31PM -0400, Michael McCandless wrote:
> Yeah I agree... build failures should be as annoying as possible ;)

Congratulations -- mission accomplished!  They are certainly annoying to me,
and probably to anyone else subscribed to dev who isn't a committer.

Marvin Humphrey


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2230) Lucene Fuzzy Search: BK-Tree can improve performance 3-20 times.

2011-05-17 Thread Fuad Efendi (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034999#comment-13034999
 ] 

Fuad Efendi commented on LUCENE-2230:
-

I believe this issue should be closed due to significant performance 
improvements related to LUCENE-2089 and LUCENE-2258.
I don't think there is any interest from the community to continue with this 
(BK Tree and "Strike a Match") naive approach; although some people found it 
useful. Of course we might have few more distance implementations as a separate 
improvement.

Please close it.


Thanks

> Lucene Fuzzy Search: BK-Tree can improve performance 3-20 times.
> 
>
> Key: LUCENE-2230
> URL: https://issues.apache.org/jira/browse/LUCENE-2230
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: 3.0
> Environment: Lucene currently uses brute force full-terms scanner and 
> calculates distance for each term. New BKTree structure improves performance 
> in average 20 times when distance is 1, and 3 times when distance is 3. I 
> tested with index size several millions docs, and 250,000 terms. 
> New algo uses integer distances between objects.
>Reporter: Fuad Efendi
> Attachments: BKTree.java, Distance.java, DistanceImpl.java, 
> FuzzyTermEnumNEW.java, FuzzyTermEnumNEW.java
>
>   Original Estimate: 1m
>  Remaining Estimate: 1m
>
> W. Burkhard and R. Keller. Some approaches to best-match file searching, 
> CACM, 1973
> http://portal.acm.org/citation.cfm?doid=362003.362025
> I was inspired by 
> http://blog.notdot.net/2007/4/Damn-Cool-Algorithms-Part-1-BK-Trees (Nick 
> Johnson, Google).
> Additionally, simplified algorythm at 
> http://www.catalysoft.com/articles/StrikeAMatch.html seems to be much more 
> logically correct than Levenstein distance, and it is 3-5 times faster 
> (isolated tests).
> Big list od distance implementations:
> http://www.dcs.shef.ac.uk/~sam/stringmetrics.htm

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: Apache Jenkins emails

2011-05-17 Thread Shai Erera

> Hmm... wouldn't this "help to ignore" build failures, while current situation 
> encourages solving them?

I don't think current situation encourages resolving the issues more
than it would discourage if we grouped all emails together.

And I don't believe people will ignore a Jenkins failure thread, if
they don't ignore the separate emails today.

True, for those who ignore build failures - it will help them ignore
them more easily :)

Those who don't ignore will continue to monitor. And from what I can
tell, many failures are not due to code issues, but Jenkins server
issues.

Shai

On Tuesday, May 17, 2011, Doron Cohen  wrote:
> Hmm... wouldn't this "help to ignore" build failures, while current situation 
> encourages solving them? :)
>
> I mean, unlike threading JIRA issues which is more convenient now, for build 
> failures this would hide some info - thread title would indicate the oldest 
> failure no.
>
> In spite of the above, if others still like to change in this way, I'll be 
> fine with it.
>
> Doron
>
> On Sun, May 15, 2011 at 6:16 PM, Shai Erera  wrote:
> Well, Gmail ignores (for grouping) everything that in between brackets []. 
> That's how we made all issue emails appear under the same thread, the status 
> (Commented, Created, Resolved etc.) now appears in brackets.
>
> So, I think that if we put the build # in brackets, the rest of the message 
> is the same for all failures. So instead of:
>
> "[JENKINS] Lucene-Solr-tests-only-trunk - Build # 8042 - Still Failing"
>
> we write
>
> "[JENKINS] Lucene-Solr-tests-only-trunk - [Build # 8042] - Still Failing"
>
> Or
>
> "[JENKINS] [Build # 8042] Lucene-Solr-tests-only-trunk Failed"
>
> Remove the word "still" altogether (it's redundant) and move the build number 
> to the start of the subject.
>
> Shai
>
>
> On Sun, May 15, 2011 at 6:08 PM, Uwe Schindler  wrote:
> It’s possible to change the header, as the mails are already customized. How 
> should it look like (I don’t use f*g Gmail)
>
>  -
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 
> Bremenhttp://www.thetaphi.de 
>
> eMail: u...@thetaphi.de
>
>
> From: Shai Erera [mailto:ser...@gmail.com]
> Sent: Sunday, May 15, 2011 5:02 PM
> To: dev@lucene.apache.org
> Subject: Apache Jenkins emails
>
>  Hi
>
> Is it possible to change the subject format of the emails Jenkins server 
> sends? I was thinking, if we put the build # in [], all failures will be 
> grouped under one thread (in Gmail). Since we have so many of them, it will 
> at least collapse all of them into a single thread. We can still tell the 
> failure of each email as well as the build #.
>
> What do you think?
>
> Shai
>
>
>
>

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-2168) Velocity facet output for facet missing

2011-05-17 Thread Peter Wolanin (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034994#comment-13034994
 ] 

Peter Wolanin commented on SOLR-2168:
-

Did this change to the templates get committed to the actual Solr repo?

> Velocity facet output for facet missing
> ---
>
> Key: SOLR-2168
> URL: https://issues.apache.org/jira/browse/SOLR-2168
> Project: Solr
>  Issue Type: Bug
>  Components: Response Writers
>Affects Versions: 3.1
>Reporter: Peter Wolanin
>Priority: Minor
> Attachments: SOLR-2168.patch
>
>
> If I add fact.missing to the facet params for a field, the Veolcity output 
> has in the facet list:
> $facet.name (9220)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

RE: Lucene/Solr JIRA

2011-05-17 Thread Steven A Rowe

On 5/17/2011 at 3:02 PM, Chris Hostetter wrote:
> If we were starting from scratch, i'd agree with you that having a single
> Jira project makes more sense, but given where we are today, i think we
> should probably keep them distinct -- partly from a "pain of migration"
> standpoint on our end, but also from a user expecations standpoint -- i
> think the Solr users/community as a whole is use to the existence of the
> SOLR project in Jira, and use to the SOLR-* issue naming convention, and
> it would likely be more confusing for *them* to change now.

+1

[jira] [Commented] (LUCENE-3092) NRTCachingDirectory, to buffer small segments in a RAMDir


[ 
https://issues.apache.org/jira/browse/LUCENE-3092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034989#comment-13034989
 ] 

Michael McCandless commented on LUCENE-3092:


That's a great point Yonik -- in fact the TestNRTCachingDirectory already
relies on this generic-ness (pulls a newDirectory() from LuceneTestCase).

> NRTCachingDirectory, to buffer small segments in a RAMDir
> -
>
> Key: LUCENE-3092
> URL: https://issues.apache.org/jira/browse/LUCENE-3092
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/store
>Reporter: Michael McCandless
>Priority: Minor
> Fix For: 3.2, 4.0
>
> Attachments: LUCENE-3092-listener.patch, LUCENE-3092.patch, 
> LUCENE-3092.patch, LUCENE-3092.patch, LUCENE-3092.patch
>
>
> I created this simply Directory impl, whose goal is reduce IO
> contention in a frequent reopen NRT use case.
> The idea is, when reopening quickly, but not indexing that much
> content, you wind up with many small files created with time, that can
> possibly stress the IO system eg if merges, searching are also
> fighting for IO.
> So, NRTCachingDirectory puts these newly created files into a RAMDir,
> and only when they are merged into a too-large segment, does it then
> write-through to the real (delegate) directory.
> This lets you spend some RAM to reduce I0.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-3092) NRTCachingDirectory, to buffer small segments in a RAMDir

2011-05-17 Thread Yonik Seeley (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034985#comment-13034985
 ] 

Yonik Seeley commented on LUCENE-3092:
--

bq. I can't think of any reason why you'd want to wrap another RAMDir with 
NRTCD?

Tests?  It's nice to have a test use a RAMDirectory for speed, but still follow 
the same code path as FSDirectory for debugging + orthogonality.
AFAIK, most Solr tests use RAMDirectory by default.  There's no benefit to 
restricting it, right?

> NRTCachingDirectory, to buffer small segments in a RAMDir
> -
>
> Key: LUCENE-3092
> URL: https://issues.apache.org/jira/browse/LUCENE-3092
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/store
>Reporter: Michael McCandless
>Priority: Minor
> Fix For: 3.2, 4.0
>
> Attachments: LUCENE-3092-listener.patch, LUCENE-3092.patch, 
> LUCENE-3092.patch, LUCENE-3092.patch, LUCENE-3092.patch
>
>
> I created this simply Directory impl, whose goal is reduce IO
> contention in a frequent reopen NRT use case.
> The idea is, when reopening quickly, but not indexing that much
> content, you wind up with many small files created with time, that can
> possibly stress the IO system eg if merges, searching are also
> fighting for IO.
> So, NRTCachingDirectory puts these newly created files into a RAMDir,
> and only when they are merged into a too-large segment, does it then
> write-through to the real (delegate) directory.
> This lets you spend some RAM to reduce I0.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: Apache Jenkins emails

2011-05-17 Thread Michael McCandless

Yeah I agree... build failures should be as annoying as possible ;)

Mike

http://blog.mikemccandless.com

On Tue, May 17, 2011 at 2:58 PM, Doron Cohen  wrote:
> Hmm... wouldn't this "help to ignore" build failures, while current
> situation encourages solving them? :)
>
> I mean, unlike threading JIRA issues which is more convenient now, for build
> failures this would hide some info - thread title would indicate the oldest
> failure no.
>
> In spite of the above, if others still like to change in this way, I'll be
> fine with it.
>
> Doron
>
> On Sun, May 15, 2011 at 6:16 PM, Shai Erera  wrote:
>>
>> Well, Gmail ignores (for grouping) everything that in between brackets [].
>> That's how we made all issue emails appear under the same thread, the status
>> (Commented, Created, Resolved etc.) now appears in brackets.
>>
>> So, I think that if we put the build # in brackets, the rest of the
>> message is the same for all failures. So instead of:
>>
>> "[JENKINS] Lucene-Solr-tests-only-trunk - Build # 8042 - Still Failing"
>>
>> we write
>>
>> "[JENKINS] Lucene-Solr-tests-only-trunk - [Build # 8042] - Still Failing"
>>
>> Or
>>
>> "[JENKINS] [Build # 8042] Lucene-Solr-tests-only-trunk Failed"
>>
>> Remove the word "still" altogether (it's redundant) and move the build
>> number to the start of the subject.
>>
>> Shai
>>
>> On Sun, May 15, 2011 at 6:08 PM, Uwe Schindler  wrote:
>>>
>>> It’s possible to change the header, as the mails are already customized.
>>> How should it look like (I don’t use f*g Gmail)
>>>
>>>
>>>
>>> -
>>>
>>> Uwe Schindler
>>>
>>> H.-H.-Meier-Allee 63, D-28213 Bremen
>>>
>>> http://www.thetaphi.de
>>>
>>> eMail: u...@thetaphi.de
>>>
>>>
>>>
>>> From: Shai Erera [mailto:ser...@gmail.com]
>>> Sent: Sunday, May 15, 2011 5:02 PM
>>> To: dev@lucene.apache.org
>>> Subject: Apache Jenkins emails
>>>
>>>
>>>
>>> Hi
>>>
>>> Is it possible to change the subject format of the emails Jenkins server
>>> sends? I was thinking, if we put the build # in [], all failures will be
>>> grouped under one thread (in Gmail). Since we have so many of them, it will
>>> at least collapse all of them into a single thread. We can still tell the
>>> failure of each email as well as the build #.
>>>
>>> What do you think?
>>>
>>> Shai
>
>

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-3092) NRTCachingDirectory, to buffer small segments in a RAMDir


[ 
https://issues.apache.org/jira/browse/LUCENE-3092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034981#comment-13034981
 ] 

Michael McCandless commented on LUCENE-3092:


I committed it to 3.x as well so this will be in 3.2 :)

I can't think of any reason why you'd want to wrap another RAMDir with 
NRTCD?  We can fix the docs to state this.  Can you work out the 
wording/patch?  Or just go ahead and commit a fix :)

> NRTCachingDirectory, to buffer small segments in a RAMDir
> -
>
> Key: LUCENE-3092
> URL: https://issues.apache.org/jira/browse/LUCENE-3092
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/store
>Reporter: Michael McCandless
>Priority: Minor
> Fix For: 3.2, 4.0
>
> Attachments: LUCENE-3092-listener.patch, LUCENE-3092.patch, 
> LUCENE-3092.patch, LUCENE-3092.patch, LUCENE-3092.patch
>
>
> I created this simply Directory impl, whose goal is reduce IO
> contention in a frequent reopen NRT use case.
> The idea is, when reopening quickly, but not indexing that much
> content, you wind up with many small files created with time, that can
> possibly stress the IO system eg if merges, searching are also
> fighting for IO.
> So, NRTCachingDirectory puts these newly created files into a RAMDir,
> and only when they are merged into a too-large segment, does it then
> write-through to the real (delegate) directory.
> This lets you spend some RAM to reduce I0.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

If we were starting from scratch, i'd agree with you that having a single
Jira project makes more sense, but given where we are today, i think we
should probably keep them distinct -- partly from a "pain of migration"
standpoint on our end, but also from a user expecations standpoint -- i
think the Solr users/community as a whole is use to the existence of the
SOLR project in Jira, and use to the SOLR-* issue naming convention, and
it would likely be more confusing for *them* to change now.

: * With modules, we now have components in the Lucene JIRA project for
: different modules (some under modules/* some under lucene/contrib/*). Will
: we have the same components duplication in the Solr JIRA project?

when we discussed this before, it seemed clear that top level modules
should be tracked as LUCENE issues, so i see no reason why there would be
duplications.

: * Where do users go to open a bug report for a module - Lucene or Solr
: projects? I'd hate to see that they open it under their "favorite" (or
: worse. random picking) project. If so, it'll become a mess.

the user bases tend to be very distinct -- if people are dealing with the
lucene java API directly they file a LUCENE bug, if they are dealing with
the Solr HTTP or client layer (SolrJ) APIs they file a Solr bug.

If an issue is filed in a place where we think it doesn't make sense, the
issue can easily be moved (and Jira does a redirect for anyone following
old links)

: * Administration -- everything needs to be done twice. Create versions (same
: one !) on both projects, close issues (after release) etc.

given the low overhead of this, it doesn't seem all that problematic.

: * Managing a release now means I should monitor two JIRA projects for the
: 3.2 (an example) version issues. Why?

Here's an example of a filter that shows you all issues marked to be fixed
in 3.2 in both projects...

https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&jqlQuery=%28project+%3D+SOLR+OR+project+%3D+LUCENE%29+AND+fixVersion+%3D+%223.2%22+AND+resolution+%3D+Unresolved+ORDER+BY+updated+DESC%2C+key+DESC%2C+priority+DESC

: I guess I'm not too sure what do two JIRA projects give us. Now that it is
: the same project, why not make our (committers and contributors) life easier

Short answer: trade off "ease of use for committers + pain of
migration" against "ease of use for users" ... doesn't seem like a strong
need to change.

: It's already becoming confusing:

neither of these examples seem that confusing to me...

: LUCENE-3097: post grouping faceting -- a great example for a module that
: both Lucene and Solr users can use. Opened under Lucene project, and
: depends on Solr issues (not a big deal)

it's an issue for implementing a top level module, therforce it goes in
LUCENE. it doesn't depend on any Solr issue, it's marked as being blocked
by another issue about adding another top level module

: LUCENE-3104: could easily have been opened under the Solr project. I
: don't know why it was opened under Lucene (random maybe?)

Because it's about improving the hudson build which operates at the top
level of the tree

-Hoss

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: Apache Jenkins emails

2011-05-17 Thread Doron Cohen

Hmm... wouldn't this "help to ignore" build failures, while current
situation encourages solving them? :)

I mean, unlike threading JIRA issues which is more convenient now, for build
failures this would hide some info - thread title would indicate the oldest
failure no.

In spite of the above, if others still like to change in this way, I'll be
fine with it.

Doron

On Sun, May 15, 2011 at 6:16 PM, Shai Erera  wrote:

> Well, Gmail ignores (for grouping) everything that in between brackets [].
> That's how we made all issue emails appear under the same thread, the status
> (Commented, Created, Resolved etc.) now appears in brackets.
>
> So, I think that if we put the build # in brackets, the rest of the message
> is the same for all failures. So instead of:
>
> "[JENKINS] Lucene-Solr-tests-only-trunk - Build # 8042 - Still Failing"
>
> we write
>
> "[JENKINS] Lucene-Solr-tests-only-trunk - [Build # 8042] - Still Failing"
>
> Or
>
> "[JENKINS] [Build # 8042] Lucene-Solr-tests-only-trunk Failed"
>
> Remove the word "still" altogether (it's redundant) and move the build
> number to the start of the subject.
>
> Shai
>
> On Sun, May 15, 2011 at 6:08 PM, Uwe Schindler  wrote:
>
>> It’s possible to change the header, as the mails are already customized.
>> How should it look like (I don’t use f*g Gmail)
>>
>>
>>
>> -
>>
>> Uwe Schindler
>>
>> H.-H.-Meier-Allee 63, D-28213 Bremen
>>
>> http://www.thetaphi.de
>>
>> eMail: u...@thetaphi.de
>>
>>
>>
>> *From:* Shai Erera [mailto:ser...@gmail.com]
>> *Sent:* Sunday, May 15, 2011 5:02 PM
>> *To:* dev@lucene.apache.org
>> *Subject:* Apache Jenkins emails
>>
>>
>>
>> Hi
>>
>> Is it possible to change the subject format of the emails Jenkins server
>> sends? I was thinking, if we put the build # in [], all failures will be
>> grouped under one thread (in Gmail). Since we have so many of them, it will
>> at least collapse all of them into a single thread. We can still tell the
>> failure of each email as well as the build #.
>>
>> What do you think?
>>
>> Shai
>>
>
>

[jira] [Commented] (LUCENE-3104) Hook up Automated Patch Checking for Lucene/Solr


[ 
https://issues.apache.org/jira/browse/LUCENE-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034964#comment-13034964
 ] 

Grant Ingersoll commented on LUCENE-3104:
-

General Docs started at http://wiki.apache.org/general/PreCommitBuilds

> Hook up Automated Patch Checking for Lucene/Solr
> 
>
> Key: LUCENE-3104
> URL: https://issues.apache.org/jira/browse/LUCENE-3104
> Project: Lucene - Java
>  Issue Type: Task
>Reporter: Grant Ingersoll
>
> It would be really great if we could get feedback to contributors sooner on 
> many things that are basic (tests exist, patch applies cleanly, etc.)
> From Nigel Daley on builds@a.o
> {quote}
> I revamped the precommit testing in the fall so that it doesn't use Jira 
> email anymore to trigger a build.  The process is controlled by
> https://builds.apache.org/hudson/job/PreCommit-Admin/
> which has some documentation up at the top of the job.  You can look at the 
> config of the job (do you have access?) to see what it's doing.  Any project 
> could use this same admin job -- you just need to ask me to add the project 
> to the Jira filter used by the admin job 
> (https://issues.apache.org/jira/sr/jira.issueviews:searchrequest-xml/12313474/SearchRequest-12313474.xml?tempMax=100
>  ) once you have the downstream job(s) setup for your specific project.  For 
> Hadoop we have 3 downstream builds configured which also have some 
> documentation:
> https://builds.apache.org/hudson/job/PreCommit-HADOOP-Build/
> https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/
> https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/
> {quote}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (PYLUCENE-9) QueryParser replacing stop words with wildcards

2011-05-17 Thread Christopher Currens (JIRA)


[ 
https://issues.apache.org/jira/browse/PYLUCENE-9?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034961#comment-13034961
 ] 

Christopher Currens commented on PYLUCENE-9:


We can close it.  Thanks for the help.

> QueryParser replacing stop words with wildcards
> ---
>
> Key: PYLUCENE-9
> URL: https://issues.apache.org/jira/browse/PYLUCENE-9
> Project: PyLucene
>  Issue Type: Bug
> Environment: Windows XP 32-bit Sp3, Ubuntu 10.04.2 LTS i686 
> GNU/Linux, jdk1.6.0_23
>Reporter: Christopher Currens
>
> Was using query parser to build a query.  In Java Lucene (as well as 
> Lucene.Net), the query "Calendar Item as Msg" (quotes included), is parsed 
> properly as FullText:"calendar item msg" in Java Lucene and Lucene.Net.  In 
> pylucene, it is parsed as: FullText:"calendar item ? msg".  This causes 
> obvious problems when comparing search results from python, java and .net.
> Initially, I thought it was the Analyzer I was using, but I've tried the 
> StandardAnalyzer and StopAnalyzer, which work properly in Java and .Net, but 
> not pylucene.
> Here is code I've used to reproduce the issue:
> >>> from lucene import StandardAnalyzer, StopAnalyzer, QueryParser, Version
> >>> analyzer = StandardAnalyzer(Version.LUCENE_30)
> >>> query = QueryParser(Version.LUCENE_30, "FullText", analyzer)
> >>> parsedQuery = query.parse("\"Calendar Item as Msg\"")
> >>> parsedQuery
> 
> >>> analyzer = StopAnalyzer(Version.LUCENE_30)
> >>> query = QueryParser(Version.LUCENE_30)
> >>> parsedQuery = query.parse("\"Calendar Item as Msg\"")
> >>> parsedQuery
> 
> I've noticed this in pylucene 2.9.4, 2.9.3, and 3.0.3

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (LUCENE-3113) fix analyzer bugs found by MockTokenizer

2011-05-17 Thread Steven Rowe (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034956#comment-13034956
 ] 

Steven Rowe commented on LUCENE-3113:
-

+1

bq. the ShingleAnalyzerWrapper was double-resetting

Your patch just removes the reset call:

{noformat}
@@ -201,7 +201,6 @@
   TokenStream result = defaultAnalyzer.reusableTokenStream(fieldName, 
reader);
   if (result == streams.wrapped) {
 /* the wrapped analyzer reused the stream */
-streams.shingle.reset(); 
   } else {
 /* the wrapped analyzer did not, create a new shingle around the new 
one */
 streams.wrapped = result;
{noformat}

but inverting the condition would read better:

{noformat}
   TokenStream result = defaultAnalyzer.reusableTokenStream(fieldName, 
reader);
-  if (result == streams.wrapped) {
-/* the wrapped analyzer reused the stream */
-streams.shingle.reset(); 
-  } else {
-/* the wrapped analyzer did not, create a new shingle around the new 
one */
+  if (result != streams.wrapped) {
+// The wrapped analyzer did not reuse the stream. 
+// Wrap the new stream with a new ShingleFilter.
 streams.wrapped = result;
 streams.shingle = new ShingleFilter(streams.wrapped);
   }
{noformat}


> fix analyzer bugs found by MockTokenizer
> 
>
> Key: LUCENE-3113
> URL: https://issues.apache.org/jira/browse/LUCENE-3113
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: modules/analysis
>Reporter: Robert Muir
> Fix For: 3.2, 4.0
>
> Attachments: LUCENE-3113.patch, LUCENE-3113.patch
>
>
> In LUCENE-3064, we beefed up MockTokenizer with assertions, and I've switched 
> over the analysis tests to use MockTokenizer for better coverage.
> However, this found a few bugs (one of which is LUCENE-3106):
> * incrementToken() after it returns false in CommonGramsQueryFilter, 
> HyphenatedWordsFilter, ShingleFilter, SynonymFilter
> * missing end() implementation for PrefixAwareTokenFilter
> * double reset() in QueryAutoStopWordAnalyzer and ReusableAnalyzerBase
> * missing correctOffset()s in MockTokenizer itself.
> I think it would be nice to just fix all the bugs on one issue... I've fixed 
> everything except Shingle and Synonym

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: Bulk changing issues in JIRA

2011-05-17 Thread Shai Erera

Created http://wiki.apache.org/lucene-java/BulkIssuesUpdate

Thanks Mark !

Shai

On Tue, May 17, 2011 at 9:01 PM, Mark Miller  wrote:

> Thanks Shai! Would make a great addition to the wiki ;)
>
> On May 16, 2011, at 11:47 PM, Shai Erera wrote:
>
> > Hi
> >
> > If you ever wondered how to bulk change issues in JIRA, here's the
> procedure:
> >
> > * View a list of issues, e.g. by query/filter
> >
> > * At the top-right you'll find this:
> >
> >
> > * Click on "Tools" and select
> >
> >
> >
> > * The screen changes so that next to each issue there's a check box.
> >
> > * Mark all the issues you want to change and click "Next"
> >
> > * Select the operation (e.g. Edit)
> >
> > * The next screen (followed by choosing operation "Edit") lets you edit
> the issues. Note this at the bottom:
> >
> >
> >
> > Deselect if you don't want to spam the list :).
> >
> > FYI,
> > Shai
>
> - Mark Miller
> lucidimagination.com
>
> Lucene/Solr User Conference
> May 25-26, San Francisco
> www.lucenerevolution.org
>
>
>
>
>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>

Re: Lucene/Solr JIRA

2011-05-17 Thread Ryan McKinley

> Can we merge the two?

gut reaction says +1, but after thinking about how it would work, i'm +0

Would we just stop accepting new tickets on one system, but still keep
track of both?  for how long?
Would we move open issues from SOLR to LUCENE?  migrate the comments/history/etc

In the end I think the two systems are fine -- not ideal, and they
should map (more or less) to where the entry should go in CHANGES.txt

ryan

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Resolved] (LUCENE-3111) TestFSTs.testRandomWords failure


 [ 
https://issues.apache.org/jira/browse/LUCENE-3111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3111.


   Resolution: Fixed
Fix Version/s: 4.0

> TestFSTs.testRandomWords failure
> 
>
> Key: LUCENE-3111
> URL: https://issues.apache.org/jira/browse/LUCENE-3111
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: selckin
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 4.0
>
> Attachments: LUCENE-3111.patch
>
>
> Was running some while(1) tests on the docvalues branch (r1103705) and the 
> following test failed:
> {code}
> [junit] Testsuite: org.apache.lucene.util.automaton.fst.TestFSTs
> [junit] Testcase: 
> testRandomWords(org.apache.lucene.util.automaton.fst.TestFSTs): FAILED
> [junit] expected:<771> but was:
> [junit] junit.framework.AssertionFailedError: expected:<771> but 
> was:
> [junit]   at 
> org.apache.lucene.util.automaton.fst.TestFSTs$FSTTester.verifyUnPruned(TestFSTs.java:540)
> [junit]   at 
> org.apache.lucene.util.automaton.fst.TestFSTs$FSTTester.doTest(TestFSTs.java:496)
> [junit]   at 
> org.apache.lucene.util.automaton.fst.TestFSTs$FSTTester.doTest(TestFSTs.java:359)
> [junit]   at 
> org.apache.lucene.util.automaton.fst.TestFSTs.doTest(TestFSTs.java:319)
> [junit]   at 
> org.apache.lucene.util.automaton.fst.TestFSTs.testRandomWords(TestFSTs.java:940)
> [junit]   at 
> org.apache.lucene.util.automaton.fst.TestFSTs.testRandomWords(TestFSTs.java:915)
> [junit]   at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1282)
> [junit]   at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1211)
> [junit] 
> [junit] 
> [junit] Tests run: 7, Failures: 1, Errors: 0, Time elapsed: 7.628 sec
> [junit] 
> [junit] - Standard Error -
> [junit] NOTE: Ignoring nightly-only test method 'testBigSet'
> [junit] NOTE: reproduce with: ant test -Dtestcase=TestFSTs 
> -Dtestmethod=testRandomWords -Dtests.seed=-269475578956012681:0
> [junit] NOTE: test params are: codec=PreFlex, locale=ar, 
> timezone=America/Blanc-Sablon
> [junit] NOTE: all tests run in this JVM:
> [junit] [TestToken, TestCodecs, TestIndexReaderReopen, 
> TestIndexWriterMerging, TestNoDeletionPolicy, TestParallelReaderEmptyIndex, 
> TestParallelTermEnum, TestPerSegmentDeletes, TestSegmentReader, 
> TestSegmentTermDocs, TestStressAdvance, TestTermVectorsReader, 
> TestSurrogates, TestMultiFieldQueryParser, TestAutomatonQuery, 
> TestBooleanScorer, TestFuzzyQuery, TestMultiTermConstantScore, 
> TestNumericRangeQuery64, TestPositiveScoresOnlyCollector, TestPrefixFilter, 
> TestQueryTermVector, TestScorerPerf, TestSloppyPhraseQuery, 
> TestSpansAdvanced, TestWindowsMMap, TestRamUsageEstimator, TestSmallFloat, 
> TestUnicodeUtil, TestFSTs]
> [junit] NOTE: Linux 2.6.37-gentoo amd64/Sun Microsystems Inc. 1.6.0_25 
> (64-bit)/cpus=8,threads=1,free=137329960,total=208207872
> [junit] -  ---
> [junit] TEST org.apache.lucene.util.automaton.fst.TestFSTs FAILED
> {code}
> I am not able to reproduce

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Updated] (LUCENE-3111) TestFSTs.testRandomWords failure


 [ 
https://issues.apache.org/jira/browse/LUCENE-3111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3111:
---

Attachment: LUCENE-3111.patch

OK I found this -- if you try to add the same output, twice, for the empty 
string, then the builder fails to realize this is a TwoInts and makes a single 
int output!

Thank you random testing :)

I'll commit shortly...

> TestFSTs.testRandomWords failure
> 
>
> Key: LUCENE-3111
> URL: https://issues.apache.org/jira/browse/LUCENE-3111
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: selckin
>Assignee: Michael McCandless
>Priority: Minor
> Attachments: LUCENE-3111.patch
>
>
> Was running some while(1) tests on the docvalues branch (r1103705) and the 
> following test failed:
> {code}
> [junit] Testsuite: org.apache.lucene.util.automaton.fst.TestFSTs
> [junit] Testcase: 
> testRandomWords(org.apache.lucene.util.automaton.fst.TestFSTs): FAILED
> [junit] expected:<771> but was:
> [junit] junit.framework.AssertionFailedError: expected:<771> but 
> was:
> [junit]   at 
> org.apache.lucene.util.automaton.fst.TestFSTs$FSTTester.verifyUnPruned(TestFSTs.java:540)
> [junit]   at 
> org.apache.lucene.util.automaton.fst.TestFSTs$FSTTester.doTest(TestFSTs.java:496)
> [junit]   at 
> org.apache.lucene.util.automaton.fst.TestFSTs$FSTTester.doTest(TestFSTs.java:359)
> [junit]   at 
> org.apache.lucene.util.automaton.fst.TestFSTs.doTest(TestFSTs.java:319)
> [junit]   at 
> org.apache.lucene.util.automaton.fst.TestFSTs.testRandomWords(TestFSTs.java:940)
> [junit]   at 
> org.apache.lucene.util.automaton.fst.TestFSTs.testRandomWords(TestFSTs.java:915)
> [junit]   at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1282)
> [junit]   at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1211)
> [junit] 
> [junit] 
> [junit] Tests run: 7, Failures: 1, Errors: 0, Time elapsed: 7.628 sec
> [junit] 
> [junit] - Standard Error -
> [junit] NOTE: Ignoring nightly-only test method 'testBigSet'
> [junit] NOTE: reproduce with: ant test -Dtestcase=TestFSTs 
> -Dtestmethod=testRandomWords -Dtests.seed=-269475578956012681:0
> [junit] NOTE: test params are: codec=PreFlex, locale=ar, 
> timezone=America/Blanc-Sablon
> [junit] NOTE: all tests run in this JVM:
> [junit] [TestToken, TestCodecs, TestIndexReaderReopen, 
> TestIndexWriterMerging, TestNoDeletionPolicy, TestParallelReaderEmptyIndex, 
> TestParallelTermEnum, TestPerSegmentDeletes, TestSegmentReader, 
> TestSegmentTermDocs, TestStressAdvance, TestTermVectorsReader, 
> TestSurrogates, TestMultiFieldQueryParser, TestAutomatonQuery, 
> TestBooleanScorer, TestFuzzyQuery, TestMultiTermConstantScore, 
> TestNumericRangeQuery64, TestPositiveScoresOnlyCollector, TestPrefixFilter, 
> TestQueryTermVector, TestScorerPerf, TestSloppyPhraseQuery, 
> TestSpansAdvanced, TestWindowsMMap, TestRamUsageEstimator, TestSmallFloat, 
> TestUnicodeUtil, TestFSTs]
> [junit] NOTE: Linux 2.6.37-gentoo amd64/Sun Microsystems Inc. 1.6.0_25 
> (64-bit)/cpus=8,threads=1,free=137329960,total=208207872
> [junit] -  ---
> [junit] TEST org.apache.lucene.util.automaton.fst.TestFSTs FAILED
> {code}
> I am not able to reproduce

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-3111) TestFSTs.testRandomWords failure


[ 
https://issues.apache.org/jira/browse/LUCENE-3111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034949#comment-13034949
 ] 

Robert Muir commented on LUCENE-3111:
-

See revision 1104452, 5 tests had this problem... I think LuceneTestCase can 
catch it always now.

> TestFSTs.testRandomWords failure
> 
>
> Key: LUCENE-3111
> URL: https://issues.apache.org/jira/browse/LUCENE-3111
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: selckin
>Assignee: Michael McCandless
>Priority: Minor
>
> Was running some while(1) tests on the docvalues branch (r1103705) and the 
> following test failed:
> {code}
> [junit] Testsuite: org.apache.lucene.util.automaton.fst.TestFSTs
> [junit] Testcase: 
> testRandomWords(org.apache.lucene.util.automaton.fst.TestFSTs): FAILED
> [junit] expected:<771> but was:
> [junit] junit.framework.AssertionFailedError: expected:<771> but 
> was:
> [junit]   at 
> org.apache.lucene.util.automaton.fst.TestFSTs$FSTTester.verifyUnPruned(TestFSTs.java:540)
> [junit]   at 
> org.apache.lucene.util.automaton.fst.TestFSTs$FSTTester.doTest(TestFSTs.java:496)
> [junit]   at 
> org.apache.lucene.util.automaton.fst.TestFSTs$FSTTester.doTest(TestFSTs.java:359)
> [junit]   at 
> org.apache.lucene.util.automaton.fst.TestFSTs.doTest(TestFSTs.java:319)
> [junit]   at 
> org.apache.lucene.util.automaton.fst.TestFSTs.testRandomWords(TestFSTs.java:940)
> [junit]   at 
> org.apache.lucene.util.automaton.fst.TestFSTs.testRandomWords(TestFSTs.java:915)
> [junit]   at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1282)
> [junit]   at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1211)
> [junit] 
> [junit] 
> [junit] Tests run: 7, Failures: 1, Errors: 0, Time elapsed: 7.628 sec
> [junit] 
> [junit] - Standard Error -
> [junit] NOTE: Ignoring nightly-only test method 'testBigSet'
> [junit] NOTE: reproduce with: ant test -Dtestcase=TestFSTs 
> -Dtestmethod=testRandomWords -Dtests.seed=-269475578956012681:0
> [junit] NOTE: test params are: codec=PreFlex, locale=ar, 
> timezone=America/Blanc-Sablon
> [junit] NOTE: all tests run in this JVM:
> [junit] [TestToken, TestCodecs, TestIndexReaderReopen, 
> TestIndexWriterMerging, TestNoDeletionPolicy, TestParallelReaderEmptyIndex, 
> TestParallelTermEnum, TestPerSegmentDeletes, TestSegmentReader, 
> TestSegmentTermDocs, TestStressAdvance, TestTermVectorsReader, 
> TestSurrogates, TestMultiFieldQueryParser, TestAutomatonQuery, 
> TestBooleanScorer, TestFuzzyQuery, TestMultiTermConstantScore, 
> TestNumericRangeQuery64, TestPositiveScoresOnlyCollector, TestPrefixFilter, 
> TestQueryTermVector, TestScorerPerf, TestSloppyPhraseQuery, 
> TestSpansAdvanced, TestWindowsMMap, TestRamUsageEstimator, TestSmallFloat, 
> TestUnicodeUtil, TestFSTs]
> [junit] NOTE: Linux 2.6.37-gentoo amd64/Sun Microsystems Inc. 1.6.0_25 
> (64-bit)/cpus=8,threads=1,free=137329960,total=208207872
> [junit] -  ---
> [junit] TEST org.apache.lucene.util.automaton.fst.TestFSTs FAILED
> {code}
> I am not able to reproduce

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: Lucene/Solr JIRA

2011-05-17 Thread Mark Miller


On May 17, 2011, at 2:22 PM, Shai Erera wrote:

> Can we merge the two?

+1. Due to history and other possible pain points, I don't know that it's the 
right practical idea at the end of the upcoming discussion, but it's certainly 
a good idea.

- Mark Miller
lucidimagination.com

Lucene/Solr User Conference
May 25-26, San Francisco
www.lucenerevolution.org






-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-2119) IndexSchema should log warning if is declared with charfilter/tokenizer/tokenfiler out of order


[ 
https://issues.apache.org/jira/browse/SOLR-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034939#comment-13034939
 ] 

Mark Miller commented on SOLR-2119:
---

bq. I think this would be a good service to those users who trip the hard error 
on upgrade: it means Solr is not doing what they thought they asked it to do.

+1

> IndexSchema should log warning if  is declared with 
> charfilter/tokenizer/tokenfiler out of order
> --
>
> Key: SOLR-2119
> URL: https://issues.apache.org/jira/browse/SOLR-2119
> Project: Solr
>  Issue Type: Improvement
>  Components: Schema and Analysis
>Reporter: Hoss Man
> Fix For: 3.2, 4.0
>
>
> There seems to be a segment of hte user population that has a hard time 
> understanding the distinction between a charfilter, a tokenizer, and a 
> tokenfilter -- while we can certianly try to improve the documentation about 
> what exactly each does, and when they take affect in the analysis chain, one 
> other thing we should do is try to educate people when they constuct their 
>  in a way that doesn't make any sense.
> at the moment, some people are attempting to do things like "move the Foo 
>  before the " to try and get certain behavior ... 
> at a minimum we should log a warning in this case that doing that doesn't 
> have the desired effect
> (we could easily make such a situation fail to initialize, but i'm not 
> convinced that would be the best course of action, since some people may have 
> schema's where they have declared a charFilter or tokenizer out of order 
> relative to their tokenFilters, but are still getting "correct" results that 
> work for them, and breaking their instance on upgrade doens't seem like it 
> would be productive)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Lucene/Solr JIRA

2011-05-17 Thread Shai Erera

Hi

Today we have separate JIRA projects for Lucene and Solr. This, IMO, starts
to become confusing and difficult to maintain. I'll explain:

* With modules, we now have components in the Lucene JIRA project for
different modules (some under modules/* some under lucene/contrib/*). Will
we have the same components duplication in the Solr JIRA project?

* Where do users go to open a bug report for a module - Lucene or Solr
projects? I'd hate to see that they open it under their "favorite" (or
worse. random picking) project. If so, it'll become a mess.

* Administration -- everything needs to be done twice. Create versions (same
one !) on both projects, close issues (after release) etc.

* Managing a release now means I should monitor two JIRA projects for the
3.2 (an example) version issues. Why?

I guess I'm not too sure what do two JIRA projects give us. Now that it is
the same project, why not make our (committers and contributors) life easier
by having one JIRA project w/ components:
lucene/core
lucene/contrib/
modules/
solr/core
solr/contrib/
general/* (test, build)

It's already becoming confusing:
LUCENE-3097: post grouping faceting -- a great example for a module that
both Lucene and Solr users can use. Opened under Lucene project, and depends
on Solr issues (not a big deal)
LUCENE-3104: could easily have been opened under the Solr project. I don't
know why it was opened under Lucene (random maybe?)

Can we merge the two?

Shai

Re: Bulk changing issues in JIRA

2011-05-17 Thread Mark Miller

Thanks Shai! Would make a great addition to the wiki ;)

On May 16, 2011, at 11:47 PM, Shai Erera wrote:

> Hi
> 
> If you ever wondered how to bulk change issues in JIRA, here's the procedure:
> 
> * View a list of issues, e.g. by query/filter
> 
> * At the top-right you'll find this:
> 
> 
> * Click on "Tools" and select
> 
> 
> 
> * The screen changes so that next to each issue there's a check box.
> 
> * Mark all the issues you want to change and click "Next"
> 
> * Select the operation (e.g. Edit)
> 
> * The next screen (followed by choosing operation "Edit") lets you edit the 
> issues. Note this at the bottom:
> 
> 
> 
> Deselect if you don't want to spam the list :).
> 
> FYI,
> Shai

- Mark Miller
lucidimagination.com

Lucene/Solr User Conference
May 25-26, San Francisco
www.lucenerevolution.org






-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-3092) NRTCachingDirectory, to buffer small segments in a RAMDir

2011-05-17 Thread Shai Erera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034927#comment-13034927
 ] 

Shai Erera commented on LUCENE-3092:


Mike, this is a great idea ! If there are any chances it will be released in 
3.2, I think one of our NRT apps can make good use of it.

Question - I see that NRTCD ctor takes a Directory. Is there any reason to pass 
RAMDir to NRTCD? I assume you use a Directory for any other Dir impls out there 
that may not sub-class e.g., FSDir, which is ok - so can we at least document 
that this Dir is not useful if you intend to pass RAMDir to it?

Unless I am wrong and it is useful w/ RAMDir as well. 

> NRTCachingDirectory, to buffer small segments in a RAMDir
> -
>
> Key: LUCENE-3092
> URL: https://issues.apache.org/jira/browse/LUCENE-3092
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/store
>Reporter: Michael McCandless
>Priority: Minor
> Fix For: 3.2, 4.0
>
> Attachments: LUCENE-3092-listener.patch, LUCENE-3092.patch, 
> LUCENE-3092.patch, LUCENE-3092.patch, LUCENE-3092.patch
>
>
> I created this simply Directory impl, whose goal is reduce IO
> contention in a frequent reopen NRT use case.
> The idea is, when reopening quickly, but not indexing that much
> content, you wind up with many small files created with time, that can
> possibly stress the IO system eg if merges, searching are also
> fighting for IO.
> So, NRTCachingDirectory puts these newly created files into a RAMDir,
> and only when they are merged into a too-large segment, does it then
> write-through to the real (delegate) directory.
> This lets you spend some RAM to reduce I0.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Resolved] (LUCENE-3092) NRTCachingDirectory, to buffer small segments in a RAMDir


 [ 
https://issues.apache.org/jira/browse/LUCENE-3092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3092.


Resolution: Fixed

> NRTCachingDirectory, to buffer small segments in a RAMDir
> -
>
> Key: LUCENE-3092
> URL: https://issues.apache.org/jira/browse/LUCENE-3092
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/store
>Reporter: Michael McCandless
>Priority: Minor
> Fix For: 3.2, 4.0
>
> Attachments: LUCENE-3092-listener.patch, LUCENE-3092.patch, 
> LUCENE-3092.patch, LUCENE-3092.patch, LUCENE-3092.patch
>
>
> I created this simply Directory impl, whose goal is reduce IO
> contention in a frequent reopen NRT use case.
> The idea is, when reopening quickly, but not indexing that much
> content, you wind up with many small files created with time, that can
> possibly stress the IO system eg if merges, searching are also
> fighting for IO.
> So, NRTCachingDirectory puts these newly created files into a RAMDir,
> and only when they are merged into a too-large segment, does it then
> write-through to the real (delegate) directory.
> This lets you spend some RAM to reduce I0.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-2193) Re-architect Update Handler


[ 
https://issues.apache.org/jira/browse/SOLR-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034918#comment-13034918
 ] 

Mark Miller commented on SOLR-2193:
---

Next I need to look at the thread safety of CommitTracker under the new locking 
system.

> Re-architect Update Handler
> ---
>
> Key: SOLR-2193
> URL: https://issues.apache.org/jira/browse/SOLR-2193
> Project: Solr
>  Issue Type: Improvement
>Reporter: Mark Miller
>Assignee: Mark Miller
> Fix For: 4.0
>
> Attachments: SOLR-2193.patch, SOLR-2193.patch, SOLR-2193.patch, 
> SOLR-2193.patch
>
>
> The update handler needs an overhaul.
> A few goals I think we might want to look at:
> 1. Cleanup - drop DirectUpdateHandler(2) line - move to something like 
> UpdateHandler, DefaultUpdateHandler
> 2. Expose the SolrIndexWriter in the api or add the proper abstractions to 
> get done what we now do with special casing:
> if (directupdatehandler2)
>   success
>  else
>   failish
> 3. Stop closing the IndexWriter and start using commit (still lazy IW init 
> though).
> 4. Drop iwAccess, iwCommit locks and sync mostly at the Lucene level.
> 5. Keep NRT support in mind.
> 6. Keep microsharding in mind (maintain logical index as multiple physical 
> indexes)
> 7. Address the current issues we face because multiple original/'reloaded' 
> cores can have a different IndexWriter on the same index.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-3092) NRTCachingDirectory, to buffer small segments in a RAMDir


[ 
https://issues.apache.org/jira/browse/LUCENE-3092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034911#comment-13034911
 ] 

Michael McCandless commented on LUCENE-3092:


Alas I haven't had time to really dig into perf gains here... but I suspect on 
systems where IO is in contention (due to ongoing cold searching, or merging), 
and reopen rate is highish, that this should be a decent win since we don't 
burden the IO system with many tiny files.

> NRTCachingDirectory, to buffer small segments in a RAMDir
> -
>
> Key: LUCENE-3092
> URL: https://issues.apache.org/jira/browse/LUCENE-3092
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/store
>Reporter: Michael McCandless
>Priority: Minor
> Fix For: 3.2, 4.0
>
> Attachments: LUCENE-3092-listener.patch, LUCENE-3092.patch, 
> LUCENE-3092.patch, LUCENE-3092.patch, LUCENE-3092.patch
>
>
> I created this simply Directory impl, whose goal is reduce IO
> contention in a frequent reopen NRT use case.
> The idea is, when reopening quickly, but not indexing that much
> content, you wind up with many small files created with time, that can
> possibly stress the IO system eg if merges, searching are also
> fighting for IO.
> So, NRTCachingDirectory puts these newly created files into a RAMDir,
> and only when they are merged into a too-large segment, does it then
> write-through to the real (delegate) directory.
> This lets you spend some RAM to reduce I0.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Resolved] (LUCENE-3098) Grouped total count


 [ 
https://issues.apache.org/jira/browse/LUCENE-3098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-3098.


Resolution: Fixed

Committed.  I made a small change to TestGrouping (renamed one variable) and 
tweaked jdocs a bit on AllGroupsCollector.

This is a great addition to the grouping module -- thanks Martijn!

> Grouped total count
> ---
>
> Key: LUCENE-3098
> URL: https://issues.apache.org/jira/browse/LUCENE-3098
> Project: Lucene - Java
>  Issue Type: New Feature
>Reporter: Martijn van Groningen
>Assignee: Michael McCandless
> Fix For: 3.2, 4.0
>
> Attachments: LUCENE-3098-3x.patch, LUCENE-3098-3x.patch, 
> LUCENE-3098.patch, LUCENE-3098.patch, LUCENE-3098.patch, LUCENE-3098.patch, 
> LUCENE-3098.patch
>
>
> When grouping currently you can get two counts:
> * Total hit count. Which counts all documents that matched the query.
> * Total grouped hit count. Which counts all documents that have been grouped 
> in the top N groups.
> Since the end user gets groups in his search result instead of plain 
> documents with grouping. The total number of groups as total count makes more 
> sense in many situations. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Updated] (SOLR-2193) Re-architect Update Handler


 [ 
https://issues.apache.org/jira/browse/SOLR-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-2193:
--

Attachment: SOLR-2193.patch

Here is a new patch - couple tests, couple fixes, etc, etc. Still has no 
commitWithin type support for soft commits.

Tested and made auto soft commit code work.

I spent some time today firing documents rapidly at Solr with a soft commit max 
time of 1 second. Fantastic results at about 100 wikipedia documents per 
second. Didn't change any other example settings this time.

> Re-architect Update Handler
> ---
>
> Key: SOLR-2193
> URL: https://issues.apache.org/jira/browse/SOLR-2193
> Project: Solr
>  Issue Type: Improvement
>Reporter: Mark Miller
>Assignee: Mark Miller
> Fix For: 4.0
>
> Attachments: SOLR-2193.patch, SOLR-2193.patch, SOLR-2193.patch, 
> SOLR-2193.patch
>
>
> The update handler needs an overhaul.
> A few goals I think we might want to look at:
> 1. Cleanup - drop DirectUpdateHandler(2) line - move to something like 
> UpdateHandler, DefaultUpdateHandler
> 2. Expose the SolrIndexWriter in the api or add the proper abstractions to 
> get done what we now do with special casing:
> if (directupdatehandler2)
>   success
>  else
>   failish
> 3. Stop closing the IndexWriter and start using commit (still lazy IW init 
> though).
> 4. Drop iwAccess, iwCommit locks and sync mostly at the Lucene level.
> 5. Keep NRT support in mind.
> 6. Keep microsharding in mind (maintain logical index as multiple physical 
> indexes)
> 7. Address the current issues we face because multiple original/'reloaded' 
> cores can have a different IndexWriter on the same index.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-3111) TestFSTs.testRandomWords failure


[ 
https://issues.apache.org/jira/browse/LUCENE-3111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034900#comment-13034900
 ] 

Robert Muir commented on LUCENE-3111:
-

I have an idea how i think i can make LuceneTestCase fail if a test does 
this... i'll see if i can improve the setup/tearDown checking this way so we 
don't have this issue again.

> TestFSTs.testRandomWords failure
> 
>
> Key: LUCENE-3111
> URL: https://issues.apache.org/jira/browse/LUCENE-3111
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: selckin
>Assignee: Michael McCandless
>Priority: Minor
>
> Was running some while(1) tests on the docvalues branch (r1103705) and the 
> following test failed:
> {code}
> [junit] Testsuite: org.apache.lucene.util.automaton.fst.TestFSTs
> [junit] Testcase: 
> testRandomWords(org.apache.lucene.util.automaton.fst.TestFSTs): FAILED
> [junit] expected:<771> but was:
> [junit] junit.framework.AssertionFailedError: expected:<771> but 
> was:
> [junit]   at 
> org.apache.lucene.util.automaton.fst.TestFSTs$FSTTester.verifyUnPruned(TestFSTs.java:540)
> [junit]   at 
> org.apache.lucene.util.automaton.fst.TestFSTs$FSTTester.doTest(TestFSTs.java:496)
> [junit]   at 
> org.apache.lucene.util.automaton.fst.TestFSTs$FSTTester.doTest(TestFSTs.java:359)
> [junit]   at 
> org.apache.lucene.util.automaton.fst.TestFSTs.doTest(TestFSTs.java:319)
> [junit]   at 
> org.apache.lucene.util.automaton.fst.TestFSTs.testRandomWords(TestFSTs.java:940)
> [junit]   at 
> org.apache.lucene.util.automaton.fst.TestFSTs.testRandomWords(TestFSTs.java:915)
> [junit]   at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1282)
> [junit]   at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1211)
> [junit] 
> [junit] 
> [junit] Tests run: 7, Failures: 1, Errors: 0, Time elapsed: 7.628 sec
> [junit] 
> [junit] - Standard Error -
> [junit] NOTE: Ignoring nightly-only test method 'testBigSet'
> [junit] NOTE: reproduce with: ant test -Dtestcase=TestFSTs 
> -Dtestmethod=testRandomWords -Dtests.seed=-269475578956012681:0
> [junit] NOTE: test params are: codec=PreFlex, locale=ar, 
> timezone=America/Blanc-Sablon
> [junit] NOTE: all tests run in this JVM:
> [junit] [TestToken, TestCodecs, TestIndexReaderReopen, 
> TestIndexWriterMerging, TestNoDeletionPolicy, TestParallelReaderEmptyIndex, 
> TestParallelTermEnum, TestPerSegmentDeletes, TestSegmentReader, 
> TestSegmentTermDocs, TestStressAdvance, TestTermVectorsReader, 
> TestSurrogates, TestMultiFieldQueryParser, TestAutomatonQuery, 
> TestBooleanScorer, TestFuzzyQuery, TestMultiTermConstantScore, 
> TestNumericRangeQuery64, TestPositiveScoresOnlyCollector, TestPrefixFilter, 
> TestQueryTermVector, TestScorerPerf, TestSloppyPhraseQuery, 
> TestSpansAdvanced, TestWindowsMMap, TestRamUsageEstimator, TestSmallFloat, 
> TestUnicodeUtil, TestFSTs]
> [junit] NOTE: Linux 2.6.37-gentoo amd64/Sun Microsystems Inc. 1.6.0_25 
> (64-bit)/cpus=8,threads=1,free=137329960,total=208207872
> [junit] -  ---
> [junit] TEST org.apache.lucene.util.automaton.fst.TestFSTs FAILED
> {code}
> I am not able to reproduce

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-3111) TestFSTs.testRandomWords failure


[ 
https://issues.apache.org/jira/browse/LUCENE-3111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034899#comment-13034899
 ] 

Michael McCandless commented on LUCENE-3111:


OK this reproduces the bug, once you add the missing calls to 
super.setUp/tearDown:

{noformat}
ant test -Dtestcase=TestFSTs -Dtestmethod=testRandomWords 
-Dtests.seed=6166279653770643480:6589011488658196383
{noformat}

> TestFSTs.testRandomWords failure
> 
>
> Key: LUCENE-3111
> URL: https://issues.apache.org/jira/browse/LUCENE-3111
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: selckin
>Assignee: Michael McCandless
>Priority: Minor
>
> Was running some while(1) tests on the docvalues branch (r1103705) and the 
> following test failed:
> {code}
> [junit] Testsuite: org.apache.lucene.util.automaton.fst.TestFSTs
> [junit] Testcase: 
> testRandomWords(org.apache.lucene.util.automaton.fst.TestFSTs): FAILED
> [junit] expected:<771> but was:
> [junit] junit.framework.AssertionFailedError: expected:<771> but 
> was:
> [junit]   at 
> org.apache.lucene.util.automaton.fst.TestFSTs$FSTTester.verifyUnPruned(TestFSTs.java:540)
> [junit]   at 
> org.apache.lucene.util.automaton.fst.TestFSTs$FSTTester.doTest(TestFSTs.java:496)
> [junit]   at 
> org.apache.lucene.util.automaton.fst.TestFSTs$FSTTester.doTest(TestFSTs.java:359)
> [junit]   at 
> org.apache.lucene.util.automaton.fst.TestFSTs.doTest(TestFSTs.java:319)
> [junit]   at 
> org.apache.lucene.util.automaton.fst.TestFSTs.testRandomWords(TestFSTs.java:940)
> [junit]   at 
> org.apache.lucene.util.automaton.fst.TestFSTs.testRandomWords(TestFSTs.java:915)
> [junit]   at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1282)
> [junit]   at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1211)
> [junit] 
> [junit] 
> [junit] Tests run: 7, Failures: 1, Errors: 0, Time elapsed: 7.628 sec
> [junit] 
> [junit] - Standard Error -
> [junit] NOTE: Ignoring nightly-only test method 'testBigSet'
> [junit] NOTE: reproduce with: ant test -Dtestcase=TestFSTs 
> -Dtestmethod=testRandomWords -Dtests.seed=-269475578956012681:0
> [junit] NOTE: test params are: codec=PreFlex, locale=ar, 
> timezone=America/Blanc-Sablon
> [junit] NOTE: all tests run in this JVM:
> [junit] [TestToken, TestCodecs, TestIndexReaderReopen, 
> TestIndexWriterMerging, TestNoDeletionPolicy, TestParallelReaderEmptyIndex, 
> TestParallelTermEnum, TestPerSegmentDeletes, TestSegmentReader, 
> TestSegmentTermDocs, TestStressAdvance, TestTermVectorsReader, 
> TestSurrogates, TestMultiFieldQueryParser, TestAutomatonQuery, 
> TestBooleanScorer, TestFuzzyQuery, TestMultiTermConstantScore, 
> TestNumericRangeQuery64, TestPositiveScoresOnlyCollector, TestPrefixFilter, 
> TestQueryTermVector, TestScorerPerf, TestSloppyPhraseQuery, 
> TestSpansAdvanced, TestWindowsMMap, TestRamUsageEstimator, TestSmallFloat, 
> TestUnicodeUtil, TestFSTs]
> [junit] NOTE: Linux 2.6.37-gentoo amd64/Sun Microsystems Inc. 1.6.0_25 
> (64-bit)/cpus=8,threads=1,free=137329960,total=208207872
> [junit] -  ---
> [junit] TEST org.apache.lucene.util.automaton.fst.TestFSTs FAILED
> {code}
> I am not able to reproduce

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-3111) TestFSTs.testRandomWords failure


[ 
https://issues.apache.org/jira/browse/LUCENE-3111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034895#comment-13034895
 ] 

Michael McCandless commented on LUCENE-3111:


Doh!

+1 for findbugs.

> TestFSTs.testRandomWords failure
> 
>
> Key: LUCENE-3111
> URL: https://issues.apache.org/jira/browse/LUCENE-3111
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: selckin
>Assignee: Michael McCandless
>Priority: Minor
>
> Was running some while(1) tests on the docvalues branch (r1103705) and the 
> following test failed:
> {code}
> [junit] Testsuite: org.apache.lucene.util.automaton.fst.TestFSTs
> [junit] Testcase: 
> testRandomWords(org.apache.lucene.util.automaton.fst.TestFSTs): FAILED
> [junit] expected:<771> but was:
> [junit] junit.framework.AssertionFailedError: expected:<771> but 
> was:
> [junit]   at 
> org.apache.lucene.util.automaton.fst.TestFSTs$FSTTester.verifyUnPruned(TestFSTs.java:540)
> [junit]   at 
> org.apache.lucene.util.automaton.fst.TestFSTs$FSTTester.doTest(TestFSTs.java:496)
> [junit]   at 
> org.apache.lucene.util.automaton.fst.TestFSTs$FSTTester.doTest(TestFSTs.java:359)
> [junit]   at 
> org.apache.lucene.util.automaton.fst.TestFSTs.doTest(TestFSTs.java:319)
> [junit]   at 
> org.apache.lucene.util.automaton.fst.TestFSTs.testRandomWords(TestFSTs.java:940)
> [junit]   at 
> org.apache.lucene.util.automaton.fst.TestFSTs.testRandomWords(TestFSTs.java:915)
> [junit]   at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1282)
> [junit]   at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1211)
> [junit] 
> [junit] 
> [junit] Tests run: 7, Failures: 1, Errors: 0, Time elapsed: 7.628 sec
> [junit] 
> [junit] - Standard Error -
> [junit] NOTE: Ignoring nightly-only test method 'testBigSet'
> [junit] NOTE: reproduce with: ant test -Dtestcase=TestFSTs 
> -Dtestmethod=testRandomWords -Dtests.seed=-269475578956012681:0
> [junit] NOTE: test params are: codec=PreFlex, locale=ar, 
> timezone=America/Blanc-Sablon
> [junit] NOTE: all tests run in this JVM:
> [junit] [TestToken, TestCodecs, TestIndexReaderReopen, 
> TestIndexWriterMerging, TestNoDeletionPolicy, TestParallelReaderEmptyIndex, 
> TestParallelTermEnum, TestPerSegmentDeletes, TestSegmentReader, 
> TestSegmentTermDocs, TestStressAdvance, TestTermVectorsReader, 
> TestSurrogates, TestMultiFieldQueryParser, TestAutomatonQuery, 
> TestBooleanScorer, TestFuzzyQuery, TestMultiTermConstantScore, 
> TestNumericRangeQuery64, TestPositiveScoresOnlyCollector, TestPrefixFilter, 
> TestQueryTermVector, TestScorerPerf, TestSloppyPhraseQuery, 
> TestSpansAdvanced, TestWindowsMMap, TestRamUsageEstimator, TestSmallFloat, 
> TestUnicodeUtil, TestFSTs]
> [junit] NOTE: Linux 2.6.37-gentoo amd64/Sun Microsystems Inc. 1.6.0_25 
> (64-bit)/cpus=8,threads=1,free=137329960,total=208207872
> [junit] -  ---
> [junit] TEST org.apache.lucene.util.automaton.fst.TestFSTs FAILED
> {code}
> I am not able to reproduce

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-1395) Integrate Katta

2011-05-17 Thread Jamie Johnson (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034887#comment-13034887
 ] 

Jamie Johnson commented on SOLR-1395:
-

Is there any updated documentation for how to do this?  I've attempted to run 
through the patching process but the exact steps are not clear since the 
versions have changed significantly.  

> Integrate Katta
> ---
>
> Key: SOLR-1395
> URL: https://issues.apache.org/jira/browse/SOLR-1395
> Project: Solr
>  Issue Type: New Feature
>Affects Versions: 1.4
>Reporter: Jason Rutherglen
>Priority: Minor
> Fix For: 3.2
>
> Attachments: SOLR-1395.patch, SOLR-1395.patch, SOLR-1395.patch, 
> back-end.log, front-end.log, hadoop-core-0.19.0.jar, katta-core-0.6-dev.jar, 
> katta-solrcores.jpg, katta.node.properties, katta.zk.properties, 
> log4j-1.2.13.jar, solr-1395-1431-3.patch, solr-1395-1431-4.patch, 
> solr-1395-1431-katta0.6.patch, solr-1395-1431-katta0.6.patch, 
> solr-1395-1431.patch, solr-1395-katta-0.6.2-1.patch, 
> solr-1395-katta-0.6.2-2.patch, solr-1395-katta-0.6.2-3.patch, 
> solr-1395-katta-0.6.2.patch, test-katta-core-0.6-dev.jar, 
> zkclient-0.1-dev.jar, zookeeper-3.2.1.jar
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> We'll integrate Katta into Solr so that:
> * Distributed search uses Hadoop RPC
> * Shard/SolrCore distribution and management
> * Zookeeper based failover
> * Indexes may be built using Hadoop

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-2424) extracted text from tika has no spaces

2011-05-17 Thread Andrzej Bialecki (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034886#comment-13034886
 ] 

Andrzej Bialecki  commented on SOLR-2424:
-

Liam, what version of the cmd-line tika app did you use for this test? was it 
the exact same version as the one in Solr?

> extracted text from tika has no spaces
> --
>
> Key: SOLR-2424
> URL: https://issues.apache.org/jira/browse/SOLR-2424
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - Solr Cell (Tika extraction)
>Affects Versions: 3.1
>Reporter: Yonik Seeley
> Attachments: ET2000 Service Manual.pdf
>
>
> Try this:
> curl 
> "http://localhost:8983/solr/update/extract?extractOnly=true&wt=json&indent=true";
>   -F "tutorial=@tutorial.pdf"
> And you get text output w/o spaces: 
> "ThisdocumentcoversthebasicsofrunningSolru"...

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-3092) NRTCachingDirectory, to buffer small segments in a RAMDir

2011-05-17 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034884#comment-13034884
 ] 

David Smiley commented on LUCENE-3092:
--

This looks cool.  Any performance measurements?  Perhaps a forthcoming post on 
Mike's blog? :-)

> NRTCachingDirectory, to buffer small segments in a RAMDir
> -
>
> Key: LUCENE-3092
> URL: https://issues.apache.org/jira/browse/LUCENE-3092
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/store
>Reporter: Michael McCandless
>Priority: Minor
> Fix For: 3.2, 4.0
>
> Attachments: LUCENE-3092-listener.patch, LUCENE-3092.patch, 
> LUCENE-3092.patch, LUCENE-3092.patch, LUCENE-3092.patch
>
>
> I created this simply Directory impl, whose goal is reduce IO
> contention in a frequent reopen NRT use case.
> The idea is, when reopening quickly, but not indexing that much
> content, you wind up with many small files created with time, that can
> possibly stress the IO system eg if merges, searching are also
> fighting for IO.
> So, NRTCachingDirectory puts these newly created files into a RAMDir,
> and only when they are merged into a too-large segment, does it then
> write-through to the real (delegate) directory.
> This lets you spend some RAM to reduce I0.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-3111) TestFSTs.testRandomWords failure


[ 
https://issues.apache.org/jira/browse/LUCENE-3111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034880#comment-13034880
 ] 

Robert Muir commented on LUCENE-3111:
-

ok, the problem is the test overrides setup() but doesnt call super.setup(), 
and it does the same with tearDown()

Currently the way LuceneTestCase checks this is very "crude", in other words if 
you make this mistake with one, or the other, but not both, it will catch it!

The only workaround i know of to find test bugs like this is to install 
findbugs. it has a specific check for this exact test bug! we could run it on 
all of our tests.

> TestFSTs.testRandomWords failure
> 
>
> Key: LUCENE-3111
> URL: https://issues.apache.org/jira/browse/LUCENE-3111
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: selckin
>Assignee: Michael McCandless
>Priority: Minor
>
> Was running some while(1) tests on the docvalues branch (r1103705) and the 
> following test failed:
> {code}
> [junit] Testsuite: org.apache.lucene.util.automaton.fst.TestFSTs
> [junit] Testcase: 
> testRandomWords(org.apache.lucene.util.automaton.fst.TestFSTs): FAILED
> [junit] expected:<771> but was:
> [junit] junit.framework.AssertionFailedError: expected:<771> but 
> was:
> [junit]   at 
> org.apache.lucene.util.automaton.fst.TestFSTs$FSTTester.verifyUnPruned(TestFSTs.java:540)
> [junit]   at 
> org.apache.lucene.util.automaton.fst.TestFSTs$FSTTester.doTest(TestFSTs.java:496)
> [junit]   at 
> org.apache.lucene.util.automaton.fst.TestFSTs$FSTTester.doTest(TestFSTs.java:359)
> [junit]   at 
> org.apache.lucene.util.automaton.fst.TestFSTs.doTest(TestFSTs.java:319)
> [junit]   at 
> org.apache.lucene.util.automaton.fst.TestFSTs.testRandomWords(TestFSTs.java:940)
> [junit]   at 
> org.apache.lucene.util.automaton.fst.TestFSTs.testRandomWords(TestFSTs.java:915)
> [junit]   at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1282)
> [junit]   at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1211)
> [junit] 
> [junit] 
> [junit] Tests run: 7, Failures: 1, Errors: 0, Time elapsed: 7.628 sec
> [junit] 
> [junit] - Standard Error -
> [junit] NOTE: Ignoring nightly-only test method 'testBigSet'
> [junit] NOTE: reproduce with: ant test -Dtestcase=TestFSTs 
> -Dtestmethod=testRandomWords -Dtests.seed=-269475578956012681:0
> [junit] NOTE: test params are: codec=PreFlex, locale=ar, 
> timezone=America/Blanc-Sablon
> [junit] NOTE: all tests run in this JVM:
> [junit] [TestToken, TestCodecs, TestIndexReaderReopen, 
> TestIndexWriterMerging, TestNoDeletionPolicy, TestParallelReaderEmptyIndex, 
> TestParallelTermEnum, TestPerSegmentDeletes, TestSegmentReader, 
> TestSegmentTermDocs, TestStressAdvance, TestTermVectorsReader, 
> TestSurrogates, TestMultiFieldQueryParser, TestAutomatonQuery, 
> TestBooleanScorer, TestFuzzyQuery, TestMultiTermConstantScore, 
> TestNumericRangeQuery64, TestPositiveScoresOnlyCollector, TestPrefixFilter, 
> TestQueryTermVector, TestScorerPerf, TestSloppyPhraseQuery, 
> TestSpansAdvanced, TestWindowsMMap, TestRamUsageEstimator, TestSmallFloat, 
> TestUnicodeUtil, TestFSTs]
> [junit] NOTE: Linux 2.6.37-gentoo amd64/Sun Microsystems Inc. 1.6.0_25 
> (64-bit)/cpus=8,threads=1,free=137329960,total=208207872
> [junit] -  ---
> [junit] TEST org.apache.lucene.util.automaton.fst.TestFSTs FAILED
> {code}
> I am not able to reproduce

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-3111) TestFSTs.testRandomWords failure


[ 
https://issues.apache.org/jira/browse/LUCENE-3111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034876#comment-13034876
 ] 

Robert Muir commented on LUCENE-3111:
-

This sounds like a bug in either the test or test-infra.

I'm not able to reproduce but if I run this test with -Dtests.iter=100, i'm 
able to produce a similar failure (again not reproducible).

So first I'd like to see if we can find the "reproducibility bug". This is the 
most important to me :)

> TestFSTs.testRandomWords failure
> 
>
> Key: LUCENE-3111
> URL: https://issues.apache.org/jira/browse/LUCENE-3111
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: selckin
>Assignee: Michael McCandless
>Priority: Minor
>
> Was running some while(1) tests on the docvalues branch (r1103705) and the 
> following test failed:
> {code}
> [junit] Testsuite: org.apache.lucene.util.automaton.fst.TestFSTs
> [junit] Testcase: 
> testRandomWords(org.apache.lucene.util.automaton.fst.TestFSTs): FAILED
> [junit] expected:<771> but was:
> [junit] junit.framework.AssertionFailedError: expected:<771> but 
> was:
> [junit]   at 
> org.apache.lucene.util.automaton.fst.TestFSTs$FSTTester.verifyUnPruned(TestFSTs.java:540)
> [junit]   at 
> org.apache.lucene.util.automaton.fst.TestFSTs$FSTTester.doTest(TestFSTs.java:496)
> [junit]   at 
> org.apache.lucene.util.automaton.fst.TestFSTs$FSTTester.doTest(TestFSTs.java:359)
> [junit]   at 
> org.apache.lucene.util.automaton.fst.TestFSTs.doTest(TestFSTs.java:319)
> [junit]   at 
> org.apache.lucene.util.automaton.fst.TestFSTs.testRandomWords(TestFSTs.java:940)
> [junit]   at 
> org.apache.lucene.util.automaton.fst.TestFSTs.testRandomWords(TestFSTs.java:915)
> [junit]   at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1282)
> [junit]   at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1211)
> [junit] 
> [junit] 
> [junit] Tests run: 7, Failures: 1, Errors: 0, Time elapsed: 7.628 sec
> [junit] 
> [junit] - Standard Error -
> [junit] NOTE: Ignoring nightly-only test method 'testBigSet'
> [junit] NOTE: reproduce with: ant test -Dtestcase=TestFSTs 
> -Dtestmethod=testRandomWords -Dtests.seed=-269475578956012681:0
> [junit] NOTE: test params are: codec=PreFlex, locale=ar, 
> timezone=America/Blanc-Sablon
> [junit] NOTE: all tests run in this JVM:
> [junit] [TestToken, TestCodecs, TestIndexReaderReopen, 
> TestIndexWriterMerging, TestNoDeletionPolicy, TestParallelReaderEmptyIndex, 
> TestParallelTermEnum, TestPerSegmentDeletes, TestSegmentReader, 
> TestSegmentTermDocs, TestStressAdvance, TestTermVectorsReader, 
> TestSurrogates, TestMultiFieldQueryParser, TestAutomatonQuery, 
> TestBooleanScorer, TestFuzzyQuery, TestMultiTermConstantScore, 
> TestNumericRangeQuery64, TestPositiveScoresOnlyCollector, TestPrefixFilter, 
> TestQueryTermVector, TestScorerPerf, TestSloppyPhraseQuery, 
> TestSpansAdvanced, TestWindowsMMap, TestRamUsageEstimator, TestSmallFloat, 
> TestUnicodeUtil, TestFSTs]
> [junit] NOTE: Linux 2.6.37-gentoo amd64/Sun Microsystems Inc. 1.6.0_25 
> (64-bit)/cpus=8,threads=1,free=137329960,total=208207872
> [junit] -  ---
> [junit] TEST org.apache.lucene.util.automaton.fst.TestFSTs FAILED
> {code}
> I am not able to reproduce

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Assigned] (SOLR-2521) TestJoin.testRandom fails

2011-05-17 Thread Yonik Seeley (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-2521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley reassigned SOLR-2521:
--

Assignee: Yonik Seeley

> TestJoin.testRandom fails
> -
>
> Key: SOLR-2521
> URL: https://issues.apache.org/jira/browse/SOLR-2521
> Project: Solr
>  Issue Type: Bug
>Reporter: Michael McCandless
>Assignee: Yonik Seeley
> Fix For: 4.0
>
>
> Hit this random failure; it reproduces on trunk:
> {noformat}
> [junit] Testsuite: org.apache.solr.TestJoin
> [junit] Tests run: 1, Failures: 1, Errors: 0, Time elapsed: 4.512 sec
> [junit] 
> [junit] - Standard Error -
> [junit] 2011-05-16 12:51:46 org.apache.solr.TestJoin testRandomJoin
> [junit] SEVERE: GROUPING MISMATCH: mismatch: '0'!='1' @ response/numFound
> [junit]   
> request=LocalSolrQueryRequest{echoParams=all&indent=true&q={!join+from%3Dsmall_i+to%3Dsmall3_is}*:*&wt=json}
> [junit]   result={
> [junit]   "responseHeader":{
> [junit] "status":0,
> [junit] "QTime":0,
> [junit] "params":{
> [junit]   "echoParams":"all",
> [junit]   "indent":"true",
> [junit]   "q":"{!join from=small_i to=small3_is}*:*",
> [junit]   "wt":"json"}},
> [junit]   "response":{"numFound":1,"start":0,"docs":[
> [junit]   {
> [junit] "id":"NXEA",
> [junit] "score_f":87.90162,
> [junit] "small3_ss":["N",
> [junit]   "v",
> [junit]   "n"],
> [junit] "small_i":4,
> [junit] "small2_i":1,
> [junit] "small2_is":[2],
> [junit] "small3_is":[69,
> [junit]   88,
> [junit]   54,
> [junit]   80,
> [junit]   75,
> [junit]   83,
> [junit]   57,
> [junit]   73,
> [junit]   85,
> [junit]   52,
> [junit]   50,
> [junit]   88,
> [junit]   51,
> [junit]   89,
> [junit]   12,
> [junit]   8,
> [junit]   19,
> [junit]   23,
> [junit]   53,
> [junit]   75,
> [junit]   26,
> [junit]   99,
> [junit]   0,
> [junit]   44]}]
> [junit]   }}
> [junit]   expected={"numFound":0,"start":0,"docs":[]}
> [junit]   model={"NXEA":"Doc(0):[id=NXEA, score_f=87.90162, small3_ss=[N, 
> v, n], small_i=4, small2_i=1, small2_is=2, small3_is=[69, 88, 54, 80, 75, 83, 
> 57, 73, 85, 52, 50, 88, 51, 89, 12, 8, 19, 23, 53, 75, 26, 99, 0, 
> 44]]","JSLZ":"Doc(1):[id=JSLZ, score_f=11.198811, small2_ss=[c, d], 
> small3_ss=[b, R, H, Q, O, f, C, e, Z, u, z, u, w, I, f, _, Y, r, w, u], 
> small_i=6, small2_is=[2, 3], small3_is=[22, 1]]","FAWX":"Doc(2):[id=FAWX, 
> score_f=25.524109, small_s=d, small3_ss=[O, D, X, `, W, z, k, M, j, m, r, [, 
> E, P, w, ^, y, T, e, R, V, H, g, e, I], small_i=2, small2_is=[2, 1], 
> small3_is=[95, 42]]","GDDZ":"Doc(3):[id=GDDZ, score_f=8.483642, small2_ss=[b, 
> e], small3_ss=[o, i, y, l, I, O, r, O, f, d, E, e, d, f, b, P], small2_is=[6, 
> 6], small3_is=[36, 48, 9, 8, 40, 40, 68]]","RBIQ":"Doc(4):[id=RBIQ, 
> score_f=97.06258, small_s=b, small2_s=c, small2_ss=[e, e], small_i=2, 
> small2_is=6, small3_is=[13, 77, 96, 45]]","LRDM":"Doc(5):[id=LRDM, 
> score_f=82.302124, small_s=b, small2_s=a, small2_ss=d, small3_ss=[H, m, O, D, 
> I, J, U, D, f, N, ^, m, I, j, L, s, F, h, A, `, c, j], small2_i=2, 
> small2_is=[2, 7], small3_is=[81, 31, 78, 23, 88, 1, 7, 86, 20, 7, 40, 52, 
> 100, 81, 34, 45, 87, 72, 14, 5]]"}
> [junit] NOTE: reproduce with: ant test -Dtestcase=TestJoin 
> -Dtestmethod=testRandomJoin 
> -Dtests.seed=-4998031941344546449:8541928265064992444
> [junit] NOTE: test params are: codec=RandomCodecProvider: {id=MockRandom, 
> small2_ss=Standard, small2_is=MockFixedIntBlock(blockSize=1738), 
> small2_s=MockFixedIntBlock(blockSize=1738), 
> small3_is=MockVariableIntBlock(baseBlockSize=77), 
> small_i=MockFixedIntBlock(blockSize=1738), 
> small_s=MockVariableIntBlock(baseBlockSize=77), score_f=MockSep, 
> small2_i=Pulsing(freqCutoff=9), small3_ss=SimpleText}, locale=sr_BA, 
> timezone=America/Barbados
> [junit] NOTE: all tests run in this JVM:
> [junit] [TestJoin]
> [junit] NOTE: Linux 2.6.33.6-147.fc13.x86_64 amd64/Sun Microsystems Inc. 
> 1.6.0_21 (64-bit)/cpus=24,threads=1,free=252342544,total=308084736
> [junit] -  ---
> [junit] Testcase: testRandomJoin(org.apache.solr.TestJoin):   FAILED
> [junit] mismatch: '0'!='1' @ response/numFound
> [junit] junit.framework.AssertionFailedError: mismatch: '0'!='1' @ 
> response/numFound
> [junit]   at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCase

[Lucene.Net] [jira] [Resolved] (LUCENENET-410) Lucene In Action (LIA book) samples for .NET.

2011-05-17 Thread Prescott Nasser (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENENET-410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prescott Nasser resolved LUCENENET-410.
---

Resolution: Not A Problem

> Lucene In Action (LIA book) samples for .NET.
> -
>
> Key: LUCENENET-410
> URL: https://issues.apache.org/jira/browse/LUCENENET-410
> Project: Lucene.Net
>  Issue Type: New Feature
>Reporter: Pasha Bizhan
>Priority: Minor
> Attachments: liabook1_net_samples.zip
>
>
> First edition, Lucene.Net 1.4
> Not all samples from the book, only suitable for .NET. 
> For example nutch samples excluded.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (LUCENE-3111) TestFSTs.testRandomWords failure


[ 
https://issues.apache.org/jira/browse/LUCENE-3111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034868#comment-13034868
 ] 

Michael McCandless commented on LUCENE-3111:


I'm also not able to reproduce...

> TestFSTs.testRandomWords failure
> 
>
> Key: LUCENE-3111
> URL: https://issues.apache.org/jira/browse/LUCENE-3111
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: selckin
>Assignee: Michael McCandless
>Priority: Minor
>
> Was running some while(1) tests on the docvalues branch (r1103705) and the 
> following test failed:
> {code}
> [junit] Testsuite: org.apache.lucene.util.automaton.fst.TestFSTs
> [junit] Testcase: 
> testRandomWords(org.apache.lucene.util.automaton.fst.TestFSTs): FAILED
> [junit] expected:<771> but was:
> [junit] junit.framework.AssertionFailedError: expected:<771> but 
> was:
> [junit]   at 
> org.apache.lucene.util.automaton.fst.TestFSTs$FSTTester.verifyUnPruned(TestFSTs.java:540)
> [junit]   at 
> org.apache.lucene.util.automaton.fst.TestFSTs$FSTTester.doTest(TestFSTs.java:496)
> [junit]   at 
> org.apache.lucene.util.automaton.fst.TestFSTs$FSTTester.doTest(TestFSTs.java:359)
> [junit]   at 
> org.apache.lucene.util.automaton.fst.TestFSTs.doTest(TestFSTs.java:319)
> [junit]   at 
> org.apache.lucene.util.automaton.fst.TestFSTs.testRandomWords(TestFSTs.java:940)
> [junit]   at 
> org.apache.lucene.util.automaton.fst.TestFSTs.testRandomWords(TestFSTs.java:915)
> [junit]   at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1282)
> [junit]   at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1211)
> [junit] 
> [junit] 
> [junit] Tests run: 7, Failures: 1, Errors: 0, Time elapsed: 7.628 sec
> [junit] 
> [junit] - Standard Error -
> [junit] NOTE: Ignoring nightly-only test method 'testBigSet'
> [junit] NOTE: reproduce with: ant test -Dtestcase=TestFSTs 
> -Dtestmethod=testRandomWords -Dtests.seed=-269475578956012681:0
> [junit] NOTE: test params are: codec=PreFlex, locale=ar, 
> timezone=America/Blanc-Sablon
> [junit] NOTE: all tests run in this JVM:
> [junit] [TestToken, TestCodecs, TestIndexReaderReopen, 
> TestIndexWriterMerging, TestNoDeletionPolicy, TestParallelReaderEmptyIndex, 
> TestParallelTermEnum, TestPerSegmentDeletes, TestSegmentReader, 
> TestSegmentTermDocs, TestStressAdvance, TestTermVectorsReader, 
> TestSurrogates, TestMultiFieldQueryParser, TestAutomatonQuery, 
> TestBooleanScorer, TestFuzzyQuery, TestMultiTermConstantScore, 
> TestNumericRangeQuery64, TestPositiveScoresOnlyCollector, TestPrefixFilter, 
> TestQueryTermVector, TestScorerPerf, TestSloppyPhraseQuery, 
> TestSpansAdvanced, TestWindowsMMap, TestRamUsageEstimator, TestSmallFloat, 
> TestUnicodeUtil, TestFSTs]
> [junit] NOTE: Linux 2.6.37-gentoo amd64/Sun Microsystems Inc. 1.6.0_25 
> (64-bit)/cpus=8,threads=1,free=137329960,total=208207872
> [junit] -  ---
> [junit] TEST org.apache.lucene.util.automaton.fst.TestFSTs FAILED
> {code}
> I am not able to reproduce

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-3112) Add IW.add/updateDocuments to support nested documents


[ 
https://issues.apache.org/jira/browse/LUCENE-3112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034854#comment-13034854
 ] 

Robert Muir commented on LUCENE-3112:
-

{quote}
I suppose we could consider changing the index format today to record
which docs are subs... but I think we don't need to. Maybe I should
strengthen the @experimental to explain the risk that a future
reindexing could be required?
{quote}

I think this would be perfect. I certainly don't want to hold up this 
improvement, yet, in the future I just didnt want us to be in a 
situation where we say 'well if only we had recorded this information,
now its not possible to do XYZ because someone COULD have used 
add/updateDocuments() for some arbitrary reason and we will 'split' 
their grouped ids'.

We could also include in the note that various existing 
IndexSorters/Splitters are unaware about this, so use with caution :)


> Add IW.add/updateDocuments to support nested documents
> --
>
> Key: LUCENE-3112
> URL: https://issues.apache.org/jira/browse/LUCENE-3112
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 3.2, 4.0
>
> Attachments: LUCENE-3112.patch
>
>
> I think nested documents (LUCENE-2454) is a very compelling addition
> to Lucene.  It's also a popular (many votes) issue.
> Beyond supporting nested document querying, which is already an
> incredible addition since it preserves the relational model on
> indexing normalized content (eg, DB tables, XML docs), LUCENE-2454
> should also enable speedups in grouping implementation when you group
> by a nested field.
> For the same reason, it can also enable very fast post-group facet
> counting impl (LUCENE-3097) when you what to
> count(distinct(nestedField)), instead of unique documents, as your
> "identifier".  I expect many apps that use faceting need this ability
> (to count(distinct(nestedField)) not distinct(docID)).
> To support these use cases, I believe the only core change needed is
> the ability to atomically add or update multiple documents, which you
> cannot do today since in between add/updateDocument calls a flush (eg
> due to commit or getReader()) could occur.
> This new API (addDocuments(Iterable), updateDocuments(Term
> delTerm, Iterable) would also further guarantee that the
> documents are assigned sequential docIDs in the order the iterator
> provided them, and that the docIDs all reside in one segment.
> Segment merging never splits segments apart, so this invariant would
> hold even as merges/optimizes take place.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Created] (LUCENE-3114) PrefixAndSuffixAwareTokenFilter code cleanup

PrefixAndSuffixAwareTokenFilter code cleanup


 Key: LUCENE-3114
 URL: https://issues.apache.org/jira/browse/LUCENE-3114
 Project: Lucene - Java
  Issue Type: Task
  Components: modules/analysis
Reporter: Robert Muir


as noted on LUCENE-3113, I think this tokenstream is difficult to review.

In my opinion just changing the 'private PrefixAwareTokenFilter suffix' to 
'private PrefixAwareTokenFilter prefixAndSuffix' would work wonders.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-3112) Add IW.add/updateDocuments to support nested documents


[ 
https://issues.apache.org/jira/browse/LUCENE-3112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034850#comment-13034850
 ] 

Michael McCandless commented on LUCENE-3112:


{quote}
We should really think through the consequences of this though.

If core features of lucene become implemented in a way that they rely upon 
these sequential docids, we then lock ourselves out of future optimizations 
such as reordering docids for optimal index compression.
{quote}

I agree it's somewhat dangerous we are making an (experimental)
guarantee that these docIDs will remain adjacent "forever".  We
normally are very protective about letting apps rely on docID
assignment/order.

But, I think this will not be "core" functionality that relies on
sub-docs (adjacent docs), but rather modules -- grouping, faceting,
nestedqueries/queries.  And, even if you use these modules, it's
optional whether the app did sub-docs.  Ie we would still have the
'generic" grouping collector, but then also an optimized one that
takes advantage of sub-docs.

Finally, I think doing this today would not preclude doing docID
reording in the future because the sub docs would be recomputable
based on the "identifier" field which grouped them in the first
place.

Ie the worst case future scenario (an app uses this new sub-docs
feature, but then has a big index they don't want to reindex and wants
to take advantage of a future docid reording compression we add) would
still be solvable because we could use this identifier field to find
blocks of sub-docs.

I suppose we could consider changing the index format today to record
which docs are subs... but I think we don't need to.  Maybe I should
strengthen the @experimental to explain the risk that a future
reindexing could be required?


> Add IW.add/updateDocuments to support nested documents
> --
>
> Key: LUCENE-3112
> URL: https://issues.apache.org/jira/browse/LUCENE-3112
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 3.2, 4.0
>
> Attachments: LUCENE-3112.patch
>
>
> I think nested documents (LUCENE-2454) is a very compelling addition
> to Lucene.  It's also a popular (many votes) issue.
> Beyond supporting nested document querying, which is already an
> incredible addition since it preserves the relational model on
> indexing normalized content (eg, DB tables, XML docs), LUCENE-2454
> should also enable speedups in grouping implementation when you group
> by a nested field.
> For the same reason, it can also enable very fast post-group facet
> counting impl (LUCENE-3097) when you what to
> count(distinct(nestedField)), instead of unique documents, as your
> "identifier".  I expect many apps that use faceting need this ability
> (to count(distinct(nestedField)) not distinct(docID)).
> To support these use cases, I believe the only core change needed is
> the ability to atomically add or update multiple documents, which you
> cannot do today since in between add/updateDocument calls a flush (eg
> due to commit or getReader()) could occur.
> This new API (addDocuments(Iterable), updateDocuments(Term
> delTerm, Iterable) would also further guarantee that the
> documents are assigned sequential docIDs in the order the iterator
> provided them, and that the docIDs all reside in one segment.
> Segment merging never splits segments apart, so this invariant would
> hold even as merges/optimizes take place.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-3113) fix analyzer bugs found by MockTokenizer


[ 
https://issues.apache.org/jira/browse/LUCENE-3113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034846#comment-13034846
 ] 

Robert Muir commented on LUCENE-3113:
-

Uwe, I think i'll open a followup issue to clean up the code about 
PrefixAndSuffixAwareTF. I don't like how tricky it is.


> fix analyzer bugs found by MockTokenizer
> 
>
> Key: LUCENE-3113
> URL: https://issues.apache.org/jira/browse/LUCENE-3113
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: modules/analysis
>Reporter: Robert Muir
> Fix For: 3.2, 4.0
>
> Attachments: LUCENE-3113.patch, LUCENE-3113.patch
>
>
> In LUCENE-3064, we beefed up MockTokenizer with assertions, and I've switched 
> over the analysis tests to use MockTokenizer for better coverage.
> However, this found a few bugs (one of which is LUCENE-3106):
> * incrementToken() after it returns false in CommonGramsQueryFilter, 
> HyphenatedWordsFilter, ShingleFilter, SynonymFilter
> * missing end() implementation for PrefixAwareTokenFilter
> * double reset() in QueryAutoStopWordAnalyzer and ReusableAnalyzerBase
> * missing correctOffset()s in MockTokenizer itself.
> I think it would be nice to just fix all the bugs on one issue... I've fixed 
> everything except Shingle and Synonym

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-3102) Few issues with CachingCollector


[ 
https://issues.apache.org/jira/browse/LUCENE-3102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034841#comment-13034841
 ] 

Michael McCandless commented on LUCENE-3102:


Patch looks great!  But, can we rename curupto -> curUpto (and same for 
curbase)?  Ie, so it matches the other camelCaseVariables we have here...

Thank you!

> Few issues with CachingCollector
> 
>
> Key: LUCENE-3102
> URL: https://issues.apache.org/jira/browse/LUCENE-3102
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: core/search
>Reporter: Shai Erera
>Assignee: Shai Erera
>Priority: Minor
> Fix For: 3.2, 4.0
>
> Attachments: LUCENE-3102-factory.patch, LUCENE-3102.patch, 
> LUCENE-3102.patch
>
>
> CachingCollector (introduced in LUCENE-1421) has few issues:
> # Since the wrapped Collector may support out-of-order collection, the 
> document IDs cached may be out-of-order (depends on the Query) and thus 
> replay(Collector) will forward document IDs out-of-order to a Collector that 
> may not support it.
> # It does not clear cachedScores + cachedSegs upon exceeding RAM limits
> # I think that instead of comparing curScores to null, in order to determine 
> if scores are requested, we should have a specific boolean - for clarity
> # This check "if (base + nextLength > maxDocsToCache)" (line 168) can be 
> relaxed? E.g., what if nextLength is, say, 512K, and I cannot satisfy the 
> maxDocsToCache constraint, but if it was 10K I would? Wouldn't we still want 
> to try and cache them?
> Also:
> * The TODO in line 64 (having Collector specify needsScores()) -- why do we 
> need that if CachingCollector ctor already takes a boolean "cacheScores"? I 
> think it's better defined explicitly than implicitly?
> * Let's introduce a factory method for creating a specialized version if 
> scoring is requested / not (i.e., impl the TODO in line 189)
> * I think it's a useful collector, which stands on its own and not specific 
> to grouping. Can we move it to core?
> * How about using OpenBitSet instead of int[] for doc IDs?
> ** If the number of hits is big, we'd gain some RAM back, and be able to 
> cache more entries
> ** NOTE: OpenBitSet can only be used for in-order collection only. So we can 
> use that if the wrapped Collector does not support out-of-order
> * Do you think we can modify this Collector to not necessarily wrap another 
> Collector? We have such Collector which stores (in-memory) all matching doc 
> IDs + scores (if required). Those are later fed into several processes that 
> operate on them (e.g. fetch more info from the index etc.). I am thinking, we 
> can make CachingCollector *optionally* wrap another Collector and then 
> someone can reuse it by setting RAM limit to unlimited (we should have a 
> constant for that) in order to simply collect all matching docs + scores.
> * I think a set of dedicated unit tests for this class alone would be good.
> That's it so far. Perhaps, if we do all of the above, more things will pop up.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-3097) Post grouping faceting


[ 
https://issues.apache.org/jira/browse/LUCENE-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034836#comment-13034836
 ] 

Michael McCandless commented on LUCENE-3097:


Right, this'd mean all docs sharing a given group value are contiguous and in 
the same segment.  The app would have to ensure this, in order to use a 
collector that takes advantage of it.


> Post grouping faceting
> --
>
> Key: LUCENE-3097
> URL: https://issues.apache.org/jira/browse/LUCENE-3097
> Project: Lucene - Java
>  Issue Type: New Feature
>Reporter: Martijn van Groningen
>Priority: Minor
> Fix For: 3.2, 4.0
>
>
> This issues focuses on implementing post grouping faceting.
> * How to handle multivalued fields. What field value to show with the facet.
> * Where the facet counts should be based on
> ** Facet counts can be based on the normal documents. Ungrouped counts. 
> ** Facet counts can be based on the groups. Grouped counts.
> ** Facet counts can be based on the combination of group value and facet 
> value. Matrix counts.   
> And properly more implementation options.
> The first two methods are implemented in the SOLR-236 patch. For the first 
> option it calculates a DocSet based on the individual documents from the 
> query result. For the second option it calculates a DocSet for all the most 
> relevant documents of a group. Once the DocSet is computed the FacetComponent 
> and StatsComponent use one the DocSet to create facets and statistics.  
> This last one is a bit more complex. I think it is best explained with an 
> example. Lets say we search on travel offers:
> |||hotel||departure_airport||duration||
> |Hotel a|AMS|5
> |Hotel a|DUS|10
> |Hotel b|AMS|5
> |Hotel b|AMS|10
> If we group by hotel and have a facet for airport. Most end users expect 
> (according to my experience off course) the following airport facet:
> AMS: 2
> DUS: 1
> The above result can't be achieved by the first two methods. You either get 
> counts AMS:3 and DUS:1 or 1 for both airports.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-3092) NRTCachingDirectory, to buffer small segments in a RAMDir


[ 
https://issues.apache.org/jira/browse/LUCENE-3092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034835#comment-13034835
 ] 

Michael McCandless commented on LUCENE-3092:


Thanks Simon; I'll commit soon...

> NRTCachingDirectory, to buffer small segments in a RAMDir
> -
>
> Key: LUCENE-3092
> URL: https://issues.apache.org/jira/browse/LUCENE-3092
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: core/store
>Reporter: Michael McCandless
>Priority: Minor
> Fix For: 3.2, 4.0
>
> Attachments: LUCENE-3092-listener.patch, LUCENE-3092.patch, 
> LUCENE-3092.patch, LUCENE-3092.patch, LUCENE-3092.patch
>
>
> I created this simply Directory impl, whose goal is reduce IO
> contention in a frequent reopen NRT use case.
> The idea is, when reopening quickly, but not indexing that much
> content, you wind up with many small files created with time, that can
> possibly stress the IO system eg if merges, searching are also
> fighting for IO.
> So, NRTCachingDirectory puts these newly created files into a RAMDir,
> and only when they are merged into a too-large segment, does it then
> write-through to the real (delegate) directory.
> This lets you spend some RAM to reduce I0.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-1421) Ability to group search results by field


[ 
https://issues.apache.org/jira/browse/LUCENE-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034828#comment-13034828
 ] 

Michael McCandless commented on LUCENE-1421:


I'm only testing groupSort and sort by relevance now in the nightly bench.

I'll add sort-by-title, groupSort-by-relevance cases too, so we test that.  
Hmm, though: this content set is alphabetized by title I believe, so it's not 
really a good test.  (I suspect that's why the TermQuery sorting by title is 
faster 

bq. Do think when new features are added that these also need be added to this 
test suite? Or is this perfomance test suite just for the basic features?

Well, in general I'd love to have wider coverage in the nightly perf test...  
really it's only a start now.  But there's no hard rule we have to add new 
functions into the nightly bench...

> Ability to group search results by field
> 
>
> Key: LUCENE-1421
> URL: https://issues.apache.org/jira/browse/LUCENE-1421
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: core/search
>Reporter: Artyom Sokolov
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 3.2, 4.0
>
> Attachments: LUCENE-1421.patch, LUCENE-1421.patch, 
> lucene-grouping.patch
>
>
> It would be awesome to group search results by specified field. Some 
> functionality was provided for Apache Solr but I think it should be done in 
> Core Lucene. There could be some useful information like total hits about 
> collapsed data like total count and so on.
> Thanks,
> Artyom

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-2193) Re-architect Update Handler


[ 
https://issues.apache.org/jira/browse/SOLR-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034818#comment-13034818
 ] 

Mark Miller commented on SOLR-2193:
---

I've got some fixes for this, and I've started on some tests and other minor 
steps forward. I'll put it up before too long.

> Re-architect Update Handler
> ---
>
> Key: SOLR-2193
> URL: https://issues.apache.org/jira/browse/SOLR-2193
> Project: Solr
>  Issue Type: Improvement
>Reporter: Mark Miller
>Assignee: Mark Miller
> Fix For: 4.0
>
> Attachments: SOLR-2193.patch, SOLR-2193.patch, SOLR-2193.patch
>
>
> The update handler needs an overhaul.
> A few goals I think we might want to look at:
> 1. Cleanup - drop DirectUpdateHandler(2) line - move to something like 
> UpdateHandler, DefaultUpdateHandler
> 2. Expose the SolrIndexWriter in the api or add the proper abstractions to 
> get done what we now do with special casing:
> if (directupdatehandler2)
>   success
>  else
>   failish
> 3. Stop closing the IndexWriter and start using commit (still lazy IW init 
> though).
> 4. Drop iwAccess, iwCommit locks and sync mostly at the Lucene level.
> 5. Keep NRT support in mind.
> 6. Keep microsharding in mind (maintain logical index as multiple physical 
> indexes)
> 7. Address the current issues we face because multiple original/'reloaded' 
> cores can have a different IndexWriter on the same index.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-3113) fix analyzer bugs found by MockTokenizer


[ 
https://issues.apache.org/jira/browse/LUCENE-3113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034816#comment-13034816
 ] 

Uwe Schindler commented on LUCENE-3113:
---

A quick check on the fixes in the implementations: all fine. I was just 
confused about PrefixAndSuffixAwareTF, but thats fine (Robert explained it to 
me - this Filters are very complicated from the code/class hierarchy design 
*g*).

I did not verify the Tests, I assume its just dumb search-replacements.

> fix analyzer bugs found by MockTokenizer
> 
>
> Key: LUCENE-3113
> URL: https://issues.apache.org/jira/browse/LUCENE-3113
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: modules/analysis
>Reporter: Robert Muir
> Fix For: 3.2, 4.0
>
> Attachments: LUCENE-3113.patch, LUCENE-3113.patch
>
>
> In LUCENE-3064, we beefed up MockTokenizer with assertions, and I've switched 
> over the analysis tests to use MockTokenizer for better coverage.
> However, this found a few bugs (one of which is LUCENE-3106):
> * incrementToken() after it returns false in CommonGramsQueryFilter, 
> HyphenatedWordsFilter, ShingleFilter, SynonymFilter
> * missing end() implementation for PrefixAwareTokenFilter
> * double reset() in QueryAutoStopWordAnalyzer and ReusableAnalyzerBase
> * missing correctOffset()s in MockTokenizer itself.
> I think it would be nice to just fix all the bugs on one issue... I've fixed 
> everything except Shingle and Synonym

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2091) Add BM25 Scoring to Lucene

2011-05-17 Thread Shrinath (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034815#comment-13034815
 ] 

Shrinath commented on LUCENE-2091:
--

Hi, 

Don't be harsh if I am asking this in a wrong place, 
but could someone tell me if the linked patch is better than 
http://nlp.uned.es/~jperezi/Lucene-BM25/ 


> Add BM25 Scoring to Lucene
> --
>
> Key: LUCENE-2091
> URL: https://issues.apache.org/jira/browse/LUCENE-2091
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: modules/other
>Reporter: Yuval Feinstein
>Priority: Minor
> Fix For: 4.0
>
> Attachments: BM25SimilarityProvider.java, LUCENE-2091.patch, 
> persianlucene.jpg
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> http://nlp.uned.es/~jperezi/Lucene-BM25/ describes an implementation of 
> Okapi-BM25 scoring in the Lucene framework,
> as an alternative to the standard Lucene scoring (which is a version of mixed 
> boolean/TFIDF).
> I have refactored this a bit, added unit tests and improved the runtime 
> somewhat.
> I would like to contribute the code to Lucene under contrib. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-3113) fix analyzer bugs found by MockTokenizer


[ 
https://issues.apache.org/jira/browse/LUCENE-3113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034806#comment-13034806
 ] 

Robert Muir commented on LUCENE-3113:
-

I think this patch is ready to commit, i'll wait and see if anyone feels like 
reviewing it :)

> fix analyzer bugs found by MockTokenizer
> 
>
> Key: LUCENE-3113
> URL: https://issues.apache.org/jira/browse/LUCENE-3113
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: modules/analysis
>Reporter: Robert Muir
> Fix For: 3.2, 4.0
>
> Attachments: LUCENE-3113.patch, LUCENE-3113.patch
>
>
> In LUCENE-3064, we beefed up MockTokenizer with assertions, and I've switched 
> over the analysis tests to use MockTokenizer for better coverage.
> However, this found a few bugs (one of which is LUCENE-3106):
> * incrementToken() after it returns false in CommonGramsQueryFilter, 
> HyphenatedWordsFilter, ShingleFilter, SynonymFilter
> * missing end() implementation for PrefixAwareTokenFilter
> * double reset() in QueryAutoStopWordAnalyzer and ReusableAnalyzerBase
> * missing correctOffset()s in MockTokenizer itself.
> I think it would be nice to just fix all the bugs on one issue... I've fixed 
> everything except Shingle and Synonym

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

SpanNearQuery - inOrder parameter

2011-05-17 Thread Gregory Tarr

I attach a junit test which shows strange behaviour of the inOrder
parameter on the SpanNearQuery constructor, using Lucene 2.9.4.

My understanding of this parameter is that true forces the order and
false doesn't care about the order.

Using true always works. However using false works fine when the terms
in the query are distinct, but if they are equivalent, e.g. searching
for "john john", I do not get the expected results. The workaround seems
to be to always use true for queries with repeated terms.

Any help?

Thanks

Greg

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TopDocsCollector;
import org.apache.lucene.search.TopScoreDocCollector;
import org.apache.lucene.search.spans.SpanNearQuery;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.lucene.search.spans.SpanTermQuery;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;
import org.junit.Assert;
import org.junit.Test; 
public class TestSpanNearQueryInOrder { 
@Test
public void testSpanNearQueryInOrder() {
RAMDirectory directory = new RAMDirectory();
IndexWriter writer = new IndexWriter(directory, new
StandardAnalyzer(Version.LUCENE_29), true,
IndexWriter.MaxFieldLength.UNLIMITED);
TopDocsCollector collector = TopScoreDocCollector.create(3, false);

Document doc = new Document();

// DOC1
doc.add(new Field("text","   ", Field.Store.YES,
Field.Index.ANALYZED));

writer.addDocument(doc);
doc = new Document(); 

// DOC2
doc.add(new Field("text","   "));

writer.addDocument(doc); 
doc = new Document();

// DOC3
doc.add(new Field("text","     "));

writer.addDocument(doc);
writer.optimize();
writer.close(); 
searcher = new IndexSearcher(directory, false); 
SpanQuery[] clauses = new SpanQuery[2];
clauses[0] = new SpanTermQuery(new Term("text", ""));
clauses[1] = new SpanTermQuery(new Term("text", ""));

// Don't care about order, so setting inOrder = false
SpanNearQuery q = new SpanNearQuery(clauses, 1, false);
searcher.search(q, collector); 
// This assert fails - 3 docs are returned. Expecting only DOC2 and DOC3
Assert.assertEquals("Check 2 results", 2, collector.getTotalHits()); 
collector = new TopScoreDocCollector.create(3, false);
clauses = new SpanQuery[2];
clauses[0] = new SpanTermQuery(new Term("text", ""));
clauses[1] = new SpanTermQuery(new Term("text", ""));

// Don't care about order, so setting inOrder = false
q = new SpanNearQuery(clauses, 0, false);
searcher.search(q, collector); 
// This assert fails - 3 docs are returned. Expecting only DOC2
Assert.assertEquals("Check 1 result", 1, collector.getTotalHits()); 
} 
}
 <> 

Please consider the environment before printing this email.

This message should be regarded as confidential. If you have received this 
email in error please notify the sender and destroy it immediately.
Statements of intent shall only become binding when confirmed in hard copy by 
an authorised signatory.  The contents of this email may relate to dealings 
with other companies within the Detica Limited group of companies.

Detica Limited is registered in England under No: 1337451.

Registered offices: Surrey Research Park, Guildford, Surrey, GU2 7YP, England.



TestSpanNearQueryInOrder.java
Description: TestSpanNearQueryInOrder.java

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Updated] (LUCENE-3113) fix analyzer bugs found by MockTokenizer


 [ 
https://issues.apache.org/jira/browse/LUCENE-3113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-3113:


Attachment: LUCENE-3113.patch

updated patch, fixing the bugs in Synonyms and ShingleFilter.

also, i found two more bugs: the ShingleAnalyzerWrapper was double-resetting, 
and the PrefixAndSuffixAwareTokenFilter was missing end() also


> fix analyzer bugs found by MockTokenizer
> 
>
> Key: LUCENE-3113
> URL: https://issues.apache.org/jira/browse/LUCENE-3113
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: modules/analysis
>Reporter: Robert Muir
> Fix For: 3.2, 4.0
>
> Attachments: LUCENE-3113.patch, LUCENE-3113.patch
>
>
> In LUCENE-3064, we beefed up MockTokenizer with assertions, and I've switched 
> over the analysis tests to use MockTokenizer for better coverage.
> However, this found a few bugs (one of which is LUCENE-3106):
> * incrementToken() after it returns false in CommonGramsQueryFilter, 
> HyphenatedWordsFilter, ShingleFilter, SynonymFilter
> * missing end() implementation for PrefixAwareTokenFilter
> * double reset() in QueryAutoStopWordAnalyzer and ReusableAnalyzerBase
> * missing correctOffset()s in MockTokenizer itself.
> I think it would be nice to just fix all the bugs on one issue... I've fixed 
> everything except Shingle and Synonym

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Resolved] (SOLR-2445) unknown handler: standard

2011-05-17 Thread Koji Sekiguchi (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-2445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi resolved SOLR-2445.
--

   Resolution: Fixed
Fix Version/s: 3.1.1

Committed revision 1104270 for 3.1.1.
Thanks Gabriele for your patience!

> unknown handler: standard
> -
>
> Key: SOLR-2445
> URL: https://issues.apache.org/jira/browse/SOLR-2445
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 1.4.1, 3.1, 3.2, 4.0
>Reporter: Koji Sekiguchi
>Assignee: Koji Sekiguchi
>Priority: Minor
> Fix For: 3.1.1, 3.2, 4.0
>
> Attachments: SOLR-2445.patch, qt-form-jsp.patch
>
>
> To reproduce the problem using example config, go form.jsp, use standard for 
> qt (it is default) then click Search.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Updated] (LUCENE-3113) fix analyzer bugs found by MockTokenizer


 [ 
https://issues.apache.org/jira/browse/LUCENE-3113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-3113:


Attachment: LUCENE-3113.patch

attached is a patch, the synonyms and shingles tests still fail.

> fix analyzer bugs found by MockTokenizer
> 
>
> Key: LUCENE-3113
> URL: https://issues.apache.org/jira/browse/LUCENE-3113
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: modules/analysis
>Reporter: Robert Muir
> Fix For: 3.2, 4.0
>
> Attachments: LUCENE-3113.patch
>
>
> In LUCENE-3064, we beefed up MockTokenizer with assertions, and I've switched 
> over the analysis tests to use MockTokenizer for better coverage.
> However, this found a few bugs (one of which is LUCENE-3106):
> * incrementToken() after it returns false in CommonGramsQueryFilter, 
> HyphenatedWordsFilter, ShingleFilter, SynonymFilter
> * missing end() implementation for PrefixAwareTokenFilter
> * double reset() in QueryAutoStopWordAnalyzer and ReusableAnalyzerBase
> * missing correctOffset()s in MockTokenizer itself.
> I think it would be nice to just fix all the bugs on one issue... I've fixed 
> everything except Shingle and Synonym

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Updated] (LUCENE-3113) fix analyzer bugs found by MockTokenizer


 [ 
https://issues.apache.org/jira/browse/LUCENE-3113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-3113:


  Component/s: modules/analysis
Fix Version/s: 4.0
   3.2

> fix analyzer bugs found by MockTokenizer
> 
>
> Key: LUCENE-3113
> URL: https://issues.apache.org/jira/browse/LUCENE-3113
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: modules/analysis
>Reporter: Robert Muir
> Fix For: 3.2, 4.0
>
> Attachments: LUCENE-3113.patch
>
>
> In LUCENE-3064, we beefed up MockTokenizer with assertions, and I've switched 
> over the analysis tests to use MockTokenizer for better coverage.
> However, this found a few bugs (one of which is LUCENE-3106):
> * incrementToken() after it returns false in CommonGramsQueryFilter, 
> HyphenatedWordsFilter, ShingleFilter, SynonymFilter
> * missing end() implementation for PrefixAwareTokenFilter
> * double reset() in QueryAutoStopWordAnalyzer and ReusableAnalyzerBase
> * missing correctOffset()s in MockTokenizer itself.
> I think it would be nice to just fix all the bugs on one issue... I've fixed 
> everything except Shingle and Synonym

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Created] (LUCENE-3113) fix analyzer bugs found by MockTokenizer

fix analyzer bugs found by MockTokenizer


 Key: LUCENE-3113
 URL: https://issues.apache.org/jira/browse/LUCENE-3113
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Robert Muir
 Attachments: LUCENE-3113.patch

In LUCENE-3064, we beefed up MockTokenizer with assertions, and I've switched 
over the analysis tests to use MockTokenizer for better coverage.

However, this found a few bugs (one of which is LUCENE-3106):
* incrementToken() after it returns false in CommonGramsQueryFilter, 
HyphenatedWordsFilter, ShingleFilter, SynonymFilter
* missing end() implementation for PrefixAwareTokenFilter
* double reset() in QueryAutoStopWordAnalyzer and ReusableAnalyzerBase
* missing correctOffset()s in MockTokenizer itself.

I think it would be nice to just fix all the bugs on one issue... I've fixed 
everything except Shingle and Synonym

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Reopened] (SOLR-2445) unknown handler: standard

2011-05-17 Thread Koji Sekiguchi (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-2445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi reopened SOLR-2445:
--


Seems that no one objects about applying the patch to 3.1.1. Reopening.

> unknown handler: standard
> -
>
> Key: SOLR-2445
> URL: https://issues.apache.org/jira/browse/SOLR-2445
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 1.4.1, 3.1, 3.2, 4.0
>Reporter: Koji Sekiguchi
>Assignee: Koji Sekiguchi
>Priority: Minor
> Fix For: 3.2, 4.0
>
> Attachments: SOLR-2445.patch, qt-form-jsp.patch
>
>
> To reproduce the problem using example config, go form.jsp, use standard for 
> qt (it is default) then click Search.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-3110) ASCIIFoldingFilter wrongly folds german Umlauts


[ 
https://issues.apache.org/jira/browse/LUCENE-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034764#comment-13034764
 ] 

Robert Muir commented on LUCENE-3110:
-

another option, is to use the German2 stemmer from snowball, which is a 
variation on the german stemmer
designed to handle these cases.

If you use GermanAnalyzer in 3.1 it uses this stemmer by default.

> ASCIIFoldingFilter wrongly folds german Umlauts
> ---
>
> Key: LUCENE-3110
> URL: https://issues.apache.org/jira/browse/LUCENE-3110
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: modules/analysis
>Affects Versions: 3.1
>Reporter: Michael Gaber
>
> the german umlauts are currently mapped as follows.
> Ä/ä => A/a
> Ö/ö => O/o
> Ü/ü => U/u
> the correct mapping would be
> Ä/ä => Ae/ae
> Ö/ö => Oe/oe
> Ü/ü => Ue/ue
> so the corresponding rows in the switch statement should be moved down to the 
> ae/oe/ue positions.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Resolved] (LUCENE-3110) ASCIIFoldingFilter wrongly folds german Umlauts

2011-05-17 Thread Steven Rowe (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Rowe resolved LUCENE-3110.
-

Resolution: Won't Fix

See LUCENE-1696, where Robert Muir advocates using an ICU collation filter 
instead of locale-sensitive accent stripping.

> ASCIIFoldingFilter wrongly folds german Umlauts
> ---
>
> Key: LUCENE-3110
> URL: https://issues.apache.org/jira/browse/LUCENE-3110
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: modules/analysis
>Affects Versions: 3.1
>Reporter: Michael Gaber
>
> the german umlauts are currently mapped as follows.
> Ä/ä => A/a
> Ö/ö => O/o
> Ü/ü => U/u
> the correct mapping would be
> Ä/ä => Ae/ae
> Ö/ö => Oe/oe
> Ü/ü => Ue/ue
> so the corresponding rows in the switch statement should be moved down to the 
> ae/oe/ue positions.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-3112) Add IW.add/updateDocuments to support nested documents


[ 
https://issues.apache.org/jira/browse/LUCENE-3112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034750#comment-13034750
 ] 

Michael McCandless commented on LUCENE-3112:


bq. Yet, I think you should push the document iteration etc into DWPT to 
actually apply the delterm only once to make it really atomic.

Ahh good point -- it's wrong just passing that delTerm down N times, too.  I'll 
fix.

bq. I also wonder if we should allow multiple delTerm e.g. Tuple otherwise you would be bound to one delterm pre "collection" but what 
if you want to remove only one of the "sub-documents"?

So, this won't work today w/ nested querying, if I understand it right.  Ie, if 
you only update one of the subs, now your subdocs are no longer sequential (nor 
in one segment).  So I think "design for today" here...?

Someday, when we implement incremental field updates correctly, so that updates 
are written as stacked segments against the original segment containing the 
document, at that point I think we can add an API that lets you update multiple 
docs atomically?
{quote}

> Add IW.add/updateDocuments to support nested documents
> --
>
> Key: LUCENE-3112
> URL: https://issues.apache.org/jira/browse/LUCENE-3112
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 3.2, 4.0
>
> Attachments: LUCENE-3112.patch
>
>
> I think nested documents (LUCENE-2454) is a very compelling addition
> to Lucene.  It's also a popular (many votes) issue.
> Beyond supporting nested document querying, which is already an
> incredible addition since it preserves the relational model on
> indexing normalized content (eg, DB tables, XML docs), LUCENE-2454
> should also enable speedups in grouping implementation when you group
> by a nested field.
> For the same reason, it can also enable very fast post-group facet
> counting impl (LUCENE-3097) when you what to
> count(distinct(nestedField)), instead of unique documents, as your
> "identifier".  I expect many apps that use faceting need this ability
> (to count(distinct(nestedField)) not distinct(docID)).
> To support these use cases, I believe the only core change needed is
> the ability to atomically add or update multiple documents, which you
> cannot do today since in between add/updateDocument calls a flush (eg
> due to commit or getReader()) could occur.
> This new API (addDocuments(Iterable), updateDocuments(Term
> delTerm, Iterable) would also further guarantee that the
> documents are assigned sequential docIDs in the order the iterator
> provided them, and that the docIDs all reside in one segment.
> Segment merging never splits segments apart, so this invariant would
> hold even as merges/optimizes take place.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: Solr Config XML DTD's

2011-05-17 Thread Michael McCandless

https://issues.apache.org/jira/browse/SOLR-2119 is a good example
where we are failing to catch mis-configuration on startup.

Is there some way we can baby step here?  EG use one of these XML
validation packages, incrementally, on only sub-strings from the XML?
(Or simpler is to just do the checking ourselves w/ custom code).

Mike

http://blog.mikemccandless.com

On Wed, May 4, 2011 at 10:50 PM, Michael Sokolov  wrote:
> I'm not sure you will find anyone wanting to put in this effort now, but
> another suggestion for a general approach might be:
>
> 1 very basic static analysis to catch what you can - this should be a pretty
> minimal effort only given what can reasonably be achieved
>
> 2 throw runtime errors as Hoss says (probably already doing this well
> enough, but maybe some incremental improvements are needed?)
>
> 3 an option to run a "configtest" like httpd provides that preloads all
> declared handlers/plugins/modules etc, instantiates them and gives them an
> opportunity to read their config and throw whatever errors they find.  This
> way you can set a standard (error on unrecognized parameter, say) in some
> core areas, and distribute the effort.  This is a hugely useful sanity check
> to be able to run when you want to make config changes and not have your
> server fall over when it starts (or worse - later).
>
> -Mike "kibitzer" Sokolov
>
> On 5/4/2011 6:55 PM, Chris Hostetter wrote:
>>
>> As i said: any improvements to help catch the mistakes we can identify
>> would be great, but we should maintain perspective of the effort/gain
>> tradeoff given that there is likely nothing we can do about the basic
>> problem of "a string that won't be evaluated until runtime"
>>
>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-2119) IndexSchema should log warning if is declared with charfilter/tokenizer/tokenfiler out of order


[ 
https://issues.apache.org/jira/browse/SOLR-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034740#comment-13034740
 ] 

Michael McCandless commented on SOLR-2119:
--

+1 for hard error.

In general for problems we can detect at startup we should not start the 
server.  Users rarely see/do something about the warnings.

I think this would be a good service to those users who trip the hard error on 
upgrade: it means Solr is not doing what they thought they asked it to do.

> IndexSchema should log warning if  is declared with 
> charfilter/tokenizer/tokenfiler out of order
> --
>
> Key: SOLR-2119
> URL: https://issues.apache.org/jira/browse/SOLR-2119
> Project: Solr
>  Issue Type: Improvement
>  Components: Schema and Analysis
>Reporter: Hoss Man
> Fix For: 3.2, 4.0
>
>
> There seems to be a segment of hte user population that has a hard time 
> understanding the distinction between a charfilter, a tokenizer, and a 
> tokenfilter -- while we can certianly try to improve the documentation about 
> what exactly each does, and when they take affect in the analysis chain, one 
> other thing we should do is try to educate people when they constuct their 
>  in a way that doesn't make any sense.
> at the moment, some people are attempting to do things like "move the Foo 
>  before the " to try and get certain behavior ... 
> at a minimum we should log a warning in this case that doing that doesn't 
> have the desired effect
> (we could easily make such a situation fail to initialize, but i'm not 
> convinced that would be the best course of action, since some people may have 
> schema's where they have declared a charFilter or tokenizer out of order 
> relative to their tokenFilters, but are still getting "correct" results that 
> work for them, and breaking their instance on upgrade doens't seem like it 
> would be productive)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Updated] (SOLR-2119) IndexSchema should log warning if is declared with charfilter/tokenizer/tokenfiler out of order


 [ 
https://issues.apache.org/jira/browse/SOLR-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated SOLR-2119:
-

Fix Version/s: 4.0
   3.2

> IndexSchema should log warning if  is declared with 
> charfilter/tokenizer/tokenfiler out of order
> --
>
> Key: SOLR-2119
> URL: https://issues.apache.org/jira/browse/SOLR-2119
> Project: Solr
>  Issue Type: Improvement
>  Components: Schema and Analysis
>Reporter: Hoss Man
> Fix For: 3.2, 4.0
>
>
> There seems to be a segment of hte user population that has a hard time 
> understanding the distinction between a charfilter, a tokenizer, and a 
> tokenfilter -- while we can certianly try to improve the documentation about 
> what exactly each does, and when they take affect in the analysis chain, one 
> other thing we should do is try to educate people when they constuct their 
>  in a way that doesn't make any sense.
> at the moment, some people are attempting to do things like "move the Foo 
>  before the " to try and get certain behavior ... 
> at a minimum we should log a warning in this case that doing that doesn't 
> have the desired effect
> (we could easily make such a situation fail to initialize, but i'm not 
> convinced that would be the best course of action, since some people may have 
> schema's where they have declared a charFilter or tokenizer out of order 
> relative to their tokenFilters, but are still getting "correct" results that 
> work for them, and breaking their instance on upgrade doens't seem like it 
> would be productive)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Assigned] (LUCENE-3111) TestFSTs.testRandomWords failure


 [ 
https://issues.apache.org/jira/browse/LUCENE-3111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless reassigned LUCENE-3111:
--

Assignee: Michael McCandless

> TestFSTs.testRandomWords failure
> 
>
> Key: LUCENE-3111
> URL: https://issues.apache.org/jira/browse/LUCENE-3111
> Project: Lucene - Java
>  Issue Type: Bug
>Reporter: selckin
>Assignee: Michael McCandless
>Priority: Minor
>
> Was running some while(1) tests on the docvalues branch (r1103705) and the 
> following test failed:
> {code}
> [junit] Testsuite: org.apache.lucene.util.automaton.fst.TestFSTs
> [junit] Testcase: 
> testRandomWords(org.apache.lucene.util.automaton.fst.TestFSTs): FAILED
> [junit] expected:<771> but was:
> [junit] junit.framework.AssertionFailedError: expected:<771> but 
> was:
> [junit]   at 
> org.apache.lucene.util.automaton.fst.TestFSTs$FSTTester.verifyUnPruned(TestFSTs.java:540)
> [junit]   at 
> org.apache.lucene.util.automaton.fst.TestFSTs$FSTTester.doTest(TestFSTs.java:496)
> [junit]   at 
> org.apache.lucene.util.automaton.fst.TestFSTs$FSTTester.doTest(TestFSTs.java:359)
> [junit]   at 
> org.apache.lucene.util.automaton.fst.TestFSTs.doTest(TestFSTs.java:319)
> [junit]   at 
> org.apache.lucene.util.automaton.fst.TestFSTs.testRandomWords(TestFSTs.java:940)
> [junit]   at 
> org.apache.lucene.util.automaton.fst.TestFSTs.testRandomWords(TestFSTs.java:915)
> [junit]   at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1282)
> [junit]   at 
> org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1211)
> [junit] 
> [junit] 
> [junit] Tests run: 7, Failures: 1, Errors: 0, Time elapsed: 7.628 sec
> [junit] 
> [junit] - Standard Error -
> [junit] NOTE: Ignoring nightly-only test method 'testBigSet'
> [junit] NOTE: reproduce with: ant test -Dtestcase=TestFSTs 
> -Dtestmethod=testRandomWords -Dtests.seed=-269475578956012681:0
> [junit] NOTE: test params are: codec=PreFlex, locale=ar, 
> timezone=America/Blanc-Sablon
> [junit] NOTE: all tests run in this JVM:
> [junit] [TestToken, TestCodecs, TestIndexReaderReopen, 
> TestIndexWriterMerging, TestNoDeletionPolicy, TestParallelReaderEmptyIndex, 
> TestParallelTermEnum, TestPerSegmentDeletes, TestSegmentReader, 
> TestSegmentTermDocs, TestStressAdvance, TestTermVectorsReader, 
> TestSurrogates, TestMultiFieldQueryParser, TestAutomatonQuery, 
> TestBooleanScorer, TestFuzzyQuery, TestMultiTermConstantScore, 
> TestNumericRangeQuery64, TestPositiveScoresOnlyCollector, TestPrefixFilter, 
> TestQueryTermVector, TestScorerPerf, TestSloppyPhraseQuery, 
> TestSpansAdvanced, TestWindowsMMap, TestRamUsageEstimator, TestSmallFloat, 
> TestUnicodeUtil, TestFSTs]
> [junit] NOTE: Linux 2.6.37-gentoo amd64/Sun Microsystems Inc. 1.6.0_25 
> (64-bit)/cpus=8,threads=1,free=137329960,total=208207872
> [junit] -  ---
> [junit] TEST org.apache.lucene.util.automaton.fst.TestFSTs FAILED
> {code}
> I am not able to reproduce

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-3112) Add IW.add/updateDocuments to support nested documents

2011-05-17 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034734#comment-13034734
 ] 

Jason Rutherglen commented on LUCENE-3112:
--

I think perhaps like a Hadoop input format split, we can define meta-data at 
the segment level as to where the documents live so that if one is 'splitting' 
the index, as is being implemented with HBase, the 'splitter' can be 'smart'.

> Add IW.add/updateDocuments to support nested documents
> --
>
> Key: LUCENE-3112
> URL: https://issues.apache.org/jira/browse/LUCENE-3112
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 3.2, 4.0
>
> Attachments: LUCENE-3112.patch
>
>
> I think nested documents (LUCENE-2454) is a very compelling addition
> to Lucene.  It's also a popular (many votes) issue.
> Beyond supporting nested document querying, which is already an
> incredible addition since it preserves the relational model on
> indexing normalized content (eg, DB tables, XML docs), LUCENE-2454
> should also enable speedups in grouping implementation when you group
> by a nested field.
> For the same reason, it can also enable very fast post-group facet
> counting impl (LUCENE-3097) when you what to
> count(distinct(nestedField)), instead of unique documents, as your
> "identifier".  I expect many apps that use faceting need this ability
> (to count(distinct(nestedField)) not distinct(docID)).
> To support these use cases, I believe the only core change needed is
> the ability to atomically add or update multiple documents, which you
> cannot do today since in between add/updateDocument calls a flush (eg
> due to commit or getReader()) could occur.
> This new API (addDocuments(Iterable), updateDocuments(Term
> delTerm, Iterable) would also further guarantee that the
> documents are assigned sequential docIDs in the order the iterator
> provided them, and that the docIDs all reside in one segment.
> Segment merging never splits segments apart, so this invariant would
> hold even as merges/optimizes take place.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-3112) Add IW.add/updateDocuments to support nested documents


[ 
https://issues.apache.org/jira/browse/LUCENE-3112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034730#comment-13034730
 ] 

Robert Muir commented on LUCENE-3112:
-

We should really think through the consequences of this though.

If core features of lucene become implemented in a way that they rely upon 
these sequential docids, we then lock ourselves out of future optimizations 
such as reordering docids for optimal index compression.


> Add IW.add/updateDocuments to support nested documents
> --
>
> Key: LUCENE-3112
> URL: https://issues.apache.org/jira/browse/LUCENE-3112
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Minor
> Fix For: 3.2, 4.0
>
> Attachments: LUCENE-3112.patch
>
>
> I think nested documents (LUCENE-2454) is a very compelling addition
> to Lucene.  It's also a popular (many votes) issue.
> Beyond supporting nested document querying, which is already an
> incredible addition since it preserves the relational model on
> indexing normalized content (eg, DB tables, XML docs), LUCENE-2454
> should also enable speedups in grouping implementation when you group
> by a nested field.
> For the same reason, it can also enable very fast post-group facet
> counting impl (LUCENE-3097) when you what to
> count(distinct(nestedField)), instead of unique documents, as your
> "identifier".  I expect many apps that use faceting need this ability
> (to count(distinct(nestedField)) not distinct(docID)).
> To support these use cases, I believe the only core change needed is
> the ability to atomically add or update multiple documents, which you
> cannot do today since in between add/updateDocument calls a flush (eg
> due to commit or getReader()) could occur.
> This new API (addDocuments(Iterable), updateDocuments(Term
> delTerm, Iterable) would also further guarantee that the
> documents are assigned sequential docIDs in the order the iterator
> provided them, and that the docIDs all reside in one segment.
> Segment merging never splits segments apart, so this invariant would
> hold even as merges/optimizes take place.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Updated] (LUCENE-3084) MergePolicy.OneMerge.segments should be List not SegmentInfos