[Lucene.Net] [jira] [Resolved] (LUCENENET-432) Concurrency issues in SegmentInfo.Files() (LUCENE-2584)
[ https://issues.apache.org/jira/browse/LUCENENET-432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Digy resolved LUCENENET-432. Resolution: Fixed Fix Version/s: Lucene.Net 2.9.4 Lucene.Net 2.9.2 Assignee: Digy Patch committed to trunk and the 2.9.4g branch Concurrency issues in SegmentInfo.Files() (LUCENE-2584) --- Key: LUCENENET-432 URL: https://issues.apache.org/jira/browse/LUCENENET-432 Project: Lucene.Net Issue Type: Bug Affects Versions: Lucene.Net 2.9.4, Lucene.Net 2.9.4g Reporter: Digy Assignee: Digy Fix For: Lucene.Net 2.9.2, Lucene.Net 2.9.4 Attachments: SegmentInfo.patch Calling files() in SegmentInfo from multiple threads can lead to a ConcurrentModificationException if one thread has not yet finished adding entries to the ArrayList (files) while another thread has already obtained it as the cached value. https://issues.apache.org/jira/browse/LUCENE-2584 -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
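The underlying bug pattern here is generic: a lazily built cached collection that another thread can observe half-filled. A minimal Java sketch of the usual fix (the class and file names are illustrative, not the actual Lucene.Net SegmentInfo code): build the list completely, publish it through a volatile field, and hand callers an immutable view.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative stand-in for SegmentInfo's lazily cached files() list.
class CachedFileList {
    private volatile List<String> cache; // published only once fully built

    List<String> files() {
        List<String> result = cache;
        if (result == null) {
            synchronized (this) {
                if (cache == null) {
                    // Build the complete list BEFORE publishing it, so no
                    // concurrent reader can ever see a half-filled collection.
                    List<String> building = new ArrayList<>();
                    building.add("_0.fdt"); // hypothetical segment files
                    building.add("_0.fdx");
                    cache = Collections.unmodifiableList(building);
                }
                result = cache;
            }
        }
        return result;
    }
}
```

The unmodifiable view also prevents callers from mutating the shared cache, which is a second way the original race could surface.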
[Lucene.Net] [jira] [Resolved] (LUCENENET-430) Contrib.ChainedFilter
[ https://issues.apache.org/jira/browse/LUCENENET-430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Digy resolved LUCENENET-430. Resolution: Fixed Instead of creating a small project, I put it into Contrib.Analyzers. Contrib.ChainedFilter - Key: LUCENENET-430 URL: https://issues.apache.org/jira/browse/LUCENENET-430 Project: Lucene.Net Issue Type: New Feature Affects Versions: Lucene.Net 2.9.4g Reporter: Digy Priority: Minor Fix For: Lucene.Net 2.9.4g Attachments: ChainedFilter.cs, ChainedFilterTest.cs Port of Lucene.Java 3.0.3's ChainedFilter and its test cases. See the StackOverflow question "How to combine multiple filters within one search?": http://stackoverflow.com/questions/6570477/multiple-filters-in-lucene-net -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[Lucene.Net] [jira] [Commented] (LUCENENET-418) LuceneTestCase should not have a static method that could throw exceptions.
[ https://issues.apache.org/jira/browse/LUCENENET-418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13061125#comment-13061125 ] Digy commented on LUCENENET-418: It works! Thanks. DIGY LuceneTestCase should not have a static method that could throw exceptions. Key: LUCENENET-418 URL: https://issues.apache.org/jira/browse/LUCENENET-418 Project: Lucene.Net Issue Type: Bug Components: Lucene.Net Test Affects Versions: Lucene.Net 3.x Environment: Linux, OSX, etc Reporter: michael herndon Assignee: michael herndon Labels: test Fix For: Lucene.Net 2.9.4g Original Estimate: 2m Remaining Estimate: 2m Throwing an exception from a static method in a base class used by 90% of the tests makes it hard to debug the issue in NUnit. The test results came back saying that TestFixtureSetup was causing an issue even though it was the static constructor causing problems, and this then propagates to all the tests that stem from LuceneTestCase. The TEMP_DIR needs to be moved to a static util class as a property or even a mixin method. This cost me hours of debugging to figure out the real issue, as the underlying exception never bubbled up. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: [Lucene.Net] Lucene Steroids
I have built something similar using NTFS hard links and re-using existing local snapshot files, etc. It has been running in production for 3+ years now with more than 100 million docs, and distributes new snapshots from master servers every minute. It does not use any rsync, but only leverages unique file names in Lucene - it only copies files not already existing on slaves, and uses NTFS hard links to copy existing local files into the new snapshot directory. Also, on the masters, it just uses NTFS hard links to create a new snapshot of the master index, and then slaves just look for new snapshot directories on the master servers. When a new directory shows up, a slave looks at its existing local snapshot to see which files are new on the master (or have been deleted by the master), and then only copies the new files. It does not need to send any explicit commit operations, and there is no explicit communication between masters and slaves (slaves just look in some remote directory for new snapshot sub-directories). This has worked great with no problems at all. All this was built prior to SOLR being available on Windows. Going forward we are transitioning to Java and SOLR on Linux (it is just too hard to keep up with improvements otherwise, IMO). On Jul 6, 2011, at 8:22 PM, Guilherme Balena Versiani wrote: Hi, I am working on a derived work of Solr for .NET. The purpose is to obtain a solution similar to the Lucene replication available in Solr, but without the need to port all of the Solr code. There is a SnapShooter, a SnapPuller and a SnapInstaller. The SnapShooter does similar work to the script in Solr. The SnapPuller uses cwRsync to replicate the database between machines, but without storing the snapshot.current.MACHINENAME files on the master, as cwRsync does not support sync with the server. 
The SnapInstaller tries to substitute the Lucene database files in-place -- the Lucene application should use a SteroidsFSDirectory that creates a special SteroidsFSIndexInput that permits renaming files that are in use; after that, SnapInstaller sends a commit operation through a Windows named pipe to the application to reset its current IndexSearcher instance. This solution has the suggestive name of Lucene Steroids, and is hosted on BitBucket.org. What is the best way to continue to distribute it? Should I continue to maintain it on BitBucket.org, or should I apply to the Lucene.NET project (I don't know how) to include it in the Contrib modules? The current code is available at http://bitbucket.org/guibv/lucene.steroids. The work is incomplete; the first stable version should be available in the next few days. Best regards, Guilherme Balena Versiani.
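The hard-link trick described in the reply above can be sketched with java.nio.file. This is a hypothetical illustration (invented names, not the poster's actual code): every file already present in the previous local snapshot is hard-linked into the new snapshot directory, so no data is copied and only genuinely new files would need to be transferred.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch of hard-link based snapshotting: link, don't copy.
class SnapshotLinker {

    // Link every file from the previous snapshot into the next one.
    static void linkSnapshot(Path previous, Path next) throws IOException {
        Files.createDirectories(next);
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(previous)) {
            for (Path f : entries) {
                // Lucene file names are unique, so a name already present
                // implies identical content; a hard link (no data copy) is
                // sufficient to carry it into the new snapshot.
                Files.createLink(next.resolve(f.getFileName()), f);
            }
        }
    }

    // Small self-test: build a fake snapshot, link it, verify the content.
    static boolean demo() {
        try {
            Path prev = Files.createTempDirectory("snap1");
            Files.write(prev.resolve("_0.cfs"), "segment data".getBytes());
            Path next = Files.createTempDirectory("snap2").resolve("copy");
            linkSnapshot(prev, next);
            byte[] linked = Files.readAllBytes(next.resolve("_0.cfs"));
            return new String(linked).equals("segment data");
        } catch (IOException e) {
            return false; // e.g. a filesystem without hard-link support
        }
    }
}
```

Hard links require that source and target live on the same volume, which matches the described setup (snapshots are directories on the same local disk).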
[Lucene.Net] [jira] [Created] (LUCENENET-433) AttributeSource can have an invalid computed state (LUCENE-3042)
AttributeSource can have an invalid computed state (LUCENE-3042) Key: LUCENENET-433 URL: https://issues.apache.org/jira/browse/LUCENENET-433 Project: Lucene.Net Issue Type: Bug Reporter: Digy Fix For: Lucene.Net 2.9.4, Lucene.Net 2.9.4g If you work with a tokenstream, consume it, then reuse it and add an attribute to it, the computed state is wrong. Thus, for example, clearAttributes() will not actually clear the added attribute. So in some situations, addAttribute() is not actually clearing the computed state when it should. https://issues.apache.org/jira/browse/LUCENE-3042 -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[Lucene.Net] [jira] [Commented] (LUCENENET-433) AttributeSource can have an invalid computed state (LUCENE-3042)
[ https://issues.apache.org/jira/browse/LUCENENET-433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13061214#comment-13061214 ] Digy commented on LUCENENET-433: Here is the test case
{code}
[Test]
public void Test_LUCENE_3042_LUCENENET_433()
{
    String testString = "t";
    Analyzer analyzer = new Lucene.Net.Analysis.Standard.StandardAnalyzer();
    TokenStream stream = analyzer.ReusableTokenStream("dummy", new System.IO.StringReader(testString));
    stream.Reset();
    while (stream.IncrementToken())
    {
        // consume
    }
    stream.End();
    stream.Close();

    AssertAnalyzesToReuse(analyzer, testString, new String[] { "t" });
}
{code}
AttributeSource can have an invalid computed state (LUCENE-3042) Key: LUCENENET-433 URL: https://issues.apache.org/jira/browse/LUCENENET-433 Project: Lucene.Net Issue Type: Bug Reporter: Digy Fix For: Lucene.Net 2.9.4, Lucene.Net 2.9.4g If you work with a tokenstream, consume it, then reuse it and add an attribute to it, the computed state is wrong. Thus, for example, clearAttributes() will not actually clear the added attribute. So in some situations, addAttribute() is not actually clearing the computed state when it should. https://issues.apache.org/jira/browse/LUCENE-3042 -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[Lucene.Net] [jira] [Resolved] (LUCENENET-172) This patch fixes the unexceptional exceptions encountered in FastCharStream and SupportClass
[ https://issues.apache.org/jira/browse/LUCENENET-172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Digy resolved LUCENENET-172. Resolution: Fixed Assignee: Digy (was: Scott Lombard) Fixed in 2.9.4g; no fix for 2.9.4 This patch fixes the unexceptional exceptions encountered in FastCharStream and SupportClass --- Key: LUCENENET-172 URL: https://issues.apache.org/jira/browse/LUCENENET-172 Project: Lucene.Net Issue Type: Improvement Components: Lucene.Net Core Affects Versions: Lucene.Net 2.3.1, Lucene.Net 2.3.2 Reporter: Ben Martz Assignee: Digy Priority: Minor Fix For: Lucene.Net 2.9.4, Lucene.Net 2.9.4g Attachments: lucene_2.3.1_exceptions_fix.patch, lucene_2.9.4g_exceptions_fix The Java version of Lucene handles end-of-file in FastCharStream by throwing an exception. This behavior has been ported to .NET, but it carries an unacceptable cost in the .NET environment. This patch is based on the prior work in LUCENENET-8 and LUCENENET-11, which I gratefully acknowledge for the solution. While I understand that this patch is outside of the current project specification in that it deviates from the pure nature of the port, I believe that it is very important to make the patch available to any developer looking to leverage Lucene.Net in their project. Thanks for your consideration. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (LUCENE-3279) Allow CFS to be empty
[ https://issues.apache.org/jira/browse/LUCENE-3279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13061063#comment-13061063 ] Simon Willnauer commented on LUCENE-3279: - I plan to commit this soon if nobody objects. Allow CFS to be empty -- Key: LUCENE-3279 URL: https://issues.apache.org/jira/browse/LUCENE-3279 Project: Lucene - Java Issue Type: Improvement Components: core/store Affects Versions: 3.4, 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 3.4, 4.0 Attachments: LUCENE-3279.patch Since we changed CFS semantics slightly, closing a CFS directory on an error can lead to an exception. Yet, an empty CFS is still a valid CFS, so for consistency we should allow a CFS to be empty. Here is an example:
{noformat}
1 tests failed.
REGRESSION: org.apache.lucene.index.TestIndexWriterOnDiskFull.testAddDocumentOnDiskFull
Error Message: CFS has no entries
Stack Trace:
java.lang.IllegalStateException: CFS has no entries
    at org.apache.lucene.store.CompoundFileWriter.close(CompoundFileWriter.java:139)
    at org.apache.lucene.store.CompoundFileDirectory.close(CompoundFileDirectory.java:181)
    at org.apache.lucene.store.DefaultCompoundFileDirectory.close(DefaultCompoundFileDirectory.java:58)
    at org.apache.lucene.index.SegmentMerger.createCompoundFile(SegmentMerger.java:139)
    at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4252)
    at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3863)
    at org.apache.lucene.index.SerialMergeScheduler.merge(SerialMergeScheduler.java:37)
    at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2715)
    at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2710)
    at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2706)
    at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3513)
    at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:2064)
    at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:2031)
    at org.apache.lucene.index.TestIndexWriterOnDiskFull.addDoc(TestIndexWriterOnDiskFull.java:539)
    at org.apache.lucene.index.TestIndexWriterOnDiskFull.testAddDocumentOnDiskFull(TestIndexWriterOnDiskFull.java:74)
    at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1277)
    at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1195)
{noformat}
-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3216) Store DocValues per segment instead of per field
[ https://issues.apache.org/jira/browse/LUCENE-3216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13061066#comment-13061066 ] Simon Willnauer commented on LUCENE-3216: - I plan to commit this soon if nobody objects. Store DocValues per segment instead of per field Key: LUCENE-3216 URL: https://issues.apache.org/jira/browse/LUCENE-3216 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-3216.patch, LUCENE-3216.patch, LUCENE-3216.patch, LUCENE-3216.patch, LUCENE-3216.patch, LUCENE-3216.patch, LUCENE-3216.patch, LUCENE-3216_floats.patch Currently we are storing docvalues per field, which results in at least one file per field that uses docvalues (or at most two per field per segment, depending on the impl.). Yet, we should try by default to pack docvalues into a single file if possible. To enable this we need to hold all docvalues in memory during indexing and write them to disk once we flush a segment. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3262) Facet benchmarking
[ https://issues.apache.org/jira/browse/LUCENE-3262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Toke Eskildsen updated LUCENE-3262: --- Attachment: TestPerformanceHack.java CorpusGenerator.java I've attached a second shot at faceting performance testing. It separates the taxonomy generation into a CorpusGenerator (maybe similar to the RandomTaxonomyWriter that Robert calls for in LUCENE-3264?). Proper setup of faceting tweaks for the new faceting module is not done at all, and is not something I find myself qualified for. Facet benchmarking -- Key: LUCENE-3262 URL: https://issues.apache.org/jira/browse/LUCENE-3262 Project: Lucene - Java Issue Type: New Feature Components: modules/benchmark, modules/facet Reporter: Shai Erera Attachments: CorpusGenerator.java, TestPerformanceHack.java A spin-off from LUCENE-3079. We should define a few benchmarks for faceting scenarios, so we can evaluate the new faceting module as well as any improvements we'd like to consider in the future (such as cutting over to docvalues, implementing FST-based caches, etc.). Toke attached a preliminary test case to LUCENE-3079, so I'll attach it here as a starting point. We've also done some preliminary work on extending Benchmark for faceting, so I'll attach it here as well. We should perhaps create a Wiki page where we clearly describe the benchmark scenarios, then include results of 'default settings' and 'optimized settings', or something like that. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-3287) Allow ability to set maxDocCharsToAnalyze in WeightedSpanTermExtractor
Allow ability to set maxDocCharsToAnalyze in WeightedSpanTermExtractor -- Key: LUCENE-3287 URL: https://issues.apache.org/jira/browse/LUCENE-3287 Project: Lucene - Java Issue Type: Improvement Components: modules/highlighter Affects Versions: 3.3 Reporter: Jahangir Anwari Priority: Trivial In WeightedSpanTermExtractor the default maxDocCharsToAnalyze value is 0. This inhibits us from getting the weighted span terms in any custom code (e.g. the attached CustomHighlighter.java) that uses WeightedSpanTermExtractor. Currently the setMaxDocCharsToAnalyze() method is protected, which prevents us from setting maxDocCharsToAnalyze to a value greater than 0. Changing the method to public would give us the ability to set the maxDocCharsToAnalyze. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3287) Allow ability to set maxDocCharsToAnalyze in WeightedSpanTermExtractor
[ https://issues.apache.org/jira/browse/LUCENE-3287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jahangir Anwari updated LUCENE-3287: Attachment: WeightedSpanTermExtractor.patch CustomHighlighter.java Allow ability to set maxDocCharsToAnalyze in WeightedSpanTermExtractor -- Key: LUCENE-3287 URL: https://issues.apache.org/jira/browse/LUCENE-3287 Project: Lucene - Java Issue Type: Improvement Components: modules/highlighter Affects Versions: 3.3 Reporter: Jahangir Anwari Priority: Trivial Attachments: CustomHighlighter.java, WeightedSpanTermExtractor.patch In WeightedSpanTermExtractor the default maxDocCharsToAnalyze value is 0. This inhibits us from getting the weighted span terms in any custom code (e.g. the attached CustomHighlighter.java) that uses WeightedSpanTermExtractor. Currently the setMaxDocCharsToAnalyze() method is protected, which prevents us from setting maxDocCharsToAnalyze to a value greater than 0. Changing the method to public would give us the ability to set the maxDocCharsToAnalyze. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
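Until such a visibility change is released, the usual workaround for a protected setter is a one-line subclass that widens it to public (Java permits widening visibility in an override). A stdlib-only sketch of the pattern - the Extractor class below is a stand-in, not the real WeightedSpanTermExtractor:

```java
// Stand-in for a library class whose setter is protected.
class Extractor {
    private int maxDocCharsToAnalyze = 0;

    protected void setMaxDocCharsToAnalyze(int max) {
        this.maxDocCharsToAnalyze = max;
    }

    public int getMaxDocCharsToAnalyze() {
        return maxDocCharsToAnalyze;
    }
}

// Application-side subclass that widens the setter from protected to public.
class ConfigurableExtractor extends Extractor {
    @Override
    public void setMaxDocCharsToAnalyze(int max) {
        super.setMaxDocCharsToAnalyze(max);
    }
}
```

The subclass adds no behavior; it only makes the existing setter reachable from calling code, which is exactly what the requested API change would do directly.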
[jira] [Commented] (SOLR-1945) Allow @Field annotations in nested classes using DocumentObjectBinder
[ https://issues.apache.org/jira/browse/SOLR-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13061129#comment-13061129 ] Monica Storfjord commented on SOLR-1945: What is the status of this proposal? It would be a great feature and very beneficial to my current project! :) Do you have a full solution for this, and which version do you think this feature will be released in? - Monica Allow @Field annotations in nested classes using DocumentObjectBinder - Key: SOLR-1945 URL: https://issues.apache.org/jira/browse/SOLR-1945 Project: Solr Issue Type: Improvement Reporter: Mark Miller Assignee: Mark Miller Priority: Minor Fix For: 3.4, 4.0 Attachments: SOLR-1945.patch see http://search.lucidimagination.com/search/document/d909d909420aeb4e/does_solrj_support_nested_annotated_beans Would be nice to be able to pass an object graph to solrj with @Field annotations rather than just a top-level class -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2452) rewrite solr build system
[ https://issues.apache.org/jira/browse/SOLR-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rowe updated SOLR-2452: -- Attachment: SOLR-2452-post-reshuffling.patch This patch restores all of Solr's build targets from trunk; the build system rewrite is feature-complete at this point. (The reshuffling scripts require no further changes.) I moved {{lucene-lib/}} directories to under {{build/}}, and eliminated per-contrib {{clean}} target actions - instead, {{ant clean}} just deletes {{solr/build/}}, {{solr/dist/}}, {{solr/package/}}, and {{solr/example/solr/lib/}}. Before I commit this patch to the branch, I want to put the build through its paces and examine differences between the outputs from trunk and from branches/solr2452 with this patch applied. One difference I found so far: on trunk, the Solr create-package target includes duplicate javadocs for some non-contrib modules (core and solrj, I think): once in the uber-javadocs, and again in the javadocs produced for Maven. The per-contrib javadocs, by contrast, are excluded. This makes the compressed binary package about 1.8 MB larger than it needs to be, IIRC. rewrite solr build system - Key: SOLR-2452 URL: https://issues.apache.org/jira/browse/SOLR-2452 Project: Solr Issue Type: Task Components: Build Reporter: Robert Muir Assignee: Steven Rowe Fix For: 3.4, 4.0 Attachments: SOLR-2452-post-reshuffling.patch, SOLR-2452-post-reshuffling.patch, SOLR-2452-post-reshuffling.patch, SOLR-2452.dir.reshuffle.sh, SOLR-2452.dir.reshuffle.sh As discussed some in SOLR-2002 (but that issue is long and hard to follow), I think we should rewrite the Solr build system. It's slow, cumbersome, and messy, and makes it hard for us to improve things. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-3288) 'Thus terms are represented ...' should be 'Thus fields are represented ...'
'Thus terms are represented ...' should be 'Thus fields are represented ...' - Key: LUCENE-3288 URL: https://issues.apache.org/jira/browse/LUCENE-3288 Project: Lucene - Java Issue Type: Bug Components: general/website Affects Versions: 3.1 Environment: n/a Reporter: Paul Foster Priority: Trivial Fix For: 3.1.1 In the last paragraph of http://lucene.apache.org/java/3_1_0/fileformats.html#Definitions, second sentence, it says: 'Thus terms are represented as a pair of strings, the first naming the field, and the second naming text within the field.' Shouldn't it start 'Thus fields are ...'? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-1945) Allow @Field annotations in nested classes using DocumentObjectBinder
[ https://issues.apache.org/jira/browse/SOLR-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13061183#comment-13061183 ] Mark Miller commented on SOLR-1945: --- hmmm...I don't remember. I'll take a look again. Allow @Field annotations in nested classes using DocumentObjectBinder - Key: SOLR-1945 URL: https://issues.apache.org/jira/browse/SOLR-1945 Project: Solr Issue Type: Improvement Reporter: Mark Miller Assignee: Mark Miller Priority: Minor Fix For: 3.4, 4.0 Attachments: SOLR-1945.patch see http://search.lucidimagination.com/search/document/d909d909420aeb4e/does_solrj_support_nested_annotated_beans Would be nice to be able to pass an object graph to solrj with @field annotations rather than just a top level class -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2623) Solr JMX MBeans do not survive core reloads
[ https://issues.apache.org/jira/browse/SOLR-2623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13061190#comment-13061190 ] Shalin Shekhar Mangar commented on SOLR-2623: - There's another bug with core reload that I found while running Alexey's test. Suppose there's only one core with name X and you reload X, it then becomes registered with as the core name. So all your JMX monitoring is now useless because the key names have changed. Solr JMX MBeans do not survive core reloads --- Key: SOLR-2623 URL: https://issues.apache.org/jira/browse/SOLR-2623 Project: Solr Issue Type: Bug Components: multicore Affects Versions: 1.4, 1.4.1, 3.1, 3.2 Reporter: Alexey Serba Assignee: Shalin Shekhar Mangar Priority: Minor Attachments: SOLR-2623.patch, SOLR-2623.patch, SOLR-2623.patch Solr JMX MBeans do not survive core reloads
{noformat:title=Steps to reproduce}
sh> cd example
sh> vi multicore/core0/conf/solrconfig.xml   # enable jmx
sh> java -Dcom.sun.management.jmxremote -Dsolr.solr.home=multicore -jar start.jar
sh> echo 'open 8842   # 8842 is the java pid
domain solr/core0
beans
' | java -jar jmxterm-1.0-alpha-4-uber.jar
solr/core0:id=core0,type=core
solr/core0:id=org.apache.solr.handler.StandardRequestHandler,type=org.apache.solr.handler.StandardRequestHandler
solr/core0:id=org.apache.solr.handler.StandardRequestHandler,type=standard
solr/core0:id=org.apache.solr.handler.XmlUpdateRequestHandler,type=/update
solr/core0:id=org.apache.solr.handler.XmlUpdateRequestHandler,type=org.apache.solr.handler.XmlUpdateRequestHandler
...
solr/core0:id=org.apache.solr.search.SolrIndexSearcher,type=searcher
solr/core0:id=org.apache.solr.update.DirectUpdateHandler2,type=updateHandler
sh> curl 'http://localhost:8983/solr/admin/cores?action=RELOAD&core=core0'
sh> echo 'open 8842   # 8842 is the java pid
domain solr/core0
beans
' | java -jar jmxterm-1.0-alpha-4-uber.jar
# there's only one bean left after the Solr core reload
solr/core0:id=org.apache.solr.search.SolrIndexSearcher,type=Searcher@2e831a91 main
{noformat}
The root cause of this is the Solr core reload behavior:
# create the new core (which overwrites the existing registered MBeans)
# register the new core and close the old one (we remove/un-register MBeans on oldCore.close)
The correct sequence is:
# unregister MBeans from the old core
# create and register the new core
# close the old core without touching MBeans
-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
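The corrected ordering can be illustrated with the JDK's own javax.management API. This is a minimal, hypothetical sketch (the Dummy MBean stands in for Solr's real MBeans, and "reload" is simulated in a single method): unregister the old core's beans first, then register the new core, then close the old core without touching MBeans.

```java
import java.lang.management.ManagementFactory;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

class CoreReloadSketch {
    // Standard MBean pattern: interface name = implementation name + "MBean".
    public interface DummyMBean { String getName(); }

    public static class Dummy implements DummyMBean {
        private final String name;
        public Dummy(String name) { this.name = name; }
        public String getName() { return name; }
    }

    static String reload(ObjectName key) throws JMException {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        server.registerMBean(new Dummy("oldCore"), key);  // state before reload
        // 1. unregister MBeans from the old core
        server.unregisterMBean(key);
        // 2. create and register the new core
        server.registerMBean(new Dummy("newCore"), key);
        // 3. close the old core without touching MBeans (a no-op in this sketch)
        return (String) server.getAttribute(key, "Name");
    }

    static String demo() {
        try {
            return reload(new ObjectName("example:id=core0,type=core"));
        } catch (JMException e) {
            return "error: " + e;
        }
    }
}
```

The buggy order (register new, then close old) fails precisely because oldCore.close() runs unregisterMBean last, wiping the freshly registered beans; putting the unregister first makes the final registration the surviving one.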
[jira] [Created] (LUCENE-3289) FST should allow controlling how hard builder tries to share suffixes
FST should allow controlling how hard builder tries to share suffixes - Key: LUCENE-3289 URL: https://issues.apache.org/jira/browse/LUCENE-3289 Project: Lucene - Java Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 3.4, 4.0 Today we have a boolean option to the FST builder telling it whether it should share suffixes. If you turn this off, building is much faster, uses much less RAM, and the resulting FST is a prefix trie. But, the FST is larger than it needs to be. When it's on, the builder maintains a node hash holding every node seen so far in the FST -- this uses up RAM and slows things down. On a dataset that Elmer (see the java-user thread "Autocompletion on large index" on Jul 6 2011) provided (thank you!), which is 1.32 M titles, avg 67.3 chars per title, building with suffix sharing on took 22.5 seconds, required 1.25 GB heap, and produced a 91.6 MB FST. With suffix sharing off, it was 8.2 seconds, 450 MB heap and a 129 MB FST. I think we should allow this boolean to be shade-of-gray instead: usually, how well suffixes can share is a function of how far they are from the end of the string, so, by adding a tunable N to only share when suffix length <= N, we can let the caller make reasonable tradeoffs. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
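The proposed tradeoff can be modeled with a toy sketch (purely illustrative - this is not the real FST builder or its node hash): nodes for suffixes of length <= maxShare are deduplicated, longer suffixes always get fresh nodes, so raising the cap shrinks the "FST" while growing the hash that must be kept in RAM.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of tunable suffix sharing: count the nodes a builder would
// keep if it only deduplicates suffixes up to a length cap.
class SuffixShareModel {
    static int nodeCount(String[] words, int maxShare) {
        Set<String> shared = new HashSet<>(); // the "node hash" (costs RAM)
        int unique = 0;
        for (String w : words) {
            for (int i = 0; i < w.length(); i++) {
                String suffix = w.substring(i);
                if (suffix.length() <= maxShare) {
                    shared.add(suffix);  // deduplicated: one shared node
                } else {
                    unique++;            // beyond the cap: always a fresh node
                }
            }
        }
        return unique + shared.size();
    }
}
```

With words like "walking", "talking", "baking", maxShare=0 degenerates to the no-sharing prefix trie (one node per character), while a large cap reproduces full suffix sharing; intermediate caps land in between, which is exactly the knob the issue asks for.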
[jira] [Updated] (LUCENE-3289) FST should allow controlling how hard builder tries to share suffixes
[ https://issues.apache.org/jira/browse/LUCENE-3289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-3289: --- Attachment: LUCENE-3289.patch Initial rough patch showing the idea. FST should allow controlling how hard builder tries to share suffixes - Key: LUCENE-3289 URL: https://issues.apache.org/jira/browse/LUCENE-3289 Project: Lucene - Java Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 3.4, 4.0 Attachments: LUCENE-3289.patch Today we have a boolean option to the FST builder telling it whether it should share suffixes. If you turn this off, building is much faster, uses much less RAM, and the resulting FST is a prefix trie. But, the FST is larger than it needs to be. When it's on, the builder maintains a node hash holding every node seen so far in the FST -- this uses up RAM and slows things down. On a dataset that Elmer (see the java-user thread "Autocompletion on large index" on Jul 6 2011) provided (thank you!), which is 1.32 M titles, avg 67.3 chars per title, building with suffix sharing on took 22.5 seconds, required 1.25 GB heap, and produced a 91.6 MB FST. With suffix sharing off, it was 8.2 seconds, 450 MB heap and a 129 MB FST. I think we should allow this boolean to be shade-of-gray instead: usually, how well suffixes can share is a function of how far they are from the end of the string, so, by adding a tunable N to only share when suffix length <= N, we can let the caller make reasonable tradeoffs. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3289) FST should allow controlling how hard builder tries to share suffixes
[ https://issues.apache.org/jira/browse/LUCENE-3289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13061195#comment-13061195 ] Michael McCandless commented on LUCENE-3289: NOTE: patch applies to 3.x. I ran the patch on the titles, varying the max prefix sharing length:
||Len||FST Size||Seconds||
|1|135446807|8.2|
|2|137632702|8.5|
|3|135177994|8.3|
|4|132782016|8.3|
|5|130415331|8.4|
|6|128086200|8.0|
|7|125797396|8.2|
|8|123552157|8.5|
|9|121358375|8.4|
|10|119228942|8.1|
|11|117181180|8.8|
|12|115229788|8.7|
|13|113388260|9.5|
|14|111664442|9.0|
|15|110059167|9.2|
|16|108572519|9.7|
|17|107201905|9.8|
|18|105942576|10.3|
|19|104791497|10.1|
|20|103745678|11.1|
|21|102801693|10.8|
|22|101957797|11.4|
|23|101206564|11.1|
|24|100541849|11.0|
|25|99956443|11.1|
|26|99443232|12.9|
|27|98995194|13.2|
|28|98604680|13.9|
|29|98264184|13.5|
|30|97969241|13.6|
|31|97714049|13.8|
|32|97494104|14.3|
|33|97304045|14.0|
|34|97140033|14.3|
|35|96998942|14.6|
|36|96877590|16.5|
|37|96773039|16.9|
|38|96682961|16.6|
|39|96605160|17.8|
|40|96537687|18.3|
|41|96479286|17.8|
|42|96428710|17.5|
|43|96384659|18.9|
|44|96346174|17.0|
|45|96312826|19.3|
|46|96283545|17.8|
|47|96257708|19.4|
|48|96235159|19.0|
|49|96215220|18.7|
|50|96197450|19.6|
|51|96181539|17.3|
|52|96167235|16.9|
|53|96154490|17.7|
|54|96143081|18.8|
|55|96132905|17.4|
|56|96123776|17.5|
|57|96115462|20.7|
|58|96108051|19.2|
|59|96101249|19.1|
|60|96095107|18.7|
|ALL|96020343|22.5|
Very, very odd that the FST size first goes up at N=2... not yet sure why. But from this curve it looks like there is a sweet spot around maybe N=24. I didn't measure required heap here, but it will also go down as N goes down. 
FST should allow controlling how hard builder tries to share suffixes - Key: LUCENE-3289 URL: https://issues.apache.org/jira/browse/LUCENE-3289 Project: Lucene - Java Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 3.4, 4.0 Attachments: LUCENE-3289.patch Today we have a boolean option to the FST builder telling it whether it should share suffixes. If you turn this off, building is much faster, uses much less RAM, and the resulting FST is a prefix trie. But, the FST is larger than it needs to be. When it's on, the builder maintains a node hash holding every node seen so far in the FST -- this uses up RAM and slows things down. On a dataset that Elmer (see the java-user thread "Autocompletion on large index" on Jul 6 2011) provided (thank you!), which is 1.32 M titles, avg 67.3 chars per title, building with suffix sharing on took 22.5 seconds, required 1.25 GB heap, and produced a 91.6 MB FST. With suffix sharing off, it was 8.2 seconds, 450 MB heap and a 129 MB FST. I think we should allow this boolean to be shade-of-gray instead: usually, how well suffixes can share is a function of how far they are from the end of the string, so, by adding a tunable N to only share when suffix length <= N, we can let the caller make reasonable tradeoffs. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-2793: Attachment: LUCENE-2793_final.patch I committed the latest patch, merged the branch with trunk and created a final diff for review. I think this is ready and I would like to reintegrate rather sooner than later. reviews welcome Directory createOutput and openInput should take an IOContext - Key: LUCENE-2793 URL: https://issues.apache.org/jira/browse/LUCENE-2793 Project: Lucene - Java Issue Type: Improvement Components: core/store Reporter: Michael McCandless Assignee: Varun Thacker Labels: gsoc2011, lucene-gsoc-11, mentor Attachments: LUCENE-2793-nrt.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793_final.patch Today for merging we pass down a larger readBufferSize than for searching because we get better performance. I think we should generalize this to a class (IOContext), which would hold the buffer size, but then could hold other flags like DIRECT (bypass OS's buffer cache), SEQUENTIAL, etc. Then, we can make the DirectIOLinuxDirectory fully usable because we would only use DIRECT/SEQUENTIAL during merging. This will require fixing how IW pools readers, so that a reader opened for merging is not then used for searching, and vice/versa. Really, it's only all the open file handles that need to be different -- we could in theory share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. 
For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
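The IOContext idea described above can be sketched as a small value object handed to every open call, so a Directory can choose buffer sizes or flags per use case. This is a hedged Python sketch of the concept only; Lucene's actual IOContext is a Java class and its fields, names, and buffer sizes here are assumptions.

```python
from dataclasses import dataclass
from enum import Enum, auto

class Usage(Enum):
    READ = auto()    # ordinary search-time access
    MERGE = auto()   # large sequential reads/writes
    FLUSH = auto()

@dataclass(frozen=True)
class IOContext:
    usage: Usage
    sequential: bool = False
    direct: bool = False   # e.g. bypass the OS buffer cache (O_DIRECT)

def read_buffer_size(ctx: IOContext) -> int:
    # Merging benefits from a larger read buffer than searching,
    # which is exactly the special case the issue wants to generalize.
    return 64 * 1024 if ctx.usage is Usage.MERGE else 1024

merge_ctx = IOContext(Usage.MERGE, sequential=True, direct=True)
search_ctx = IOContext(Usage.READ)
print(read_buffer_size(merge_ctx), read_buffer_size(search_ctx))
```

Because the context travels with the open call, a reader pooled for merging can be kept distinct from one opened for searching, as the issue requires.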
[jira] [Commented] (SOLR-2635) FieldAnalysisRequestHandler; Expose Filter- & Tokenizer-Settings
[ https://issues.apache.org/jira/browse/SOLR-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13061223#comment-13061223 ] Uwe Schindler commented on SOLR-2635: - How would you expose the args map? The problem of the current namedList is that it's not easy to insert that in a backwards-compatible way. I am currently looking into it; hopefully I will find a solution. FieldAnalysisRequestHandler; Expose Filter- & Tokenizer-Settings Key: SOLR-2635 URL: https://issues.apache.org/jira/browse/SOLR-2635 Project: Solr Issue Type: Improvement Components: Schema and Analysis, web gui Reporter: Stefan Matheis (steffkes) Priority: Minor The [current/old Analysis Page|http://files.mathe.is/solr-admin/04_analysis_verbose_cur.png] exposes the Filter- & Tokenizer-Settings -- the FieldAnalysisRequestHandler not :/ This Information is already available on the [Schema-Browser|http://files.mathe.is/solr-admin/06_schema-browser.png] (through LukeRequestHandler) - so we could load this in parallel and grab the required informations .. but it would be easier if we could add this Information, so that we have all relevant Information at one Place. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Assigned] (SOLR-2635) FieldAnalysisRequestHandler; Expose Filter- & Tokenizer-Settings
[ https://issues.apache.org/jira/browse/SOLR-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler reassigned SOLR-2635: --- Assignee: Uwe Schindler FieldAnalysisRequestHandler; Expose Filter- Tokenizer-Settings Key: SOLR-2635 URL: https://issues.apache.org/jira/browse/SOLR-2635 Project: Solr Issue Type: Improvement Components: Schema and Analysis, web gui Reporter: Stefan Matheis (steffkes) Assignee: Uwe Schindler Priority: Minor The [current/old Analysis Page|http://files.mathe.is/solr-admin/04_analysis_verbose_cur.png] exposes the Filter- Tokenizer-Settings -- the FieldAnalysisRequestHandler not :/ This Information is already available on the [Schema-Browser|http://files.mathe.is/solr-admin/06_schema-browser.png] (through LukeRequestHandler) - so we could load this in parallel and grab the required informations .. but it would be easier if we could add this Information, so that we have all relevant Information at one Place. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-3279) Allow CFS to be empty
[ https://issues.apache.org/jira/browse/LUCENE-3279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-3279. - Resolution: Fixed Committed to trunk in revision 1143766, backported to 3.x in revision 1143775 Allow CFS to be empty -- Key: LUCENE-3279 URL: https://issues.apache.org/jira/browse/LUCENE-3279 Project: Lucene - Java Issue Type: Improvement Components: core/store Affects Versions: 3.4, 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 3.4, 4.0 Attachments: LUCENE-3279.patch Since we changed CFS semantics slightly, closing a CFS directory on an error can lead to an exception. Yet, an empty CFS is still a valid CFS, so for consistency we should allow CFS to be empty. Here is an example:
{noformat}
1 tests failed.
REGRESSION: org.apache.lucene.index.TestIndexWriterOnDiskFull.testAddDocumentOnDiskFull
Error Message: CFS has no entries
Stack Trace:
java.lang.IllegalStateException: CFS has no entries
        at org.apache.lucene.store.CompoundFileWriter.close(CompoundFileWriter.java:139)
        at org.apache.lucene.store.CompoundFileDirectory.close(CompoundFileDirectory.java:181)
        at org.apache.lucene.store.DefaultCompoundFileDirectory.close(DefaultCompoundFileDirectory.java:58)
        at org.apache.lucene.index.SegmentMerger.createCompoundFile(SegmentMerger.java:139)
        at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4252)
        at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3863)
        at org.apache.lucene.index.SerialMergeScheduler.merge(SerialMergeScheduler.java:37)
        at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2715)
        at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2710)
        at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2706)
        at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3513)
        at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:2064)
        at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:2031)
        at org.apache.lucene.index.TestIndexWriterOnDiskFull.addDoc(TestIndexWriterOnDiskFull.java:539)
        at org.apache.lucene.index.TestIndexWriterOnDiskFull.testAddDocumentOnDiskFull(TestIndexWriterOnDiskFull.java:74)
        at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1277)
        at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1195)
{noformat}
-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
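The fix described above amounts to letting the compound-file writer close cleanly with zero entries instead of throwing. A minimal Python sketch of that behavior change (names are invented; the real CompoundFileWriter is Java and writes the actual compound-file format):

```python
class CompoundFileWriter:
    """Toy model: close() used to reject an empty file; now it's valid."""

    def __init__(self, fail_on_empty=False):
        self.entries = []
        self.fail_on_empty = fail_on_empty   # pre-fix behavior, for contrast
        self.closed = False

    def add_entry(self, name, length):
        self.entries.append((name, length))

    def close(self):
        if self.closed:
            return None
        if self.fail_on_empty and not self.entries:
            raise RuntimeError("CFS has no entries")   # the old exception
        # New behavior: write a valid, possibly empty, table of contents so
        # closing on an error path cannot itself fail.
        self.closed = True
        return {"entry_count": len(self.entries)}

print(CompoundFileWriter().close())   # an empty CFS is still a valid CFS
```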
[jira] [Resolved] (LUCENE-3216) Store DocValues per segment instead of per field
[ https://issues.apache.org/jira/browse/LUCENE-3216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-3216. - Resolution: Fixed Lucene Fields: [New, Patch Available] (was: [New]) Committed in revision 1143776. Store DocValues per segment instead of per field Key: LUCENE-3216 URL: https://issues.apache.org/jira/browse/LUCENE-3216 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-3216.patch, LUCENE-3216.patch, LUCENE-3216.patch, LUCENE-3216.patch, LUCENE-3216.patch, LUCENE-3216.patch, LUCENE-3216.patch, LUCENE-3216_floats.patch currently we are storing docvalues per field which results in at least one file per field that uses docvalues (or at most two per field per segment depending on the impl.). Yet, we should try to by default pack docvalues into a single file if possible. To enable this we need to hold all docvalues in memory during indexing and write them to disk once we flush a segment. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2399) Solr Admin Interface, reworked
[ https://issues.apache.org/jira/browse/SOLR-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13061230#comment-13061230 ] Uwe Schindler commented on SOLR-2399: - One additional question on the new analysis page: - Does your code also support CharFilters? I just ask because I had no time to try it out, it just came into my mind when i worked on FieldAnalysisReqHandler. The problem is that CharFilters return a different set of attributes and only one token. Solr Admin Interface, reworked -- Key: SOLR-2399 URL: https://issues.apache.org/jira/browse/SOLR-2399 Project: Solr Issue Type: Improvement Components: web gui Reporter: Stefan Matheis (steffkes) Assignee: Ryan McKinley Priority: Minor Fix For: 4.0 Attachments: SOLR-2399-110603-2.patch, SOLR-2399-110603.patch, SOLR-2399-110606.patch, SOLR-2399-110622.patch, SOLR-2399-110702.patch, SOLR-2399-admin-interface.patch, SOLR-2399-analysis-stopwords.patch, SOLR-2399-fluid-width.patch, SOLR-2399-sorting-fields.patch, SOLR-2399-wip-notice.patch, SOLR-2399.patch *The idea was to create a new, fresh (and hopefully clean) Solr Admin Interface.* [Based on this [ML-Thread|http://www.lucidimagination.com/search/document/ae35e236d29d225e/solr_admin_interface_reworked_go_on_go_away]] *Features:* * [Dashboard|http://files.mathe.is/solr-admin/01_dashboard.png] * [Query-Form|http://files.mathe.is/solr-admin/02_query.png] * [Plugins|http://files.mathe.is/solr-admin/05_plugins.png] * [Analysis|http://files.mathe.is/solr-admin/04_analysis.png] (SOLR-2476, SOLR-2400) * [Schema-Browser|http://files.mathe.is/solr-admin/06_schema-browser.png] * [Dataimport|http://files.mathe.is/solr-admin/08_dataimport.png] (SOLR-2482) * [Core-Admin|http://files.mathe.is/solr-admin/09_coreadmin.png] * [Replication|http://files.mathe.is/solr-admin/10_replication.png] * [Zookeeper|http://files.mathe.is/solr-admin/11_cloud.png] * [Logging|http://files.mathe.is/solr-admin/07_logging.png] (SOLR-2459) ** Stub (using static 
data) Newly created Wiki-Page: http://wiki.apache.org/solr/ReworkedSolrAdminGUI I've quickly created a Github-Repository (Just for me, to keep track of the changes) » https://github.com/steffkes/solr-admin -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3233) HuperDuperSynonymsFilter™
[ https://issues.apache.org/jira/browse/LUCENE-3233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13061232#comment-13061232 ] Robert Muir commented on LUCENE-3233: - so i don't forget, lets not waste an arc bitflag marking an arc as 'first'... I hear the secret is instead arc.target == startNode HuperDuperSynonymsFilter™ - Key: LUCENE-3233 URL: https://issues.apache.org/jira/browse/LUCENE-3233 Project: Lucene - Java Issue Type: Improvement Reporter: Robert Muir Attachments: LUCENE-3223.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, synonyms.zip The current synonymsfilter uses a lot of ram and cpu, especially at build time. I think yesterday I heard about huge synonyms files three times. So, I think we should use an FST-based structure, sharing the inputs and outputs. And we should be more efficient with the tokenStream api, e.g. using save/restoreState instead of cloneAttributes() -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2623) Solr JMX MBeans do not survive core reloads
[ https://issues.apache.org/jira/browse/SOLR-2623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-2623: Attachment: SOLR-2623.patch Here's a patch which fixes the issue. I've reused Alexey's tests with the solution I proposed earlier. The problem with the core name changing across reloads is something we can address in another issue. Solr JMX MBeans do not survive core reloads --- Key: SOLR-2623 URL: https://issues.apache.org/jira/browse/SOLR-2623 Project: Solr Issue Type: Bug Components: multicore Affects Versions: 1.4, 1.4.1, 3.1, 3.2 Reporter: Alexey Serba Assignee: Shalin Shekhar Mangar Priority: Minor Attachments: SOLR-2623.patch, SOLR-2623.patch, SOLR-2623.patch, SOLR-2623.patch Solr JMX MBeans do not survive core reloads
{noformat:title=Steps to reproduce}
sh> cd example
sh> vi multicore/core0/conf/solrconfig.xml  # enable jmx
sh> java -Dcom.sun.management.jmxremote -Dsolr.solr.home=multicore -jar start.jar
sh> echo 'open 8842  # 8842 is java pid
domain solr/core0
beans
' | java -jar jmxterm-1.0-alpha-4-uber.jar
solr/core0:id=core0,type=core
solr/core0:id=org.apache.solr.handler.StandardRequestHandler,type=org.apache.solr.handler.StandardRequestHandler
solr/core0:id=org.apache.solr.handler.StandardRequestHandler,type=standard
solr/core0:id=org.apache.solr.handler.XmlUpdateRequestHandler,type=/update
solr/core0:id=org.apache.solr.handler.XmlUpdateRequestHandler,type=org.apache.solr.handler.XmlUpdateRequestHandler
...
solr/core0:id=org.apache.solr.search.SolrIndexSearcher,type=searcher
solr/core0:id=org.apache.solr.update.DirectUpdateHandler2,type=updateHandler
sh> curl 'http://localhost:8983/solr/admin/cores?action=RELOAD&core=core0'
sh> echo 'open 8842  # 8842 is java pid
domain solr/core0
beans
' | java -jar jmxterm-1.0-alpha-4-uber.jar
# there's only one bean left after Solr core reload
solr/core0:id=org.apache.solr.search.SolrIndexSearcher,type=Searcher@2e831a91 main
{noformat}
The root cause of this is Solr core reload behavior:
# create new core (which overwrites existing registered MBeans)
# register new core and close old one (we remove/un-register MBeans on oldCore.close)
The correct sequence is:
# unregister MBeans from old core
# create and register new core
# close old core without touching MBeans
-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
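The two reload orders can be contrasted with a toy model, where a plain dict stands in for the JMX MBean server (registration overwrites by name, close removes by name). The names and functions here are invented for illustration, not Solr's actual classes.

```python
# Toy model of the reload-order bug: registry stands in for the MBean server.
registry = {}
names = ["searcher", "updateHandler"]

def register(core, names):
    for n in names:
        registry[n] = core        # JMX registration overwrites by name

def unregister(names):
    for n in names:
        registry.pop(n, None)     # a core's close() removes beans by name

def reload_buggy(names):
    register("newCore", names)    # new core overwrites the old MBeans...
    unregister(names)             # ...then closing the old core wipes them too

def reload_fixed(names):
    unregister(names)             # 1. unregister MBeans from old core
    register("newCore", names)    # 2. create and register new core
                                  # 3. close old core without touching MBeans

register("oldCore", names)
reload_buggy(names)
print(registry)                   # empty: all MBeans lost after the reload
```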
[jira] [Commented] (LUCENE-3284) Move contribs/modules away from QueryParser dependency
[ https://issues.apache.org/jira/browse/LUCENE-3284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13061238#comment-13061238 ] Chris Male commented on LUCENE-3284: After looking at the analysis-common dependencies, I think they can be refactored out. There isn't any need to actually form Querys, the same testing can be done by asserting the tokenstream contents. I'll work on those and upload a new patch. Move contribs/modules away from QueryParser dependency -- Key: LUCENE-3284 URL: https://issues.apache.org/jira/browse/LUCENE-3284 Project: Lucene - Java Issue Type: Sub-task Components: core/queryparser, modules/queryparser Reporter: Chris Male Attachments: LUCENE-3284.patch Some contribs and modules depend on the core QueryParser just for simplicity in their tests. We should apply the same process as I did to the core tests, and move them away from using the QueryParser where possible. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 9390 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/9390/ All tests passed Build Log (for compile errors): [...truncated 10091 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2638) A CoreContainer Plugin interface to create Container level Services
[ https://issues.apache.org/jira/browse/SOLR-2638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13061240#comment-13061240 ] Noble Paul commented on SOLR-2638: -- I'm preparing a mega patch which abstracts out Zookeeper as a complete plugin. It also simplifies the configuration. A CoreContainer Plugin interface to create Container level Services --- Key: SOLR-2638 URL: https://issues.apache.org/jira/browse/SOLR-2638 Project: Solr Issue Type: New Feature Components: multicore Reporter: Noble Paul Assignee: Noble Paul Attachments: SOLR-2638.patch It can help register services such as Zookeeper. Interface:
{code:java}
public abstract class ContainerPlugin {
  /** Called before initializing any core.
   * @param container
   * @param attrs */
  public abstract void init(CoreContainer container, Map<String,String> attrs);

  /** Callback after all cores are initialized */
  public void postInit() {}

  /** Callback after each core is created, but before registration
   * @param core */
  public void onCoreCreate(SolrCore core) {}

  /** Callback for server shutdown */
  public void shutdown() {}
}
{code}
It may be specified in solr.xml as
{code:xml}
<solr>
  <plugin name="zk" class="solr.ZookeeperService" param1="val1" param2="val2" zkClientTimeout="8000"/>
  <cores adminPath="/admin/cores" defaultCoreName="collection1" host="127.0.0.1" hostPort="${hostPort:8983}" hostContext="solr">
    <core name="collection1" shard="${shard:}" collection="${collection:collection1}" config="${solrconfig:solrconfig.xml}" instanceDir="./"/>
  </cores>
</solr>
{code}
-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
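The proposed callback order (init before any core, onCoreCreate per core, postInit after all cores, shutdown last) can be traced with a hypothetical driver. This is a Python sketch, not Solr code; ZookeeperService here is only a stand-in for a plugin implementation.

```python
# Trace the lifecycle callbacks of a hypothetical ContainerPlugin.
calls = []

class ZookeeperService:
    def init(self, container, attrs):      # before initializing any core
        calls.append("init")
    def on_core_create(self, core):        # after each core, before registration
        calls.append(f"onCoreCreate:{core}")
    def post_init(self):                   # after all cores are initialized
        calls.append("postInit")
    def shutdown(self):                    # server shutdown
        calls.append("shutdown")

plugin = ZookeeperService()
plugin.init(container=None, attrs={"zkClientTimeout": "8000"})
for core in ["collection1"]:
    plugin.on_core_create(core)
plugin.post_init()
plugin.shutdown()
print(calls)
```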
[jira] [Updated] (LUCENE-3233) HuperDuperSynonymsFilter™
[ https://issues.apache.org/jira/browse/LUCENE-3233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-3233: --- Attachment: LUCENE-3233.patch New patch, moving the root arcs cache into FST, not using up our last precious arc bit. HuperDuperSynonymsFilter™ - Key: LUCENE-3233 URL: https://issues.apache.org/jira/browse/LUCENE-3233 Project: Lucene - Java Issue Type: Improvement Reporter: Robert Muir Attachments: LUCENE-3223.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, synonyms.zip The current synonymsfilter uses a lot of ram and cpu, especially at build time. I think yesterday I heard about huge synonyms files three times. So, I think we should use an FST-based structure, sharing the inputs and outputs. And we should be more efficient with the tokenStream api, e.g. using save/restoreState instead of cloneAttributes() -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 9392 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/9392/
1 tests failed.
REGRESSION: org.apache.lucene.index.TestIndexWriterCommit.testCommitThreadSafety
Error Message: null
Stack Trace:
junit.framework.AssertionFailedError:
        at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1435)
        at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1353)
        at org.apache.lucene.index.TestIndexWriterCommit.testCommitThreadSafety(TestIndexWriterCommit.java:366)
Build Log (for compile errors): [...truncated 1250 lines...]
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [JENKINS] Lucene-Solr-tests-only-trunk - Build # 9392 - Failure
my bad, I committed a fix

On Thu, Jul 7, 2011 at 3:27 PM, Apache Jenkins Server jenk...@builds.apache.org wrote:
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/9392/
1 tests failed.
REGRESSION: org.apache.lucene.index.TestIndexWriterCommit.testCommitThreadSafety
Error Message: null
Stack Trace:
junit.framework.AssertionFailedError:
        at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1435)
        at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1353)
        at org.apache.lucene.index.TestIndexWriterCommit.testCommitThreadSafety(TestIndexWriterCommit.java:366)
Build Log (for compile errors): [...truncated 1250 lines...]
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[Lucene.Net] [jira] [Commented] (LUCENENET-433) AttributeSource can have an invalid computed state (LUCENE-3042)
[ https://issues.apache.org/jira/browse/LUCENENET-433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13061304#comment-13061304 ] Digy commented on LUCENENET-433: Committed to 2.9.4g branch AttributeSource can have an invalid computed state (LUCENE-3042) Key: LUCENENET-433 URL: https://issues.apache.org/jira/browse/LUCENENET-433 Project: Lucene.Net Issue Type: Bug Reporter: Digy Fix For: Lucene.Net 2.9.4, Lucene.Net 2.9.4g Attachments: LUCENENET-433.patch If you work with a tokenstream, consume it, then reuse it and add an attribute to it, the computed state is wrong; thus, for example, clearAttributes() will not actually clear the attribute added. So in some situations, addAttribute is not actually clearing the computed state when it should. https://issues.apache.org/jira/browse/LUCENE-3042 -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (SOLR-1604) Wildcards, ORs etc inside Phrase Queries
[ https://issues.apache.org/jira/browse/SOLR-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13061310#comment-13061310 ] Thomas Fischer commented on SOLR-1604: -- While the complexphrase search works fine with e.g. GOK:PXB 80?, it will throw an exception if there is no space present, e.g. GOK:PXB80?. The exception is: Unknown query type org.apache.lucene.search.WildcardQuery found in phrase query string PXB80? Wildcards, ORs etc inside Phrase Queries Key: SOLR-1604 URL: https://issues.apache.org/jira/browse/SOLR-1604 Project: Solr Issue Type: Improvement Components: search Affects Versions: 1.4 Reporter: Ahmet Arslan Priority: Minor Fix For: 3.4, 4.0 Attachments: ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhraseQueryParser.java, SOLR-1604.patch Solr Plugin for ComplexPhraseQueryParser (LUCENE-1486) which supports wildcards, ORs, ranges, fuzzies inside phrase queries. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2635) FieldAnalysisRequestHandler; Expose Filter- & Tokenizer-Settings
[ https://issues.apache.org/jira/browse/SOLR-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13061313#comment-13061313 ] Stefan Matheis (steffkes) commented on SOLR-2635: - Maybe we can append this List to the existing output .. like it's actually done for highlighting on the select handler? Just a suggestion:
{code:xml}
<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">37</int>
  </lst>
  <lst name="analysis">
    <!-- .. -->
  </lst>
  <lst name="settings">
    <lst name="field_types">
      <lst name="text_general_rev">
        <lst name="index">
          <arr name="org.apache.lucene.analysis.standard.StandardTokenizer">
            <lst>
              <!-- settings -->
            </lst>
          </arr>
        </lst>
      </lst>
    </lst>
  </lst>
</response>
{code}
That will work w/o problems, as long as the list of used Filters and Tokenizers is unique. If there is at least one which is used more than once -- the relation is only defined through the order of the list, but we could maybe add a counter to the existing output, then it's also no problem: FieldAnalysisRequestHandler; Expose Filter- & Tokenizer-Settings Key: SOLR-2635 URL: https://issues.apache.org/jira/browse/SOLR-2635 Project: Solr Issue Type: Improvement Components: Schema and Analysis, web gui Reporter: Stefan Matheis (steffkes) Assignee: Uwe Schindler Priority: Minor The [current/old Analysis Page|http://files.mathe.is/solr-admin/04_analysis_verbose_cur.png] exposes the Filter- & Tokenizer-Settings -- the FieldAnalysisRequestHandler not :/ This Information is already available on the [Schema-Browser|http://files.mathe.is/solr-admin/06_schema-browser.png] (through LukeRequestHandler) - so we could load this in parallel and grab the required informations .. but it would be easier if we could add this Information, so that we have all relevant Information at one Place. -- This message is automatically generated by JIRA. 
For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2634) Publish nightly snapshots, please
[ https://issues.apache.org/jira/browse/SOLR-2634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benson Margulies updated SOLR-2634: --- Attachment: SOLR-2634.patch Very small patch that allows nexus deployment. Publish nightly snapshots, please - Key: SOLR-2634 URL: https://issues.apache.org/jira/browse/SOLR-2634 Project: Solr Issue Type: Improvement Components: Build Affects Versions: 3.4 Reporter: Benson Margulies Assignee: Steven Rowe Attachments: SOLR-2634.patch, SOLR-2634.patch If you added 'mvn deploy' to the jenkins job, the nightly snapshots would push to repository.apache.org as snapshots, where maven could get them without having to manually download and deploy them. Please? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2634) Publish nightly snapshots, please
[ https://issues.apache.org/jira/browse/SOLR-2634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benson Margulies updated SOLR-2634: --- Attachment: SOLR-2634.patch Very simple patch that enables deployment, optionally, to nexus. Publish nightly snapshots, please - Key: SOLR-2634 URL: https://issues.apache.org/jira/browse/SOLR-2634 Project: Solr Issue Type: Improvement Components: Build Affects Versions: 3.4 Reporter: Benson Margulies Assignee: Steven Rowe Attachments: SOLR-2634.patch, SOLR-2634.patch If you added 'mvn deploy' to the jenkins job, the nightly snapshots would push to repository.apache.org as snapshots, where maven could get them without having to manually download and deploy them. Please? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2635) FieldAnalysisRequestHandler; Expose Filter- & Tokenizer-Settings
[ https://issues.apache.org/jira/browse/SOLR-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13061323#comment-13061323 ] Uwe Schindler commented on SOLR-2635: - This solution might work, I just don't like it, because it decouples the settings from the output and makes correlation harder. But that's of course the same for highlighting. The list of tokenizers and filters is not necessarily unique, but order would be, so access via index (like for highlighting) is fine. It's possible to add the same TokenFilter at several places in the analysis chain, so a lookup by class name is impossible. FieldAnalysisRequestHandler; Expose Filter- & Tokenizer-Settings Key: SOLR-2635 URL: https://issues.apache.org/jira/browse/SOLR-2635 Project: Solr Issue Type: Improvement Components: Schema and Analysis, web gui Reporter: Stefan Matheis (steffkes) Assignee: Uwe Schindler Priority: Minor The [current/old Analysis Page|http://files.mathe.is/solr-admin/04_analysis_verbose_cur.png] exposes the Filter- & Tokenizer-Settings -- the FieldAnalysisRequestHandler not :/ This Information is already available on the [Schema-Browser|http://files.mathe.is/solr-admin/06_schema-browser.png] (through LukeRequestHandler) - so we could load this in parallel and grab the required informations .. but it would be easier if we could add this Information, so that we have all relevant Information at one Place. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
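The point about class names not being unique can be shown with a toy chain: a settings lookup keyed by class name silently drops one occurrence, while a position-based (index) lookup keeps both. The factory names and synonym files below are hypothetical, chosen only to illustrate the collision.

```python
# Toy illustration: the same filter class twice in one analysis chain.
chain = [
    ("solr.SynonymFilterFactory",   {"synonyms": "syn_a.txt"}),
    ("solr.LowerCaseFilterFactory", {}),
    ("solr.SynonymFilterFactory",   {"synonyms": "syn_b.txt"}),
]

by_name = {name: settings for name, settings in chain}   # lossy: keys collide
by_index = list(enumerate(chain))                        # lossless: order kept

print(len(by_name), len(by_index))
```

This is why correlating settings to the analysis output by position, as done for highlighting, works where a name-keyed map cannot.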
[jira] [Commented] (SOLR-2634) Publish nightly snapshots, please
[ https://issues.apache.org/jira/browse/SOLR-2634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13061325#comment-13061325 ] Benson Margulies commented on SOLR-2634: So, if you apply the patch, folks like me can trivially deliver snapshots to repo managers that use password authentication. You can deliver to the Apache snapshot repo by changing your jenkins job to look like: ant generate-maven-artifacts -Dm2.repository.url=https://repository.apache.org/content/repositories/snapshots/ -Dm2.repository.username=whoever -Dm2.repository.password=whatever There is some scheme on the jenkins instance for these credentials, I can research it for you. My elders and betters at d...@maven.apache.org tell me that the thing that you have is really not a good idea from either a Jenkins or a Maven standpoint. Following my recipe here will change nothing about the publicity/policy issues, it will retain some old snapshots which might be useful, and it will generally work better. Publish nightly snapshots, please - Key: SOLR-2634 URL: https://issues.apache.org/jira/browse/SOLR-2634 Project: Solr Issue Type: Improvement Components: Build Affects Versions: 3.4 Reporter: Benson Margulies Assignee: Steven Rowe Attachments: SOLR-2634.patch, SOLR-2634.patch If you added 'mvn deploy' to the jenkins job, the nightly snapshots would push to repository.apache.org as snapshots, where maven could get them without having to manually download and deploy them. Please? -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2635) FieldAnalysisRequestHandler; Expose Filter- Tokenizer-Settings
[ https://issues.apache.org/jira/browse/SOLR-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13061330#comment-13061330 ] Stefan Matheis (steffkes) commented on SOLR-2635: - Hm yes, correct :/ Then, what about an additional {{settings=true}} parameter for this handler, which adds a second lst element with the settings used?
{code:xml}
<arr name="org.apache.lucene.analysis.standard.StandardTokenizer">
  <lst>
    <!-- .. existing output .. -->
  </lst>
  <lst name="settings">
    <!-- settings -->
  </lst>
</arr>
{code}
The JSON output for this handler is already not the best, but it should still be usable.
[jira] [Commented] (LUCENE-2392) Enable flexible scoring
[ https://issues.apache.org/jira/browse/LUCENE-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13061332#comment-13061332 ] Robert Muir commented on LUCENE-2392: - I think we need to commit the refactoring portions (separating TF-IDF out) to trunk very soon, because it's really difficult to keep this branch in sync with trunk, e.g. lots of activity and refactoring going on. I'd like to get this merged in as quickly as possible. I don't think the svn history is interesting, especially given all the frustrations I am having with merging... The easiest way will be to commit a patch; I'll get everything in shape and upload one soon, like, today. Enable flexible scoring --- Key: LUCENE-2392 URL: https://issues.apache.org/jira/browse/LUCENE-2392 Project: Lucene - Java Issue Type: Improvement Components: core/search Reporter: Michael McCandless Assignee: Michael McCandless Fix For: flexscoring branch Attachments: LUCENE-2392.patch, LUCENE-2392.patch, LUCENE-2392.patch, LUCENE-2392_take2.patch This is a first step (nowhere near committable!), implementing the design iterated to in the recent Baby steps towards making Lucene's scoring more flexible java-dev thread. The idea is (if you turn it on for your Field; it's off by default) to store full stats in the index, into a new _X.sts file, per doc (X field) in the index. And then have FieldSimilarityProvider impls that compute a doc's boost bytes (norms) from these stats. The patch is able to index the stats, merge them when segments are merged, and provides an iterator-only API. It also has a starting point for per-field Sims that use the stats iterator API to compute boost bytes. But it's not at all tied into actual searching! There's still tons left to do, e.g. how does one configure via Field/FieldType which stats one wants indexed. All tests pass, and I added one new TestStats unit test. 
The stats I record now are:
- field's boost
- field's unique term count (a b c a a b -> 3)
- field's total term count (a b c a a b -> 6)
- total term count per-term (sum of total term count for all docs that have this term)
Still need at least the total term count for each field.
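The unique/total term counts in the list above can be illustrated with a small, self-contained sketch (the class and method names here are illustrative only, not the patch's actual API):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

public class FieldStatsSketch {
    // Unique term count: number of distinct terms in the field ("a b c a a b" -> 3).
    static int uniqueTermCount(List<String> terms) {
        return new HashSet<>(terms).size();
    }

    // Total term count: number of term occurrences in the field ("a b c a a b" -> 6).
    static int totalTermCount(List<String> terms) {
        return terms.size();
    }

    public static void main(String[] args) {
        List<String> field = Arrays.asList("a", "b", "c", "a", "a", "b");
        System.out.println(uniqueTermCount(field)); // prints 3
        System.out.println(totalTermCount(field));  // prints 6
    }
}
```

The per-term stat (total term count per-term) would then be the sum of totalTermCount over all docs containing that term.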
[jira] [Updated] (LUCENE-3167) Make lucene/solr a OSGI bundle through Ant
[ https://issues.apache.org/jira/browse/LUCENE-3167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luca Stancapiano updated LUCENE-3167: - Attachment: lucene_trunk.patch This patch rewrites the build-manifest macrodef, because bndlib cannot append to the information in MANIFEST.MF:

<macrodef name="build-manifest" description="Builds a manifest file">
  <attribute name="title" default="Lucene Search Engine: ${ant.project.name}"/>
  <attribute name="bndtempDir" default="${build.dir}/temp"/>
  <sequential>
    <xmlproperty file="${ant.file}" collapseAttributes="true" prefix="bnd"/>
    <property name="bndclasspath" refid="classpath"/>
    <taskdef resource="aQute/bnd/ant/taskdef.properties"/>
    <mkdir dir="@{bndtempDir}"/>
    <bnd classpath="${bndclasspath}" eclipse="false" failok="false" exceptions="true"
         files="${common.dir}/lucene.bnd" output="@{bndtempDir}/${final.name}-temp.jar"/>
    <copy todir="${common.dir}/build" flatten="true">
      <resources>
        <url url="jar:file://@{bndtempDir}/${final.name}-temp.jar!/META-INF/MANIFEST.MF"/>
      </resources>
    </copy>
  </sequential>
</macrodef>

I moved the manifest information into the lucene.bnd file, appending the new OSGi info:

Export-Package: *;-split-package:=merge-first
Specification-Title: Lucene Search Engine: ${ant.project.name}
Specification-Version: ${spec.version}
Specification-Vendor: The Apache Software Foundation
Implementation-Title: org.apache.lucene
Implementation-Version: ${version} ${svnversion} - ${DSTAMP} ${TSTAMP}
Implementation-Vendor: The Apache Software Foundation
X-Compile-Source-JDK: ${javac.source}
X-Compile-Target-JDK: ${javac.target}
Bundle-License: http://www.apache.org/licenses/LICENSE-2.0.txt
Bundle-SymbolicName: org.apache.lucene.${name}
Bundle-Name: Lucene Search Engine: ${ant.project.name}
Bundle-Vendor: The Apache Software Foundation
Bundle-Version: ${version}
Bundle-Description: ${bnd.project.description}
Bundle-DocURL: http://www.apache.org/

I tested the lucene and solr modules, and all jars are created with the correct OSGi info in MANIFEST.MF. 
Unfortunately bndlib is not flexible, so if you use it you are forced to:
- precompile the classes
- create a temp directory with a temporary jar
- extract the new manifest from the jar and put it in the shared directory
Make lucene/solr a OSGI bundle through Ant -- Key: LUCENE-3167 URL: https://issues.apache.org/jira/browse/LUCENE-3167 Project: Lucene - Java Issue Type: New Feature Environment: bndtools Reporter: Luca Stancapiano Attachments: lucene_trunk.patch We need to build the bundle through Ant, so the binary can be published without requiring a download of the sources. Currently, to get an OSGi bundle, we need to use Maven tools and build the sources. Here is the reference for the creation of the OSGi bundle through Maven: https://issues.apache.org/jira/browse/LUCENE-1344 Bndtools could be used inside Ant
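The temp-jar/manifest-extraction workaround in the steps above can be sketched with plain java.util.jar (the header value and file names are illustrative): write a temporary jar whose manifest carries an OSGi header, then read the header back out of META-INF/MANIFEST.MF, which is the same round trip the macrodef performs with its temporary jar and the jar-URL copy.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

public class ManifestRoundTrip {
    // Build a temp jar whose manifest carries an OSGi header, then pull the
    // header back out of META-INF/MANIFEST.MF.
    static String roundTrip() throws Exception {
        Manifest mf = new Manifest(new ByteArrayInputStream(
            ("Manifest-Version: 1.0\n"
           + "Bundle-SymbolicName: org.apache.lucene.core\n").getBytes("UTF-8")));
        File tmp = File.createTempFile("lucene-temp", ".jar");
        try {
            try (JarOutputStream out = new JarOutputStream(new FileOutputStream(tmp), mf)) {
                // no class entries needed; only the manifest matters for this sketch
            }
            try (JarFile jar = new JarFile(tmp)) {
                return jar.getManifest().getMainAttributes().getValue("Bundle-SymbolicName");
            }
        } finally {
            tmp.delete();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip()); // prints org.apache.lucene.core
    }
}
```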
[jira] [Commented] (SOLR-2635) FieldAnalysisRequestHandler; Expose Filter- Tokenizer-Settings
[ https://issues.apache.org/jira/browse/SOLR-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13061341#comment-13061341 ] Uwe Schindler commented on SOLR-2635: - I was already thinking about an extra param to enable the settings. But like for highlighting, we should add them as a separate list, with the relation via lst index. Is this fine? To fix the output perfectly, each list inside the analysis component array should have a key like tokens, settings, but that would make it incompatible. Also the CharFilter output needs some improvements (I would prefer to return the CharFilter output like a single token in the other components; currently it's one level higher - it has no lst). But that's out of scope for this issue.
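The index-based correlation Uwe proposes (the same approach highlighting uses) can be sketched as two parallel lists: since the same filter class may occur twice in an analysis chain, position, not class name, is the key. All names and settings below are illustrative:

```java
import java.util.Arrays;
import java.util.List;

public class ChainSettingsSketch {
    // Two parallel lists: one entry per analysis-chain component.
    // Index i in one list corresponds to index i in the other, which stays
    // unambiguous even when the same filter class occurs twice in the chain.
    static final List<String> chain = Arrays.asList(
        "StandardTokenizer", "LowerCaseFilter", "SynonymFilter", "LowerCaseFilter");
    static final List<String> settings = Arrays.asList(
        "maxTokenLength=255", "", "synonyms=syn.txt", "");

    static String settingsFor(int componentIndex) {
        return settings.get(componentIndex);
    }

    public static void main(String[] args) {
        // A class-name lookup would be ambiguous for LowerCaseFilter
        // (indices 1 and 3); the positional lookup is not.
        System.out.println(chain.get(2) + " -> " + settingsFor(2));
    }
}
```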
[jira] [Commented] (SOLR-2634) Publish nightly snapshots, please
[ https://issues.apache.org/jira/browse/SOLR-2634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13061342#comment-13061342 ] Steven Rowe commented on SOLR-2634: --- bq. There is some scheme on the jenkins instance for these credentials, I can research it for you. Please do. bq. My elders and betters at d...@maven.apache.org tell me that the thing that you have is really not a good idea from either a Jenkins or a Maven standpoint. The thing that you have? Have you rigged a device that can spy my goiter through the tubes of the interweb? Quiet acceptance of my elders' and betters' judgments is a virtue that I, sadly, lack; you have my admiration. A pointer to the mailing list discussion(s) to which you appear to be referring would be helpful.
[jira] [Commented] (SOLR-2399) Solr Admin Interface, reworked
[ https://issues.apache.org/jira/browse/SOLR-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13061348#comment-13061348 ] Stefan Matheis (steffkes) commented on SOLR-2399: - bq. Does your code also support CharFilters? Thanks Uwe -- actually it does not. I've just checked the default-enabled field types from the example package. I'll try to fix that and update my last patch.
{code:xml}
<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
  </lst>
  <lst name="analysis">
    <lst name="field_types">
      <lst name="text_char_norm">
        <lst name="index">
          <str name="org.apache.lucene.analysis.charfilter.MappingCharFilter">Foo</str>
          <arr name="org.apache.lucene.analysis.core.WhitespaceTokenizer">
            <lst>
              <str name="text">Foo</str>
              <str name="raw_bytes">[46 6f 6f]</str>
              <int name="start">0</int>
              <int name="end">3</int>
              <int name="position">1</int>
              <arr name="positionHistory">
                <int>1</int>
              </arr>
              <str name="type">word</str>
            </lst>
          </arr>
        </lst>
      </lst>
    </lst>
    <lst name="field_names"/>
  </lst>
</response>
{code}
I will create a _virtual_ object for CharFilters so that they have one property, {{text}} - should that be okay? Especially in combination with other filters and tokenizers, which have more than that. 
Solr Admin Interface, reworked -- Key: SOLR-2399 URL: https://issues.apache.org/jira/browse/SOLR-2399 Project: Solr Issue Type: Improvement Components: web gui Reporter: Stefan Matheis (steffkes) Assignee: Ryan McKinley Priority: Minor Fix For: 4.0 Attachments: SOLR-2399-110603-2.patch, SOLR-2399-110603.patch, SOLR-2399-110606.patch, SOLR-2399-110622.patch, SOLR-2399-110702.patch, SOLR-2399-admin-interface.patch, SOLR-2399-analysis-stopwords.patch, SOLR-2399-fluid-width.patch, SOLR-2399-sorting-fields.patch, SOLR-2399-wip-notice.patch, SOLR-2399.patch *The idea was to create a new, fresh (and hopefully clean) Solr Admin Interface.* [Based on this [ML-Thread|http://www.lucidimagination.com/search/document/ae35e236d29d225e/solr_admin_interface_reworked_go_on_go_away]] *Features:* * [Dashboard|http://files.mathe.is/solr-admin/01_dashboard.png] * [Query-Form|http://files.mathe.is/solr-admin/02_query.png] * [Plugins|http://files.mathe.is/solr-admin/05_plugins.png] * [Analysis|http://files.mathe.is/solr-admin/04_analysis.png] (SOLR-2476, SOLR-2400) * [Schema-Browser|http://files.mathe.is/solr-admin/06_schema-browser.png] * [Dataimport|http://files.mathe.is/solr-admin/08_dataimport.png] (SOLR-2482) * [Core-Admin|http://files.mathe.is/solr-admin/09_coreadmin.png] * [Replication|http://files.mathe.is/solr-admin/10_replication.png] * [Zookeeper|http://files.mathe.is/solr-admin/11_cloud.png] * [Logging|http://files.mathe.is/solr-admin/07_logging.png] (SOLR-2459) ** Stub (using static data) Newly created Wiki-Page: http://wiki.apache.org/solr/ReworkedSolrAdminGUI I've quickly created a Github-Repository (Just for me, to keep track of the changes) » https://github.com/steffkes/solr-admin -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2634) Publish nightly snapshots, please
[ https://issues.apache.org/jira/browse/SOLR-2634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rowe updated SOLR-2634: -- Affects Version/s: 4.0 Fix Version/s: 4.0 3.4
[jira] [Updated] (SOLR-2634) Publish nightly snapshots, please
[ https://issues.apache.org/jira/browse/SOLR-2634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rowe updated SOLR-2634: -- Attachment: SOLR-2634.patch bq. So, if you apply the patch, folks like me can trivially deliver snapshots to repo managers that use password authentication. I agree, this is a good addition. This version of your patch adds password auth in two more places where it's required. I tested that the additions do no harm to the local-repo use case for {{ant generate-maven-artifacts}}. I'll commit shortly.
[jira] [Commented] (SOLR-2634) Publish nightly snapshots, please
[ https://issues.apache.org/jira/browse/SOLR-2634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13061389#comment-13061389 ] Steven Rowe commented on SOLR-2634: --- Benson, I committed your patch to trunk in r1143878 and branch_3x in r1143882.
[jira] [Updated] (SOLR-2399) Solr Admin Interface, reworked
[ https://issues.apache.org/jira/browse/SOLR-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Matheis (steffkes) updated SOLR-2399: Attachment: SOLR-2399-110702.patch Patch based on SVN-Rev {{1143882}}, now also works with CharFilter output. Screenshot: [Normal|http://files.mathe.is/solr-admin/04_analysis-cf.png], [Verbose|http://files.mathe.is/solr-admin/04_analysis_verbose-cf.png]
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13061391#comment-13061391 ] Simon Willnauer commented on LUCENE-2878: - Hey Mike, I applied all your patches and walked through them; this looks great. This entire thing is far from committable, but I think we should take it further and open a branch for it. I want to commit both your latest patch and the highlighter prototype and work from there. {quote}So after working with this a bit more (and reading the paper), I see now that it's really not necessary to cache positions in the iterators. So never mind all that! In the end, for some uses like highlighting I think somebody needs to cache positions (I put it in a ScorePosDoc created by the PosCollector), but I agree that doesn't belong in the lower level iterator.{quote} After looking into your patch I think I understand now what is needed to enable low-level stuff like highlighting. What is missing here is a positions collector interface that you can pass in and that collects positions at the lowest levels, e.g. for phrases or simple terms. The PositionIterator itself (btw. I think we should call it Positions or something along those lines - try not to introduce spans in the name :) ) should accept this collector and simply call back each low-level position if needed. For highlighting I think we should also go with a two-stage approach. The first stage does the matching (with or without positions) and the second stage takes the first stage's results and does the highlighting. That way we don't slow down the query, and the second stage can even choose a different rewrite method (for MTQ this is needed since we don't have positions on filters). {quote} As I'm learning more, I am beginning to see this is going to require sweeping updates. 
Basically everywhere we currently create a DocsEnum, we might now want to create a DocsAndPositionsEnum, and then the options (needs positions/payloads) have to be threaded through all the surrounding APIs. I wonder if it wouldn't make sense to encapsulate those options (needsPositions/needsPayloads) in some kind of EnumConfig object. Just in case, down the line, there is some other information that gets stored in the index, and wants to be made available during scoring, then the required change would be much less painful to implement. {quote} What do you mean by sweeping updates? For the enum config I think we only have two or three places where we need to make the decision: 1. TermScorer 2. PhraseScorer (and maybe 2. goes away anyway), so this is not needed for now, I think? {quote} I'm thinking for example (Robert M's idea), that it might be nice to have a positions-offsets map in the index (this would be better for highlighting than term vectors). Maybe this would just be part of payload, but maybe not? And it seems possible there could be other things like that we don't know about yet? {quote} Yeah, this would be awesome... next step :) Allow Scorer to expose positions and payloads aka. nuke spans -- Key: LUCENE-2878 URL: https://issues.apache.org/jira/browse/LUCENE-2878 Project: Lucene - Java Issue Type: Improvement Components: core/search Affects Versions: Bulk Postings branch Reporter: Simon Willnauer Assignee: Simon Willnauer Labels: gsoc2011, lucene-gsoc-11, mentor Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch, PosHighlighter.patch Currently we have two somewhat separate types of queries: the ones which can make use of positions (mainly spans) and payloads (spans). Yet Span*Query doesn't really do scoring comparable to what other queries do, and at the end of the day they duplicate a lot of code all over Lucene. 
Span*Queries are also limited to other Span*Query instances, such that you can not use a TermQuery or a BooleanQuery with SpanNear or anything like that. Beside the Span*Query limitation, other queries lack a quite interesting feature: they can not score based on term proximity, since scorers don't expose any positional information. All those problems bugged me for a while now, so I started working on this using the bulkpostings API. I would have done that first cut on trunk, but TermScorer there works on a BlockReader that does not expose positions, while the one in this branch does. I started adding a new Positions class which users can pull from a scorer; to prevent unnecessary positions enums I added ScorerContext#needsPositions and eventually Scorer#needsPayloads to create the corresponding enum on demand. Yet, currently only TermQuery / TermScorer
[jira] [Commented] (SOLR-2635) FieldAnalysisRequestHandler; Expose Filter- Tokenizer-Settings
[ https://issues.apache.org/jira/browse/SOLR-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13061392#comment-13061392 ] Stefan Matheis (steffkes) commented on SOLR-2635: - bq. Is this fine? Yes, that should be good to work with :)
[jira] [Commented] (LUCENE-3167) Make lucene/solr a OSGI bundle through Ant
[ https://issues.apache.org/jira/browse/LUCENE-3167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13061393#comment-13061393 ] Luca Stancapiano commented on LUCENE-3167: -- I created an issue at https://github.com/bnd/bnd/issues/70 to parametrize the bnd ant task.
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13061395#comment-13061395 ] Robert Muir commented on LUCENE-2878: - {quote} For highlighting I think we should also go with a two-stage approach. The first stage does the matching (with or without positions) and the second stage takes the first stage's results and does the highlighting. That way we don't slow down the query, and the second stage can even choose a different rewrite method (for MTQ this is needed since we don't have positions on filters) {quote} I think this would be a good approach; it's really the same algorithm that you generally want for positional scoring: score all the docs the 'fast' way, then reorder only the top N (e.g. the first two pages of results), which requires using the position iterator and doing some calculation that you typically add to the score. So if we can generalize this in a way where you can do this in your collector, I think it would be reusable for this as well. Allow Scorer to expose positions and payloads aka. nuke spans -- Key: LUCENE-2878 URL: https://issues.apache.org/jira/browse/LUCENE-2878 Project: Lucene - Java Issue Type: Improvement Components: core/search Affects Versions: Bulk Postings branch Reporter: Simon Willnauer Assignee: Simon Willnauer Labels: gsoc2011, lucene-gsoc-11, mentor Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch, PosHighlighter.patch Currently we have two somewhat separate types of queries: the ones which can make use of positions (mainly spans) and payloads (spans). Yet Span*Query doesn't really do scoring comparable to what other queries do, and at the end of the day they duplicate a lot of code all over Lucene. 
Span*Queries are also limited to other Span*Query instances, such that you can not use a TermQuery or a BooleanQuery with SpanNear or anything like that. Beside the Span*Query limitation, other queries lack a quite interesting feature: they can not score based on term proximity, since scorers don't expose any positional information. All those problems bugged me for a while now, so I started working on this using the bulkpostings API. I would have done that first cut on trunk, but TermScorer there works on a BlockReader that does not expose positions, while the one in this branch does. I started adding a new Positions class which users can pull from a scorer; to prevent unnecessary positions enums I added ScorerContext#needsPositions and eventually Scorer#needsPayloads to create the corresponding enum on demand. Yet, currently only TermQuery / TermScorer implements this API, and the others simply return null instead. To show that the API really works, and that our BulkPostings work fine with positions too, I cut over TermSpanQuery to use a TermScorer under the hood and nuked TermSpans entirely. A nice side effect of this was that the Position BulkReading implementation got some exercise, and it now all works with positions :) while payloads for bulk reading are kind of experimental in the patch and only work with the Standard codec. So all spans now work on top of TermScorer (I truly hate spans since today), including the ones that need payloads (StandardCodec ONLY)!! I didn't bother to implement the other codecs yet, since I want to get feedback on the API and on this first cut before I go on with it. I will upload the corresponding patch in a minute. I also had to cut over SpanQuery.getSpans(IR) to SpanQuery.getSpans(AtomicReaderContext), which I should probably do on trunk first, but after that pain today I need a break first :). 
The patch passes all core tests (org.apache.lucene.search.highlight.HighlighterTest still fails, but I didn't look into the MemoryIndex BulkPostings API yet)
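The two-stage approach Robert describes - score everything the cheap way, then apply the expensive positional calculation to only the top N - can be sketched as follows. All names are hypothetical; the proximity boost stands in for the position-iterator pass:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.IntToDoubleFunction;

public class TwoStageRerank {
    static class Hit {
        final int doc;
        double score;
        Hit(int doc, double score) { this.doc = doc; this.score = score; }
    }

    // Stage 1: cheap scores for all hits. Stage 2: run the expensive
    // positional pass (proximityBoost) over the top N only, then re-sort them.
    static List<Hit> rerankTopN(List<Hit> hits, int n, IntToDoubleFunction proximityBoost) {
        hits.sort(Comparator.comparingDouble((Hit h) -> -h.score));
        List<Hit> top = new ArrayList<>(hits.subList(0, Math.min(n, hits.size())));
        for (Hit h : top) {
            h.score += proximityBoost.applyAsDouble(h.doc); // positions pass, top N only
        }
        top.sort(Comparator.comparingDouble((Hit h) -> -h.score));
        return top;
    }

    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<>();
        hits.add(new Hit(1, 2.0));
        hits.add(new Hit(2, 1.5));
        hits.add(new Hit(3, 1.0));
        // Doc 2 has the best term proximity; it overtakes doc 1 after reranking.
        List<Hit> top = rerankTopN(hits, 2, doc -> doc == 2 ? 1.0 : 0.0);
        System.out.println(top.get(0).doc); // prints 2
    }
}
```

Doc 3 never pays the positional cost because it falls outside the top N, which is what keeps the query itself fast.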
[jira] [Commented] (SOLR-2634) Publish nightly snapshots, please
[ https://issues.apache.org/jira/browse/SOLR-2634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13061408#comment-13061408 ] Benson Margulies commented on SOLR-2634: Thank you.
[jira] [Commented] (LUCENE-2392) Enable flexible scoring
[ https://issues.apache.org/jira/browse/LUCENE-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13061420#comment-13061420 ] Simon Willnauer commented on LUCENE-2392: - {quote} I'd like to get this merged in as quickly as possible. I don't think the svn history is interesting, especially given all the frustrations I am having with merging... The easiest way will be to commit a patch, I'll get everything in shape and upload one soon, like, today. {quote} +1 even if this is not entirely in shape we can still iterate on trunk. Enable flexible scoring --- Key: LUCENE-2392 URL: https://issues.apache.org/jira/browse/LUCENE-2392 Project: Lucene - Java Issue Type: Improvement Components: core/search Reporter: Michael McCandless Assignee: Michael McCandless Fix For: flexscoring branch Attachments: LUCENE-2392.patch, LUCENE-2392.patch, LUCENE-2392.patch, LUCENE-2392_take2.patch This is a first step (nowhere near committable!), implementing the design iterated to in the recent Baby steps towards making Lucene's scoring more flexible java-dev thread. The idea is (if you turn it on for your Field; it's off by default) to store full stats in the index, into a new _X.sts file, per doc (X field) in the index. And then have FieldSimilarityProvider impls that compute doc's boost bytes (norms) from these stats. The patch is able to index the stats, merge them when segments are merged, and provides an iterator-only API. It also has starting point for per-field Sims that use the stats iterator API to compute boost bytes. But it's not at all tied into actual searching! There's still tons left to do, eg, how does one configure via Field/FieldType which stats one wants indexed. All tests pass, and I added one new TestStats unit test. 
The stats I record now are:
- field's boost
- field's unique term count (a b c a a b -- 3)
- field's total term count (a b c a a b -- 6)
- total term count per-term (sum of total term count for all docs that have this term)

Still need at least the total term count for each field.
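The two per-document field stats above are straightforward to compute; this self-contained sketch reproduces the worked example from the issue description (field value "a b c a a b"):

```java
import java.util.HashSet;
import java.util.Set;

public class FieldStats {

    /** Unique term count: number of distinct terms in the field. */
    static int uniqueTermCount(String field) {
        Set<String> terms = new HashSet<>();
        for (String t : field.split("\\s+")) {
            terms.add(t);
        }
        return terms.size();
    }

    /** Total term count: total number of term occurrences in the field. */
    static int totalTermCount(String field) {
        return field.split("\\s+").length;
    }

    public static void main(String[] args) {
        System.out.println(uniqueTermCount("a b c a a b")); // 3
        System.out.println(totalTermCount("a b c a a b"));  // 6
    }
}
```

In the patch these values would be written per doc into the new _X.sts file and consumed by a FieldSimilarityProvider to derive boost bytes.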
Re: Getting patches (with tests!) committed
On Jul 6, 2011, at 6:44 PM, Erick Erickson wrote: In the past I've had to ping the dev list with an "include patch XYZ please" message Yeah... doesn't that strike you as a problem though? Maybe I should dig up all my issues and start bugging the dev list with "Commit this, pretty please?" messages. Or not, and they will stay in the JIRA graveyard. But I've just assigned it to myself, I'll see if I can get it committed, I'm new enough at the process that I need the practice I noticed, thanks. Best Erick On Wed, Jul 6, 2011 at 1:51 PM, Smiley, David W. dsmi...@mitre.org wrote: How do committers recommend that patch contributors (like me) get their patches committed? At the moment I'm thinking of this one: https://issues.apache.org/jira/browse/SOLR-2535 This is a regression bug. I found the bug, I added a patch which fixes the bug and tested that it was fixed. The tests are actually new tests that tested code that wasn't tested before. I put the fix version in JIRA as 3.3 at the time I did this, because it was ready to go. Well 3.3 came and went, and the version got bumped to 3.4. There are no processes in place for committers to recognize completed patches. I think that's a problem. It's very discouraging, as the contributor. I think prior to a release and ideally at other occasions, issues assigned to the next release number should actually be examined. Granted there are ~250 of them on the Solr side: https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&jqlQuery=project+%3D+SOLR+AND+resolution+%3D+Unresolved+AND+fixVersion+%3D+12316683+ORDER+BY+priority+DESC And some initial triage could separate the wheat from the chaff. 
~ David Smiley
[jira] [Updated] (LUCENE-2392) Enable flexible scoring
[ https://issues.apache.org/jira/browse/LUCENE-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-2392: Attachment: LUCENE-2392.patch Attached is a patch, with this CHANGES entry: {noformat} * LUCENE-2392: Decoupled vector space scoring from Query/Weight/Scorer. If you extended Similarity directly before, you should extend TFIDFSimilarity instead. Similarity is now a lower-level API to implement other scoring algorithms. See MIGRATE.txt for more details. {noformat} I would like to commit this, and then proceed onward with issues such as LUCENE-3220 and LUCENE-3221 Enable flexible scoring --- Key: LUCENE-2392 URL: https://issues.apache.org/jira/browse/LUCENE-2392 Project: Lucene - Java Issue Type: Improvement Components: core/search Reporter: Michael McCandless Assignee: Michael McCandless Fix For: flexscoring branch Attachments: LUCENE-2392.patch, LUCENE-2392.patch, LUCENE-2392.patch, LUCENE-2392.patch, LUCENE-2392_take2.patch This is a first step (nowhere near committable!), implementing the design iterated to in the recent Baby steps towards making Lucene's scoring more flexible java-dev thread. The idea is (if you turn it on for your Field; it's off by default) to store full stats in the index, into a new _X.sts file, per doc (X field) in the index. And then have FieldSimilarityProvider impls that compute doc's boost bytes (norms) from these stats. The patch is able to index the stats, merge them when segments are merged, and provides an iterator-only API. It also has starting point for per-field Sims that use the stats iterator API to compute boost bytes. But it's not at all tied into actual searching! There's still tons left to do, eg, how does one configure via Field/FieldType which stats one wants indexed. All tests pass, and I added one new TestStats unit test. 
The stats I record now are:
- field's boost
- field's unique term count (a b c a a b -- 3)
- field's total term count (a b c a a b -- 6)
- total term count per-term (sum of total term count for all docs that have this term)

Still need at least the total term count for each field.
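The CHANGES entry above moves vector-space scoring into TFIDFSimilarity. As a reminder of the kind of formulas such a class encapsulates, here is a self-contained sketch using the classic Lucene-style defaults (tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1))); treat the exact formulas as illustrative of the refactoring, not as the patch's code:

```java
public class TfIdfSketch {

    /** Term-frequency component: damped by square root. */
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    /** Inverse document frequency: rarer terms score higher. */
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    /** Single-term score contribution; idf appears squared because it
     *  enters once via the query weight and once via the term weight. */
    static double weight(int freq, int docFreq, int numDocs) {
        double i = idf(docFreq, numDocs);
        return tf(freq) * i * i;
    }

    public static void main(String[] args) {
        System.out.println(tf(4));        // 2.0
        System.out.println(idf(9, 10));   // 1.0 (ln(10/10) == 0)
        System.out.println(weight(4, 9, 10)); // 2.0
    }
}
```

After the decoupling, a custom scoring model would override Similarity directly instead of fighting these baked-in formulas.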
[jira] [Commented] (SOLR-949) Add QueryResponse and SolrQuery support for TermVectorComponent
[ https://issues.apache.org/jira/browse/SOLR-949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13061437#comment-13061437 ] David Smiley commented on SOLR-949: --- It would be easier for a committer to digest this patch if you didn't do any reformatting of existing code. Add QueryResponse and SolrQuery support for TermVectorComponent --- Key: SOLR-949 URL: https://issues.apache.org/jira/browse/SOLR-949 Project: Solr Issue Type: New Feature Components: clients - java Reporter: Aleksander M. Stensby Priority: Minor Attachments: SOLR-949.patch In a similar fashion to Facet information, it would be nice to have support for easily setting TermVector related parameters through SolrQuery, and it would be nice to have methods in QueryResponse to easily retrieve TermVector information -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Getting patches (with tests!) committed
Yeah, this is kind of a grey area. I think we should do what we can to encourage contributions, and being better about applying patches when someone has gone through the effort of making one in the first place certainly goes in the right direction... It *may* help that there have been several more committers added in the recent past (myself included), so perhaps there's some more bandwidth available now. Best Erick On Thu, Jul 7, 2011 at 12:44 PM, Smiley, David W. dsmi...@mitre.org wrote: On Jul 6, 2011, at 6:44 PM, Erick Erickson wrote: In the past I've had to ping the dev list with an "include patch XYZ please" message Yeah... doesn't that strike you as a problem though? Maybe I should dig up all my issues and start bugging the dev list with "Commit this, pretty please?" messages. Or not, and they will stay in the JIRA graveyard. But I've just assigned it to myself, I'll see if I can get it committed, I'm new enough at the process that I need the practice I noticed, thanks. Best Erick On Wed, Jul 6, 2011 at 1:51 PM, Smiley, David W. dsmi...@mitre.org wrote: How do committers recommend that patch contributors (like me) get their patches committed? At the moment I'm thinking of this one: https://issues.apache.org/jira/browse/SOLR-2535 This is a regression bug. I found the bug, I added a patch which fixes the bug and tested that it was fixed. The tests are actually new tests that tested code that wasn't tested before. I put the fix version in JIRA as 3.3 at the time I did this, because it was ready to go. Well 3.3 came and went, and the version got bumped to 3.4. There are no processes in place for committers to recognize completed patches. I think that's a problem. It's very discouraging, as the contributor. I think prior to a release and ideally at other occasions, issues assigned to the next release number should actually be examined. 
Granted there are ~250 of them on the Solr side: https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&jqlQuery=project+%3D+SOLR+AND+resolution+%3D+Unresolved+AND+fixVersion+%3D+12316683+ORDER+BY+priority+DESC And some initial triage could separate the wheat from the chaff. ~ David Smiley
Re: Getting patches (with tests!) committed
On Thu, Jul 7, 2011 at 6:52 PM, Erick Erickson erickerick...@gmail.com wrote: Yeah, this is kind of a grey area, I think we should do what we can to encourage contributions and being better about applying patches when someone has gone through the effort of making one in the first place certainly goes in the right direction... It *may* help that there have been several more committers added in the recent past (myself included), so perhaps there's some more bandwidth available now. Hopefully!!! This is why we added and keep on adding committers: we have more work than we can handle. simon Best Erick On Thu, Jul 7, 2011 at 12:44 PM, Smiley, David W. dsmi...@mitre.org wrote: On Jul 6, 2011, at 6:44 PM, Erick Erickson wrote: In the past I've had to ping the dev list with an "include patch XYZ please" message Yeah... doesn't that strike you as a problem though? Maybe I should dig up all my issues and start bugging the dev list with "Commit this, pretty please?" messages. Or not, and they will stay in the JIRA graveyard. But I've just assigned it to myself, I'll see if I can get it committed, I'm new enough at the process that I need the practice I noticed, thanks. Best Erick On Wed, Jul 6, 2011 at 1:51 PM, Smiley, David W. dsmi...@mitre.org wrote: How do committers recommend that patch contributors (like me) get their patches committed? At the moment I'm thinking of this one: https://issues.apache.org/jira/browse/SOLR-2535 This is a regression bug. I found the bug, I added a patch which fixes the bug and tested that it was fixed. The tests are actually new tests that tested code that wasn't tested before. I put the fix version in JIRA as 3.3 at the time I did this, because it was ready to go. Well 3.3 came and went, and the version got bumped to 3.4. There are no processes in place for committers to recognize completed patches. I think that's a problem. It's very discouraging, as the contributor. 
I think prior to a release and ideally at other occasions, issues assigned to the next release number should actually be examined. Granted there are ~250 of them on the Solr side: https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&jqlQuery=project+%3D+SOLR+AND+resolution+%3D+Unresolved+AND+fixVersion+%3D+12316683+ORDER+BY+priority+DESC And some initial triage could separate the wheat from the chaff. ~ David Smiley
[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13061439#comment-13061439 ] Michael McCandless commented on LUCENE-2793: Looks good! +1 to land it! Just a few things: * Shouldn't WindowsDirectory also call BII.bufferSize(context) and do the same Math.max it used to do? * Should VarGapTermsIndexReader should pass READONCE context down when it opens/reads the FST? Hmm, though, it should just replace the ctx passed in, ie if we are merging vs reading we want to differentiate. Let's open separate issue for this and address post merge? * Can you open an issue for this one: // TODO: context should be part of the key used to cache that reader in the pool.? This is pretty important, else you can get NRT readers with too-large buffer sizes because the readers had been opened for merging first. * Extra space in SegmentInfo.java: IOContext.READONCE ); Directory createOutput and openInput should take an IOContext - Key: LUCENE-2793 URL: https://issues.apache.org/jira/browse/LUCENE-2793 Project: Lucene - Java Issue Type: Improvement Components: core/store Reporter: Michael McCandless Assignee: Varun Thacker Labels: gsoc2011, lucene-gsoc-11, mentor Attachments: LUCENE-2793-nrt.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793_final.patch Today for merging we pass down a larger readBufferSize than for searching because we get better performance. 
I think we should generalize this to a class (IOContext), which would hold the buffer size, but then could hold other flags like DIRECT (bypass OS's buffer cache), SEQUENTIAL, etc. Then, we can make the DirectIOLinuxDirectory fully usable because we would only use DIRECT/SEQUENTIAL during merging. This will require fixing how IW pools readers, so that a reader opened for merging is not then used for searching, and vice versa. Really, it's only all the open file handles that need to be different -- we could in theory share del docs, norms, etc, if that were somehow possible.
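The core idea above: openInput/createOutput receive a context describing the access pattern, and the directory derives buffer sizes and flags from it. A minimal self-contained sketch (the constants and names are illustrative, not Lucene's actual values):

```java
public class IOContextSketch {

    /** Why the file is being opened; stands in for the proposed IOContext. */
    enum Context { READ, READONCE, MERGE }

    /** A directory would consult the context instead of taking an
     *  explicit readBufferSize parameter. */
    static int bufferSize(Context ctx) {
        switch (ctx) {
            case MERGE:
                return 4096; // larger buffer pays off for sequential merge I/O
            default:
                return 1024; // smaller buffer for search-time random access
        }
    }

    public static void main(String[] args) {
        System.out.println(bufferSize(Context.MERGE)); // 4096
        System.out.println(bufferSize(Context.READ));  // 1024
    }
}
```

This also motivates the reader-pool caveat in the comment: if the context is not part of the pool's cache key, an NRT reader can inherit merge-sized buffers.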
[jira] [Created] (SOLR-2640) Error message typo for missing field
Error message typo for missing field Key: SOLR-2640 URL: https://issues.apache.org/jira/browse/SOLR-2640 Project: Solr Issue Type: Bug Components: search Reporter: Benson Margulies 2011-07-07 13:03:16,630 [http-bio-9167-exec-6] ERROR org.apache.solr.core.SolrCore - org.apache.solr.common.SolrException: Specify at least on field, function or query to group by. at org.apache.solr.search.Grouping.execute(Grouping.java:264) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2640) Error message typo for missing field
[ https://issues.apache.org/jira/browse/SOLR-2640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benson Margulies updated SOLR-2640: --- Attachment: SOLR-2640.patch Error message typo for missing field Key: SOLR-2640 URL: https://issues.apache.org/jira/browse/SOLR-2640 Project: Solr Issue Type: Bug Components: search Reporter: Benson Margulies Attachments: SOLR-2640.patch 2011-07-07 13:03:16,630 [http-bio-9167-exec-6] ERROR org.apache.solr.core.SolrCore - org.apache.solr.common.SolrException: Specify at least on field, function or query to group by. at org.apache.solr.search.Grouping.execute(Grouping.java:264) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Getting patches (with tests!) committed
On Thu, Jul 7, 2011 at 12:44 PM, Smiley, David W. dsmi...@mitre.org wrote: On Jul 6, 2011, at 6:44 PM, Erick Erickson wrote: In the past I've had to ping the dev list with an "include patch XYZ please" message Yeah... doesn't that strike you as a problem though? Actually I think gentle nagging is an incredibly important part of open source (and life in general). The process here is not perfect -- we all have our ways of tracking TODOs, but, inevitably, often, things fall past the event horizon on anyone's TODO list, and a gentle nag / bump is very much appreciated to bring attention back, but unfortunately not done nearly often enough. That said, we of course will also forever need more committers... Mike McCandless http://blog.mikemccandless.com
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 9398 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/9398/ 1 tests failed. REGRESSION: org.apache.lucene.search.TestSpanQueryFilter.testFilterWorks Error Message: docIdSet doesn't contain docId 10 Stack Trace: junit.framework.AssertionFailedError: docIdSet doesn't contain docId 10 at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1435) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1353) at org.apache.lucene.search.TestSpanQueryFilter.assertContainsDocId(TestSpanQueryFilter.java:84) at org.apache.lucene.search.TestSpanQueryFilter.testFilterWorks(TestSpanQueryFilter.java:56) Build Log (for compile errors): [...truncated 1226 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3289) FST should allow controlling how hard builder tries to share suffixes
[ https://issues.apache.org/jira/browse/LUCENE-3289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13061453#comment-13061453 ] Robert Muir commented on LUCENE-3289: - I think that's probably good for most cases? In the example you gave, it seems that FST might not be the best algorithm? The strings are extremely long (more like short documents) and probably need to be compressed in some different data structure, e.g. a word-based one? FST should allow controlling how hard builder tries to share suffixes - Key: LUCENE-3289 URL: https://issues.apache.org/jira/browse/LUCENE-3289 Project: Lucene - Java Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 3.4, 4.0 Attachments: LUCENE-3289.patch, LUCENE-3289.patch Today we have a boolean option to the FST builder telling it whether it should share suffixes. If you turn this off, building is much faster, uses much less RAM, and the resulting FST is a prefix trie. But, the FST is larger than it needs to be. When it's on, the builder maintains a node hash holding every node seen so far in the FST -- this uses up RAM and slows things down. On a dataset that Elmer (see java-user thread "Autocompletion on large index" on Jul 6 2011) provided (thank you!), which is 1.32 M titles avg 67.3 chars per title, building with suffix sharing on took 22.5 seconds, required 1.25 GB heap, and produced 91.6 MB FST. With suffix sharing off, it was 8.2 seconds, 450 MB heap and 129 MB FST. I think we should allow this boolean to be shade-of-gray instead: usually, how well suffixes can share is a function of how far they are from the end of the string, so, by adding a tunable N to only share when suffix length <= N, we can let the caller make reasonable tradeoffs. 
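The tunable can be modeled like this: only node-share suffixes within N characters of the end of the string, so the node hash stays small while most of the size win (suffixes share best near the end) is kept. A self-contained sketch of that tradeoff; the parameter name shareMaxTailLength mirrors the idea in the issue but is illustrative here, not the Builder's API:

```java
public class SuffixSharing {

    /** Length of the tail that two strings could share, limited to
     *  shareMaxTailLength: beyond that depth we stop registering nodes
     *  for sharing, trading FST size for build speed and heap. */
    static int sharedTailLength(String a, String b, int shareMaxTailLength) {
        int n = 0;
        while (n < a.length() && n < b.length() && n < shareMaxTailLength
                && a.charAt(a.length() - 1 - n) == b.charAt(b.length() - 1 - n)) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // Common suffix of "station"/"nation" is "ation" (5 chars).
        System.out.println(sharedTailLength("station", "nation", 8)); // 5
        // With N = 3, only the last 3 chars are eligible for sharing.
        System.out.println(sharedTailLength("station", "nation", 3)); // 3
    }
}
```

Setting N to 0 degenerates to the prefix-trie case (fast, big), and N = infinity to today's full minimization (slow, small).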
[jira] [Commented] (LUCENE-2308) Separately specify a field's type
[ https://issues.apache.org/jira/browse/LUCENE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13061451#comment-13061451 ] Michael McCandless commented on LUCENE-2308: I'm seeing compilation errors with the last patch, eg: {noformat} [javac] /lucene/fieldtype/lucene/src/test/org/apache/lucene/index/TestSegmentMerger.java:53: setupDoc(org.apache.lucene.document2.Document) in org.apache.lucene.index.DocHelper cannot be applied to (org.apache.lucene.document.Document) [javac] DocHelper.setupDoc(doc1); [javac] ^ {noformat} Otherwise patch looks good! Separately specify a field's type - Key: LUCENE-2308 URL: https://issues.apache.org/jira/browse/LUCENE-2308 Project: Lucene - Java Issue Type: Improvement Components: core/index Reporter: Michael McCandless Assignee: Michael McCandless Labels: gsoc2011, lucene-gsoc-11, mentor Fix For: 4.0 Attachments: LUCENE-2308-2.patch, LUCENE-2308-3.patch, LUCENE-2308-4.patch, LUCENE-2308-4.patch, LUCENE-2308-5.patch, LUCENE-2308-6.patch, LUCENE-2308.patch, LUCENE-2308.patch This came up from discussions on IRC. I'm summarizing here... Today when you make a Field to add to a document you can set things like indexed or not, stored or not, analyzed or not, details like omitTfAP, omitNorms, index term vectors (separately controlling offsets/positions), etc. I think we should factor these out into a new class (FieldType?). Then you could re-use this FieldType instance across multiple fields. The Field instance would still hold the actual value. We could then do per-field analyzers by adding a setAnalyzer on the FieldType, instead of the separate PerFieldAnalyzerWrapper (likewise for per-field codecs (with flex), where we now have PerFieldCodecWrapper). This would NOT be a schema! It's just refactoring what we already specify today. EG it's not serialized into the index. This has been discussed before, and I know Michael Busch opened a more ambitious (I think?) issue. I think this is a good first baby step. 
We could consider a hierarchy of FieldType (NumericFieldType, etc.) but maybe hold off on that for starters...
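The refactoring described above separates a field's value from its index-time options. A minimal self-contained sketch of the shape (these FieldType/Field classes are illustrative, not the eventual Lucene API):

```java
public class FieldTypeSketch {

    /** Reusable bundle of index-time options, factored out of Field. */
    static class FieldType {
        final boolean indexed;
        final boolean stored;
        FieldType(boolean indexed, boolean stored) {
            this.indexed = indexed;
            this.stored = stored;
        }
    }

    /** The Field keeps only its name and value, plus a shared type. */
    static class Field {
        final String name;
        final String value;
        final FieldType type;
        Field(String name, String value, FieldType type) {
            this.name = name;
            this.value = value;
            this.type = type;
        }
    }

    public static void main(String[] args) {
        FieldType storedIndexed = new FieldType(true, true);
        // One FieldType instance reused across many fields:
        Field title = new Field("title", "Lucene in Action", storedIndexed);
        Field body  = new Field("body", "...", storedIndexed);
        System.out.println(title.type == body.type); // true
    }
}
```

Hanging a per-field analyzer or codec off the shared FieldType (rather than a wrapper class) then falls out naturally, which is exactly the point of the issue.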
Re: Putting search-lucene.com back on l.a.o/solr
Hi Otis, I think it's most likely the case I broke this when releasing! Sorry! Not to defer the blame, but I think the confusing aspect of the solr website wrt releasing is that unlike lucene, solr doesn't have a separate versioned and unversioned site. So this causes some difficulties like having to guess release dates, commit release announcements before the RC, as well as merging difficulties across branches... I think we just need to make sure the latest (3.3) updates are merged into trunk/branch_3x and then republish the site. I'll take a look at this. On Thu, Jul 7, 2011 at 1:36 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Hi, I just noticed that over on http://lucene.apache.org/solr/ we are back to Lucid Find being the only search provider. 5 months ago we added search-lucene.com there, but now it's gone. Google Analytics shows that search-lucene.com was removed from there on June 4. This is when Lucene 3.2 was released, so I suspect the site was somehow rebuilt and published without it. Aha, I see, it looks like https://issues.apache.org/jira/browse/LUCENE-2660 was applied to trunk only and not branch_3x, and the site was built from the 3x branch. As I'm about to go on vacation, I don't want to mess up the site by reforresting it (did it locally and it looks good, but it's past 1 AM here) and publishing it, so I'll just commit stuff in Solr's src/site after applying the patch from LUCENE-2660: branch_3x/solr/src/site$ svn st ? LUCENE-2660-solr.patch M src/documentation/skins/lucene/css/screen.css M src/documentation/skins/lucene/xslt/html/site-to-xhtml.xsl It would be great if somebody could publish this. Thanks, Otis
[jira] [Commented] (LUCENE-3289) FST should allow controlling how hard builder tries to share suffixes
[ https://issues.apache.org/jira/browse/LUCENE-3289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13061456#comment-13061456 ] Michael McCandless commented on LUCENE-3289: Yeah I think costly but perfect minimization is the right default. FST should allow controlling how hard builder tries to share suffixes - Key: LUCENE-3289 URL: https://issues.apache.org/jira/browse/LUCENE-3289 Project: Lucene - Java Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 3.4, 4.0 Attachments: LUCENE-3289.patch, LUCENE-3289.patch Today we have a boolean option to the FST builder telling it whether it should share suffixes. If you turn this off, building is much faster, uses much less RAM, and the resulting FST is a prefix trie. But, the FST is larger than it needs to be. When it's on, the builder maintains a node hash holding every node seen so far in the FST -- this uses up RAM and slows things down. On a dataset that Elmer (see java-user thread "Autocompletion on large index" on Jul 6 2011) provided (thank you!), which is 1.32 M titles avg 67.3 chars per title, building with suffix sharing on took 22.5 seconds, required 1.25 GB heap, and produced 91.6 MB FST. With suffix sharing off, it was 8.2 seconds, 450 MB heap and 129 MB FST. I think we should allow this boolean to be shade-of-gray instead: usually, how well suffixes can share is a function of how far they are from the end of the string, so, by adding a tunable N to only share when suffix length <= N, we can let the caller make reasonable tradeoffs.
[jira] [Updated] (SOLR-2640) Error message typo for missing field
[ https://issues.apache.org/jira/browse/SOLR-2640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated SOLR-2640: -- Priority: Trivial (was: Major) Affects Version/s: 4.0 3.4 3.3 Fix Version/s: 4.0 3.3 Issue Type: Test (was: Bug) Error message typo for missing field Key: SOLR-2640 URL: https://issues.apache.org/jira/browse/SOLR-2640 Project: Solr Issue Type: Test Components: search Affects Versions: 3.3, 3.4, 4.0 Reporter: Benson Margulies Assignee: Simon Willnauer Priority: Trivial Fix For: 3.3, 4.0 Attachments: SOLR-2640.patch 2011-07-07 13:03:16,630 [http-bio-9167-exec-6] ERROR org.apache.solr.core.SolrCore - org.apache.solr.common.SolrException: Specify at least on field, function or query to group by. at org.apache.solr.search.Grouping.execute(Grouping.java:264) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Assigned] (SOLR-2640) Error message typo for missing field
[ https://issues.apache.org/jira/browse/SOLR-2640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer reassigned SOLR-2640: - Assignee: Simon Willnauer Error message typo for missing field Key: SOLR-2640 URL: https://issues.apache.org/jira/browse/SOLR-2640 Project: Solr Issue Type: Bug Components: search Affects Versions: 3.3, 3.4, 4.0 Reporter: Benson Margulies Assignee: Simon Willnauer Fix For: 3.3, 4.0 Attachments: SOLR-2640.patch 2011-07-07 13:03:16,630 [http-bio-9167-exec-6] ERROR org.apache.solr.core.SolrCore - org.apache.solr.common.SolrException: Specify at least on field, function or query to group by. at org.apache.solr.search.Grouping.execute(Grouping.java:264) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-2640) Error message typo for missing field
[ https://issues.apache.org/jira/browse/SOLR-2640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved SOLR-2640. --- Resolution: Fixed committed, thanks! Error message typo for missing field Key: SOLR-2640 URL: https://issues.apache.org/jira/browse/SOLR-2640 Project: Solr Issue Type: Task Components: search Affects Versions: 3.3, 3.4, 4.0 Reporter: Benson Margulies Assignee: Simon Willnauer Priority: Trivial Fix For: 3.3, 4.0 Attachments: SOLR-2640.patch 2011-07-07 13:03:16,630 [http-bio-9167-exec-6] ERROR org.apache.solr.core.SolrCore - org.apache.solr.common.SolrException: Specify at least on field, function or query to group by. at org.apache.solr.search.Grouping.execute(Grouping.java:264) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2615) Have LogUpdateProcessor log each command (add, delete, ...) at debug/FINE level
[ https://issues.apache.org/jira/browse/SOLR-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13061467#comment-13061467 ] David Smiley commented on SOLR-2615: Yonik, if I instead use a doDebug boolean flag initialized in the constructor, would that sufficiently satisfy you to commit this? Have LogUpdateProcessor log each command (add, delete, ...) at debug/FINE level --- Key: SOLR-2615 URL: https://issues.apache.org/jira/browse/SOLR-2615 Project: Solr Issue Type: Improvement Components: update Reporter: David Smiley Priority: Minor Fix For: 3.4, 4.0 Attachments: SOLR-2615_LogUpdateProcessor_debug_logging.patch It would be great if the LogUpdateProcessor logged each command (add, delete, ...) at debug (Fine) level. Presently it only logs a summary of 8 commands and it does so at the very end. The attached patch implements this. * I moved the LogUpdateProcessor ahead of RunUpdateProcessor so that the debug level log happens before Solr does anything with it. It should not affect the ordering of the existing summary log which happens at finish(). * I changed UpdateRequestProcessor's static log variable to be an instance variable that uses the current class name. I think this makes much more sense since I want to be able to alter logging levels for a specific processor without doing it for all of them. This change did require me to tweak the factory's detection of the log level which avoids creating the LogUpdateProcessor. * There was an NPE bug in AddUpdateCommand.getPrintableId() in the event there is no schema unique field. I fixed that. You may notice I use SLF4J's nifty log.debug(message blah {} blah, var) syntax, which is both performant and concise as there's no point in guarding the debug message with an isDebugEnabled() since debug() will internally check this any way and there is no string concatenation if debug isn't enabled. -- This message is automatically generated by JIRA. 
For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
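The parameterized-logging point above can be sketched without pulling in SLF4J itself. The class below is a hypothetical stand-in (not SLF4J's actual implementation) that mimics its internal level check, showing why the call site needs neither an isDebugEnabled() guard nor eager string concatenation:

```java
// Minimal stand-in for SLF4J-style parameterized logging (illustrative only).
// The level check lives inside debug(), so a disabled call costs no formatting.
public class ParamLog {
    boolean debugEnabled;
    int formats;   // counts how often a message string was actually built
    String last;   // last formatted message

    void debug(String format, Object arg) {
        if (!debugEnabled) return;   // internal check, like SLF4J's
        formats++;
        last = format.replace("{}", String.valueOf(arg));
    }

    public static void main(String[] args) {
        ParamLog log = new ParamLog();
        log.debug("add {}", "doc1");            // disabled: no string built
        if (log.formats != 0) throw new AssertionError();
        log.debugEnabled = true;
        log.debug("delete {}", 42);             // enabled: formatted lazily
        if (!"delete 42".equals(log.last)) throw new AssertionError();
        System.out.println("ok");
    }
}
```

The caller passes the format and arguments unconditionally; the cost of building the message is only paid when the level is enabled.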
Please commit SOLR-2616 Include jdk14 logging configuration file
Please review/commit: https://issues.apache.org/jira/browse/SOLR-2616 - Include jdk14 logging configuration file ~ David
[jira] [Updated] (LUCENE-2795) Genericize DirectIOLinuxDir - UnixDir
[ https://issues.apache.org/jira/browse/LUCENE-2795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Thacker updated LUCENE-2795: -- Attachment: LUCENE-2795.patch The open_direct and posix_fadvise functions in onlylinux.h remain the same. In onlybsd.h, open_direct and posix_fadvise are the same too, except that the O_NOATIME flag is not present. In onlyosx, open_direct is implemented in a different way. Also, I have added an open_normal function to all of the headers, which will be used in case the IOContext is not a MERGE. Genericize DirectIOLinuxDir - UnixDir -- Key: LUCENE-2795 URL: https://issues.apache.org/jira/browse/LUCENE-2795 Project: Lucene - Java Issue Type: Improvement Components: core/store Reporter: Michael McCandless Assignee: Varun Thacker Labels: gsoc2011, lucene-gsoc-11, mentor Attachments: LUCENE-2795.patch, LUCENE-2795.patch Today DirectIOLinuxDir is tricky/dangerous to use, because you only want to use it for IndexWriter and not IndexReader (searching). It's a trap. But, once we do LUCENE-2793, we can make it fully general purpose, because then a single native Dir impl can be used. I'd also like to make it generic to other Unices, if we can, so that it becomes UnixDirectory. -- This message is automatically generated by JIRA.
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13061480#comment-13061480 ] Mike Sokolov commented on LUCENE-2878: -- bq. what do you mean by sweeping updates? I meant adding positions to filters would be a sweeping update. But it sounds as if the idea of rewriting differently is a better approach (certainly much less change). bq. For highlighting I think we should also go a two stage approach. I think I agree. The only possible trade-off that goes the other way is in the case where you have the positions available already during initial search/scoring, and there is not too much turnover in the TopDocs priority queue during hit collection. Then a Highlighter might save some time by not re-scoring and re-iterating the positions if it accumulated them up front (even for docs that were eventually dropped off the queue). I think it should be possible to test out both approaches given the right API here though? The callback idea sounds appealing, but I still think we should also consider enabling the top-down approach: especially if this is going to run in two passes, why not let the highlighter drive the iteration? Keep in mind that positions consumers (like highlighters) may possibly be interested in more than just the lowest-level positions (they may want to see phrases, eg, and near-clauses - trying to avoid the s-word). Another consideration is ordering. I think (?) that positions are retrieved from the index in document order. This could be a natural order for many cases, but score order will also be useful. I'm not sure whose responsibility the sorting should be. Highlighters will want to be able to optimize their work (esp for very large documents) by terminating after considering only the first N matches, where the ordering could either be score or document-order. I'm glad you will create a branch - this patch is getting a bit unwieldy. 
I think the PosHighlighter code should probably (?) end up as test code only - I guess we'll see. It seems like we could get further faster using the existing Highlighter, with a positions-based TokenStream; I'll post a patch once the branch is in place. Allow Scorer to expose positions and payloads aka. nuke spans -- Key: LUCENE-2878 URL: https://issues.apache.org/jira/browse/LUCENE-2878 Project: Lucene - Java Issue Type: Improvement Components: core/search Affects Versions: Bulk Postings branch Reporter: Simon Willnauer Assignee: Simon Willnauer Labels: gsoc2011, lucene-gsoc-11, mentor Attachments: LUCENE-2878-OR.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch, PosHighlighter.patch Currently we have two somewhat separate types of queries: the ones which can make use of positions (mainly spans) and payloads (spans). Yet Span*Query doesn't really do scoring comparable to what other queries do, and at the end of the day they are duplicating a lot of code all over Lucene. Span*Queries are also limited to other Span*Query instances, such that you can not use a TermQuery or a BooleanQuery with SpanNear or anything like that. Besides the Span*Query limitation, other queries lack a quite interesting feature: they can not score based on term proximity, since scorers don't expose any positional information. All those problems bugged me for a while now, so I started working on that using the bulkpostings API. I would have done that first cut on trunk, but TermScorer there works on a BlockReader that does not expose positions, while the one in this branch does. I started adding a new Positions class which users can pull from a scorer; to prevent unnecessary positions enums I added ScorerContext#needsPositions and eventually Scorer#needsPayloads to create the corresponding enum on demand.
Yet, currently only TermQuery / TermScorer implements this API, and others simply return null instead. To show that the API really works and our BulkPostings work fine too with positions, I cut over TermSpanQuery to use a TermScorer under the hood and nuked TermSpans entirely. A nice side effect of this was that the Position BulkReading implementation got some exercise, and it now all works with positions :), while Payloads for bulk reading are kind of experimental in the patch and only work with the Standard codec. So all spans now work on top of TermScorer ( I truly hate spans since today ) including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother to implement the other codecs yet, since I want to get feedback on the API and on this first cut before I go on with it. I
[jira] [Updated] (LUCENE-3233) HuperDuperSynonymsFilter™
[ https://issues.apache.org/jira/browse/LUCENE-3233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-3233: --- Attachment: LUCENE-3233.patch Another rev of the patch: I did a hard bump of the FST version (so existing trunk indices must be rebuilt), added a NOTE in suggest's FST impl that the file format is experimental, removed maxVerticalContext, and fixed a false test failure. HuperDuperSynonymsFilter™ - Key: LUCENE-3233 URL: https://issues.apache.org/jira/browse/LUCENE-3233 Project: Lucene - Java Issue Type: Improvement Reporter: Robert Muir Attachments: LUCENE-3223.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, LUCENE-3233.patch, synonyms.zip The current synonyms filter uses a lot of RAM and CPU, especially at build time. I think yesterday I heard about huge synonyms files three times. So, I think we should use an FST-based structure, sharing the inputs and outputs. And we should be more efficient with the TokenStream API, e.g. using save/restoreState instead of cloneAttributes(). -- This message is automatically generated by JIRA.
Re: Putting search-lucene.com back on l.a.o/solr
: Not to defer the blame, but I think the confusing aspect of the solr : website wrt releasing is that unlike lucene, solr doesnt have a : separate versioned and unversioned site. So this causes some yeah .. at one point we started looking into making this change to be consistent with ./java, but then there was the push to merge development, and reducing the sub-projects in general, which led to a discussion about moving all of the unversioned parts of the site into the existing directory for the TLP pages (so there would only be one set of forrest docs for the entire website), and then the new Apache CMS came out and grant started looking into that instead of wasting effort merging the forrest docs. It's kind of a cluster fuck now. -Hoss
[jira] [Commented] (LUCENE-3289) FST should allow controlling how hard builder tries to share suffixes
[ https://issues.apache.org/jira/browse/LUCENE-3289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13061512#comment-13061512 ] Dawid Weiss commented on LUCENE-3289: - Exactly. This is a very specific use case (long suggestions). FST should allow controlling how hard builder tries to share suffixes - Key: LUCENE-3289 URL: https://issues.apache.org/jira/browse/LUCENE-3289 Project: Lucene - Java Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 3.4, 4.0 Attachments: LUCENE-3289.patch, LUCENE-3289.patch Today we have a boolean option to the FST builder telling it whether it should share suffixes. If you turn this off, building is much faster, uses much less RAM, and the resulting FST is a prefix trie. But, the FST is larger than it needs to be. When it's on, the builder maintains a node hash holding every node seen so far in the FST -- this uses up RAM and slows things down. On a dataset that Elmer (see java-user thread Autocompletion on large index on Jul 6 2011) provided (thank you!), which is 1.32 M titles avg 67.3 chars per title, building with suffix sharing on took 22.5 seconds, required 1.25 GB heap, and produced a 91.6 MB FST. With suffix sharing off, it was 8.2 seconds, 450 MB heap and a 129 MB FST. I think we should allow this boolean to be a shade of gray instead: usually, how well suffixes can share is a function of how far they are from the end of the string, so, by adding a tunable N to only share when suffix length < N, we can let the caller make reasonable tradeoffs. -- This message is automatically generated by JIRA.
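The tunable-N idea can be sketched outside Lucene. The toy builder below is hypothetical (not the real FST Builder): it dedups suffixes only up to a length limit, making the tradeoff visible - full sharing yields fewer total nodes but a larger dedup hash, limited sharing yields a smaller hash (less build RAM) at the cost of duplicated nodes (a larger result).

```java
import java.util.*;

// Illustrative sketch of limited suffix sharing (hypothetical, not Lucene's
// FST Builder): suffixes are shared (deduped) only when shorter than a
// tunable limit. Returns {total nodes created, dedup-hash size}.
public class SuffixShareSketch {
    static int[] build(List<String> words, int maxSharedLen) {
        Set<String> shared = new HashSet<>(); // stands in for the node hash
        int nodes = 0;
        for (String w : words) {
            for (int i = 0; i < w.length(); i++) {
                String suffix = w.substring(i);
                if (suffix.length() <= maxSharedLen) {
                    if (shared.add(suffix)) nodes++; // first sighting: new node
                    // already in the hash: reuse, no new node
                } else {
                    nodes++; // too long to share: always a fresh node
                }
            }
        }
        return new int[] { nodes, shared.size() };
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("faceting", "hinting", "painting");
        int[] full = build(words, Integer.MAX_VALUE); // share everything
        int[] limited = build(words, 3);              // share only short suffixes
        // Full sharing: no more nodes than limited; limited: no bigger hash.
        if (full[0] > limited[0]) throw new AssertionError("node count");
        if (full[1] < limited[1]) throw new AssertionError("hash size");
        System.out.println("ok");
    }
}
```

Real FST construction shares frozen automaton states rather than raw strings, but the RAM-vs-size tradeoff the issue describes has the same shape.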
[jira] [Commented] (SOLR-2500) TestSolrProperties sometimes fails with no such core: core0
[ https://issues.apache.org/jira/browse/SOLR-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13061540#comment-13061540 ] Steven Rowe commented on SOLR-2500: --- On Windows 7 using Oracle JDK 1.6.0_21, {{TestSolrProperties#testProperties()}} is consistently failing for me, both individually and with all Solr tests, in both Ant and IntelliJ:
{quote}
java.lang.AssertionError: Failed to delete C:\svn\lucene\dev\trunk\solr\build\tests\solr\shared\solr-persist.xml
at org.junit.Assert.fail(Assert.java:91)
at org.junit.Assert.assertTrue(Assert.java:43)
at org.apache.solr.client.solrj.embedded.TestSolrProperties.tearDown(TestSolrProperties.java:107)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:37)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1430)
at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1348)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
at org.junit.runner.JUnitCore.run(JUnitCore.java:157)
{quote}
The failure is in TestSolrProperties.tearDown():
{code:java}
107: assertTrue("Failed to delete " + persistedFile, persistedFile.delete());
{code}
TestSolrProperties sometimes fails with no such core: core0 - Key: SOLR-2500 URL: https://issues.apache.org/jira/browse/SOLR-2500 Project: Solr Issue Type: Bug Affects Versions: 4.0 Reporter: Robert Muir Assignee: Doron Cohen Fix For: 3.2, 4.0 Attachments: SOLR-2500.patch, SOLR-2500.patch, SOLR-2500.patch, solr-after-1st-run.xml, solr-clean.xml
[junit] Testsuite: org.apache.solr.client.solrj.embedded.TestSolrProperties
[junit] Testcase: testProperties(org.apache.solr.client.solrj.embedded.TestSolrProperties): Caused an ERROR
[junit] No such core: core0
[junit] org.apache.solr.common.SolrException: No such core: core0
[junit] at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:118)
[junit] at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
[junit] at org.apache.solr.client.solrj.embedded.TestSolrProperties.testProperties(TestSolrProperties.java:128)
[junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1260)
[junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1189)
-- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (SOLR-2500) TestSolrProperties sometimes fails with no such core: core0
[ https://issues.apache.org/jira/browse/SOLR-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13061549#comment-13061549 ] Steven Rowe commented on SOLR-2500: --- I can get this test to succeed consistently by calling System.gc() prior to the attempt to delete the file. Any objections to adding this?
[jira] [Reopened] (SOLR-2500) TestSolrProperties sometimes fails with no such core: core0
[ https://issues.apache.org/jira/browse/SOLR-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rowe reopened SOLR-2500: --- Assignee: Steven Rowe (was: Doron Cohen) Reopening to address the Windows test failure.
[jira] [Commented] (SOLR-2500) TestSolrProperties sometimes fails with no such core: core0
[ https://issues.apache.org/jira/browse/SOLR-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13061557#comment-13061557 ] Robert Muir commented on SOLR-2500: --- seems like calling gc() is just masking the problem? we should hunt down which finalizer is closing the file and explicitly close instead / fix the leak?
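The direction Robert suggests, closing the stream explicitly instead of relying on a finalizer, might look like the minimal sketch below. The persist() method is a hypothetical stand-in, not Solr's actual code; the point is that with try-with-resources the handle is released deterministically, so a subsequent delete no longer depends on GC timing (on Windows an open handle blocks File.delete(), which is why System.gc() "fixes" the leak).

```java
import java.io.*;

// Sketch of the fix direction for SOLR-2500 (hypothetical persist() helper):
// close the writer deterministically so the file can be deleted immediately,
// with no finalizer or System.gc() involved.
public class ExplicitClose {
    static void persist(File f, String xml) throws IOException {
        try (Writer w = new FileWriter(f)) { // closed on scope exit, always
            w.write(xml);
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("solr-persist", ".xml");
        persist(f, "<solr/>");
        // No GC needed: the handle was already released by close().
        if (!f.delete()) throw new AssertionError("delete failed");
        System.out.println("ok");
    }
}
```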
[jira] [Resolved] (SOLR-2538) Math overflow in LongRangeEndpointCalculator and DoubleRangeEndpointCalculator
[ https://issues.apache.org/jira/browse/SOLR-2538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man resolved SOLR-2538. Resolution: Fixed Fix Version/s: 4.0 3.4 Assignee: Hoss Man Erbi: thanks for catching this. looks like a cut/paste error, but i went ahead and added a test to reduce the risk of future regression. Committed revision 1144014. - trunk Committed revision 1144016. - 3x Math overflow in LongRangeEndpointCalculator and DoubleRangeEndpointCalculator --- Key: SOLR-2538 URL: https://issues.apache.org/jira/browse/SOLR-2538 Project: Solr Issue Type: Bug Affects Versions: 3.1 Environment: AMD64+Ubuntu 10.10 Reporter: Erbi Hanka Assignee: Hoss Man Fix For: 3.4, 4.0 In the classes LongRangeEndpointCalculator and DoubleRangeEndpointCalculator, in the method parseAndAddGap, there is a loss of precision:
private static class DoubleRangeEndpointCalculator
    extends RangeEndpointCalculator<Double> {

  public DoubleRangeEndpointCalculator(final SchemaField f) { super(f); }
  @Override
  protected Double parseVal(String rawval) {
    return Double.valueOf(rawval);
  }
  @Override
  public Double parseAndAddGap(Double value, String gap) {
    return new Double(value.floatValue() + Double.valueOf(gap).floatValue()); // <-- narrows to float
  }
  [..]
private static class LongRangeEndpointCalculator
    extends RangeEndpointCalculator<Long> {

  public LongRangeEndpointCalculator(final SchemaField f) { super(f); }
  @Override
  protected Long parseVal(String rawval) {
    return Long.valueOf(rawval);
  }
  @Override
  public Long parseAndAddGap(Long value, String gap) {
    return new Long(value.intValue() + Long.valueOf(gap).intValue()); // <-- narrows to int
  }
}
As a result, the following code detects a data overflow, because the long number is being treated as an integer:
while (low.compareTo(end) < 0) {
  T high = calc.addGap(low, gap);
  if (end.compareTo(high) < 0) {
    if (params.getFieldBool(f, FacetParams.FACET_RANGE_HARD_END, false)) {
      high = end;
    } else {
      end = high;
    }
  }
  if (high.compareTo(low) < 0) {
    throw new SolrException
      (SolrException.ErrorCode.BAD_REQUEST,
       "range facet infinite loop (is gap negative? did the math overflow?)");
  }
Replacing 'intValue()' with 'longValue()' and 'floatValue()' with 'doubleValue()' should work. We detected this bug when faceting with very large start and end values. We have tested edge values (the transition from 32 to 64 bits): any value below the threshold works fine, and any value greater than 2^32 doesn't. We have not tested the 'double' version, but it seems it can suffer from the same problem.
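The overflow described above is easy to reproduce in isolation. The sketch below uses plain values and stand-in methods (not Solr's actual calculator classes) to show how the intValue() narrowing makes the computed "high" endpoint wrap around below "low", which is exactly what the infinite-loop check catches:

```java
// Demonstrates the SOLR-2538 overflow: narrowing a Long to int before adding
// makes large range-facet endpoints wrap around, so "high" can land below
// "low" (triggering the "infinite loop" SolrException in the facet code).
public class RangeOverflowDemo {
    // buggy shape, as in LongRangeEndpointCalculator.parseAndAddGap
    static Long addGapBuggy(Long value, String gap) {
        return new Long(value.intValue() + Long.valueOf(gap).intValue());
    }

    // fixed shape: stay in 64 bits the whole way
    static Long addGapFixed(Long value, String gap) {
        return Long.valueOf(value.longValue() + Long.parseLong(gap));
    }

    public static void main(String[] args) {
        Long low = 5_000_000_000L;  // > 2^32, so intValue() truncates it
        Long buggy = addGapBuggy(low, "1000");
        Long fixed = addGapFixed(low, "1000");
        // The buggy endpoint wrapped around below the start of the range.
        if (buggy.compareTo(low) >= 0) throw new AssertionError("expected wraparound");
        // The fixed endpoint is start + gap, as intended.
        if (fixed != 5_000_001_000L) throw new AssertionError();
        System.out.println("ok");
    }
}
```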
[jira] [Commented] (SOLR-2500) TestSolrProperties sometimes fails with no such core: core0
[ https://issues.apache.org/jira/browse/SOLR-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13061580#comment-13061580 ] Steven Rowe commented on SOLR-2500: --- bq. seems like calling gc() is just masking the problem? we should hunt down which finalizer is closing the file and explicitly close instead / fix the leak? I agree. I tracked down the actual file activity to SolrXMLSerializer.persistFile() - this class was created as part of SOLR-2331, which Mark M. committed 2 days ago; the timing makes it the likely culprit.
[jira] [Created] (SOLR-2641) Auto Facet Selection component
Auto Facet Selection component -- Key: SOLR-2641 URL: https://issues.apache.org/jira/browse/SOLR-2641 Project: Solr Issue Type: Improvement Components: SearchComponents - other Reporter: Erik Hatcher Assignee: Erik Hatcher Priority: Minor It sure would be nice if you could have Solr automatically select field(s) for faceting based dynamically off the profile of the results. For example, you're indexing disparate types of products, all with varying attributes (color, size - like for apparel, memory_size - for electronics, subject - for books, etc), and a user searches for ipod where most products match products with color and memory_size attributes... let's automatically facet on those fields. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2641) Auto Facet Selection component
[ https://issues.apache.org/jira/browse/SOLR-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Hatcher updated SOLR-2641: --- Attachment: SOLR_2641.patch Basic implementation of a search component, to be placed after the query component and before the facet component, that keys off a "fields used" field (see SOLR-1280 for how this can be created automatically too), selects the top N fields, and sets those as facet.field's automatically.
[jira] [Commented] (SOLR-2641) Auto Facet Selection component
[ https://issues.apache.org/jira/browse/SOLR-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13061584#comment-13061584 ] Erik Hatcher commented on SOLR-2641: There's loads of room for improvement here, and likely there are better ways to go about even the simple stuff I've done in this initial patch. Some ideas for improvement: pluggable implementations to determine the best facets to auto-select given the current request and results; the ability to tailor the parameters for each field selected for faceting (should facets be sorted by index or count order? mincount? limit? how to determine these for each field?).
[jira] [Commented] (SOLR-2641) Auto Facet Selection component
[ https://issues.apache.org/jira/browse/SOLR-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13061588#comment-13061588 ] Erik Hatcher commented on SOLR-2641: What's needed for this type of thing to do the right thing with distributed search? The delegating server will need to cull together the counts (in this current implementation) to determine the best field(s) to facet on before distributing those requests, to ensure each shard is faceting on the same field(s). Not sure, yet, how to go about that.
[jira] [Resolved] (SOLR-2230) solrj: submitting more than one stream/file via CommonsHttpSolrServer fails
[ https://issues.apache.org/jira/browse/SOLR-2230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man resolved SOLR-2230. Resolution: Fixed Fix Version/s: 4.0 3.4 Assignee: Hoss Man Although CommonsHttpSolrServer's code for dealing with multiple streams had changed significantly since Stephan posted his patch, a simple test verified that multiple addFile calls did not work. I've committed some improved tests, along with a massaged version of Stephan's test. Committed revision 1144038. - trunk Committed revision 1144041. - 3x solrj: submitting more than one stream/file via CommonsHttpSolrServer fails --- Key: SOLR-2230 URL: https://issues.apache.org/jira/browse/SOLR-2230 Project: Solr Issue Type: Bug Components: clients - java Affects Versions: 1.4.1 Reporter: Stephan GĂĽnther Assignee: Hoss Man Fix For: 3.4, 4.0 Attachments: 0001-solrj-fix-submitting-more-that-one-stream-via-multip.patch If you are using an HTTP client (CommonsHttpSolrServer) to connect to Solr, you are unable to push more than one File/Stream over the wire.
For example, if you call ContentStreamUpdateRequest.addContentStream()/.addFile() twice to index both files via Tika, you get the following exception at your Solr server:

15:48:59 [ERROR] http-8983-1 [org.apache.solr.core.SolrCore] - org.apache.solr.common.SolrException: missing content stream
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:49)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:233)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:857)
at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
at java.lang.Thread.run(Thread.java:619)

Seems that the POST body sent by CommonsHttpSolrServer is not correct. If you push only one file, everything works as expected.
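For context, a well-formed multipart/form-data request body frames each file as its own part, delimited by the boundary and carrying its own Content-Disposition header; the bug above amounts to the client producing a body where the second stream is not framed this way, so the server sees a "missing content stream". A stdlib-only sketch of the expected framing — boundary and part names are illustrative, and this is not SolrJ's actual code:

```java
// Illustrative sketch: frame multiple file parts in one multipart/form-data
// body. Each part gets its own boundary line and Content-Disposition header,
// and the body ends with a closing boundary ("--<boundary>--").
public class MultipartSketch {
    public static String buildBody(String boundary, String[] names, String[] contents) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < names.length; i++) {
            sb.append("--").append(boundary).append("\r\n");
            sb.append("Content-Disposition: form-data; name=\"").append(names[i])
              .append("\"; filename=\"").append(names[i]).append("\"\r\n\r\n");
            sb.append(contents[i]).append("\r\n");
        }
        sb.append("--").append(boundary).append("--\r\n"); // closing boundary
        return sb.toString();
    }
}
```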
[Lucene.Net] [jira] [Commented] (LUCENENET-172) This patch fixes the unexceptional exceptions encountered in FastCharStream and SupportClass
[ https://issues.apache.org/jira/browse/LUCENENET-172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13061595#comment-13061595 ]

Digy commented on LUCENENET-172:

Already fixed for 2.9.4g

This patch fixes the unexceptional exceptions encountered in FastCharStream and SupportClass
---
Key: LUCENENET-172
URL: https://issues.apache.org/jira/browse/LUCENENET-172
Project: Lucene.Net
Issue Type: Improvement
Components: Lucene.Net Core
Affects Versions: Lucene.Net 2.3.1, Lucene.Net 2.3.2
Reporter: Ben Martz
Assignee: Scott Lombard
Priority: Minor
Fix For: Lucene.Net 2.9.4, Lucene.Net 2.9.4g
Attachments: lucene_2.3.1_exceptions_fix.patch, lucene_2.9.4g_exceptions_fix

The Java version of Lucene handles end-of-file in FastCharStream by throwing an exception. This behavior has been ported to .NET, but it carries an unacceptable cost in the .NET environment. This patch is based on the prior work in LUCENENET-8 and LUCENENET-11, which I gratefully acknowledge for the solution. While I understand that this patch is outside the current project specification, in that it deviates from the pure nature of the port, I believe it is very important to make the patch available to any developer looking to leverage Lucene.Net in their project. Thanks for your consideration.
[Lucene.Net] [jira] [Updated] (LUCENENET-172) This patch fixes the unexceptional exceptions encountered in FastCharStream and SupportClass
[ https://issues.apache.org/jira/browse/LUCENENET-172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Digy updated LUCENENET-172:
---
Fix Version/s: Lucene.Net 2.9.4g

This patch fixes the unexceptional exceptions encountered in FastCharStream and SupportClass
---
Key: LUCENENET-172
URL: https://issues.apache.org/jira/browse/LUCENENET-172
Project: Lucene.Net
Issue Type: Improvement
Components: Lucene.Net Core
Affects Versions: Lucene.Net 2.3.1, Lucene.Net 2.3.2
Reporter: Ben Martz
Assignee: Scott Lombard
Priority: Minor
Fix For: Lucene.Net 2.9.4, Lucene.Net 2.9.4g
Attachments: lucene_2.3.1_exceptions_fix.patch, lucene_2.9.4g_exceptions_fix

The Java version of Lucene handles end-of-file in FastCharStream by throwing an exception. This behavior has been ported to .NET, but it carries an unacceptable cost in the .NET environment. This patch is based on the prior work in LUCENENET-8 and LUCENENET-11, which I gratefully acknowledge for the solution. While I understand that this patch is outside the current project specification, in that it deviates from the pure nature of the port, I believe it is very important to make the patch available to any developer looking to leverage Lucene.Net in their project. Thanks for your consideration.
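To make the cost argument concrete, here is a hedged Java sketch of the two end-of-file conventions involved: the FastCharStream style, which signals EOF by throwing an exception, versus the sentinel style the patch moves toward. The method names are invented for illustration and are not Lucene or Lucene.Net APIs; the point is that in .NET, constructing an exception on every stream exhaustion is what carries the cost the reporter describes:

```java
import java.io.*;

// Two ways to signal end-of-file. Both count the characters in a reader;
// only the EOF convention differs. Method names are illustrative.
public class EofStyles {
    // FastCharStream style: throw when the stream is exhausted. Cheap enough
    // on the JVM, but exception construction is expensive in .NET.
    public static int countByException(Reader r) throws IOException {
        int n = 0;
        try {
            while (true) {
                if (r.read() == -1) throw new IOException("read past eof");
                n++;
            }
        } catch (IOException eof) {
            return n; // EOF reached via the exception path
        }
    }

    // Patched style: return a sentinel (-1) and let the caller test for it.
    public static int countBySentinel(Reader r) throws IOException {
        int n = 0;
        while (r.read() != -1) n++;
        return n;
    }
}
```

Both methods produce the same answer; only the control flow at EOF differs.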
[JENKINS] Lucene-Solr-tests-only-3.x - Build # 9415 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/9415/

No tests ran.

Build Log (for compile errors):
[...truncated 2658 lines...]
[javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/solr/src/solrj/org/apache/solr/client/solrj/response/DocumentAnalysisResponse.java:53: warning: [unchecked] unchecked cast
[javac] found   : java.lang.Object
[javac] required: org.apache.solr.common.util.NamedList<java.lang.Object>
[javac]     NamedList<Object> query = (NamedList<Object>) field.get("query");
[javac]     ^
[javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/solr/src/solrj/org/apache/solr/client/solrj/response/DocumentAnalysisResponse.java:59: warning: [unchecked] unchecked cast
[javac] found   : java.lang.Object
[javac] required: org.apache.solr.common.util.NamedList<java.lang.Object>
[javac]     NamedList<Object> index = (NamedList<Object>) field.get("index");
[javac]     ^
[javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/solr/src/solrj/org/apache/solr/client/solrj/response/DocumentAnalysisResponse.java:62: warning: [unchecked] unchecked cast
[javac] found   : java.lang.Object
[javac] required: org.apache.solr.common.util.NamedList<java.lang.Object>
[javac]     NamedList<Object> valueNL = (NamedList<Object>) valueEntry.getValue();
[javac]     ^
[javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/solr/src/solrj/org/apache/solr/client/solrj/response/FieldAnalysisResponse.java:47: warning: [unchecked] unchecked cast
[javac] found   : java.lang.Object
[javac] required: org.apache.solr.common.util.NamedList<java.lang.Object>
[javac]     NamedList<Object> fieldTypesNL = (NamedList<Object>) analysisNL.get("field_types");
[javac]     ^
[javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/solr/src/solrj/org/apache/solr/client/solrj/response/FieldAnalysisResponse.java:51: warning: [unchecked] unchecked cast
[javac] found   : java.lang.Object
[javac] required: org.apache.solr.common.util.NamedList<java.lang.Object>
[javac]     NamedList<Object> queryNL = (NamedList<Object>) fieldTypeNL.get("query");
[javac]     ^
[javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/solr/src/solrj/org/apache/solr/client/solrj/response/FieldAnalysisResponse.java:54: warning: [unchecked] unchecked cast
[javac] found   : java.lang.Object
[javac] required: org.apache.solr.common.util.NamedList<java.lang.Object>
[javac]     NamedList<Object> indexNL = (NamedList<Object>) fieldTypeNL.get("index");
[javac]     ^
[javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/solr/src/solrj/org/apache/solr/client/solrj/response/FieldAnalysisResponse.java:61: warning: [unchecked] unchecked cast
[javac] found   : java.lang.Object
[javac] required: org.apache.solr.common.util.NamedList<java.lang.Object>
[javac]     NamedList<Object> fieldNamesNL = (NamedList<Object>) analysisNL.get("field_names");
[javac]     ^
[javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/solr/src/solrj/org/apache/solr/client/solrj/response/FieldAnalysisResponse.java:65: warning: [unchecked] unchecked cast
[javac] found   : java.lang.Object
[javac] required: org.apache.solr.common.util.NamedList<java.lang.Object>
[javac]     NamedList<Object> queryNL = (NamedList<Object>) fieldNameNL.get("query");
[javac]     ^
[javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/solr/src/solrj/org/apache/solr/client/solrj/response/FieldAnalysisResponse.java:68: warning: [unchecked] unchecked cast
[javac] found   : java.lang.Object
[javac] required: org.apache.solr.common.util.NamedList<java.lang.Object>
[javac]     NamedList<Object> indexNL = (NamedList<Object>) fieldNameNL.get("index");
[javac]     ^
[javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/solr/src/solrj/org/apache/solr/client/solrj/request/JavaBinUpdateRequestCodec.java:54: warning: [unchecked] unchecked call to add(java.lang.String,T) as a member of the raw type org.apache.solr.common.util.NamedList
[javac]     params.add(commitWithin,
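The [unchecked] warnings in the log stem from Java's type erasure: a cast from Object to a parameterized type like NamedList&lt;Object&gt; cannot be verified at runtime, so javac can only warn about it. A minimal stdlib illustration of the same pattern, using Map in place of Solr's NamedList:

```java
import java.util.*;

// Illustration of the warning in the build log above: the cast from Object
// to a parameterized type is unverifiable at runtime because the type
// argument is erased, so javac emits "[unchecked] unchecked cast".
public class UncheckedCastDemo {
    @SuppressWarnings("unchecked") // erasure makes this cast unverifiable
    public static Map<String, Object> asNamedList(Object o) {
        return (Map<String, Object>) o;
    }
}
```

Suppressing (or parameterizing the source types so no raw cast is needed) is how such warnings are typically silenced; they are warnings, not errors, and did not cause this build failure by themselves.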
Re: [JENKINS] Lucene-Solr-tests-only-3.x - Build # 9415 - Failure
Hmmm, not sure what I fucked up here...

[javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/solr/src/solrj/org/apache/solr/client/solrj/impl/CommonsHttpSolrServer.java:328: method does not override a method from its superclass
[javac]     @Override
[javac]     ^
[javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/solr/src/solrj/org/apache/solr/client/solrj/impl/CommonsHttpSolrServer.java:337: method does not override a method from its superclass
[javac]     @Override
[javac]     ^

...I'm not seeing this locally ... investigating.

-Hoss
[jira] [Updated] (SOLR-2331) Refactor CoreContainer's SolrXML serialization code and improve testing
[ https://issues.apache.org/jira/browse/SOLR-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steven Rowe updated SOLR-2331:
---
Attachment: SOLR-2331-fix-windows-file-deletion-failure.patch

I reopened SOLR-2500 because TestSolrProperties is failing consistently on Windows 7/Oracle JDK 1.6.0_21 for me, but it appears that this is the issue that introduced the problem. I've tracked the issue down to the anonymous {{FileInputStream}} created in order to print out the contents of the persisted core configuration to STDOUT -- the following line was uncommented when Mark committed the patch on this issue:

{code:java}
206: System.out.println(IOUtils.toString(new FileInputStream(new File(solrXml.getParent(), "solr-persist.xml"))));
{code}

This patch de-anonymizes the {{FileInputStream}} and closes it after the file contents are printed out. I plan to commit this later tonight.

Refactor CoreContainer's SolrXML serialization code and improve testing
---
Key: SOLR-2331
URL: https://issues.apache.org/jira/browse/SOLR-2331
Project: Solr
Issue Type: Improvement
Components: multicore
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Minor
Fix For: 4.0
Attachments: SOLR-2331-fix-windows-file-deletion-failure.patch, SOLR-2331.patch

CoreContainer has enough code in it - I'd like to factor out the solr.xml serialization code into SolrXMLSerializer or something - which should make testing it much easier and lightweight.
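The fix described — naming the stream so it can be closed once the contents are read — can be sketched as follows. This is an illustrative reimplementation, not the committed patch; the point is that closing the stream releases the file handle, without which Windows refuses to delete the file (the failure observed in SOLR-2500):

```java
import java.io.*;

// Hedged sketch of the fix: de-anonymize the FileInputStream so it can be
// closed after the file contents are read. Names here are illustrative.
public class PrintAndClose {
    public static String readAll(File f) throws IOException {
        FileInputStream in = new FileInputStream(f); // named, not anonymous
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toString("UTF-8");
        } finally {
            in.close(); // releases the handle; Windows can now delete the file
        }
    }
}
```

The anonymous form, `IOUtils.toString(new FileInputStream(...))`, leaves the stream for the garbage collector to close at some unspecified later time, which is exactly why a subsequent delete of the file fails on Windows.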
[jira] [Resolved] (SOLR-2500) TestSolrProperties sometimes fails with no such core: core0
[ https://issues.apache.org/jira/browse/SOLR-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steven Rowe resolved SOLR-2500.
---
Resolution: Fixed
Assignee: Doron Cohen (was: Steven Rowe)

I attached a patch with a fix to SOLR-2331, which introduced the problem.

TestSolrProperties sometimes fails with no such core: core0
-
Key: SOLR-2500
URL: https://issues.apache.org/jira/browse/SOLR-2500
Project: Solr
Issue Type: Bug
Affects Versions: 4.0
Reporter: Robert Muir
Assignee: Doron Cohen
Fix For: 4.0, 3.2
Attachments: SOLR-2500.patch, SOLR-2500.patch, SOLR-2500.patch, solr-after-1st-run.xml, solr-clean.xml

[junit] Testsuite: org.apache.solr.client.solrj.embedded.TestSolrProperties
[junit] Testcase: testProperties(org.apache.solr.client.solrj.embedded.TestSolrProperties): Caused an ERROR
[junit] No such core: core0
[junit] org.apache.solr.common.SolrException: No such core: core0
[junit] at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:118)
[junit] at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
[junit] at org.apache.solr.client.solrj.embedded.TestSolrProperties.testProperties(TestSolrProperties.java:128)
[junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1260)
[junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1189)
[jira] [Updated] (SOLR-2331) Refactor CoreContainer's SolrXML serialization code and improve testing
[ https://issues.apache.org/jira/browse/SOLR-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steven Rowe updated SOLR-2331:
---
Attachment: SOLR-2331-fix-windows-file-deletion-failure.patch

This version of the patch wraps the persisted core config printing to STDOUT in an {{if (VERBOSE)}} block. Committing shortly.

Refactor CoreContainer's SolrXML serialization code and improve testing
---
Key: SOLR-2331
URL: https://issues.apache.org/jira/browse/SOLR-2331
Project: Solr
Issue Type: Improvement
Components: multicore
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Minor
Fix For: 4.0
Attachments: SOLR-2331-fix-windows-file-deletion-failure.patch, SOLR-2331-fix-windows-file-deletion-failure.patch, SOLR-2331.patch

CoreContainer has enough code in it - I'd like to factor out the solr.xml serialization code into SolrXMLSerializer or something - which should make testing it much easier and lightweight.
[jira] [Commented] (SOLR-2331) Refactor CoreContainer's SolrXML serialization code and improve testing
[ https://issues.apache.org/jira/browse/SOLR-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13061649#comment-13061649 ]

Steven Rowe commented on SOLR-2331:
---
bq. This patch de-anonymizes the FileInputStream and closes it after the file contents are printed out [and] wraps the persisted core config printing to STDOUT in an {{if (VERBOSE)}} block.

Committed in r1144088.

Refactor CoreContainer's SolrXML serialization code and improve testing
---
Key: SOLR-2331
URL: https://issues.apache.org/jira/browse/SOLR-2331
Project: Solr
Issue Type: Improvement
Components: multicore
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Minor
Fix For: 4.0
Attachments: SOLR-2331-fix-windows-file-deletion-failure.patch, SOLR-2331-fix-windows-file-deletion-failure.patch, SOLR-2331.patch

CoreContainer has enough code in it - I'd like to factor out the solr.xml serialization code into SolrXMLSerializer or something - which should make testing it much easier and lightweight.