[jira] Commented: (SOLR-1204) Enhance SpellingQueryConverter to handle UTF-8 instead of ASCII only

2009-09-04 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12751337#action_12751337
 ] 

Shalin Shekhar Mangar commented on SOLR-1204:
-

bq. Since this ticket is marked resolved, I filed SOLR-1407 to point out some 
closely related problems.

Yes, that is how I remembered this one :)

 Enhance SpellingQueryConverter to handle UTF-8 instead of ASCII only
 

 Key: SOLR-1204
 URL: https://issues.apache.org/jira/browse/SOLR-1204
 Project: Solr
  Issue Type: Improvement
  Components: spellchecker
Affects Versions: 1.3
Reporter: Michael Ludwig
Assignee: Shalin Shekhar Mangar
Priority: Trivial
 Fix For: 1.4

 Attachments: SpellingQueryConverter.java.diff, 
 SpellingQueryConverter.java.diff


 Solr - User - SpellCheckComponent: queryAnalyzerFieldType
 http://www.nabble.com/SpellCheckComponent%3A-queryAnalyzerFieldType-td23870668.html
 In the above thread, it was suggested to extend the SpellingQueryConverter to 
 cover the full UTF-8 range instead of handling US-ASCII only. This might be 
 as simple as changing the regular expression used to tokenize the input 
 string to accept a sequence of one or more Unicode letters ( \p{L}+ ) instead 
 of a sequence of one or more word characters ( \w+ ).
 See http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Pattern.html for 
 Java regular expression reference.
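The difference is easy to demonstrate with plain java.util.regex (a standalone sketch, not Solr's actual converter; class and method names are illustrative). Note that `\w` also matches digits and underscores, while `\p{L}` matches letters only, which is exactly the side effect later reported in SOLR-1407:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenizeDemo {
    // \w+ matches only ASCII word characters [a-zA-Z0-9_] by default,
    // while \p{L}+ matches any Unicode letter (but no digits/underscores).
    static List<String> tokenize(String regex, String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String query = "Müller straße";
        System.out.println(tokenize("\\w+", query));    // splits at each non-ASCII letter
        System.out.println(tokenize("\\p{L}+", query)); // [Müller, straße]
    }
}
```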

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-236) Field collapsing

2009-09-04 Thread Martijn van Groningen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12751358#action_12751358
 ] 

Martijn van Groningen commented on SOLR-236:


Hi Abdul, nice improvements. It makes absolute sense to keep the field values 
around during collapsing as a StringIndex. From what I understand, the 
StringIndex does not contain duplicate string values, whereas the plain string 
array does; this will lower the memory footprint. I will add these improvements 
to the next patch. Thanks for pointing this out! 
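The saving comes from storing each distinct value once plus one small ordinal per document. A rough stdlib sketch of that shape (illustrative only; Lucene's actual FieldCache.StringIndex additionally handles documents with no value):

```java
import java.util.*;

public class StringIndexSketch {
    // Mirrors the shape of FieldCache.StringIndex: 'lookup' holds each
    // distinct field value once; 'order' holds one ordinal per document
    // pointing into 'lookup'.
    final String[] lookup;
    final int[] order;

    StringIndexSketch(String[] perDocValues) {
        // Collect the distinct values in sorted order and assign ordinals.
        String[] distinct = new TreeSet<>(Arrays.asList(perDocValues)).toArray(new String[0]);
        Map<String, Integer> ord = new HashMap<>();
        for (int i = 0; i < distinct.length; i++) ord.put(distinct[i], i);
        lookup = distinct;
        order = new int[perDocValues.length];
        for (int doc = 0; doc < perDocValues.length; doc++) order[doc] = ord.get(perDocValues[doc]);
    }

    String valueFor(int doc) { return lookup[order[doc]]; }

    public static void main(String[] args) {
        // Seven documents but only three distinct field values:
        String[] perDoc = {"a", "b", "a", "c", "b", "a", "b"};
        StringIndexSketch idx = new StringIndexSketch(perDoc);
        System.out.println(idx.lookup.length + " distinct values for " + perDoc.length + " docs");
        System.out.println(idx.valueFor(3)); // the value of doc 3, i.e. "c"
    }
}
```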

 Field collapsing
 

 Key: SOLR-236
 URL: https://issues.apache.org/jira/browse/SOLR-236
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
Reporter: Emmanuel Keller
 Fix For: 1.5

 Attachments: collapsing-patch-to-1.3.0-dieter.patch, 
 collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, 
 collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, 
 field-collapse-4-with-solrj.patch, field-collapse-5.patch, 
 field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, 
 field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, 
 field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, 
 field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
 SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, 
 SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, 
 SOLR-236_collapsing.patch


 This patch includes a new feature called field collapsing.
 It is used to collapse a group of results with a similar value for a given 
 field into a single entry in the result set. Site collapsing is a special case 
 of this, where all results for a given web site are collapsed into one or two 
 entries in the result set, typically with an associated "more documents from 
 this site" link. See also "Duplicate detection":
 http://www.fastsearch.com/glossary.aspx?m=48&amid=299
 The implementation adds 3 new query parameters (SolrParams):
 collapse.field to choose the field used to group results
 collapse.type: normal (default value) or adjacent
 collapse.max to select how many continuous results are allowed before 
 collapsing
 TODO (in progress):
 - More documentation (on source code)
 - Test cases
 Two patches:
 - field_collapsing.patch for current development version
 - field_collapsing_1.1.0.patch for Solr-1.1.0
 P.S.: Feedback and misspelling correction are welcome ;-)




[jira] Commented: (SOLR-1410) remove deprecated custom encoding support in russian/greek analysis

2009-09-04 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12751374#action_12751374
 ] 

Shalin Shekhar Mangar commented on SOLR-1410:
-

bq. I don't think we've ever really had a situation like this ...logging a 
warning seems like the right course of action for now ... 

We actually have done this in DataImportHandler in relation to the syntax for 
evaluators. Logging a warning is the right way to go.


 remove deprecated custom encoding support in russian/greek analysis
 ---

 Key: SOLR-1410
 URL: https://issues.apache.org/jira/browse/SOLR-1410
 Project: Solr
  Issue Type: Task
  Components: Analysis
Reporter: Robert Muir
Priority: Minor
 Attachments: SOLR-1410.patch


 In this case, the analyzers have strange encoding support which has been 
 deprecated in Lucene.
 For example, someone using CP1251 with the Russian analyzer is simply storing Ж 
 as 0xC6, so it is being represented as Æ.
 LUCENE-1793: Deprecate the custom encoding support in the Greek and Russian
 Analyzers. If you need to index text in these encodings, please use Java's
 character set conversion facilities (InputStreamReader, etc) during I/O, 
 so that Lucene can analyze this text as Unicode instead.
 I noticed that in Solr, the factories for these token streams allow these 
 configuration options, which are deprecated in 2.9 and to be removed in 3.0.
 Let me know the policy (how exactly do you deprecate a config option in Solr: 
 log a warning, etc.?) and I'd be happy to create a patch.
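The recommended approach decodes bytes into Unicode at I/O time. A minimal stdlib sketch of the 0xC6 example above (requires a JDK that ships the windows-1251 charset, as standard JDKs do):

```java
import java.io.*;
import java.nio.charset.Charset;

public class Cp1251Read {
    // Decode CP1251 bytes into Unicode during I/O, as the Lucene
    // deprecation note suggests, instead of asking the analyzer to
    // reinterpret bytes. 0xC6 is 'Ж' in CP1251 but 'Æ' in ISO-8859-1.
    public static void main(String[] args) throws IOException {
        byte[] raw = {(byte) 0xC6};
        try (Reader wrong = new InputStreamReader(new ByteArrayInputStream(raw), Charset.forName("ISO-8859-1"));
             Reader right = new InputStreamReader(new ByteArrayInputStream(raw), Charset.forName("windows-1251"))) {
            System.out.println((char) wrong.read()); // Æ — bytes misread as Latin-1
            System.out.println((char) right.read()); // Ж — correctly decoded Cyrillic
        }
    }
}
```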




Hudson build is back to normal: Solr-trunk #914

2009-09-04 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Solr-trunk/914/changes




[jira] Updated: (SOLR-1407) SpellingQueryConverter now disallows underscores and digits in field names (but allows all UTF-8 letters)

2009-09-04 Thread Michael Ludwig (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Ludwig updated SOLR-1407:
-

Attachment: SpellingQueryConverter.java

As announced in SOLR-1204, I'm posting the version I had prepared back in June. 
Maybe it is useful, maybe not. The question of why there is this extra sequence 
of digits in the regular expression is still entirely unclear to me. Caveat 
emptor!

 SpellingQueryConverter now disallows underscores and digits in field names 
 (but allows all UTF-8 letters)
 -

 Key: SOLR-1407
 URL: https://issues.apache.org/jira/browse/SOLR-1407
 Project: Solr
  Issue Type: Improvement
  Components: spellchecker
Affects Versions: 1.3
Reporter: David Bowen
Assignee: Shalin Shekhar Mangar
Priority: Trivial
 Fix For: 1.4

 Attachments: SpellingQueryConverter.java, SpellingQueryConverter.java


 SpellingQueryConverter was extended to cover the full UTF-8 range instead of 
 handling US-ASCII only, but in the process it was broken for field names that 
 contain underscores or digits.




[jira] Commented: (SOLR-1408) Allow classes from ${solr.home}/lib to be loaded by the same classloader as solr war to prevent ClassCastException

2009-09-04 Thread Luke Forehand (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12751487#action_12751487
 ] 

Luke Forehand commented on SOLR-1408:
-

This is also happening when I try to extend EventListener; I get the mysterious 
ClassCastException from within Solr.  I am running Solr from a Jetty server, 
specifying solr.home using JNDI, and I am starting the Jetty server from within 
a unit test for integration testing purposes.

 Allow classes from ${solr.home}/lib to be loaded by the same classloader as 
 solr war to prevent ClassCastException
 --

 Key: SOLR-1408
 URL: https://issues.apache.org/jira/browse/SOLR-1408
 Project: Solr
  Issue Type: Improvement
  Components: contrib - DataImportHandler
Affects Versions: 1.3
Reporter: Luke Forehand

 When extending org.apache.solr.handler.dataimport.DataSource, I would like to 
 package my extended class in ${solr.home}/lib so that I can keep the vanilla 
 copy of my solr.war intact.  The problem is that I encounter a 
 ClassCastException when Solr tries to create a new instance of my extended 
 class, which I suspect has to do with the DataSource and my extended class 
 being loaded by different classloaders.




[jira] Updated: (SOLR-1411) SolrJ SolrCell Request

2009-09-04 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated SOLR-1411:
--

Attachment: SOLR-1411.patch

Adds SolrCellRequest to the SolrJ common.  Will commit in a day or two.

 SolrJ SolrCell Request
 --

 Key: SOLR-1411
 URL: https://issues.apache.org/jira/browse/SOLR-1411
 Project: Solr
  Issue Type: Improvement
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Trivial
 Fix For: 1.4

 Attachments: SOLR-1411.patch


 Create a SolrRequest for SolrJ that can add Solr Cell documents (PDF, Word, 
 etc.) to Solr for indexing.
 Patch shortly




[jira] Assigned: (SOLR-1400) Document with empty or white-space only string causes exception with TrimFilter

2009-09-04 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll reassigned SOLR-1400:
-

Assignee: Grant Ingersoll

 Document with empty or white-space only string causes exception with 
 TrimFilter
 ---

 Key: SOLR-1400
 URL: https://issues.apache.org/jira/browse/SOLR-1400
 Project: Solr
  Issue Type: Bug
  Components: update
Affects Versions: 1.4
Reporter: Peter Wolanin
Assignee: Grant Ingersoll
 Fix For: 1.4

 Attachments: trim-example.xml


 Observed with Solr trunk.  Posting any empty or whitespace-only string to a 
 field using {code}<filter class="solr.TrimFilterFactory"/>{code}
 causes a Java exception:
 {code}
 Sep 1, 2009 4:58:09 PM org.apache.solr.common.SolrException log
 SEVERE: java.lang.ArrayIndexOutOfBoundsException: -1
   at org.apache.solr.analysis.TrimFilter.incrementToken(TrimFilter.java:63)
   at org.apache.solr.analysis.PatternReplaceFilter.incrementToken(PatternReplaceFilter.java:74)
   at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:138)
   at org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:244)
   at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:772)
   at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:755)
   at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2611)
   at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2583)
   at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:241)
   at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
   at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:140)
   at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
   at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
   at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1299)
   at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
   at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
 {code}
 Trim of an empty or WS-only string should not fail.
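A guarded trim along these lines avoids the out-of-bounds access. This is a plain-Java sketch of the fix's shape (illustrative names; not the actual SOLR-1400 patch, which works on Lucene token attributes):

```java
public class SafeTrim {
    // A trim that tolerates empty and whitespace-only input; the stack
    // trace above suggests TrimFilter indexed past the buffer when every
    // character in the token was whitespace.
    static String trim(char[] buf, int len) {
        int start = 0;
        while (start < len && Character.isWhitespace(buf[start])) start++;
        int end = len;
        while (end > start && Character.isWhitespace(buf[end - 1])) end--;
        // end == start for empty or all-whitespace input, yielding ""
        return new String(buf, start, end - start);
    }

    public static void main(String[] args) {
        System.out.println("[" + trim("  hi  ".toCharArray(), 6) + "]"); // [hi]
        System.out.println("[" + trim("   ".toCharArray(), 3) + "]");    // []
        System.out.println("[" + trim(new char[0], 0) + "]");            // []
    }
}
```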




Re: Lucene RC2

2009-09-04 Thread Grant Ingersoll


On Aug 29, 2009, at 3:38 PM, Yonik Seeley wrote:


On Sat, Aug 29, 2009 at 5:44 PM, Bill Au <bill.w...@gmail.com> wrote:

Yonik,
 Are you in the process of trying it out or upgrading Solr, or  
both?

Bill


It's done: http://svn.apache.org/viewvc?view=rev&revision=809010


You should add a note to CHANGES.txt.


[jira] Commented: (SOLR-1400) Document with empty or white-space only string causes exception with TrimFilter

2009-09-04 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12751511#action_12751511
 ] 

Grant Ingersoll commented on SOLR-1400:
---

Hmm, trimFilter has a test for all whitespace

 Document with empty or white-space only string causes exception with 
 TrimFilter
 ---

 Key: SOLR-1400
 URL: https://issues.apache.org/jira/browse/SOLR-1400
 Project: Solr
  Issue Type: Bug
  Components: update
Affects Versions: 1.4
Reporter: Peter Wolanin
Assignee: Grant Ingersoll
 Fix For: 1.4

 Attachments: trim-example.xml


 Observed with Solr trunk.  Posting any empty or whitespace-only string to a 
 field using {code}<filter class="solr.TrimFilterFactory"/>{code}
 causes a Java exception:
 {code}
 Sep 1, 2009 4:58:09 PM org.apache.solr.common.SolrException log
 SEVERE: java.lang.ArrayIndexOutOfBoundsException: -1
   at org.apache.solr.analysis.TrimFilter.incrementToken(TrimFilter.java:63)
   at org.apache.solr.analysis.PatternReplaceFilter.incrementToken(PatternReplaceFilter.java:74)
   at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:138)
   at org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:244)
   at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:772)
   at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:755)
   at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2611)
   at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2583)
   at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:241)
   at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
   at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:140)
   at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
   at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
   at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1299)
   at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
   at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
 {code}
 Trim of an empty or WS-only string should not fail.




Re: Solr development with IntelliJIDEA - looking for advice

2009-09-04 Thread Lukáš Vlček
Grant,
Are you able to run single unit test from IDEA? How do you setup resource
folders for tests in this case?
Or do you run it manually from command line via ant?

Regards,
Lukas

On Thu, Sep 3, 2009 at 4:05 PM, Grant Ingersoll <gsing...@apache.org> wrote:

 I usually skip through the Wizard stuff as fast as possible and then just
 add the modules by hand, as IntelliJ thinks it is smart at this stuff when
 it really isn't.  For the core Solr, I create a Project Library dependency
 that has 3 JAR Directories as dependencies:
 ./lib
 example/lib
 example/lib/jsp-2.1

 YMMV.

 This is one place where Maven is _so much better_ than Ant.  Point IntelliJ
 at the pom.xml, and you have it all setup, including all the submodules,
 etc.


 On Sep 3, 2009, at 6:42 AM, Lukáš Vlček wrote:

  Hello,
 I noticed that several developers (Yonik, Grant, ... ?) are using
 IntelliJIDEA for Solr development. Is anybody willing to share his/her
 experience about how to setup and open Solr project in IntelliJIDEA? I am
 quite new to IntelliJIDEA and I would greatly appreciate any *how-to* or
 *for dummies* step-by-step tutorial. I tried to create a new project in
 IDEA
 from existing sources (fresh solr-trunk) and simply followed the wizard
 but
 this does not seem to be the best option (getting some circular
 dependencies
 and missing classpath issues).

 Note: I am using IntelliJIDEA 8.1.3

 Regards,
 Lukas


 --
 Grant Ingersoll
 http://www.lucidimagination.com/

 Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
 Solr/Lucene:
 http://www.lucidimagination.com/search




Re: capturing field length into a stored document field

2009-09-04 Thread mike.schultz

Sorry wrong list


mike.schultz wrote:
 
 For various statistics I collect from an index it's important for me to
 know the length (measured in tokens) of a document field.  I can get that
 information to some degree from the norms for the field but a) the
 resolution isn't that great, and b) more importantly, if boosts are used
 it's almost impossible to get lengths from this.
 
 Here are two ideas I was thinking about that maybe someone can comment on.
 
 1) Use copyField to copy the field in question, fieldA, to an additional field,
 fieldALength, which has an extra filter that just counts the tokens and
 only outputs a token representing the length of the field.  This has the
 disadvantage of retokenizing basically the whole document (because the
 field in question is basically the body).  Plus I would think littering
 the term space with these tokens might be bad for performance, I'm not
 sure.
 
 2) Add a filter to the field in question which again counts the tokens. 
 This filter allows the regular tokens to be indexed as usual but somehow
 manages to get the token-count into a stored field of the document.  This
 has the advantage of not having to retokenize the field and instead of
 littering the token space, the count becomes docdata for each doc.  Can
 this be done?  Maybe using a ThreadLocal to temporarily store the count?
 
 Thanks.
 

-- 
View this message in context: 
http://www.nabble.com/capturing-field-length-into-a-stored-document-field-tp25297597p25297661.html
Sent from the Solr - Dev mailing list archive at Nabble.com.
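Idea (2) above boils down to a pass-through filter that counts what flows by. As a plain-Java sketch (a Lucene TokenFilter would do the same tally inside incrementToken(); the class and method names here are illustrative, not a real Lucene/Solr API):

```java
import java.util.Iterator;
import java.util.List;

public class CountingIterator<T> implements Iterator<T> {
    // Pass-through filter: tokens flow downstream unchanged while we
    // tally how many went by; the tally is the field length to store.
    private final Iterator<T> delegate;
    private int count;

    public CountingIterator(Iterator<T> delegate) { this.delegate = delegate; }

    public boolean hasNext() { return delegate.hasNext(); }
    public T next() { count++; return delegate.next(); }
    public int count() { return count; }

    public static void main(String[] args) {
        CountingIterator<String> tokens =
            new CountingIterator<>(List.of("quick", "brown", "fox").iterator());
        while (tokens.hasNext()) tokens.next(); // consume (index) tokens as usual
        System.out.println(tokens.count()); // 3
    }
}
```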



[jira] Updated: (SOLR-1400) Document with empty or white-space only string causes exception with TrimFilter

2009-09-04 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated SOLR-1400:
--

Attachment: SOLR-1400.patch

Try this out.

 Document with empty or white-space only string causes exception with 
 TrimFilter
 ---

 Key: SOLR-1400
 URL: https://issues.apache.org/jira/browse/SOLR-1400
 Project: Solr
  Issue Type: Bug
  Components: update
Affects Versions: 1.4
Reporter: Peter Wolanin
Assignee: Grant Ingersoll
 Fix For: 1.4

 Attachments: SOLR-1400.patch, trim-example.xml


 Observed with Solr trunk.  Posting any empty or whitespace-only string to a 
 field using {code}<filter class="solr.TrimFilterFactory"/>{code}
 causes a Java exception:
 {code}
 Sep 1, 2009 4:58:09 PM org.apache.solr.common.SolrException log
 SEVERE: java.lang.ArrayIndexOutOfBoundsException: -1
   at org.apache.solr.analysis.TrimFilter.incrementToken(TrimFilter.java:63)
   at org.apache.solr.analysis.PatternReplaceFilter.incrementToken(PatternReplaceFilter.java:74)
   at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:138)
   at org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:244)
   at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:772)
   at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:755)
   at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2611)
   at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2583)
   at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:241)
   at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
   at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:140)
   at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
   at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
   at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1299)
   at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
   at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
 {code}
 Trim of an empty or WS-only string should not fail.




[jira] Commented: (SOLR-1406) Add ability to retrieve DataConfig from dataimport Context

2009-09-04 Thread Luke Forehand (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12751587#action_12751587
 ] 

Luke Forehand commented on SOLR-1406:
-

I could extend FileListEntityProcessor if it were written in a more extensible 
way, for example, by exposing its baseUrl and fileName private members with 
accessor methods, and by refactoring some of the private methods that do 
fileName filtering so that they are reusable and protected.

 Add ability to retrieve DataConfig from dataimport Context
 --

 Key: SOLR-1406
 URL: https://issues.apache.org/jira/browse/SOLR-1406
 Project: Solr
  Issue Type: Improvement
  Components: contrib - DataImportHandler
Affects Versions: 1.4
Reporter: Luke Forehand
Assignee: Noble Paul
 Attachments: SOLR-1406.patch


 The ability to retrieve the DataConfig is very useful for inspecting 
 configuration attributes within an EventListener!




Re: Lucene RC2

2009-09-04 Thread Mark Miller

I keep sending emails from the wrong account: attempt 2:

I think it's kind of weird how we add an entry every update - IMO it  
should be one entry- upgraded to Lucene 2.9. That's going to be the  
only change.


- Mark

http://www.lucidimagination.com (mobile)

On Sep 4, 2009, at 12:03 PM, Grant Ingersoll <gsing...@apache.org>  
wrote:




On Aug 29, 2009, at 3:38 PM, Yonik Seeley wrote:


On Sat, Aug 29, 2009 at 5:44 PM, Bill Au <bill.w...@gmail.com> wrote:

Yonik,
Are you in the process of trying it out or upgrading Solr, or  
both?

Bill


It's done: http://svn.apache.org/viewvc?view=rev&revision=809010


You should add a note to CHANGES.txt.


[jira] Created: (SOLR-1412) Add solr-lucene-memory and solr-lucene-misc jars to maven repository

2009-09-04 Thread Igor Motov (JIRA)
Add solr-lucene-memory and solr-lucene-misc jars to maven repository


 Key: SOLR-1412
 URL: https://issues.apache.org/jira/browse/SOLR-1412
 Project: Solr
  Issue Type: Wish
Affects Versions: 1.4
Reporter: Igor Motov
Priority: Minor


Since solr-lucene-memory and solr-lucene-misc jars were added to the 
distribution (see [SOLR-804|https://issues.apache.org/jira/browse/SOLR-804]), it 
would make sense to add them to the Maven repository as well.




[jira] Updated: (SOLR-1412) Add solr-lucene-memory and solr-lucene-misc jars to maven repository

2009-09-04 Thread Igor Motov (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Motov updated SOLR-1412:
-

Attachment: SOLR-1412.patch

A patch that adds solr-lucene-misc and solr-lucene-memory to the Maven 
repository. 

 Add solr-lucene-memory and solr-lucene-misc jars to maven repository
 

 Key: SOLR-1412
 URL: https://issues.apache.org/jira/browse/SOLR-1412
 Project: Solr
  Issue Type: Wish
Affects Versions: 1.4
Reporter: Igor Motov
Priority: Minor
 Attachments: SOLR-1412.patch


 Since solr-lucene-memory and solr-lucene-misc jars were added to the 
 distribution (see [SOLR-804|https://issues.apache.org/jira/browse/SOLR-804]), 
 it would make sense to add them to the Maven repository as well.




[jira] Commented: (SOLR-236) Field collapsing

2009-09-04 Thread Abdul Chaudhry (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12751615#action_12751615
 ] 

Abdul Chaudhry commented on SOLR-236:
-

If this helps you fix your unit tests: I fixed them by changing the 
CollapseFilter constructor that's used for testing to take a StringIndex, like 
so:

-  CollapseFilter(int collapseMaxDocs, int collapseTreshold) {
+  CollapseFilter(int collapseMaxDocs, int collapseTreshold, 
+                 FieldCache.StringIndex index) {
+    this.collapseIndex = index;

and then I changed the unit test cases to move values into a StringIndex in 
CollapseFilterTest, like so:

   public void testNormalCollapse_collapseThresholdOne() {
-    collapseFilter = new CollapseFilter(Integer.MAX_VALUE, 1);
+    String[] values = new String[]{"a", "b", "c"};
+    int[] order = new int[]{0, 1, 0, 2, 1, 0, 1};
+    FieldCache.StringIndex index = new FieldCache.StringIndex(order, values);
+    int[] docIds = new int[]{1, 2, 0, 3, 4, 5, 6};
+
+    collapseFilter = new CollapseFilter(Integer.MAX_VALUE, 1, index);

-    String[] values = new String[]{"a", "b", "a", "c", "b", "a", "b"};


 Field collapsing
 

 Key: SOLR-236
 URL: https://issues.apache.org/jira/browse/SOLR-236
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
Reporter: Emmanuel Keller
 Fix For: 1.5

 Attachments: collapsing-patch-to-1.3.0-dieter.patch, 
 collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, 
 collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, 
 field-collapse-4-with-solrj.patch, field-collapse-5.patch, 
 field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, 
 field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, 
 field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, 
 field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
 SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, 
 SOLR-236-FieldCollapsing.patch, solr-236.patch, SOLR-236_collapsing.patch, 
 SOLR-236_collapsing.patch


 This patch includes a new feature called field collapsing.
 It is used to collapse a group of results with a similar value for a given 
 field into a single entry in the result set. Site collapsing is a special case 
 of this, where all results for a given web site are collapsed into one or two 
 entries in the result set, typically with an associated "more documents from 
 this site" link. See also "Duplicate detection":
 http://www.fastsearch.com/glossary.aspx?m=48&amid=299
 The implementation adds 3 new query parameters (SolrParams):
 collapse.field to choose the field used to group results
 collapse.type: normal (default value) or adjacent
 collapse.max to select how many continuous results are allowed before 
 collapsing
 TODO (in progress):
 - More documentation (on source code)
 - Test cases
 Two patches:
 - field_collapsing.patch for current development version
 - field_collapsing_1.1.0.patch for Solr-1.1.0
 P.S.: Feedback and misspelling correction are welcome ;-)




Re: Solr development with IntelliJIDEA - looking for advice

2009-09-04 Thread Shalin Shekhar Mangar
On Fri, Sep 4, 2009 at 9:43 PM, Lukáš Vlček <lukas.vl...@gmail.com> wrote:

 Grant,
 Are you able to run single unit test from IDEA? How do you setup resource
 folders for tests in this case?
 Or do you run it manually from command line via ant?


To run a test from IDEA, set the start path (I don't remember the exact
name) to src/test/test-files. To run a single test from Ant, use
-Dtestcase=class-name

-- 
Regards,
Shalin Shekhar Mangar.


[jira] Commented: (SOLR-1406) Add ability to retrieve DataConfig from dataimport Context

2009-09-04 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12751638#action_12751638
 ] 

Shalin Shekhar Mangar commented on SOLR-1406:
-

bq. I could extend FileListEntityProcessor if it was written in a more 
extensible way, for example, exposing it's baseUrl and fileName private members 
with accessor methods, and refactoring some of the private methods that do 
fileName filtering so that they are reusable and protected.

Ah, I see. Well, that is easier than exposing DataConfig. DataConfig was never 
really meant to be exposed; we need to have another look at it before 
making it a public API. How about you create an issue (or rename this 
one) to make FileListEntityProcessor more extensible rather than exposing 
DataConfig? We can get that in for 1.4.

 Add ability to retrieve DataConfig from dataimport Context
 --

 Key: SOLR-1406
 URL: https://issues.apache.org/jira/browse/SOLR-1406
 Project: Solr
  Issue Type: Improvement
  Components: contrib - DataImportHandler
Affects Versions: 1.4
Reporter: Luke Forehand
Assignee: Noble Paul
 Attachments: SOLR-1406.patch


 The ability to retrieve the DataConfig is very useful for inspecting 
 configuration attributes within an EventListener!




Re: Lucene RC2

2009-09-04 Thread Grant Ingersoll
It's very useful to know the rev # in a place that doesn't require: 1)  
starting up Solr, 2) unpacking the Lucene jar, but yeah, we could just  
have one entry at the top or something that just lists what the  
current version and rev # are.


On Sep 4, 2009, at 2:41 PM, Mark Miller wrote:


I keep sending emails from the wrong account: attempt 2:

I think it's kind of weird how we add an entry every update - IMO it  
should be one entry- upgraded to Lucene 2.9. That's going to be the  
only change.


- Mark

http://www.lucidimagination.com (mobile)

On Sep 4, 2009, at 12:03 PM, Grant Ingersoll <gsing...@apache.org>  
wrote:




On Aug 29, 2009, at 3:38 PM, Yonik Seeley wrote:


On Sat, Aug 29, 2009 at 5:44 PM, Bill Au <bill.w...@gmail.com> wrote:

Yonik,
   Are you in the process of trying it out or upgrading Solr, or  
both?

Bill


It's done: http://svn.apache.org/viewvc?view=rev&revision=809010


You should add a note to CHANGES.txt.





Re: Lucene RC2

2009-09-04 Thread Mark Miller
+1 - I'm not against knowing what the last rev upgraded to was - I also
think that's important. It just seems the Changes log should read as what
changed from 1.3, or else it's a little confusing. You could make another
argument with so many on trunk - but in my mind, the only thing those
going from 1.3 to 1.4 should need to worry about is "upgraded to 2.9" -
not follow the whole dev path as changes invalidate changes. Not a big
deal if I am the only one that thinks that, just a thought. If we didn't
do it in general, it wouldn't matter if we didn't do it with the Lucene
upgrade though.

- Mark

Grant Ingersoll wrote:
 It's very useful to know the rev # in a place that doesn't require: 1)
 starting up Solr, 2) unpacking the Lucene jar, but yeah, we could just
 have one entry at the top or something that just lists what the
 current version and rev # are.

 On Sep 4, 2009, at 2:41 PM, Mark Miller wrote:

 I keep sending emails from the wrong account: attempt 2:

 I think it's kind of weird how we add an entry every update - IMO it
 should be one entry- upgraded to Lucene 2.9. That's going to be the
 only change.

 - Mark

 http://www.lucidimagination.com (mobile)

 On Sep 4, 2009, at 12:03 PM, Grant Ingersoll gsing...@apache.org
 wrote:


 On Aug 29, 2009, at 3:38 PM, Yonik Seeley wrote:

 On Sat, Aug 29, 2009 at 5:44 PM, Bill Aubill.w...@gmail.com wrote:
 Yonik,
Are you in the process of trying it out or upgrading Solr, or
 both?
 Bill

 It's done: http://svn.apache.org/viewvc?view=rev&revision=809010

 You should add a note to CHANGES.txt.




-- 
- Mark

http://www.lucidimagination.com





[jira] Updated: (SOLR-1406) Refactor FileDataSource and FileListEntityProcessor to be more extendable

2009-09-04 Thread Luke Forehand (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Forehand updated SOLR-1406:


Description: 
FileDataSource should make the openStream method protected so that we can extend 
FileDataSource for other file types, such as GZip, by controlling the underlying 
InputStreamReader implementation that is returned.

FileListEntityProcessor needs to aggregate a list of files that were processed 
and expose that list in an accessible way so that further processing on that 
file list can be done in the close method.  For example, deletion or archiving.

Another improvement would be that in the event of an indexing rollback event, 
processing of the close method either does not occur, or the close method is 
allowed access to that event, to prevent processing within the close method if 
necessary.
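The requested hook can be sketched in plain Java. This is an illustrative sketch only, not the actual DataImportHandler API: PlainFileSource, GzipFileSource, and their method names are stand-ins for FileDataSource and the protected openStream being proposed.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Stand-in for FileDataSource: the only extension point a subclass needs
// is the method that opens the raw byte stream.
class PlainFileSource {
    protected InputStream openStream(File file) throws IOException {
        return new FileInputStream(file);
    }

    // Reads the whole file through whatever stream openStream() returns.
    public String getData(File file) throws IOException {
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(openStream(file), StandardCharsets.UTF_8))) {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = r.read()) != -1) sb.append((char) c);
            return sb.toString();
        }
    }
}

// With openStream() protected, gzip support is a three-line subclass.
class GzipFileSource extends PlainFileSource {
    @Override
    protected InputStream openStream(File file) throws IOException {
        return new GZIPInputStream(super.openStream(file));
    }
}

public class GzipSourceDemo {
    public static void main(String[] args) throws IOException {
        File gz = File.createTempFile("doc", ".gz");
        gz.deleteOnExit();
        try (Writer w = new OutputStreamWriter(
                new GZIPOutputStream(new FileOutputStream(gz)), StandardCharsets.UTF_8)) {
            w.write("hello");
        }
        System.out.println(new GzipFileSource().getData(gz)); // prints: hello
    }
}
```

The same pattern would let other compressed or wrapped file types plug in without touching the base class.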

  was:The ability to retrieve the DataConfig is very useful for inspecting 
configuration attributes within an EventListener!

Summary: Refactor FileDataSource and FileListEntityProcessor to be more 
extendable  (was: Add ability to retrieve DataConfig from dataimport Context)

 Refactor FileDataSource and FileListEntityProcessor to be more extendable
 -

 Key: SOLR-1406
 URL: https://issues.apache.org/jira/browse/SOLR-1406
 Project: Solr
  Issue Type: Improvement
  Components: contrib - DataImportHandler
Affects Versions: 1.4
Reporter: Luke Forehand
Assignee: Noble Paul
 Attachments: SOLR-1406.patch


 FileDataSource should make the openStream method protected so that we can extend 
 FileDataSource for other file types, such as GZip, by controlling the 
 underlying InputStreamReader implementation that is returned.
 FileListEntityProcessor needs to aggregate a list of files that were 
 processed and expose that list in an accessible way so that further 
 processing on that file list can be done in the close method.  For example, 
 deletion or archiving.
 Another improvement would be that in the event of an indexing rollback event, 
 processing of the close method either does not occur, or the close method is 
 allowed access to that event, to prevent processing within the close method 
 if necessary.




[jira] Updated: (SOLR-1408) Classes in ${solr.home}/lib are not able to extend classes loaded by solr war - ClassCastException

2009-09-04 Thread Luke Forehand (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Forehand updated SOLR-1408:


Description: 
When extending org.apache.solr.handler.dataimport.DataSource, I would like to 
package my extended class in ${solr.home}/lib so that I can keep the vanilla 
copy of my solr.war intact.  The problem is that I encounter a ClassCastException 
when Solr tries to create a newInstance of my extended class.

Although the parent classloader of ${solr.home}/lib classloader loads 
DataSource, I am still getting a ClassCastException when a class in 
${solr.home}/lib extends DataSource.

The solr instance is being deployed to a jetty plus server that is running 
inside a unit test.

  was:When extending org.apache.solr.handler.dataimport.DataSource, I would 
like to package my extended class in ${solr.home}/lib so that I can keep the 
vanilla copy of my solr.war intact.  The problem is that I encounter a 
ClassCastException when Solr tries to create a newInstance of my extended 
class, which I suspect has to do with the DataSource and my extended class 
being loaded from different classloaders.

 Issue Type: Bug  (was: Improvement)
Summary: Classes in ${solr.home}/lib are not able to extend classes 
loaded by solr war - ClassCastException  (was: Allow classes from 
${solr.home}/lib to be loaded by the same classloader as solr war to prevent 
ClassCastException)

 Classes in ${solr.home}/lib are not able to extend classes loaded by solr war 
 - ClassCastException
 --

 Key: SOLR-1408
 URL: https://issues.apache.org/jira/browse/SOLR-1408
 Project: Solr
  Issue Type: Bug
  Components: contrib - DataImportHandler
Affects Versions: 1.3
Reporter: Luke Forehand

 When extending org.apache.solr.handler.dataimport.DataSource, I would like to 
 package my extended class in ${solr.home}/lib so that I can keep the vanilla 
 copy of my solr.war intact.  The problem is I encounter a ClassCastException 
 when Solr tries to create a newInstance of my extended class.
 Although the parent classloader of ${solr.home}/lib classloader loads 
 DataSource, I am still getting a ClassCastException when a class in 
 ${solr.home}/lib extends DataSource.
 The solr instance is being deployed to a jetty plus server that is running 
 inside a unit test.




[jira] Commented: (SOLR-1408) Classes in ${solr.home}/lib are not able to extend classes loaded by solr war - ClassCastException

2009-09-04 Thread Avlesh Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12751684#action_12751684
 ] 

Avlesh Singh commented on SOLR-1408:


bq. I am starting the jetty server from within a unit test for integration 
testing purposes. 
Does it fail in unit testing?

I doubt that there is a problem. I have similar extensions of DIH and 
UpdateProcessors, which live in the lib directory, and I have never faced any 
such issue on any of the platforms.


 Classes in ${solr.home}/lib are not able to extend classes loaded by solr war 
 - ClassCastException
 --

 Key: SOLR-1408
 URL: https://issues.apache.org/jira/browse/SOLR-1408
 Project: Solr
  Issue Type: Bug
  Components: contrib - DataImportHandler
Affects Versions: 1.3
Reporter: Luke Forehand

 When extending org.apache.solr.handler.dataimport.DataSource, I would like to 
 package my extended class in ${solr.home}/lib so that I can keep the vanilla 
 copy of my solr.war intact.  The problem is I encounter a ClassCastException 
 when Solr tries to create a newInstance of my extended class.
 Although the parent classloader of ${solr.home}/lib classloader loads 
 DataSource, I am still getting a ClassCastException when a class in 
 ${solr.home}/lib extends DataSource.
 The solr instance is being deployed to a jetty plus server that is running 
 inside a unit test.




[jira] Commented: (SOLR-1401) solr should error on document add/update if uniqueKey field has multiple tokens.

2009-09-04 Thread Igor Motov (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12751692#action_12751692
 ] 

Igor Motov commented on SOLR-1401:
--

It might be helpful to expand this to other non-trivial analyzers as well. Even 
if an analyzer produces a single token, removal of duplicates and distributed 
search don't function properly for any ids that were modified by the analyzer. 
To see this, just change the type of the id field to textTight and add a record 
with the id ID twice. The textTight analyzer produces a single token for this 
value, and yet the record appears twice in the result list. At the same time, 
in distributed search (even with a single shard), these records completely 
disappear from the result list.  

This problem, combined with the recommendation to use textTight for SKUs in the 
example schema.xml, causes problems for some novice users. Frequently, the SKU 
is a natural id, and changing the id field's type from string to textTight is 
one of the first schema modifications that some users make; it then takes them 
days to figure out the problem:

http://www.nabble.com/uniqueKey-gives-duplicate-values-td15341288.html
http://www.nabble.com/Adding-new-docs%2C-but-duplicating-instead-of-updating-td25241444.html
http://www.nabble.com/Solr-Shard---Strange-results-td23561201.html
http://www.nabble.com/Shard-Query-Problem-td22110121.html
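The check proposed in this issue amounts to running the uniqueKey value through the field's analyzer and rejecting anything other than exactly one token. A hypothetical guard sketching that rule (tokenize() is a stand-in for invoking the field type's analyzer, not actual Solr code):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical guard for the proposed behavior: error on add/update when
// the uniqueKey field's analysis yields more than one token.
public class UniqueKeyGuard {

    // Stand-in for a TextField analyzer: lowercase and split on whitespace.
    // A real implementation would run the configured analyzer chain.
    static List<String> tokenize(String value) {
        List<String> tokens = new ArrayList<>();
        for (String t : value.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    static void validateUniqueKey(String value) {
        List<String> tokens = tokenize(value);
        if (tokens.size() != 1) {
            throw new IllegalArgumentException(
                "uniqueKey value '" + value + "' analyzed to "
                + tokens.size() + " tokens; exactly one is required");
        }
    }

    public static void main(String[] args) {
        validateUniqueKey("SKU-1234");       // ok: analyzes to a single token
        try {
            validateUniqueKey("two tokens"); // rejected: two tokens
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Note that even a single-token result can still differ from the stored value (as with textTight), which is the subtler failure described above; an exact-match check would catch that case too.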


 solr should error on document add/update if uniqueKey field has multiple 
 tokens.
 

 Key: SOLR-1401
 URL: https://issues.apache.org/jira/browse/SOLR-1401
 Project: Solr
  Issue Type: Improvement
Reporter: Hoss Man

 Over the years, we have seen more than a few solr-user posts noticing odd 
 behavior when using a uniqueKey field configured to use TextField with a 
 non-trivial analyzer ... we shouldn't error on TextField itself 
 (KeywordTokenizer is perfectly legitimate), but we should error if that 
 analyzer produces multiple tokens.  
 Likewise, we should verify that good error messages are produced if the 
 uniqueKey field is configured with multiValued=true.




[jira] Closed: (SOLR-1408) Classes in ${solr.home}/lib are not able to extend classes loaded by solr war - ClassCastException

2009-09-04 Thread Luke Forehand (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Forehand closed SOLR-1408.
---

Resolution: Invalid

This is not a bug.  The problem was that my extending classes were being 
compiled onto the testing classpath while also being packaged into the jar 
within ${solr.home}/lib.  They were being loaded by JUnit before being loaded 
by Solr, and that was causing the ClassCastException.  When I removed the 
extending classes from the test classpath, everything worked.
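Luke's diagnosis is the classic two-classloaders failure: the same class name defined by two different loaders is two distinct runtime types, so casting one to the other fails. A self-contained, JDK-only sketch reproducing it (no Solr or JUnit involved; the names here are illustrative):

```java
import java.io.IOException;
import java.io.InputStream;

public class LoaderDemo {

    public static class Plugin {}

    // A loader that defines Plugin itself instead of delegating to its
    // parent, mimicking a child loader (solr/lib) shadowing a class the
    // parent (test classpath / webapp) already has.
    static class ShadowLoader extends ClassLoader {
        ShadowLoader(ClassLoader parent) { super(parent); }

        @Override
        protected Class<?> loadClass(String name, boolean resolve)
                throws ClassNotFoundException {
            if (name.equals(Plugin.class.getName())) {
                String res = name.replace('.', '/') + ".class";
                try (InputStream in = getParent().getResourceAsStream(res)) {
                    byte[] bytes = in.readAllBytes();
                    // Define the same bytes under THIS loader.
                    return defineClass(name, bytes, 0, bytes.length);
                } catch (IOException e) {
                    throw new ClassNotFoundException(name, e);
                }
            }
            return super.loadClass(name, resolve);
        }
    }

    public static void main(String[] args) throws Exception {
        Class<?> shadowed = new ShadowLoader(LoaderDemo.class.getClassLoader())
                .loadClass(Plugin.class.getName());
        Object instance = shadowed.getDeclaredConstructor().newInstance();

        // Same class name, different defining loader: not assignment-compatible.
        System.out.println(instance instanceof Plugin); // prints: false
        try {
            Plugin p = (Plugin) instance; // throws ClassCastException
        } catch (ClassCastException e) {
            System.out.println("ClassCastException, as in SOLR-1408");
        }
    }
}
```

Removing the duplicate copy from one side, as Luke did, leaves a single defining loader and makes the cast legal again.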

 Classes in ${solr.home}/lib are not able to extend classes loaded by solr war 
 - ClassCastException
 --

 Key: SOLR-1408
 URL: https://issues.apache.org/jira/browse/SOLR-1408
 Project: Solr
  Issue Type: Bug
  Components: contrib - DataImportHandler
Affects Versions: 1.3
Reporter: Luke Forehand

 When extending org.apache.solr.handler.dataimport.DataSource, I would like to 
 package my extended class in ${solr.home}/lib so that I can keep the vanilla 
 copy of my solr.war intact.  The problem is I encounter a ClassCastException 
 when Solr tries to create a newInstance of my extended class.
 Although the parent classloader of ${solr.home}/lib classloader loads 
 DataSource, I am still getting a ClassCastException when a class in 
 ${solr.home}/lib extends DataSource.
 The solr instance is being deployed to a jetty plus server that is running 
 inside a unit test.




Re: [jira] Closed: (SOLR-1408) Classes in ${solr.home}/lib are not able to extend classes loaded by solr war - ClassCastException

2009-09-04 Thread Avlesh Singh
It is generally a good idea to cross-check a suspected bug on the user/dev
mailing list before creating the issue, Luke.

Cheers
Avlesh

On Sat, Sep 5, 2009 at 9:37 AM, Luke Forehand (JIRA) j...@apache.orgwrote:


 [
 https://issues.apache.org/jira/browse/SOLR-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]

 Luke Forehand closed SOLR-1408.
 ---

Resolution: Invalid

 This is not a bug.  The problem was that my extending classes were being
 compiled onto the testing classpath and they were also packaged into the jar
 within ${solr.home}/lib.  They were being loaded by junit before being
 loaded by solr and that was causing the ClassCastException.  When I removed
 the extending classes from the test classpath, everything worked.

  Classes in ${solr.home}/lib are not able to extend classes loaded by solr
 war - ClassCastException
 
 --
 
  Key: SOLR-1408
  URL: https://issues.apache.org/jira/browse/SOLR-1408
  Project: Solr
   Issue Type: Bug
   Components: contrib - DataImportHandler
 Affects Versions: 1.3
 Reporter: Luke Forehand
 
  When extending org.apache.solr.handler.dataimport.DataSource, I would
 like to package my extended class in ${solr.home}/lib so that I can keep the
 vanilla copy of my solr.war intact.  The problem is I encounter a
 ClassCastException when Solr tries to create a newInstance of my extended
 class.
  Although the parent classloader of ${solr.home}/lib classloader loads
 DataSource, I am still getting a ClassCastException when a class in
 ${solr.home}/lib extends DataSource.
  The solr instance is being deployed to a jetty plus server that is
 running inside a unit test.
