[jira] Updated: (SOLR-1336) Add support for lucene's SmartChineseAnalyzer

2009-08-08 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated SOLR-1336:
--

Attachment: SOLR-1336.patch

add a warning about large dictionaries, note that stopwords are being loaded 
from the jar file, and add an international.xml with examples for several 
languages.
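For readers trying the patch, a schema.xml field type wired up to these factories might look like the following sketch. The fieldType and factory class names here are assumptions, not necessarily what the attached patch uses; the words path is the in-jar stopwords resource mentioned in this issue.

```xml
<!-- Hypothetical sketch: fieldType and factory names are assumptions. -->
<fieldType name="text_smartcn" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.SmartChineseSentenceTokenizerFactory"/>
    <filter class="solr.SmartChineseWordTokenFilterFactory"/>
    <!-- stopwords resolved from inside the smartcn jar, not from conf/ -->
    <filter class="solr.StopFilterFactory"
            words="org/apache/lucene/analysis/cn/stopwords.txt"/>
  </analyzer>
</fieldType>
```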

> Add support for lucene's SmartChineseAnalyzer
> -
>
> Key: SOLR-1336
> URL: https://issues.apache.org/jira/browse/SOLR-1336
> Project: Solr
>  Issue Type: New Feature
>  Components: Analysis
>Reporter: Robert Muir
> Attachments: SOLR-1336.patch, SOLR-1336.patch
>
>
> SmartChineseAnalyzer was contributed to lucene, it indexes simplified chinese 
> text as words.
> if the factories for the tokenizer and word token filter are added to solr it 
> can be used, although there should be a sample config or wiki entry showing 
> how to apply the built-in stopwords list.
> this is because it doesn't contain actual stopwords, but must be used to 
> prevent indexing punctuation... 
> note: we did some refactoring/cleanup on this analyzer recently, so it would 
> be much easier to do this after the next lucene update.
> it has also been moved out of -analyzers.jar due to size, and now builds in 
> its own smartcn jar file, so that would need to be added if this feature is 
> desired.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-1275) Add expungeDeletes to DirectUpdateHandler2

2009-08-08 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated SOLR-1275:
---

Fix Version/s: (was: 1.5)
   1.4

> Add expungeDeletes to DirectUpdateHandler2
> --
>
> Key: SOLR-1275
> URL: https://issues.apache.org/jira/browse/SOLR-1275
> Project: Solr
>  Issue Type: Improvement
>  Components: update
>Affects Versions: 1.3
>Reporter: Jason Rutherglen
>Assignee: Noble Paul
>Priority: Trivial
> Fix For: 1.4
>
> Attachments: SOLR-1275.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> expungeDeletes is a useful method somewhat like optimize is offered by 
> IndexWriter that can be implemented in DirectUpdateHandler2.
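A natural way to expose the feature described above is as an attribute on the XML commit message sent to /update; the exact syntax below is a sketch and may differ from what the patch implements.

```xml
<!-- hypothetical: merge away deleted docs without a full optimize -->
<commit expungeDeletes="true" waitSearcher="true"/>
```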




[jira] Commented: (SOLR-1275) Add expungeDeletes to DirectUpdateHandler2

2009-08-08 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740994#action_12740994
 ] 

Yonik Seeley commented on SOLR-1275:


Apologies - my brain went the wrong way when I saw "expungeDeletes" and I 
temporarily thought it meant throwing away the deletedDocs BitVector (i.e. 
undeleting all previously deleted documents).  So there are definitely use 
cases for this.  Seems simple enough, I'll move it back to 1.4


> Add expungeDeletes to DirectUpdateHandler2
> --
>
> Key: SOLR-1275
> URL: https://issues.apache.org/jira/browse/SOLR-1275
> Project: Solr
>  Issue Type: Improvement
>  Components: update
>Affects Versions: 1.3
>Reporter: Jason Rutherglen
>Assignee: Noble Paul
>Priority: Trivial
> Fix For: 1.4
>
> Attachments: SOLR-1275.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> expungeDeletes is a useful method somewhat like optimize is offered by 
> IndexWriter that can be implemented in DirectUpdateHandler2.




[jira] Commented: (SOLR-1275) Add expungeDeletes to DirectUpdateHandler2

2009-08-08 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740992#action_12740992
 ] 

Jason Rutherglen commented on SOLR-1275:


Given the simplicity of the functionality I don't see a reason to put this off. 
 expungeDeletes has been in IndexWriter since 2.4?  

> Add expungeDeletes to DirectUpdateHandler2
> --
>
> Key: SOLR-1275
> URL: https://issues.apache.org/jira/browse/SOLR-1275
> Project: Solr
>  Issue Type: Improvement
>  Components: update
>Affects Versions: 1.3
>Reporter: Jason Rutherglen
>Assignee: Noble Paul
>Priority: Trivial
> Fix For: 1.5
>
> Attachments: SOLR-1275.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> expungeDeletes is a useful method somewhat like optimize is offered by 
> IndexWriter that can be implemented in DirectUpdateHandler2.




Re: solr 1.4 schedule

2009-08-08 Thread Yonik Seeley
24 open issues left - and nothing too difficult remaining.  It's looking
like we should be able to hit the goal of releasing this month - a week
(at most) after lucene!

-Yonik
http://www.lucidimagination.com


[jira] Commented: (SOLR-1336) Add support for lucene's SmartChineseAnalyzer

2009-08-08 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740988#action_12740988
 ] 

Robert Muir commented on SOLR-1336:
---

{quote}
Are the stopwords (words="org/apache/lucene/analysis/cn/stopwords.txt") being 
loaded directly from the jar? If so, a comment to that effect might prevent 
some confusion. 
{quote}

Yes, good idea.

{quote}
Do you happen to know what the memory footprint of this analyzer is if it's 
used? I assume the dictionaries will get loaded on the first use.
{quote}

No, I am not sure of the footprint, but it is probably quite large (a few MB). 
They will be loaded on first use, correct. Also, the smartcn jar file itself is 
large due to the dictionaries in question. So, you may have noticed solr.war is 
much smaller after the last lucene update, since it was factored out of 
analyzers.jar. 

{quote}
Might be cool to add a chinese field to example/exampledocs/solr.xml... or 
maybe there should be an international.xml doc where we could add a few 
different languages?
{quote}

I figured this wasn't the best place to have an example... I like the idea of 
international.xml, with some examples for other languages too.

If there is some concern about the size of this (monster) analyzer, one option 
is to put these factories/examples elsewhere, to keep the size of solr smaller. 


> Add support for lucene's SmartChineseAnalyzer
> -
>
> Key: SOLR-1336
> URL: https://issues.apache.org/jira/browse/SOLR-1336
> Project: Solr
>  Issue Type: New Feature
>  Components: Analysis
>Reporter: Robert Muir
> Attachments: SOLR-1336.patch
>
>
> SmartChineseAnalyzer was contributed to lucene, it indexes simplified chinese 
> text as words.
> if the factories for the tokenizer and word token filter are added to solr it 
> can be used, although there should be a sample config or wiki entry showing 
> how to apply the built-in stopwords list.
> this is because it doesn't contain actual stopwords, but must be used to 
> prevent indexing punctuation... 
> note: we did some refactoring/cleanup on this analyzer recently, so it would 
> be much easier to do this after the next lucene update.
> it has also been moved out of -analyzers.jar due to size, and now builds in 
> its own smartcn jar file, so that would need to be added if this feature is 
> desired.




[jira] Commented: (SOLR-705) Distributed search should optionally return docID->shard map

2009-08-08 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740986#action_12740986
 ] 

Yonik Seeley commented on SOLR-705:
---

If we do go with "meta", I'm also not concerned with the hypothetical 
field-name collision... this is a one time thing, and hard-coding it to "meta" 
makes things simpler and more predictable.

> Distributed search should optionally return docID->shard map
> 
>
> Key: SOLR-705
> URL: https://issues.apache.org/jira/browse/SOLR-705
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 1.3
> Environment: all
>Reporter: Brian Whitman
>Assignee: Ryan McKinley
> Fix For: 1.4
>
> Attachments: SOLR-705.patch, SOLR-705.patch, SOLR-705.patch, 
> SOLR-705.patch, SOLR-705.patch, SOLR-705.patch
>
>
> SOLR-303 queries with &shards parameters set need to return the docID->shard 
> mapping in the response. Without it, updating/deleting documents when the # 
> of shards is variable is hard. We currently set this with a special 
> requestHandler that filters /update and inserts the shard as a field in the 
> index but it would be better if the shard location came back in the query 
> response outside of the index.




[jira] Commented: (SOLR-705) Distributed search should optionally return docID->shard map

2009-08-08 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740984#action_12740984
 ] 

Yonik Seeley commented on SOLR-705:
---

I go back and forth on the "meta" thing...

On one hand, if one is looking at the output, it makes perfect sense to have a 
separate meta section per document.
However, when one looks at it from a client API perspective (how one asks for 
the value of a particular metadata value) having two different ways to access 
values ("real" fields vs "meta" fields) doesn't seem desirable.

From a client coding perspective, consistency is nice:
  sdoc.get("id")
  sdoc.get("_shard")

After all, many of the stored fields of a document are actually just metadata 
too.  So an alternative is simple convention: metadata fields start with an 
underscore, and no more work needs to be done at the client side.

But I'm really not convinced either way ;-)
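Under the underscore convention sketched above, a returned document might look like the following; the field name "_shard" and its value are illustrative, not settled API.

```xml
<doc>
  <str name="id">12345</str>
  <!-- hypothetical metadata field: which shard the doc came from -->
  <str name="_shard">localhost:8983/solr</str>
</doc>
```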

> Distributed search should optionally return docID->shard map
> 
>
> Key: SOLR-705
> URL: https://issues.apache.org/jira/browse/SOLR-705
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 1.3
> Environment: all
>Reporter: Brian Whitman
>Assignee: Ryan McKinley
> Fix For: 1.4
>
> Attachments: SOLR-705.patch, SOLR-705.patch, SOLR-705.patch, 
> SOLR-705.patch, SOLR-705.patch, SOLR-705.patch
>
>
> SOLR-303 queries with &shards parameters set need to return the docID->shard 
> mapping in the response. Without it, updating/deleting documents when the # 
> of shards is variable is hard. We currently set this with a special 
> requestHandler that filters /update and inserts the shard as a field in the 
> index but it would be better if the shard location came back in the query 
> response outside of the index.




[jira] Commented: (SOLR-1336) Add support for lucene's SmartChineseAnalyzer

2009-08-08 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740981#action_12740981
 ] 

Yonik Seeley commented on SOLR-1336:


Thanks Robert!
Are the stopwords (words="org/apache/lucene/analysis/cn/stopwords.txt") being 
loaded directly from the jar?  If so, a comment to that effect might prevent 
some confusion.

Do you happen to know what the memory footprint of this analyzer is if it's 
used?  I assume the dictionaries will get loaded on the first use.

Might be cool to add a chinese field to example/exampledocs/solr.xml... or 
maybe there should be an international.xml doc where we could add a few 
different languages?

> Add support for lucene's SmartChineseAnalyzer
> -
>
> Key: SOLR-1336
> URL: https://issues.apache.org/jira/browse/SOLR-1336
> Project: Solr
>  Issue Type: New Feature
>  Components: Analysis
>Reporter: Robert Muir
> Attachments: SOLR-1336.patch
>
>
> SmartChineseAnalyzer was contributed to lucene, it indexes simplified chinese 
> text as words.
> if the factories for the tokenizer and word token filter are added to solr it 
> can be used, although there should be a sample config or wiki entry showing 
> how to apply the built-in stopwords list.
> this is because it doesn't contain actual stopwords, but must be used to 
> prevent indexing punctuation... 
> note: we did some refactoring/cleanup on this analyzer recently, so it would 
> be much easier to do this after the next lucene update.
> it has also been moved out of -analyzers.jar due to size, and now builds in 
> its own smartcn jar file, so that would need to be added if this feature is 
> desired.




[jira] Updated: (SOLR-1259) scale() function doesn't work in multisegment indexes

2009-08-08 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated SOLR-1259:
---

Fix Version/s: (was: 1.4)
   1.5

Committed, and moving remainder of the work (per-segment fieldcache usage, 
caching min+max) to 1.5

> scale() function doesn't work in multisegment indexes
> -
>
> Key: SOLR-1259
> URL: https://issues.apache.org/jira/browse/SOLR-1259
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 1.4
>Reporter: Hoss Man
> Fix For: 1.5
>
> Attachments: SOLR-1259.patch
>
>
> per yonik's comments in an email...
> bq. Darn... another SOLR-1111 related issue.  scale() will now only scale 
> per-segment.
> ...we either need to fix, or document prior to releasing 1.4




[jira] Updated: (SOLR-1259) scale() function doesn't work in multisegment indexes

2009-08-08 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated SOLR-1259:
---

Attachment: SOLR-1259.patch

Attaching a quick hack of a patch to handle the situation the same as ord()... 
via top() to pop back to the top-level reader.  This isn't so bad since scale() 
was never really production quality anyway, since it doesn't cache the min and 
max, recomputing them each time.
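The per-segment problem is easy to see with a toy min/max scaler (a plain Python sketch, not Solr code): scaling each segment with its own min/max gives different results than scaling with a single index-wide min/max.

```python
# Illustrative sketch of scale(): map values into [lo, hi] using min/max.
def scale(vals, lo, hi):
    mn, mx = min(vals), max(vals)
    return [lo + (hi - lo) * (v - mn) / (mx - mn) for v in vals]

vals = [1, 2, 3, 4]              # field values across the whole index
whole = scale(vals, 0, 1)        # correct: one global min/max
# wrong: each "segment" computes its own min/max
per_segment = scale(vals[:2], 0, 1) + scale(vals[2:], 0, 1)
print(whole == per_segment)      # False: per-segment scaling disagrees
```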

> scale() function doesn't work in multisegment indexes
> -
>
> Key: SOLR-1259
> URL: https://issues.apache.org/jira/browse/SOLR-1259
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 1.4
>Reporter: Hoss Man
> Fix For: 1.4
>
> Attachments: SOLR-1259.patch
>
>
> per yonik's comments in an email...
> bq. Darn... another SOLR-1111 related issue.  scale() will now only scale 
> per-segment.
> ...we either need to fix, or document prior to releasing 1.4




[jira] Updated: (SOLR-1336) Add support for lucene's SmartChineseAnalyzer

2009-08-08 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated SOLR-1336:
--

Attachment: SOLR-1336.patch

patch, needs lucene-smartcn-2.9-dev.jar added to lib to work (this analyzer is 
not in the -analyzers.jar anymore)


> Add support for lucene's SmartChineseAnalyzer
> -
>
> Key: SOLR-1336
> URL: https://issues.apache.org/jira/browse/SOLR-1336
> Project: Solr
>  Issue Type: New Feature
>  Components: Analysis
>Reporter: Robert Muir
> Attachments: SOLR-1336.patch
>
>
> SmartChineseAnalyzer was contributed to lucene, it indexes simplified chinese 
> text as words.
> if the factories for the tokenizer and word token filter are added to solr it 
> can be used, although there should be a sample config or wiki entry showing 
> how to apply the built-in stopwords list.
> this is because it doesn't contain actual stopwords, but must be used to 
> prevent indexing punctuation... 
> note: we did some refactoring/cleanup on this analyzer recently, so it would 
> be much easier to do this after the next lucene update.
> it has also been moved out of -analyzers.jar due to size, and now builds in 
> its own smartcn jar file, so that would need to be added if this feature is 
> desired.




[jira] Updated: (SOLR-1111) fix FieldCache usage in Solr

2009-08-08 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated SOLR-1111:
---

Fix Version/s: (was: 1.4)
   1.5

Moving the rest of this to 1.5

> fix FieldCache usage in Solr
> 
>
> Key: SOLR-1111
> URL: https://issues.apache.org/jira/browse/SOLR-1111
> Project: Solr
>  Issue Type: Bug
>Reporter: Yonik Seeley
> Fix For: 1.5
>
> Attachments: SOLR-1111-distrib.patch, SOLR-1111_sort.patch, 
> SOLR-1111_sort.patch, SOLR-1111_sort.patch, SOLR-1111_sort.patch
>
>
> Recent changes in Lucene have altered how the FieldCache is used and as-is 
> could lead to previously working Solr installations blowing up when they 
> upgrade to 1.4.  We need to fix, or document the effects of these changes.




[jira] Updated: (SOLR-1275) Add expungeDeletes to DirectUpdateHandler2

2009-08-08 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated SOLR-1275:
---

Fix Version/s: (was: 1.4)
   1.5

Pushing this off to 1.5 given that it's very expert use, and it's not even 
clear what the use cases are.

> Add expungeDeletes to DirectUpdateHandler2
> --
>
> Key: SOLR-1275
> URL: https://issues.apache.org/jira/browse/SOLR-1275
> Project: Solr
>  Issue Type: Improvement
>  Components: update
>Affects Versions: 1.3
>Reporter: Jason Rutherglen
>Assignee: Noble Paul
>Priority: Trivial
> Fix For: 1.5
>
> Attachments: SOLR-1275.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> expungeDeletes is a useful method somewhat like optimize is offered by 
> IndexWriter that can be implemented in DirectUpdateHandler2.




[jira] Resolved: (SOLR-1309) Exception thrown by debugging component after phonetic filter parses numeric query, BUG?

2009-08-08 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley resolved SOLR-1309.


Resolution: Cannot Reproduce

> Exception thrown by debugging component after phonetic filter parses numeric 
> query, BUG?
> 
>
> Key: SOLR-1309
> URL: https://issues.apache.org/jira/browse/SOLR-1309
> Project: Solr
>  Issue Type: Bug
>  Components: Analysis
>Affects Versions: 1.3
> Environment: Redhat 5.3 Enterprise x64.   (latest)  running Lucid 
> Imagination 1.30 release 
>Reporter: Robert Petersen
>Priority: Minor
> Fix For: 1.4
>
>
> It certainly looks like a bug - definitely in QueryParsing.toString() and 
> perhaps with the phonetic filter for producing a zero length term?  Please do 
> open a bug and target to v1.4
> -Yonik
> Here is my bug description:
> Exception thrown by debugging component when query hits phonetic filter 
> factory with a numeric term no matter what kind of phonetic filter is 
> selected.  I am reposting with this new subject line thinking this is a 
> potential issue which possibly needs addressing in future releases and should 
> be submitted as a BUG? It must be getting an empty field object from the 
> phonetic filter factory for numeric terms or something similar.
> Jul 23, 2009 2:58:17 PM org.apache.solr.core.SolrCore execute
> INFO: [10017] webapp=/solr path=/select/ 
> params={debugQuery=true&rows=10&start=0&q=allDoublemetaphone:"2343")^0.5)))}
>  hits=6873 status=500 QTime=3 
> Jul 23, 2009 2:58:17 PM org.apache.solr.common.SolrException log
> SEVERE: java.lang.RuntimeException: java.lang.IllegalArgumentException: name 
> and value cannot both be empty
>   at org.apache.solr.search.QueryParsing.toString(QueryParsing.java:470)
>   at 
> org.apache.solr.util.SolrPluginUtils.doStandardDebug(SolrPluginUtils.java:399)
>   at 
> org.apache.solr.handler.component.DebugComponent.process(DebugComponent.java:54)
>   at 
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:177)
>   at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1205)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
>   at 
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>   at 
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>   at 
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>   at 
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>   at 
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
>   at 
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>   at 
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>   at 
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
>   at 
> org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
>   at 
> org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
>   at 
> org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
>   at java.lang.Thread.run(Thread.java:619)
> Caused by: java.lang.IllegalArgumentException: name and value cannot both be 
> empty
>   at org.apache.lucene.document.Field.<init>(Field.java:277)
>   at org.apache.lucene.document.Field.<init>(Field.java:251)
>   at 
> org.apache.solr.search.QueryParsing.writeFieldVal(QueryParsing.java:307)
>   at org.apache.solr.search.QueryParsing.toString(QueryParsing.java:320)
>   at org.apache.solr.search.QueryParsing.toString(QueryParsing.java:467)
>   ... 19 more
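The field type implicated above would presumably be configured something like the following; this is a hedged sketch (the reporter's actual schema isn't shown), but it illustrates how purely numeric input like "2343" can reach the phonetic filter, which produces no code for digits and so can yield the zero-length term under discussion.

```xml
<!-- Hypothetical sketch of a double-metaphone field type -->
<fieldType name="phonetic" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- DoubleMetaphone emits no phonetic code for purely numeric tokens -->
    <filter class="solr.PhoneticFilterFactory"
            encoder="DoubleMetaphone" inject="false"/>
  </analyzer>
</fieldType>
```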




[jira] Commented: (SOLR-1091) "phps" (serialized PHP) writer produces invalid output

2009-08-08 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740971#action_12740971
 ] 

Yonik Seeley commented on SOLR-1091:


Does anyone know if there is actually a bug here, and if so, how it should be 
fixed?

> "phps" (serialized PHP) writer produces invalid output
> --
>
> Key: SOLR-1091
> URL: https://issues.apache.org/jira/browse/SOLR-1091
> Project: Solr
>  Issue Type: Bug
>  Components: search
>Affects Versions: 1.3
> Environment: Sun JRE 1.6.0 on Centos 5
>Reporter: frank farmer
>Priority: Minor
> Fix For: 1.4
>
>
> The serialized PHP output writer can output invalid string lengths for 
> certain (unusual) input values.  Specifically, I had a document containing 
> the following 6 byte character sequence: \xED\xAF\x80\xED\xB1\xB8
> I was able to create a document in the index containing this value without 
> issue; however, when fetching the document back out using the serialized PHP 
> writer, it returns a string like the following:
> s:4:"􀁸";
> Note that the string length specified is 4, while the string is actually 6 
> bytes long.
> When using PHP's native serialize() function, it correctly sets the length to 
> 6:
> # php -r 'var_dump(serialize("\xED\xAF\x80\xED\xB1\xB8"));'
> string(13) "s:6:"􀁸";"
> The "wt=php" writer, which produces output to be parsed with eval(), doesn't 
> have any trouble with this string.
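The core of the bug report above is that PHP's serialize() counts string lengths in bytes, while the phps writer apparently counted characters of the decoded string. A minimal Python sketch of the correct byte-counting behavior (not Solr's actual writer code):

```python
# PHP's serialize() format for strings is s:<byte-length>:"<raw bytes>";
def php_serialize_str(raw: bytes) -> bytes:
    # Length must be the BYTE length of the raw data, not the character count.
    return b's:%d:"%s";' % (len(raw), raw)

raw = b"\xED\xAF\x80\xED\xB1\xB8"   # the 6-byte sequence from the report
print(php_serialize_str(raw)[:5])   # b's:6:"' -- length 6, matching PHP
```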




[jira] Updated: (SOLR-1350) Change LOCALSOLR to not limit results but provide distance from center pt and sorting by distance.

2009-08-08 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated SOLR-1350:
---

Affects Version/s: (was: 1.4)
Fix Version/s: (was: 1.4)

> Change LOCALSOLR to not limit results but provide distance from center pt and 
> sorting by distance.
> -
>
> Key: SOLR-1350
> URL: https://issues.apache.org/jira/browse/SOLR-1350
> Project: Solr
>  Issue Type: Improvement
>  Components: search
> Environment: All
>Reporter: Bill Bell
>
> Here is the improvement to LOCALSOLR. 
> Allow radius=-1 to indicate that results should not be limited by radius from 
> the center point. 
> Provide Standard query with 2 enhancements:
> - Allow sort by geo_distance
> - Provide lat/long for center point (origin) and results have distance from 
> this point
> This would allow results to be relevant to the query, and allow the user to 
> sort by distance and display the distance. This is a simple fix to the 
> LOCALSOLR module.
> Bill
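Under the proposal above, a request might look something like the following; the handler name and parameter names here are illustrative assumptions, and LocalSolr's actual API may differ.

```
http://localhost:8983/solr/select?qt=geo&q=pizza&lat=39.74&long=-104.98&radius=-1&sort=geo_distance%20asc
```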




[jira] Resolved: (SOLR-1339) DisMax parser exception with Trie* fields

2009-08-08 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley resolved SOLR-1339.


Resolution: Fixed

committed.

> DisMax parser exception with Trie* fields
> -
>
> Key: SOLR-1339
> URL: https://issues.apache.org/jira/browse/SOLR-1339
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 1.4
>Reporter: Yonik Seeley
> Fix For: 1.4
>
>
> Trie* fields throw exceptions on invalid input - causes DisMax parser to fail.




[jira] Updated: (SOLR-1014) SolrExampleStreamingTest failures

2009-08-08 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated SOLR-1014:
---

Fix Version/s: (was: 1.4)
   1.5

Moving to 1.5 - there may be some kind of startup concurrency bug or race 
condition here, but it doesn't look critical (or anything that would affect a 
server once it's up and running).

> SolrExampleStreamingTest  failures
> --
>
> Key: SOLR-1014
> URL: https://issues.apache.org/jira/browse/SOLR-1014
> Project: Solr
>  Issue Type: Bug
>Reporter: Yonik Seeley
> Fix For: 1.5
>
>
> SolrExampleStreamingTest  intermittently fails.




[jira] Updated: (SOLR-1288) better Trie* integration

2009-08-08 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated SOLR-1288:
---

Fix Version/s: (was: 1.4)
   1.5

Everything easily doable or critical has been done - moving the rest of these 
tasks out to 1.5

> better Trie* integration
> 
>
> Key: SOLR-1288
> URL: https://issues.apache.org/jira/browse/SOLR-1288
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 1.4
>Reporter: Yonik Seeley
> Fix For: 1.5
>
>
> Improve support for the Trie* fields up to the level of Solr's existing 
> numeric types.




[jira] Commented: (SOLR-1071) spellcheck.extendedResults returns an invalid JSON response when count > 1

2009-08-08 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740965#action_12740965
 ] 

Yonik Seeley commented on SOLR-1071:


bq. Why isn't this a JSONResponseWriter bug?

JSON does not require unique keys.  It's certainly a good idea though.

Donovan had it right:
bq. This seems, at least to me, like a more general issue of convention and 
guidelines in the search components.

It's a bad idea to repeat keys... this isn't XML.  When access-by-key is more 
important, use SimpleOrderedMap... when position must be maintained, use 
NamedList.  In both cases, we should strive not to repeat keys.

Here is what the current output is for non-extendedResults:
{code}
<lst name="suggestions">
 <lst name="...">
  <int name="numFound">2</int>
  <int name="startOffset">0</int>
  <int name="endOffset">4</int>
  <arr name="suggestion">
   <str>dell</str>
   <str>all</str>
  </arr>
 </lst>
</lst>
{code}

The logical extension for extended results would be to simply replace each 
string in the array with a map.

{code}
<lst name="...">
 <int name="numFound">2</int>
 <int name="startOffset">0</int>
 <int name="endOffset">4</int>
 <int name="origFreq">0</int>
 <arr name="suggestion">
  <lst>
   <int name="frequency">1</int>
   <str name="word">dell</str>
  </lst>
  <lst>
   <int name="frequency">1</int>
   <str name="word">all</str>
  </lst>
 </arr>
</lst>
{code}

If extended results only ever added frequency, we could further simplify to 
have the suggestion be the key and the freq be the value, but I don't know that 
we wouldn't want to add more metadata in the future.







> spellcheck.extendedResults returns an invalid JSON response when count > 1
> --
>
> Key: SOLR-1071
> URL: https://issues.apache.org/jira/browse/SOLR-1071
> Project: Solr
>  Issue Type: Bug
>  Components: spellchecker
>Affects Versions: 1.3
>Reporter: Uri Boness
>Assignee: Grant Ingersoll
> Fix For: 1.4
>
> Attachments: SpellCheckComponent_fix.patch, 
> SpellCheckComponent_new_structure.patch, 
> SpellCheckComponent_new_structure_incl_test.patch
>
>
> When: wt=json & spellcheck.extendedResults=true & spellcheck.count > 1, the 
> suggestions are returned in the following format:
> "suggestions":[
>   "amsterdm",{
>"numFound":5,
>"startOffset":0,
>"endOffset":8,
>"origFreq":0,
>"suggestion":{
> "frequency":8498,
> "word":"amsterdam"},
>"suggestion":{
> "frequency":1,
> "word":"amsterd"},
>"suggestion":{
> "frequency":8,
> "word":"amsterdams"},
>"suggestion":{
> "frequency":1,
> "word":"amstedam"},
>"suggestion":{
> "frequency":22,
> "word":"amsterdamse"}},
>   "beak",{
>"numFound":5,
>"startOffset":9,
>"endOffset":13,
>"origFreq":0,
>"suggestion":{
> "frequency":379,
> "word":"beek"},
>"suggestion":{
> "frequency":26,
> "word":"beau"},
>"suggestion":{
> "frequency":26,
> "word":"baak"},
>"suggestion":{
> "frequency":15,
> "word":"teak"},
>"suggestion":{
> "frequency":11,
> "word":"beuk"}},
>   "correctlySpelled",false,
>   "collation","amsterdam beek"]}}
> This is an invalid json as each term is associated with a JSON object which 
> holds multiple "suggestion" attributes. When working with a JSON library only 
> the last "suggestion" attribute is picked up.
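The "only the last attribute is picked up" behavior described above is easy to reproduce with any common JSON library; a quick Python sketch:

```python
import json

# A repeated "suggestion" key is not rejected by most parsers; the later
# value silently overwrites the earlier one.
s = '{"suggestion": {"word": "amsterdam"}, "suggestion": {"word": "amstedam"}}'
parsed = json.loads(s)
print(parsed)   # {'suggestion': {'word': 'amstedam'}} -- earlier entry is lost
```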




[jira] Resolved: (SOLR-1142) faster example schema

2009-08-08 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley resolved SOLR-1142.


Resolution: Fixed

committed (but still open to review of course... I'm just pushing a little 
faster to get 1.4 out the door).

The other changes that I'd like to make to the example schema / solrconfig 
involve good OOTB indexing with the new extracting request handler (including 
good metadata -> field mappings).

> faster example schema
> -
>
> Key: SOLR-1142
> URL: https://issues.apache.org/jira/browse/SOLR-1142
> Project: Solr
>  Issue Type: Improvement
>Reporter: Yonik Seeley
> Fix For: 1.4
>
> Attachments: SOLR-1142.patch, SOLR-1142.patch
>
>
> need faster example schema:
> http://www.lucidimagination.com/search/document/d46ea3fa441b6d94




[jira] Updated: (SOLR-1142) faster example schema

2009-08-08 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated SOLR-1142:
---

Attachment: SOLR-1142.patch

updated version that runs spellchecking directly off of the name field - this 
still produces decent results and eliminates the extra copyField.

> faster example schema
> -
>
> Key: SOLR-1142
> URL: https://issues.apache.org/jira/browse/SOLR-1142
> Project: Solr
>  Issue Type: Improvement
>Reporter: Yonik Seeley
> Fix For: 1.4
>
> Attachments: SOLR-1142.patch, SOLR-1142.patch
>
>
> need faster example schema:
> http://www.lucidimagination.com/search/document/d46ea3fa441b6d94




[jira] Updated: (SOLR-1142) faster example schema

2009-08-08 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated SOLR-1142:
---

Attachment: SOLR-1142.patch

Attaching patch...
- changes example spelling field from "name" to "spell" and removes copyField 
from "name" since that's likely a field people will reuse; changes tests to 
use "spell" rather than "name"
- eliminates the copyField from id to sku since many will reuse the id field
- removes default values
- reformats really long comment lines
- comments out some other random copyField commands
- other little misc cleanups

So - some things like termvectors are kept for easy testing and demonstration 
purposes, but they are *not* on fields likely to be reused.  The biggest 
remaining cost is copyField of the various fields into the catchall "text" 
field... but I don't think we should get rid of that for the example.
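For readers following along, the cost in question comes from schema.xml copyField directives feeding a catchall field, roughly like this (field names are illustrative, patterned on the example schema):

```xml
<!-- Catchall destination: indexed for searching, not stored. -->
<field name="text" type="text" indexed="true" stored="false" multiValued="true"/>

<!-- Each copyField below re-analyzes its source into "text" at index
     time, so every extra directive adds per-document indexing cost. -->
<copyField source="name" dest="text"/>
<copyField source="features" dest="text"/>
<copyField source="cat" dest="text"/>
```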

> faster example schema
> -
>
> Key: SOLR-1142
> URL: https://issues.apache.org/jira/browse/SOLR-1142
> Project: Solr
>  Issue Type: Improvement
>Reporter: Yonik Seeley
> Fix For: 1.4
>
> Attachments: SOLR-1142.patch
>
>
> need faster example schema:
> http://www.lucidimagination.com/search/document/d46ea3fa441b6d94




[jira] Updated: (SOLR-1067) QueryParsing.parseFunction uses Singleton Core (SolrCore.getSolrCore())

2009-08-08 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated SOLR-1067:
---

Fix Version/s: (was: 1.4)
   1.5

I think we've done everything reasonable for 1.4, moving the rest to 1.5

> QueryParsing.parseFunction uses Singleton Core (SolrCore.getSolrCore())
> ---
>
> Key: SOLR-1067
> URL: https://issues.apache.org/jira/browse/SOLR-1067
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 1.3
>Reporter: Hoss Man
> Fix For: 1.5
>
> Attachments: SOLR-1067.patch
>
>
> QueryParsing.parseFunction is a static utility method that depends on the 
> SolrCore.getSolrCore singleton -- but it is not yet deprecated and is used in 
> some rather important places in the code base (the result is that the last 
> core initialized is the one those call sites end up using).
> it was noted a while back, with some comments about how to tackle the 
> problem, but it looks like we never opened an issue to deal with it...
> http://www.nabble.com/QueryParsing-using-SolrCore.getSolrCore()-td19806087.html
> ...we should deal with this in some way prior to the 1.4 release (if nothing 
> else, we need to document it as a caveat).




[jira] Updated: (SOLR-1067) QueryParsing.parseFunction uses Singleton Core (SolrCore.getSolrCore())

2009-08-08 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated SOLR-1067:
---

Attachment: SOLR-1067.patch

Here's a patch that incrementally improves the situation.
I think the only place a QParser isn't used now is deleteByQuery.
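The failure mode being fixed can be sketched in Python rather than Java (toy names only; the real code is SolrCore.getSolrCore() and QueryParsing.parseFunction):

```python
class SolrCore:
    """Toy stand-in for a core; illustrates a static singleton slot."""
    _instance = None

    def __init__(self, name):
        self.name = name
        # Each construction silently overwrites the singleton slot.
        SolrCore._instance = self

    @staticmethod
    def get_solr_core():
        return SolrCore._instance


def parse_function(expr):
    # A static helper that reaches for the singleton instead of taking
    # the core as a parameter -- the pattern this issue removes.
    core = SolrCore.get_solr_core()
    return f"parsed {expr!r} against core {core.name!r}"


core_a = SolrCore("a")
core_b = SolrCore("b")

# A request meant for core_a is parsed against core_b, because the
# singleton always returns the last core initialized.
print(parse_function("sum(x,y)"))  # parsed 'sum(x,y)' against core 'b'
```

Passing the core (or a QParser bound to it) through the call chain, as the patch does, removes the ambiguity entirely.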


> QueryParsing.parseFunction uses Singleton Core (SolrCore.getSolrCore())
> ---
>
> Key: SOLR-1067
> URL: https://issues.apache.org/jira/browse/SOLR-1067
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 1.3
>Reporter: Hoss Man
> Fix For: 1.4
>
> Attachments: SOLR-1067.patch
>
>
> QueryParsing.parseFunction is a static utility method that depends on the 
> SolrCore.getSolrCore singleton -- but it is not yet deprecated and is used in 
> some rather important places in the code base (the result is that the last 
> core initialized is the one those call sites end up using).
> it was noted a while back, with some comments about how to tackle the 
> problem, but it looks like we never opened an issue to deal with it...
> http://www.nabble.com/QueryParsing-using-SolrCore.getSolrCore()-td19806087.html
> ...we should deal with this in some way prior to the 1.4 release (if nothing 
> else, we need to document it as a caveat).




[jira] Created: (SOLR-1351) facet on same field different ways

2009-08-08 Thread Yonik Seeley (JIRA)
facet on same field different ways
--

 Key: SOLR-1351
 URL: https://issues.apache.org/jira/browse/SOLR-1351
 Project: Solr
  Issue Type: Improvement
Reporter: Yonik Seeley
 Fix For: 1.5


There is a general need to facet on the same field in different ways (different 
prefixes, different filters).  We need a way to express this.
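For concreteness, a request-parameter sketch of the limitation (the second form is the part that cannot currently be expressed, so its shape is purely hypothetical):

```
# Repeating the field works syntactically, but both facet blocks are keyed
# by the field name, and per-field params (f.category.facet.prefix=...)
# cannot tell the two entries apart:
facet=true&facet.field=category&facet.field=category

# Desired: independent options and an independent output key per entry
# (hypothetical syntax -- choosing how to express this is the open question):
facet.field={!key=electronics facet.prefix=elec}category
facet.field={!key=books facet.prefix=book}category
```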




[jira] Commented: (SOLR-1335) load core properties from a properties file

2009-08-08 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740878#action_12740878
 ] 

Noble Paul commented on SOLR-1335:
--

Hi everyone, if there are no objections I plan to commit this as soon as I 
write a testcase. Please comment, because this is a very visible change.

> load core properties from a properties file
> ---
>
> Key: SOLR-1335
> URL: https://issues.apache.org/jira/browse/SOLR-1335
> Project: Solr
>  Issue Type: New Feature
>Reporter: Noble Paul
>Assignee: Noble Paul
> Fix For: 1.4
>
> Attachments: SOLR-1335.patch
>
>
> There are a few ways of loading properties at runtime:
> # using a system property on the command line
> # if you use multicore, dropping it into solr.xml
> If neither applies, the only way is to keep a separate solrconfig.xml for 
> each instance. #1 is error prone if the user fails to start with the correct 
> system property.
> In our case we have four different configurations for the same deployment, 
> and we have to disable replication of solrconfig.xml.
> It would be nice if I could distribute four properties files so that our ops 
> can drop in the right one and start Solr. Operations could also edit a 
> properties file, but it is risky for them to edit solrconfig.xml without 
> understanding Solr.
> I propose a properties file in the instancedir named solrcore.properties. If 
> present, it would be loaded and its entries added as core-specific properties.
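A sketch of what the proposal amounts to, in Python rather than Solr's Java (the file name solrcore.properties comes from the proposal; the parser is simplified -- real java.util.Properties files also allow ':' separators, escapes, and line continuations):

```python
import os


def load_core_properties(instance_dir):
    """Parse a simple key=value properties file if present."""
    props = {}
    path = os.path.join(instance_dir, "solrcore.properties")
    if not os.path.exists(path):
        return props  # absence is not an error: the file is optional
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith(("#", "!")):
                continue  # skip blanks and Properties-style comments
            key, _, value = line.partition("=")
            props[key.strip()] = value.strip()
    return props
```

Ops would then drop the appropriate file into the instance directory and restart, instead of editing solrconfig.xml or remembering the right -D flags.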




[jira] Assigned: (SOLR-1348) JdbcDataSource does not import Blob values correctly by default

2009-08-08 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul reassigned SOLR-1348:


Assignee: Noble Paul

> JdbcDataSource does not import Blob values correctly by default
> ---
>
> Key: SOLR-1348
> URL: https://issues.apache.org/jira/browse/SOLR-1348
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - DataImportHandler
>Affects Versions: 1.4
>Reporter: Jay Clelland
>Assignee: Noble Paul
>Priority: Minor
>
> When blob values are returned through a Java ResultSet object, they have the 
> type byte[].
> As byte[] doesn't have a useful toString method, we end up with a reference 
> value added to the Solr document (e.g. [B@1f23c5).
> The problem is easy to remedy by adding the attribute convertType="true" to 
> the dataSource tag within data-config.xml.
> However, this attribute does not appear to be documented anywhere, and I was 
> only able to find it after a few hours of digging through the source code.
> A simple fix would be to change the default value of convertType to true 
> within the JdbcDataSource class.




[jira] Commented: (SOLR-1348) JdbcDataSource does not import Blob values correctly by default

2009-08-08 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740877#action_12740877
 ] 

Noble Paul commented on SOLR-1348:
--

bq. A simple fix for this would be to change the default value of convertType 
to true within the JdbcDataSource class

This was kept false deliberately: users should not get any kind of nasty 
surprise when they write a Transformer, and moreover changing the default 
could break back-compat. I agree it should be documented properly.

How about a Transformer?
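For illustration, what such a Transformer would do per row, sketched in Python (DIH Transformers are Java classes; this mimics only the conversion step, and the field names are made up):

```python
def convert_blob_row(row, blob_fields, encoding="utf-8"):
    """Decode raw blob columns to strings before indexing, mirroring
    what convertType=true (or a custom Transformer) achieves."""
    out = dict(row)  # leave the original row untouched
    for field in blob_fields:
        value = out.get(field)
        if isinstance(value, (bytes, bytearray)):
            # Without this step, the document would receive the object's
            # default representation (Java's "[B@1f23c5") instead of text.
            out[field] = bytes(value).decode(encoding)
    return out
```

The advantage over flipping the convertType default is that the conversion is explicit and opt-in, applied only to the fields that need it.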



> JdbcDataSource does not import Blob values correctly by default
> ---
>
> Key: SOLR-1348
> URL: https://issues.apache.org/jira/browse/SOLR-1348
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - DataImportHandler
>Affects Versions: 1.4
>Reporter: Jay Clelland
>Priority: Minor
>
> When blob values are returned through a Java ResultSet object, they have the 
> type byte[].
> As byte[] doesn't have a useful toString method, we end up with a reference 
> value added to the Solr document (e.g. [B@1f23c5).
> The problem is easy to remedy by adding the attribute convertType="true" to 
> the dataSource tag within data-config.xml.
> However, this attribute does not appear to be documented anywhere, and I was 
> only able to find it after a few hours of digging through the source code.
> A simple fix would be to change the default value of convertType to true 
> within the JdbcDataSource class.
