[jira] Commented: (SOLR-534) Return all query results with parameter rows=-1

2010-02-10 Thread Walter Underwood (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832351#action_12832351
 ] 

Walter Underwood commented on SOLR-534:
---

-1

This adds a denial of service vulnerability to Solr. One query can use lots of 
CPU or memory, or even crash the server.

This could also take out an entire distributed system.

If this is added, we MUST add a config option to disable it.

Let's take this back to the mailing list and find out why they believe all 
results are needed. There must be a better way to solve this.

> Return all query results with parameter rows=-1
> ---
>
> Key: SOLR-534
> URL: https://issues.apache.org/jira/browse/SOLR-534
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 1.3
> Environment: Tomcat 5.5
>Reporter: Lars Kotthoff
>Priority: Minor
> Attachments: solr-all-results.patch
>
>
> The searcher should return all results matching a query when the parameter 
> rows=-1 is given.
> I know that it is a bad idea to do this in general, but as it explicitly 
> requires a special parameter, people using this feature will be aware of what 
> they are doing. The main use case for this feature is probably debugging, but 
> in some cases one might actually need to retrieve all results because they 
> e.g. are to be merged with results from different sources.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1216) disambiguate the replication command names

2009-06-15 Thread Walter Underwood (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12719625#action_12719625
 ] 

Walter Underwood commented on SOLR-1216:


If we choose a name for the thing we are pulling, like "image", then we can use 
"makeimage", "pullimage", etc.


> disambiguate the replication command names
> --
>
> Key: SOLR-1216
> URL: https://issues.apache.org/jira/browse/SOLR-1216
> Project: Solr
>  Issue Type: Improvement
>  Components: replication (java)
>Reporter: Noble Paul
>Assignee: Noble Paul
> Fix For: 1.4
>
> Attachments: SOLR-1216.patch
>
>
> There is a lot of confusion in the naming of various commands such as 
> snappull, snapshot etc. This is a vestige of the script-based replication we 
> currently have. The commands can be renamed to make more sense:
> * 'snappull' to be renamed to 'sync'
> * 'snapshot' to be renamed to 'backup'
> thoughts?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1216) disambiguate the replication command names

2009-06-15 Thread Walter Underwood (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12719609#action_12719609
 ] 

Walter Underwood commented on SOLR-1216:


"sync" is a weak name, because it doesn't say whether it is a push or pull 
synchronization.


> disambiguate the replication command names
> --
>
> Key: SOLR-1216
> URL: https://issues.apache.org/jira/browse/SOLR-1216
> Project: Solr
>  Issue Type: Improvement
>  Components: replication (java)
>Reporter: Noble Paul
>Assignee: Noble Paul
> Fix For: 1.4
>
> Attachments: SOLR-1216.patch
>
>
> There is a lot of confusion in the naming of various commands such as 
> snappull, snapshot etc. This is a vestige of the script-based replication we 
> currently have. The commands can be renamed to make more sense:
> * 'snappull' to be renamed to 'sync'
> * 'snapshot' to be renamed to 'backup'
> thoughts?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1073) StrField should allow locale sensitive sorting

2009-04-28 Thread Walter Underwood (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12703893#action_12703893
 ] 

Walter Underwood commented on SOLR-1073:


Using the locale of the JVM is very, very bad for a multilingual server. Solr 
should always use the same, simple locale. It is OK to set a Locale in 
configuration for single-language installations, but using the JVM locale is a 
recipe for disaster. You move Solr to a different server and everything breaks. 
Very, very bad.  

In a multi-lingual config, locales must be set per-request.

Ideally, requests should send an ISO language code as context for the query.
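
Below is a minimal sketch of what per-request locale handling could look like, using only the 
JDK's Locale and Collator classes rather than any existing Solr API; the request parameter 
name and the English fallback are assumptions for illustration.

{code:java}
import java.text.Collator;
import java.util.Locale;

public class RequestLocaleExample {

    // Build a Locale from an ISO language code sent with the request,
    // falling back to a fixed, predictable locale instead of Locale.getDefault().
    static Locale localeForRequest(String isoLanguage) {
        if (isoLanguage == null || isoLanguage.length() == 0) {
            return Locale.ENGLISH; // assumed fallback; never the JVM default
        }
        return new Locale(isoLanguage);
    }

    public static void main(String[] args) {
        // e.g. a request carrying lang=sv (Swedish)
        Locale locale = localeForRequest("sv");
        Collator collator = Collator.getInstance(locale);
        // In Swedish, "ö" sorts after "z"; under a different JVM default locale the order differs.
        System.out.println(collator.compare("öl", "zebra") > 0); // true with sv
    }
}
{code}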




> StrField should allow locale sensitive sorting
> --
>
> Key: SOLR-1073
> URL: https://issues.apache.org/jira/browse/SOLR-1073
> Project: Solr
>  Issue Type: Improvement
> Environment: All
>Reporter: Sachin
> Attachments: LocaleStrField.java
>
>
> Currently, StrField does not take a parameter which it can pass to ctor of 
> SortField making the StrField's sorting rely on the locale of the JVM.  
> Ideally, StrField should allow setting the locale in the schema.xml and use 
> it to create a new instance of the SortField in getSortField() method, 
> something like:
> snip:
>   public SortField getSortField(SchemaField field,boolean reverse)
>   {
> ...
>   Locale locale = new Locale(lang,country);
>   return new SortField(field.getName(), locale, reverse);
>  }
> More details about this issue here:
> http://www.nabble.com/CJKAnalyzer-and-Chinese-Text-sort-td22374195.html

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1044) Use Hadoop RPC for inter Solr communication

2009-03-03 Thread Walter Underwood (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12678601#action_12678601
 ] 

Walter Underwood commented on SOLR-1044:


During the Oscars, the HTTP cache in front of our Solr farm had a 90% hit rate. 
I think a 10X reduction in server load is a testimony to the superiority of the 
HTTP approach.


> Use Hadoop RPC for inter Solr communication
> ---
>
> Key: SOLR-1044
> URL: https://issues.apache.org/jira/browse/SOLR-1044
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Reporter: Noble Paul
>
> Solr uses HTTP for distributed search. We can make it a whole lot faster if 
> we use an RPC mechanism which is more lightweight/efficient. 
> Hadoop RPC looks like a good candidate for this.  
> The implementation should just have one protocol. It should follow Solr's 
> idiom of making remote calls: a URI + params + [optional stream(s)]. The 
> response can be a stream of bytes.
> To make this work we must make the SolrServer implementation pluggable in 
> distributed search. Users should be able to choose between the current 
> CommonsHttpSolrServer or a HadoopRpcSolrServer. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-822) CharFilter - normalize characters before tokenizer

2008-10-23 Thread Walter Underwood (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12642188#action_12642188
 ] 

Walter Underwood commented on SOLR-822:
---

Yes, it should be in Lucene. Like this: 
http://webui.sourcelabs.com/lucene/issues/1343

There are (at least) four kinds of character mapping:

Unicode normalization from decomposed to composed forms (always safe).

Unicode normalization from compatibility forms to standard forms (may change 
the look, like fullwidth to halfwidth Latin).

Language-specific normalization, like "oe" to "ö" (German-only).

Mappings that improve search but are linguistically dodgy, like stripping 
accents and mapping katakana to hiragana.

wunder
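
To illustrate the first two kinds only, here is a small sketch using the JDK's java.text.Normalizer 
(NFC for composing decomposed forms, NFKC for folding compatibility forms). It is not the CharFilter 
proposed in this issue; the language-specific and lossy mappings above would still need their own 
mapping tables, such as the sample_mapping_ja.txt attached to this issue.

{code:java}
import java.text.Normalizer;

public class UnicodeNormalizationExample {
    public static void main(String[] args) {
        // 1. Decomposed to composed forms: "e" + combining acute accent (U+0301) -> "é" (U+00E9).
        String decomposed = "e\u0301";
        String nfc = Normalizer.normalize(decomposed, Normalizer.Form.NFC);
        System.out.println(nfc.length());        // 1: the two code units compose into one

        // 2. Compatibility forms to standard forms: fullwidth "Ａ" (U+FF21) -> "A" (U+0041).
        String fullwidth = "\uFF21";
        String nfkc = Normalizer.normalize(fullwidth, Normalizer.Form.NFKC);
        System.out.println(nfkc.equals("A"));     // true; the visual form (width) changes
    }
}
{code}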


> CharFilter - normalize characters before tokenizer
> --
>
> Key: SOLR-822
> URL: https://issues.apache.org/jira/browse/SOLR-822
> Project: Solr
>  Issue Type: New Feature
>  Components: Analysis
>Reporter: Koji Sekiguchi
>Priority: Minor
> Attachments: character-normalization.JPG, sample_mapping_ja.txt, 
> SOLR-822.patch, SOLR-822.patch
>
>
> A new plugin which can be placed in front of <tokenizer>.
> {code:xml}
> <fieldtype ... positionIncrementGap="100" >
>   <analyzer>
>     <charFilter ... mapping="mapping_ja.txt" />
>     <tokenizer ... />
>     <filter ... words="stopwords.txt"/>
>     <filter ... />
>   </analyzer>
> </fieldtype>
> {code}
> <charFilter> can be multiple (chained). I'll post a JPEG file to show 
> character normalization sample soon.
> MOTIVATION:
> In Japan, there are two types of tokenizers -- N-gram (CJKTokenizer) and 
> Morphological Analyzer.
> When we use morphological analyzer, because the analyzer uses Japanese 
> dictionary to detect terms,
> we need to normalize characters before tokenization.
> I'll post a patch soon, too.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-815) Add new Japanese half-width/full-width normalizaton Filter and Factory

2008-10-20 Thread Walter Underwood (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12641071#action_12641071
 ] 

Walter Underwood commented on SOLR-815:
---

I looked it up, and even found a reason to do it the right way.

Latin should be normalized to halfwidth (in the Latin-1 character space).

Kana should be normalized to fullwidth.

Normalizing Latin characters to fullwidth would mean you could not use the 
existing accent-stripping filters or probably any other filter that expected 
Latin-1, like synonyms. Normalizing to halfwidth makes the rest of Solr and 
Lucene work as expected.

See section 12.5: http://www.unicode.org/versions/Unicode5.0.0/ch12.pdf

The compatibility forms (the ones we normalize away from) are in the Unicode 
range U+FF00 to U+FFEF.
The correct mappings from those forms are in this doc: 
http://www.unicode.org/charts/PDF/UFF00.pdf

Other charts are here: http://www.unicode.org/charts/
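
As a rough illustration of the Latin half of that recommendation, here is a sketch that folds the 
fullwidth ASCII block (U+FF01-U+FF5E) and the ideographic space (U+3000) to halfwidth. It is not 
the patch attached here, and kana normalization to fullwidth would need the full UFF00 chart above.

{code:java}
public class FullwidthToHalfwidthExample {

    // Fullwidth ASCII differs from Basic Latin by a constant offset of 0xFEE0,
    // so U+FF21 ("Ａ") - 0xFEE0 = U+0041 ("A"). The ideographic space is a special case.
    static String latinToHalfwidth(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= '\uFF01' && c <= '\uFF5E') {
                out.append((char) (c - 0xFEE0));   // fullwidth Latin, digits, punctuation
            } else if (c == '\u3000') {
                out.append(' ');                   // ideographic space -> ASCII space
            } else {
                out.append(c);                     // kana and everything else left alone here
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Fullwidth "Ｓｏｌｒ" becomes plain "Solr", so downstream Latin-1 filters keep working.
        System.out.println(latinToHalfwidth("\uFF33\uFF4F\uFF4C\uFF52"));
    }
}
{code}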


> Add new Japanese half-width/full-width normalizaton Filter and Factory
> --
>
> Key: SOLR-815
> URL: https://issues.apache.org/jira/browse/SOLR-815
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 1.3
>Reporter: Todd Feak
>Assignee: Koji Sekiguchi
>Priority: Minor
> Attachments: SOLR-815.patch
>
>
> Japanese Katakana and  Latin alphabet characters exist as both a "half-width" 
> and "full-width" version. This new Filter normalizes to the full-width 
> version to allow searching and indexing using both.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-815) Add new Japanese half-width/full-width normalizaton Filter and Factory

2008-10-17 Thread Walter Underwood (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12640609#action_12640609
 ] 

Walter Underwood commented on SOLR-815:
---

If I remember correctly, Latin characters should normalize to half-width, not 
full-width.


> Add new Japanese half-width/full-width normalizaton Filter and Factory
> --
>
> Key: SOLR-815
> URL: https://issues.apache.org/jira/browse/SOLR-815
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 1.3
>Reporter: Todd Feak
>Priority: Minor
> Attachments: SOLR-815.patch
>
>
> Japanese Katakana and  Latin alphabet characters exist as both a "half-width" 
> and "full-width" version. This new Filter normalizes to the full-width 
> version to allow searching and indexing using both.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-814) Add new Japanese Hiragana Filter and Factory

2008-10-17 Thread Walter Underwood (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12640605#action_12640605
 ] 

Walter Underwood commented on SOLR-814:
---

This seems like a bad idea. Hiragana and katakana are used quite differently in 
Japanese. They are not interchangeable.

I was the engineer for Japanese support in Ultraseek for years and even visited 
our distributor there, but no one ever asked for this feature. They asked for a 
lot of things, but never this.

It is very useful, maybe essential, to normalize full-width and half-width 
versions of hiragana, katakana, and ASCII.


> Add new Japanese Hiragana Filter and Factory
> 
>
> Key: SOLR-814
> URL: https://issues.apache.org/jira/browse/SOLR-814
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 1.3
>Reporter: Todd Feak
>Priority: Minor
> Attachments: SOLR-814.patch
>
>
> Japanese Hiragana and Katakana character sets can be easily translated 
> between. This filter normalizes all Hiragana characters to their Katakana 
> counterpart, allowing for indexing and searching using either.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-777) backword match search, for domain search etc.

2008-09-18 Thread Walter Underwood (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12632489#action_12632489
 ] 

Walter Underwood commented on SOLR-777:
---

You don't need backwards matching for this, and it doesn't really do the right 
thing.

Split the string on ".", reverse the list, and join successive sublists with 
".". Don't index the length one list, since that is ".com", ".net", etc. Do the 
same processing at query time.

This is a special analyzer.
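
Here is a minimal sketch of that expansion as a plain helper, not a full Lucene analyzer. Whether 
the tokens come out as reversed prefixes ("org.apache") or, as below, as original-order suffixes 
("apache.org") is a free choice, as long as index time and query time agree.

{code:java}
import java.util.ArrayList;
import java.util.List;

public class DomainTokensExample {

    // Split on ".", walk from the right, and emit each successively longer suffix,
    // skipping the length-one list ("org", "com", ...). Applying the same expansion
    // at query time lets "apache.org" match www.apache.org, lucene.apache.org, etc.
    static List<String> domainTokens(String host) {
        String[] labels = host.split("\\.");
        List<String> tokens = new ArrayList<String>();
        StringBuilder suffix = new StringBuilder();
        for (int i = labels.length - 1; i >= 0; i--) {
            if (suffix.length() > 0) {
                suffix.insert(0, '.');
            }
            suffix.insert(0, labels[i]);
            if (labels.length - i > 1) {          // don't index the length-one list
                tokens.add(suffix.toString());
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(domainTokens("lucene.apache.org")); // [apache.org, lucene.apache.org]
        System.out.println(domainTokens("apache.org"));        // [apache.org]
    }
}
{code}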



> backword match search, for domain search etc.
> -
>
> Key: SOLR-777
> URL: https://issues.apache.org/jira/browse/SOLR-777
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 1.3
>Reporter: Koji Sekiguchi
>Priority: Minor
>
> There is a requirement for searching domains with backward match. For 
> example, using "apache.org" for a query string, www.apache.org, 
> lucene.apache.org could be returned.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-600) XML parser stops working under heavy load

2008-06-17 Thread Walter Underwood (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12605751#action_12605751
 ] 

Walter Underwood commented on SOLR-600:
---

It could also be a concurrency bug in Solr that shows up on the IBM JVM because 
the thread scheduler makes different decisions. 

> XML parser stops working under heavy load
> -
>
> Key: SOLR-600
> URL: https://issues.apache.org/jira/browse/SOLR-600
> Project: Solr
>  Issue Type: Bug
>  Components: update
>Affects Versions: 1.3
> Environment: Linux 2.6.19.7-ss0 #4 SMP Wed Mar 12 02:56:42 GMT 2008 
> x86_64 Intel(R) Xeon(R) CPU X5450 @ 3.00GHz GenuineIntel GNU/Linux
> Tomcat 6.0.16
> SOLR nightly 16 Jun 2008, and versions prior
> JRE 1.6.0
>Reporter: John Smith
>
> Under heavy load, the following is spat out for every update:
> org.apache.solr.common.SolrException log
> SEVERE: java.lang.NullPointerException
> at java.util.AbstractList$SimpleListIterator.hasNext(Unknown Source)
> at 
> org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:225)
> at 
> org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:66)
> at 
> org.apache.solr.handler.XmlUpdateRequestHandler.processUpdate(XmlUpdateRequestHandler.java:196)
> at 
> org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:123)
> at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:125)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:965)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:272)
> at 
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
> at 
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> at 
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
> at 
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
> at 
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
> at 
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
> at 
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
> at 
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
> at 
> org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
> at 
> org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
> at 
> org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
> at java.lang.Thread.run(Thread.java:735)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-127) Make Solr more friendly to external HTTP caches

2008-02-08 Thread Walter Underwood (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567068#action_12567068
 ] 

Walter Underwood commented on SOLR-127:
---

Two reasons to do HTTP caching for Solr: First, Solr is HTTP and needs to 
implement that correctly. Second, caches are much harder to implement and test 
than the cache information in HTTP. HTTP caches already exist and are well 
tested, so the implementation cost is zero and deployment is very easy.

The HTTP spec already covers which responses should be cached.  A 400 response 
may only be cached if it includes explicit cache control headers which allow 
that. See RFC 2616.

We are using a caching load balancer and caching in Apache front ends to 
Tomcat. We see an increase of more than 2X in the capacity of our search farm.

I would recommend against Solr-specific cache information in the XML part of 
the responses. Distributed caching is extremely difficult to get right. Around 
25% of the HTTP 1.1 spec is devoted to caching and there are still grey areas.

> Make Solr more friendly to external HTTP caches
> ---
>
> Key: SOLR-127
> URL: https://issues.apache.org/jira/browse/SOLR-127
> Project: Solr
>  Issue Type: Wish
>Reporter: Hoss Man
>Assignee: Hoss Man
> Fix For: 1.3
>
> Attachments: CacheUnitTest.patch, CacheUnitTest.patch, 
> HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, 
> HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, 
> HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, 
> HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, 
> HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, 
> HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch
>
>
> An offhand comment I saw recently reminded me of something that really bugged 
> me about the search solution I used *before* Solr -- it didn't play nicely 
> with HTTP caches that might be sitting in front of it.
> At the moment, Solr doesn't put particularly useful info in the HTTP 
> response headers to aid in caching (i.e. Last-Modified), responds to all HEAD 
> requests with a 400, and doesn't do anything special with If-Modified-Since.
> At the very least, we can set a Last-Modified based on when the current 
> IndexReader was opened (if not the Date on the IndexReader) and use the same 
> info to determine how to respond to If-Modified-Since requests.
> (For the record, I think the reason this hasn't occurred to me in the 2+ years 
> I've been using Solr is because, with the internal caching, I've yet to need 
> to put a proxy cache in front of Solr.)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-127) Make Solr more friendly to external HTTP caches

2007-09-14 Thread Walter Underwood (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12527694
 ] 

Walter Underwood commented on SOLR-127:
---

Last-Modified does require monotonic time, but ETags are version stamps without 
any ordering. The indexVersion should be fine for an ETag.
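
A minimal sketch of that idea, kept free of any servlet or Solr API: derive an opaque ETag from the 
index version and compare it against the request's If-None-Match value to decide between 200 and 
304. The quoting and hex formatting are assumptions, not what the attached patches actually do.

{code:java}
public class ETagExample {

    // Any stable version stamp works as an ETag; unlike Last-Modified,
    // no ordering between successive values is required.
    static String etagFor(long indexVersion) {
        return "\"" + Long.toHexString(indexVersion) + "\"";
    }

    // True when a conditional GET can be answered with 304 Not Modified.
    static boolean notModified(String ifNoneMatchHeader, long indexVersion) {
        if (ifNoneMatchHeader == null) {
            return false;                          // unconditional request: send a full response
        }
        String current = etagFor(indexVersion);
        return "*".equals(ifNoneMatchHeader.trim()) || ifNoneMatchHeader.contains(current);
    }

    public static void main(String[] args) {
        long indexVersion = 1189785084L;           // whatever the current searcher reports
        String etag = etagFor(indexVersion);
        System.out.println(etag);                                     // quoted hex version stamp
        System.out.println(notModified(etag, indexVersion));          // true  -> respond 304
        System.out.println(notModified("\"stale\"", indexVersion));   // false -> respond 200
    }
}
{code}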

> Make Solr more friendly to external HTTP caches
> ---
>
> Key: SOLR-127
> URL: https://issues.apache.org/jira/browse/SOLR-127
> Project: Solr
>  Issue Type: Wish
>Reporter: Hoss Man
> Attachments: HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch, 
> HTTPCaching.patch, HTTPCaching.patch, HTTPCaching.patch
>
>
> An offhand comment I saw recently reminded me of something that really bugged 
> me about the search solution I used *before* Solr -- it didn't play nicely 
> with HTTP caches that might be sitting in front of it.
> At the moment, Solr doesn't put particularly useful info in the HTTP 
> response headers to aid in caching (i.e. Last-Modified), responds to all HEAD 
> requests with a 400, and doesn't do anything special with If-Modified-Since.
> At the very least, we can set a Last-Modified based on when the current 
> IndexReader was opened (if not the Date on the IndexReader) and use the same 
> info to determine how to respond to If-Modified-Since requests.
> (For the record, I think the reason this hasn't occurred to me in the 2+ years 
> I've been using Solr is because, with the internal caching, I've yet to need 
> to put a proxy cache in front of Solr.)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-277) Character Entity of XHTML is not supported with XmlUpdateRequestHandler .

2007-06-26 Thread Walter Underwood (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508408
 ] 

Walter Underwood commented on SOLR-277:
---

This is not a bug. Solr accepts XML, not XHTML. It does not accept XHTML-only 
entities. 

The Solr update XML format is a specific Solr XML format, not XHTML, not DocBook, 
not anything else.

To index XHTML, parse it and convert it to Solr XML update format.
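
A minimal sketch of the last step, producing the Solr XML update format from text that has already 
been extracted from the XHTML (with entities such as &aring; resolved to characters). The field 
names "id" and "body" are assumptions; use whatever the target schema defines.

{code:java}
public class SolrUpdateXmlExample {

    // Escape the only entities XML itself predefines; nothing like &aring; may remain.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&apos;");
    }

    // Wrap the extracted text in an <add><doc> update message.
    static String toAddCommand(String id, String bodyText) {
        return "<add><doc>"
                + "<field name=\"id\">" + escapeXml(id) + "</field>"
                + "<field name=\"body\">" + escapeXml(bodyText) + "</field>"
                + "</doc></add>";
    }

    public static void main(String[] args) {
        // "smörgåsbord" here already contains the character å, not the XHTML entity.
        System.out.println(toAddCommand("doc-1", "smörgåsbord & more"));
    }
}
{code}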


> Character Entity of XHTML is not supported with XmlUpdateRequestHandler .
> -
>
> Key: SOLR-277
> URL: https://issues.apache.org/jira/browse/SOLR-277
> Project: Solr
>  Issue Type: Improvement
>  Components: update
>Affects Versions: 1.3
>Reporter: Toru Matsuzawa
> Attachments: XmlUpdateRequestHandler.patch
>
>
> Character Entity of XHTML is not supported with XmlUpdateRequestHandler .
> http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent
> http://www.w3.org/TR/xhtml1/DTD/xhtml-special.ent
> http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent
> It is necessary to correspond with XmlUpdateRequestHandler because xpp3 
> cannot use .
> I think it is necessary until StaxUpdateRequestHandler becomes "/update".

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-216) Improvements to solr.py

2007-05-29 Thread Walter Underwood (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12499923
 ] 

Walter Underwood commented on SOLR-216:
---

GET is the right semantic for a query, since it doesn't change the resource. It 
also allows HTTP caching.

If Solr has URL length limits, that's a bug.


> Improvements to solr.py
> ---
>
> Key: SOLR-216
> URL: https://issues.apache.org/jira/browse/SOLR-216
> Project: Solr
>  Issue Type: Improvement
>  Components: clients - python
>Affects Versions: 1.2
>Reporter: Jason Cater
>Assignee: Mike Klaas
>Priority: Trivial
> Attachments: solr.py
>
>
> I've taken the original solr.py code and extended it to include higher-level 
> functions.
>   * Requires python 2.3+
>   * Supports SSL (https://) schema
>   * Conforms (mostly) to PEP 8 -- the Python Style Guide
>   * Provides a high-level results object with implicit data type conversion
>   * Supports batching of update commands

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-208) RSS feed XSL example

2007-05-17 Thread Walter Underwood (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12496624
 ] 

Walter Underwood commented on SOLR-208:
---

I wasn't in the RSS wars, either, but I was on the Atom working group. That was 
a bunch of volunteers making a clean, testable spec for RSS functionality 
(http://www.ietf.org/rfc/rfc4287). RSS 2.0 has some bad ambiguities, especially 
around ampersand and entities in titles. The default has changed over the years 
and clients do different, incompatible things.

GData is just a way to do search result stuff that we would need anyway. It is 
a standard set of URL parameters for query, start-index, and categories, and a 
few Atom extensions for total results, items per page, and next/previous.

http://code.google.com/apis/gdata/reference.html


> RSS feed XSL example
> 
>
> Key: SOLR-208
> URL: https://issues.apache.org/jira/browse/SOLR-208
> Project: Solr
>  Issue Type: New Feature
>  Components: clients - java
>Affects Versions: 1.2
>Reporter: Brian Whitman
> Assigned To: Hoss Man
>Priority: Trivial
> Attachments: rss.xsl
>
>
> A quick .xsl file for transforming solr queries into RSS feeds. To get the 
> date and time in properly you'll need an XSL 2.0 processor, as in 
> http://wiki.apache.org/solr/XsltResponseWriter .  Tested to work with the 
> example solr distribution in the nightly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-208) RSS feed XSL example

2007-05-17 Thread Walter Underwood (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12496608
 ] 

Walter Underwood commented on SOLR-208:
---

What kind of RSS?

-1 unless it is Atom. The nine variants of RSS have some nasty interop problems, 
even between those that are supposed to implement the same spec.

Even better, a GData interface returning Atom.



> RSS feed XSL example
> 
>
> Key: SOLR-208
> URL: https://issues.apache.org/jira/browse/SOLR-208
> Project: Solr
>  Issue Type: New Feature
>  Components: clients - java
>Affects Versions: 1.2
>Reporter: Brian Whitman
> Assigned To: Hoss Man
>Priority: Trivial
> Attachments: rss.xsl
>
>
> A quick .xsl file for transforming solr queries into RSS feeds. To get the 
> date and time in properly you'll need an XSL 2.0 processor, as in 
> http://wiki.apache.org/solr/XsltResponseWriter .  Tested to work with the 
> example solr distribution in the nightly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-161) Dangling dash causes stack trace

2007-02-15 Thread Walter Underwood (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12473628
 ] 

Walter Underwood commented on SOLR-161:
---

It is really a Lucene query parser bug, but it wouldn't hurt to do s/(.*)-/&/ 
as a workaround. Assuming my ed(1) syntax is still fresh. Regardless, no query 
string should ever give a stack trace. --wunder
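
A sketch of that workaround as a pre-parse step in Java, under the assumption that it runs on the 
raw query string before it reaches the Lucene query parser; the substitution above is left as 
quoted, and this is just one way to express "drop a dangling trailing modifier".

{code:java}
public class DanglingDashExample {

    // Strip a dangling "-" or "+" (and trailing whitespace) so the parser never sees
    // a modifier with nothing after it. Hyphens inside terms are left alone.
    static String stripDanglingModifier(String q) {
        return q.replaceAll("[-+\\s]+$", "");
    }

    public static void main(String[] args) {
        System.out.println(stripDanglingModifier("digging for the truth -")); // "digging for the truth"
        System.out.println(stripDanglingModifier("twenty-one pilots"));       // unchanged
    }
}
{code}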

> Dangling dash causes stack trace
> 
>
> Key: SOLR-161
> URL: https://issues.apache.org/jira/browse/SOLR-161
> Project: Solr
>  Issue Type: Bug
>  Components: search
>Affects Versions: 1.1.0
> Environment: Java 1.5, Tomcat 5.5.17, Fedora Core 4, Intel
>Reporter: Walter Underwood
>
> I'm running tests from our search logs, and we have a query that ends in a 
> dash. That caused a stack trace.
> org.apache.lucene.queryParser.ParseException: Cannot parse 'digging for the 
> truth -': Encountered "<EOF>" at line 1, column 23.
> Was expecting one of:
> "(" ...
> <QUOTED> ...
> <TERM> ...
> <PREFIXTERM> ...
> <WILDTERM> ...
> "[" ...
> "{" ...
> <NUMBER> ...
> 
>   at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:127)
>   at 
> org.apache.solr.request.DisMaxRequestHandler.handleRequest(DisMaxRequestHandler.java:272)
>   at org.apache.solr.core.SolrCore.execute(SolrCore.java:595)
>   at org.apache.solr.servlet.SolrServlet.doGet(SolrServlet.java:92)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-161) Dangling dash causes stack trace

2007-02-15 Thread Walter Underwood (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12473625
 ] 

Walter Underwood commented on SOLR-161:
---

The parser can have a rule for this rather than exploding. A trailing dash is 
never meaningful and can be omitted, whether we're allowing +/- or not. Seems 
like a grammar bug to me. --wunder

> Dangling dash causes stack trace
> 
>
> Key: SOLR-161
> URL: https://issues.apache.org/jira/browse/SOLR-161
> Project: Solr
>  Issue Type: Bug
>  Components: search
>Affects Versions: 1.1.0
> Environment: Java 1.5, Tomcat 5.5.17, Fedora Core 4, Intel
>Reporter: Walter Underwood
>
> I'm running tests from our search logs, and we have a query that ends in a 
> dash. That caused a stack trace.
> org.apache.lucene.queryParser.ParseException: Cannot parse 'digging for the 
> truth -': Encountered "<EOF>" at line 1, column 23.
> Was expecting one of:
> "(" ...
> <QUOTED> ...
> <TERM> ...
> <PREFIXTERM> ...
> <WILDTERM> ...
> "[" ...
> "{" ...
> <NUMBER> ...
> 
>   at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:127)
>   at 
> org.apache.solr.request.DisMaxRequestHandler.handleRequest(DisMaxRequestHandler.java:272)
>   at org.apache.solr.core.SolrCore.execute(SolrCore.java:595)
>   at org.apache.solr.servlet.SolrServlet.doGet(SolrServlet.java:92)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (SOLR-161) Dangling dash causes stack trace

2007-02-15 Thread Walter Underwood (JIRA)
Dangling dash causes stack trace


 Key: SOLR-161
 URL: https://issues.apache.org/jira/browse/SOLR-161
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 1.1.0
 Environment: Java 1.5, Tomcat 5.5.17, Fedora Core 4, Intel
Reporter: Walter Underwood


I'm running tests from our search logs, and we have a query that ends in a 
dash. That caused a stack trace.

org.apache.lucene.queryParser.ParseException: Cannot parse 'digging for the 
truth -': Encountered "<EOF>" at line 1, column 23.
Was expecting one of:
"(" ...
<QUOTED> ...
<TERM> ...
<PREFIXTERM> ...
<WILDTERM> ...
"[" ...
"{" ...
<NUMBER> ...

at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:127)
at 
org.apache.solr.request.DisMaxRequestHandler.handleRequest(DisMaxRequestHandler.java:272)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:595)
at org.apache.solr.servlet.SolrServlet.doGet(SolrServlet.java:92)


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-129) Solrb - UTF 8 Support for add/delete

2007-01-31 Thread Walter Underwood (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12469072
 ] 

Walter Underwood commented on SOLR-129:
---

This is not a bug, unless a bad error message is a bug. It looks like the XML 
uses the HTML entity "&aring;", which is not defined in XML. It has nothing 
to do with UTF-8. It really should generate an error message with a line number 
instead of a stack trace.

wunder

> Solrb - UTF 8 Support for add/delete
> 
>
> Key: SOLR-129
> URL: https://issues.apache.org/jira/browse/SOLR-129
> Project: Solr
>  Issue Type: Bug
>  Components: clients - ruby - flare
> Environment: OSX
>Reporter: Antonio Eggberg
>
> Hi:
> This could be a ruby utf-8 bug. Anyway when I try to do a UTF-8 document add 
> via post.sh and then do query via Solr Admin everything works as it should. 
> However using the solrb ruby lib or flare UTF-8 doc add doesn't work as it 
> should. I am not sure what I am doing wrong and I don't think its Solr cos it 
> works as it should.
> Could this be a famous utf-8 ruby bug? I am using ruby 1.8.5 with rails 1.2.1
> Cheers

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-73) schema.xml and solrconfig.xml use CNET-internal class names

2006-12-05 Thread Walter Underwood (JIRA)
[ 
http://issues.apache.org/jira/browse/SOLR-73?page=comments#action_12455684 ] 

Walter Underwood commented on SOLR-73:
--

Remember, this bug is only about removing aliased names from the sample files.

Note that the users in favor of having alias-free sample files are all new to 
Solr. The people in favor of keeping the aliases are generally long-time Solr users or 
developers. From a new user's point of view, the aliases are confusing.

Adding explicit alias definitions is a separate issue.




> schema.xml and solrconfig.xml use CNET-internal class names
> ---
>
> Key: SOLR-73
> URL: http://issues.apache.org/jira/browse/SOLR-73
> Project: Solr
>  Issue Type: Bug
>  Components: search
>Reporter: Walter Underwood
>
> The configuration files in the example directory still use the old 
> CNET-internal class names, like solr.LRUCache instead of 
> org.apache.solr.search.LRUCache.  This is confusing to new users and should 
> be fixed before the first release.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (SOLR-73) schema.xml and solrconfig.xml use CNET-internal class names

2006-11-28 Thread Walter Underwood (JIRA)
[ 
http://issues.apache.org/jira/browse/SOLR-73?page=comments#action_12454190 ] 

Walter Underwood commented on SOLR-73:
--

The context required to resolve the ambiguity is a wiki page that I didn't know 
existed. Since I didn't know about it, I tried to figure it out by reading the 
code, and then by sending e-mail to the list. In my case, I was writing two 
tiny classes, but the issue would be the same if I was a non-programmer adding 
some simple plug-ins.

With a full class name, there is no ambiguity. Again, this saves typing at the 
cost of requiring an indirection through some unspecified documentation.

I saw every customer support e-mail for eight years with Ultraseek, so I'm 
pretty familiar with the problems that search engine admins run into. 
One of the things we learned was that documentation doesn't fix an unclear 
product. You fix the product instead of documenting how to understand it.

Requiring users to edit an XML file is a separate issue, but I think it is a 
serious problem, especially because any error messages show up in the server 
logs. 


> schema.xml and solrconfig.xml use CNET-internal class names
> ---
>
> Key: SOLR-73
> URL: http://issues.apache.org/jira/browse/SOLR-73
> Project: Solr
>  Issue Type: Bug
>  Components: search
>Reporter: Walter Underwood
>
> The configuration files in the example directory still use the old 
> CNET-internal class names, like solr.LRUCache instead of 
> org.apache.solr.search.LRUCache.  This is confusing to new users and should 
> be fixed before the first release.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (SOLR-73) schema.xml and solrconfig.xml use CNET-internal class names

2006-11-28 Thread Walter Underwood (JIRA)
[ 
http://issues.apache.org/jira/browse/SOLR-73?page=comments#action_12454159 ] 

Walter Underwood commented on SOLR-73:
--

I think the aliases are harder to read. You need to go elsewhere to figure them 
out. I read documentation, but I didn't find the part of the wiki that 
explained them and I had to ask the mailing list.

The javadoc uses the full class name. Google and Yahoo searches should work 
better with the full class name (Yahoo is working much better than Google for 
that right now).

The aliases save typing, but I don't think they improve usability. Full class 
names are simple and unambiguous.

If we want usability for non-programmers, we can't have them editing an XML 
file. 


> schema.xml and solrconfig.xml use CNET-internal class names
> ---
>
> Key: SOLR-73
> URL: http://issues.apache.org/jira/browse/SOLR-73
> Project: Solr
>  Issue Type: Bug
>  Components: search
>Reporter: Walter Underwood
>
> The configuration files in the example directory still use the old 
> CNET-internal class names, like solr.LRUCache instead of 
> org.apache.solr.search.LRUCache.  This is confusing to new users and should 
> be fixed before the first release.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (SOLR-73) schema.xml and solrconfig.xml use CNET-internal class names

2006-11-28 Thread Walter Underwood (JIRA)
[ 
http://issues.apache.org/jira/browse/SOLR-73?page=comments#action_12454066 ] 

Walter Underwood commented on SOLR-73:
--

The aliasing requires documentation and using the full class names doesn't. It 
seems much simpler to me to use the real class names. Less to maintain, less to 
test, less to explain. 

> schema.xml and solrconfig.xml use CNET-internal class names
> ---
>
> Key: SOLR-73
> URL: http://issues.apache.org/jira/browse/SOLR-73
> Project: Solr
>  Issue Type: Bug
>  Components: search
>Reporter: Walter Underwood
>
> The configuration files in the example directory still use the old 
> CNET-internal class names, like solr.LRUCache instead of 
> org.apache.solr.search.LRUCache.  This is confusing to new users and should 
> be fixed before the first release.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Created: (SOLR-73) schema.xml and solrconfig.xml use CNET-internal class names

2006-11-28 Thread Walter Underwood (JIRA)
schema.xml and solrconfig.xml use CNET-internal class names
---

 Key: SOLR-73
 URL: http://issues.apache.org/jira/browse/SOLR-73
 Project: Solr
  Issue Type: Bug
  Components: search
Reporter: Walter Underwood


The configuration files in the example directory still use the old 
CNET-internal class names, like solr.LRUCache instead of 
org.apache.solr.search.LRUCache.  This is confusing to new users and should be 
fixed before the first release.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira