[jira] Commented: (SOLR-129) Solrb - UTF 8 Support for add/delete

2007-01-31 Thread Antonio Eggberg (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12468920
 ] 

Antonio Eggberg commented on SOLR-129:
--

Please close this bug. I have found the problem. For those of you who might be 
wondering why you see strange characters in the flare index page: it is because 
you are in debug mode :-) I should have read the code a bit more carefully. 
Anyway, turn off debug in your app/view pages. 

However, the problem with post, i.e. adding a document, still exists. This is 
beyond my Java expertise, so here is the error log:

SEVERE: org.xmlpull.v1.XmlPullParserException: could not resolve entity named 
'aring' (position: START_TAG seen ...<field 
name=\'description_text\'>Tv&aring;... @1:115) 
at org.xmlpull.mxp1.MXParser.nextImpl(MXParser.java:1282)
at org.xmlpull.mxp1.MXParser.next(MXParser.java:1093)
at org.xmlpull.mxp1.MXParser.nextText(MXParser.java:1058)
at org.apache.solr.core.SolrCore.readDoc(SolrCore.java:927)
at org.apache.solr.core.SolrCore.update(SolrCore.java:720)
at org.apache.solr.servlet.SolrUpdateServlet.doPost(SolrUpdateServlet.java:53)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:616)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:428)
at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:473)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:568)
at org.mortbay.http.HttpContext.handle(HttpContext.java:1530)
at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:633)
at org.mortbay.http.HttpContext.handle(HttpContext.java:1482)
at org.mortbay.http.HttpServer.service(HttpServer.java:909)
at org.mortbay.http.HttpConnection.service(HttpConnection.java:820)
at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:986)
at org.mortbay.http.HttpConnection.handle(HttpConnection.java:837)
at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:245)
at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357)
at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534) 
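
The "could not resolve entity named 'aring'" error suggests the client put an 
HTML named entity into the update XML. XML itself predefines only five 
entities (amp, lt, gt, quot, apos), so a parser without a DTD cannot resolve 
anything else. A minimal sketch of one way a client could avoid this, assuming 
it builds the add XML itself and sends it as UTF-8: escape only the reserved 
characters and pass accented characters through unchanged. XmlFieldEscaper and 
escape are hypothetical names, not part of solrb, flare or Solr.

```java
final class XmlFieldEscaper {
    private XmlFieldEscaper() {}

    /** Escapes only the characters XML reserves; all other characters pass through. */
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default: out.append(c); // includes accented letters; no named entity is emitted
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The accented letter is written as a plain character, not as a named entity.
        System.out.println(escape("Tv\u00e5 & <tre>"));
    }
}
```

The accented characters then only survive if the request body is really sent 
and declared as UTF-8, which is a separate concern from the escaping.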

 Solrb - UTF 8 Support for add/delete
 

 Key: SOLR-129
 URL: https://issues.apache.org/jira/browse/SOLR-129
 Project: Solr
  Issue Type: Bug
  Components: clients - ruby - flare
 Environment: OSX
Reporter: Antonio Eggberg

 Hi:
 This could be a Ruby UTF-8 bug. When I add a UTF-8 document via post.sh and 
 then query via Solr Admin, everything works as it should. However, adding a 
 UTF-8 document using the solrb Ruby lib or flare doesn't work as it should. 
 I am not sure what I am doing wrong, and I don't think it's Solr, because 
 Solr itself works as it should.
 Could this be a famous Ruby UTF-8 bug? I am using Ruby 1.8.5 with Rails 1.2.1.
 Cheers

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-85) [PATCH] Add update form to the admin screen

2007-01-31 Thread Thorsten Scherler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-85?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12469074
 ] 

Thorsten Scherler commented on SOLR-85:
---

Hi Ryan,

sorry for coming back so late on this, but I need to finish up the first 
version of a customer project.

Anyway, I saw that SOLR-104 is now applied, meaning your last patch on this 
issue should work fine, right?

Are there any other blockers on this issue?

salu2

 [PATCH] Add update form to the admin screen
 ---

 Key: SOLR-85
 URL: https://issues.apache.org/jira/browse/SOLR-85
 Project: Solr
  Issue Type: New Feature
  Components: update
Reporter: Thorsten Scherler
 Attachments: solar-85.png, solar-85.png, 
 solar-85.with.file.upload.diff, solar-85.with.file.upload.diff, 
 solar-85.with.file.upload.diff, solar-85.with.file.upload.diff, 
 solr-85-with-104.patch, solr-85.diff, solr-85.diff, solr-85.FINAL.diff


 It would be nice to have a web form to update Solr via an HTTP interface 
 instead of using post.sh.




[jira] Commented: (SOLR-61) move XML update parsing out of SolrCore

2007-01-31 Thread Thorsten Scherler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-61?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12469076
 ] 

Thorsten Scherler commented on SOLR-61:
---

Hi all,

I am keen to give this issue a go. Can somebody give me some hints on where to 
start?

TIA

salu2

 move XML update parsing out of SolrCore
 ---

 Key: SOLR-61
 URL: https://issues.apache.org/jira/browse/SOLR-61
 Project: Solr
  Issue Type: Improvement
Reporter: Yonik Seeley
Priority: Minor

 The XML parsing in SolrCore should be decoupled and moved out.
 We also might consider moving to StAX based parsing, as it is now a standard 
 and will be included in Java6 (Woodstox could be used for Java5).




[jira] Created: (SOLR-130) [Patch] [Docu] Starting a mySolr document, which tries to explain how to setup a custom solr instance

2007-01-31 Thread Thorsten Scherler (JIRA)
[Patch] [Docu] Starting a mySolr document, which tries to explain how to setup 
a custom solr instance
-

 Key: SOLR-130
 URL: https://issues.apache.org/jira/browse/SOLR-130
 Project: Solr
  Issue Type: Task
Reporter: Thorsten Scherler


While developing a custom search server based on solr I took some notes about 
the do's and don'ts. The initial patch is not a fully finished document but may 
invite other devs to enhance it.




JIRA - adding docu component?

2007-01-31 Thread Thorsten Scherler
Hi all,

I wonder whether we could add a docu component to our jira instance?

wdyt?

salu2
-- 
Thorsten Scherler   thorsten.at.apache.org
Open Source Java & XML consulting, training and solutions



Re: JIRA - adding docu component?

2007-01-31 Thread Yonik Seeley

On 1/31/07, Thorsten Scherler [EMAIL PROTECTED] wrote:

I wonder whether we could add a docu component to our jira instance?


Done.

-Yonik


[jira] Commented: (SOLR-61) move XML update parsing out of SolrCore

2007-01-31 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-61?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12469108
 ] 

Ryan McKinley commented on SOLR-61:
---

In SOLR-104, XML parsing moved from SolrCore to XmlUpdateRequestHandler:

http://svn.apache.org/repos/asf/lucene/solr/trunk/src/java/org/apache/solr/handler/XmlUpdateRequestHandler.java




[jira] Commented: (SOLR-85) [PATCH] Add update form to the admin screen

2007-01-31 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-85?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12469104
 ] 

Ryan McKinley commented on SOLR-85:
---

the last patch (solr-85-with-104.patch) should work fine

no blocker issues

ryan




[jira] Commented: (SOLR-130) [Patch] [Docu] Starting a mySolr document, which tries to explain how to setup a custom solr instance

2007-01-31 Thread Antonio Eggberg (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12469121
 ] 

Antonio Eggberg commented on SOLR-130:
--

Wow! You must be reading my mind :-) I can contribute with questions :-) As a 
newbie non-Java user with an enterprise perspective, I am your ideal target :-) 
Having said that, I would like to know about the following: 

1. The schema.xml and solrconfig.xml are in parts very well explained, but some 
areas, for example indexDefaults, have no explanation. It would be nice to get 
more info there. Specifically, what will happen if I increase mergeFactor to 
1000? What is the highest value for each property? What is a safe value? 
2. It would be nice to have deployment scenarios, e.g. a single-server 
install with XXX CPUs and YYY memory just running Solr with AAA thousand docs: 
what should your config look like and why, and roughly how many queries/sec 
can you expect? 
3. It would be nice to have a multi-server deployment with some server specs, 
and a description of how the deployment should be set up.
4. It would also be nice to have more info regarding the usage of stopwords, 
synonyms, facets, etc. 

I know that all of the above depends on the case, because configuration is by 
nature case by case. But what I want to propose is a set of guidelines or best 
practices based on the production implementations/deployments you have done 
with Cocoon. It would be nice to have some real-world stories.

I think you should do it like the Subversion book: a Solr open source book! :-) 
 

 [Patch] [Docu] Starting a mySolr document, which tries to explain how to 
 setup a custom solr instance
 -

 Key: SOLR-130
 URL: https://issues.apache.org/jira/browse/SOLR-130
 Project: Solr
  Issue Type: Task
Reporter: Thorsten Scherler
 Attachments: SOLR-130.diff


 While developing a custom search server based on solr I took some notes about 
 the do's and don'ts. The initial patch is not a fully finished document but 
 may invite other devs to enhance it.




[jira] Updated: (SOLR-109) variable substitution in lucene query params

2007-01-31 Thread Thorsten Scherler (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thorsten Scherler updated SOLR-109:
---

Attachment: SOLR-109.diff

This is a first start.

What is still missing is what Hoss described: "... a more general solution 
might be to modify the SolrQueryParser
directly to have a new void setParamVariables(SolrParams p) method.  if
it's called (with non null input), then any string that SolrQueryParser
instance is asked to parse would first be preprocessed looking for the ${}
pattern and pulling the values out of the SolrParams instance."

I need to have a closer look at what exactly Hoss means by this. However, I 
get lots of errors after an svn up, and I am not sure whether my local changes 
have caused this.
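
The preprocessing step described above (look for the ${} pattern, pull the 
value out of the request parameters) can be sketched roughly as follows. This 
is only an illustration under assumptions, not the code in SOLR-109.diff: a 
plain Map stands in for SolrParams, ParamVariables and substitute are 
hypothetical names, and leaving unknown variables untouched is one possible 
policy among several.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ParamVariables {
    // Matches ${name} and captures the name between the braces.
    private static final Pattern VARIABLE = Pattern.compile("\\$\\{([^}]+)\\}");

    private ParamVariables() {}

    /** Replaces each ${name} with params.get(name); unknown names are left as they are. */
    static String substitute(String query, Map<String, String> params) {
        Matcher m = VARIABLE.matcher(query);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = params.get(m.group(1));
            String replacement = (value != null) ? value : m.group();
            // quoteReplacement keeps '$' and '\' in the value from being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put("t", "solr");
        System.out.println(substitute("title:${t} AND cat:${c}", params));
    }
}
```

A real implementation would also have to decide whether substituted values 
need query-syntax escaping, which this sketch does not attempt.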

 variable substitution in lucene query params
 

 Key: SOLR-109
 URL: https://issues.apache.org/jira/browse/SOLR-109
 Project: Solr
  Issue Type: New Feature
Reporter: Thorsten Scherler
 Attachments: SOLR-109.diff


 Allowing variable substitution in the lucene query params seems pretty slick 
 ... a more general solution might be to modify the SolrQueryParser
 directly to have a new void setParamVariables(SolrParams p) method.
 http://marc.theaimsgroup.com/?t=11671237641r=1w=2




/update/xml dropping exceptions

2007-01-31 Thread Yonik Seeley

I haven't looked into it yet, but it seems like any problems in a
request to /update/xml get lost somewhere... a positive response is
always returned.

-Yonik


[jira] Commented: (SOLR-85) [PATCH] Add update form to the admin screen

2007-01-31 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-85?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12469145
 ] 

Yonik Seeley commented on SOLR-85:
--

Ryan, see Thorsten's last patch:  solar-85.with.file.upload.diff
that addressed some previous comments (separate update page, able to be 
disabled from solrconfig, etc)





[jira] Commented: (SOLR-85) [PATCH] Add update form to the admin screen

2007-01-31 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-85?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12469156
 ] 

Yonik Seeley commented on SOLR-85:
--

If you click on Manage Attachments (do you have that link?) it shows the date 
each attachment was added.
That's why I prefer versions of a patch all added under the same name... then 
JIRA takes care of telling me which is newest by graying out the old ones.




[jira] Commented: (SOLR-126) Auto-commit documents after time interval

2007-01-31 Thread Mike Klaas (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12469187
 ] 

Mike Klaas commented on SOLR-126:
-

Ryan: looking good!  A few comments:

 - You notify the tracker that the document is added before actually adding the 
document.  This is okay--commit() cannot run until addDoc() is complete--but it 
does mean that the autocommit maxTime is measured from the start of the 
document being added until after it has been processed.  I'm not sure it 
matters in practice.

- similarly, didCommit() is invoked before the searcher is warmed.  Autocommits 
will never occur simultaneously (as you note; due to synchronization of run()), 
but they could be invoked continually if warming takes a long time.

 - If 250ms is a small enough time to not care about, does it make sense to 
force the user to specify the time in milliseconds?

These are all relatively minor things--if no one else has any thoughts this can 
probably be committed soon.  

 Auto-commit documents after time interval
 -

 Key: SOLR-126
 URL: https://issues.apache.org/jira/browse/SOLR-126
 Project: Solr
  Issue Type: Improvement
  Components: update
Reporter: Ryan McKinley
Priority: Minor
 Attachments: AutoCommit.patch, AutocommitingUpdateRequestHandler.patch


 If an index is getting updated from multiple sources and needs to add 
 documents reasonably quickly, there should be a good solr side mechanism to 
 help prevent the client from spawning multiple overlapping <commit/> commands.
 My specific use case is sending each document to solr every time hibernate 
 saves an object (see SOLR-20).  This happens from multiple machines 
 simultaneously.  I'd like solr to make sure the documents are committed 
 within a second.




[jira] Commented: (SOLR-126) Auto-commit documents after time interval

2007-01-31 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12469204
 ] 

Ryan McKinley commented on SOLR-126:


 
  - You notify the tracker that the document is added before actually adding 
 the document.  This is okay--commit() cannot run until addDoc() is 
 complete--but it does mean that the autocommit maxTime is measured from the 
 start of the document being added until after it has been processed.  I'm not 
 sure it matters in practice.
 

I'm looking at it from the client perspective.  The timer should start as 
close to the request time as possible.


 - similarly, didCommit() is invoked before the searcher is warmed.  
 Autocommits will never occur simultaneously (as you note; due to 
 synchronization of run()), but they could be invoked continually if warming 
 takes a long time.
 

I just left it where it was in the existing code.  I think it makes sense 
because the searcher has the proper data at that point; a second commit won't 
change the results.

Also, it will not start a new autocommit until the first has warmed the 
searcher anyway:

  CommitUpdateCommand command = new CommitUpdateCommand( false );
  command.waitFlush = true;
  command.waitSearcher = true; 


  - If 250ms is a small enough time to not care about, does it make sense to 
 force the user to specify the time in milliseconds?
 

This is trying to avoid the case where 100 documents are added at the same 
time with maxDocs=10.  We don't want to commit 10 times, so it waits 1/4 sec. 
(could be shorter or longer in my opinion)

If anyone is worried about the timing, they should use maxTime, not maxDocs
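
The maxDocs case described above (100 documents arriving together with 
maxDocs=10 should lead to one commit, not ten) can be illustrated with a small 
scheduling sketch. This is not the code in AutoCommit.patch; AutoCommitSketch 
and its fields are hypothetical, time is passed in explicitly so the logic 
stays deterministic, and 0 is assumed to mean "limit disabled".

```java
final class AutoCommitSketch {
    private final int maxDocs;       // commit once this many docs are pending (0 = disabled)
    private final long maxTimeMs;    // commit this long after the first pending doc (0 = disabled)
    private final long docsDelayMs;  // grace period once maxDocs is reached, e.g. 250 ms

    private int pendingDocs = 0;
    private long commitDueAtMs = -1; // -1 means no commit is scheduled

    AutoCommitSketch(int maxDocs, long maxTimeMs, long docsDelayMs) {
        this.maxDocs = maxDocs;
        this.maxTimeMs = maxTimeMs;
        this.docsDelayMs = docsDelayMs;
    }

    /** Records one added document; returns when a commit is due, or -1 if none is scheduled. */
    long addedDocument(long nowMs) {
        pendingDocs++;
        if (maxTimeMs > 0 && commitDueAtMs < 0) {
            commitDueAtMs = nowMs + maxTimeMs; // timer starts with the first pending doc
        }
        if (maxDocs > 0 && pendingDocs >= maxDocs) {
            long docsDue = nowMs + docsDelayMs;
            // Only ever move the scheduled commit earlier, so a burst shares one commit.
            if (commitDueAtMs < 0 || docsDue < commitDueAtMs) {
                commitDueAtMs = docsDue;
            }
        }
        return commitDueAtMs;
    }

    /** Call after a commit has run: clears the pending count and the schedule. */
    void didCommit() {
        pendingDocs = 0;
        commitDueAtMs = -1;
    }

    public static void main(String[] args) {
        AutoCommitSketch tracker = new AutoCommitSketch(10, 0, 250);
        long due = -1;
        for (int i = 0; i < 100; i++) {
            due = tracker.addedDocument(0);
        }
        System.out.println("single commit due at t=" + due + " ms");
    }
}
```

With maxDocs=10 and a 250 ms delay, all 100 documents added at t=0 end up 
behind the same scheduled commit at t=250.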






[jira] Updated: (SOLR-130) [Patch] [Docu] Starting a mySolr document, which tries to explain how to setup a custom solr instance

2007-01-31 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-130:
--

Component/s: documentation

 [Patch] [Docu] Starting a mySolr document, which tries to explain how to 
 setup a custom solr instance
 -

 Key: SOLR-130
 URL: https://issues.apache.org/jira/browse/SOLR-130
 Project: Solr
  Issue Type: Task
  Components: documentation
Reporter: Thorsten Scherler
 Attachments: SOLR-130.diff


 While developing a custom search server based on solr I took some notes about 
 the do's and don'ts. The initial patch is not a fully finished document but 
 may invite other devs to enhance it.




empty contentStream?

2007-01-31 Thread Ryan McKinley

I'm trying to implement SOLR-85 using SOLR-104 content streams... but
it raises a simple behavior question.

If you have a form:
<form>
<textarea name="stream.body"></textarea>
<input type="file" name="file"/>
</form>

If you upload a file, the update plugin is sent two content streams:
one with the contents of the file, the other with contents "".

As written the XmlUpdateHandler parses each stream and breaks when it
hits the empty string.

Options:
1. this should be implemented with two forms - every field sent should be used
2. if stream.body.trim().length() == 0, don't make a stream

I vote for #2, thoughts?
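
Option 2 above can be sketched as a small filter applied before any stream is 
created. This is an illustration only, assuming the stream.body values arrive 
as a String array of request parameters; StreamBodyFilter and nonEmptyBodies 
are hypothetical names, not the SOLR-104 API.

```java
import java.util.ArrayList;
import java.util.List;

final class StreamBodyFilter {
    private StreamBodyFilter() {}

    /** Keeps only the stream.body values that contain something besides whitespace. */
    static List<String> nonEmptyBodies(String[] bodyParams) {
        List<String> bodies = new ArrayList<>();
        if (bodyParams == null) {
            return bodies; // parameter not sent at all
        }
        for (String body : bodyParams) {
            // Same test as in the proposal: trim().length() == 0 means "don't make a stream".
            if (body != null && body.trim().length() > 0) {
                bodies.add(body);
            }
        }
        return bodies;
    }

    public static void main(String[] args) {
        String[] params = { "   ", "<add/>" };
        System.out.println(nonEmptyBodies(params)); // only the non-blank body remains
    }
}
```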


Re: loading many documents by ID

2007-01-31 Thread Erik Hatcher


On Jan 31, 2007, at 6:39 PM, Chris Hostetter wrote:

: Oh, and there have been numerous people interested in updateable
: documents, so it would be nice if that part was in the update  
handler.


We'd have to make it very clear that this only works if all fields are
STORED.


That is perfectly reasonable, for sure.  And I would support an  
update feature issuing an exception if it detected this case.


There is an important caveat to all fields being stored though... if  
an update was sending in updated fields for all the non-stored  
fields, and only stored fields were being copied internally, all  
would be fine too.


I think eventually we could have this sort of feature internally copy  
the terms for non-stored fields somehow, but maybe that would only  
come along once Lucene supported something to facilitate this more?


Erik
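
The rule discussed above (an update is safe when every non-stored field is 
sent again with the update, and should otherwise raise an exception) could be 
checked along these lines. A rough sketch under assumptions: the schema is 
reduced to a map from field name to its stored flag, and UpdatableCheck and 
requireUpdatable are hypothetical names rather than anything in Solr.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class UpdatableCheck {
    private UpdatableCheck() {}

    /**
     * storedByField maps a field name to whether the schema stores it.
     * suppliedFields are the fields the update request sends fresh values for.
     * Throws if a non-stored field would be lost because it is not re-sent.
     */
    static void requireUpdatable(Map<String, Boolean> storedByField, Set<String> suppliedFields) {
        Set<String> lost = new TreeSet<>();
        for (Map.Entry<String, Boolean> entry : storedByField.entrySet()) {
            boolean stored = Boolean.TRUE.equals(entry.getValue());
            if (!stored && !suppliedFields.contains(entry.getKey())) {
                lost.add(entry.getKey());
            }
        }
        if (!lost.isEmpty()) {
            throw new IllegalStateException(
                "cannot update document: non-stored fields not supplied: " + lost);
        }
    }

    public static void main(String[] args) {
        Map<String, Boolean> schema = new HashMap<>();
        schema.put("title", true);
        schema.put("text", false);
        Set<String> supplied = new HashSet<>();
        supplied.add("text");
        requireUpdatable(schema, supplied); // passes: the only non-stored field is re-sent
        System.out.println("update allowed");
    }
}
```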




Re: [jira] Created: (SOLR-131) tutorial update: faceting, highlighting, etc

2007-01-31 Thread Erik Hatcher

What about putting the tutorial completely on the wiki?

We could pull the wiki page into a distribution to lock it in  
statically.


Just a thought.  I like it being off the wiki actually, but with the  
wiki anyone can lend a hand in wordsmithing and updating.


Erik


On Jan 31, 2007, at 9:31 PM, Yonik Seeley (JIRA) wrote:


tutorial update: faceting, highlighting, etc


 Key: SOLR-131
 URL: https://issues.apache.org/jira/browse/SOLR-131
 Project: Solr
  Issue Type: Improvement
  Components: documentation
Reporter: Yonik Seeley


The tutorial hasn't really been changed since we entered the  
incubator.  Highlighting and Faceting might be nice additions.


Looking back, I wish I had chosen a different data set like books  
or movies (or a mix of both)... something that wouldn't get out of  
date as fast as electronics, and that more people could identify  
with.  The biggest downside is examples in the Wiki refer to the  
current example docs.


Breaking it into multiple pages and adding a screenshot or two wouldn't be  
a bad idea either.






Re: empty contentStream?

2007-01-31 Thread Yonik Seeley

On 1/31/07, Ryan McKinley [EMAIL PROTECTED] wrote:

Options:
1. this should be implemented with two forms - every field sent should be used
2. if stream.body.trim().length() == 0, don't make a stream

I vote for #2, thoughts?


Sigh... yes, it's practical.

-Yonik


resin and UTF-8 in URLs

2007-01-31 Thread Yonik Seeley

So, we've conquered UTF-8 input in URLs for Jetty and Tomcat, so how
about Resin?

Right now, I can't get Resin 3.0.22 to see an e with a circumflex via
the following:

curl -i 'http://localhost:8983/solr/select?q=%C3%AA&echoParams=explicit'

-Yonik
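
The request above carries the e with a circumflex as the percent-encoded UTF-8 
bytes C3 AA. The sketch below shows why the charset a container uses for URL 
decoding matters, using the JDK's URLDecoder. QueryDecoding is a hypothetical 
helper, and this says nothing about how Resin decodes URLs internally.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

final class QueryDecoding {
    private QueryDecoding() {}

    /** Percent-decodes a query value, interpreting the bytes in the given charset. */
    static String decode(String raw, String charsetName) {
        try {
            return URLDecoder.decode(raw, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        String asUtf8 = decode("%C3%AA", "UTF-8");        // one character, U+00EA
        String asLatin1 = decode("%C3%AA", "ISO-8859-1"); // two characters, U+00C3 then U+00AA
        System.out.println(asUtf8.length() + " vs " + asLatin1.length());
    }
}
```

Decoded as UTF-8 the two bytes form the single intended character; decoded as 
ISO-8859-1 each byte becomes its own character, which is the typical garbled 
result.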


[jira] Commented: (SOLR-85) [PATCH] Add update form to the admin screen

2007-01-31 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-85?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12469325
 ] 

Ryan McKinley commented on SOLR-85:
---

Ok, this one is based on solar-85.with.file.upload.diff!

It also adds a few minor fixes / adjustments to SOLR-104

 [PATCH] Add update form to the admin screen
 ---

 Key: SOLR-85
 URL: https://issues.apache.org/jira/browse/SOLR-85
 Project: Solr
  Issue Type: New Feature
  Components: update
Reporter: Thorsten Scherler
 Attachments: solar-85.png, solar-85.png, 
 solar-85.with.file.upload.diff, solar-85.with.file.upload.diff, 
 solar-85.with.file.upload.diff, solar-85.with.file.upload.diff, 
 solr-85-with-104.patch, solr-85-with-104.patch, solr-85.diff, solr-85.diff, 
 solr-85.FINAL.diff


 It would be nice to have a web form to update Solr via an HTTP interface 
 instead of using post.sh.




Re: loading many documents by ID

2007-01-31 Thread Yonik Seeley

On 1/31/07, Erik Hatcher [EMAIL PROTECTED] wrote:


On Jan 31, 2007, at 6:39 PM, Chris Hostetter wrote:
 : Oh, and there have been numerous people interested in updateable
 : documents, so it would be nice if that part was in the update
 handler.

 We'd have to make it very clear that this only works if all fields are
 STORED.

That is perfectly reasonable, for sure.  And I would support an
update feature issuing an exception if it detected this case.

There is an important caveat to all fields being stored though... if
an update was sending in updated fields for all the non-stored
fields, and only stored fields were being copied internally, all
would be fine too.


I think there might be two useful types of updates:
1) overwrite original field
2) add an additional value for a multi-valued field (useful for tagging?)



I think eventually we could have this sort of feature internally copy
the terms for non-stored fields somehow, but maybe that would only
come along once Lucene supported something to facilitate this more?


Not unless you store more info (a lot more info).
We should also be able to copy unstored fields with term vectors stored.

ParallelReader might also hold some promise (putting a field to be
updated in a separate index).  The problem is that the lucene ids need
to be kept in sync... I don't know how to do that w/o reindexing.

-Yonik
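
The two update types listed above (overwrite the original field, or add a 
value to a multi-valued field) can be pictured on a plain in-memory document. 
This is a sketch only, assuming a document is a map from field name to a list 
of values; FieldUpdate, Mode and apply are hypothetical names and do not 
reflect any existing Solr update API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class FieldUpdate {
    enum Mode { OVERWRITE, APPEND }

    private FieldUpdate() {}

    /** Applies one field update to an in-memory document (field name -> list of values). */
    static void apply(Map<String, List<String>> doc, String field, String value, Mode mode) {
        List<String> values = doc.get(field);
        if (values == null || mode == Mode.OVERWRITE) {
            values = new ArrayList<>(); // start fresh: new field, or old values are replaced
            doc.put(field, values);
        }
        values.add(value);              // APPEND keeps earlier values, e.g. existing tags
    }

    public static void main(String[] args) {
        Map<String, List<String>> doc = new HashMap<>();
        apply(doc, "tag", "search", Mode.APPEND);
        apply(doc, "tag", "lucene", Mode.APPEND);
        apply(doc, "title", "new title", Mode.OVERWRITE);
        System.out.println(doc.get("tag") + " " + doc.get("title"));
    }
}
```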


Re: svn commit: r501512 - in /lucene/solr/trunk: ./ src/java/org/apache/solr/core/ src/java/org/apache/solr/handler/ src/java/org/apache/solr/request/ src/java/org/apache/solr/search/ src/java/org/apa

2007-01-31 Thread Erik Hatcher

TODO: switch solrb to using wt=json instead of wt=ruby.

Whatcha think, Ed et al?

Erik


On Jan 30, 2007, at 1:36 PM, [EMAIL PROTECTED] wrote:


Author: yonik
Date: Tue Jan 30 10:36:32 2007
New Revision: 501512

URL: http://svn.apache.org/viewvc?view=rev&rev=501512
Log:
SimpleOrderedMap, JSON named list changes: SOLR-125




Re: svn commit: r501512 - in /lucene/solr/trunk: ./ src/java/org/apache/solr/core/ src/java/org/apache/solr/handler/ src/java/org/apache/solr/request/ src/java/org/apache/solr/search/ src/java/org/apa

2007-01-31 Thread Yonik Seeley

On 1/31/07, Erik Hatcher [EMAIL PROTECTED] wrote:

TODO: switch solrb to using wt=json instead of wt=ruby.


Why is that?

-Yonik


charset in POST from browser

2007-01-31 Thread Yonik Seeley

It seems that browsers do a form POST in the charset that the page was
encoded in.
Modifying form.jsp in solr/admin seems to work... the data comes
across encoded in UTF8.

The problem is that the charset isn't defined to be UTF-8 in the
headers, so the bytes are assumed to be latin-1.

Is this a problem we can fix in solr, or is it purely container config?

This will mimic what the browser sends back:
curl -i http://localhost:8983/solr/select -d 'q=%C3%AA'

-Yonik


Re: loading many documents by ID

2007-01-31 Thread Walter Underwood
On 1/31/07 3:39 PM, Chris Hostetter [EMAIL PROTECTED] wrote:
 
 : Oh, and there have been numerous people interested in updateable
 : documents, so it would be nice if that part was in the update handler.
 
 We'd have to make it very clear that this only works if all fields are
 STORED.

Isn't there some way to do this automatically instead of relying
on documentation? We might need to add something, maybe a
required attribute on fields, but a runtime error would be
much, much better than a page on the wiki.

wunder



Re: loading many documents by ID

2007-01-31 Thread Ryan McKinley


 We'd have to make it very clear that this only works if all fields are
 STORED.

Isn't there some way to do this automatically instead of relying
on documentation? We might need to add something, maybe a
required attribute on fields, but a runtime error would be
much, much better than a page on the wiki.



what about copyField?

With copyField, it is reasonable to have fields that are not stored
and are generated from the other stored fields.  (this is what my
setup looks like)


[jira] Closed: (SOLR-129) Solrb - UTF 8 Support for add/delete

2007-01-31 Thread Erik Hatcher (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Hatcher closed SOLR-129.
-

Resolution: Cannot Reproduce

I added a controller and view to display the features from the 
utf8-example.xml file to flare.

1) fire up the Solr example application, and post.sh *.xml from the 
exampledocs directory.

2) fire up flare, hit /i18n (http://localhost:3000/i18n)

Showing all the accented characters worked fine for me.

I suspect we probably still have some i18n issues to iron out, so any help or 
at least test cases in that regard would be most helpful.




Re: loading many documents by ID

2007-01-31 Thread Walter Underwood
On 1/31/07 9:05 PM, Ryan McKinley [EMAIL PROTECTED] wrote:
 
 We'd have to make it very clear that this only works if all fields are
 STORED.
 
 Isn't there some way to do this automatically instead of relying
 on documentation? We might need to add something, maybe a
 required attribute on fields, but a runtime error would be
 much, much better than a page on the wiki.
 
 what about copyField?
 
 With copyField, it is reasonable to have fields that are not stored
 and are generated from the other stored fields.  (this is what my
 setup looks like).

Mine, too. That is why I suggested explicit declarations in the
schema to say which fields are required.

wunder



Re: empty contentStream?

2007-01-31 Thread Chris Hostetter

:  1. this should be implemented with two forms - every field sent should be 
used
:  2. if stream.body.trim().length() == 0, don't make a stream
: 
:  I vote for #2, thoughts?
:
: Sigh... yes, it's practical.

Alternate Idea #3: make the XmlUpdateRequestHandler more robust in
receiving empty streams (treat it as a NOOP, maybe return an error if
*all* the streams are empty)

i'm okay with #2 as long as it's only in the stream.body parsing and not
something we try to do with every stream.



-Hoss



Re: charset in POST from browser

2007-01-31 Thread Yonik Seeley

On 2/1/07, Chris Hostetter [EMAIL PROTECTED] wrote:

: The problem is that the charset isn't defined to be UTF-8 in the
: headers, so the bytes are assumed to be latin-1.
:
: Is this a problem we can fix in solr, or is it purely container config?

umm... we already fixed this the best way i know how in SOLR-35 ... all of
the JSPs that have forms should have this in them...

<%@ page contentType="text/html; charset=utf-8" pageEncoding="UTF-8"%>

...is resin not respecting that?


The form that gets sent to the browser is in UTF8, and the browser
correctly sends back UTF8 in the post body.  *But* the browser doesn't
tell the container what the charset of the body is, so it's up to the
container to guess.  By default, resin seems to pick latin-1.

It seems like we should assume UTF-8 if no charset is sent for a text
content type.
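
As a sketch of that rule, assuming we only have the raw Content-Type header value to go on: use the charset parameter if the client sent one, otherwise fall back to UTF-8. The function name `charset_for` is hypothetical, and the parsing is deliberately simple.

```python
def charset_for(content_type, default="utf-8"):
    """Return the charset named in a Content-Type value, or the default."""
    parts = [part.strip() for part in content_type.split(";")]
    for param in parts[1:]:  # parts[0] is the media type itself
        name, _, value = param.partition("=")
        value = value.strip().strip("\"'")
        if name.strip().lower() == "charset" and value:
            return value.lower()
    return default


if __name__ == "__main__":
    print(charset_for("application/x-www-form-urlencoded"))
    print(charset_for("text/xml; charset=ISO-8859-1"))
```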

-Yonik


Re: resin and UTF-8 in URLs

2007-01-31 Thread Ryan McKinley

I just tried this on two systems... it worked on one (I got the ê) and
the other I get ê -- both running resin 3.0.21

The one that works has http://securityfilter.sourceforge.net/ applied.
I'll look into what securityfilter is doing... it may be setting
something explicitly
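
For reference, a small sketch of the failure mode I would expect here, assuming the bad system is decoding UTF-8 bytes as latin-1: the single character turns into two. Both the raw-bytes case and the percent-encoded URL case are shown; the variable names are only for illustration.

```python
from urllib.parse import quote, unquote

original = "ê"

# The browser sends the character as two UTF-8 bytes.
utf8_bytes = original.encode("utf-8")

# A container that assumes latin-1 turns each byte into its own character.
misread = utf8_bytes.decode("latin-1")

# The same thing happens with percent-encoded query strings.
encoded = quote(original)  # percent-encodes the UTF-8 bytes
url_misread = unquote(encoded, encoding="latin-1")
url_correct = unquote(encoded, encoding="utf-8")

if __name__ == "__main__":
    print(utf8_bytes, misread, encoded, url_misread, url_correct)
```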


Re: empty contentStream?

2007-01-31 Thread Ryan McKinley

I just posted SOLR-85 using strategy #2.

It makes sure stream.body and stream.url have content before making
streams out of them.  I think this makes sense given they are likely
to be used in forms similar to the 'update.jsp' where they may or may
not have content.
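
The check amounts to something like the sketch below. It is not the actual patch: `streams_from_params` and the dict-of-lists parameter shape are assumptions made for illustration, and only the parameter names stream.body and stream.url come from the discussion.

```python
def streams_from_params(params):
    """Build (kind, value) stream descriptors, skipping blank form values.

    params maps a parameter name to a list of submitted values, the way a
    form post with repeated fields would arrive.
    """
    streams = []
    for key in ("stream.body", "stream.url"):
        for value in params.get(key, []):
            if value.strip():  # a blank textarea or input creates no stream
                streams.append((key, value))
    return streams


if __name__ == "__main__":
    form = {"stream.body": ["", "<commit/>"], "stream.url": ["   "]}
    print(streams_from_params(form))
```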



i'm okay with #2 as long as it's only in the stream.body parsing and not
something we try to do with every stream.



I totally agree it should not check 'real' streams, but these are
essentially helper streams that make it easy to post a stream from a
form.


Re: resin and UTF-8 in URLs

2007-01-31 Thread Yonik Seeley

On 2/1/07, Ryan McKinley [EMAIL PROTECTED] wrote:

I just tried this on two systems... it worked on one (I got the ê) and
the other I get ê -- both running resin 3.0.21


A co-worker informed me that adding a character-encoding attribute to
the web-app tag in web.xml will force a charset if not defined.  Seems
to work for both GET and POST.

<web-app character-encoding="utf-8">

This looks resin-specific though.

-Yonik


Re: charset in POST from browser

2007-01-31 Thread Chris Hostetter

: The form that gets sent to the browser is in UTF8, and the browser
: correctly sends back UTF8 in the post body.  *But* the browser doesn't
: tell the container what the charset of the body is, so it's up to the
: container to guess.  By default, resin seems to pick latin-1.

That's really weird ... i could have sworn browsers doing POST of form
data were supposed to send a full content-type...

   Content-type: application/x-www-form-urlencoded; charset=utf-8

...picking the charset based on the charset of the page containing the
form  (i assume you tested and verified this isn't happening?)

a quick google search turned up this page, with this info...

http://www.systemvikar.biz/faq/servlet.xtp



Form character encoding doesn't work

A POST request with application/x-www-form-urlencoded doesn't contain any
information about the character encoding. So Resin needs to use a set of
heuristics to decode the form. Here's the order:

   1. request.getAttribute("caucho.form.character.encoding")
   2. The response.setContentType() encoding of the page.
   3. The character-encoding tag in the resin.conf.

Resin uses the default character encoding of your JVM to read form data.
To set the encoding to another charset, you'll need to change the
resin.conf as follows:

<http-server character-encoding='Shift_JIS'>
  ...
</http-server>
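
Read as a fallback chain, the order quoted above could be sketched like this. The function and argument names are invented, and the `jvm_default` value is an assumption, not something the FAQ states; the point is only that the first source that is set wins.

```python
def resolve_form_encoding(
    request_attribute=None,
    response_encoding=None,
    config_encoding=None,
    jvm_default="ISO-8859-1",  # assumed default, varies by JVM and platform
):
    """Return the first encoding that is set, following the quoted order."""
    for candidate in (request_attribute, response_encoding, config_encoding):
        if candidate:
            return candidate
    return jvm_default


if __name__ == "__main__":
    print(resolve_form_encoding())
    print(resolve_form_encoding(config_encoding="utf-8"))
    print(resolve_form_encoding(response_encoding="utf-8", config_encoding="Shift_JIS"))
```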




Re: empty contentStream?

2007-01-31 Thread Chris Hostetter

: It makes sure stream.body and stream.url have content before making
: streams out of them.  I think this makes sense given they are likely
: to be used in forms similar to the 'update.jsp' where they may or may
: not have content.

yeah ... good call.


-Hoss



Re: svn commit: r501512 - in /lucene/solr/trunk: ./ src/java/org/apache/solr/core/ src/java/org/apache/solr/handler/ src/java/org/apache/solr/request/ src/java/org/apache/solr/search/ src/java/org/apa

2007-01-31 Thread Erik Hatcher


On Jan 31, 2007, at 11:08 PM, Yonik Seeley wrote:


On 1/31/07, Erik Hatcher [EMAIL PROTECTED] wrote:

TODO: switch solrb to using wt=json instead of wt=ruby.


Why is that?


To benefit from a richer data structure, avoid eval (which I hear is  
likely to be slower than parsing JSON, and eval is potentially more  
dangerous if code somehow got slipped in though that risk is not very  
high).


The downside is that we'd need to add a dependency on a JSON parsing  
library.  JSON is close enough to Ruby syntax that it can practically  
be eval'd, interestingly, but I don't think it's close enough.
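
To illustrate the "close but not close enough" point, here is the same trade-off shown in Python as an analogy (not Ruby, and not solrb code): a JSON response looks almost like a native literal, but eval trips over `false` and `null`, while a JSON parser handles them and never executes anything. The sample response text is made up.

```python
import json

# Hypothetical response body in JSON syntax.
response_text = '{"numFound": 2, "partial": false, "next": null}'

# A JSON parser only builds data structures.
parsed = json.loads(response_text)


def eval_fails(source):
    """Return True if evaluating the text as native code raises NameError."""
    try:
        eval(source, {"__builtins__": {}})  # shown only to make the point
    except NameError:
        return True
    return False


if __name__ == "__main__":
    print(parsed)
    print(eval_fails(response_text))
```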


Erik



Re: charset in POST from browser

2007-01-31 Thread Yonik Seeley

On 2/1/07, Chris Hostetter [EMAIL PROTECTED] wrote:

: The form that gets sent to the browser is in UTF8, and the browser
: correctly sends back UTF8 in the post body.  *But* the browser doesn't
: tell the container what the charset of the body is, so it's up to the
: container to guess.  By default, resin seems to pick latin-1.

That's really weird ... i could have sworn browsers doing POST of form
data were supposed to send a full content-type...

   Content-type: application/x-www-form-urlencoded; charset=utf-8

...picking the charset based on the charset of the page containing the
form  (i assume you tested and verified this isn't happening?)


Yep, FireFox2.
I'd serve the page, do a search, kill the solr server, run nc -l -p
8983, and run the search again.  The body was encoded correctly, but
just no charset info.
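
For anyone repeating this, a small sketch of checking a captured request for charset info. The sample request bytes below are invented to resemble what such a capture might contain, not a real dump, and the helper names are hypothetical.

```python
def content_type_of(raw_request):
    """Return the Content-Type header value from raw HTTP request bytes, or None."""
    head, _, _body = raw_request.partition(b"\r\n\r\n")
    lines = head.decode("latin-1").split("\r\n")
    for line in lines[1:]:  # lines[0] is the request line
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "content-type":
            return value.strip()
    return None


def has_charset(raw_request):
    """True if the request declares a charset in its Content-Type header."""
    content_type = content_type_of(raw_request)
    return content_type is not None and "charset=" in content_type.lower()


# Invented capture: UTF-8 percent-encoded body, but no charset parameter.
sample_capture = (
    b"POST /solr/select HTTP/1.1\r\n"
    b"Host: localhost:8983\r\n"
    b"Content-Type: application/x-www-form-urlencoded\r\n"
    b"Content-Length: 8\r\n"
    b"\r\n"
    b"q=%C3%AA"
)

if __name__ == "__main__":
    print(content_type_of(sample_capture), has_charset(sample_capture))
```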

I tried setting it explicitly by appending to enctype in the form, but
it doesn't go through.

-Yonik


Re: svn commit: r501512 - in /lucene/solr/trunk: ./ src/java/org/apache/solr/core/ src/java/org/apache/solr/handler/ src/java/org/apache/solr/request/ src/java/org/apache/solr/search/ src/java/org/apa

2007-01-31 Thread Yonik Seeley

On 2/1/07, Erik Hatcher [EMAIL PROTECTED] wrote:

On Jan 31, 2007, at 11:08 PM, Yonik Seeley wrote:

 On 1/31/07, Erik Hatcher [EMAIL PROTECTED] wrote:
 TODO: switch solrb to using wt=json instead of wt=ruby.

 Why is that?

To benefit from a richer data structure,


They seem to have the same power there.
Bear in mind that json params like json.nl apply to its subtypes,
ruby and python also.


avoid eval (which I hear is
likely to be slower than parsing JSON,


If the JSON parser is written in C, yes.   Otherwise, I doubt it :-)


and eval is potentially more
dangerous if code somehow got slipped in though that risk is not very
high).


Yeah, I guess someone would have to say, here, point your client at
my solr system, and then they could be running something else that
gives you executable code.  But they could also just give you bogus
data, so it's bad to point at random things anyway.  (but I guess it
*is* worse if you are trying to operate in some federated mode
across the internet with unknown peers).


The downside is that we'd need to add a dependency on a JSON parsing
library.  JSON is close enough to Ruby syntax that it can practically
be eval'd, interestingly, but I don't think it's close enough.

Erik