[jira] Updated: (SOLR-58) Change Admin components to return XML like the rest of the system

2006-11-07 Thread Otis Gospodnetic (JIRA)
 [ http://issues.apache.org/jira/browse/SOLR-58?page=all ]

Otis Gospodnetic updated SOLR-58:
-

Attachment: logging-xml.jsp

Here is the XML version of logging.jsp, named logging-xml.jsp.
Its output is trivial:


  INFO


I imagine the XSL would take this XML, convert it to HTML, and append links 
to action.jsp for the different logging levels.


> Change Admin components to return XML like the rest of the system
> -
>
> Key: SOLR-58
> URL: http://issues.apache.org/jira/browse/SOLR-58
> Project: Solr
>  Issue Type: New Feature
>  Components: web gui
>Reporter: Otis Gospodnetic
> Assigned To: Otis Gospodnetic
>Priority: Minor
> Attachments: analysis-xml-out.txt, analysis-xml.jsp, logging-xml.jsp, 
> ping-xml-out.txt, ping-xml.jsp, threaddump-xml-out.txt, threaddump-xml.jsp
>
>
> I need to expose the admin functionality to an external application.  I think 
> returning admin data as XML may be a good and simple first step towards that.
> To do that I think I'll mostly need to modify JSPs (but I haven't had a good 
> look at Admin GUI yet).  From what I saw a few weeks ago when I briefly 
> looked at this, no Java code will need to be modified.  If you have concrete 
> ideas about how this should be done, please comment before I start next week 
> (week of October 23rd 2006).

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Updated: (SOLR-58) Change Admin components to return XML like the rest of the system

2006-11-07 Thread Otis Gospodnetic (JIRA)
 [ http://issues.apache.org/jira/browse/SOLR-58?page=all ]

Otis Gospodnetic updated SOLR-58:
-

Attachment: threaddump-xml.jsp
threaddump-xml-out.txt

Here are threaddump-xml.jsp and an example of its output.


> Change Admin components to return XML like the rest of the system
> -
>
> Key: SOLR-58
> URL: http://issues.apache.org/jira/browse/SOLR-58
> Project: Solr
>  Issue Type: New Feature
>  Components: web gui
>Reporter: Otis Gospodnetic
> Assigned To: Otis Gospodnetic
>Priority: Minor
> Attachments: analysis-xml-out.txt, analysis-xml.jsp, 
> ping-xml-out.txt, ping-xml.jsp, threaddump-xml-out.txt, threaddump-xml.jsp
>
>
> I need to expose the admin functionality to an external application.  I think 
> returning admin data as XML may be a good and simple first step towards that.
> To do that I think I'll mostly need to modify JSPs (but I haven't had a good 
> look at Admin GUI yet).  From what I saw a few weeks ago when I briefly 
> looked at this, no Java code will need to be modified.  If you have concrete 
> ideas about how this should be done, please comment before I start next week 
> (week of October 23rd 2006).





[jira] Updated: (SOLR-58) Change Admin components to return XML like the rest of the system

2006-11-07 Thread Otis Gospodnetic (JIRA)
 [ http://issues.apache.org/jira/browse/SOLR-58?page=all ]

Otis Gospodnetic updated SOLR-58:
-

Attachment: ping-xml.jsp
ping-xml-out.txt

Ping was simple, I just made it return  if ping was 
OK (attached), and if there was an error, then:

 
  exception trace here
 

Thoughts?

N.B.
I'm not attaching diffs for JSPs, as I'm letting both the original and the XML 
versions live side by side locally for now, but if you'd prefer diffs, let me 
know. 


> Change Admin components to return XML like the rest of the system
> -
>
> Key: SOLR-58
> URL: http://issues.apache.org/jira/browse/SOLR-58
> Project: Solr
>  Issue Type: New Feature
>  Components: web gui
>Reporter: Otis Gospodnetic
> Assigned To: Otis Gospodnetic
>Priority: Minor
> Attachments: analysis-xml-out.txt, analysis-xml.jsp, 
> ping-xml-out.txt, ping-xml.jsp
>
>
> I need to expose the admin functionality to an external application.  I think 
> returning admin data as XML may be a good and simple first step towards that.
> To do that I think I'll mostly need to modify JSPs (but I haven't had a good 
> look at Admin GUI yet).  From what I saw a few weeks ago when I briefly 
> looked at this, no Java code will need to be modified.  If you have concrete 
> ideas about how this should be done, please comment before I start next week 
> (week of October 23rd 2006).





[jira] Commented: (SOLR-65) Multithreaded DirectUpdateHandler2

2006-11-07 Thread Mike Klaas (JIRA)
[ 
http://issues.apache.org/jira/browse/SOLR-65?page=comments#action_12448024 ] 

Mike Klaas commented on SOLR-65:


This version removes the attempt at parsing the rest of the xml if an error 
occurs during document update.  

It has mostly already been reviewed, but to summarize, this patch:
  - improves concurrency during multi-threaded document update
  - potentially optimizes huge commits (may prevent a stall if memory is 
constrained)
  - eliminates the edge cases of document update (esp. multi-doc update) that
    cause Solr to return XML that could not be parsed as a single document
    (should also solve SOLR-2 and SOLR-54)

If no-one has objections, I'll commit the patch tomorrow.

> Multithreaded DirectUpdateHandler2
> --
>
> Key: SOLR-65
> URL: http://issues.apache.org/jira/browse/SOLR-65
> Project: Solr
>  Issue Type: New Feature
>  Components: update
>Reporter: Mike Klaas
> Assigned To: Mike Klaas
> Attachments: autocommit_patch.diff, autocommit_patch.diff, 
> autocommit_patch.diff, autocommit_patch.diff
>
>
> Basic implementation of autoCommit functionality, plus overhaul of DUH2 
> threading to reduce contention





[jira] Updated: (SOLR-65) Multithreaded DirectUpdateHandler2

2006-11-07 Thread Mike Klaas (JIRA)
 [ http://issues.apache.org/jira/browse/SOLR-65?page=all ]

Mike Klaas updated SOLR-65:
---

Attachment: autocommit_patch.diff






[jira] Work started: (SOLR-58) Change Admin components to return XML like the rest of the system

2006-11-07 Thread Otis Gospodnetic (JIRA)
 [ http://issues.apache.org/jira/browse/SOLR-58?page=all ]

Work on SOLR-58 started by Otis Gospodnetic.

> Change Admin components to return XML like the rest of the system
> -
>
> Key: SOLR-58
> URL: http://issues.apache.org/jira/browse/SOLR-58
> Project: Solr
>  Issue Type: New Feature
>  Components: web gui
>Reporter: Otis Gospodnetic
> Assigned To: Otis Gospodnetic
>Priority: Minor
> Attachments: analysis-xml-out.txt, analysis-xml.jsp
>
>
> I need to expose the admin functionality to an external application.  I think 
> returning admin data as XML may be a good and simple first step towards that.
> To do that I think I'll mostly need to modify JSPs (but I haven't had a good 
> look at Admin GUI yet).  From what I saw a few weeks ago when I briefly 
> looked at this, no Java code will need to be modified.  If you have concrete 
> ideas about how this should be done, please comment before I start next week 
> (week of October 23rd 2006).





[jira] Updated: (SOLR-58) Change Admin components to return XML like the rest of the system

2006-11-07 Thread Otis Gospodnetic (JIRA)
 [ http://issues.apache.org/jira/browse/SOLR-58?page=all ]

Otis Gospodnetic updated SOLR-58:
-

Attachment: analysis-xml-out.txt
analysis-xml.jsp

I took a stab at the analysis page, and it turns out it lends itself to 
XMLization.  I'm attaching two files:
1) analysis-xml-out.txt - the analysis XML output
2) analysis-xml.jsp - the JSP that replaces the output portion of analysis.jsp
(if this looks good, then I'll just change the FORM action in analysis.jsp 
to point to analysis-xml.jsp, and somebody familiar with XSL could provide that 
piece)

Please comment.

These are my targets for XMLization, and I'm going to work on them next.

ANALYSIS (this attachment)
STATISTICS (already XMLized)
INFO
DISTRIBUTION
PING
LOGGING
THREAD DUMP


> Change Admin components to return XML like the rest of the system
> -
>
> Key: SOLR-58
> URL: http://issues.apache.org/jira/browse/SOLR-58
> Project: Solr
>  Issue Type: New Feature
>  Components: web gui
>Reporter: Otis Gospodnetic
> Assigned To: Otis Gospodnetic
>Priority: Minor
> Attachments: analysis-xml-out.txt, analysis-xml.jsp
>
>
> I need to expose the admin functionality to an external application.  I think 
> returning admin data as XML may be a good and simple first step towards that.
> To do that I think I'll mostly need to modify JSPs (but I haven't had a good 
> look at Admin GUI yet).  From what I saw a few weeks ago when I briefly 
> looked at this, no Java code will need to be modified.  If you have concrete 
> ideas about how this should be done, please comment before I start next week 
> (week of October 23rd 2006).





Re: [jira] Commented: (SOLR-66) bulk data loader

2006-11-07 Thread Chris Hostetter

: Doesn't this bulk upload sound a bit like Simon's GData server?  That,
: too, uses APP, I believe.

bulk *upload* is really an afterthought of SOLR-66 ... from yonik's initial
description of the issue, he seems to be trying to address "bootstrap"
problems for quickly indexing a lot of data from a *local* file with
minimal markup/parsing (ie: tab delimited or CSV)

Personally, i think we may be getting a little too sidetracked by
discussions of uploading/transforming and APP ... those are all decent
ideas, but they are separate issues.

I think if we had a basic mechanism for parsing CSV files that assumed
homogeneous fields, we would get pretty damn far for a lot of people --
even if we didn't address multi-value fields.
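
To make the "homogeneous fields" idea concrete, here is a minimal sketch in plain Java. It is not the SOLR-66 loader and uses no Solr API: the class name SimpleCsvSketch is made up, and it assumes the first line names the fields, every row has the same number of single-valued fields, and there is no quoting or escaping.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Solr.
final class SimpleCsvSketch {

    // Assumption: lines.get(0) is a header naming the fields, and every
    // following row carries exactly one value per field (no multi-valued
    // fields, no quoted commas).
    static List<Map<String, String>> parse(List<String> lines) {
        List<Map<String, String>> docs = new ArrayList<>();
        if (lines.isEmpty()) {
            return docs;
        }
        String[] names = lines.get(0).split(",", -1);
        for (String line : lines.subList(1, lines.size())) {
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            // limit -1 keeps trailing empty values so the column count stays honest
            String[] values = line.split(",", -1);
            if (values.length != names.length) {
                throw new IllegalArgumentException(
                        "expected " + names.length + " fields but got "
                                + values.length + ": " + line);
            }
            Map<String, String> doc = new LinkedHashMap<>();
            for (int i = 0; i < names.length; i++) {
                doc.put(names[i].trim(), values[i].trim());
            }
            docs.add(doc);
        }
        return docs;
    }
}
```

Each resulting map stands in for one document; a tab-delimited variant would only change the separator passed to split.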


- Original Message 
From: Walter Underwood <[EMAIL PROTECTED]>
To: solr-dev@lucene.apache.org
Sent: Tuesday, November 7, 2006 2:36:37 PM
Subject: Re: [jira] Commented: (SOLR-66) bulk data loader

On 11/7/06 11:22 AM, "Yonik Seeley (JIRA)" <[EMAIL PROTECTED]> wrote:

> Yes, posting queries work because it's all form-data (query args).
> But, what if we want to post a complete file, *and* some extra info/parameters
> about how that file should be handled?

One approach is the Atom Publishing Protocol. That is pretty clear
about content and metainformation. It isn't designed to solve every
problem, but it handles a broad range of publishing, so it could be
a good fit for many uses of Solr.

APP is nearly finished. The latest draft is here (second URL also
has HTML versions).

 http://www.ietf.org/internet-drafts/draft-ietf-atompub-protocol-11.txt
 http://tools.ietf.org/wg/atompub/draft-ietf-atompub-protocol/

wunder
-- 
Walter Underwood
Search Guru, Netflix





:



-Hoss



Re: [jira] Commented: (SOLR-66) bulk data loader

2006-11-07 Thread Otis Gospodnetic
Doesn't this bulk upload sound a bit like Simon's GData server?  That, too, 
uses APP, I believe.

Otis

- Original Message 
From: Walter Underwood <[EMAIL PROTECTED]>
To: solr-dev@lucene.apache.org
Sent: Tuesday, November 7, 2006 2:36:37 PM
Subject: Re: [jira] Commented: (SOLR-66) bulk data loader

On 11/7/06 11:22 AM, "Yonik Seeley (JIRA)" <[EMAIL PROTECTED]> wrote:

> Yes, posting queries work because it's all form-data (query args).
> But, what if we want to post a complete file, *and* some extra info/parameters
> about how that file should be handled?

One approach is the Atom Publishing Protocol. That is pretty clear
about content and metainformation. It isn't designed to solve every
problem, but it handles a broad range of publishing, so it could be
a good fit for many uses of Solr.

APP is nearly finished. The latest draft is here (second URL also
has HTML versions).

 http://www.ietf.org/internet-drafts/draft-ietf-atompub-protocol-11.txt
 http://tools.ietf.org/wg/atompub/draft-ietf-atompub-protocol/

wunder
-- 
Walter Underwood
Search Guru, Netflix







Re: Adding Phonetic Search to Solr

2006-11-07 Thread Otis Gospodnetic
Grab the code from Lucene in Action, it's got something to get you going, see:

  http://www.lucenebook.com/search?query=metaphone

Otis

- Original Message 
From: Chris Hostetter <[EMAIL PROTECTED]>
To: solr-dev@lucene.apache.org
Sent: Tuesday, November 7, 2006 6:04:02 PM
Subject: Re: Adding Phonetic Search to Solr


: >> 1. Adding fuzzy to the DisMax specs.
: >
: > What do you envisage the implementation looking like?
:
: Probably continue with the template-like patterns already there.
:
:   title^2.0   (search title field with boost of 2.0)
:   title~  (search title field with fuzzy matching)

Interesting idea ... that's really a separate idea from "Phonetic" search
right? ... fuzzy searching (ala: FuzzyQuery) is really a separate beast
from phonetics (which i assume would best be implemented using your second
idea of a new TokenFilter)

I'm all in favor of both ideas ... but i wonder if adding fuzzy support to
dismax would be better done with a new param similar to "pf" (ie "ff" for
fuzzy fields and "fslop" for the min similarity)

: Still, it seems like others might want to use a phonetic token
: filter with the  specs. I'd be glad to contribute that,
: if others think it would be useful.

yeah .. you may want to just write a generic Lucene TokenFilter that does
the phonetics, submit it as a patch to Lucene-Java and then it's trivial to
whip up a TokenFilterFactory to use it in Solr.



-Hoss






Re: Adding Phonetic Search to Solr

2006-11-07 Thread Walter Underwood
On 11/7/06 3:26 PM, "Mike Klaas" <[EMAIL PROTECTED]> wrote:

> Is the state of the art in phonetic token generation reasonable?  I've
> been rather disappointed with some implementations (eg. SOUNDEX in
> MySQL, MSSQL).

SOUNDEX is excellent technology for its time, but its time was 1920.

Double Metaphone is far more complex and works fairly well. There is
an Apache commons codec implementation available. It is certainly
good enough for matching proper names, like Moody and Mudie or
Cathy and Kathie.
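
For comparison, here is a small self-contained sketch of classic American Soundex in plain Java. It is not Double Metaphone and not the commons codec implementation; the class name SoundexSketch is made up for illustration. It shows the kind of limitation being discussed: the first letter is kept as-is, so Moody and Mudie get the same code while Cathy and Kathie do not.

```java
import java.util.Locale;

// Hypothetical illustration of classic Soundex; not a library API.
final class SoundexSketch {

    // Soundex digit for each letter A..Z; '0' marks letters that are not coded.
    private static final String CODES = "01230120022455012623010202";

    static String encode(String word) {
        String s = word.toUpperCase(Locale.ROOT);
        StringBuilder out = new StringBuilder();
        char last = '0';
        for (int i = 0; i < s.length() && out.length() < 4; i++) {
            char c = s.charAt(i);
            if (c < 'A' || c > 'Z') {
                continue; // ignore anything that is not a basic Latin letter
            }
            char code = CODES.charAt(c - 'A');
            if (out.length() == 0) {
                out.append(c);   // the first letter is kept verbatim
                last = code;
            } else if (c == 'H' || c == 'W') {
                // H and W are skipped and do not separate equal codes
            } else {
                if (code != '0' && code != last) {
                    out.append(code);
                }
                last = code;     // vowels reset the "previous code"
            }
        }
        while (out.length() < 4) {
            out.append('0');     // pad to letter + three digits
        }
        return out.toString();
    }

    public static void main(String[] args) {
        for (String name : new String[] {"Moody", "Mudie", "Cathy", "Kathie"}) {
            System.out.println(name + " -> " + encode(name));
        }
    }
}
```

With this sketch Moody and Mudie both encode to M300, while Cathy gives C300 and Kathie gives K300, so the second pair is missed.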

There are some commercial phonetic coders, but I don't have any
experience with those.

wunder
-- 
Walter Underwood
Search Guru, Netflix




Re: Re: Adding Phonetic Search to Solr

2006-11-07 Thread Mike Klaas

On 11/7/06, Walter Underwood <[EMAIL PROTECTED]> wrote:

On 11/7/06 2:30 PM, "Mike Klaas" <[EMAIL PROTECTED]> wrote:
> On 11/7/06, Walter Underwood <[EMAIL PROTECTED]> wrote:
>>
>> 1. Adding fuzzy to the DisMax specs.
>
> What do you envisage the implementation looking like?

Probably continue with the template-like patterns already there.

  title^2.0   (search title field with boost of 2.0)
  title~  (search title field with fuzzy matching)


Ah, I see what you mean.  Seems reasonable (as someone who is utterly
unfamiliar with fuzzy queries in lucene).

<>

Ah, I missed the  example with a stock Lucene analyzer.
Oops. I still need to write an Analyzer, because there is no standard
phonetic search in Lucene today. There are some patches and addons
floating around.


Is the state of the art in phonetic token generation reasonable?  I've
been rather disappointed with some implementations (eg. SOUNDEX in
MySQL, MSSQL).

cheers,
-Mike


Re: Adding Phonetic Search to Solr

2006-11-07 Thread Chris Hostetter

: >> 1. Adding fuzzy to the DisMax specs.
: >
: > What do you envisage the implementation looking like?
:
: Probably continue with the template-like patterns already there.
:
:   title^2.0   (search title field with boost of 2.0)
:   title~  (search title field with fuzzy matching)

Interesting idea ... that's really a separate idea from "Phonetic" search
right? ... fuzzy searching (ala: FuzzyQuery) is really a separate beast
from phonetics (which i assume would best be implemented using your second
idea of a new TokenFilter)

I'm all in favor of both ideas ... but i wonder if adding fuzzy support to
dismax would be better done with a new param similar to "pf" (ie "ff" for
fuzzy fields and "fslop" for the min similarity)

: Still, it seems like others might want to use a phonetic token
: filter with the  specs. I'd be glad to contribute that,
: if others think it would be useful.

yeah .. you may want to just write a generic Lucene TokenFilter that does
the phonetics, submit it as a patch to Lucene-Java and then it's trivial to
whip up a TokenFilterFactory to use it in Solr.



-Hoss



[jira] Commented: (SOLR-66) bulk data loader

2006-11-07 Thread Fuad Efendi (JIRA)
[ 
http://issues.apache.org/jira/browse/SOLR-66?page=comments#action_12447953 ] 

Fuad Efendi commented on SOLR-66:
-

even 3-column... 
BTW, good SOLR-targeted single-table database design:
,,
Yes, we can use even index-organized tables in Oracle, without repeated 
'parent' values!
And good standard for CNET customers sending them daily updated product info 
(is it really search engine?...)
Thanks


> bulk data loader
> 
>
> Key: SOLR-66
> URL: http://issues.apache.org/jira/browse/SOLR-66
> Project: Solr
>  Issue Type: New Feature
>Reporter: Yonik Seeley
> Assigned To: Yonik Seeley
>
> A way to efficiently load simple formatted text files, including CSV files.





[jira] Commented: (SOLR-68) Custom ClassLoader for "plugins"

2006-11-07 Thread Hoss Man (JIRA)
[ 
http://issues.apache.org/jira/browse/SOLR-68?page=comments#action_12447950 ] 

Hoss Man commented on SOLR-68:
--


> and we don't need an explicit instance of a ClassLoader at all... just put 
> JARs in a classpath... and it works in 
> all containers, because we use 'parent' classloader with inherited 
> permissions instead of touching the 
> 'container-managed' Thread...

...it's not that easy.  If you need to load a class which references another 
class defined in the webapp itself, your approach breaks because the 
ClassLoader won't walk back down the ClassLoader hierarchy to try and find it.

Ie: if you create a "CustomRequestHandler implements SolrRequestHandler" and 
put it in a JAR which you then put in the "shared" lib dir for Tomcat, you'll 
get a class not found error for something like SolrQueryRequest when 
CustomRequestHandler is loaded, because SolrQueryRequest is defined in the WAR 
and the "shared" class loader higher up doesn't have access to those classes...

http://tomcat.apache.org/tomcat-5.5-doc/class-loader-howto.html

...hence my idea to create a new ClassLoader instance which was a "child" of 
the class loader being used for the webapp itself, so when it delegates on 
classes it doesn't recognize, it delegates "up" to the solr.war class loader.
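
A minimal sketch of that "child" loader idea, using only the JDK. This is not the code in classloader.patch; the class and method names (PluginLoaderSketch, forLibDir) are made up, and the lib directory is whatever the caller passes in. With the standard delegation model the child asks its parent first, so webapp classes resolve through the parent and only classes the parent cannot find are looked up in the plugin JARs.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; names are not taken from the patch.
final class PluginLoaderSketch {

    // Builds a loader over every *.jar in libDir whose parent is the given
    // loader (in a webapp, that would be the loader that loaded the WAR).
    static URLClassLoader forLibDir(File libDir, ClassLoader parent) {
        List<URL> urls = new ArrayList<>();
        File[] files = libDir.listFiles();
        if (files != null) { // null when libDir is missing or not a directory
            for (File f : files) {
                if (f.isFile() && f.getName().endsWith(".jar")) {
                    try {
                        urls.add(f.toURI().toURL());
                    } catch (MalformedURLException e) {
                        throw new IllegalArgumentException("bad jar path: " + f, e);
                    }
                }
            }
        }
        return new URLClassLoader(urls.toArray(new URL[0]), parent);
    }
}
```

A plugin class would then be looked up with something like Class.forName(name, true, loader), where loader is the result of forLibDir(...). Code that instead goes through Thread.currentThread().getContextClassLoader() would not see the plugin JARs unless the context loader is also changed, which is the open question raised in the issue description.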


Regarding your previous comment about the Nutch/Eclipse plugin model ... i'm 
not sure if that configuration syntax really fits for us ... it assumes a 
specific set of extension points, such that a single plugin might map to 
multiple points, and (as far as i can tell) each extension point is either 
bound to a single plugin, or the plugins "chain" themselves.

This doesn't really fit our use case, where you might want dozens of different 
analyzers used in different fields -- ideally someone should be able to pull 
any jar out of the lucene/contrib directory containing an analyzer or a 
Similarity class they want to use, drop that jar into a specified location, and 
put the name of the class in the schema where they want to use it.

As for *how* Nutch does this ... they seem to be doing roughly the same thing 
as I'm trying for in this patch, except the more specific meaning of "plugin" 
allows them to have a separate ClassLoader per plugin, that loads the exact 
jars enumerated in the plugin's "manifest" (an xml config file; not a jar 
MANIFEST) .. but again i'm not sure if/how they deal with the possibility of a 
plugin loaded class using reflection to try and load another class later on.




> Custom ClassLoader for "plugins"
> 
>
> Key: SOLR-68
> URL: http://issues.apache.org/jira/browse/SOLR-68
> Project: Solr
>  Issue Type: New Feature
>Reporter: Hoss Man
> Attachments: classloader.patch
>
>
> After beating my head against my desk for a few hours yesterday trying to 
> document how to load custom plugins (ie: Analyzers, RequestHandlers, 
> Similarities, etc...) in the various Servlet Containers -- only to discover 
> that it is apparently impossible unless you use Resin, it occurred to me in the 
> wee hours of last night that since the only time we ever need to load 
> "pluggable" classes is when we explicitly look up the class by name, we could 
> make our own ClassLoader and use it ... so i whipped together a little patch 
> to Config.java that would load JARs out of ${solr.home}/lib and was seriously 
> surprised to discover that it seemed to work.
> In the cold light of day, I am again surprised that I still think this might 
> be a good idea, but i'm not very familiar with ClassLoader semantics, so i'm 
> not sure if what i've done is an abomination or not -- or if the idea is 
> sound, but the implementation is crap.  
> I'm also not sure if it works in all cases: more testing of various 
> Containers would be good, as well as testing more complex situations (ie: 
> what if a class explicitly named as a plugin and loaded by this new 
> classloader then uses reflection to load another class from the same Jar 
> using Thread.currentThread().getContextClassLoader() ... will that fail?)
> So far I've done quick and dirty testing with my apachecon JAR under 
> apache-tomcat-5.5.17, the jetty start.jar we use for the example, 
> resin-3.0.21 and jettyplus-5.1.11 -- all of which seemed to work fine except 
> for jettyplus-5.1.11 -- but that may have been because of some other 
> configuration problem I had.





Re: Adding Phonetic Search to Solr

2006-11-07 Thread Walter Underwood
On 11/7/06 2:30 PM, "Mike Klaas" <[EMAIL PROTECTED]> wrote:
> On 11/7/06, Walter Underwood <[EMAIL PROTECTED]> wrote:
>> 
>> 1. Adding fuzzy to the DisMax specs.
> 
> What do you envisage the implementation looking like?

Probably continue with the template-like patterns already there.

  title^2.0   (search title field with boost of 2.0)
  title~  (search title field with fuzzy matching)
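
To illustrate how such template-like field specs might be read, here is a small sketch in plain Java. It is not the DisMax parser: the class FieldSpecSketch is made up, the "~" suffix is only the syntax proposed above, and the sketch assumes that when both markers are present the boost comes last (title~^2.0).

```java
// Hypothetical illustration of the proposed spec syntax; not Solr code.
final class FieldSpecSketch {
    final String field;
    final float boost;
    final boolean fuzzy;

    private FieldSpecSketch(String field, float boost, boolean fuzzy) {
        this.field = field;
        this.boost = boost;
        this.fuzzy = fuzzy;
    }

    // Accepts "title", "title^2.0", "title~" and (by assumption) "title~^2.0".
    static FieldSpecSketch parse(String spec) {
        String s = spec.trim();
        float boost = 1.0f; // default when no ^boost is given
        int caret = s.indexOf('^');
        if (caret >= 0) {
            boost = Float.parseFloat(s.substring(caret + 1));
            s = s.substring(0, caret);
        }
        boolean fuzzy = s.endsWith("~");
        if (fuzzy) {
            s = s.substring(0, s.length() - 1);
        }
        return new FieldSpecSketch(s, boost, fuzzy);
    }
}
```

A real implementation would also need to decide how the fuzzy flag is turned into a query and where a minimum similarity would be configured.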

>> 2. Adding a phonetic token filter and relying on the per-field analyzer
>> support.
> 
> I'm not sure why any modification to solr would be necessary.  You
> could add a field with a phonetic analyzer and use copyField to copy
> your search fields to it.  Search will use the modified analyzer
> automatically.

Ah, I missed the  example with a stock Lucene analyzer.
Oops. I still need to write an Analyzer, because there is no standard
phonetic search in Lucene today. There are some patches and addons
floating around.

Still, it seems like others might want to use a phonetic token
filter with the  specs. I'd be glad to contribute that,
if others think it would be useful.

wunder
-- 
Walter Underwood
Search Guru, Netflix




[jira] Commented: (SOLR-68) Custom ClassLoader for "plugins"

2006-11-07 Thread Fuad Efendi (JIRA)
[ 
http://issues.apache.org/jira/browse/SOLR-68?page=comments#action_12447947 ] 

Fuad Efendi commented on SOLR-68:
-

why do you need URLClassLoader? 

1) public class URLClassLoader extends SecureClassLoader
2) "The classes that are loaded are by default granted permission only to 
access the URLs specified when the URLClassLoader was created."

we need to create an instance defined in XML... just call 
Class.forName("org.zzz.Foo").newInstance()... it will throw an exception 
ClassNotFoundException if a class can't be found in a path, or 
NoClassDefFoundError in case of unresolved nested dependencies (am I right?)...






Re: Adding Phonetic Search to Solr

2006-11-07 Thread Yonik Seeley

On 11/7/06, Mike Klaas <[EMAIL PROTECTED]> wrote:

> 2. Adding a phonetic token filter and relying on the per-field analyzer
> support.
>
> Option 2 seems like it would be a lot faster in production, and
> probably easier to implement. Does that seem right?

I'm not sure why any modification to solr would be necessary.  You
could add a field with a phonetic analyzer and use copyField to copy
your search fields to it.  Search will use the modified analyzer
automatically.


I assumed he meant the implementation of the filter/analyzer itself
(unless there is already a Lucene filter we can use).  Unless there is
a lucene analyzer (which can be used directly), a FilterFactory for
the filter will have to be added, but that's easy.


> How do I specify the new token filter factory in the schema file?
> I don't quite get the mapping from solr.FooFilterFactory to
> org.apache.solr.analysis.FooFilterFactory.


When Solr was an internal CNET project, the package name was just solr
(or actually, just solar).  I kept the short version as a form of
backward compatibility, and I sort of liked it because it made the
schema less verbose.

Anyway there is a set of standard  packages that will be checked if
the class name isn't found as-is.  I thought it was documented but I
can't seem to find it...


-Yonik


Re: Adding Phonetic Search to Solr

2006-11-07 Thread Chris Hostetter

: 2. Adding a phonetic token filter and relying on the per-field analyzer
: support.
:
: Option 2 seems like it would be a lot faster in production, and
: probably easier to implement. Does that seem right?

yep, just write your Analyzer (or TokenFilter) and drop it in.

: How do I specify the new token filter factory in the schema file?
: I don't quite get the mapping from solr.FooFilterFactory to
: org.apache.solr.analysis.FooFilterFactory.

Classes *should* be specified using the fully qualified package, but for
convenience, if Solr sees a package name of just "solr" it checks a few
known packages based on context (ie: org.apache.solr.analysis,
o.a.s.schema, etc...)
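
The lookup described here can be sketched roughly as follows in plain Java. This is not Solr's actual code: the class name ShortNameResolverSketch and the explicit ClassLoader argument are assumptions for illustration, and the candidate packages are simply whatever the caller passes in.

```java
// Hypothetical illustration of "solr." shorthand resolution; not Solr code.
final class ShortNameResolverSketch {

    private static final String SHORT_PREFIX = "solr.";

    // Tries cname as-is first; if that fails and the name starts with
    // "solr.", retries the simple name under each candidate package in order.
    static Class<?> findClass(String cname, ClassLoader loader, String... subpackages)
            throws ClassNotFoundException {
        try {
            return Class.forName(cname, true, loader);
        } catch (ClassNotFoundException asIs) {
            if (!cname.startsWith(SHORT_PREFIX)) {
                throw asIs;
            }
            String simpleName = cname.substring(SHORT_PREFIX.length());
            for (String pkg : subpackages) {
                try {
                    return Class.forName(pkg + "." + simpleName, true, loader);
                } catch (ClassNotFoundException ignored) {
                    // not in this package; try the next candidate
                }
            }
            throw asIs; // report the name the user actually wrote
        }
    }
}
```

Under these assumptions, a call such as findClass("solr.FooFilterFactory", loader, "org.apache.solr.analysis") would end up trying org.apache.solr.analysis.FooFilterFactory.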

As discussed in SOLR-68, getting the classloader to find your custom
analyzer is currently a pain in the ass .. the easiest thing to do that
should work in any servlet container is add it directly to the solr.war
... but i'm hoping to make that unnecessary soon.



-Hoss



Re: Adding Phonetic Search to Solr

2006-11-07 Thread Mike Klaas

On 11/7/06, Walter Underwood <[EMAIL PROTECTED]> wrote:

I haven't found fuzzy or phonetic search in Solr, and I have a couple
of approaches I might try:

1. Adding fuzzy to the DisMax specs.


What do you envisage the implementation looking like?


2. Adding a phonetic token filter and relying on the per-field analyzer
support.

Option 2 seems like it would be a lot faster in production, and
probably easier to implement. Does that seem right?


I'm not sure why any modification to solr would be necessary.  You
could add a field with a phonetic analyzer and use copyField to copy
your search fields to it.  Search will use the modified analyzer
automatically.

-Mike


[jira] Commented: (SOLR-68) Custom ClassLoader for "plugins"

2006-11-07 Thread Fuad Efendi (JIRA)
[ 
http://issues.apache.org/jira/browse/SOLR-68?page=comments#action_12447942 ] 

Fuad Efendi commented on SOLR-68:
-

>...is when we explicitly look up the class by name, we could make our own 
>ClassLoader and use it...

The simplest way to create a Class object is:
  Class fooClass = Class.forName("Foo");
which is equivalent to: 
  Class.forName("Foo", true, this.getClass().getClassLoader());
 
Then,
   Foo fooObject = (Foo) fooClass.newInstance();

and we don't need an explicit instance of a ClassLoader at all... just put JARs 
in a classpath... and it works in all containers, because we use 'parent' 
classloader with inherited permissions instead of touching the 
'container-managed' Thread...

Sorry, still trying to understand why we would need that method:
   public static Class findClass(String cname, String... subpackages)
even here, we can loop in [subpackage+"."+cname] using Class.forName()...

> Custom ClassLoader for "plugins"
> 
>
> Key: SOLR-68
> URL: http://issues.apache.org/jira/browse/SOLR-68
> Project: Solr
>  Issue Type: New Feature
>Reporter: Hoss Man
> Attachments: classloader.patch
>
>
> After beating my head against my desk for a few hours yesterday trying to 
> document how to load custom plugins (ie: Analyzers, RequestHandlers, 
> Similarities, etc...) in the various Servlet Containers -- only to discover 
> that it is apparently impossible unless you use Resin, it occurred to me in the 
> wee hours of last night that since the only time we ever need to load 
> "pluggable" classes is when we explicitly look up the class by name, we could 
> make our own ClassLoader and use it ... so i whipped together a little patch 
> to Config.java that would load JARs out of ${solr.home}/lib and was seriously 
> surprised to discover that it seemed to work.
> In the cold light of day, I am again surprised that I still think this might 
> be a good idea, but i'm not very familiar with ClassLoader semantics, so i'm 
> not sure if what i've done is an abomination or not -- or if the idea is 
> sound, but the implementation is crap.  
> I'm also not sure if it works in all cases: more testing of various 
> Containers would be good, as well as testing more complex situations (ie: 
> what if a class explicitly named as a plugin and loaded by this new 
> classloader then uses reflection to load another class from the same Jar 
> using Thread.currentThread().getContextClassLoader() ... will that fail?)
> So far I've done quick and dirty testing with my apachecon JAR under 
> apache-tomcat-5.5.17, the jetty start.jar we use for the example, 
> resin-3.0.21 and jettyplus-5.1.11 -- all of which seemed to work fine except 
> for jettyplus-5.1.11 -- but that may have been because of some other 
> configuration problem I had.





Adding Phonetic Search to Solr

2006-11-07 Thread Walter Underwood
I haven't found fuzzy or phonetic search in Solr, and I have a couple
of approaches I might try:

1. Adding fuzzy to the DisMax specs.

2. Adding a phonetic token filter and relying on the per-field analyzer
support.

Option 2 seems like it would be a lot faster in production, and
probably easier to implement. Does that seem right?

How do I specify the new token filter factory in the schema file?
I don't quite get the mapping from solr.FooFilterFactory to
org.apache.solr.analysis.FooFilterFactory.

wunder
-- 
Walter Underwood
Search Guru, Netflix




[jira] Commented: (SOLR-68) Custom ClassLoader for "plugins"

2006-11-07 Thread Fuad Efendi (JIRA)
[ 
http://issues.apache.org/jira/browse/SOLR-68?page=comments#action_12447932 ] 

Fuad Efendi commented on SOLR-68:
-

It was done in Eclipse, for instance. The Nutch project also has a huge 
plugin-supporting codebase: plugins are automatically loaded and 'wired' 
together without explicit XML definitions; they did it the Eclipse way...

I mean if you use explicit XML like 

- you don't need to design something like Eclipse or Nutch... It is a lot of 
headache just to replace strings like "org.apache.nutch.parse.html.HtmlParser" by 
  plugin.includes
  
protocol-http|urlfilter-regex|parse-(text|html|js)|index-basic|query-(basic|site|url)|summary-basic|scoring-opic

Here, "parse-(text|html|js)" will look for a jar file with ID "parse-html" in 
plugin.xml file (inside the jar file):

...

The main reason for such a design is to avoid explicit names like 
"org.apache.nutch.parse.html.HtmlParser" in the main configuration XML, and to 
allow third parties to develop compatible plugins (which could have very 
different, complicated extension points and implementations with specific 
settings not defined in the main Solr schema.xml file)...

I don't think it would work in WebLogic without some explicit settings; and JEE 
does not easily allow direct access to a file system, especially to load a 
class 'from a different context' (security considerations)...
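
To make the plugin.includes idea concrete, here is a small sketch that treats the value quoted above as one regular expression and tests plugin IDs against it. How Nutch itself applies the pattern is not shown in this thread, so the whole-string match and the names PluginIncludesSketch and selectPlugins are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class PluginIncludesSketch {

    // The plugin.includes value quoted in the comment, split only for line length.
    public static final String PLUGIN_INCLUDES =
        "protocol-http|urlfilter-regex|parse-(text|html|js)|index-basic"
        + "|query-(basic|site|url)|summary-basic|scoring-opic";

    /** Returns the IDs that match the includes pattern as a whole string. */
    public static List<String> selectPlugins(String includes, List<String> availableIds) {
        Pattern pattern = Pattern.compile(includes);
        List<String> selected = new ArrayList<String>();
        for (String id : availableIds) {
            if (pattern.matcher(id).matches()) {
                selected.add(id);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // "parse-pdf" is a made-up ID that the pattern does not cover.
        List<String> ids = Arrays.asList("parse-html", "parse-pdf", "query-site");
        System.out.println(selectPlugins(PLUGIN_INCLUDES, ids));
    }
}
```

The configuration never names a class; the ID-to-class mapping would live in each plugin's own descriptor, which is the indirection this comment describes.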






Re: changes before release?

2006-11-07 Thread Yonik Seeley

Unless there are objections, I'll start off by switching to the new
license header and then try and search for any files that may be
missing it.

-Yonik


[jira] Commented: (SOLR-66) bulk data loader

2006-11-07 Thread Fuad Efendi (JIRA)
[ 
http://issues.apache.org/jira/browse/SOLR-66?page=comments#action_12447924 ] 

Fuad Efendi commented on SOLR-66:
-

This is probably SOLR-specific (the best? stay focused on the task?)...

We stick to a 4-column format for everything (in the case of a surrogate PK we 
may have [id,"001,003,abxc"]):

id,9885A004,name,Canon PowerShot SD500
id,9885A004,manu,Canon Inc.
id,9885A004,cat,electronics 
id,9885A004,cat,camera 
id,9885A004,features,"3x zoop, 7.1 megapixel Digital ELPH"
id,9885A004,features,movie clips up to 640x480 @30 fps
id,9885A004,features,"2.0"" TFT LCD, 118,000 pixels" 
id,9885A004,features,"built in flash, red-eye reduction"
id,9885A004,includes,"32MB SD card, USB cable, AV cable, battery"
id,9885A004,weight,6.4
id,9885A004,price,329.95
id,9885A004,popularity,7 
id,9885A004,inStock,true
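
A reader for this 4-column layout could be sketched as follows. It is a minimal illustration, assuming each row is keyName,keyValue,fieldName,fieldValue with standard CSV quoting (a doubled quote inside a quoted field stands for one quote); the class and method names are invented, and it is not the Commons CSV parser mentioned elsewhere in this issue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FourColumnCsvSketch {

    /** Splits one CSV line, honoring quoted fields and doubled quotes inside them. */
    public static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"' && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    cur.append('"'); // doubled quote is an escaped quote
                    i++;
                } else if (c == '"') {
                    inQuotes = false; // closing quote
                } else {
                    cur.append(c);
                }
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == ',') {
                fields.add(cur.toString());
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        fields.add(cur.toString());
        return fields;
    }

    /** Groups rows into documents: key value -> field name -> list of values. */
    public static Map<String, Map<String, List<String>>> group(List<String> lines) {
        Map<String, Map<String, List<String>>> docs = new LinkedHashMap<>();
        for (String line : lines) {
            List<String> f = parseLine(line);
            if (f.size() != 4) {
                throw new IllegalArgumentException("expected 4 columns: " + line);
            }
            docs.computeIfAbsent(f.get(1).trim(), k -> new LinkedHashMap<>())
                .computeIfAbsent(f.get(2).trim(), k -> new ArrayList<>())
                .add(f.get(3).trim());
        }
        return docs;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "id,9885A004,cat,electronics ",
            "id,9885A004,cat,camera ",
            "id,9885A004,features,\"2.0\"\" TFT LCD, 118,000 pixels\" ");
        System.out.println(group(lines));
    }
}
```

Repeated field names simply accumulate, which is how the multi-valued cat and features rows above would end up on one document.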


> bulk data loader
> 
>
> Key: SOLR-66
> URL: http://issues.apache.org/jira/browse/SOLR-66
> Project: Solr
>  Issue Type: New Feature
>Reporter: Yonik Seeley
> Assigned To: Yonik Seeley
>
> A way to efficiently load simple formatted text files, including CSV files.





[jira] Commented: (SOLR-66) bulk data loader

2006-11-07 Thread Fuad Efendi (JIRA)
[ 
http://issues.apache.org/jira/browse/SOLR-66?page=comments#action_12447923 ] 

Fuad Efendi commented on SOLR-66:
-

Another sample...


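<doc>
  <field name="id">9885A004</field>
  <field name="name">Canon PowerShot SD500</field>
  <field name="manu">Canon Inc.</field>
  <field name="cat">electronics</field>
  <field name="cat">camera</field>
  <field name="features">3x zoop, 7.1 megapixel Digital ELPH</field>
  <field name="features">movie clips up to 640x480 @30 fps</field>
  <field name="features">2.0" TFT LCD, 118,000 pixels</field>
  <field name="features">built in flash, red-eye reduction</field>
  <field name="includes">32MB SD card, USB cable, AV cable, battery</field>
  <field name="weight">6.4</field>
  <field name="price">329.95</field>
  <field name="popularity">7</field>
  <field name="inStock">true</field>
</doc>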



Cartesian product of facets (can I say that?):

9885A004, Canon PowerShot SD500, Canon Inc., electronics, 3x zoop, 7.1 
megapixel Digital ELPH, "32MB SD card, USB cable, AV cable, battery", 6.4, 
329.95, 7, true
9885A004, Canon PowerShot SD500, Canon Inc., electronics, 3x zoop, movie clips 
up to 640x480 @30 fps, "32MB SD card, USB cable, AV cable, battery", 6.4, 
329.95, 7, true
9885A004, Canon PowerShot SD500, Canon Inc., electronics, 3x zoop, 2.0" TFT 
LCD, 118,000 pixels, "32MB SD card, USB cable, AV cable, battery", 6.4, 329.95, 
7, true
9885A004, Canon PowerShot SD500, Canon Inc., electronics, 3x zoop, built in 
flash, red-eye reduction, "32MB SD card, USB cable, AV cable, battery", 6.4, 
329.95, 7, true
9885A004, Canon PowerShot SD500, Canon Inc., camera, 3x zoop, 7.1 megapixel 
Digital ELPH, "32MB SD card, USB cable, AV cable, battery", 6.4, 329.95, 7, true
9885A004, Canon PowerShot SD500, Canon Inc., camera, 3x zoop, movie clips up to 
640x480 @30 fps, "32MB SD card, USB cable, AV cable, battery", 6.4, 329.95, 7, 
true
9885A004, Canon PowerShot SD500, Canon Inc., camera, 3x zoop, 2.0" TFT LCD, 
118,000 pixels, "32MB SD card, USB cable, AV cable, battery", 6.4, 329.95, 7, 
true
9885A004, Canon PowerShot SD500, Canon Inc., camera, 3x zoop, built in flash, 
red-eye reduction, "32MB SD card, USB cable, AV cable, battery", 6.4, 329.95, 
7, true
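
For reference, the row expansion above is just a Cartesian product over the multi-valued columns. A minimal sketch, with invented names and shortened feature values, might look like this:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CartesianRowsSketch {

    /**
     * Expands per-column value lists into flat rows: one row for every
     * combination, with earlier columns varying slowest.
     */
    public static List<List<String>> cartesian(List<List<String>> columns) {
        List<List<String>> rows = new ArrayList<>();
        rows.add(new ArrayList<String>()); // start with a single empty row
        for (List<String> column : columns) {
            List<List<String>> next = new ArrayList<>();
            for (List<String> row : rows) {
                for (String value : column) {
                    List<String> extended = new ArrayList<>(row);
                    extended.add(value);
                    next.add(extended);
                }
            }
            rows = next;
        }
        return rows;
    }

    public static void main(String[] args) {
        List<List<String>> columns = Arrays.asList(
            Arrays.asList("9885A004"),
            Arrays.asList("electronics", "camera"),
            Arrays.asList("7.1 megapixel", "movie clips", "TFT LCD", "built in flash"));
        for (List<String> row : cartesian(columns)) {
            System.out.println(String.join(", ", row));
        }
    }
}
```

Two categories times four features gives the eight rows shown above; the row count is the product of the value counts, which is why this format grows quickly.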


Optimized CSV (just for improved network traffic for areas without DSL!!!):
9885A004, Canon PowerShot SD500, Canon Inc., electronics, "3x zoop, 7.1 
megapixel Digital ELPH", "32MB SD card, USB cable, AV cable, battery", 6.4, 
329.95, 7 true
9885A004, , , , "movie clips up to 640x480 @30 fps", , , , ,
9885A004, , , , "3x zoop, 2.0" TFT LCD, 118,000 pixels", , , , ,
9885A004, , , , "built in flash, red-eye reduction", , , , ,
9885A004, , ,  camera, "3x zoop, 7.1 megapixel Digital ELPH", 
9885A004, , , , "movie clips up to 640x480 @30 fps", , , , ,
9885A004, , , , "3x zoop, 2.0" TFT LCD, 118,000 pixels", , , , ,
9885A004, , , , "built in flash, red-eye reduction", , , , ,

almost EDI... XML looks much better... Maybe a GZIP-compressed version of the 
standard "Cartesian" CSV instead?







[jira] Commented: (SOLR-66) bulk data loader

2006-11-07 Thread Fuad Efendi (JIRA)
[ 
http://issues.apache.org/jira/browse/SOLR-66?page=comments#action_12447919 ] 

Fuad Efendi commented on SOLR-66:
-

mistake... (Paper, instead of IBM):

001, Paper, 001, 17R7021, 14 7/8 X 8 1/2" - 1/2" Greenbar 
001, Paper, 002, 17R8018, 8 1/2 x 11" Micro Perf @ 3 2/3" 
...

optimized:
001, Paper, 001, 17R7021, 14 7/8 X 8 1/2" - 1/2" Greenbar 
001, , 002, 17R8018, 8 1/2 x 11" Micro Perf @ 3 2/3" 
...






[jira] Commented: (SOLR-66) bulk data loader

2006-11-07 Thread Fuad Efendi (JIRA)
[ 
http://issues.apache.org/jira/browse/SOLR-66?page=comments#action_12447918 ] 

Fuad Efendi commented on SOLR-66:
-

CSV:
- should we support standard CSVs generated by Excel, Oracle DataPump, etc?

XML: we currently preprocess some data to create XML, then we post it to SOLR.

Can we preprocess standard CSV? For instance, we have two tables: CATEGORY 
(parent), PRODUCT (child)
CSV produced by Oracle might look like

001,IBM,001,17R7021,14 7/8 X 8 1/2" - 1/2" Greenbar
001,IBM,002,17R8018,8 1/2 x 11" Micro Perf @ 3 2/3"

Here, [001,Paper] is a single record from the CATEGORY table, and the rest is 
PK, SKU, NAME from the PRODUCT table.

1. Use 'extended' CSV such as
001,Paper,multi-value:"001,17R7021,14 7/8 X 8 1/2"" - 1/2"" 
Greenbar002,17R8018,8 1/2 x 11"" Micro Perf @ 3 2/3"""
(multi-value:",...")
- very difficult... and not compatible with exported data...

2. Standard CSV with fixed width + preprocessing (sorting, and removing 
repeated values)

001,Paper,001,17R7021,14 7/8 X 8 1/2" - 1/2" Greenbar
001,,002,17R8018,8 1/2 x 11" Micro Perf @ 3 2/3"


We removed the repeated value 'Paper', but we left the primary key of this 
category intact... It should work with both the standard 'large' CSV and the 
preprocessed one... And we don't get one huge single line when IBM produces 
different kinds of paper; we have multiple lines with fixed width... The first 
column (repeated 001 value) is the primary key, same as  001
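
Reading the preprocessed form from option 2 back into full rows amounts to a forward fill keyed on the first column. The sketch below assumes plain comma-separated fields with no quoted commas, and the names are invented for illustration. One caveat it makes visible: a value that is legitimately empty cannot be told apart from a removed repeat.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ExpandCsvSketch {

    /**
     * Restores values that were blanked out as repeats: an empty field is
     * filled from the previous row when both rows share the same first column.
     */
    public static List<String[]> expand(List<String> lines) {
        List<String[]> out = new ArrayList<>();
        String[] prev = null;
        for (String line : lines) {
            String[] fields = line.split(",", -1); // -1 keeps trailing empty fields
            boolean sameKey = prev != null
                && fields.length == prev.length
                && fields[0].equals(prev[0]);
            if (sameKey) {
                for (int i = 1; i < fields.length; i++) {
                    if (fields[i].trim().isEmpty()) {
                        fields[i] = prev[i];
                    }
                }
            }
            out.add(fields);
            prev = fields;
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "001,Paper,001,17R7021,14 7/8 X 8 1/2\" - 1/2\" Greenbar",
            "001,,002,17R8018,8 1/2 x 11\" Micro Perf @ 3 2/3\"");
        for (String[] row : expand(lines)) {
            System.out.println(String.join(",", row));
        }
    }
}
```

Because rows that are already complete pass through unchanged, the same reader handles both the standard 'large' CSV and the preprocessed one.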






Re: [jira] Created: (SOLR-68) Custom ClassLoader for "plugins"

2006-11-07 Thread Chris Hostetter
: Probably. Unless you previously set your custom loader to be the
: thread's context class loader. If you do that, you'd have to ensure
: that your loader is aware of the context classpath (which could be
: accomplished by making it a child of the old context class loader).
: Otherwise, you might break other code which also uses this loader.

I'm already making sure the new loader uses
Thread.currentThread().getContextClassLoader() as its parent ... but it
never occurred to me that we could call Thread.setContextClassLoader ...
even then it gets confusing, because there will probably be a lot more
threads using the Class than the one that loaded the configs (and the
plugin classes) ... but i suppose we could call
Thread.currentThread().setContextClassLoader in SolrCore.execute ... or
even SolrCore.getSolrCore ...

...but again, i'm not sure if that would be an abomination or not.
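
For what it's worth, the usual shape of the setContextClassLoader idea is to swap the loader in for the duration of one call and restore the old one afterwards. This is only a sketch with made-up names (ContextLoaderSketch, callWithContextLoader), not something taken from SolrCore, and whether doing this per request is wise is exactly the open question above.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.concurrent.Callable;

public class ContextLoaderSketch {

    /**
     * Runs the given work with 'loader' as the current thread's context class
     * loader, then restores whatever loader was there before, even on failure.
     */
    public static <T> T callWithContextLoader(ClassLoader loader, Callable<T> work)
            throws Exception {
        Thread thread = Thread.currentThread();
        ClassLoader previous = thread.getContextClassLoader();
        thread.setContextClassLoader(loader);
        try {
            return work.call();
        } finally {
            thread.setContextClassLoader(previous);
        }
    }

    public static void main(String[] args) throws Exception {
        // The plugin loader is a child of the existing context loader, so code
        // that relies on the container's classpath still resolves through it.
        ClassLoader parent = Thread.currentThread().getContextClassLoader();
        URLClassLoader pluginLoader = new URLClassLoader(new URL[0], parent);

        ClassLoader seen = callWithContextLoader(
            pluginLoader, () -> Thread.currentThread().getContextClassLoader());
        System.out.println(seen == pluginLoader);
        System.out.println(Thread.currentThread().getContextClassLoader() == parent);
    }
}
```

The try/finally matters in a servlet container, where threads are pooled and a loader left behind would leak into unrelated requests.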



-Hoss



[jira] Commented: (SOLR-66) bulk data loader

2006-11-07 Thread Fuad Efendi (JIRA)
[ 
http://issues.apache.org/jira/browse/SOLR-66?page=comments#action_12447911 ] 

Fuad Efendi commented on SOLR-66:
-

Sorry for not correctly understanding the multipart HTTP POST / File Upload 
issue, it's not easy, I just browsed sources of 
org.springframework.web.multipart.support (although it's very easy with 
Spring...)






Re: [jira] Created: (SOLR-67) query interface with faceted browsing, highligting

2006-11-07 Thread Chris Hostetter

: It would be nice to have a nice HTML page allowing the user to query and display
: 1) some faceting info such as hit counts with links that allowed the user to
:    narrow their search results.
: 2) highlighted summaries
: 3) easy way to query the dismax handler as well as the standard request
:    handler
: Most likely this would be built into the admin pages (and have access to all
: the field info).

as a shorter term solution to a programmatically built page that was aware of
the schema and the registered handlers, we could just enhance form.jsp
to include the various facet and dismax params as flat HTML, and to use
the XSLTResponseWriter to generate results with hyperlinks for
filtering/expanding the results.

we might even want to solicit volunteers from solr-users, as there seem to
be quite a few users who aren't familiar with java but would like to help
contribute to the project ... but i don't know if it's bad form to go
asking for volunteers on the user list.


-Hoss



[jira] Commented: (SOLR-67) query interface with faceted browsing, highligting

2006-11-07 Thread Hoss Man (JIRA)
[ 
http://issues.apache.org/jira/browse/SOLR-67?page=comments#action_12447907 ] 

Hoss Man commented on SOLR-67:
--

See also: http://wiki.apache.org/solr/MakeSolrMoreSelfService for an extensive 
list of thoughts on making the admin screens more driven by the 
schema/solrconfig.

> query interface with faceted browsing, highligting
> --
>
> Key: SOLR-67
> URL: http://issues.apache.org/jira/browse/SOLR-67
> Project: Solr
>  Issue Type: Wish
>  Components: web gui
>Reporter: Yonik Seeley
>
> It would be nice to have a nice HTML page allowing the user to query and display
> 1) some faceting info such as hit counts with links that allowed the user to 
> narrow their search results.
> 2) highlighted summaries
> 3) easy way to query the dismax handler as well as the standard request 
> handler
> Most likely this would be built into the admin pages (and have access to all 
> the field info).
> It would also seem useful to have an "advanced query page"... something like
> http://www.nabble.com/forum/AdvSearch.jtp or 
> http://www.google.com/advanced_search?hl=en
> that would allow one to easily customize and drop into their site.  This 
> might be best as contrib module run outside of Solr (a JSP, etc?)





Re: [jira] Created: (SOLR-68) Custom ClassLoader for "plugins"

2006-11-07 Thread Mike Baranczak

On Nov 7, 2006, at 2:10 PM, Hoss Man (JIRA) wrote:


I'm also not sure if it works in all cases: more testing of various  
Containers would be good, as well as testing more complex  
situations (ie: what if a class explicitly named as a plugin and  
loaded by this new classloader then uses reflection to load another  
class from the same Jar using  
Thread.currentThread().getContextClassLoader() ... will that fail?)



Probably. Unless you previously set your custom loader to be the  
thread's context class loader. If you do that, you'd have to ensure  
that your loader is aware of the context classpath (which could be  
accomplished by making it a child of the old context class loader).  
Otherwise, you might break other code which also uses this loader.


-MB



Re: [jira] Commented: (SOLR-66) bulk data loader

2006-11-07 Thread Walter Underwood
On 11/7/06 11:22 AM, "Yonik Seeley (JIRA)" <[EMAIL PROTECTED]> wrote:

> Yes, posting queries work because it's all form-data (query args).
> But, what if we want to post a complete file, *and* some extra info/parameters
> about how that file should be handled?

One approach is the Atom Publishing Protocol. That is pretty clear
about content and metainformation. It isn't designed to solve every
problem, but it handles a broad range of publishing, so it could be
a good fit for many uses of Solr.

APP is nearly finished. The latest draft is here (second URL also
has HTML versions).

 http://www.ietf.org/internet-drafts/draft-ietf-atompub-protocol-11.txt
 http://tools.ietf.org/wg/atompub/draft-ietf-atompub-protocol/

wunder
-- 
Walter Underwood
Search Guru, Netflix




[jira] Commented: (SOLR-66) bulk data loader

2006-11-07 Thread Yonik Seeley (JIRA)
[ 
http://issues.apache.org/jira/browse/SOLR-66?page=comments#action_12447901 ] 

Yonik Seeley commented on SOLR-66:
--

> How to encode 'comma'?

For standard CSV, you could quote the entire field value...   "a,b"
I don't know if Commons CSV supports backslash escaping or not, but that would 
be another way.

> How to encode UTF-8?

Two ways... the user can define a charset for the file (and the file could 
actually be UTF-8),
and we can support Unicode escapes \u1234

> Should we use Base64 and encode raw values?

I hadn't thought about binary fields (they aren't even supported in the XML 
update yet).
Doing Base64 would seem relatively easy though.
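
A rough sketch of what decoding those two value encodings could look like on the loader side. The names are invented, the escape handling only covers a backslash, the letter u and exactly four hex digits (it does not treat a doubled backslash specially), and java.util.Base64 requires Java 8 or later.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class FieldDecodingSketch {

    /** Replaces each backslash-u escape with four hex digits by its character. */
    public static String unescapeUnicode(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '\\' && i + 6 <= s.length() && s.charAt(i + 1) == 'u'
                    && isHex(s, i + 2, i + 6)) {
                out.append((char) Integer.parseInt(s.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                out.append(c); // anything else, including malformed escapes, is kept
                i++;
            }
        }
        return out.toString();
    }

    private static boolean isHex(String s, int from, int to) {
        for (int i = from; i < to; i++) {
            if (Character.digit(s.charAt(i), 16) < 0) {
                return false;
            }
        }
        return true;
    }

    /** Decodes a Base64 field value and interprets the bytes as UTF-8 text. */
    public static String decodeBase64Utf8(String encoded) {
        return new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(unescapeUnicode("caf\\u00e9"));
        System.out.println(decodeBase64Utf8("U29scg=="));
    }
}
```

For truly binary fields the decoded bytes would be kept as bytes rather than turned into a String; the UTF-8 step here is only for demonstration.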






[jira] Commented: (SOLR-66) bulk data loader

2006-11-07 Thread Hoss Man (JIRA)
[ 
http://issues.apache.org/jira/browse/SOLR-66?page=comments#action_12447900 ] 

Hoss Man commented on SOLR-66:
--

Fuad: the issue isn't really whether POSTed *queries* work ... those have been 
tested and are known to work ... it's more a question of POSTed *updates* ... 
the current update mechanism does not use "application/x-www-form-urlencoded"; 
instead the raw POST body is read as an XML message containing docs to index.

This issue is attempting to address a more convenient method to bulk import 
records, possibly using CSV, and probably using a local file -- but we'd want 
to support a POSTed file as well, so there was some discussion (on list) of how 
to POST both a file and send query params (using either 
"application/x-www-form-urlencoded" or the mechanism we currently use)





[jira] Commented: (SOLR-66) bulk data loader

2006-11-07 Thread Yonik Seeley (JIRA)
[ 
http://issues.apache.org/jira/browse/SOLR-66?page=comments#action_12447898 ] 

Yonik Seeley commented on SOLR-66:
--

> Existing SOLR should work with POST HTML forms without any change in Java...

Yes, posting queries work because it's all form-data (query args).
But, what if we want to post a complete file, *and* some extra info/parameters 
about how that file should be handled?
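
One possible answer, sketched from the client side: keep the file as the raw POST body and carry the handling parameters in the URL query string. Everything specific here is hypothetical, including the /solr/upload/csv path and the separator and header parameter names; nothing like this exists in Solr at the time of this thread.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

public class UploadRequestSketch {

    /** Appends URL-encoded parameters to a base URL as a query string. */
    public static String buildUrl(String base, Map<String, String> params)
            throws UnsupportedEncodingException {
        StringBuilder url = new StringBuilder(base);
        char separator = '?';
        for (Map.Entry<String, String> e : params.entrySet()) {
            url.append(separator)
               .append(URLEncoder.encode(e.getKey(), "UTF-8"))
               .append('=')
               .append(URLEncoder.encode(e.getValue(), "UTF-8"));
            separator = '&';
        }
        return url.toString();
    }

    /** POSTs the file as the raw request body and returns the HTTP status code. */
    public static int postFile(String url, Path file, String contentType)
            throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", contentType);
        try (OutputStream out = conn.getOutputStream()) {
            Files.copy(file, out);
        }
        return conn.getResponseCode();
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("separator", ","); // hypothetical parameter names
        params.put("header", "true");
        // Hypothetical endpoint; only the URL is built here, nothing is sent.
        System.out.println(buildUrl("http://localhost:8983/solr/upload/csv", params));
    }
}
```

The servlet side could then read the parameters from the query string and the body from the input stream, though containers differ in how they treat a POST body once the parameters have been accessed, so that would need testing.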






Re: incubator report due

2006-11-07 Thread Chris Hostetter

: http://wiki.apache.org/incubator/November2006

+1



-Hoss



[jira] Commented: (SOLR-66) bulk data loader

2006-11-07 Thread Fuad Efendi (JIRA)
[ 
http://issues.apache.org/jira/browse/SOLR-66?page=comments#action_12447894 ] 

Fuad Efendi commented on SOLR-66:
-

/sorry for not having access to E-mail and using POST temporarily.../

HTTP-POST: should work without any code changes.

In /resources/admin/index.jsp, simply replace GET with POST, and everything should work ...

You have the following in org.apache.solr.servlet.SolrServlet:
  public void doPost(HttpServletRequest request, HttpServletResponse response)
      throws ServletException, IOException {
    doGet(request, response);
  }

And you are using the standard Servlet API to retrieve ServletRequest parameters,
http://java.sun.com/j2ee/1.4/docs/api/javax/servlet/ServletRequest.html#getParameterMap()
 

public class ServletSolrParams extends MultiMapSolrParams {
  public ServletSolrParams(ServletRequest req) {
    super(req.getParameterMap());
  }

Existing SOLR should work with POST HTML forms without any change in Java...





[jira] Updated: (SOLR-68) Custom ClassLoader for "plugins"

2006-11-07 Thread Hoss Man (JIRA)
 [ http://issues.apache.org/jira/browse/SOLR-68?page=all ]

Hoss Man updated SOLR-68:
-

Attachment: classloader.patch

patch that creates a new classloader containing any JARs found in 
${solr.home}/lib and uses that class loader any time class names are read from 
a config file.
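
The general technique can be sketched as below. This is not the contents of classloader.patch (which is not reproduced in this thread); the class and method names are invented, and reading solr.home from a system property is an assumption made so the example is self-contained.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LibDirLoaderSketch {

    /** Builds a class loader over every *.jar file directly inside libDir. */
    public static URLClassLoader loaderForLibDir(File libDir, ClassLoader parent)
            throws MalformedURLException {
        List<URL> urls = new ArrayList<URL>();
        File[] files = libDir.listFiles(); // null when libDir is not a directory
        if (files != null) {
            Arrays.sort(files); // stable order, so lookups are repeatable
            for (File f : files) {
                if (f.isFile() && f.getName().endsWith(".jar")) {
                    urls.add(f.toURI().toURL());
                }
            }
        }
        return new URLClassLoader(urls.toArray(new URL[0]), parent);
    }

    /** Looks up a configured class name through the plugin loader. */
    public static Class<?> loadConfigured(String className, ClassLoader pluginLoader)
            throws ClassNotFoundException {
        return Class.forName(className, true, pluginLoader);
    }

    public static void main(String[] args) throws Exception {
        // Assumption: solr.home is available as a system property.
        File libDir = new File(System.getProperty("solr.home", "solr"), "lib");
        ClassLoader parent = Thread.currentThread().getContextClassLoader();
        URLClassLoader loader = loaderForLibDir(libDir, parent);
        System.out.println(Arrays.toString(loader.getURLs()));
        System.out.println(loadConfigured("java.lang.String", loader));
    }
}
```

Because the new loader delegates to its parent first, classes already visible to the webapp still resolve the same way; only names the parent cannot find are looked up in the lib JARs.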





[jira] Created: (SOLR-68) Custom ClassLoader for "plugins"

2006-11-07 Thread Hoss Man (JIRA)
Custom ClassLoader for "plugins"


 Key: SOLR-68
 URL: http://issues.apache.org/jira/browse/SOLR-68
 Project: Solr
  Issue Type: New Feature
Reporter: Hoss Man


After beating my head against my desk for a few hours yesterday trying to 
document how to load custom plugins (ie: Analyzers, RequestHandlers, 
Similarities, etc...) in the various Servlet Containers -- only to discover 
that it is apparently impossible unless you use Resin, it occurred to me in the 
wee hours of last night that since the only time we ever need to load 
"pluggable" classes is when we explicitly look up the class by name, we could 
make our own ClassLoader and use it ... so i whipped together a little patch to 
Config.java that would load JARs out of ${solr.home}/lib and was seriously 
surprised to discover that it seemed to work.

In the cold light of day, I am again surprised that I still think this might be 
a good idea, but i'm not very familiar with ClassLoader semantics, so i'm not 
sure if what i've done is an abomination or not -- or if the idea is sound, but 
the implementation is crap.  

I'm also not sure if it works in all cases: more testing of various Containers 
would be good, as well as testing more complex situations (ie: what if a class 
explicitly named as a plugin and loaded by this new classloader then uses 
reflection to load another class from the same Jar using 
Thread.currentThread().getContextClassLoader() ... will that fail?)


So far I've done quick and dirty testing with my apachecon JAR under 
apache-tomcat-5.5.17, the jetty start.jar we use for the example, resin-3.0.21 
and jettyplus-5.1.11 -- all of which seemed to work fine except for 
jettyplus-5.1.11 -- but that may have been because of some other configuration 
problem I had.







[jira] Created: (SOLR-67) query interface with faceted browsing, highligting

2006-11-07 Thread Yonik Seeley (JIRA)
query interface with faceted browsing, highligting
--

 Key: SOLR-67
 URL: http://issues.apache.org/jira/browse/SOLR-67
 Project: Solr
  Issue Type: Wish
  Components: web gui
Reporter: Yonik Seeley


It would be nice to have a nice HTML page allowing the user to query and display
1) some faceting info such as hit counts with links that allowed the user to 
narrow their search results.
2) highlighted summaries
3) easy way to query the dismax handler as well as the standard request handler
Most likely this would be built into the admin pages (and have access to all 
the field info).

It would also seem useful to have an "advanced query page"... something like
http://www.nabble.com/forum/AdvSearch.jtp or 
http://www.google.com/advanced_search?hl=en
that would allow one to easily customize and drop into their site.  This might 
be best as contrib module run outside of Solr (a JSP, etc?)





Re: incubator report due

2006-11-07 Thread Yoav Shapira

Looks good to me.  +1

Yoav

On 11/7/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:

Report to the Incubator is due by this Friday.
I've put together a preliminary report already.

http://wiki.apache.org/incubator/November2006

-Yonik



Re: [jira] Commented: (SOLR-66) bulk data loader

2006-11-07 Thread Chris Hostetter

: Any ideas on what the interface should look like?  /solr/upload/xml?xsl=foo.xsl
:
: Seems to run into some of the same questions... how should we allow
: POST and specify params about the post at the same time?

Alternately, the API could just require that the XML file be POSTed, and
we could extract the stylesheet URL from the <?xml-stylesheet ?> processing
instruction, and load it from a remote URL.
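
To make that idea concrete, here is a rough sketch of pulling the stylesheet URL out of a POSTed document. It assumes the standard xml-stylesheet processing instruction is what is meant; the class name, method name, and example URL are made up for illustration and are not part of Solr.

```java
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.ProcessingInstruction;
import org.xml.sax.InputSource;

// Hypothetical helper, not existing Solr code.
public class StylesheetPi {
    // The PI data is plain text; href="..." is only a pseudo-attribute,
    // so it has to be picked out by hand.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*([\"'])(.*?)\\1");

    /** Returns the href of the first xml-stylesheet PI in the prolog, or null if none. */
    public static String stylesheetHref(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        // Processing instructions in the prolog are direct children of the Document node.
        for (Node n = doc.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.PROCESSING_INSTRUCTION_NODE) {
                ProcessingInstruction pi = (ProcessingInstruction) n;
                if ("xml-stylesheet".equals(pi.getTarget())) {
                    Matcher m = HREF.matcher(pi.getData());
                    if (m.find()) {
                        return m.group(2);
                    }
                }
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        // Example document and URL are invented for the demo.
        String xml = "<?xml version=\"1.0\"?>\n"
                + "<?xml-stylesheet type=\"text/xsl\" href=\"http://example.com/tei-to-solr.xsl\"?>\n"
                + "<TEI><title>demo</title></TEI>";
        System.out.println(stylesheetHref(xml));
    }
}
```

JAXP also has TransformerFactory.getAssociatedStylesheet(Source, media, title, charset), which may make the manual scan unnecessary. Either way, fetching a stylesheet from a URL the client supplies would probably need some restriction on which hosts are allowed.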

: Can we do something more efficient than create XML just to parse it
: again... like use an API or an intermediate format?

As I understand the javax.xml.transform.* packages, both the Transform
source and result can be DOM structures, IO streams that XML is
read from or written to, or SAX Handler hooks.

So we could transform from an arbitrary File to the "standard" DOM
structure that we walk, or to a set of SAX Handlers ... but either way we
would be duplicating code that is already implemented in XPP.
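
For reference, a small sketch of the three result types mentioned above, using only the JDK's javax.xml.transform classes. The stylesheet, the input record, and the class name are invented for the example; the field layout is only a guess at what a "-> Solr" stylesheet might emit.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not existing Solr code.
public class TransformTargets {
    // Made-up stylesheet: maps <record><title/></record> to an <add><doc> structure.
    public static final String XSL =
            "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
          + "<xsl:output omit-xml-declaration=\"yes\"/>"
          + "<xsl:template match=\"/record\">"
          + "<add><doc><field name=\"title\"><xsl:value-of select=\"title\"/></field></doc></add>"
          + "</xsl:template>"
          + "</xsl:stylesheet>";

    public static final String INPUT = "<record><title>Solr</title></record>";

    static Transformer newTransformer() throws Exception {
        return TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
    }

    /** Stream result: the transformed XML is serialized to text. */
    public static String toStream(String xml) throws Exception {
        StringWriter out = new StringWriter();
        newTransformer().transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    /** DOM result: the transformer builds a Document that can be walked directly. */
    public static Document toDom(String xml) throws Exception {
        DOMResult result = new DOMResult();
        newTransformer().transform(new StreamSource(new StringReader(xml)), result);
        return (Document) result.getNode();
    }

    /** SAX result: output arrives as callbacks, no intermediate XML text is written. */
    public static String toSaxText(String xml) throws Exception {
        final StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length); // may be delivered in several chunks
            }
        };
        newTransformer().transform(new StreamSource(new StringReader(xml)), new SAXResult(handler));
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toStream(INPUT));
        System.out.println(toDom(INPUT).getDocumentElement().getNodeName());
        System.out.println(toSaxText(INPUT));
    }
}
```

The DOM and SAX variants avoid writing XML text only to parse it again, but as noted they would not reuse the existing XPP-based parsing code.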




-Hoss



incubator report due

2006-11-07 Thread Yonik Seeley

Report to the Incubator is due by this Friday.
I've put together a preliminary report already.

http://wiki.apache.org/incubator/November2006

-Yonik


[jira] Commented: (SOLR-66) bulk data loader

2006-11-07 Thread Fuad Efendi (JIRA)
[ http://issues.apache.org/jira/browse/SOLR-66?page=comments#action_12447877 ] 

Fuad Efendi commented on SOLR-66:
-

Encoding:
How to encode 'comma'?
How to encode UTF-8?
Should we use Base64 and encode raw values?

http://rfc.net/rfc4180.html:
"Common usage of CSV is US-ASCII, but other character sets defined by IANA for 
the "text" tree may be used in conjunction with the "charset" parameter."

http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm
http://www.edoceo.com/utilis/csv-file-format.php
http://www.ricebridge.com/products/csvman/reference.htm

This is interesting (from last link):
FIELD:[trim]? ( UNQUOTED | QUOTED ) [trim]? 
UNQUOTED: ( [data]* | ESCAPE )*;
QUOTED:   [quote] ( DOUBLE | ESCAPE | [data]* )* [quote]
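
One possible answer to the comma and UTF-8 questions, sketched under the assumption that RFC 4180 style quoting is acceptable: a field containing a comma, quote, or line break is wrapped in double quotes, embedded quotes are doubled, and the character set is handled when bytes are decoded rather than inside the CSV grammar (so no Base64). The class and method names are hypothetical, and this handles a single record only.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the SOLR-66 implementation.
public class CsvQuoting {
    /** Quotes a field only when it contains a comma, a double quote, CR, or LF. */
    public static String encodeField(String value) {
        boolean needsQuotes = value.indexOf(',') >= 0 || value.indexOf('"') >= 0
                || value.indexOf('\n') >= 0 || value.indexOf('\r') >= 0;
        if (!needsQuotes) {
            return value;
        }
        // Embedded double quotes are escaped by doubling them.
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    /** Splits one CSV record (without its trailing line break) into fields. */
    public static List<String> parseRecord(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        cur.append('"'); // doubled quote inside a quoted field
                        i++;
                    } else {
                        inQuotes = false; // closing quote
                    }
                } else {
                    cur.append(c); // commas are literal while inside quotes
                }
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == ',') {
                fields.add(cur.toString());
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        fields.add(cur.toString());
        return fields;
    }

    public static void main(String[] args) {
        String record = encodeField("Z\u00fcrich, CH") + ","
                + encodeField("say \"hi\"") + ","
                + encodeField("plain");
        // The charset applies to the byte stream, not to the CSV syntax itself.
        byte[] wire = record.getBytes(StandardCharsets.UTF_8);
        String decoded = new String(wire, StandardCharsets.UTF_8);
        System.out.println(parseRecord(decoded));
    }
}
```

The [trim] and ESCAPE rules from the grammar above are not covered here; they would need extra states in the loop.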




> bulk data loader
> 
>
> Key: SOLR-66
> URL: http://issues.apache.org/jira/browse/SOLR-66
> Project: Solr
>  Issue Type: New Feature
>Reporter: Yonik Seeley
> Assigned To: Yonik Seeley
>
> A way to efficiently load simple formatted text files, including CSV files.





acts-as-solr (Ruby on Rails)

2006-11-07 Thread Yonik Seeley

Erik, it looks as if someone ran with your acts_as_solr implementation
http://acts-as-solr.rubyforge.org/

Is it a proper superset of what you had, and should it be considered
the latest definitive version of acts_as_solr?  We should probably add
a link from http://wiki.apache.org/solr/SolRuby

-Yonik


[jira] Resolved: (SOLR-62) scripts don't check return code

2006-11-07 Thread Bill Au (JIRA)
 [ http://issues.apache.org/jira/browse/SOLR-62?page=all ]

Bill Au resolved SOLR-62.
-

Resolution: Fixed

patch committed.

> scripts don't check return code
> ---
>
> Key: SOLR-62
> URL: http://issues.apache.org/jira/browse/SOLR-62
> Project: Solr
>  Issue Type: Bug
>  Components: replication
>Reporter: Yonik Seeley
>Priority: Minor
> Attachments: solr-scripts-solr-62.patch
>
>
> Solr scripts that post commands to solr don't check the return code.
> The scripts (like optimize) currently follow this pattern:
> rs=`curl http://localhost:5051/update -s -d ""`
> if [[ $? != 0 ]]
> then
>   [...]
> fi
> # check status of optimize request
> rc=`echo $rs|cut -f2 -d'"'`
> if [[ $? != 0 ]]
> then
>   [...]
> $rc is never checked.  In addition, the line that grabs rc appears pretty
> fragile by depending on an exact field column.  Unless we have a simple
> command line XML parser, how about checking for the return code this way:
> echo $rs | grep ' /dev/null 2>&1
> if [[ $? != 0 ]]
> then
>   [...]





Re: Re: svn commit: r471866 - in /incubator/solr/trunk: site/features.html site/features.pdf src/site/src/documentation/content/xdocs/features.xml

2006-11-07 Thread Yonik Seeley

On 11/6/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:

> On 11/6/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> >
> > Looks like we should talk to infra but what should the perms be changed to?
> > What were the perms before the move... apcvs?

OK, sit tight.  All Solr committers are going to be added to the
incubator group (infra preference).


It's done.
I sync'd the website, and the automatic javadoc update should now
succeed after the next nightly build.


-Yonik


Re: [jira] Commented: (SOLR-66) bulk data loader

2006-11-07 Thread Yonik Seeley

On 11/6/06, Erik Hatcher (JIRA) <[EMAIL PROTECTED]> wrote:

What about having an XSL transformation on the input to Solr as well?   This would 
allow someone to POST in XML documents of any variety, but an XSL would turn it 
into the field definitions.  This would certainly increase the appeal of Solr in 
my (library) domain - a standard TEI -> Solr stylesheet would allow folks to 
POST into Solr without doing much on the client end at all.


Sounds like a good idea... Solr is all about doing things that people
would otherwise have to do themselves.

Any ideas on what the interface should look like?  /solr/upload/xml?xsl=foo.xsl

Seems to run into some of the same questions... how should we allow
POST and specify params about the post at the same time?

Can we do something more efficient than create XML just to parse it
again... like use an API or an intermediate format?

-Yonik


[jira] Commented: (SOLR-59) Copy request parameters to Solr's response

2006-11-07 Thread Bertrand Delacretaz (JIRA)
[ http://issues.apache.org/jira/browse/SOLR-59?page=comments#action_12447713 ] 

Bertrand Delacretaz commented on SOLR-59:
-

Ah ok, as the JSON format hasn't changed, I guess it just needs to recreate this 
in the XMLWriter:

<responseHeader>
  <status>0</status>
  <QTime>1234</QTime>
</responseHeader>

when version < 2.2.

I probably won't have time to do that today or tomorrow, but of course feel 
free to "patch my patch" if you want.


> Copy request parameters to Solr's response
> --
>
> Key: SOLR-59
> URL: http://issues.apache.org/jira/browse/SOLR-59
> Project: Solr
>  Issue Type: Improvement
>Reporter: Bertrand Delacretaz
> Attachments: SOLR-59-20061024.patch, SOLR-59-20061102.patch, 
> SOLR-59-20061103.patch, SOLR-59-20061106-newfiles.tar.gz, 
> SOLR-59-20061106.patch, SOLR-59-new-files-20061102.tar.gz
>
>
> This patch copies the request parameters (explicit ones only, not the 
> defaults) to Solr's XML output.
> It is not configurable yet; it is enabled by default and adds a 
> "queryParameters" list to the responseHeader:
> 
> 0
> 1
> 
> 
> red
> blue
> 
> 10
> 0
> on
> solr
> 
> 2.1
> 
> 
> The above example includes a multi-valued parameter, "multi".
> This might still change a bit, but if someone wants to play with it or 
> improve it, here you go.
