[jira] Updated: (SOLR-58) Change Admin components to return XML like the rest of the system
[ http://issues.apache.org/jira/browse/SOLR-58?page=all ] Otis Gospodnetic updated SOLR-58: - Attachment: logging-xml.jsp Here is the XML version of logging.jsp, named logging-xml.jsp. Its output is trivial: INFO I imagine the XSL would take this XML, convert it to HTML, and append the HTML with links to action.jsp with different logging levels. > Change Admin components to return XML like the rest of the system > - > > Key: SOLR-58 > URL: http://issues.apache.org/jira/browse/SOLR-58 > Project: Solr > Issue Type: New Feature > Components: web gui >Reporter: Otis Gospodnetic > Assigned To: Otis Gospodnetic >Priority: Minor > Attachments: analysis-xml-out.txt, analysis-xml.jsp, logging-xml.jsp, > ping-xml-out.txt, ping-xml.jsp, threaddump-xml-out.txt, threaddump-xml.jsp > > > I need to expose the admin functionality to an external application. I think > returning admin data as XML may be a good and simple first step towards that. > To do that I think I'll mostly need to modify JSPs (but I haven't had a good > look at Admin GUI yet). From what I saw a few weeks ago when I briefly > looked at this, no Java code will need to be modified. If you have concrete > ideas about how this should be done, please comment before I start next week > (week of October 23rd 2006). -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (SOLR-58) Change Admin components to return XML like the rest of the system
[ http://issues.apache.org/jira/browse/SOLR-58?page=all ]

Otis Gospodnetic updated SOLR-58:

    Attachment: threaddump-xml.jsp
                threaddump-xml-out.txt

Here is threaddump-xml.jsp and an example of its output.

> Change Admin components to return XML like the rest of the system
>
> Key: SOLR-58
> URL: http://issues.apache.org/jira/browse/SOLR-58
> Project: Solr
> Issue Type: New Feature
> Components: web gui
> Reporter: Otis Gospodnetic
> Assigned To: Otis Gospodnetic
> Priority: Minor
> Attachments: analysis-xml-out.txt, analysis-xml.jsp, ping-xml-out.txt, ping-xml.jsp, threaddump-xml-out.txt, threaddump-xml.jsp
[jira] Updated: (SOLR-58) Change Admin components to return XML like the rest of the system
[ http://issues.apache.org/jira/browse/SOLR-58?page=all ]

Otis Gospodnetic updated SOLR-58:

    Attachment: ping-xml.jsp
                ping-xml-out.txt

Ping was simple, I just made it return if ping was OK (attached), and if there was an error, then: exception trace here

Thoughts?

N.B. I'm not attaching diffs for JSPs, as I'm letting both the original and the XML versions live side by side locally for now, but if you'd prefer diffs, let me know.

> Change Admin components to return XML like the rest of the system
>
> Key: SOLR-58
> URL: http://issues.apache.org/jira/browse/SOLR-58
> Project: Solr
> Issue Type: New Feature
> Components: web gui
> Reporter: Otis Gospodnetic
> Assigned To: Otis Gospodnetic
> Priority: Minor
> Attachments: analysis-xml-out.txt, analysis-xml.jsp, ping-xml-out.txt, ping-xml.jsp
[jira] Commented: (SOLR-65) Multithreaded DirectUpdateHandler2
[ http://issues.apache.org/jira/browse/SOLR-65?page=comments#action_12448024 ]

Mike Klaas commented on SOLR-65:

This version removes the attempt at parsing the rest of the XML if an error occurs during document update. It has mostly already been reviewed, but to summarize, this patch:

 - improves concurrency during multi-threaded document update
 - potentially optimizes huge commits (may prevent a stall if memory is constrained)
 - eliminates the edge cases of document update (esp. multi-doc update) which cause Solr to return XML that couldn't be parsed as a single document (should also solve SOLR-2; SOLR-54)

If no one has objections, I'll commit the patch tomorrow.

> Multithreaded DirectUpdateHandler2
>
> Key: SOLR-65
> URL: http://issues.apache.org/jira/browse/SOLR-65
> Project: Solr
> Issue Type: New Feature
> Components: update
> Reporter: Mike Klaas
> Assigned To: Mike Klaas
> Attachments: autocommit_patch.diff, autocommit_patch.diff, autocommit_patch.diff, autocommit_patch.diff
>
> Basic implementation of autoCommit functionality, plus overhaul of DUH2
> threading to reduce contention
[jira] Updated: (SOLR-65) Multithreaded DirectUpdateHandler2
[ http://issues.apache.org/jira/browse/SOLR-65?page=all ]

Mike Klaas updated SOLR-65:

    Attachment: autocommit_patch.diff

> Multithreaded DirectUpdateHandler2
>
> Key: SOLR-65
> URL: http://issues.apache.org/jira/browse/SOLR-65
> Project: Solr
> Issue Type: New Feature
> Components: update
> Reporter: Mike Klaas
> Assigned To: Mike Klaas
> Attachments: autocommit_patch.diff, autocommit_patch.diff, autocommit_patch.diff, autocommit_patch.diff
[jira] Work started: (SOLR-58) Change Admin components to return XML like the rest of the system
[ http://issues.apache.org/jira/browse/SOLR-58?page=all ]

Work on SOLR-58 started by Otis Gospodnetic.

> Change Admin components to return XML like the rest of the system
>
> Key: SOLR-58
> URL: http://issues.apache.org/jira/browse/SOLR-58
> Project: Solr
> Issue Type: New Feature
> Components: web gui
> Reporter: Otis Gospodnetic
> Assigned To: Otis Gospodnetic
> Priority: Minor
> Attachments: analysis-xml-out.txt, analysis-xml.jsp
[jira] Updated: (SOLR-58) Change Admin components to return XML like the rest of the system
[ http://issues.apache.org/jira/browse/SOLR-58?page=all ]

Otis Gospodnetic updated SOLR-58:

    Attachment: analysis-xml-out.txt
                analysis-xml.jsp

I took a stab at the analysis page, and it turned out it lends itself to XMLization. I'm attaching two files:

1) analysis XML output
2) analysis-xml.jsp - the JSP that replaces the output portion of analysis.jsp (if this looks good, then I'll just change the FORM action in analysis.jsp to point to analysis-xml.jsp and somebody familiar with XSL could provide that piece)

Please comment.

These are my targets for XMLization, and I'm going to work on them next.

ANALYSIS (this attachment)
STATISTICS (already XMLized)
INFO
DISTRIBUTION
PING
LOGGING
THREAD DUMP

> Change Admin components to return XML like the rest of the system
>
> Key: SOLR-58
> URL: http://issues.apache.org/jira/browse/SOLR-58
> Project: Solr
> Issue Type: New Feature
> Components: web gui
> Reporter: Otis Gospodnetic
> Assigned To: Otis Gospodnetic
> Priority: Minor
> Attachments: analysis-xml-out.txt, analysis-xml.jsp
Re: [jira] Commented: (SOLR-66) bulk data loader
: Doesn't this bulk upload sound a bit like Simon's GData server? That,
: too, uses APP, I believe.

bulk *upload* is really an afterthought of SOLR-66 ... from Yonik's initial description of the issue, he seems to be trying to address "bootstrap" problems for quickly indexing a lot of data from a *local* file with minimal markup/parsing (ie: tab delimited or CSV)

Personally, i think we may be getting a little too sidetracked by discussions of uploading/transforming and APP ... those are all decent ideas, but they are separate issues. I think if we had a basic mechanism for parsing CSV files that assumed homogeneous fields, we would get pretty damn far for a lot of people -- even if we didn't address multi-value fields.

- Original Message
From: Walter Underwood <[EMAIL PROTECTED]>
To: solr-dev@lucene.apache.org
Sent: Tuesday, November 7, 2006 2:36:37 PM
Subject: Re: [jira] Commented: (SOLR-66) bulk data loader

On 11/7/06 11:22 AM, "Yonik Seeley (JIRA)" <[EMAIL PROTECTED]> wrote:
> Yes, posting queries work because it's all form-data (query args).
> But, what if we want to post a complete file, *and* some extra info/parameters
> about how that file should be handled?

One approach is the Atom Publishing Protocol. That is pretty clear about content and metainformation. It isn't designed to solve every problem, but it handles a broad range of publishing, so it could be a good fit for many uses of Solr.

APP is nearly finished. The latest draft is here (second URL also has HTML versions).

http://www.ietf.org/internet-drafts/draft-ietf-atompub-protocol-11.txt
http://tools.ietf.org/wg/atompub/draft-ietf-atompub-protocol/

wunder
--
Walter Underwood
Search Guru, Netflix

-Hoss
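To make the "basic mechanism for parsing CSV files that assumed homogeneous fields" idea concrete, here is a minimal sketch in plain Java. It is not taken from any SOLR-66 patch. The class name SimpleCsvSketch and its parse method are hypothetical, and it assumes the first line names the fields, that there is no quoting or embedded commas, and that every field is single-valued.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper, not part of Solr: turns simple CSV lines into field/value maps. */
public final class SimpleCsvSketch {

    /**
     * Treats the first line as the list of field names and every later
     * non-empty line as one document. Assumes homogeneous rows, no quoting,
     * no embedded commas and no multi-valued fields.
     */
    public static List<Map<String, String>> parse(List<String> lines) {
        List<Map<String, String>> docs = new ArrayList<>();
        if (lines.isEmpty()) {
            return docs;
        }
        // limit -1 keeps trailing empty columns instead of dropping them
        String[] fields = lines.get(0).split(",", -1);
        for (String line : lines.subList(1, lines.size())) {
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            String[] values = line.split(",", -1);
            if (values.length != fields.length) {
                throw new IllegalArgumentException(
                        "expected " + fields.length + " columns but got " + values.length + ": " + line);
            }
            Map<String, String> doc = new LinkedHashMap<>(); // keeps column order
            for (int i = 0; i < fields.length; i++) {
                doc.put(fields[i].trim(), values[i].trim());
            }
            docs.add(doc);
        }
        return docs;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("id,name,price", "1,ipod,299", "2,zune,249");
        for (Map<String, String> doc : parse(lines)) {
            System.out.println(doc);
        }
    }
}
```

Each map could then be handed to whatever builds the actual documents. Rejecting rows with the wrong column count is one possible design choice; a real loader might prefer to skip and log them.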
Re: [jira] Commented: (SOLR-66) bulk data loader
Doesn't this bulk upload sound a bit like Simon's GData server? That, too, uses APP, I believe. Otis - Original Message From: Walter Underwood <[EMAIL PROTECTED]> To: solr-dev@lucene.apache.org Sent: Tuesday, November 7, 2006 2:36:37 PM Subject: Re: [jira] Commented: (SOLR-66) bulk data loader On 11/7/06 11:22 AM, "Yonik Seeley (JIRA)" <[EMAIL PROTECTED]> wrote: > Yes, posting queries work because it's all form-data (query args). > But, what if we want to post a complete file, *and* some extra info/parameters > about how that file should be handled? One approach is the Atom Publishing Protocol. That is pretty clear about content and metainformation. It isn't designed to solve every problem, but it handles a broad range of publishing, so it could be a good fit for many uses of Solr. APP is nearly finished. The latest draft is here (second URL also has HTML versions). http://www.ietf.org/internet-drafts/draft-ietf-atompub-protocol-11.txt http://tools.ietf.org/wg/atompub/draft-ietf-atompub-protocol/ wunder -- Walter Underwood Search Guru, Netflix
Re: Adding Phonetic Search to Solr
Grab the code from Lucene in Action, it's got something to get you going, see: http://www.lucenebook.com/search?query=metaphone

Otis

- Original Message
From: Chris Hostetter <[EMAIL PROTECTED]>
To: solr-dev@lucene.apache.org
Sent: Tuesday, November 7, 2006 6:04:02 PM
Subject: Re: Adding Phonetic Search to Solr

: >> 1. Adding fuzzy to the DisMax specs.
: >
: > What do you envisage the implementation looking like?
:
: Probably continue with the template-like patterns already there.
:
: title^2.0 (search title field with boost of 2.0)
: title~ (search title field with fuzzy matching)

Interesting idea ... that's really a separate idea from "Phonetic" search, right? ... fuzzy searching (a la FuzzyQuery) is really a separate beast from phonetics (which i assume would best be implemented using your second idea of a new TokenFilter)

I'm all in favor of both ideas ... but i wonder if adding fuzzy support to dismax would be better done with a new param similar to "pf" (ie: "ff" for fuzzy fields and "fslop" for the min similarity)

: Still, it seems like others might want to use a phonetic token
: filter with the specs. I'd be glad to contribute that,
: if others think it would be useful.

yeah .. you may want to just write a generic Lucene TokenFilter that does the phonetics, submit it as a patch to Lucene-Java, and then it's trivial to whip up a TokenFilterFactory to use it in Solr.

-Hoss
Re: Adding Phonetic Search to Solr
On 11/7/06 3:26 PM, "Mike Klaas" <[EMAIL PROTECTED]> wrote: > Is the state of the art in phonetic token generation reasonable? I've > been rather disappointed with some implementations (eg. SOUNDEX in > MySQL, MSSQL). SOUNDEX is excellent technology for its time, but its time was 1920. Double Metaphone is far more complex and works fairly well. There is an Apache commons codec implementation available. It is certainly good enough for matching proper names, like Moody and Mudie or Cathy and Kathie. There are some commercial phonetic coders, but I don't have any experience with those. wunder -- Walter Underwood Search Guru, Netflix
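For readers who have not seen a phonetic coder, here is a minimal sketch of classic American Soundex in plain Java. It is the older algorithm criticized above, not Double Metaphone and not the Commons Codec implementation. The class name SoundexSketch is made up for illustration.

```java
/** Illustrative only: classic American Soundex, not Double Metaphone. */
public final class SoundexSketch {

    // Code for each letter A..Z: '0' = vowel or Y (not coded), '-' = H or W (ignored).
    private static final String CODES = "0123012-02245501262301-202";

    public static String encode(String name) {
        StringBuilder out = new StringBuilder();
        char last = 0; // code of the previous letter that was not H or W
        for (int i = 0; i < name.length(); i++) {
            char c = Character.toUpperCase(name.charAt(i));
            if (c < 'A' || c > 'Z') {
                continue; // ignore anything that is not a plain ASCII letter
            }
            char code = CODES.charAt(c - 'A');
            if (out.length() == 0) {
                out.append(c); // the first letter is kept as-is
                last = code;
                continue;
            }
            if (code == '-') {
                continue; // H and W do not separate letters with equal codes
            }
            if (code != '0' && code != last) {
                out.append(code);
            }
            last = code; // a vowel resets 'last', so a repeated code after it is kept
            if (out.length() == 4) {
                break;
            }
        }
        while (out.length() > 0 && out.length() < 4) {
            out.append('0'); // pad short codes with zeros
        }
        return out.toString();
    }

    public static void main(String[] args) {
        for (String n : new String[] {"Moody", "Mudie", "Cathy", "Kathie"}) {
            System.out.println(n + " -> " + encode(n));
        }
    }
}
```

Under these rules Moody and Mudie both encode to M300, while Cathy and Kathie come out as C300 and K300 because Soundex always keeps the first letter. That is one reason a coder like Double Metaphone does better on the name pairs mentioned above.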
Re: Re: Adding Phonetic Search to Solr
On 11/7/06, Walter Underwood <[EMAIL PROTECTED]> wrote:
> On 11/7/06 2:30 PM, "Mike Klaas" <[EMAIL PROTECTED]> wrote:
> > On 11/7/06, Walter Underwood <[EMAIL PROTECTED]> wrote:
> >> 1. Adding fuzzy to the DisMax specs.
> >
> > What do you envisage the implementation looking like?
>
> Probably continue with the template-like patterns already there.
>
> title^2.0 (search title field with boost of 2.0)
> title~ (search title field with fuzzy matching)

Ah, I see what you mean. Seems reasonable (as someone who is utterly unfamiliar with fuzzy queries in Lucene).

> Ah, I missed the example with a stock Lucene analyzer. Oops.
> I still need to write an Analyzer, because there is no standard
> phonetic search in Lucene today. There are some patches and
> addons floating around.

Is the state of the art in phonetic token generation reasonable? I've been rather disappointed with some implementations (e.g. SOUNDEX in MySQL, MSSQL).

cheers,
-Mike
Re: Adding Phonetic Search to Solr
: >> 1. Adding fuzzy to the DisMax specs.
: >
: > What do you envisage the implementation looking like?
:
: Probably continue with the template-like patterns already there.
:
: title^2.0 (search title field with boost of 2.0)
: title~ (search title field with fuzzy matching)

Interesting idea ... that's really a separate idea from "Phonetic" search, right? ... fuzzy searching (a la FuzzyQuery) is really a separate beast from phonetics (which i assume would best be implemented using your second idea of a new TokenFilter)

I'm all in favor of both ideas ... but i wonder if adding fuzzy support to dismax would be better done with a new param similar to "pf" (ie: "ff" for fuzzy fields and "fslop" for the min similarity)

: Still, it seems like others might want to use a phonetic token
: filter with the specs. I'd be glad to contribute that,
: if others think it would be useful.

yeah .. you may want to just write a generic Lucene TokenFilter that does the phonetics, submit it as a patch to Lucene-Java, and then it's trivial to whip up a TokenFilterFactory to use it in Solr.

-Hoss
[jira] Commented: (SOLR-66) bulk data loader
[ http://issues.apache.org/jira/browse/SOLR-66?page=comments#action_12447953 ] Fuad Efendi commented on SOLR-66: - even 3-column... BTW, good SOLR-targeted single-table database design: ,, Yes, we can use even index-organized tables in Oracle, without repeated 'parent' values! And good standard for CNET customers sending them daily updated product info (is it really search engine?...) Thanks > bulk data loader > > > Key: SOLR-66 > URL: http://issues.apache.org/jira/browse/SOLR-66 > Project: Solr > Issue Type: New Feature >Reporter: Yonik Seeley > Assigned To: Yonik Seeley > > A way to efficiently load simple formatted text files, including CSV files. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (SOLR-68) Custom ClassLoader for "plugins"
[ http://issues.apache.org/jira/browse/SOLR-68?page=comments#action_12447950 ]

Hoss Man commented on SOLR-68:

> and we don't need an explicit instance of a ClassLoader at all... just put
> JARs in a classpath... and it works in all containers, because we use
> 'parent' classloader with inherited permissions instead of touching the
> 'container-managed' Thread...

...it's not that easy. If you need to load a class which references another class defined in the webapp itself, your approach breaks because the ClassLoader won't walk back down the ClassLoader hierarchy to try and find it. Ie: if you create a "CustomRequestHandler implements SolrRequestHandler" and put it in a JAR which you then put in the "shared" lib dir for Tomcat, you'll get a class not found error for something like SolrQueryRequest when CustomRequestHandler is loaded, because SolrQueryRequest is defined in the WAR and the "shared" class loader, higher in the hierarchy, doesn't have access to those classes...

http://tomcat.apache.org/tomcat-5.5-doc/class-loader-howto.html

...hence my idea to create a new ClassLoader instance which is a "child" of the class loader being used for the webapp itself, so when it delegates on classes it doesn't recognize, it delegates "up" to the solr.war class loader.

Regarding your previous comment about the Nutch/Eclipse plugin model ... i'm not sure if that configuration syntax really fits for us ... it assumes a specific set of extension points, such that a single plugin might map to multiple points, and (as far as i can tell) each extension point is either bound to a single plugin, or the plugins "chain" themselves. This doesn't really fit our use case, where you might want dozens of different analyzers used in different fields -- ideally someone should be able to pull any jar out of the lucene/contrib directory containing an analyzer or a Similarity class they want to use, drop that jar into a specified location, and put the class name in the schema where they want to use it.

As for *how* Nutch does this ... they seem to be doing roughly the same thing as I'm trying for in this patch, except the more specific meaning of "plugin" allows them to have a separate ClassLoader per plugin, that loads the exact jars enumerated in the plugin's "manifest" (an xml config file; not a jar MANIFEST) .. but again i'm not sure if/how they deal with the possibility of a plugin loaded class using reflection to try and load another class later on.

> Custom ClassLoader for "plugins"
>
> Key: SOLR-68
> URL: http://issues.apache.org/jira/browse/SOLR-68
> Project: Solr
> Issue Type: New Feature
> Reporter: Hoss Man
> Attachments: classloader.patch
>
> After beating my head against my desk for a few hours yesterday trying to
> document how to load custom plugins (ie: Analyzers, RequestHandlers,
> Similarities, etc...) in the various Servlet Containers -- only to discover
> that it is apparently impossible unless you use Resin, it occurred to me in the
> wee hours of last night that since the only time we ever need to load
> "pluggable" classes is when we explicitly look up the class by name, we could
> make our own ClassLoader and use it ... so i whipped together a little patch
> to Config.java that would load JARs out of ${solr.home}/lib and was seriously
> surprised to discover that it seemed to work.
> In the cold light of day, I am again surprised that I still think this might
> be a good idea, but i'm not very familiar with ClassLoader semantics, so i'm
> not sure if what i've done is an abomination or not -- or if the idea is
> sound, but the implementation is crap.
> I'm also not sure if it works in all cases: more testing of various
> Containers would be good, as well as testing more complex situations (ie:
> what if a class explicitly named as a plugin and loaded by this new
> classloader then uses reflection to load another class from the same Jar
> using Thread.currentThread().getContextClassLoader() ... will that fail?)
> So far I've done quick and dirty testing with my apachecon JAR under
> apache-tomcat-5.5.17, the jetty start.jar we use for the example,
> resin-3.0.21 and jettyplus-5.1.11 -- all of which seemed to work fine except
> for jettyplus-5.1.11 -- but that may have been because of some other
> configuration problem I had.
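The "child" class loader idea described above can be sketched with nothing but the JDK. This is not the code in classloader.patch. The class name PluginLoaderSketch, the forLibDir method and the "solr/lib" path are assumptions made for illustration.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: a plugin loader that is a child of the webapp's loader. */
public final class PluginLoaderSketch {

    /** Builds a loader over every *.jar in libDir, delegating to the given parent. */
    public static URLClassLoader forLibDir(File libDir, ClassLoader parent)
            throws MalformedURLException {
        List<URL> urls = new ArrayList<>();
        File[] files = libDir.listFiles(); // null when libDir does not exist
        if (files != null) {
            for (File f : files) {
                if (f.isFile() && f.getName().endsWith(".jar")) {
                    urls.add(f.toURI().toURL());
                }
            }
        }
        return new URLClassLoader(urls.toArray(new URL[0]), parent);
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the loader that loaded the webapp's own classes.
        ClassLoader webapp = PluginLoaderSketch.class.getClassLoader();
        URLClassLoader plugins = forLibDir(new File("solr/lib"), webapp); // hypothetical path
        // A class the child cannot find in its jars is resolved through the parent.
        Class<?> c = Class.forName("java.util.ArrayList", true, plugins);
        System.out.println(c.getName() + " loaded via parent delegation: "
                + (plugins.getParent() == webapp));
    }
}
```

Because URLClassLoader asks its parent first, a plugin class loaded this way can still see classes such as SolrQueryRequest that live in the WAR. The sketch does not answer the open question about Thread.currentThread().getContextClassLoader(); code that uses the context loader would only see the plugin jars if the caller also set the context loader, which this example does not do.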
Re: Adding Phonetic Search to Solr
On 11/7/06 2:30 PM, "Mike Klaas" <[EMAIL PROTECTED]> wrote:
> On 11/7/06, Walter Underwood <[EMAIL PROTECTED]> wrote:
>>
>> 1. Adding fuzzy to the DisMax specs.
>
> What do you envisage the implementation looking like?

Probably continue with the template-like patterns already there.

title^2.0 (search title field with boost of 2.0)
title~ (search title field with fuzzy matching)

>> 2. Adding a phonetic token filter and relying on the per-field analyzer
>> support.
>
> I'm not sure why any modification to solr would be necessary. You
> could add a field with a phonetic analyzer and use copyField to copy
> your search fields to it. Search will use the modified analyzer
> automatically.

Ah, I missed the example with a stock Lucene analyzer. Oops. I still need to write an Analyzer, because there is no standard phonetic search in Lucene today. There are some patches and addons floating around.

Still, it seems like others might want to use a phonetic token filter with the specs. I'd be glad to contribute that, if others think it would be useful.

wunder
--
Walter Underwood
Search Guru, Netflix
[jira] Commented: (SOLR-68) Custom ClassLoader for "plugins"
[ http://issues.apache.org/jira/browse/SOLR-68?page=comments#action_12447947 ]

Fuad Efendi commented on SOLR-68:

why do you need URLClassLoader?

1) public class URLClassLoader extends SecureClassLoader
2) "The classes that are loaded are by default granted permission only to access the URLs specified when the URLClassLoader was created."

we need to create an instance defined in XML... just call Class.forName("org.zzz.Foo").newInstance()... it will throw an exception ClassNotFoundException if a class can't be found in a path, or NoClassDefFoundError in case of unresolved nested dependencies (am I right?)...

> Custom ClassLoader for "plugins"
>
> Key: SOLR-68
> URL: http://issues.apache.org/jira/browse/SOLR-68
> Project: Solr
> Issue Type: New Feature
> Reporter: Hoss Man
> Attachments: classloader.patch
Re: Adding Phonetic Search to Solr
On 11/7/06, Mike Klaas <[EMAIL PROTECTED]> wrote:
> > 2. Adding a phonetic token filter and relying on the per-field analyzer
> > support.
> >
> > Option 2 seems like it would be a lot faster in production, and
> > probably easier to implement. Does that seem right?
>
> I'm not sure why any modification to solr would be necessary. You
> could add a field with a phonetic analyzer and use copyField to copy
> your search fields to it. Search will use the modified analyzer
> automatically.

I assumed he meant the implementation of the filter/analyzer itself (unless there is already a Lucene filter we can use). Unless there is a Lucene analyzer (which can be used directly), a FilterFactory for the filter will have to be added, but that's easy.

> > How do I specify the new token filter factory in the schema file?
> > I don't quite get the mapping from solr.FooFilterFactory to
> > org.apache.solr.analysis.FooFilterFactory.

When Solr was an internal CNET project, the package name was just solr (or actually, just solar). I kept the short version as a form of backward compatibility, and I sort of liked it because it made the schema less verbose. Anyway, there is a set of standard packages that will be checked if the class name isn't found as-is. I thought it was documented but I can't seem to find it...

-Yonik
Re: Adding Phonetic Search to Solr
: 2. Adding a phonetic token filter and relying on the per-field analyzer
: support.
:
: Option 2 seems like it would be a lot faster in production, and
: probably easier to implement. Does that seem right?

yep, just write your Analyzer (or TokenFilter) and drop it in.

: How do I specify the new token filter factory in the schema file?
: I don't quite get the mapping from solr.FooFilterFactory to
: org.apache.solr.analysis.FooFilterFactory.

Classes *should* be specified using the fully qualified package, but for convenience, if Solr sees a package name of just "solr" it checks a few known packages based on context (ie: org.apache.solr.analysis, o.a.s.schema, etc...)

As discussed in SOLR-68, getting the classloader to find your custom analyzer is currently a pain in the ass .. the easiest thing to do that should work in any servlet container is add it directly to the solr.war ... but i'm hoping to make that unnecessary soon.

-Hoss
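The short-name lookup described above (try the name as written, then swap the "solr" prefix for each known package) can be sketched in plain Java. This is not Solr's actual findClass code. ClassNameResolverSketch and resolve are hypothetical names, and JDK packages stand in for org.apache.solr.analysis and friends so the example runs without Solr on the classpath.

```java
import java.util.Arrays;

/** Illustrative only: resolve an aliased class name against candidate packages. */
public final class ClassNameResolverSketch {

    /**
     * Tries the name as given. If that fails and the name starts with
     * "alias.", retries with each candidate package in place of the alias.
     */
    public static Class<?> resolve(String name, String alias, String... packages)
            throws ClassNotFoundException {
        try {
            return Class.forName(name); // fully qualified names work unchanged
        } catch (ClassNotFoundException e) {
            if (!name.startsWith(alias + ".")) {
                throw e; // not an aliased name, nothing more to try
            }
        }
        String simple = name.substring(alias.length() + 1);
        for (String pkg : packages) {
            try {
                return Class.forName(pkg + "." + simple);
            } catch (ClassNotFoundException ignored) {
                // fall through to the next candidate package
            }
        }
        throw new ClassNotFoundException(name + " not found in " + Arrays.toString(packages));
    }

    public static void main(String[] args) throws Exception {
        // "util" plays the role of the "solr" alias in this stand-in example.
        Class<?> c = resolve("util.ArrayList", "util", "java.io", "java.util");
        System.out.println(c.getName());
    }
}
```

With Solr on the classpath the analogous call would pass "solr" as the alias and the context-specific packages as candidates. The order of the candidate list decides which class wins if two packages contain the same simple name.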
Re: Adding Phonetic Search to Solr
On 11/7/06, Walter Underwood <[EMAIL PROTECTED]> wrote:
> I haven't found fuzzy or phonetic search in Solr, and I have a couple
> of approaches I might try:
>
> 1. Adding fuzzy to the DisMax specs.

What do you envisage the implementation looking like?

> 2. Adding a phonetic token filter and relying on the per-field analyzer
> support.
>
> Option 2 seems like it would be a lot faster in production, and
> probably easier to implement. Does that seem right?

I'm not sure why any modification to solr would be necessary. You could add a field with a phonetic analyzer and use copyField to copy your search fields to it. Search will use the modified analyzer automatically.

-Mike
[jira] Commented: (SOLR-68) Custom ClassLoader for "plugins"
[ http://issues.apache.org/jira/browse/SOLR-68?page=comments#action_12447942 ]

Fuad Efendi commented on SOLR-68:

> ...is when explicitly lookup the class by name, we could make out own
> ClassLoader and use it...

The simplest way to create a Class object is:

    Class fooClass = Class.forName("Foo");

which is equivalent to:

    Class.forName("Foo", true, this.getClass().getClassLoader());

Then,

    Foo fooObject = (Foo) fooClass.newInstance();  // newInstance() returns Object, so a cast is needed

and we don't need an explicit instance of a ClassLoader at all... just put JARs in a classpath... and it works in all containers, because we use 'parent' classloader with inherited permissions instead of touching the 'container-managed' Thread...

Sorry, still trying to understand why we would need that method:

    public static Class findClass(String cname, String... subpackages)

even here, we can loop in [subpackage+"."+cname] using Class.forName()...

> Custom ClassLoader for "plugins"
>
> Key: SOLR-68
> URL: http://issues.apache.org/jira/browse/SOLR-68
> Project: Solr
> Issue Type: New Feature
> Reporter: Hoss Man
> Attachments: classloader.patch
Adding Phonetic Search to Solr
I haven't found fuzzy or phonetic search in Solr, and I have a couple of approaches I might try: 1. Adding fuzzy to the DisMax specs. 2. Adding a phonetic token filter and relying on the per-field analyzer support. Option 2 seems like it would be a lot faster in production, and probably easier to implement. Does that seem right? How do I specify the new token filter factory in the schema file? I don't quite get the mapping from solr.FooFilterFactory to org.apache.solr.analysis.FooFilterFactory. wunder -- Walter Underwood Search Guru, Netflix
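As a rough illustration of what option 2 would do to each token at index and query time, here is a minimal American Soundex encoder. It is a sketch only: the class name SoundexSketch is hypothetical, it is not wired into any Solr or Lucene analysis API, and a real phonetic filter might well use a different algorithm (Metaphone, for example).

```java
/** Hypothetical stand-alone Soundex encoder; not part of Solr. */
class SoundexSketch {

    // Soundex digit for each letter A..Z; '0' marks vowels and other uncoded letters.
    private static final String CODES = "01230120022455012623010202";

    static String encode(String word) {
        StringBuilder out = new StringBuilder();
        char lastCode = '0';
        for (int i = 0; i < word.length() && out.length() < 4; i++) {
            char c = Character.toUpperCase(word.charAt(i));
            if (c < 'A' || c > 'Z') {
                continue; // skip anything that is not a basic Latin letter
            }
            char code = CODES.charAt(c - 'A');
            if (out.length() == 0) {
                out.append(c);      // the first letter is kept as-is
                lastCode = code;    // but its code still suppresses an adjacent duplicate
            } else if (c == 'H' || c == 'W') {
                // H and W are transparent: they do not separate equal codes
            } else if (code == '0') {
                lastCode = '0';     // a vowel separates equal codes
            } else {
                if (code != lastCode) {
                    out.append(code);
                }
                lastCode = code;
            }
        }
        if (out.length() == 0) {
            return "";
        }
        while (out.length() < 4) {
            out.append('0');        // pad to the usual four characters
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("Robert") + " " + encode("Rupert"));
    }
}
```

Because "Robert" and "Rupert" encode to the same key, a field analyzed this way would match either spelling with an ordinary term query, which is why this approach should be cheap at query time compared with fuzzy matching.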
[jira] Commented: (SOLR-68) Custom ClassLoader for "plugins"
[ http://issues.apache.org/jira/browse/SOLR-68?page=comments#action_12447932 ] Fuad Efendi commented on SOLR-68: - It was done in Eclipse, for instance. The Nutch project also has a huge plugin-supporting codebase in which plugins are automatically loaded and 'wired' together without explicit XML definitions; they did it the Eclipse way. I mean, if you use explicit XML like - you don't need to design something like Eclipse or Nutch. It is a lot of headache just to replace strings like "org.apache.nutch.parse.html.HtmlParser" with plugin.includes protocol-http|urlfilter-regex|parse-(text|html|js)|index-basic|query-(basic|site|url)|summary-basic|scoring-opic Here, "parse-(text|html|js)" will look for a jar file with the ID "parse-html" in the plugin.xml file (inside the jar file): ... The main reason for such a design is to avoid explicit names like "org.apache.nutch.parse.html.HtmlParser" in the main configuration XML, and to allow third parties to develop compatible plugins (which could have very different, complicated extension points and implementations with specific settings not defined in the main Solr schema.xml file). I don't think it would work in WebLogic without some explicit settings; and JEE does not easily allow direct access to the file system, especially to load a class 'from a different context' (security considerations).
Re: changes before release?
Unless there are objections, I'll start off by switching to the new license header and then try to search for any files that may be missing it. -Yonik
[jira] Commented: (SOLR-66) bulk data loader
[ http://issues.apache.org/jira/browse/SOLR-66?page=comments#action_12447924 ] Fuad Efendi commented on SOLR-66: - This is probably SOLR-specific (the best? be focused on task?)... We stick to a 4-column format for everything (in case of a surrogate PK we may have [id,"001,003,abxc"]):
id,9885A004,name,Canon PowerShot SD500
id,9885A004,manu,Canon Inc.
id,9885A004,cat,electronics
id,9885A004,cat,camera
id,9885A004,features,"3x zoop, 7.1 megapixel Digital ELPH"
id,9885A004,features,movie clips up to 640x480 @30 fps
id,9885A004,features,"2.0"" TFT LCD, 118,000 pixels"
id,9885A004,features,"built in flash, red-eye reduction"
id,9885A004,includes,"32MB SD card, USB cable, AV cable, battery"
id,9885A004,weight,6.4
id,9885A004,price,329.95
id,9885A004,popularity,7
id,9885A004,inStock,true
[jira] Commented: (SOLR-66) bulk data loader
[ http://issues.apache.org/jira/browse/SOLR-66?page=comments#action_12447923 ] Fuad Efendi commented on SOLR-66: - Another sample... 9885A004 Canon PowerShot SD500 Canon Inc. electronics camera 3x zoop, 7.1 megapixel Digital ELPH movie clips up to 640x480 @30 fps 2.0" TFT LCD, 118,000 pixels built in flash, red-eye reduction 32MB SD card, USB cable, AV cable, battery 6.4 329.95 7 true
Cartesian product of facets (can I say that?):
9885A004, Canon PowerShot SD500, Canon Inc., electronics, 3x zoop, 7.1 megapixel Digital ELPH, "32MB SD card, USB cable, AV cable, battery", 6.4, 329.95, 7, true
9885A004, Canon PowerShot SD500, Canon Inc., electronics, 3x zoop, movie clips up to 640x480 @30 fps, "32MB SD card, USB cable, AV cable, battery", 6.4, 329.95, 7, true
9885A004, Canon PowerShot SD500, Canon Inc., electronics, 3x zoop, 2.0" TFT LCD, 118,000 pixels, "32MB SD card, USB cable, AV cable, battery", 6.4, 329.95, 7, true
9885A004, Canon PowerShot SD500, Canon Inc., electronics, 3x zoop, built in flash, red-eye reduction, "32MB SD card, USB cable, AV cable, battery", 6.4, 329.95, 7, true
9885A004, Canon PowerShot SD500, Canon Inc., camera, 3x zoop, 7.1 megapixel Digital ELPH, "32MB SD card, USB cable, AV cable, battery", 6.4, 329.95, 7, true
9885A004, Canon PowerShot SD500, Canon Inc., camera, 3x zoop, movie clips up to 640x480 @30 fps, "32MB SD card, USB cable, AV cable, battery", 6.4, 329.95, 7, true
9885A004, Canon PowerShot SD500, Canon Inc., camera, 3x zoop, 2.0" TFT LCD, 118,000 pixels, "32MB SD card, USB cable, AV cable, battery", 6.4, 329.95, 7, true
9885A004, Canon PowerShot SD500, Canon Inc., camera, 3x zoop, built in flash, red-eye reduction, "32MB SD card, USB cable, AV cable, battery", 6.4, 329.95, 7, true
Optimized CSV (just for improved network traffic for areas without DSL!!!):
9885A004, Canon PowerShot SD500, Canon Inc., electronics, "3x zoop, 7.1 megapixel Digital ELPH", "32MB SD card, USB cable, AV cable, battery", 6.4, 329.95, 7, true
9885A004, , , , "movie clips up to 640x480 @30 fps", , , , ,
9885A004, , , , "3x zoop, 2.0" TFT LCD, 118,000 pixels", , , , ,
9885A004, , , , "built in flash, red-eye reduction", , , , ,
9885A004, , , camera, "3x zoop, 7.1 megapixel Digital ELPH",
9885A004, , , , "movie clips up to 640x480 @30 fps", , , , ,
9885A004, , , , "3x zoop, 2.0" TFT LCD, 118,000 pixels", , , , ,
9885A004, , , , "built in flash, red-eye reduction", , , , ,
Almost EDI... XML looks much better... Maybe a specific GZIP version for a standard "Cartesian" CSV?
[jira] Commented: (SOLR-66) bulk data loader
[ http://issues.apache.org/jira/browse/SOLR-66?page=comments#action_12447919 ] Fuad Efendi commented on SOLR-66: - mistake... (Paper, instead of IBM):
001, Paper, 001, 17R7021, 14 7/8 X 8 1/2" - 1/2" Greenbar
001, Paper, 002, 17R8018, 8 1/2 x 11" Micro Perf @ 3 2/3"
...
optimized:
001, Paper, 001, 17R7021, 14 7/8 X 8 1/2" - 1/2" Greenbar
001, , 002, 17R8018, 8 1/2 x 11" Micro Perf @ 3 2/3"
...
[jira] Commented: (SOLR-66) bulk data loader
[ http://issues.apache.org/jira/browse/SOLR-66?page=comments#action_12447918 ] Fuad Efendi commented on SOLR-66: - CSV: should we support standard CSVs generated by Excel, Oracle DataPump, etc.? XML: we currently preprocess some data to create XML, then we post it to SOLR. Can we preprocess standard CSV? For instance, we have two tables: CATEGORY (parent) and PRODUCT (child). CSV produced by Oracle might look like:
001,IBM,001,17R7021,14 7/8 X 8 1/2" - 1/2" Greenbar
001,IBM,002,17R8018,8 1/2 x 11" Micro Perf @ 3 2/3"
Here, [001,Paper] is a single record from the CATEGORY table, and the rest is PK, SKU, NAME from the PRODUCT table.
1. Use 'extended' CSV such as
001,Paper,multi-value:"001,17R7021,14 7/8 X 8 1/2"" - 1/2"" Greenbar002,17R8018,8 1/2 x 11"" Micro Perf @ 3 2/3"""
(multi-value:",...") - very difficult, and not compatible with exported data.
2. Standard CSV with fixed width + preprocessing (sorting, and removing repeated values):
001,Paper,001,17R7021,14 7/8 X 8 1/2" - 1/2" Greenbar
001,,002,17R8018,8 1/2 x 11" Micro Perf @ 3 2/3"
We removed the repeated value 'Paper', but we left the primary key of this category intact. It should work with both the standard 'large' CSV and the preprocessed one. And we don't have a huge single line in the case of IBM producing different kinds of paper; we have multiple lines with fixed width. The first column (the repeated 001 value) is the primary key, same as 001
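Reading the preprocessed form from option 2 back into full records amounts to a "fill down" over rows that share the same primary key. The sketch below shows one way that could work, under stated assumptions: rows are already split into cells, the key is always column 0, and a blank cell always means "same as the previous row". The class name FillDown is hypothetical and nothing here is part of Solr.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that restores values removed as "repeated" during preprocessing. */
class FillDown {

    static List<String[]> expand(List<String[]> rows) {
        List<String[]> out = new ArrayList<String[]>();
        String[] prev = null;
        for (String[] row : rows) {
            String[] full = row.clone();
            // Only inherit from the previous row when it has the same width and the same key.
            boolean sameKey = prev != null
                    && prev.length == full.length
                    && prev[0].equals(full[0]);
            if (sameKey) {
                for (int i = 1; i < full.length; i++) {
                    if (full[i].trim().length() == 0) {
                        full[i] = prev[i]; // blank cell: reuse the value from the row above
                    }
                }
            }
            out.add(full);
            prev = full;
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<String[]>();
        rows.add(new String[] {"001", "Paper", "001", "17R7021"});
        rows.add(new String[] {"001", "", "002", "17R8018"});
        for (String[] r : expand(rows)) {
            System.out.println(String.join(",", r));
        }
    }
}
```

One limitation of this convention is worth noting: a value that is legitimately empty cannot be told apart from one that was removed as a repeat, so a loader built this way would need some other marker for real empty values.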
Re: [jira] Created: (SOLR-68) Custom ClassLoader for "plugins"
: Probably. Unless you previously set your custom loader to be the : thread's context class loader. If you do that, you'd have to ensure : that your loader is aware of the context classpath (which could be : accomplished by making it a child of the old context class loader). : Otherwise, you might break other code which also uses this loader. I'm already making sure the new loader uses Thread.currentThread().getContextClassLoader() as its parent ... but it never occurred to me that we could call Thread.setContextClassLoader ... even then it gets confusing, because there will probably be a lot more threads using the class than the one that loaded the configs (and the plugin classes) ... but i suppose we could call Thread.currentThread().setContextClassLoader in SolrCore.execute ... or even SolrCore.getSolrCore ... ...but again, i'm not sure if that would be an abomination or not. -Hoss
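The two ideas in this exchange (a plugin loader whose parent is the current context class loader, and temporarily installing it as the thread's context class loader) can be sketched as below. This is only an illustration of the pattern, not what classloader.patch or SolrCore does: the class ContextLoaderSwap and its method names are hypothetical.

```java
import java.net.URL;
import java.net.URLClassLoader;

/** Hypothetical sketch of a child loader plus a scoped context-class-loader swap. */
class ContextLoaderSwap {

    /** Builds a loader over the given JAR URLs, delegating first to the current context loader. */
    static URLClassLoader newPluginLoader(URL[] jars) {
        ClassLoader parent = Thread.currentThread().getContextClassLoader();
        return new URLClassLoader(jars, parent);
    }

    /** Runs the task with the given loader as context class loader, then restores the old one. */
    static void runWith(ClassLoader loader, Runnable task) {
        Thread current = Thread.currentThread();
        ClassLoader previous = current.getContextClassLoader();
        current.setContextClassLoader(loader);
        try {
            task.run();
        } finally {
            // Always restore, so container code running later on this thread is unaffected.
            current.setContextClassLoader(previous);
        }
    }

    public static void main(String[] args) {
        ClassLoader plugins = newPluginLoader(new URL[0]);
        runWith(plugins, new Runnable() {
            public void run() {
                System.out.println(Thread.currentThread().getContextClassLoader());
            }
        });
    }
}
```

The swap is per thread, which is the source of the confusion mentioned above: every request thread that ends up running plugin code would need the same treatment, not just the thread that parsed the config.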
[jira] Commented: (SOLR-66) bulk data loader
[ http://issues.apache.org/jira/browse/SOLR-66?page=comments#action_12447911 ] Fuad Efendi commented on SOLR-66: - Sorry for not correctly understanding the multipart HTTP POST / File Upload issue, it's not easy, I just browsed sources of org.springframework.web.multipart.support (although it's very easy with Spring...) > bulk data loader > > > Key: SOLR-66 > URL: http://issues.apache.org/jira/browse/SOLR-66 > Project: Solr > Issue Type: New Feature >Reporter: Yonik Seeley > Assigned To: Yonik Seeley > > A way to efficiently load simple formatted text files, including CSV files. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: [jira] Created: (SOLR-67) query interface with faceted browsing, highligting
: It would be nice to have a nice HTML page allowing the user to query and display
: 1) some faceting info such as hit counts with links that allowed the user to narrow their search results.
: 2) highlighted summaries
: 3) easy way to query the dismax handler as well as the standard request handler
: Most likely this would be built into the admin pages (and have access to all the field info).
as a shorter term solution to a programmatically built page that was aware of the schema and the registered handlers, we could just enhance form.jsp to include the various facet and dismax params as flat HTML, and to use the XSLTResponseWriter to generate results with hyperlinks for filtering/expanding the results. we might even want to solicit volunteers from solr-users, as there seem to be quite a few users who aren't familiar with java but would like to help contribute to the project ... but i don't know if it's bad form to go asking for volunteers on the user list. -Hoss
[jira] Commented: (SOLR-67) query interface with faceted browsing, highligting
[ http://issues.apache.org/jira/browse/SOLR-67?page=comments#action_12447907 ] Hoss Man commented on SOLR-67: -- See also: http://wiki.apache.org/solr/MakeSolrMoreSelfService for an extensive list of thoughts on making the admin screens more driven by the schema/solrconfig.
Re: [jira] Created: (SOLR-68) Custom ClassLoader for "plugins"
On Nov 7, 2006, at 2:10 PM, Hoss Man (JIRA) wrote: I'm also not sure if it works in all cases: more testing of various Containers would be good, as well as testing more complex situations (ie: what if a class explicitly named as a plugin and loaded by this new classloader then uses reflection to load another class from the same Jar using Thread.currentThread().getContextClassLoader() ... will that fail?)
Probably. Unless you previously set your custom loader to be the thread's context class loader. If you do that, you'd have to ensure that your loader is aware of the context classpath (which could be accomplished by making it a child of the old context class loader). Otherwise, you might break other code which also uses this loader. -MB
Re: [jira] Commented: (SOLR-66) bulk data loader
On 11/7/06 11:22 AM, "Yonik Seeley (JIRA)" <[EMAIL PROTECTED]> wrote: > Yes, posting queries work because it's all form-data (query args). > But, what if we want to post a complete file, *and* some extra info/parameters > about how that file should be handled? One approach is the Atom Publishing Protocol. That is pretty clear about content and metainformation. It isn't designed to solve every problem, but it handles a broad range of publishing, so it could be a good fit for many uses of Solr. APP is nearly finished. The latest draft is here (second URL also has HTML versions). http://www.ietf.org/internet-drafts/draft-ietf-atompub-protocol-11.txt http://tools.ietf.org/wg/atompub/draft-ietf-atompub-protocol/ wunder -- Walter Underwood Search Guru, Netflix
[jira] Commented: (SOLR-66) bulk data loader
[ http://issues.apache.org/jira/browse/SOLR-66?page=comments#action_12447901 ] Yonik Seeley commented on SOLR-66: --
> How to encode 'comma'?
For standard CSV, you could quote the entire field value... "a,b" I don't know if Commons CSV supports backslash escaping or not, but that would be another way.
> How to encode UTF-8?
Two ways... the user can define a charset for the file (and the file could actually be UTF-8), and we can support unicode escapes \u1234
> Should we use Base64 and encode raw values?
I hadn't thought about binary fields (they aren't even supported in the XML update yet). Doing Base64 would seem relatively easy though.
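The "quote the entire field value" rule can be made concrete with a small hand-written splitter for a single CSV line. This is a sketch under assumptions, not Commons CSV and not anything committed to Solr: the class name CsvLine is hypothetical, it handles only commas, double-quoted fields and doubled quotes inside them, and it ignores backslash escapes, \u escapes and quoted fields that span lines.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical single-line CSV splitter: quoted fields and doubled quotes only. */
class CsvLine {

    static List<String> parse(String line) {
        List<String> fields = new ArrayList<String>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        cur.append('"'); // doubled quote inside a quoted field is a literal quote
                        i++;
                    } else {
                        inQuotes = false; // closing quote
                    }
                } else {
                    cur.append(c);        // commas are ordinary characters while quoted
                }
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == ',') {
                fields.add(cur.toString());
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        fields.add(cur.toString());       // last field has no trailing comma
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(parse("id,\"a,b\",\"2.0\"\" TFT LCD, 118,000 pixels\""));
    }
}
```

With this rule a comma never needs its own escape character; only the quote character does, and it is escaped by doubling, which matches the quoted sample rows posted elsewhere in this thread.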
[jira] Commented: (SOLR-66) bulk data loader
[ http://issues.apache.org/jira/browse/SOLR-66?page=comments#action_12447900 ] Hoss Man commented on SOLR-66: -- Fuad: the issue isn't really whether POSTed *queries* work ... those have been tested and are known to work ... it's more a question of POSTed *updates* ... the current update mechanism does not use "application/x-www-form-urlencoded"; instead the raw POST body is read as an XML message containing docs to index. This issue is attempting to address a more convenient method to bulk import records, possibly using CSV, and probably using a local file -- but we'd want to support a POSTed file as well, so there was some discussion (on list) of how to both POST a file and send query params (using either "application/x-www-form-urlencoded" or the mechanism we currently use)
[jira] Commented: (SOLR-66) bulk data loader
[ http://issues.apache.org/jira/browse/SOLR-66?page=comments#action_12447898 ] Yonik Seeley commented on SOLR-66: -- > Existing SOLR should work with POST HTML forms without any change in Java... Yes, posting queries work because it's all form-data (query args). But, what if we want to post a complete file, *and* some extra info/parameters about how that file should be handled? > bulk data loader > > > Key: SOLR-66 > URL: http://issues.apache.org/jira/browse/SOLR-66 > Project: Solr > Issue Type: New Feature >Reporter: Yonik Seeley > Assigned To: Yonik Seeley > > A way to efficiently load simple formatted text files, including CSV files. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: incubator report due
: http://wiki.apache.org/incubator/November2006 +1 -Hoss
[jira] Commented: (SOLR-66) bulk data loader
[ http://issues.apache.org/jira/browse/SOLR-66?page=comments#action_12447894 ] Fuad Efendi commented on SOLR-66: - /sorry for not having access to E-mail and using POST temporarily.../ HTTP-POST: should work without any code changes. In /resources/admin/index.jsp, Simply replace GET to POST, and everything should work ... You have following in org.apache.solr.servlet.SolrServlet: public void doPost(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException { doGet(request,response); } And, you are using standard Servlet API to retrieve ServletRequest parameters, http://java.sun.com/j2ee/1.4/docs/api/javax/servlet/ServletRequest.html#getParameterMap() public class ServletSolrParams extends MultiMapSolrParams { public ServletSolrParams(ServletRequest req) { super(req.getParameterMap()); } Existing SOLR should work with POST HTML forms without any change in Java... > bulk data loader > > > Key: SOLR-66 > URL: http://issues.apache.org/jira/browse/SOLR-66 > Project: Solr > Issue Type: New Feature >Reporter: Yonik Seeley > Assigned To: Yonik Seeley > > A way to efficiently load simple formatted text files, including CSV files. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Updated: (SOLR-68) Custom ClassLoader for "plugins"
[ http://issues.apache.org/jira/browse/SOLR-68?page=all ] Hoss Man updated SOLR-68: - Attachment: classloader.patch patch that creates a new classloader containing any JARs found in ${solr.home}/lib and uses that class loader any time a class name is read from a config file.
[jira] Created: (SOLR-68) Custom ClassLoader for "plugins"
Custom ClassLoader for "plugins" Key: SOLR-68 URL: http://issues.apache.org/jira/browse/SOLR-68 Project: Solr Issue Type: New Feature Reporter: Hoss Man
After beating my head against my desk for a few hours yesterday trying to document how to load custom plugins (ie: Analyzers, RequestHandlers, Similarities, etc...) in the various Servlet Containers -- only to discover that it is apparently impossible unless you use Resin -- it occurred to me in the wee hours of last night that since the only time we ever need to load "pluggable" classes is when we explicitly look up the class by name, we could make our own ClassLoader and use it ... so i whipped together a little patch to Config.java that would load JARs out of ${solr.home}/lib and was seriously surprised to discover that it seemed to work.
In the cold light of day, I am again surprised that I still think this might be a good idea, but i'm not very familiar with ClassLoader semantics, so i'm not sure if what i've done is an abomination or not -- or if the idea is sound, but the implementation is crap.
I'm also not sure if it works in all cases: more testing of various Containers would be good, as well as testing more complex situations (ie: what if a class explicitly named as a plugin and loaded by this new classloader then uses reflection to load another class from the same Jar using Thread.currentThread().getContextClassLoader() ... will that fail?)
So far I've done quick and dirty testing with my apachecon JAR under apache-tomcat-5.5.17, the jetty start.jar we use for the example, resin-3.0.21 and jettyplus-5.1.11 -- all of which seemed to work fine except for jettyplus-5.1.11 -- but that may have been because of some other configuration problem I had.
-- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Created: (SOLR-67) query interface with faceted browsing, highligting
query interface with faceted browsing, highligting -- Key: SOLR-67 URL: http://issues.apache.org/jira/browse/SOLR-67 Project: Solr Issue Type: Wish Components: web gui Reporter: Yonik Seeley
It would be nice to have a nice HTML page allowing the user to query and display
1) some faceting info such as hit counts with links that allowed the user to narrow their search results.
2) highlighted summaries
3) easy way to query the dismax handler as well as the standard request handler
Most likely this would be built into the admin pages (and have access to all the field info). It would also seem useful to have an "advanced query page"... something like http://www.nabble.com/forum/AdvSearch.jtp or http://www.google.com/advanced_search?hl=en that would allow one to easily customize and drop into their site. This might be best as a contrib module run outside of Solr (a JSP, etc?)
-- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: incubator report due
Looks good to me. +1 Yoav On 11/7/06, Yonik Seeley <[EMAIL PROTECTED]> wrote: Report to the Incubator is due by this Friday. I've put together a preliminary report already. http://wiki.apache.org/incubator/November2006 -Yonik
Re: [jira] Commented: (SOLR-66) bulk data loader
: Any ideas on what the interface should look like? /solr/upload/xml?xsl=foo.xsl : : Seems to run into some of the same questions... how should we allow : POST and specify params about the post at the same time? alternately, the API could just require that the XML file be POSTed, and we could extract the stylesheet URL from the processing instruction, and load it from a remote URL. : Can we do something more efficient than create XML just to parse it : again... like use an API or an intermediate format? As I understand the javax.xml.transform.* packages, both the Transform source and result can be DOM structures, IO Streams that XML is read from or written to, or SAX Handler hooks. So we could transform from an arbitrary File to the "standard" DOM structure that we walk, or to a set of SAX Handlers ... but either way we would be duplicating code that is already implemented in XPP. -Hoss
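The "transform straight into a DOM that we walk" option described above could look roughly like this with the JDK's javax.xml.transform classes. It is a sketch only: the class name XslToDom is hypothetical, the input document and stylesheet in main are made up for illustration, and nothing here reflects how Solr's update handler actually parses its input.

```java
import java.io.StringReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;

/** Hypothetical sketch: apply a stylesheet to arbitrary XML and get a DOM tree back. */
class XslToDom {

    static Document transform(String xml, String xsl) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            // With no node supplied, the transformer builds a new Document for the result,
            // so the output is never serialized to text and re-parsed.
            DOMResult result = new DOMResult();
            t.transform(new StreamSource(new StringReader(xml)), result);
            return (Document) result.getNode();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSL transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // Made-up input and stylesheet, mapping <book><title> to an <add><doc><field> structure.
        String xml = "<book><title>Hamlet</title></book>";
        String xsl = "<xsl:stylesheet version=\"1.0\""
                + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
                + "<xsl:template match=\"/book\">"
                + "<add><doc><field name=\"title\"><xsl:value-of select=\"title\"/></field></doc></add>"
                + "</xsl:template></xsl:stylesheet>";
        Document doc = transform(xml, xsl);
        System.out.println(doc.getElementsByTagName("field").item(0).getTextContent());
    }
}
```

Swapping DOMResult for a SAXResult would give the SAX-handler variant mentioned in the message; either way the caller still has to walk the resulting structure itself, which is the duplication of the existing XPP code that the message points out.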
incubator report due
Report to the Incubator is due by this Friday. I've put together a preliminary report already. http://wiki.apache.org/incubator/November2006 -Yonik
[jira] Commented: (SOLR-66) bulk data loader
[ http://issues.apache.org/jira/browse/SOLR-66?page=comments#action_12447877 ] Fuad Efendi commented on SOLR-66: - Encoding: How to encode 'comma'? How to encode UTF-8? Should we use Base64 and encode raw values? http://rfc.net/rfc4180.html: "Common usage of CSV is US-ASCII, but other character sets defined by IANA for the "text" tree may be used in conjunction with the "charset" parameter. http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm http://www.edoceo.com/utilis/csv-file-format.php http://www.ricebridge.com/products/csvman/reference.htm This is interesting (from last link): FIELD:[trim]? ( UNQUOTED | QUOTED ) [trim]? UNQUOTED: ( [data]* | ESCAPE )*; QUOTED: [quote] ( DOUBLE | ESCAPE | [data]* )* [quote] > bulk data loader > > > Key: SOLR-66 > URL: http://issues.apache.org/jira/browse/SOLR-66 > Project: Solr > Issue Type: New Feature >Reporter: Yonik Seeley > Assigned To: Yonik Seeley > > A way to efficiently load simple formatted text files, including CSV files. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
acts-as-solr (Ruby on Rails)
Erik, it looks as if someone ran with your acts_as_solr implementation http://acts-as-solr.rubyforge.org/ Is it a proper superset of what you had, and should it be considered the latest definitive version of acts_as_solr? We should probably add a link from http://wiki.apache.org/solr/SolRuby -Yonik
[jira] Resolved: (SOLR-62) scripts don't check return code
[ http://issues.apache.org/jira/browse/SOLR-62?page=all ] Bill Au resolved SOLR-62. - Resolution: Fixed patch committed. > scripts don't check return code > --- > > Key: SOLR-62 > URL: http://issues.apache.org/jira/browse/SOLR-62 > Project: Solr > Issue Type: Bug > Components: replication >Reporter: Yonik Seeley >Priority: Minor > Attachments: solr-scripts-solr-62.patch > > > Solr scripts that post commands to solr don't check the return code. > The scripts (like optimize) currently follow this pattern: > rs=`curl http://localhost:5051/update -s -d ""` > if [[ $? != 0 ]] > then > [...] > fi > # check status of optimize request > rc=`echo $rs|cut -f2 -d'"'` > if [[ $? != 0 ]] > then > [...] > $rc is never checked. In addition, the line that grabs rc appears pretty > fragile by depending on an exact field column. Unless we have a simple > command > line XML parser, how about checking for the return code this way: > echo $rs | grep ' /dev/null 2>&1 > if [[ $? != 0 ]] > then > [...] -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: Re: svn commit: r471866 - in /incubator/solr/trunk: site/features.html site/features.pdf src/site/src/documentation/content/xdocs/features.xml
On 11/6/06, Yonik Seeley <[EMAIL PROTECTED]> wrote: > On 11/6/06, Yonik Seeley <[EMAIL PROTECTED]> wrote: > > > > Looks like we should talk to infra but what should the perms be changed to? > > What were the perms before the move... apcvs? OK, sit tight. All Solr committers are going to be added to the incubator group (infra preference). It's done. I sync'd the website, and the automatic javadoc update should now succeed after the next nightly build. -Yonik
Re: [jira] Commented: (SOLR-66) bulk data loader
On 11/6/06, Erik Hatcher (JIRA) <[EMAIL PROTECTED]> wrote:
> What about having an XSL transformation on the input to Solr as well? This would allow someone to POST in XML documents of any variety, but an XSL would turn it into the field definitions. This would certainly increase the appeal of Solr in my (library) domain - a standard TEI -> Solr stylesheet would allow folks to POST into Solr without doing much on the client end at all.

Sounds like a good idea... Solr is all about doing things that people would otherwise have to do themselves.

Any ideas on what the interface should look like?
/solr/upload/xml?xsl=foo.xsl
Seems to run into some of the same questions... how should we allow POST and specify params about the post at the same time?

Can we do something more efficient than create XML just to parse it again... like use an API or an intermediate format?

-Yonik
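One possible answer to the POST-plus-parameters question is to keep the parameters in the query string and send the document as the request body. A sketch with Python's standard library: the host and port are placeholders, and the /solr/upload/xml path with its xsl parameter is the hypothetical interface floated in this message, not an existing endpoint. The request is only built here, never sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_upload_request(base_url, params, xml_body):
    """Build a POST whose parameters ride in the query string."""
    url = base_url + "?" + urlencode(params)
    return Request(
        url,
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


# Hypothetical endpoint and stylesheet name, taken from the proposal above.
req = build_upload_request(
    "http://localhost:8983/solr/upload/xml",
    {"xsl": "foo.xsl"},
    "<TEI><text>example</text></TEI>",
)
print(req.get_method(), req.full_url)
```

Keeping the parameters out of the body means the server can read them before it starts parsing the posted document, which is one way to choose the stylesheet up front.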
[jira] Commented: (SOLR-59) Copy request parameters to Solr's response
[ http://issues.apache.org/jira/browse/SOLR-59?page=comments#action_12447713 ]

Bertrand Delacretaz commented on SOLR-59:

Ah ok, as the JSON format hasn't changed, I guess it just needs recreating this in the XMLWriter:

0 1234

when version < 2.2.

I probably won't have time to do that today or tomorrow, but of course feel free to "patch my patch" if you want.

> Copy request parameters to Solr's response
>
> Key: SOLR-59
> URL: http://issues.apache.org/jira/browse/SOLR-59
> Project: Solr
> Issue Type: Improvement
> Reporter: Bertrand Delacretaz
> Attachments: SOLR-59-20061024.patch, SOLR-59-20061102.patch, SOLR-59-20061103.patch, SOLR-59-20061106-newfiles.tar.gz, SOLR-59-20061106.patch, SOLR-59-new-files-20061102.tar.gz
>
> This patch copies the request parameters (explicit ones only, not the defaults) to Solr's XML output.
> It is not configurable yet; it is enabled by default and adds a "queryParameters" list to the responseHeader:
>
> 0
> 1
>
>
> red
> blue
>
> 10
> 0
> on
> solr
>
> 2.1
>
>
> The above example includes a multi-valued parameter, "multi".
> This might still change a bit, but if someone wants to play with it or improve it, here you go.
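The distinction between explicit and default parameters, and the handling of a multi-valued parameter such as "multi", can be sketched as follows. This is an illustration in Python, not the patch's Java code; the parameter names other than "multi", the defaults, and the query string are all assumptions.

```python
from urllib.parse import parse_qs

# Hypothetical handler defaults; these are not echoed unless sent explicitly.
DEFAULTS = {"rows": ["10"], "start": ["0"], "version": ["2.1"]}


def explicit_parameters(query_string):
    """Return only the parameters present in the request, keeping every value."""
    # parse_qs maps each name to a list, so repeated names stay multi-valued.
    return parse_qs(query_string)


def effective_parameters(query_string):
    """Defaults overlaid with the explicit values, for comparison."""
    merged = dict(DEFAULTS)
    merged.update(explicit_parameters(query_string))
    return merged


# Hypothetical request: "multi" appears twice, "start" is left to its default.
qs = "q=solr&multi=red&multi=blue&rows=10"
print(explicit_parameters(qs))
```

Here "start" is absent from the explicit set even though it has an effective value, which mirrors the "explicit ones only, not the defaults" behaviour described in the issue.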