Re: Solr Performance and Scalability
http://people.apache.org/~hossman/#solr-dev I would look at the HathiTrust project analysis: http://wiki.apache.org/solr/SolrPerformanceData Your question is pretty broad, but I don't see any reason why Solr wouldn't work for your problem assuming the project is appropriately resourced! Eric On Feb 11, 2010, at 12:18 PM, Wick2804 wrote: We are thinking of creating a Lucene Solr project to store 50million full text OCRed A4 pages. Is there anyone out there who could provide some kind of guidance on the size of index we are likely to generate, and are there any gotchas in the standard analysis engines for load and query that will cause us issues. Do large indexes cause memory issues on servers? Any help or advice greatly appreciated. -- View this message in context: http://old.nabble.com/Solr-Performance-and-Scalability-tp27552013p27552013.html Sent from the Solr - Dev mailing list archive at Nabble.com. - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Co-Author: Solr 1.4 Enterprise Search Server available from http://www.packtpub.com/solr-1-4-enterprise-search-server Free/Busy: http://tinyurl.com/eric-cal
[jira] Updated: (SOLR-773) Incorporate Local Lucene/Solr
[ https://issues.apache.org/jira/browse/SOLR-773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Pugh updated SOLR-773: --- Attachment: screenshot-1.jpg Idea of fuzzy borders drawing. Incorporate Local Lucene/Solr - Key: SOLR-773 URL: https://issues.apache.org/jira/browse/SOLR-773 Project: Solr Issue Type: New Feature Reporter: Grant Ingersoll Assignee: Grant Ingersoll Priority: Minor Fix For: 1.5 Attachments: exampleSpatial.zip, lucene-spatial-2.9-dev.jar, lucene.tar.gz, screenshot-1.jpg, SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, SOLR-773-spatial_solr.patch, SOLR-773.patch, SOLR-773.patch, solrGeoQuery.tar, spatial-solr.tar.gz Local Lucene has been donated to the Lucene project. It has some Solr components, but we should evaluate how best to incorporate it into Solr. See http://lucene.markmail.org/message/orzro22sqdj3wows?q=LocalLucene -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Issue Comment Edited: (SOLR-773) Incorporate Local Lucene/Solr
[ https://issues.apache.org/jira/browse/SOLR-773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787655#action_12787655 ] Eric Pugh edited comment on SOLR-773 at 12/8/09 7:00 PM: - Patrick, I tried out your Batteries Included example, and it worked great. One question I have: the scoring process doesn't seem to take into account the distance from a central point. In other words, if I specify a 10 mile radius, and there is a really high scoring match more than 10 miles out, it doesn't get returned. The radius functions as a strict filter of what gets returned. However, I think what we are really trying to do is find the best search results, with distance factored in as well. I was thinking that I could approximate this fuzzy boundary by running a query with radius x, and then the same query with radius x * 2. Then, if any of the documents within x * 2 score much better than those within radius x, include them. Obviously this would be somewhat clunky to do from the client side! A use case I can think of is searching for gas stations within 5 miles of me: if a gas station has really cheap gas and is 6 miles away, include it, but if it is just a penny cheaper, ignore it. I added as a screenshot a drawing of what I was sort of thinking. was (Author: epugh): Idea of fuzzy borders drawing. Incorporate Local Lucene/Solr - Key: SOLR-773 URL: https://issues.apache.org/jira/browse/SOLR-773 Project: Solr Issue Type: New Feature Reporter: Grant Ingersoll Assignee: Grant Ingersoll Priority: Minor Fix For: 1.5 Attachments: exampleSpatial.zip, lucene-spatial-2.9-dev.jar, lucene.tar.gz, screenshot-1.jpg, SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, SOLR-773-spatial_solr.patch, SOLR-773.patch, SOLR-773.patch, solrGeoQuery.tar, spatial-solr.tar.gz Local Lucene has been donated to the Lucene project. It has some Solr components, but we should evaluate how best to incorporate it into Solr. See http://lucene.markmail.org/message/orzro22sqdj3wows?q=LocalLucene
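The fuzzy-boundary idea in the comment above (query at radius x, query again at radius x * 2, and keep outer results only when they score much better) could be sketched on the client side roughly as follows. This is only an illustration under assumptions: the `Hit` record, the `fuzzy_radius_merge` helper, and the `boost_factor` threshold are hypothetical names, not part of Local Lucene or Solr, and the two hit lists are assumed to have been fetched already.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    doc_id: str
    score: float     # relevance score reported by the search engine
    distance: float  # distance from the central point, in miles


def fuzzy_radius_merge(inner_hits, outer_hits, boost_factor=1.5):
    """Merge hits from radius x with standout hits from radius 2x.

    A hit that only appears in the outer result set is kept when its score
    is at least boost_factor times the best inner score. If there are no
    inner hits at all, every outer hit is kept.
    """
    inner_ids = {h.doc_id for h in inner_hits}
    best_inner = max((h.score for h in inner_hits), default=0.0)
    extras = [
        h for h in outer_hits
        if h.doc_id not in inner_ids and h.score >= best_inner * boost_factor
    ]
    # Highest scoring documents first, regardless of which ring they came from.
    return sorted(list(inner_hits) + extras, key=lambda h: h.score, reverse=True)


if __name__ == "__main__":
    inner = [Hit("a", 1.0, 2.0), Hit("b", 0.8, 4.0)]
    outer = inner + [Hit("c", 2.0, 6.0), Hit("d", 1.1, 7.0)]
    for hit in fuzzy_radius_merge(inner, outer):
        print(hit.doc_id, hit.score, hit.distance)
```

In the gas station example, "c" plays the station with really cheap gas just outside the radius, while "d" is only marginally better and is dropped. Doing this on the client costs two queries per search, which is the clunkiness the comment mentions.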
[jira] Commented: (SOLR-1621) Deprecate deployments w/o solr.xml
[ https://issues.apache.org/jira/browse/SOLR-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12785932#action_12785932 ] Eric Pugh commented on SOLR-1621: - Glad to see this bug. In the Solr book we mentioned that this was a very likely thing, and that even if you think you only need 1 core, starting out with a multicore setup is the way to go! Especially when you toss in all the great management features of multiple cores! Deprecate deployments w/o solr.xml -- Key: SOLR-1621 URL: https://issues.apache.org/jira/browse/SOLR-1621 Project: Solr Issue Type: New Feature Affects Versions: 1.5 Reporter: Noble Paul Fix For: 1.5 supporting two different modes of deployments is turning out to be hard. This leads to duplication of code. Moreover there is a lot of confusion on where do we put common configuration. See the mail thread http://markmail.org/message/3m3rqvp2ckausjnf -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: Entity Extraction feature
Also, using various services like OpenCalais can work, assuming you are interested in extracting the types of entities that OpenCalais does! Eric On Nov 18, 2009, at 6:28 AM, Grant Ingersoll wrote: I've used OpenNLP with Lucene/Solr before, if you are looking for a good open source one. It's pretty easy to create a TokenFilter that does the work. On Nov 18, 2009, at 1:25 AM, Pradeep Pujari wrote: Hello all, Does Lucene or Solr have an entity extraction feature? If so, what is the wiki URL? Thanks, Pradeep. -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
Re: [VOTE] Release Solr 1.4.0
A happy non-binding +1! On Nov 2, 2009, at 10:30 AM, Grant Ingersoll wrote: +1 On Oct 30, 2009, at 9:40 AM, Grant Ingersoll wrote: OK, take 3: http://people.apache.org/~gsingers/solr/1.4.0/ On Oct 30, 2009, at 8:10 AM, Grant Ingersoll wrote: Got it. Will upload shortly. On Oct 29, 2009, at 8:33 PM, Yonik Seeley wrote: On Thu, Oct 29, 2009 at 7:36 PM, Yonik Seeley yo...@lucidimagination.com wrote: Lucene 2.9.1 respin 3 vote has started... I'm downloading now and will test + check in. Done. You're up Grant! -Yonik http://www.lucidimagination.com
[jira] Commented: (SOLR-1294) SolrJS/Javascript client fails in IE8!
[ https://issues.apache.org/jira/browse/SOLR-1294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12762246#action_12762246 ] Eric Pugh commented on SOLR-1294: - I have compared Colin's patch against my own, and tested his on IE 7 as well. Can we go ahead and commit Colin's patch file? I've got some other patches for SolrJS that I'd like to submit that would do best built on his patch already being in. SolrJS/Javascript client fails in IE8! -- Key: SOLR-1294 URL: https://issues.apache.org/jira/browse/SOLR-1294 Project: Solr Issue Type: Bug Affects Versions: 1.4 Reporter: Eric Pugh Assignee: Ryan McKinley Fix For: 1.4 Attachments: jscalendar.tar, SOLR-1294-full.patch, SOLR-1294-IE8.patch, SOLR-1294.patch, solrjs-ie8-html-syntax-error.patch SolrJS seems to fail with 'jQuery.solrjs' is null or not an object errors under IE8. I am continuing to test if this occurs in IE 6 and 7 as well. This happens on both the Sample online site at http://solrjs.solrstuff.org/test/reuters/ as well as the /trunk/contrib/javascript library. Seems to be a show stopper from the standpoint of really using this library! 
I have posted a screenshot of the error at http://img.skitch.com/20090717-jejm71u6ghf2dpn3mwrkarigwm.png The error is just a whole bunch of repeated messages in the vein of: Message: 'jQuery.solrjs' is null or not an object Line: 24 Char: 1 Code: 0 URI: file:///C:/dev/projects/lib/solr/contrib/javascript/src/core/QueryItem.js Message: 'jQuery.solrjs' is null or not an object Line: 37 Char: 1 Code: 0 URI: file:///C:/dev/projects/lib/solr/contrib/javascript/src/core/Manager.js Message: 'jQuery.solrjs' is null or not an object Line: 24 Char: 1 Code: 0 URI: file:///C:/dev/projects/lib/solr/contrib/javascript/src/core/AbstractSelectionView.js Message: 'jQuery.solrjs' is null or not an object Line: 27 Char: 1 Code: 0 URI: file:///C:/dev/projects/lib/solr/contrib/javascript/src/core/AbstractWidget.js -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (SOLR-1486) Make getting solrJS running easier.
Make getting solrJS running easier. --- Key: SOLR-1486 URL: https://issues.apache.org/jira/browse/SOLR-1486 Project: Solr Issue Type: Improvement Affects Versions: 1.4 Reporter: Eric Pugh I am attaching a patch for simplifying starting up SolrJS. I found that the indexing process would break on a bad file, so made the indexing Java class a bit more robust. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1486) Make getting solrJS running easier.
[ https://issues.apache.org/jira/browse/SOLR-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Pugh updated SOLR-1486: Attachment: build.xml.patch modification to build.xml to download reuters data. Make getting solrJS running easier. --- Key: SOLR-1486 URL: https://issues.apache.org/jira/browse/SOLR-1486 Project: Solr Issue Type: Improvement Affects Versions: 1.4 Reporter: Eric Pugh Attachments: build.xml.patch I am attaching a patch for simplifying starting up SolrJS. I found that the indexing process would break on a bad file, so made the indexing Java class a bit more robust. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1294) SolrJS/Javascript client fails in IE8!
[ https://issues.apache.org/jira/browse/SOLR-1294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12761697#action_12761697 ] Eric Pugh commented on SOLR-1294: - I would echo Bill's comment of don't let this hold up 1.4. I do have SolrJS working for www.newswise.com/search, however I am struggling with backporting my change. I've shot a day trying to back port the change, and I think I need to wait till my colleague, Michael Herndon, who is the JS Ninja to be back on Monday to sort this out. I will keep plugging on this though. SolrJS/Javascript client fails in IE8! -- Key: SOLR-1294 URL: https://issues.apache.org/jira/browse/SOLR-1294 Project: Solr Issue Type: Bug Affects Versions: 1.4 Reporter: Eric Pugh Assignee: Ryan McKinley Fix For: 1.4 Attachments: SOLR-1294-IE8.patch, SOLR-1294.patch, solrjs-ie8-html-syntax-error.patch SolrJS seems to fail with 'jQuery.solrjs' is null or not an object errors under IE8. I am continuing to test if this occurs in IE 6 and 7 as well. This happens on both the Sample online site at http://solrjs.solrstuff.org/test/reuters/ as well as the /trunk/contrib/javascript library. Seems to be a show stopper from the standpoint of really using this library! 
[jira] Updated: (SOLR-1486) Make getting solrJS running easier.
[ https://issues.apache.org/jira/browse/SOLR-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Pugh updated SOLR-1486: Attachment: README First cut of a README file to go in root of /javascript Make getting solrJS running easier. --- Key: SOLR-1486 URL: https://issues.apache.org/jira/browse/SOLR-1486 Project: Solr Issue Type: Improvement Affects Versions: 1.4 Reporter: Eric Pugh Attachments: build.xml.patch, README I am attaching a patch for simplifying starting up SolrJS. I found that the indexing process would break on a bad file, so made the indexing Java class a bit more robust. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1486) Make getting solrJS running easier.
[ https://issues.apache.org/jira/browse/SOLR-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Pugh updated SOLR-1486: Attachment: ReutersService.java.patch Skip over badly formed files. Make getting solrJS running easier. --- Key: SOLR-1486 URL: https://issues.apache.org/jira/browse/SOLR-1486 Project: Solr Issue Type: Improvement Affects Versions: 1.4 Reporter: Eric Pugh Attachments: build.xml.patch, README, ReutersService.java.patch I am attaching a patch for simplifying starting up SolrJS. I found that the indexing process would break on a bad file, so made the indexing Java class a bit more robust. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
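The "skip over badly formed files" behaviour described for the indexing class can be illustrated with a small sketch. This is not the attached ReutersService.java.patch; it is a hypothetical Python stand-in that assumes the input documents are XML strings keyed by file name, and the `parse_all` helper is an invented name used only for illustration.

```python
import xml.etree.ElementTree as ET


def parse_all(documents):
    """Parse every document, skipping the ones that are not well formed.

    documents maps a file name to its XML text. Returns a tuple of
    (parsed, skipped): parsed maps file names to element trees, skipped
    lists the file names that failed to parse.
    """
    parsed, skipped = {}, []
    for name, text in documents.items():
        try:
            parsed[name] = ET.fromstring(text)
        except ET.ParseError:
            # One bad file should not abort the whole indexing run.
            skipped.append(name)
    return parsed, skipped


if __name__ == "__main__":
    docs = {
        "good.xml": "<doc><title>ok</title></doc>",
        "bad.xml": "<doc><title>broken</doc>",
    }
    parsed, skipped = parse_all(docs)
    print("indexed:", sorted(parsed), "skipped:", skipped)
```

Collecting the skipped names rather than silently dropping them keeps it possible to report which files were ignored at the end of the import.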
[jira] Commented: (SOLR-1486) Make getting solrJS running easier.
[ https://issues.apache.org/jira/browse/SOLR-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12761707#action_12761707 ] Eric Pugh commented on SOLR-1486: - These patch files allow you to start up the Reuters example without using the shell script. Please delete from SVN the ./example/reuters/testdata/download-dataset.sh. Also, please put an svn:ignore on /testdata for *.*. I am assuming that integrating the download process into the ant script is acceptable to work around licensing issues with the Reuters data. Eric Make getting solrJS running easier. --- Key: SOLR-1486 URL: https://issues.apache.org/jira/browse/SOLR-1486 Project: Solr Issue Type: Improvement Affects Versions: 1.4 Reporter: Eric Pugh Attachments: build.xml.patch, README, ReutersService.java.patch I am attaching a patch for simplifying starting up SolrJS. I found that the indexing process would break on a bad file, so made the indexing Java class a bit more robust. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1294) SolrJS/Javascript client fails in IE8!
[ https://issues.apache.org/jira/browse/SOLR-1294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12753792#action_12753792 ] Eric Pugh commented on SOLR-1294: - I need to get a patch created, I had one and lost it, but we have SolrJS properly working with IE6 and IE7, based on the above patches. You can see it at http://www.newswise.com/search SolrJS/Javascript client fails in IE8! -- Key: SOLR-1294 URL: https://issues.apache.org/jira/browse/SOLR-1294 Project: Solr Issue Type: Bug Affects Versions: 1.4 Reporter: Eric Pugh Fix For: 1.4 Attachments: SOLR-1294-IE8.patch, SOLR-1294.patch, solrjs-ie8-html-syntax-error.patch SolrJS seems to fail with 'jQuery.solrjs' is null or not an object errors under IE8. I am continuing to test if this occurs in IE 6 and 7 as well. This happens on both the Sample online site at http://solrjs.solrstuff.org/test/reuters/ as well as the /trunk/contrib/javascript library. Seems to be a show stopper from the standpoint of really using this library! I have posted a screenshot of the error at http://img.skitch.com/20090717-jejm71u6ghf2dpn3mwrkarigwm.png The error is just a whole bunch of repeated messages in the vein of: Message: 'jQuery.solrjs' is null or not an object Line: 24 Char: 1 Code: 0 URI: file:///C:/dev/projects/lib/solr/contrib/javascript/src/core/QueryItem.js Message: 'jQuery.solrjs' is null or not an object Line: 37 Char: 1 Code: 0 URI: file:///C:/dev/projects/lib/solr/contrib/javascript/src/core/Manager.js Message: 'jQuery.solrjs' is null or not an object Line: 24 Char: 1 Code: 0 URI: file:///C:/dev/projects/lib/solr/contrib/javascript/src/core/AbstractSelectionView.js Message: 'jQuery.solrjs' is null or not an object Line: 27 Char: 1 Code: 0 URI: file:///C:/dev/projects/lib/solr/contrib/javascript/src/core/AbstractWidget.js -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: Indexing MySQL db using script
This question should be directed to the solr-user mailing list. However, look at this wiki page: http://wiki.apache.org/solr/DataImportHandler Eric On Sep 8, 2009, at 5:15 PM, kedardes wrote: Hi, I'm trying to find a way to index an external MySQL db so that searches made against the solr engine will bring back results from that db. I already have a script to index a different CMS db (non open source) and have noticed the Data Import Handler mentioned in these forums. Is there a simple example of the steps required to set this up? Thanks. -- View this message in context: http://www.nabble.com/Indexing-MySQL-db-using-script-tp25354356p25354356.html Sent from the Solr - Dev mailing list archive at Nabble.com.
[jira] Commented: (SOLR-1369) Add HSQLDB Jar to example-dih
[ https://issues.apache.org/jira/browse/SOLR-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12746363#action_12746363 ] Eric Pugh commented on SOLR-1369: - I tweaked the docs to point to HSQLDB 1.8. I'll leave the unzip hsqldb.zip and svn add hsqldb/ and svn ci -m 'expanding example to make getting started easier' hsqldb/ to a committer versus attaching a large patch file! Eric Add HSQLDB Jar to example-dih - Key: SOLR-1369 URL: https://issues.apache.org/jira/browse/SOLR-1369 Project: Solr Issue Type: Improvement Components: contrib - DataImportHandler Reporter: Eric Pugh I went back to show someone the Example-DIH and followed the wiki page directions. I then ran into an error because the hsqldb uses 1.8, and the hsqldb.jar I downloaded from hsqldb.org was 1.9. The 1.9 rc shows up above the 1.8 version. I see two approaches: 1) Be clearer on the docs, maybe embed a direct link to http://sourceforge.net/projects/hsqldb/files/hsqldb/hsqldb_1_8_0/hsqldb_1_8_0_10.zip/download. 2) include hsqldb.jar in the example. I am assuming the reason this wasn't done was because of licensing issues?? Also, any real reason to zip the hsqldb database? It's under 20k expanded and adds another step. Figured I'd get the wisdom of the crowds before changing. Eric -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (SOLR-1369) Add HSQLDB Jar to example-dih
Add HSQLDB Jar to example-dih - Key: SOLR-1369 URL: https://issues.apache.org/jira/browse/SOLR-1369 Project: Solr Issue Type: Improvement Reporter: Eric Pugh I went back to show someone the Example-DIH and followed the wiki page directions. I then ran into an error because the hsqldb uses 1.8, and the hsqldb.jar I downloaded from hsqldb.org was 1.9. The 1.9 rc shows up above the 1.8 version. I see two approaches: 1) Be clearer on the docs, maybe embed a direct link to http://sourceforge.net/projects/hsqldb/files/hsqldb/hsqldb_1_8_0/hsqldb_1_8_0_10.zip/download. 2) include hsqldb.jar in the example. I am assuming the reason this wasn't done was because of licensing issues?? Also, any real reason to zip the hsqldb database? It's under 20k expanded and adds another step. Figured I'd get the wisdom of the crowds before changing. Eric -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1294) SolrJS/Javascript client fails in IE8!
[ https://issues.apache.org/jira/browse/SOLR-1294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12743148#action_12743148 ] Eric Pugh commented on SOLR-1294: - I am out of the office 8/14 - 8/17. For urgent issues, please contact Jason Hull at jh...@opensourceconnections.com or phone at (434) 409-8451. SolrJS/Javascript client fails in IE8! -- Key: SOLR-1294 URL: https://issues.apache.org/jira/browse/SOLR-1294 Project: Solr Issue Type: Bug Affects Versions: 1.4 Reporter: Eric Pugh Attachments: SOLR-1294-IE8.patch, SOLR-1294.patch, solrjs-ie8-html-syntax-error.patch SolrJS seems to fail with 'jQuery.solrjs' is null or not an object errors under IE8. I am continuing to test if this occurs in IE 6 and 7 as well. This happens on both the Sample online site at http://solrjs.solrstuff.org/test/reuters/ as well as the /trunk/contrib/javascript library. Seems to be a show stopper from the standpoint of really using this library! I have posted a screenshot of the error at http://img.skitch.com/20090717-jejm71u6ghf2dpn3mwrkarigwm.png The error is just a whole bunch of repeated messages in the vein of: Message: 'jQuery.solrjs' is null or not an object Line: 24 Char: 1 Code: 0 URI: file:///C:/dev/projects/lib/solr/contrib/javascript/src/core/QueryItem.js Message: 'jQuery.solrjs' is null or not an object Line: 37 Char: 1 Code: 0 URI: file:///C:/dev/projects/lib/solr/contrib/javascript/src/core/Manager.js Message: 'jQuery.solrjs' is null or not an object Line: 24 Char: 1 Code: 0 URI: file:///C:/dev/projects/lib/solr/contrib/javascript/src/core/AbstractSelectionView.js Message: 'jQuery.solrjs' is null or not an object Line: 27 Char: 1 Code: 0 URI: file:///C:/dev/projects/lib/solr/contrib/javascript/src/core/AbstractWidget.js -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-868) Prepare solrjs trunk to be integrated into contrib
[ https://issues.apache.org/jira/browse/SOLR-868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742533#action_12742533 ] Eric Pugh commented on SOLR-868: I don't think I will get to it this week, but I want to just take solrjs and point it at our default dataset. Not sure that it will expose *ALL* of the cool widgets the way the reuters data does, but we could also point to the reuters demo, or make the download of the reuters data something we document, and not have it be automatic. More of a "Now you've seen what it looks like with the example data, check this out..." Prepare solrjs trunk to be integrated into contrib -- Key: SOLR-868 URL: https://issues.apache.org/jira/browse/SOLR-868 Project: Solr Issue Type: Task Affects Versions: 1.4 Reporter: Matthias Epheser Assignee: Ryan McKinley Fix For: 1.4 Attachments: javascript_contrib.zip, reutersimporter.jar, SOLR-868-testdata.patch, solrjs.zip This patch includes a zipfile snapshot of current solrjs trunk. The folder structure is applied to standard solr layout. It can be extracted to contrib/javascript. It includes a build.xml: * ant dist - creates a single js file and a jar that holds velocity templates. * ant docs - creates js docs. test in browser: doc/index.html * ant example-init - (depends ant dist on solr root) copies the current build of solr.war and solr-velocity.jar to example/testsolr/.. * ant example-start - starts the testsolr server on port 8983 * ant example-import - imports 3000 test data rows (requires a started testserver) Point your browser to example/testClientside.html, example/testServerSide.html or test/reuters/index.html to see it working.
Doc Question for Solr Cell
I was refreshing my mind on the newly updated parameters on Solr Cell, and noticed that the Configuration section on http://wiki.apache.org/solr/ExtractingRequestHandler is out of date. Before I fixed it, I wanted to confirm that <requestHandler name="/update/extract" class="org.apache.solr.handler.extraction.ExtractingRequestHandler"> <lst name="defaults"> <str name="ext.map.Last-Modified">last_modified</str> <bool name="ext.ignore.und.fl">true</bool> </lst> </requestHandler> Should be changed to map.Last-Modified only, and that the ignore.und.fl capability is now implemented via uprefix: uprefix=prefix - Prefix all fields that are not defined in the schema with the given prefix. This is very useful when combined with dynamic field definitions. Example: uprefix=ignored_ would effectively ignore all unknown fields generated by Tika given the example schema contains <dynamicField name="ignored_*" type="ignored"/> Eric - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal
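To make the uprefix description concrete, here is a minimal sketch of the renaming it describes: fields that are not defined in the schema get the prefix, fields that are defined pass through unchanged. This models only the behaviour quoted above; it is not Solr's implementation, and the `apply_uprefix` helper and the sample field names are hypothetical.

```python
def apply_uprefix(extracted, schema_fields, uprefix):
    """Prefix every extracted field that the schema does not define.

    extracted maps field names (for example, metadata produced by Tika)
    to values; schema_fields is the set of field names the schema defines.
    """
    return {
        (name if name in schema_fields else uprefix + name): value
        for name, value in extracted.items()
    }


if __name__ == "__main__":
    extracted = {"title": "Annual Report", "producer": "Acme PDF Writer"}
    schema_fields = {"title"}
    # With uprefix=ignored_, the unknown "producer" field becomes
    # "ignored_producer", which a dynamic field such as ignored_* can absorb.
    print(apply_uprefix(extracted, schema_fields, "ignored_"))
```

Paired with a dynamic field like the ignored_* definition mentioned above, the prefixed fields are accepted by the schema but effectively discarded.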
Re: Param Naming and Abbreviations
I wonder if we could accept either form for the various param names, whether they are set in solrconfig.xml or passed in via the request? Use either fields or fl? As we add richer functionality, like in the extractor, aren't we going to be passing more and more params? Long names are much easier to read. And those optimizing performance can swap to the short names. On Jul 28, 2009, at 6:50 AM, Noble Paul നോബിള്‍ नोब्ळ् wrote: +1 for names in their actual form as long as they are not very long. In config it is nice to see long names because it enhances readability. But, for request params, short ones are better because that price is paid by each request. Imagine 'facetQuery' instead of 'fq' or 'fields' instead of 'fl'. On Tue, Jul 28, 2009 at 4:06 PM, Grant Ingersoll gsing...@apache.org wrote: OK, color me confused about how naming should be done for params. There clearly seem to be two camps in Solr-land: 1. those who abbreviate params and 2. those who don't. Pick your sides, please! ;-) On SOLR-284 and SOLR-769, I had long names and Yonik changed them to be shorter (uprefix, anyone? Bueller? Yeah, it means unknown prefix). On SOLR-1237, the general feedback is that evt should be event and that newSrchr should be newSearcher or new_searcher or something like that. The SpellCheckComp. tends to be verbose, while faceting tends to be succinct. Thus, I'd like to suggest we lay out some conventions for naming, as I personally am confused. Once we do this, we can wiki it up and then have something to refer others to. -Grnt Ingrsll (aka Grant Ingersoll) -- - Noble Paul | Principal Engineer| AOL | http://aol.com - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal
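The "accept either the long or the short name" proposal could look something like the sketch below. It is purely illustrative: Solr has no such alias table as far as this thread shows, and the `ALIASES` mapping and `normalize_params` helper are hypothetical. The long name "unknownPrefix" is made up here from the remark that uprefix means unknown prefix.

```python
# Hypothetical long-name -> short-name aliases, for illustration only.
ALIASES = {
    "fields": "fl",
    "unknownPrefix": "uprefix",
}


def normalize_params(params, aliases=ALIASES):
    """Rewrite long parameter names to their short form.

    Raises ValueError when the same parameter is supplied under both
    names, since it would be unclear which value should win.
    """
    normalized = {}
    for name, value in params.items():
        short = aliases.get(name, name)
        if short in normalized:
            raise ValueError(f"parameter {short!r} given more than once")
        normalized[short] = value
    return normalized


if __name__ == "__main__":
    print(normalize_params({"q": "solr", "fields": "id,title"}))
```

The collision check reflects the cost raised in the thread: once everything has two names, something has to decide what happens when a request uses both.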
Re: Param Naming and Abbreviations
I think it just comes down to whether you want a richer syntax for naming things or not. I agree, it does add complexity because now everything has two names! But I liken it somewhat to the parameters for a command line app. I can do svn update or svn up. And when I am first learning a new command line app, I tend to use the longer name because it is more English-like. But as I run the same command over and over, I learn the shortcut name. Eric On Jul 28, 2009, at 6:57 AM, Noble Paul നോബിള്‍ नोब्ळ् wrote: Assuming that there are no name collisions, shorter ones are ok for request parameters. Having both options may not really help. On Tue, Jul 28, 2009 at 4:24 PM, Eric Pugh ep...@opensourceconnections.com wrote: I wonder if we could accept either form for the various param names, whether they are set in solrconfig.xml or passed in via the request? Use either fields or fl? As we add richer functionality, like in the extractor, aren't we going to be passing more and more params? Long names are much easier to read. And those optimizing performance can swap to the short names. On Jul 28, 2009, at 6:50 AM, Noble Paul നോബിള്‍ नोब्ळ् wrote: +1 for names in their actual form as long as they are not very long. In config it is nice to see long names because it enhances readability. But, for request params, short ones are better because that price is paid by each request. Imagine 'facetQuery' instead of 'fq' or 'fields' instead of 'fl'. On Tue, Jul 28, 2009 at 4:06 PM, Grant Ingersoll gsing...@apache.org wrote: OK, color me confused about how naming should be done for params. There clearly seem to be two camps in Solr-land: 1. those who abbreviate params and 2. those who don't. Pick your sides, please! ;-) On SOLR-284 and SOLR-769, I had long names and Yonik changed them to be shorter (uprefix, anyone? Bueller? Yeah, it means unknown prefix). On SOLR-1237, the general feedback is that evt should be event and that newSrchr should be newSearcher or new_searcher or something like that. The SpellCheckComp. tends to be verbose, while faceting tends to be succinct. Thus, I'd like to suggest we lay out some conventions for naming, as I personally am confused. Once we do this, we can wiki it up and then have something to refer others to. -Grnt Ingrsll (aka Grant Ingersoll) -- - Noble Paul | Principal Engineer| AOL | http://aol.com - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal
[jira] Commented: (SOLR-868) Prepare solrjs trunk to be integrated into contrib
[ https://issues.apache.org/jira/browse/SOLR-868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12735723#action_12735723 ] Eric Pugh commented on SOLR-868: I don't think so, unless we want to have it deploy into the /examples app and update the related tutorial pages? That would mean we would need to change the data set that the example expects to use from Reuters data to shopping data. Which we can do, and it would raise the visibility...? Should that be another ticket? Prepare solrjs trunk to be integrated into contrib -- Key: SOLR-868 URL: https://issues.apache.org/jira/browse/SOLR-868 Project: Solr Issue Type: Task Affects Versions: 1.4 Reporter: Matthias Epheser Assignee: Ryan McKinley Fix For: 1.4 Attachments: javascript_contrib.zip, reutersimporter.jar, SOLR-868-testdata.patch, solrjs.zip This patch includes a zipfile snapshot of current solrjs trunk. The folder structure is applied to standard solr layout. It can be extracted to contrib/javascript. It includes a build.xml: * ant dist - creates a single js file and a jar that holds velocity templates. * ant docs - creates js docs. test in browser: doc/index.html * ant example-init - (depends ant dist on solr root) copies the current build of solr.war and solr-velocity.jar to example/testsolr/.. * ant example-start - starts the testsolr server on port 8983 * ant example-import - imports 3000 test data rows (requires a started testserver) Point your browser to example/testClientside.html, example/testServerSide.html or test/reuters/index.html to see it working.
[jira] Commented: (SOLR-1295) Move javascript client from contrib to clients dir in source
[ https://issues.apache.org/jira/browse/SOLR-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733017#action_12733017 ] Eric Pugh commented on SOLR-1295: - SolrJS is a really neat way of interacting with Solr, but doesn't really get the love it needs. Like the current "it doesn't work with IE8" bug! I think that having it packaged properly, and getting the docs more up to snuff, will make folks use it more and hopefully lead to more development. Move javascript client from contrib to clients dir in source Key: SOLR-1295 URL: https://issues.apache.org/jira/browse/SOLR-1295 Project: Solr Issue Type: Improvement Affects Versions: 1.4 Reporter: Eric Pugh Assignee: Erik Hatcher Fix For: 1.4 Attachments: move_js_from_contrib_to_client.patch It seems odd that the javascript client is in the contrib, unless you think of it more of a library that you then hack up. At any rate, here is the patch assuming you have done svn mv http://svn.apache.org/repos/asf/lucene/solr/trunk/contrib/javascript http://svn.apache.org/repos/asf/lucene/solr/trunk/client/ -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1294) SolrJS/Javascript client fails in IE8!
[ https://issues.apache.org/jira/browse/SOLR-1294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Pugh updated SOLR-1294:
Description: SolrJS seems to fail with 'jQuery.solrjs' is null or not an object errors under IE8. I am continuing to test if this occurs in IE 6 and 7 as well. This happens on both the Sample online site at http://solrjs.solrstuff.org/test/reuters/ as well as the /trunk/contrib/javascript library. Seems to be a show stopper from the standpoint of really using this library! I have posted a screenshot of the error at http://img.skitch.com/20090717-jejm71u6ghf2dpn3mwrkarigwm.png The error is just a whole bunch of repeated messages in the vein of:
Message: 'jQuery.solrjs' is null or not an object Line: 24 Char: 1 Code: 0 URI: file:///C:/dev/projects/lib/solr/contrib/javascript/src/core/QueryItem.js
Message: 'jQuery.solrjs' is null or not an object Line: 37 Char: 1 Code: 0 URI: file:///C:/dev/projects/lib/solr/contrib/javascript/src/core/Manager.js
Message: 'jQuery.solrjs' is null or not an object Line: 24 Char: 1 Code: 0 URI: file:///C:/dev/projects/lib/solr/contrib/javascript/src/core/AbstractSelectionView.js
Message: 'jQuery.solrjs' is null or not an object Line: 27 Char: 1 Code: 0 URI: file:///C:/dev/projects/lib/solr/contrib/javascript/src/core/AbstractWidget.js
was: SolrJS seems to fail with 'jQuery.solrjs' is null or not an object errors under IE8. I am continuing to test if this occurs in IE 6 and 7 as well. This happens on both the Sample online site at http://solrjs.solrstuff.org/test/reuters/ as well as the /trunk/contrib/javascript library. Seems to be a show stopper from the standpoint of really using this library!
SolrJS/Javascript client fails in IE8! -- Key: SOLR-1294 URL: https://issues.apache.org/jira/browse/SOLR-1294 Project: Solr Issue Type: Bug Affects Versions: 1.4 Reporter: Eric Pugh
SolrJS seems to fail with 'jQuery.solrjs' is null or not an object errors under IE8. I am continuing to test if this occurs in IE 6 and 7 as well. This happens on both the Sample online site at http://solrjs.solrstuff.org/test/reuters/ as well as the /trunk/contrib/javascript library. Seems to be a show stopper from the standpoint of really using this library! I have posted a screenshot of the error at http://img.skitch.com/20090717-jejm71u6ghf2dpn3mwrkarigwm.png The error is just a whole bunch of repeated messages in the vein of:
Message: 'jQuery.solrjs' is null or not an object Line: 24 Char: 1 Code: 0 URI: file:///C:/dev/projects/lib/solr/contrib/javascript/src/core/QueryItem.js
Message: 'jQuery.solrjs' is null or not an object Line: 37 Char: 1 Code: 0 URI: file:///C:/dev/projects/lib/solr/contrib/javascript/src/core/Manager.js
Message: 'jQuery.solrjs' is null or not an object Line: 24 Char: 1 Code: 0 URI: file:///C:/dev/projects/lib/solr/contrib/javascript/src/core/AbstractSelectionView.js
Message: 'jQuery.solrjs' is null or not an object Line: 27 Char: 1 Code: 0 URI: file:///C:/dev/projects/lib/solr/contrib/javascript/src/core/AbstractWidget.js
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1295) Move javascript client from contrib to clients dir in source
[ https://issues.apache.org/jira/browse/SOLR-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Pugh updated SOLR-1295: Attachment: move_js_from_contrib_to_client.patch Move javascript client from contrib to clients dir in source Key: SOLR-1295 URL: https://issues.apache.org/jira/browse/SOLR-1295 Project: Solr Issue Type: Improvement Affects Versions: 1.4 Reporter: Eric Pugh Fix For: 1.4 Attachments: move_js_from_contrib_to_client.patch It seems odd that the javascript client is in the contrib, unless you think of it more of a library that you then hack up. At any rate, here is the patch assuming you have done svn mv http://svn.apache.org/repos/asf/lucene/solr/trunk/contrib/javascript http://svn.apache.org/repos/asf/lucene/solr/trunk/client/ -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (SOLR-1295) Move javascript client from contrib to clients dir in source
Move javascript client from contrib to clients dir in source Key: SOLR-1295 URL: https://issues.apache.org/jira/browse/SOLR-1295 Project: Solr Issue Type: Improvement Affects Versions: 1.4 Reporter: Eric Pugh Fix For: 1.4 Attachments: move_js_from_contrib_to_client.patch It seems odd that the javascript client is in the contrib, unless you think of it more of a library that you then hack up. At any rate, here is the patch assuming you have done svn mv http://svn.apache.org/repos/asf/lucene/solr/trunk/contrib/javascript http://svn.apache.org/repos/asf/lucene/solr/trunk/client/ -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: SOLR automatic failover
Isn't a pool of slave Solr servers behind a load balancer similar to SOLR automatic failover? I guess I am not sure what you mean. It seems like what ZooKeeper does (http://hadoop.apache.org/zookeeper/docs/current/zookeeperOver.html) is analogous to the Solr master/pool of slave servers.

On Jul 14, 2009, at 1:40 AM, Jason Rutherglen wrote:
Basic failover, we can build from there?

2009/7/13 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com
Nope. What do you have in mind?

On Tue, Jul 14, 2009 at 4:56 AM, Jason Rutherglen jason.rutherg...@gmail.com wrote:
Has anyone looked at implementing automatic failover in SOLR using a naming service (like Zookeeper)?

-- - Noble Paul | Principal Engineer| AOL | http://aol.com
- Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal
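
For readers following the thread, the "basic failover" being discussed can be approximated on the client side without any naming service. What follows is only a rough sketch under stated assumptions: `FailoverClient` is a hypothetical helper invented for this illustration, not part of Solr, SolrJ, or ZooKeeper, and the actual query is passed in as a function so that no real Solr API is assumed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical helper: tries each server URL in order until one answers. */
public class FailoverClient {
    private final List<String> servers;

    public FailoverClient(List<String> servers) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("need at least one server");
        }
        this.servers = new ArrayList<>(servers); // defensive copy
    }

    /**
     * Applies the request to each server URL in turn and returns the first
     * successful result. A server counts as failed if the request throws.
     */
    public <T> T execute(Function<String, T> request) {
        RuntimeException last = null;
        for (String url : servers) {
            try {
                return request.apply(url);
            } catch (RuntimeException e) {
                last = e; // remember the failure and try the next server
            }
        }
        throw new IllegalStateException("all servers failed", last);
    }
}
```

A load balancer in front of the slave pool does the same job outside the client; a naming service such as ZooKeeper would presumably replace the fixed list with a list that is updated as servers come and go.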
[jira] Commented: (SOLR-284) Parsing Rich Document Types
[ https://issues.apache.org/jira/browse/SOLR-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12726115#action_12726115 ] Eric Pugh commented on SOLR-284: I am out of the office 6/29 - 6/30. For urgent issues, please contact Jason Hull at jh...@opensourceconnections.com or phone at (434) 409-8451. Parsing Rich Document Types --- Key: SOLR-284 URL: https://issues.apache.org/jira/browse/SOLR-284 Project: Solr Issue Type: New Feature Components: update Reporter: Eric Pugh Assignee: Grant Ingersoll Fix For: 1.4 Attachments: libs.zip, rich.patch, rich.patch, rich.patch, rich.patch, rich.patch, rich.patch, rich.patch, SOLR-284-no-key-gen.patch, SOLR-284.patch, SOLR-284.patch, SOLR-284.patch, SOLR-284.patch, SOLR-284.patch, SOLR-284.patch, SOLR-284.patch, solr-word.pdf, source.zip, test-files.zip, test-files.zip, test.zip, un-hardcode-id.diff I have developed a RichDocumentRequestHandler based on the CSVRequestHandler that supports streaming a PDF, Word, Powerpoint, Excel, or PDF document into Solr. There is a wiki page with information here: http://wiki.apache.org/solr/UpdateRichDocuments -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-284) Parsing Rich Document Types
[ https://issues.apache.org/jira/browse/SOLR-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12724856#action_12724856 ] Eric Pugh commented on SOLR-284: I am out of the office 6/29 - 6/30. For urgent issues, please contact Jason Hull at jh...@opensourceconnections.com or phone at (434) 409-8451. Parsing Rich Document Types --- Key: SOLR-284 URL: https://issues.apache.org/jira/browse/SOLR-284 Project: Solr Issue Type: New Feature Components: update Reporter: Eric Pugh Assignee: Grant Ingersoll Fix For: 1.4 Attachments: libs.zip, rich.patch, rich.patch, rich.patch, rich.patch, rich.patch, rich.patch, rich.patch, SOLR-284-no-key-gen.patch, SOLR-284.patch, SOLR-284.patch, SOLR-284.patch, SOLR-284.patch, SOLR-284.patch, SOLR-284.patch, SOLR-284.patch, solr-word.pdf, source.zip, test-files.zip, test-files.zip, test.zip, un-hardcode-id.diff I have developed a RichDocumentRequestHandler based on the CSVRequestHandler that supports streaming a PDF, Word, Powerpoint, Excel, or PDF document into Solr. There is a wiki page with information here: http://wiki.apache.org/solr/UpdateRichDocuments -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Time to rationalize SolrJS Docs???
Hi all, I started to create a bug for this, but thought I would just post to the mailing list. I've been using SolrJS for a while from http://solrjs.solrstuff.org/, however from my conversations with Matthias, the plan is to move to the Solr JavaScript contrib module being the master, and indeed my typo patch was applied there! So, the docs at http://solrjs.solrstuff.org/ don't reflect the migration into Solr, and that is something Matthias will have to fix I assume. However, we don't list SolrJS on the homepage under http://wiki.apache.org/solr/ under Solr Clients, and the wiki page at http://wiki.apache.org/solr/SolrJS is confusing. I am happy to clean up the docs a bit, and point to the contrib/javascript as the correct version. I just thought I would run it by the list first for confirmation that it should be done! Also, isn't SolrJS really a client versus a contrib? Seems like it should be in ./clients/javascript along with the Ruby and Python clients in source control? Eric - Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal
Re: solr 1.4 lite ?
Has anyone really complained about the size of Solr? One of the things I like about Solr is how simple it is to get things up and running, and how accessible the example directory makes everything. When I first played with DIH and Cell, everything was there. I didn't have to chase down .jars from multiple places. Maybe if it was as simple as ant build-example-cell and ant build-example-dih then there might not be a barrier to entry for new users. I'd be curious to hear what percentage of folks deploy solr based on the example app, and how many just start out with the most stripped down solr.war and build everything up from there? Eric

On Jun 10, 2009, at 6:56 PM, Grant Ingersoll wrote:
+1 Should be easy enough to conjure up the Ant magic.

On Jun 10, 2009, at 5:55 PM, Yonik Seeley wrote:
Thanks for bringing this up Patrick... clearly it would be nice to avoid (or mandate) 100MB downloads! -Yonik http://www.lucidimagination.com

On Wed, Jun 10, 2009 at 5:50 PM, patrick o'leary pj...@pjaol.com wrote:
Just using the apache-solr example directory, it seems to have gotten a bit big, e.g.

$ du -sh *
13M apache-solr-1.3.0
92M apache-solr-1.4.0

The biggest space user being example-DIH:

apache-solr-1.4.0/example
$ du -sh *
4.0K README.txt
5.5M clustering
80K etc
*32M example-DIH*
42K exampleAnalysis
168K exampledocs
13M lib
52K logs
118K multicore
*31M solr*
20K start.jar
12M webapps
12K work

solr/lib is now 30mb:

apache-solr-1.4.0/example/solr/lib
$ ls -lhS
total 30M
-rwx-- 1 pjaol None 14M Jun 10 17:08 ooxml-schemas-1.0.jar
-rwx-- 1 pjaol None 4.3M Jun 10 17:08 icu4j-3.8.jar
-rwx-- 1 pjaol None 3.2M Jun 10 17:08 pdfbox-0.7.3.jar
-rwx-- 1 pjaol None 2.6M Jun 10 17:08 xmlbeans-2.3.0.jar
-rwx-- 1 pjaol None 1.5M Jun 10 17:08 poi-3.5-beta5.jar
-rwx-- 1 pjaol None 1.2M Jun 10 17:08 xercesImpl-2.8.1.jar
-rwx-- 1 pjaol None 1.1M Jun 10 17:08 bcprov-jdk14-132.jar

as opposed to 0 for 1.3.0.

This pushes solr to over a 100mb download for features that I'm sure can be packaged up separately as they look seldom used. It would make sense if there's going to be a batteries included version to also have a solr-lite version. P

-- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
- Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal
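
The `ls -lhS` listing in this thread is the quickest way to see which jars dominate a lib directory. For anyone who wants the same check inside a build or test, here is a small sketch using only the JDK. The class name `JarSizes` and the method `largestFirst` are made up for this illustration and are not part of Solr or its build.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Illustrative helper: lists the jars in one directory, biggest first. */
public class JarSizes {

    /** Returns the *.jar files directly inside dir, sorted by size descending. */
    public static List<Path> largestFirst(Path dir) throws IOException {
        List<Path> jars = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.jar")) {
            for (Path p : stream) {
                jars.add(p);
            }
        }
        Comparator<Path> bySize = Comparator.comparingLong(JarSizes::sizeOf);
        jars.sort(bySize.reversed());
        return jars;
    }

    /** File size in bytes; unreadable files are treated as size 0. */
    private static long sizeOf(Path p) {
        try {
            return Files.size(p);
        } catch (IOException e) {
            return 0L;
        }
    }

    public static void main(String[] args) throws IOException {
        // Directory to inspect; defaults to the current directory.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path jar : largestFirst(dir)) {
            System.out.println(Files.size(jar) + " " + jar.getFileName());
        }
    }
}
```

Pointing it at a directory such as example/solr/lib should reproduce the ordering shown above, with sizes printed in bytes rather than in human-readable units.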
[jira] Created: (SOLR-1178) Retrieve CoreContainer from EmbeddedSolrServer
Retrieve CoreContainer from EmbeddedSolrServer -- Key: SOLR-1178 URL: https://issues.apache.org/jira/browse/SOLR-1178 Project: Solr Issue Type: Improvement Affects Versions: 1.4 Reporter: Eric Pugh Priority: Minor Submitting the patch suggested by Noble Paul to deal with this issue: Hi all, I notice that when I use EmbeddedSolrServer I have to use Control C to stop the process. I think the way to shut it down is by calling coreContainer.shutdown(). However, is it possible to get the coreContainer from a SolrServer object? Right now it is defined as protected final CoreContainer coreContainer;. I wanted to do: ((EmbeddedSolrServer) solr).getCoreContainer().shutdown(); But it seems I need to keep my own reference to the coreContainer? Is changing this worth a patch? Eric -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
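
To make the request concrete, here is a minimal, self-contained sketch of the accessor being asked for. The nested classes are simplified stand-ins written for this example, not Solr's real `CoreContainer`, `SolrServer`, or `EmbeddedSolrServer`. Only the `coreContainer` field, the `shutdown()` call, and the cast come from the issue text; `isShutDown()` is a hypothetical helper added so the effect can be observed.

```java
public class EmbeddedShutdownSketch {

    /** Stand-in for the container that owns the cores (not Solr's class). */
    public static class CoreContainer {
        private boolean shutDown = false;

        public void shutdown() {
            shutDown = true;
        }

        /** Hypothetical helper so the example can show the effect. */
        public boolean isShutDown() {
            return shutDown;
        }
    }

    /** Stand-in for the generic server type that callers usually hold. */
    public interface SolrServer {
    }

    /** Stand-in for the embedded server, with the proposed accessor added. */
    public static class EmbeddedSolrServer implements SolrServer {
        protected final CoreContainer coreContainer;

        public EmbeddedSolrServer(CoreContainer coreContainer) {
            this.coreContainer = coreContainer;
        }

        /** The accessor the issue asks for. */
        public CoreContainer getCoreContainer() {
            return coreContainer;
        }
    }

    public static void main(String[] args) {
        SolrServer solr = new EmbeddedSolrServer(new CoreContainer());

        // With the accessor, the caller no longer needs to keep its own
        // reference to the container just to shut it down.
        CoreContainer container = ((EmbeddedSolrServer) solr).getCoreContainer();
        container.shutdown();
        System.out.println("shut down: " + container.isShutDown());
    }
}
```

Without the accessor, the only alternative mentioned in the issue is to hold on to the container reference separately at construction time.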
Re: Compile error from new contrib clustering?
Okay... I've commented it out by tweaking the contrib-crawl macrodef and adding an excludes in common-build.xml:

<fileset dir="." includes="contrib/*/build.xml" excludes="**/clustering/build.xml"/>

Does anyone have a link to Java 1.6 for OSX? I see references to some sort of Update 2 for Java, but haven't found a download link. Eric

On May 20, 2009, at 2:07 PM, Grant Ingersoll wrote:
I confirm it's there. For now, the workaround is to use 1.6. Can anyone reproduce on Windows or Linux? -Grant

On May 20, 2009, at 1:43 PM, Grant Ingersoll wrote:
Hmm, checking. I know it spit out warnings on that stuff, but didn't think it would cause an error.

On May 20, 2009, at 12:21 PM, Eric Pugh wrote:
Hi all, Anyone else getting a compile error from the new contrib/clustering stuff? On OSX with Java 1.5:

budapest:asf_solr_src epugh$ java -version
java version "1.5.0_16"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_16-b06-284)
Java HotSpot(TM) Client VM (build 1.5.0_16-133, mixed mode, sharing)

I am getting:

compile:
[javac] Compiling 7 source files to /trunk/asf_solr_src/contrib/clustering/build/classes
[javac] An exception has occurred in the compiler (1.5.0_16). Please file a bug at the Java Developer Connection (http://java.sun.com/webapps/bugreport) after checking the Bug Parade for duplicates. Include your program and the following diagnostic in your report. Thank you.
[javac] com.sun.tools.javac.code.Symbol$CompletionFailure: file org/simpleframework/xml/Root.class not found

Eric

- Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal
-- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
Compile error from new contrib clustering?
Hi all, Anyone else getting a compile error from the new contrib/clustering stuff? On OSX with Java 1.5:

budapest:asf_solr_src epugh$ java -version
java version "1.5.0_16"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_16-b06-284)
Java HotSpot(TM) Client VM (build 1.5.0_16-133, mixed mode, sharing)

I am getting:

compile:
[javac] Compiling 7 source files to /trunk/asf_solr_src/contrib/clustering/build/classes
[javac] An exception has occurred in the compiler (1.5.0_16). Please file a bug at the Java Developer Connection (http://java.sun.com/webapps/bugreport) after checking the Bug Parade for duplicates. Include your program and the following diagnostic in your report. Thank you.
[javac] com.sun.tools.javac.code.Symbol$CompletionFailure: file org/simpleframework/xml/Root.class not found

Eric

- Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal
[jira] Updated: (SOLR-1178) Retrieve CoreContainer from EmbeddedSolrServer
[ https://issues.apache.org/jira/browse/SOLR-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Pugh updated SOLR-1178: Attachment: embedded_solr_container.patch Patch file, and I added a really stupidly simple test. I added it to TestSolrProperties.java just because it didn't seem worth creating another .java file, and I don't know if we need it. Retrieve CoreContainer from EmbeddedSolrServer -- Key: SOLR-1178 URL: https://issues.apache.org/jira/browse/SOLR-1178 Project: Solr Issue Type: Improvement Affects Versions: 1.4 Reporter: Eric Pugh Priority: Minor Attachments: embedded_solr_container.patch Submitting the patch suggested by Noble Paul to deal with this issue: Hi all, I notice that when I use EmbeddedSolrServer I have to use Control C to stop the process. I think the way to shut it down is by calling coreContainer.shutdown(). However, is it possible to get the coreContainer from a SolrServer object? Right now it is defined as protected final CoreContainer coreContainer;. I wanted to do: ((EmbeddedSolrServer) solr).getCoreContainer().shutdown(); But it seems I need to keep my own reference to the coreContainer? Is changing this worth a patch? Eric -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: Compile error from new contrib clustering?
Thanks! I don't know that I have ever seen the Java Preferences app before. At any rate, now I have:

budapest:asf_solr_src epugh$ java -version
java version "1.6.0_07"
Java(TM) SE Runtime Environment (build 1.6.0_07-b06-153)
Java HotSpot(TM) 64-Bit Server VM (build 1.6.0_07-b06-57, mixed mode)

but I am still getting the same:

An exception has occurred in the compiler (1.5.0_16). Please file a bug at the Java Developer Connection (http://java.sun.com/webapps/bugreport) after checking the Bug Parade for duplicates. Include your program and the following diagnostic in your report. Thank you.
[javac] com.sun.tools.javac.code.Symbol$CompletionFailure: file org/simpleframework/xml/Root.class not found

I ran ant -diagnostics and saw it was still using 1.5. It turns out that JAVA_HOME was pointed to /Library/Java/Home, and I needed Ant to use /usr/bin/java. So I did export JAVA_HOME=/usr (so that $JAVA_HOME/bin/java resolves to /usr/bin/java) and it seems to work. The cluster code all compiled. Hopefully this helps out anyone else on OSX. Eric

On May 20, 2009, at 3:11 PM, Matt Weber wrote:
Java 1.6 is only available for Leopard. It should be installed by default; use the Java Preferences app to make it your default. Thanks, Matt Weber eSr Technologies http://www.esr-technologies.com

On May 20, 2009, at 12:04 PM, Eric Pugh wrote:
Okay... I've commented it out by tweaking the contrib-crawl macrodef and adding an excludes in common-build.xml:

<fileset dir="." includes="contrib/*/build.xml" excludes="**/clustering/build.xml"/>

Does anyone have a link to Java 1.6 for OSX? I see references to some sort of Update 2 for Java, but haven't found a download link. Eric

On May 20, 2009, at 2:07 PM, Grant Ingersoll wrote:
I confirm it's there. For now, the workaround is to use 1.6. Can anyone reproduce on Windows or Linux? -Grant

On May 20, 2009, at 1:43 PM, Grant Ingersoll wrote:
Hmm, checking. I know it spit out warnings on that stuff, but didn't think it would cause an error.
On May 20, 2009, at 12:21 PM, Eric Pugh wrote:
Hi all, Anyone else getting a compile error from the new contrib/clustering stuff? On OSX with Java 1.5:

budapest:asf_solr_src epugh$ java -version
java version "1.5.0_16"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_16-b06-284)
Java HotSpot(TM) Client VM (build 1.5.0_16-133, mixed mode, sharing)

I am getting:

compile:
[javac] Compiling 7 source files to /trunk/asf_solr_src/contrib/clustering/build/classes
[javac] An exception has occurred in the compiler (1.5.0_16). Please file a bug at the Java Developer Connection (http://java.sun.com/webapps/bugreport) after checking the Bug Parade for duplicates. Include your program and the following diagnostic in your report. Thank you.
[javac] com.sun.tools.javac.code.Symbol$CompletionFailure: file org/simpleframework/xml/Root.class not found

Eric

- Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal
-- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
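
The confusing part of this thread is that `java -version` in the shell reported 1.6 while the build still compiled with 1.5. One way to confirm which JVM a given launcher really starts is to ask the JVM itself. The snippet below is a small sketch using only standard system properties; the class name `WhichJava` is made up for this example.

```java
public class WhichJava {

    /** Describes the JVM that is actually running this code. */
    public static String describe() {
        return "java.version=" + System.getProperty("java.version")
                + " java.home=" + System.getProperty("java.home");
    }

    public static void main(String[] args) {
        System.out.println(describe());
        // JAVA_HOME is only an environment variable. It may be unset (null)
        // or point somewhere other than the JVM that is running right now.
        System.out.println("JAVA_HOME=" + System.getenv("JAVA_HOME"));
    }
}
```

If the reported `java.home` and the `JAVA_HOME` environment variable disagree, that is a hint that different tools may be picking up different JDKs, which matches what `ant -diagnostics` showed in this thread.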
Re: Compile error from new contrib clustering?
Thanks!

On May 20, 2009, at 3:40 PM, Matt Weber wrote:
export JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/1.6.0/Home/

- Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com Free/Busy: http://tinyurl.com/eric-cal
[jira] Created: (SOLR-1100) Typo fixes for solrjs docs
Typo fixes for solrjs docs -- Key: SOLR-1100 URL: https://issues.apache.org/jira/browse/SOLR-1100 Project: Solr Issue Type: Improvement Reporter: Eric Pugh Priority: Minor Matthias suggested I put in a bug here for my small documentation fixes, which were done against http://solrstuff.org/svn/solrjs/trunk/. Not sure if that is the latest or what is in the ASF solr contrib/javascript directory. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1100) Typo fixes for solrjs docs
[ https://issues.apache.org/jira/browse/SOLR-1100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Pugh updated SOLR-1100: Attachment: typos.patch Small typo fixes against http://solrstuff.org/svn/solrjs/trunk/ Typo fixes for solrjs docs -- Key: SOLR-1100 URL: https://issues.apache.org/jira/browse/SOLR-1100 Project: Solr Issue Type: Improvement Reporter: Eric Pugh Priority: Minor Attachments: typos.patch Matthias suggested I put in a bug here for my small documentation fixes, which were done against http://solrstuff.org/svn/solrjs/trunk/. Not sure if that is the latest or what is in the ASF solr contrib/javascript directory. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-284) Parsing Rich Document Types
[ https://issues.apache.org/jira/browse/SOLR-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12647003#action_12647003 ] Eric Pugh commented on SOLR-284: Grant, I am really excited that you are looking at this patch! While I am proud of it, and very proud of the number of organizations that have used it and the people who have improved it (Thanks Chris!), it was just written to scratch an itch, so feel free to rip it apart to come up with a better solution for Solr. The ability for Solr to ingest more formats is, I think, the key aspect, not how this patch works. Parsing Rich Document Types --- Key: SOLR-284 URL: https://issues.apache.org/jira/browse/SOLR-284 Project: Solr Issue Type: New Feature Components: update Reporter: Eric Pugh Assignee: Grant Ingersoll Fix For: 1.4 Attachments: libs.zip, rich.patch, rich.patch, rich.patch, rich.patch, rich.patch, rich.patch, rich.patch, source.zip, test-files.zip, test-files.zip, test.zip, un-hardcode-id.diff I have developed a RichDocumentRequestHandler based on the CSVRequestHandler that supports streaming a PDF, Word, Powerpoint, Excel, or PDF document into Solr. There is a wiki page with information here: http://wiki.apache.org/solr/UpdateRichDocuments -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-284) Parsing Rich Document Types
[ https://issues.apache.org/jira/browse/SOLR-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12583231#action_12583231 ] Eric Pugh commented on SOLR-284: Chris, I like what you are thinking... Really this is sort of becoming the AllDocumentsUnderTheSunRequestHandler, but what that highlights is that the current solution really doesn't do what we need, which is making it dirt simple to add new handlers... While there are some efforts under way to do that, to provide the uber solution, I think adding another hack/method to RichDocumentRequestHandler is cool with me. Since it's just a patch file, feel free to take it, munge it, and post it back as the current patch. If you do, make sure to add to the docs on the wiki at http://wiki.apache.org/solr/UpdateRichDocuments. Heck, you may want to rip in Pompo's fix as well! Parsing Rich Document Types --- Key: SOLR-284 URL: https://issues.apache.org/jira/browse/SOLR-284 Project: Solr Issue Type: New Feature Components: update Affects Versions: 1.3 Reporter: Eric Pugh Fix For: 1.3 Attachments: libs.zip, rich.patch, source.zip, test-files.zip, test.zip I have developed a RichDocumentRequestHandler based on the CSVRequestHandler that supports streaming a PDF, Word, Powerpoint, Excel, or PDF document into Solr. There is a wiki page with information here: http://wiki.apache.org/solr/UpdateRichDocuments -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-284) Parsing Rich Document Types
[ https://issues.apache.org/jira/browse/SOLR-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12583233#action_12583233 ] Eric Pugh commented on SOLR-284: Oh, and don't forget to vote for it as well: https://issues.apache.org/jira/browse/SOLR?report=com.atlassian.jira.plugin.system.project:popularissues-panel It's the current leading vote getter! Parsing Rich Document Types --- Key: SOLR-284 URL: https://issues.apache.org/jira/browse/SOLR-284 Project: Solr Issue Type: New Feature Components: update Affects Versions: 1.3 Reporter: Eric Pugh Fix For: 1.3 Attachments: libs.zip, rich.patch, source.zip, test-files.zip, test.zip I have developed a RichDocumentRequestHandler based on the CSVRequestHandler that supports streaming a PDF, Word, Powerpoint, Excel, or PDF document into Solr. There is a wiki page with information here: http://wiki.apache.org/solr/UpdateRichDocuments -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-284) Parsing Rich Document Types
[ https://issues.apache.org/jira/browse/SOLR-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12541879 ] Eric Pugh commented on SOLR-284: Juri, Thanks for the vote on the issue! The next time I update this patch to work with the latest code, I'll apply your change. Since this is still a pending patch, I am not actively maintaining it. Thanks for voting for this patch, there is only one other patch with more votes, hopefully it will be added soon. I'd love to hear what the use case you have for this patch is. https://issues.apache.org/jira/browse/SOLR?report=com.atlassian.jira.plugin.system.project:popularissues-panel Eric Parsing Rich Document Types --- Key: SOLR-284 URL: https://issues.apache.org/jira/browse/SOLR-284 Project: Solr Issue Type: New Feature Components: update Affects Versions: 1.3 Reporter: Eric Pugh Fix For: 1.3 Attachments: libs.zip, rich.patch, source.zip, test-files.zip, test.zip I have developed a RichDocumentRequestHandler based on the CSVRequestHandler that supports streaming a PDF, Word, Powerpoint, Excel, or PDF document into Solr. There is a wiki page with information here: http://wiki.apache.org/solr/UpdateRichDocuments -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-284) Parsing Rich Document Types
[ https://issues.apache.org/jira/browse/SOLR-284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Pugh updated SOLR-284: --- Attachment: (was: rich.patch) Parsing Rich Document Types --- Key: SOLR-284 URL: https://issues.apache.org/jira/browse/SOLR-284 Project: Solr Issue Type: New Feature Components: update Affects Versions: 1.3 Reporter: Eric Pugh Fix For: 1.3 Attachments: libs.zip, rich.patch, test-files.zip I have developed a RichDocumentRequestHandler based on the CSVRequestHandler that supports streaming a PDF, Word, Powerpoint, Excel, or PDF document into Solr. I am attaching a patch file with the code changes, and if this looks good, will add a page similar to http://wiki.apache.org/solr/UpdateCSV. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-284) Parsing Rich Document Types
[ https://issues.apache.org/jira/browse/SOLR-284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Pugh updated SOLR-284: --- Attachment: rich.patch Update patches for revision 572774 Parsing Rich Document Types --- Key: SOLR-284 URL: https://issues.apache.org/jira/browse/SOLR-284 Project: Solr Issue Type: New Feature Components: update Affects Versions: 1.3 Reporter: Eric Pugh Fix For: 1.3 Attachments: libs.zip, rich.patch, test-files.zip I have developed a RichDocumentRequestHandler based on the CSVRequestHandler that supports streaming a PDF, Word, Powerpoint, Excel, or PDF document into Solr. I am attaching a patch file with the code changes, and if this looks good, will add a page similar to http://wiki.apache.org/solr/UpdateCSV. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-284) Parsing Rich Document Types
[ https://issues.apache.org/jira/browse/SOLR-284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Pugh updated SOLR-284: --- Attachment: source.zip Java Source code for RichDocumentRequestHandler and friends. Parsing Rich Document Types --- Key: SOLR-284 URL: https://issues.apache.org/jira/browse/SOLR-284 Project: Solr Issue Type: New Feature Components: update Affects Versions: 1.3 Reporter: Eric Pugh Fix For: 1.3 Attachments: libs.zip, rich.patch, source.zip, test-files.zip I have developed a RichDocumentRequestHandler based on the CSVRequestHandler that supports streaming a PDF, Word, Powerpoint, Excel, or PDF document into Solr. I am attaching a patch file with the code changes, and if this looks good, will add a page similar to http://wiki.apache.org/solr/UpdateCSV. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-284) Parsing Rich Document Types
[ https://issues.apache.org/jira/browse/SOLR-284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Pugh updated SOLR-284: --- Attachment: test.zip test code, this time with granted license! Parsing Rich Document Types --- Key: SOLR-284 URL: https://issues.apache.org/jira/browse/SOLR-284 Project: Solr Issue Type: New Feature Components: update Affects Versions: 1.3 Reporter: Eric Pugh Fix For: 1.3 Attachments: libs.zip, rich.patch, source.zip, test-files.zip, test.zip I have developed a RichDocumentRequestHandler based on the CSVRequestHandler that supports streaming a PDF, Word, Powerpoint, Excel, or PDF document into Solr. I am attaching a patch file with the code changes, and if this looks good, will add a page similar to http://wiki.apache.org/solr/UpdateCSV. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-284) Parsing Rich Document Types
[ https://issues.apache.org/jira/browse/SOLR-284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Pugh updated SOLR-284: --- Attachment: test.zip add the test code for richdocumenthandler. Parsing Rich Document Types --- Key: SOLR-284 URL: https://issues.apache.org/jira/browse/SOLR-284 Project: Solr Issue Type: New Feature Components: update Affects Versions: 1.3 Reporter: Eric Pugh Fix For: 1.3 Attachments: libs.zip, rich.patch, source.zip, test-files.zip, test.zip I have developed a RichDocumentRequestHandler based on the CSVRequestHandler that supports streaming a PDF, Word, Powerpoint, Excel, or PDF document into Solr. I am attaching a patch file with the code changes, and if this looks good, will add a page similar to http://wiki.apache.org/solr/UpdateCSV. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-284) Parsing Rich Document Types
[ https://issues.apache.org/jira/browse/SOLR-284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Pugh updated SOLR-284: --- Attachment: (was: test.zip) Parsing Rich Document Types --- Key: SOLR-284 URL: https://issues.apache.org/jira/browse/SOLR-284 Project: Solr Issue Type: New Feature Components: update Affects Versions: 1.3 Reporter: Eric Pugh Fix For: 1.3 Attachments: libs.zip, rich.patch, source.zip, test-files.zip, test.zip I have developed a RichDocumentRequestHandler based on the CSVRequestHandler that supports streaming a PDF, Word, Powerpoint, Excel, or PDF document into Solr. I am attaching a patch file with the code changes, and if this looks good, will add a page similar to http://wiki.apache.org/solr/UpdateCSV. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-284) Parsing Rich Document Types
[ https://issues.apache.org/jira/browse/SOLR-284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Pugh updated SOLR-284: --- Description: I have developed a RichDocumentRequestHandler based on the CSVRequestHandler that supports streaming a PDF, Word, Powerpoint, Excel, or PDF document into Solr. There is a wiki page with information here: http://wiki.apache.org/solr/UpdateRichDocuments was: I have developed a RichDocumentRequestHandler based on the CSVRequestHandler that supports streaming a PDF, Word, Powerpoint, Excel, or PDF document into Solr. I am attaching a patch file with the code changes, and if this looks good, will add a page similar to http://wiki.apache.org/solr/UpdateCSV. Parsing Rich Document Types --- Key: SOLR-284 URL: https://issues.apache.org/jira/browse/SOLR-284 Project: Solr Issue Type: New Feature Components: update Affects Versions: 1.3 Reporter: Eric Pugh Fix For: 1.3 Attachments: libs.zip, rich.patch, source.zip, test-files.zip, test.zip I have developed a RichDocumentRequestHandler based on the CSVRequestHandler that supports streaming a PDF, Word, Powerpoint, Excel, or PDF document into Solr. There is a wiki page with information here: http://wiki.apache.org/solr/UpdateRichDocuments -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Rich Docs Indexing
Hi all,

I've been working with the RichDocumentRequestHandler (http://issues.apache.org/jira/browse/SOLR-284) for the past few weeks, and it seems to be working quite well. We discovered that when we throw a 27 MB PDF document at it we need to beef up the Java heap size, and we haven't come up with a great solution for handling PDF documents that have a password on them, beyond not indexing them.

I wanted to see if I could get some momentum going on whether this is something that the committers want in Solr 1.3... I'd like to write up a wiki page similar to http://wiki.apache.org/solr/UpdateCSV that would give folks a chance to see what this code can do, but highlight that it is a wiki page about just a patch file. Would this be okay, or misleading to folks?

I've updated the patch to revision 555996.

Thanks for your consideration!

PS, is anyone going to be at OSCON in two weeks? I'd love to meet up with some other Solr folks.

Eric

--- Principal OpenSource Connections Site: http://www.opensourceconnections.com Blog: http://blog.opensourceconnections.com Cell: 1-434-466-1467
Re: URL Encoding/Decoding
Thanks... I am backing out my code!

On Jul 10, 2007, at 12:45 AM, Chris Hostetter wrote:

> The URL encoding/decoding in Solr only happens when dealing with HTTP-based requests. When writing unit tests that deal with the SolrTestHarness (and LocalSolrQueryRequest, which is what the loadLocal() and req() methods use under the covers) you shouldn't be doing any URL escaping because no URLs are involved.
>
> : new code that showed they were being encoded But I think it may
> : have been because the unit test don't operate through a regular HTTP
> : layer?
>
> bingo.
>
> -Hoss

--- Principal OpenSource Connections Site: http://www.opensourceconnections.com Blog: http://blog.opensourceconnections.com Cell: 1-434-466-1467
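To make the point above concrete, here is a minimal sketch, not Solr code: the class and method names are hypothetical stand-ins. One method plays the role of an HTTP layer that decodes a query-string value once; the other plays the role of a test harness that passes the value through untouched because no URL is involved.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

/** Illustrative only; these names are hypothetical and do not exist in Solr. */
class UrlDecodingDemo {

    /** Stands in for what an HTTP layer does to a query-string value. */
    static String viaHttpLayer(String rawQueryValue) {
        try {
            return URLDecoder.decode(rawQueryValue, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a required charset on every JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    /** Stands in for a test harness that hands parameters straight to the handler. */
    static String viaTestHarness(String value) {
        return value; // no URL is involved, so nothing is decoded
    }

    public static void main(String[] args) {
        String encoded = "My%20Name%20is%20Johnny";
        System.out.println(viaHttpLayer(encoded));   // My Name is Johnny
        System.out.println(viaTestHarness(encoded)); // My%20Name%20is%20Johnny
    }
}
```

Under these assumptions, a test that passes pre-encoded values to the harness will index the literal `%20` sequences, which matches the behaviour described in the thread.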
URL Encoding/Decoding
Hi all,

My patch for adding rich unstructured content (https://issues.apache.org/jira/browse/SOLR-284) has a problem when some of the extra field data passed in via the GET request contains spaces, etc. The content comes through URL encoded. Should the SolrParams object handle decoding of parameters, or should that be the domain of my RichDocumentRequestHandler, since only some parameters will have URL encoding?

Cheers,
Eric Pugh

--- Principal OpenSource Connections Site: http://www.opensourceconnections.com Blog: http://blog.opensourceconnections.com Cell: 1-434-466-1467
Re: URL Encoding/Decoding
It might have been... I wrote some code to decode them, and then I was told that it worked okay. However, I wrote a unit test for my new code that showed they were being encoded. But I think it may have been because the unit tests don't operate through a regular HTTP layer?

This test (similar to what is in the CSVLoader test!)

    public void testPDFLoadWithExtraFieldsThatAreURLEncoded() throws Exception {
      makeFile("I love PDF documents.");
      loadLocal("stream.type", "pdf", "stream.file", filename,
                "stream.fieldname", "text", "id", "100",
                "fieldnames", "name,subject",
                "name", "My%20Name%20is%20Johnny",
                "subject", "A%20test%20document");
      assertU(commit());
      assertQ(req("text:Love"), "//[EMAIL PROTECTED]'1']");
      assertQ(req("text:Hate"), "//[EMAIL PROTECTED]'0']");
      assertQ(req("name:My%20Name%20is%20Johnny"), "//[EMAIL PROTECTED]'0']");
      assertQ(req("subject:A%20test%20document"), "//[EMAIL PROTECTED]'0']");
      assertQ(req("name:My Name is Johnny"), "//[EMAIL PROTECTED]'1']");
      assertQ(req("subject:A test document"), "//[EMAIL PROTECTED]'1']");
    }

was failing until I added an explicit decode.

I think I retract my initial email!!

Eric

On Jul 9, 2007, at 5:24 PM, Yonik Seeley wrote:

> On 7/9/07, Eric Pugh [EMAIL PROTECTED] wrote:
> > My patch for adding rich unstructured content (https://issues.apache.org/jira/browse/SOLR-284) has a problem when some of the extra field data passed in via the get request have spaces etc.. The content comes through URL encoded. Should the SolrParams object handle decoding of parameters, or should that be the domain of my RichDocumentRequestHandler since only some parameters will have URL encoding.
>
> Any URL encoding should already be automatically decoded by the time the handler gets any data via SolrParams. Or was it double-encoded perhaps?
>
> -Yonik

--- Principal OpenSource Connections Site: http://www.opensourceconnections.com Blog: http://blog.opensourceconnections.com Cell: 1-434-466-1467
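Yonik's "double-encoded" question can be illustrated with a small sketch that uses only the JDK. The class name is hypothetical. A value that was encoded twice still contains percent escapes after the single decode that an HTTP layer would perform, which is one way encoded text could reach a handler.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

/** Illustrative only; shows what one versus two decode passes produce. */
class DoubleEncodingDemo {

    static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a required charset on every JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // "%25" is the escape for a literal '%', so "%2520" is a doubly encoded space.
        String doubleEncoded = "My%2520Name";
        String once = decode(doubleEncoded);
        String twice = decode(once);
        System.out.println(once);  // My%20Name
        System.out.println(twice); // My Name
    }
}
```

If a handler sees `My%20Name` after the container has already decoded once, the client most likely encoded twice, and the fix arguably belongs on the client side rather than in the handler.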
[jira] Updated: (SOLR-284) Parsing Rich Document Types
[ https://issues.apache.org/jira/browse/SOLR-284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Pugh updated SOLR-284: --- Attachment: rich.patch Updated patch file, properly handling missing stream.types, and cleaning up error messages a bit. Parsing Rich Document Types --- Key: SOLR-284 URL: https://issues.apache.org/jira/browse/SOLR-284 Project: Solr Issue Type: New Feature Components: update Affects Versions: 1.3 Reporter: Eric Pugh Fix For: 1.3 Attachments: libs.zip, rich.patch, rich.patch, test-files.zip I have developed a RichDocumentRequestHandler based on the CSVRequestHandler that supports streaming a PDF, Word, Powerpoint, Excel, or PDF document into Solr. I am attaching a patch file with the code changes, and if this looks good, will add a page similar to http://wiki.apache.org/solr/UpdateCSV. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-284) Parsing Rich Document Types
[ https://issues.apache.org/jira/browse/SOLR-284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Pugh updated SOLR-284: --- Attachment: (was: rich.patch) Parsing Rich Document Types --- Key: SOLR-284 URL: https://issues.apache.org/jira/browse/SOLR-284 Project: Solr Issue Type: New Feature Components: update Affects Versions: 1.3 Reporter: Eric Pugh Fix For: 1.3 Attachments: libs.zip, rich.patch, test-files.zip I have developed a RichDocumentRequestHandler based on the CSVRequestHandler that supports streaming a PDF, Word, Powerpoint, Excel, or PDF document into Solr. I am attaching a patch file with the code changes, and if this looks good, will add a page similar to http://wiki.apache.org/solr/UpdateCSV. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (SOLR-284) Parsing Rich Document Types
Parsing Rich Document Types --- Key: SOLR-284 URL: https://issues.apache.org/jira/browse/SOLR-284 Project: Solr Issue Type: New Feature Components: update Affects Versions: 1.3 Reporter: Eric Pugh Fix For: 1.3 I have developed a RichDocumentRequestHandler based on the CSVRequestHandler that supports streaming a PDF, Word, Powerpoint, Excel, or PDF document into Solr. I am attaching a patch file with the code changes, and if this looks good, will add a page similar to http://wiki.apache.org/solr/UpdateCSV. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-284) Parsing Rich Document Types
[ https://issues.apache.org/jira/browse/SOLR-284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Pugh updated SOLR-284: --- Attachment: test-files.zip test files to go in test/test-files for unit testing. Parsing Rich Document Types --- Key: SOLR-284 URL: https://issues.apache.org/jira/browse/SOLR-284 Project: Solr Issue Type: New Feature Components: update Affects Versions: 1.3 Reporter: Eric Pugh Fix For: 1.3 Attachments: rich.patch, test-files.zip I have developed a RichDocumentRequestHandler based on the CSVRequestHandler that supports streaming a PDF, Word, Powerpoint, Excel, or PDF document into Solr. I am attaching a patch file with the code changes, and if this looks good, will add a page similar to http://wiki.apache.org/solr/UpdateCSV. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-284) Parsing Rich Document Types
[ https://issues.apache.org/jira/browse/SOLR-284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Pugh updated SOLR-284: --- Attachment: libs.zip new jars to go in trunk/lib for pdf and office parsing... Parsing Rich Document Types --- Key: SOLR-284 URL: https://issues.apache.org/jira/browse/SOLR-284 Project: Solr Issue Type: New Feature Components: update Affects Versions: 1.3 Reporter: Eric Pugh Fix For: 1.3 Attachments: libs.zip, rich.patch, test-files.zip I have developed a RichDocumentRequestHandler based on the CSVRequestHandler that supports streaming a PDF, Word, Powerpoint, Excel, or PDF document into Solr. I am attaching a patch file with the code changes, and if this looks good, will add a page similar to http://wiki.apache.org/solr/UpdateCSV. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (SOLR-279) System Properties for Testing are now in Java code AND Ant build.xml
System Properties for Testing are now in Java code AND Ant build.xml Key: SOLR-279 URL: https://issues.apache.org/jira/browse/SOLR-279 Project: Solr Issue Type: Bug Affects Versions: 1.3 Reporter: Eric Pugh Priority: Minor Fix For: 1.3 The system properties can now be pulled out of build.xml due to commit revision 551701 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-279) System Properties for Testing are now in Java code AND Ant build.xml
[ https://issues.apache.org/jira/browse/SOLR-279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Pugh updated SOLR-279: --- Attachment: syspropties.patch Patch file for build.xml for removing system properties System Properties for Testing are now in Java code AND Ant build.xml Key: SOLR-279 URL: https://issues.apache.org/jira/browse/SOLR-279 Project: Solr Issue Type: Bug Affects Versions: 1.3 Reporter: Eric Pugh Priority: Minor Fix For: 1.3 Attachments: syspropties.patch The system properties can now be pulled out of build.xml due to commit revision 551701 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Thanks for commit 551701!
Yonik,

Thanks for commit 551701. I have created bug https://issues.apache.org/jira/browse/SOLR-279 for removing the properties from build.xml as well.

Cheers,
Eric

--- Principal OpenSource Connections Site: http://www.opensourceconnections.com Blog: http://blog.opensourceconnections.com Cell: 1-434-466-1467
Re: Running Unit Tests from inside Eclipse
I agree with the point about not bending your code to fit your IDE. In the case of the unit tests, though, it seems like a simplification to make them independent of external configuration that is provided via Ant or any other tool. I'm coming at this from the new-to-Solr, don't-know-the-ins-and-outs end of things, which is why I like defining the system properties inside the Java test code.

Eric Pugh

On Jun 27, 2007, at 4:11 PM, Chris Hostetter wrote:

> : the path in Config.java. Attached is a patch file for these two
> : changes.
>
> FYI: Apache mailing lists strip most attachments ... I think it works if the MIME type is text/plain, but the simplest thing to do is just include it inline in your message.
>
> (As a general philosophy, I'm opposed to code changes solely for the purpose of making IDEs happy ... IDEs should make developing code easier, not the other way around.)
>
> -Hoss

--- Principal OpenSource Connections Site: http://www.opensourceconnections.com Blog: http://blog.opensourceconnections.com Cell: 1-434-466-1467
Re: Running Unit Tests from inside Eclipse
On 6/28/07, Yonik Seeley [EMAIL PROTECTED] wrote:

> On 6/28/07, Eric Pugh [EMAIL PROTECTED] wrote:
> > In the case of the unit tests and though, it seems like a simplification of the tests to make them not dependent on external configuration that is provided via Ant or any other tool
>
> Yes, I agree. Any objections to committing the setProperty part?
>
> -Yonik

Sounds great to me! In the future, should I be communicating via JIRA issues? I have a PDF handler modeled on the CSVHandler that allows you to stream a PDF document to Solr and extract the text and store it.
Re: Running Unit Tests from inside Eclipse
> > I have a PDF handler modeled on the CSVHandler that allows you to stream a PDF document to Solr and extract the text and store it.
>
> Cool! Any thoughts of a general framework for going from unstructured document -> Lucene document with fields? It feels like utilizing Apache Tika here would be the way to go (although it's in the really early stages).
>
> -Yonik

Hmm... So I have PDF, Word, Excel, and Powerpoint, all as separate handlers, and there is a lot of duplication between them. I may try to pull out the common stuff into some sort of AbstractRichDocumentHandler, and then just add the special sauce for each one. I am close to having the basic unit tests, modeled on CSVHandler, and will post a JIRA issue with it.

I looked for Tika, but didn't see it. What is the URL?
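The refactoring idea above, shared logic in a base class with only the format-specific extraction left to subclasses, could look roughly like the sketch below. All names here are hypothetical and are not taken from the SOLR-284 patch; a plain-text subclass stands in for the PDF and Office handlers so that the example needs nothing beyond the JDK.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Small driver so the sketch can be run on its own. */
class RichDocDemo {
    public static void main(String[] args) {
        Map<String, String> extra = new LinkedHashMap<>();
        extra.put("id", "100");
        InputStream in = new ByteArrayInputStream(
                "I love PDF documents.".getBytes(StandardCharsets.UTF_8));
        System.out.println(new PlainTextHandlerSketch().toFields(in, "text", extra));
    }
}

/** Hypothetical base class: holds everything the per-format handlers share. */
abstract class AbstractRichDocumentHandlerSketch {

    /** The "special sauce": each format supplies its own text extraction. */
    protected abstract String extractText(InputStream in) throws IOException;

    /** Shared step: combine caller-supplied fields with the extracted text. */
    public final Map<String, String> toFields(InputStream in, String textField,
                                              Map<String, String> extraFields) {
        Map<String, String> doc = new LinkedHashMap<>(extraFields);
        try {
            doc.put(textField, extractText(in));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return doc;
    }
}

/** Stand-in for a real PDF or Word subclass: reads the stream as UTF-8 text. */
class PlainTextHandlerSketch extends AbstractRichDocumentHandlerSketch {
    @Override
    protected String extractText(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

A PDF or Excel handler would then override only `extractText`, which is where the duplication between the separate handlers would be expected to disappear.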
Running Unit Tests from inside Eclipse
Hi all,

I always run into path issues when running the unit tests from inside of Eclipse... For example, when I run TestCSVLoader.java I get

    java.lang.ExceptionInInitializerError
        at org.apache.solr.util.TestHarness.<init>(TestHarness.java:101)
        at org.apache.solr.util.AbstractSolrTestCase.setUp(AbstractSolrTestCase.java:102)
        at org.apache.solr.handler.TestCSVLoader.setUp(TestCSVLoader.java:43)
        at junit.framework.TestCase.runBare(TestCase.java:125)
        at junit.framework.TestResult$1.protect(TestResult.java:106)
        at junit.framework.TestResult.runProtected(TestResult.java:124)
        at junit.framework.TestResult.run(TestResult.java:109)
        at junit.framework.TestCase.run(TestCase.java:118)
        at junit.framework.TestSuite.runTest(TestSuite.java:208)
        at junit.framework.TestSuite.run(TestSuite.java:203)
        at org.eclipse.jdt.internal.junit.runner.junit3.JUnit3TestReference.run(JUnit3TestReference.java:128)
        at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
        at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:460)
        at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:673)
        at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:386)
        at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:196)
    Caused by: java.lang.RuntimeException: Error in solrconfig.xml
        at org.apache.solr.core.SolrConfig.<clinit>(SolrConfig.java:90)
        ... 16 more
    Caused by: java.lang.RuntimeException: Can't find resource 'solrconfig.xml' in classpath or 'solr/conf/', cwd=/Users/eric/Documents/code/oss/apache/solr/trunk
        at org.apache.solr.core.Config.openResource(Config.java:357)
        at org.apache.solr.core.SolrConfig.initConfig(SolrConfig.java:79)
        at org.apache.solr.core.SolrConfig.<clinit>(SolrConfig.java:87)
        ... 16 more

I have a tweak that gives it an alternate path to look up the file... Would that be of interest to anyone as a patch file? Is there a specific way to configure my Eclipse project to find the path?

Eric Pugh

--- Principal OpenSource Connections Site: http://www.opensourceconnections.com Blog: http://blog.opensourceconnections.com Cell: 1-434-466-1467
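The "alternate path" tweak is not shown in the thread, so the following is only a guess at its general shape: try a list of candidate directories in order and return the first one that actually contains the file. The class name, method name, and the second candidate directory are assumptions made for illustration; this is not the code in Config.java.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of a config lookup with fallback directories. */
class ConfigLocatorSketch {

    /** Returns the first regular file called name under the candidates, or null. */
    static File locate(String name, List<String> candidateDirs) {
        for (String dir : candidateDirs) {
            File f = new File(dir, name);
            if (f.isFile()) {
                return f;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // "solr/conf" comes from the error message; the second entry is an assumed
        // location for running from the project root instead of the test-files dir.
        List<String> dirs = Arrays.asList("solr/conf", "src/test/test-files/solr/conf");
        File found = locate("solrconfig.xml", dirs);
        System.out.println(found == null
                ? "not found; cwd=" + System.getProperty("user.dir")
                : found.getPath());
    }
}
```

Relative candidates resolve against the JVM working directory, so a lookup like this only helps when the IDE launches tests from a predictable location such as the project root.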
Re: Running Unit Tests from inside Eclipse
I was wondering about that, but also, how do you handle the two system properties? I added these lines to public TestHarness(String dataDirectory, String confFile, String schemaFile) {

    System.setProperty("solr.test.sys.prop1", "propone");
    System.setProperty("solr.test.sys.prop2", "proptwo");

Otherwise I always got an error about a missing system property. Normally, via Ant, these are passed in via the JUnit test definition. Would this change be worth an issue in JIRA and a patch file?

Also, mucking around with working directories didn't work, so I added the path in Config.java. Attached is a patch file for these two changes.

Eric

On Jun 27, 2007, at 2:29 PM, Yonik Seeley wrote:

> I tell IntelliJ IDEA to set the working directory for all the tests to F:\code\solr\src\test\test-files
>
> -Yonik
>
> On 6/27/07, Eric Pugh [EMAIL PROTECTED] wrote:
> > Hi all,
> >
> > I always run into path issues when running the unit tests from inside of Eclipse... For example, when I run TestCSVLoader.java I get
> >
> >     java.lang.ExceptionInInitializerError
> >         at org.apache.solr.util.TestHarness.<init>(TestHarness.java:101)
> >         at org.apache.solr.util.AbstractSolrTestCase.setUp(AbstractSolrTestCase.java:102)
> >         at org.apache.solr.handler.TestCSVLoader.setUp(TestCSVLoader.java:43)
> >         at junit.framework.TestCase.runBare(TestCase.java:125)
> >         at junit.framework.TestResult$1.protect(TestResult.java:106)
> >         at junit.framework.TestResult.runProtected(TestResult.java:124)
> >         at junit.framework.TestResult.run(TestResult.java:109)
> >         at junit.framework.TestCase.run(TestCase.java:118)
> >         at junit.framework.TestSuite.runTest(TestSuite.java:208)
> >         at junit.framework.TestSuite.run(TestSuite.java:203)
> >         at org.eclipse.jdt.internal.junit.runner.junit3.JUnit3TestReference.run(JUnit3TestReference.java:128)
> >         at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
> >         at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:460)
> >         at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:673)
> >         at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:386)
> >         at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:196)
> >     Caused by: java.lang.RuntimeException: Error in solrconfig.xml
> >         at org.apache.solr.core.SolrConfig.<clinit>(SolrConfig.java:90)
> >         ... 16 more
> >     Caused by: java.lang.RuntimeException: Can't find resource 'solrconfig.xml' in classpath or 'solr/conf/', cwd=/Users/eric/Documents/code/oss/apache/solr/trunk
> >         at org.apache.solr.core.Config.openResource(Config.java:357)
> >         at org.apache.solr.core.SolrConfig.initConfig(SolrConfig.java:79)
> >         at org.apache.solr.core.SolrConfig.<clinit>(SolrConfig.java:87)
> >         ... 16 more
> >
> > I have a tweak that gives it an alternate path to look up the file... Would that be of interest to anyone as a patch file? Is there a specific way to config my Eclipse project to find the path?
> >
> > Eric Pugh

--- Principal OpenSource Connections Site: http://www.opensourceconnections.com Blog: http://blog.opensourceconnections.com Cell: 1-434-466-1467
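One variation on the setProperty approach in this message, offered as a sketch rather than as what was committed, is to set each property only when it is not already defined. That way values passed in by Ant still take precedence, while a bare IDE run gets the defaults. The property names and values come from the message; the class and method names are hypothetical.

```java
/** Hypothetical helper: supplies test defaults without overriding external values. */
class TestSysPropsSketch {

    /** Sets key to value only if nothing (for example Ant) has defined it already. */
    static void setIfAbsent(String key, String value) {
        if (System.getProperty(key) == null) {
            System.setProperty(key, value);
        }
    }

    /** Defaults for the two properties the tests expect. */
    static void applyTestDefaults() {
        setIfAbsent("solr.test.sys.prop1", "propone");
        setIfAbsent("solr.test.sys.prop2", "proptwo");
    }

    public static void main(String[] args) {
        applyTestDefaults();
        System.out.println(System.getProperty("solr.test.sys.prop1"));
        System.out.println(System.getProperty("solr.test.sys.prop2"));
    }
}
```

The unconditional calls shown in the message are simpler and behave the same way when nothing else sets these properties; the conditional form only matters if a build script is expected to supply different values.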
Re: Call for Papers Opens for ApacheCon US 2007
Hi all,

Erik Hatcher has shown me some of the abilities of Flare, I've been digging into it for a jobby job project, and I've done my first small Solr project, which was adding PDF, Word, Excel, and Powerpoint parsing in the vein of the CSVRequestHandler code. (Patches to be forthcoming!)

I was thinking about doing something on this as well. Is there enough room for multiple presentations? Can two people do a presentation? Chris, would you be interested in co-presenting?

I've mostly been on the outside of the Lucene community, being much more active in some of the Jakarta projects, and then seduced away by Ruby for the past 18 months, but the possibilities of Solr and Flare have had me interested in getting involved in Apache again.

Eric Pugh

On Apr 23, 2007, at 1:21 PM, Chris Hostetter wrote:

> : Is anyone willing to submit an introductory talk on Solr?
>
> I was thinking about submitting two talks...
>
>   Novice: Solr Out of the Box
>   Advanced: Solr Beyond the Box
>
> The first being an attempt at showcasing all of the features of Solr available without writing any code (just configuration and maybe some XSLT) ... loading data from CSV, dismax query parsing, facets, highlighting, date math, JSON output, etc., and any other cool features that get committed between now and then. I'll probably also talk about Flare (but that would mean needing to learn about Flare before November).
>
> The second would look at examples of how Solr can be customized without building the whole thing from scratch ... writing custom plugins, and embedding Solr in other applications. (The custom plugins part I think I can cover pretty well, but I'll need to pick the brains of people *doing* Solr embedding for the second half if the proposal is accepted.)
>
> What do you guys think?
>
> -Hoss

--- Principal OpenSource Connections Site: http://www.opensourceconnections.com Blog: http://blog.opensourceconnections.com Cell: 1-434-466-1467