stopwords file configuration
I'm using the Lucid Imagination installation kit for Solr (the latest one, with Solr 1.4). I would like to use stopwords, so I installed the Italian version of the file as LucidWorks/lucidworks/solr/conf/stopwords.txt. The field I want to strip stopwords from is declared in schema.xml as [XML snippet not preserved in the archive], where textgen is defined as [XML snippet not preserved in the archive].

But if I index a document containing 'stopworda' and 'stopwordb', the test stopwords I added to verify that it works, they are not removed: I still find these words inside the content_title field. Do I need to declare somewhere else that I'm using the stopwords.txt file? Do you have any suggestions?

Thanks,
Ale

--
View this message in context: http://lucene.472066.n3.nabble.com/stopwords-file-configuration-tp1910032p1910032.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: stopwords file configuration
I'm replying to myself because I found the mistake. The Italian stopwords file that I found on the Apache site has a shell-style comment on the same line as each stopword. The stopwords file parser is probably basic and doesn't accept comments on the same line as a stopword. I removed the comments and now it works. Note that the stopwords are still stored in the field; they are just no longer found by searches.

--
View this message in context: http://lucene.472066.n3.nabble.com/stopwords-file-configuration-tp1910032p1910309.html
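For anyone hitting the same problem, here is a minimal sketch of the cleanup described above: it drops everything after a comment marker on each line and skips lines that end up empty. The function name `strip_inline_comments` and the sample words are hypothetical, and the set of markers is an assumption: `#` matches the shell-style comments mentioned in this thread, while `|` is included because some distributed stopword lists use it instead. Adjust the markers to whatever your file actually contains.

```python
def strip_inline_comments(lines, markers=("#", "|")):
    """Return the bare stopwords from lines that may carry trailing comments.

    `markers` is an assumption about which comment characters appear in
    the file; anything from the first marker to the end of line is dropped.
    """
    cleaned = []
    for line in lines:
        word = line
        for marker in markers:
            # Keep only the text before the first occurrence of the marker.
            word = word.split(marker, 1)[0]
        word = word.strip()
        if word:  # skip blank lines and comment-only lines
            cleaned.append(word)
    return cleaned


def clean_stopwords_file(src_path, dst_path):
    """Read src_path, strip inline comments, write one stopword per line."""
    with open(src_path, encoding="utf-8") as src:
        words = strip_inline_comments(src)
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write("\n".join(words) + "\n")
```

Calling `clean_stopwords_file("stopwords_it.txt", "stopwords.txt")` (both file names are placeholders) would produce a file with one plain stopword per line, which is the form that worked here.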
Re: Posting pdf file and posting from remote
Thanks a lot: this tip was very important for me. I was trying to use PHP curl to send a file from Windows to Mac OS. After a day I discovered that @filename doesn't work on Windows: the error was "26 failed creating formpost data", and the reason is that PHP curl on Windows (I don't know where the bug is) is not able to open the file passed as @filename. PHP version 5.2.4.

I tried:

    <?php
    $ch = curl_init('http://localhost:8010/solr/update/extract?literal.id=doc2&commit=true');
    curl_setopt($ch, CURLOPT_POST, 1);
    curl_setopt($ch, CURLOPT_POSTFIELDS, array('myfile'=>"@paper.pdf"));
    $result = curl_exec($ch);
    ?>

and it works fine. I hope it will also work from a remote Linux server.

Lance Norskog-2 wrote:
>
> stream.file= means read a local file from the server that solr runs
> on. It has to be a complete path that works from that server. To load
> the file over HTTP you have to use @filename to have curl open it.
> This path has to work from the program you run curl on, and relative
> paths work.
>
> Also, tika does not save the PDF binary, it only pulls words out of
> the PDF and stores those.
>
> There's a tika example in solr/trunk/example/exampleDIH in the current
> solr trunk. (I don't remember if it's in the solr 1.4 release.) With
> this you can save the pdf binary in one field and save the extracted
> text in another field. I'm doing this now with html.
>
> On Tue, Feb 9, 2010 at 2:08 AM, alendo wrote:
>>
>> Ok I'm going ahead (may be:).
>> I tried another curl command to send the file from remote:
>>
>> http://mysolr:/solr/update/extract?literal.id=8514&stream.file=files/attach-8514.pdf&stream.contentType=application/pdf
>>
>> and the behaviour has been changed: now I get an error in solr log file:
>>
>> HTTP Status 500 - files/attach-8514.pdf (No such file or directory)
>> java.io.FileNotFoundException: files/attach-8514.pdf (No such file or directory)
>>   at java.io.FileInputStream.open(Native Method)
>>   at java.io.FileInputStream.<init>(FileInputStream.java:106)
>>   at org.apache.solr.common.util.ContentStreamBase$FileStream.getStream(ContentStreamBase.java:108)
>>   at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:158)
>>   at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
>>   at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>>   at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:233)
>>   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
>>   at
>>
>> etc etc...
>>
>> --
>> View this message in context:
>> http://old.nabble.com/Posting-pdf-file-and-posting-from-remote-tp27512455p27512952.html
>
> --
> Lance Norskog
> goks...@gmail.com

--
View this message in context: http://old.nabble.com/Posting-pdf-file-and-posting-from-remote-tp27512455p27543540.html
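To make the distinction in the quoted reply concrete: with stream.file the server opens a path on its own disk, whereas with @filename the client reads the file and sends its bytes in the request body as multipart/form-data. The following is only a rough sketch of the client-side variant using the Python standard library, as an alternative when the PHP curl @filename form fails. The helper names `build_multipart` and `post_pdf` are hypothetical; the field name `myfile` and the URL shape are taken from the PHP snippet above, and the `application/pdf` default is an assumption.

```python
import urllib.request
import uuid


def build_multipart(field_name, filename, payload, content_type="application/pdf"):
    """Build a single-file multipart/form-data body.

    Returns (body_bytes, content_type_header). The boundary is random so
    it is very unlikely to collide with bytes inside the payload.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def post_pdf(url, path):
    """Read `path` on the client and POST its bytes to `url`.

    The path only has to be valid on the machine running this script,
    not on the Solr server.
    """
    with open(path, "rb") as fh:
        body, ctype = build_multipart("myfile", path, fh.read())
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": ctype}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

A call would look like `post_pdf("http://localhost:8010/solr/update/extract?literal.id=doc2&commit=true", "paper.pdf")`, using the same URL as in the PHP attempt. I have not run this against a Solr instance, so treat it as a starting point rather than a verified client.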
Re: Posting pdf file and posting from remote
OK, I'm making progress (maybe :). I tried another curl command to send the file from remote:

http://mysolr:/solr/update/extract?literal.id=8514&stream.file=files/attach-8514.pdf&stream.contentType=application/pdf

and the behaviour has changed: now I get an error in the Solr log file:

HTTP Status 500 - files/attach-8514.pdf (No such file or directory)
java.io.FileNotFoundException: files/attach-8514.pdf (No such file or directory)
  at java.io.FileInputStream.open(Native Method)
  at java.io.FileInputStream.<init>(FileInputStream.java:106)
  at org.apache.solr.common.util.ContentStreamBase$FileStream.getStream(ContentStreamBase.java:108)
  at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:158)
  at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
  at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:233)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
  at

etc etc...

--
View this message in context: http://old.nabble.com/Posting-pdf-file-and-posting-from-remote-tp27512455p27512952.html
Posting pdf file and posting from remote
I understand that Tika is able to index PDF content: is that true? I tried to post a PDF from my local machine and I can see another document in the solr/admin schema browser, but when I search only the document id is available; the document's content doesn't seem to be indexed. Do I need other products to index PDF content?

Moreover, I want to send a file from a remote machine. It seems I must configure Tika with a tika-config.xml file, enabling remote streaming as in the following: [XML snippet not preserved in the archive], but I'm not able to find a tika-config.xml example...

Thanks a lot,
Alessandra

--
View this message in context: http://old.nabble.com/Posting-pdf-file-and-posting-from-remote-tp27512455p27512455.html