stopwords file configuration

2010-11-16 Thread alendo

I'm using the Lucid Imagination installation kit for Solr (the latest one, with
Solr 1.4).
I would like to use stopwords, so I installed the Italian version of the
file as LucidWorks/lucidworks/solr/conf/stopwords.txt.
Moreover, the field where I want to remove stopwords is declared in schema.xml
as

[XML not preserved in the archive]

where textgen is this

[XML not preserved in the archive]

But if I index a document containing 'stopworda' and 'stopwordb', the test
stopwords I added to verify the setup, it doesn't work: I still find these
words inside the content_title field. Do I need to declare somewhere else
that I'm using the stopwords.txt file? Do you have any suggestions?
Thanks
Ale
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/stopwords-file-configuration-tp1910032p1910032.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: stopwords file configuration

2010-11-16 Thread alendo

I'm replying to myself because I found the mistake. The Italian stopwords file
that I found on the Apache site has a shell-style comment on the same line as
each stopword. The stopwords parser is probably basic and doesn't accept
comments on the same line as a stopword. I removed them and now it works.
Note that the stopwords are still stored in the field, but searches no longer
find them.
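
The cleanup described above can be sketched in a few lines of Python. This is only an illustration, not part of Solr: the function name strip_inline_comments and the sample lines are hypothetical, and the comment marker is assumed to be `#` because the message calls the comments "shell style". Pass a different marker if the file uses another one.

```python
def strip_inline_comments(lines, markers=("#",)):
    """Return the stopwords in `lines`, dropping comments and blank lines."""
    words = []
    for line in lines:
        for marker in markers:
            # Keep only the text before the first comment marker, if any.
            line = line.split(marker, 1)[0]
        word = line.strip()
        if word:
            words.append(word)
    return words


# Hypothetical sample resembling a commented stopwords file.
sample = [
    "# Italian stopwords (sample)",
    "di      # of",
    "a       # to",
    "",
    "stopworda",
]
cleaned = strip_inline_comments(sample)
print("\n".join(cleaned))
```

Writing the cleaned list back to stopwords.txt, one word per line, gives a file with no comment on the same line as a stopword.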
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/stopwords-file-configuration-tp1910032p1910309.html


Re: Posting pdf file and posting from remote

2010-02-11 Thread alendo

Thanks a lot: this tip was very important for me.
I tried PHP curl in order to send a file from Windows to Mac OS. After one
day I discovered that @filename doesn't work on Windows: the error was
"26 failed creating formpost data", and the reason is that PHP curl on
Windows (I don't know where the bug is) is not able to open the file passed
as @filename. PHP version 5.2.4.
I tried:
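<?php
 // Target the extracting handler; literal.id sets the document id.
 $ch = curl_init('http://localhost:8010/solr/update/extract?literal.id=doc2&commit=true');
 curl_setopt ($ch, CURLOPT_POST, 1);
 // The "@" prefix asks curl to upload the named file as multipart form data.
 curl_setopt ($ch, CURLOPT_POSTFIELDS, array('myfile'=>"@paper.pdf"));
 $result= curl_exec ($ch);
?>
and it works fine: I hope it will also work from a remote Linux server.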


Lance Norskog-2 wrote:
> 
> stream.file= means read a local file from the server that solr runs
> on. It has to be a complete path that works from that server. To load
> the file over HTTP you have to use @filename to have curl open it.
> This path has to work from the program you run curl on, and relative
> paths work.
> 
> Also, tika does not save the PDF binary, it only pulls words out of
> the PDF and stores those.
> 
> There's a tika example in solr/trunk/example/exampleDIH in the current
> solr trunk. (I don't remember if it's in the solr 1.4 release.) With
> this you can save the pdf binary in one field and save the extracted
> text in another field. I'm doing this now with html.
> 
> On Tue, Feb 9, 2010 at 2:08 AM, alendo 
> wrote:
>>
>> OK, I'm making progress (maybe :).
>> I tried another curl command to send the file from a remote machine:
>>
>> http://mysolr:/solr/update/extract?literal.id=8514&stream.file=files/attach-8514.pdf&stream.contentType=application/pdf
>>
>> and the behaviour has changed: now I get an error in the Solr log file:
>>
>> HTTP Status 500 - files/attach-8514.pdf (No such file or directory)
>> java.io.FileNotFoundException: files/attach-8514.pdf (No such file or directory)
>>     at java.io.FileInputStream.open(Native Method)
>>     at java.io.FileInputStream.<init>(FileInputStream.java:106)
>>     at org.apache.solr.common.util.ContentStreamBase$FileStream.getStream(ContentStreamBase.java:108)
>>     at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:158)
>>     at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
>>     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>>     at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:233)
>>     at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
>>
>> etc etc...
>>
> 
> 
> 
> -- 
> Lance Norskog
> goks...@gmail.com
> 
> 

-- 
View this message in context: 
http://old.nabble.com/Posting-pdf-file-and-posting-from-remote-tp27512455p27543540.html



Re: Posting pdf file and posting from remote

2010-02-09 Thread alendo

OK, I'm making progress (maybe :).
I tried another curl command to send the file from a remote machine:

http://mysolr:/solr/update/extract?literal.id=8514&stream.file=files/attach-8514.pdf&stream.contentType=application/pdf

and the behaviour has changed: now I get an error in the Solr log file:

HTTP Status 500 - files/attach-8514.pdf (No such file or directory)
java.io.FileNotFoundException: files/attach-8514.pdf (No such file or directory)
    at java.io.FileInputStream.open(Native Method)
    at java.io.FileInputStream.<init>(FileInputStream.java:106)
    at org.apache.solr.common.util.ContentStreamBase$FileStream.getStream(ContentStreamBase.java:108)
    at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:158)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:233)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)

etc etc...
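
The FileNotFoundException above is raised on the Solr server, which fits the advice given elsewhere in this thread that stream.file has to be a complete path that works on the server Solr runs on. As a rough sketch, the request URL could be built so that a relative path is rejected before the request is sent. The helper name build_extract_url and the path /data/files/attach-8514.pdf are hypothetical, and the base URL is borrowed from the PHP example in this thread.

```python
import posixpath
from urllib.parse import urlencode


def build_extract_url(solr_base, doc_id, server_path):
    """Build an update/extract URL whose stream.file is an absolute server path."""
    if not posixpath.isabs(server_path):
        raise ValueError(
            "stream.file must be an absolute path on the Solr server: %r" % server_path
        )
    # urlencode percent-encodes the slashes in the path and the content type.
    params = urlencode({
        "literal.id": doc_id,
        "stream.file": server_path,
        "stream.contentType": "application/pdf",
    })
    return "%s/update/extract?%s" % (solr_base.rstrip("/"), params)


# Hypothetical absolute path; it must exist on the machine where Solr runs.
url = build_extract_url("http://localhost:8010/solr", "8514", "/data/files/attach-8514.pdf")
print(url)
```

Passing files/attach-8514.pdf, the relative path from the failing request, raises ValueError here instead of producing a URL.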

-- 
View this message in context: 
http://old.nabble.com/Posting-pdf-file-and-posting-from-remote-tp27512455p27512952.html



Posting pdf file and posting from remote

2010-02-09 Thread alendo

I understand that Tika is able to index PDF content: is that true? I tried to
post a PDF from my local machine and I can see another document in the
solr/admin schema browser, but when I search only the document id is
available; the document's content doesn't seem to be indexed. Do I need other
products to index PDF content?

Moreover, I want to send a file from a remote machine: it seems I must
configure Tika with a tika-config.xml file, enabling remote streaming as in
the following:

[XML not preserved in the archive]

but I'm not able to find a tika-config.xml example... 
thanks a lot
Alessandra
-- 
View this message in context: 
http://old.nabble.com/Posting-pdf-file-and-posting-from-remote-tp27512455p27512455.html