Re: Indexing pdf files - question.

2013-09-08 Thread Nutan Shinde
The error got resolved. The solution: [element name lost in archive] must be
within the [element name lost in archive] tag.


On Sun, Sep 8, 2013 at 3:31 AM, Furkan KAMACI wrote:

> Could you show us logs you get when you start your web container?


Re: Indexing pdf files - question.

2013-09-07 Thread Furkan KAMACI
Could you show us logs you get when you start your web container?




Re: Indexing pdf files - question.

2013-09-04 Thread Nutan Shinde
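My solrconfig.xml is:

[XML markup lost in the archive. Surviving fragments: an element with
class="solr.extraction.ExtractingRequestHandler"; the values desc, true,
attr_, true; and a comment saying that desc is defined in schema.xml as shown
below.]

Schema.xml:

[XML markup lost in the archive. Surviving fragments: four elements ending in
multiValued="false"/>, and the value doc_id.]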

 

I have created an extract directory, copied all the required .jar files and
the solr-cell jar into it, and given its path in the lib tag in
solrconfig.xml.

When I try this on Windows 7:

curl "http://localhost:8080/solr/update/extract?literal.doc_id=1&commit=true" -F myfile=@solr-word.pdf

 

I get "/solr/update/extract is not available", and sometimes an access denied
error.

I tried to resolve this by searching the net, but in vain: all the solutions
I found are for Linux, and I'm working on Windows.

Please help me with solutions that apply to Windows.

I referred to the Apache Solr 4 Cookbook.

Thanks a lot.
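
A "/solr/update/extract is not available" response usually suggests that no
handler is registered under that path. The following is only a minimal Python
sketch of how one might check that a requestHandler named /update/extract sits
directly under the root of solrconfig.xml. SAMPLE_CONFIG is a hypothetical
fragment, not the poster's actual file, and the helper names are made up for
illustration. The resolution posted later in the thread mentions an element
that had to be inside a particular tag, but the archive lost the names, so this
illustrates the general idea of checking nesting rather than that specific fix.

```python
import xml.etree.ElementTree as ET

# Hypothetical solrconfig.xml fragment (an assumption, not the poster's file).
SAMPLE_CONFIG = """
<config>
  <requestHandler name="/update/extract"
                  class="solr.extraction.ExtractingRequestHandler"/>
</config>
"""


def top_level_handlers(config_xml):
    """Map handler name -> class for requestHandler elements that are
    direct children of the root element."""
    root = ET.fromstring(config_xml)
    return {h.get("name"): h.get("class") for h in root.findall("requestHandler")}


def has_extract_handler(config_xml, path="/update/extract"):
    """True if a top-level requestHandler is registered under `path`."""
    return path in top_level_handlers(config_xml)


print(has_extract_handler(SAMPLE_CONFIG))
```

To check a real file, read it from disk and pass its text to
has_extract_handler. A handler nested inside some other element is not counted,
because findall with a bare tag name only looks at direct children.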



Re: Indexing pdf files - question.

2011-04-08 Thread Mike
Hi Erick,

Thank you for the reply.

Now I am able to index the PDF files and search them.
I am left with a couple of questions:
1. Can I add a custom field to the search response XML? (For example, I need
to add a description field that gives a brief description of the PDF file.)
2. Currently Solr runs as a separate application. Can I run Solr in the same
application server (Tomcat 6) in which my actual application resides?

Thanks,
Mike

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexing-pdf-files-question-tp2079505p2794841.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Indexing pdf files - question.

2011-04-07 Thread Erick Erickson
Did you try the curl commands that Adam suggested as part of this e-mail
thread?
If so, what happened?

Best
Erick

On Wed, Apr 6, 2011 at 7:50 AM, Mike  wrote:

> Hi All,
>
> I am new to Solr. I have gone through the Solr documentation on indexing PDF
> files, but it was hard to find the exact procedure to get started.
> I need a step-by-step procedure. Could you please let me know the steps to
> index PDF files?
>
> Thanks,
> Mike
>


Re: Indexing pdf files - question.

2011-04-07 Thread Mike
Hi All,

I am new to Solr. I have gone through the Solr documentation on indexing PDF
files, but it was hard to find the exact procedure to get started.
I need a step-by-step procedure. Could you please let me know the steps to
index PDF files?

Thanks,
Mike



Re: Indexing pdf files - question.

2010-12-13 Thread Wodek Siebor

The sample /docs/tutorial.pdf does not require OCR.


Re: Indexing pdf files - question.

2010-12-13 Thread Adam Estrada
Hi,

I use the following commands to post PDF files.

$ curl "http://localhost:8983/solr/update/extract?stream.file=C:\temp\document.docx&stream.contentType=application/msword&literal.id=esc.doc&commit=true"
$ curl "http://localhost:8983/solr/update/extract?stream.file=C:\temp\features.pdf&stream.contentType=application/pdf&literal.id=esc2.doc&commit=true"
$ curl "http://localhost:8983/solr/update/extract?stream.file=C:\temp\Memo_ocrd.pdf&stream.contentType=application/pdf&literal.id=Memo_ocrd.pdf&defaultField=text&commit=true"

The PDFs have to be OCR'd.
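
The commands above put a Windows path and a MIME type straight into the query
string. As a minimal sketch, the same URL can be assembled in Python so that
the backslashes, colon and slash are percent-encoded. build_extract_url is a
hypothetical helper name; the parameter names (stream.file,
stream.contentType, literal.id, commit) are taken from the commands above, and
the host, port and file path are the ones used there.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_extract_url(base_url, file_path, content_type, doc_id):
    """Build an /update/extract URL that points Solr at a file on its own disk.

    Hypothetical helper; the parameter names follow the curl commands in this
    thread.
    """
    params = {
        "stream.file": file_path,
        "stream.contentType": content_type,
        "literal.id": doc_id,
        "commit": "true",
    }
    # urlencode percent-encodes characters such as ':' and '\' in the values.
    return base_url.rstrip("/") + "/update/extract?" + urlencode(params)


url = build_extract_url(
    "http://localhost:8983/solr",
    r"C:\temp\features.pdf",
    "application/pdf",
    "esc2.doc",
)
print(url)

# Decoding the query string gives back the original values.
print(parse_qs(urlsplit(url).query))
```

The resulting string can be passed to curl in double quotes, as in the
commands above.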

Adam

On Mon, Dec 13, 2010 at 11:01 AM, Siebor, Wlodek [USA] <
siebor_wlo...@bah.com> wrote:

> Hi,
> Can somebody please send me a command for indexing, with
> ExtractingRequestHandler, a sample PDF file available in the /docs directory?
> I have LucidWorks Solr installed on Linux, with the standard schema.xml and
> solrconfig.xml files (unchanged). I want to pass the name of the file as the
> unique id.
> I have tried various curl commands, and so far I get either a “… missing
> required field: id” or a “.. missing content stream” error.
> Thanks for your help,
> Wlodek
>


Indexing pdf files - question.

2010-12-13 Thread Siebor, Wlodek [USA]
Hi,
Can somebody please send me a command for indexing, with
ExtractingRequestHandler, a sample PDF file available in the /docs directory? I
have LucidWorks Solr installed on Linux, with the standard schema.xml and
solrconfig.xml files (unchanged). I want to pass the name of the file as the
unique id.
I have tried various curl commands, and so far I get either a “… missing
required field: id” or a “.. missing content stream” error.
Thanks for your help,
Wlodek
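
For the request described above, one way to use the file name as the unique id
is to derive it from the path and send it as literal.id, while attaching the
file itself as a multipart form field. The following is a minimal Python sketch
that only assembles the curl argument list. The helper name, the form field
name myfile (borrowed from a later message in this thread), the port 8983, and
the relative path docs/tutorial.pdf are assumptions. Roughly speaking,
literal.id addresses the "missing required field: id" error and the attached
file addresses "missing content stream", though that depends on the schema in
use.

```python
from pathlib import PurePosixPath
from urllib.parse import urlencode


def curl_args_with_filename_id(base_url, file_path, id_field="id"):
    """Return a curl argument list that uploads `file_path` to /update/extract
    and uses the file's name as the value of the unique key field.

    Hypothetical helper; `id_field` must match the uniqueKey in schema.xml.
    """
    file_name = PurePosixPath(file_path).name
    query = urlencode({"literal." + id_field: file_name, "commit": "true"})
    url = base_url.rstrip("/") + "/update/extract?" + query
    # -F sends the file as multipart/form-data, which supplies the content stream.
    return ["curl", url, "-F", "myfile=@" + file_path]


args = curl_args_with_filename_id("http://localhost:8983/solr", "docs/tutorial.pdf")
print(args)
```

Passing the list to subprocess.run(args) would execute the upload without any
shell quoting of the & in the URL.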